New York lawmakers approve bill limiting AI crawling of news articles

State lawmakers in New York have approved a bill intended to curb the use of automated web crawlers that conceal their identities while collecting content from news organizations.

The measure, A.11292/S.9934A, passed both the State Assembly and Senate and now awaits further action. If enacted, it would prohibit the use of so-called “stealth crawlers” that access websites operated by newspapers, broadcasters and other journalism providers without properly identifying themselves.

Industry groups contend that many crawlers disguise their identities, making it difficult for publishers and broadcasters to determine who is accessing their content and how it is being used. According to these groups, the crawlers then take news and information, in part or in full, to train large language models (LLMs) that power a variety of artificial intelligence products. Those products include chatbots that answer users' questions while depriving the news organizations that produced the information of revenue.

If enacted, the legislation would require crawlers to accurately identify themselves when accessing covered news websites. It would also establish a private right of action allowing publishers and broadcasters to pursue legal remedies against violators.

“News publishers invest substantial resources of labor, skill, and capital in producing original journalism,” Diane Kennedy, the President of the New York News Publishers Association, said in a statement this week. “The proliferation of stealth crawlers — automated bots that access news sites without identifying themselves or disclosing their purposes — enables technology companies and other actors to access the fruits of that investment without consent or transparency.”

Kennedy said the bill would establish new transparency requirements while giving journalism providers a mechanism to enforce them.

Broadcasters also backed the proposal: David Donovan, the President of the New York State Broadcasters Association, said local TV and radio stations increasingly face large volumes of automated traffic seeking access to news content.

“By protecting broadcast news operations from unauthorized access by Big Tech, the legislation ensures the economic foundations of producing original, local news by broadcast stations throughout the Empire State,” Donovan noted on Monday. “It prohibits using stealth crawlers to extract a broadcaster’s news content without permission or payment.”

Donovan said the legislation also requires disclosure when AI systems use crawlers to gather content from broadcasters, not just from newspapers and other online news sources.

Publishing groups argue the problem extends beyond content collection. They say large volumes of bot traffic can create technical and financial burdens for news organizations by increasing server loads and infrastructure costs.

Danielle Coffey, President and Chief Executive Officer of the News/Media Alliance, said publishers are seeing growing levels of automated traffic from bots seeking access to news content.

“Right now, news websites are drowning in bot traffic,” Coffey said. “Bad bots are disguising their identities to overload publisher servers and access the quality content on our sites, hurting our ability to serve readers.”

She described the legislation as a “common-sense solution” that would provide greater transparency and accountability while helping publishers protect their content and operations.

Supporters also praised the bill’s sponsors: Assembly Member Steven Otis, who chairs the Assembly Science and Technology Committee, and State Senator Mike Gianaris.

The measure comes as publishers, broadcasters and technology companies continue to debate how AI systems should access and use online content, and how creators should be compensated for it. News organizations have increasingly sought legal, regulatory and legislative solutions as AI developers expand their use of web-scraped material.

The New York Times and the Chicago Tribune are among the largest news organizations to sue the developers of AI chatbots, arguing that their materials were used to train LLMs without payment or other authorization. Last month, CNN sued Perplexity, the developer of an AI-based platform, alleging that the company illegally obtained and distributed its copyrighted materials.


TheDesk.net offers the latest news, analysis and commentary on the business of streaming media, broadcast TV and radio, advertising, measurement, journalism, tech and policy.