Google has announced a new control for robots.txt, the file websites use to tell crawlers which pages they may access. It lets publishers decide whether their content will “help improve Bard and Vertex AI generative APIs, including future generations of models that power those products.” The control is a user agent token called Google-Extended, which publishers can add to their site’s robots.txt file to tell Google not to use their content for those two APIs. In its announcement, the company’s vice president of “Trust,” Danielle Romain, said Google has “heard from web publishers that they want greater choice and control over how their content is used for emerging generative AI use cases.”
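As a minimal sketch, the robots.txt below pairs a `Google-Extended` opt-out with a hypothetical catch-all group that leaves every other crawler allowed; the example.com URL and the `allowed` helper are illustrative assumptions. The check uses Python’s standard-library `urllib.robotparser`, which approximates but does not exactly reproduce Google’s own robots.txt matching, so treat it as a sanity check rather than a guarantee of how Google behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended (Bard / Vertex AI use)
# while the catch-all group keeps every other crawler, including Googlebot, allowed.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/some-story"
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, allowed(ROBOTS_TXT, agent, page))
    # Google-Extended False
    # Googlebot True
```

Because Google-Extended is a control token rather than a separate crawler, disallowing it leaves the Googlebot rules untouched, so the opt-out on its own does not remove pages from regular Search.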
Romain added that Google-Extended “is an important step in providing transparency and control that we believe all providers of AI models should make available.” As generative AI chatbots grow in prevalence and become more deeply integrated into search results, the way content is digested by things like Bard and Bing AI has been of concern to publishers.
While those systems may cite their sources, they aggregate information that originates from many different websites and present it to users within the conversation. That could drastically reduce the traffic going to individual outlets, which would in turn hurt their ad revenue and, in some cases, threaten entire business models.
Google said that, when it comes to training AI models, the opt-outs will apply to the next generation of models for Bard and Vertex AI. Publishers looking to keep their content out of products like Search Generative Experience (SGE) should continue to use robots.txt rules for the Googlebot user agent and the noindex robots meta tag in their pages’ HTML.
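The noindex directive lives in a page’s HTML (or in an `X-Robots-Tag` HTTP header), not in robots.txt, and Google can only see it on pages it is allowed to crawl. The sketch below is a hypothetical helper built on Python’s standard-library `html.parser` that checks whether a page’s `robots` or `googlebot` meta tags include `noindex`; it ignores HTTP headers and is meant only to show where the tag goes, not to replicate how Google processes it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if any robots/googlebot meta tag in `html` contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex"></head>'
    print(has_noindex(page))  # True
```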
Romain points out that “as AI applications expand, web publishers will face the increasing complexity of managing different uses at scale.” This year has seen an explosion in the development of tools based on generative AI, and because search is such a major way people discover content, the internet looks set to undergo a significant shift. Google’s addition of this control is not only timely but also suggests the company is thinking about how its products will affect the web.
Update, September 28 at 5:36pm ET: This article was updated to add more information about how publishers can keep their content out of Google’s search and AI results and training.