Google Can Train Search AI on Content Without Publisher Consent

In testimony in the US Justice Department’s ongoing antitrust case against Google, a Google DeepMind executive reportedly disclosed that Google Search products can use content from publishers even if they have opted out of artificial intelligence (AI) training. The executive emphasised that such content is not used in DeepMind’s AI models, explaining that content for Search is governed by a separate mechanism based on the robots.txt web standard.

Eli Collins, Google DeepMind’s Vice President of Product, confirmed, according to a Bloomberg report, that the rules for honoring publishers’ decisions to opt out of AI training differ between DeepMind’s AI models and the company’s Search products.

According to a document allegedly presented by Diana Aguilar, an attorney for the Department of Justice in the antitrust lawsuit, 80 billion of the 160 billion tokens used to train Google’s AI models came from content that publishers had opted out of AI training. Collins allegedly responded that once a publisher opts out of AI training, DeepMind’s models do not use the content.

When asked if “the search org has the ability to train on the data that publishers had opted out of training,” DeepMind VP Eli Collins responded, “Correct — for use in search.” However, Bloomberg notes that this opt-out is limited to DeepMind models.

When Aguilar allegedly asked whether the Gemini AI model could use the same content if it was deployed inside the Search product, Collins said that this was “correct” as long as the use case remained within Search. Notably, this would include the Gemini models that power Google’s recently introduced AI Mode and AI Overviews.

This indicates that conventional opt-out techniques are insufficient to prevent Google from using publisher content. In June 2023, the tech giant revised its privacy policy to state that it will train its language models using all publicly accessible Internet data. In this context, any website without a paywall or a mandatory sign-up page that limits public access is considered publicly accessible.

The rules for Search-based AI tools are different, a Google representative later told Bloomberg, since publishers can “only decline having their data used in Search AI if they opt out of being indexed for search.” Publishers can do this through the robots.txt web standard, which tells Google’s crawler bots whether they may access content in order to index it in search results.
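To make the two mechanisms concrete, here is a minimal sketch of how a robots.txt file can treat different crawler tokens differently. It uses Python's standard `urllib.robotparser`. The `Google-Extended` token (Google's published control for generative AI training) and the `Googlebot` token are not named in the reporting above and are included here as assumptions about which tokens are relevant; the `example.com` URL and the `is_allowed` helper are purely illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opts out of AI training via the Google-Extended
# token (an assumption, not named in the article) while still letting
# Googlebot crawl the site for search indexing.
OPT_OUT_OF_TRAINING = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""

# Hypothetical robots.txt: blocks Googlebot entirely, which per the Google
# representative is the only way to stay out of Search AI, at the cost of
# not being indexed for search.
OPT_OUT_OF_SEARCH = """\
User-agent: Googlebot
Disallow: /
"""

URL = "https://example.com/article"  # illustrative URL


def is_allowed(robots_txt: str, user_agent: str, url: str = URL) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, rules in [("training opt-out", OPT_OUT_OF_TRAINING),
                        ("search opt-out", OPT_OUT_OF_SEARCH)]:
        for agent in ("Google-Extended", "Googlebot"):
            print(f"{name}: {agent} allowed = {is_allowed(rules, agent)}")
```

Under the first file, the page stays crawlable by Googlebot and therefore, according to the testimony described above, remains available to Search AI features. Only the second file removes it from that pipeline, along with the site's presence in search results.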

Doing so, however, would also mean that these webpages no longer appear when a user searches for a topic using Google. Publishers that want to remain visible in search are therefore essentially forced to accept the company using their data to train its AI models.

The goal of the ongoing antitrust litigation is to curb Google’s dominance in the search and artificial intelligence markets. The Department of Justice is urging US District Judge Amit Mehta, who is overseeing the case, to compel the internet giant to sell Google Chrome and to disclose the data it uses to produce search results. No comparable remedy, however, has been proposed for the company’s AI products.

Tags: ai search deepmind google journalist

Akinola Ajibola
