Legal Battle Between Google and SerpApi Heats Up
The fight over web scraping and AI training data is intensifying. In December, Google sued SerpApi, a company whose API lets users scrape search results in a way that mimics human searches. Google claimed SerpApi was bypassing security measures to access its search data, which is often used to train AI models without site owners’ permission. Now SerpApi is pushing back. It recently filed a motion in a California court to dismiss Google’s lawsuit, arguing that Google is trying to misuse the Digital Millennium Copyright Act to stop others from scraping the web at large.
SerpApi’s Defense and Legal Uncertainty
SerpApi’s spokesperson said, “Google thinks it owns the internet,” suggesting that the lawsuit is a way for Google to control data access. The company argues that no one owns the internet, and the law supports a free flow of information. However, legal experts say the situation isn’t clear-cut. While simple facts can’t be copyrighted, compilations like encyclopedias or phone books might have some copyright protection, especially in how they organize information. This raises questions about whether Google’s search results and summaries are protected works or just raw facts.
IP lawyer Kirk Sigmon explained that if Google’s search results are considered copyrighted, SerpApi could face a tougher legal fight. The core issue is whether the generated search snippets and summaries are protected by copyright law or are simply factual data. Because that question is still open, courts will have to decide whether scraping such information infringes copyright or qualifies as fair use, especially since search engines and scraping tools have been evolving for years.
Industry Shift and Changing Tactics
Some industry insiders see the lawsuit as outdated. Martin Jeffrey, who runs an AI search consultancy, said that web crawling and scraping have changed a lot recently. He pointed out that there has been a surge in search traffic from China routed through Singapore to hide its origin. Applebot, Apple’s web crawler, has also ramped up its activity. Jeffrey believes that the days when SerpApi had a dominant role are over, as the industry has moved on.
He also warned that increased scraping can reveal sensitive or hidden information on poorly maintained websites, especially those built on WordPress or running outdated security. Businesses could unintentionally expose proprietary data or internal messages that AI models might use. Meanwhile, Google seems to be leading the mass scraping effort, while other AI companies are changing their approaches. OpenAI and Anthropic used to rely heavily on scraping data from the web, but they are now reducing their dependence. ChatGPT, for example, still scrapes but to a lesser extent, and Anthropic has scaled back on scraping as well.
Overall, the landscape of data collection for AI training is shifting. The legal battles are just one part of a broader industry evolution, where companies are exploring new ways to gather information while navigating complex laws and ethical considerations. The outcome of this case could influence how AI models are trained and how web data is accessed in the future.














