Reddit Battles Big Tech’s Data Brokers: The Lawsuit Against Perplexity, SerpApi, and Other Web Data Scraping Companies

Reddit Sets Legal Trap as Data-Scraping War Intensifies

Post Summary

  • Reddit has filed a lawsuit against AI search engine Perplexity and data brokers SerpApi, Oxylabs, and AWMProxy, accusing them of an “industrial-scale” scheme to illegally scrape user content from Google’s search results.
  • The lawsuit, filed on October 22, 2025, alleges the companies bypassed technical safeguards to access Reddit’s valuable conversational data without permission or payment.
  • Reddit claims it set a trap—a “marked bill” post visible only to Google’s crawler—which later appeared in Perplexity’s search results, proving the data was scraped from Google.
  • The legal battle highlights a growing conflict between content platforms and AI developers over who controls and profits from the vast archives of human conversation online.

Reddit, the sprawling online forum, has taken the gloves off in its fight against data scraping. The company is accusing Perplexity AI, SerpApi, Oxylabs, and AWMProxy of running a massive operation to siphon millions of user posts. According to a lawsuit filed October 22, 2025, in Manhattan federal court, the companies allegedly worked together to bypass Reddit’s protections by scraping its content directly from Google’s search results.

This lawsuit is Reddit’s second major legal move against an AI firm in just four months, following a similar action against Anthropic in June. It throws a spotlight on the widening chasm between social platforms that host user-generated content and the AI companies that need that data to train their models. While Reddit has struck lucrative licensing deals with giants like OpenAI and Google, it argues that others, like Perplexity, are essentially freeloading off its most precious resource—authentic human conversation.

The Lawsuit’s Central Allegation: Scraping at “Industrial Scale”

At the heart of Reddit’s complaint is the claim that the defendants “devised a scheme” to get around its defenses. Instead of hitting Reddit’s site directly, they allegedly masked their identities and scraped Reddit content from Google’s search result pages, bypassing the security measures of both Reddit and Google.

Ben Lee, Reddit’s Chief Legal Officer, didn’t mince words. “Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material,” he said in a statement. “Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.”

In the legal filing, Reddit’s lawyers painted a vivid picture, comparing the defendants to “would-be bank robbers” who, finding the vault impenetrable, decide to hijack the armored truck instead. In this analogy, Google’s search results are the armored truck, filled with Reddit’s valuable data.

The Marked Bill: Reddit’s High-Tech Sting Operation

To prove its case, Reddit laid a clever trap. According to the lawsuit, the company created a test post that was configured to be visible only to Google’s web crawler and was inaccessible anywhere else online. It was, in effect, a digital “marked bill.”

The bait worked. Within hours, queries made to Perplexity’s AI search engine started returning information from the hidden post. Reddit argues this was the smoking gun, as the only way Perplexity could have found the content was by scraping it from Google’s search index. The lawsuit puts it bluntly: “The only way that Perplexity could have obtained that Reddit content… is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content.”
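The logic of a “marked bill” test can be sketched in a few lines of Python. This is only an illustration of the general canary technique, not Reddit’s actual implementation, which the lawsuit does not describe in detail. The function names, the `canary-` prefix, and the simplified reverse-DNS suffix check are all assumptions made for the example; a real crawler check would also confirm the host name with a forward DNS lookup.

```python
import secrets
from typing import Optional

# Assumption: crawler requests are recognized by reverse-DNS host name suffix.
# This is a simplification chosen for the sketch.
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def make_canary() -> str:
    """Return a unique, unguessable marker to embed in the test post."""
    return "canary-" + secrets.token_hex(8)


def is_search_crawler(reverse_dns_host: str) -> bool:
    """Hypothetical check: True only if the host ends in a crawler domain."""
    host = reverse_dns_host.lower().rstrip(".")
    return host.endswith(CRAWLER_SUFFIXES)


def serve_post(reverse_dns_host: str, canary: str) -> Optional[str]:
    """Return the test post for the crawler, and nothing for anyone else."""
    if is_search_crawler(reverse_dns_host):
        return f"Test post containing {canary}"
    return None


def answer_leaks_canary(answer_text: str, canary: str) -> bool:
    """True if a third party's answer contains the marker."""
    return canary in answer_text
```

Because the marker is random and served only to the crawler, finding it in a third party’s answer suggests the content travelled through the search engine’s index rather than from the site itself.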

The fallout from a cease-and-desist letter sent to Perplexity was also telling. After receiving the warning, citations to Reddit in Perplexity’s results suddenly shot up forty-fold, a spike Reddit claims is further evidence of the scraping operation. You can read more about the sting in this detailed breakdown from Business Insider.

How Big Tech’s Data Brokers Operate—And Who’s Involved

The lawsuit names four defendants: three alleged middlemen in the data economy and one alleged end user of the scraped data.

  • Perplexity AI: A San Francisco-based AI search engine and the alleged end-user of the scraped data. The company has been developing tools like its Comet Agentic Browser, which relies on vast amounts of information to provide answers.
  • SerpApi: A Texas startup that sells structured data from search engine results and openly lists Perplexity as one of its clients.
  • Oxylabs UAB: A Lithuanian data-scraping firm. The company called the lawsuit “shocking and disappointing,” arguing that public data should not be treated as proprietary.
  • AWMProxy: A web domain described as a former Russian botnet and also implicated in the alleged scheme.

Perplexity’s Public Defense

Perplexity has pushed back against the allegations. In a post on Reddit, the company stated, “We summarize Reddit discussions, and we cite Reddit threads in answers, just like people share links to posts here all the time. Perplexity doesn’t train foundation models!” A spokesperson, Jesse Dwyer, added, “We will not tolerate threats against openness and the public interest.”

However, Reddit’s lawsuit challenges this defense with its technical evidence, suggesting Perplexity actively sought scraped data after being blocked from accessing Reddit directly. This back-and-forth is covered well by Search Engine Journal.


Legal and Industry Stakes

This fight isn’t just about Reddit. It’s part of a larger trend in which content owners are pushing back against AI companies. Rights holders such as The New York Times and Getty Images are already suing for copyright infringement. Social networks such as Meta, X, and LinkedIn have also taken legal action to stop scrapers. The case also touches on broader issues, including emerging AI safety laws that regulate how data is used.

The core question is: who owns public conversation? Denas Grybauskas, Oxylabs’ Chief Governance Officer, framed it this way: “No company should claim ownership of public data that does not belong to them. It is possible that it is just an attempt to sell the same public data at an inflated price.”

What Reddit Wants—And What Comes Next

Reddit is seeking financial damages, a permanent injunction to stop the scraping, and a ban on the defendants using or selling any Reddit data they’ve already collected. In response, both SerpApi and Oxylabs have stated they will “vigorously defend” themselves in court.

The outcome of this case could set a huge precedent for how user-generated content is monetized in the age of AI. It pits the idea of an “open internet” against the commercial reality that mass human discourse has become an incredibly valuable commodity.

As AI’s appetite for data continues to grow, legal clashes like this one will ultimately decide who gets to profit from our collective online conversations—and who gets to build the future of artificial intelligence.
