Why the FTC Says “Public Data” Is Not Free - And Why Your Business Needs to Take This Seriously
By Ramyar Daneshgar
Security Engineer | USC Viterbi School of Engineering
Disclaimer: This article is for educational purposes only and does not constitute legal advice.
1. The FTC’s Signal: Scraping Public Info Can Be Illegal - Not a Loophole
For decades, many companies operated under the assumption that if data is publicly visible on the web, it can be freely harvested, aggregated, sold, or used - without legal consequence. That assumption is now dangerously outdated.
Over recent years, the FTC has repeatedly pursued companies that collected or sold public data - particularly when that data includes biometric identifiers, precise location information, or detailed consumer behavior - treating those practices as unfair or deceptive under Section 5 of the FTC Act.
The notable settlement with Everalbum, Inc. is an early example. Everalbum’s “Ever” app promised users that facial-recognition features would remain off unless users opted in. In fact, according to the FTC’s complaint, Everalbum default-enabled facial recognition for many users and retained images indefinitely, even after account deactivation. The company then used those images to train its facial-recognition model. The FTC ordered Everalbum not only to delete the photos and algorithmic models derived from them, but also to obtain express consent before processing biometric data in the future. (Federal Trade Commission)
That enforcement action demonstrates two things: first, common defenses like “the user posted it publicly” or “we collected no personal identifiers” carry no weight when the data includes sensitive personal characteristics such as a face; second, the FTC treats scraped biometric data not merely as personal data but as sensitive, identity-impacting information whose unauthorized use is itself a harm. (Loeb)
More recently, the FTC has targeted data brokers that collected and sold sensitive location and browsing data. For example, in December 2024 the FTC filed complaints against Mobilewalla and against Gravy Analytics and its subsidiary Venntel - alleging that they unlawfully harvested and sold consumers’ location histories tied to medical centers, religious facilities, and other sensitive sites, often without user consent or knowledge. (Federal Trade Commission)
In sum, the FTC’s consistent pattern is clear: if you or your vendor scrape public web data that includes sensitive personal data - biometric markers, location traces, browsing history, sensitive behavioral signals - you can face enforcement. Public visibility does not equal consent.
2. The Legal Foundations: Why Scraping + Sensitive Data = Liability
A. Section 5 of the FTC Act - Unfair or Deceptive Practices
Under the FTC Act, the FTC may act against practices that are “unfair or deceptive.” Using publicly visible data to compile new dossiers, infer sensitive attributes, resell data for profiling, or build AI training databases without proper consumer notice or consent can qualify as exactly such a practice.
In the Everalbum matter, the FTC treated misrepresentations about data collection and retention - particularly of biometric data - as deceptive. The company had promised that facial recognition would stay off unless users opted in, yet enabled it by default. That discrepancy triggered the enforcement action. (Federal Register)
B. Sensitive Data as a Special Risk Category
The FTC recognizes that certain categories of data - biometric data (face geometry, voiceprint), precise geolocation tied to health visits or religious centers, browsing history, and data involving minors - create heightened privacy harm and risk. Its recent complaints against data brokers allege that tracking or selling such data constitutes “unfair practices,” even where individual identifiers like name or SSN are absent. (Federal Trade Commission)
In the Mobilewalla / Gravy Analytics cases, the FTC highlighted how location tags linked to “health clinics, churches, union offices, shelters, prisons, and other sensitive places” raise serious threats - from surveillance, profiling, to potential abuse by third parties. (CyberScoop)
C. The Risk of Re-identification
Even if a dataset is “anonymized” or lacks conventional PII (name, SSN), the FTC recognizes that data combining location, behavioral signals, and device identifiers can often be re-identified. In the FTC’s 2024 proposed complaints against companies like Avast, X-Mode and InMarket, the agency criticized reliance on “anonymization” alone, because modern re-identification techniques often defeat those protections. (Federal Trade Commission)
3. What This Means for Your Business - Not Just the Original Scraper
Many business owners assume that liability ends with the vendor who scraped the data. That assumption is no longer safe.
If your company:
- purchases datasets from data brokers,
- uses AI models trained on publicly scraped data,
- ingests location or behavioral datasets for marketing, risk scoring, or analytics,
- integrates third-party data into your operations,
then you may be directly liable under the same laws the FTC enforces - regardless of whether you performed the scraping.
The logic is this:
- You benefit from data that was collected from public websites.
- The FTC deems that collection unfair (or deceptive) when it involves sensitive data without consent.
- If you continue to use, retain, or resell that data, you expose your company to enforcement risk.
This downstream liability has already been reinforced by FTC statements: companies that “buy, resell, or analyze” such data are responsible for verifying that collection was lawful. (Federal Trade Commission)
4. Enforcement Is Real, Not Theoretical - Companies Have Already Been Penalized
Everalbum - First Major Biometric Data Settlement
In 2021, the FTC finalized a settlement with Everalbum based on its misuse of photos and of facial-recognition algorithms built on public and private image data. Everalbum was required to delete the facial-recognition models derived from those photos and to obtain explicit consumer consent before processing biometric data again. (Federal Trade Commission)
This was the first major action signaling that biometric data - even if publicly visible - cannot be harvested and reused without explicit user consent. The settlement set a meaningful precedent for future AI- and data-broker related enforcement. (Loeb)
Data Broker Crackdowns: Mobilewalla, Gravy Analytics, Venntel
In December 2024, the FTC filed formal complaints accusing Mobilewalla, Gravy Analytics, and Venntel of illegally collecting and selling sensitive location data - data that traced consumers’ visits to churches, medical centers, prisons, and other sensitive sites. The complaint highlights how the companies’ practices could enable surveillance, profiling, and exploitation of vulnerable populations. (CyberScoop)
As part of their orders, the companies are barred from selling or sharing precise location data linked to sensitive places unless required by law enforcement or national security. Companies that bought data from those brokers must now consider whether they have used illicitly collected data and may face similar consequences. (CyberScoop)
Proposed Orders Against Data Firms Selling Browsing, Location or Behavioral Data
In early 2024, the FTC issued proposed orders targeting firms such as Avast, X-Mode, and InMarket, accusing them of collecting, retaining, and selling browsing history, geolocation, and device identifiers without consent. The FTC argued that these practices create “substantial consumer risk” and require robust privacy programs including deletion rights, use restrictions, and audit mechanisms. (Federal Trade Commission)
This shows regulators are not only penalizing facial recognition or location data brokers; they are systematically targeting any firm that treats browsing, location, or device metadata as freely harvestable and tradable.
5. What Business Owners Must Do Immediately - Detailed Action Plan
If your company uses any third-party data, buys datasets, or relies on AI models trained on external data, you need to treat data sourcing as a core compliance issue - not a marketing or growth opportunity.
Step 1: Inventory All Data Inputs and Vendors with Audit Trails
Create a comprehensive inventory that identifies:
- Each external dataset you purchase or ingest.
- All third-party vendors, data brokers, AI providers, and their upstream suppliers (subprocessors).
- The type of data involved: biometric (face, voice), location, browsing history, device identifiers, children’s data, health-related location data, sensitive behavioral data.
- Whether the vendor provided documentation of lawful data sourcing, consent mechanisms, and data provenance.
If any vendor cannot prove the data was collected with valid consent and in compliance with the FTC’s standards, assume the data is tainted.
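As a starting point, the inventory and the taint rule above can be sketched as a simple record type. The field names and sensitivity categories below are illustrative assumptions for this sketch, not a legal standard:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class SensitiveCategory(Enum):
    """Illustrative sensitivity categories drawn from the FTC actions above."""
    BIOMETRIC = auto()
    PRECISE_LOCATION = auto()
    BROWSING_HISTORY = auto()
    DEVICE_IDENTIFIER = auto()
    CHILDRENS_DATA = auto()
    HEALTH_CONTEXT = auto()

@dataclass
class DatasetRecord:
    name: str
    vendor: str
    subprocessors: list[str] = field(default_factory=list)
    categories: set[SensitiveCategory] = field(default_factory=set)
    consent_documented: bool = False     # vendor supplied consent evidence
    provenance_documented: bool = False  # vendor supplied sourcing records

    def is_tainted(self) -> bool:
        # Rule of thumb from Step 1: any sensitive dataset lacking
        # documented consent AND provenance is treated as tainted.
        if not self.categories:
            return False
        return not (self.consent_documented and self.provenance_documented)
```

In practice each record would also carry the audit-trail fields from the list above (ingestion date, contract reference, upstream supplier chain); the point of the sketch is that "tainted" is a computable property once the inventory exists.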
Step 2: Add Contractual Protections - Don’t Rely on Good Faith Alone
Revise vendor contracts to include:
- Warranties that all data was collected legally and in compliance with applicable laws.
- Indemnification obligations if a vendor’s data sourcing draws investigation or enforcement.
- Rights to demand data deletion, including derivative data (models, segments, analytics) if the original data is deemed unlawfully collected.
- Audit and transparency requirements, including the vendor’s full subprocessor list and data provenance records.
This turns data sourcing from a “vendor black box” into a “governed risk boundary.”
Step 3: Build a Data Governance & Compliance Program Around Sensitive Data Use
Treat any data classified as “sensitive” (biometric, location, health, children, browsing profiles) as high-risk. For such data:
- Require explicit end-user consent or data subject consent (where applicable).
- Avoid using or storing data merely because it is “public.”
- Apply data minimization: only collect or retain what you need.
- Enact clear deletion policies and retention limits.
- Document all processing, sharing, and model training that uses such data.
Step 4: Scrutinize AI Models - Especially Publicly Trained or Community Datasets
If you license or deploy AI models trained by third parties:
- Require documentation of how and from what data the model was trained.
- Verify whether training data included public-web scraping or sensitive personal data.
- If sourcing is uncertain, conduct a risk review: consider refusing to deploy, or require the vendor to retrain without illicit data.
- Ensure any derived models or analytics cannot be sold or redistributed without re-verifying data source compliance.
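A minimal sketch of a deployment gate for third-party models, assuming the vendor supplies a model card as a plain dictionary; the field names `training_data_sources`, `public_web_scrape`, and `consent_verified` are hypothetical, not part of any standard model-card schema:

```python
def deployment_decision(model_card: dict) -> str:
    """Gate a third-party model on its documented training-data provenance.

    The model-card fields used here ("training_data_sources",
    "public_web_scrape", "consent_verified") are hypothetical names.
    """
    sources = model_card.get("training_data_sources")
    if sources is None:
        # Step 4's first requirement: no documentation, no deployment.
        return "block: no training-data documentation"
    if any(src.get("public_web_scrape") and not src.get("consent_verified")
           for src in sources):
        # Scraped data without verified consent triggers a risk review.
        return "risk-review: scraped data without verified consent"
    return "allow"
```

The useful pattern here is that "uncertain sourcing" maps to a blocking state by default, so a model only ships once its provenance questions are answered.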
Step 5: Prepare for Legal, Regulatory, or Litigation Risk
Incorporate data-sourcing risk into your company’s broader legal and compliance strategy. That includes:
- Board-level awareness: data procurement is a material risk, not just marketing or operations.
- Incident response playbook: include scenarios where data vendors are investigated or forced to delete data, and what that means for you as a downstream user.
- Regulatory readiness: document your due diligence, retention, consent, usage, and deletion policies proactively.
6. Why This Is Not “Speculative Risk” - It’s Live, Escalating, and Very Real
Between the Everalbum biometric settlement, recent FTC enforcement actions against data brokers, and pending orders against browsing-data firms, regulators have made clear that they consider public-data scraping to be a core target.
This is not theoretical. Companies have already been forced to delete facial-recognition models, pay penalties, and shut down data-resale operations. If that can happen to a photo-storage app or a mobile-data broker, it can happen to any business using their data.
Because the threshold for violation is not “did we obtain names or SSNs” but “did we collect sensitive data without valid consent or notice,” the same violations can apply to datasets that many businesses believe are “safe” - browsing data, location logs, device metadata, or publicly posted social media content.
If you continue to treat “publicly visible” as equivalent to “publicly usable,” you are operating under a false assumption that regulators no longer accept.
7. The Bottom Line - Without Data Provenance and Compliance, Your AI and Data Strategy Is a Sword Hanging Over Your Own Head
For business owners, the message could not be clearer:
- Public web data is no longer a free-for-all resource.
- Sensitive data (biometric, location, device, browsing, children, health context) comes with legal obligations - regardless of its public availability.
- Using or buying data or models built on scraping without proper provenance can expose you to FTC enforcement, deletion orders, civil liability, reputational damage, and regulatory scrutiny.
At this point, data sourcing should be treated like a core risk control, no different from vendor security, supply-chain risk, or financial audits.