How we track every AI case — and how we stay accurate.

Accuracy is the only moat in legal data. Here is exactly how our database gets built, verified, and corrected.

Primary data sources

Every case record in our database begins with a primary source — a docket entry, court opinion, or filed pleading. We do not paraphrase holdings. We do not cite press reports as ground truth. The pipeline is built around seven layers:

  1. CourtListener Docket & Search Alerts — for every known AI case, we subscribe to docket alerts so new filings fire a webhook within minutes. Standing search queries catch net-new filings matching AI keywords.
  2. PACER RSS feeds — 24-hour lookback on seven priority federal courts (SDNY, NDCA, CDCA, DDC, EDTX, WDTX, DDE), polled every 15 minutes as redundancy.
  3. Apify scrapers for state courts (CA, NY, DE, TX, FL, IL, MA, WA) and structured law-firm trackers (Taylor Wessing, Bird & Bird, Orrick) for international coverage.
  4. Google Alerts + Feedly + X — 30+ standing queries and a curated journalist/firm list pipe items into the same pending-review queue.
  5. LLM triage — every incoming item is classified and drafted by an LLM. LLM output is always status = draft. Never auto-published.
  6. Editorial review — a human editor verifies every field against the primary docket link before a record changes status to published.
  7. Publish pipeline — on publish, affected pages rebuild, sitemaps regenerate, and alerts fire to subscribers tracking the case.

Accuracy targets

Different court systems come with different data-quality realities, and we are transparent about the accuracy we target for each layer:

| Layer | Target accuracy | Source |
| --- | --- | --- |
| Federal (US District + Circuit + SCOTUS) | 95–99% | CourtListener/PACER |
| State courts (CA, NY, DE, TX) | 80–85% | Apify + state portals |
| International (UK, DE, FR) | 70–85% | Law firm trackers + press |
| Case status/phase | 90%+ | Docket entries |
| Settlement amounts | 60–75% | Labeled "claimed" vs "awarded" |

Our accuracy guardrails

  • No LLM auto-publish, ever. Every word on the site has human review before publication.
  • Weekly re-verification cron pings every active case's docket and refreshes last_verified_at. Staleness is fatal on legal content.
  • Rulings quoted verbatim with page citations. We do not summarize holdings in our own words.
  • "Claimed" vs "awarded" labels are always explicit on damages figures.
  • Current docket number always linked; refiled or consolidated cases preserve history in the timeline.
  • Visible correction policy. Every case page has a "Report an error" button. We acknowledge corrections within 24 hours and log them publicly.
  • Weekly sample audit. We randomly sample 10 cases, verify against primary sources, and log the error rate below.
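A minimal sketch of the weekly re-verification cron and the sample audit described above. Function names, record fields, and the `fetch_docket`/`verify_against_primary` callbacks are hypothetical stand-ins for our internal jobs:

```python
import random
from datetime import datetime, timezone


def reverify_active_cases(cases, fetch_docket, now=None):
    """Weekly cron: re-check every active case's docket and
    refresh last_verified_at so no record goes stale."""
    now = now or datetime.now(timezone.utc)
    for case in cases:
        if case.get("active"):
            case["latest_entry"] = fetch_docket(case["docket_url"])
            case["last_verified_at"] = now
    return cases


def weekly_sample_audit(cases, verify_against_primary, sample_size=10, seed=None):
    """Randomly sample cases, check each against its primary source,
    and return the figures logged in the public audit table."""
    rng = random.Random(seed)
    sample = rng.sample(cases, min(sample_size, len(cases)))
    errors = sum(1 for case in sample if not verify_against_primary(case))
    return {
        "sample_size": len(sample),
        "errors": errors,
        "error_rate": errors / len(sample),
    }
```

Separating the two jobs matters: re-verification touches every active case for freshness, while the audit touches a small random sample deeply for correctness.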

Public audit log

Rolling 8-week error rate across 10-case random samples. Lower is better.

| Week | Sample size | Errors found | Error rate | Status |
| --- | --- | --- | --- | --- |
| Apr 18, 2026 | 10 | 0 | 0% | Clean |
| Apr 11, 2026 | 10 | 1 | 10% | Corrected |
| Apr 04, 2026 | 10 | 0 | 0% | Clean |
| Mar 28, 2026 | 10 | 0 | 0% | Clean |
| Mar 21, 2026 | 10 | 1 | 10% | Corrected |
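Loaded into a simple structure, the audit log above collapses to a single rolling figure. This is a sketch; the `rolling_error_rate` helper is hypothetical, but the data is taken directly from the table:

```python
def rolling_error_rate(log):
    """Aggregate the audit log: total errors over total sampled cases."""
    total_samples = sum(week["sample_size"] for week in log)
    total_errors = sum(week["errors"] for week in log)
    return total_errors / total_samples


audit_log = [
    {"week": "Apr 18, 2026", "sample_size": 10, "errors": 0},
    {"week": "Apr 11, 2026", "sample_size": 10, "errors": 1},
    {"week": "Apr 04, 2026", "sample_size": 10, "errors": 0},
    {"week": "Mar 28, 2026", "sample_size": 10, "errors": 0},
    {"week": "Mar 21, 2026", "sample_size": 10, "errors": 1},
]
# 2 errors across 50 sampled cases = a 0.04 (4%) rolling rate
```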

Corrections & contact

Spotted an error? Email corrections@ailawsuittracker.com or use the "Report an error" button on any case page. Every correction is logged with a public diff and re-verified by an editor other than the one who made the original entry.

Editorial independence

We do not editorialize on active cases. We state facts, link to sources, and let readers draw conclusions. We do not accept payment from parties or firms to influence how a case is covered. Sponsorships in the newsletter are clearly labeled and never determine editorial decisions.