How we track every AI case — and how we stay accurate.
Accuracy is the only moat in legal data. Here is exactly how our database gets built, verified, and corrected.
Primary data sources
Every case record in our database begins with a primary source — a docket entry, court opinion, or filed pleading. We do not paraphrase holdings. We do not cite press reports as ground truth. The pipeline is built around seven layers:
- CourtListener Docket & Search Alerts — for every known AI case, we subscribe to docket alerts so new filings fire a webhook within minutes. Standing search queries catch net-new filings matching AI keywords.
- PACER RSS feeds — 24-hour lookback on 7 priority federal courts (SDNY, NDCA, CDCA, DDC, EDTX, WDTX, DDE) polled every 15 minutes as redundancy.
- Apify scrapers for state courts (CA, NY, DE, TX, FL, IL, MA, WA) and structured law-firm trackers (Taylor Wessing, Bird & Bird, Orrick) for international coverage.
- Google Alerts + Feedly + X — 30+ standing queries and a curated journalist/firm list pipe items into the same pending-review queue.
- LLM triage — every incoming item is classified and drafted by an LLM. LLM output is always status = draft. Never auto-published.
- Editorial review — a human editor verifies every field against the primary docket link before a record changes status to published.
- Publish pipeline — on publish, affected pages rebuild, sitemaps regenerate, and alerts fire to subscribers tracking the case.
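The draft-to-published gate described above can be sketched as a simple state machine. This is an illustrative model, not our actual codebase; the names (`CaseRecord`, `publish`, `Status`) are hypothetical, but the invariant is the one stated: LLM triage output always enters as a draft, and only a named human editor can move a record to published.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):
    DRAFT = "draft"          # LLM triage output always lands here
    IN_REVIEW = "in_review"  # claimed by a human editor
    PUBLISHED = "published"  # reachable only via editor sign-off

@dataclass
class CaseRecord:
    docket_url: str
    status: Status = Status.DRAFT
    verified_by: Optional[str] = None

def publish(record: CaseRecord, editor: str) -> CaseRecord:
    """Gate: a record cannot be published without passing human review."""
    if record.status is not Status.IN_REVIEW:
        raise ValueError("record must pass editorial review first")
    record.status = Status.PUBLISHED
    record.verified_by = editor
    # downstream (not shown): rebuild pages, regenerate sitemaps, fire alerts
    return record
```

Because the only path to `PUBLISHED` runs through `publish()`, a freshly triaged draft can never reach the site directly, which is the "no LLM auto-publish" guarantee in code form.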
Accuracy targets
Different court systems have different data quality realities. We are transparent about this:
| Layer | Target accuracy | Source |
|---|---|---|
| Federal (US District + Circuit + SCOTUS) | 95–99% | CourtListener/PACER |
| State courts (CA, NY, DE, TX) | 80–85% | Apify + state portals |
| International (UK, DE, FR) | 70–85% | Law firm trackers + press |
| Case status/phase | 90%+ | Docket entries |
| Settlement amounts | 60–75% | Labeled "claimed" vs "awarded" |
Our accuracy guardrails
- No LLM auto-publish, ever. Every word on the site has human review before publication.
- Weekly re-verification cron pings every active case's docket and refreshes last_verified_at. Staleness is fatal on legal content.
- Rulings quoted verbatim with page citations. We do not summarize holdings in our own words.
- "Claimed" vs "awarded" labels are always explicit on damages figures.
- Current docket number always linked; refiled or consolidated cases preserve history in the timeline.
- Visible correction policy. Every case page has a "Report an error" button. We acknowledge corrections within 24 hours and log them publicly.
- Weekly sample audit. We randomly sample 10 cases, verify against primary sources, and log the error rate below.
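Two of the guardrails above are mechanical enough to sketch: the staleness check behind the weekly re-verification cron, and the random 10-case draw behind the sample audit. This is a minimal illustration under assumed names (`needs_reverification`, `weekly_sample`); the real job would also fetch the docket and update `last_verified_at`.

```python
import datetime as dt
import random

STALE_AFTER = dt.timedelta(days=7)  # matches the weekly cron cadence

def needs_reverification(last_verified_at: dt.datetime,
                         now: dt.datetime = None) -> bool:
    """A case is stale if its docket hasn't been re-checked in a week."""
    now = now or dt.datetime.now(dt.timezone.utc)
    return now - last_verified_at > STALE_AFTER

def weekly_sample(case_ids: list, k: int = 10, seed=None) -> list:
    """Draw the random 10-case sample for the public audit log."""
    rng = random.Random(seed)
    return rng.sample(case_ids, k=min(k, len(case_ids)))
```

Seeding the sampler (e.g. with the audit week's date) makes each week's draw reproducible, so an auditor can confirm the sample wasn't cherry-picked.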
Public audit log
Rolling 8-week error rate across 10-case random samples. Lower is better.
| Week | Sample size | Errors found | Error rate | Status |
|---|---|---|---|---|
| Apr 18, 2026 | 10 | 0 | 0% | Clean |
| Apr 11, 2026 | 10 | 1 | 10% | Corrected |
| Apr 04, 2026 | 10 | 0 | 0% | Clean |
| Mar 28, 2026 | 10 | 0 | 0% | Clean |
| Mar 21, 2026 | 10 | 1 | 10% | Corrected |
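The per-week figures above are simple ratios, and the rolling number pools them: across the five weeks shown, 2 errors in 50 sampled cases is a 4% pooled error rate. A small sketch of that arithmetic (function names are illustrative):

```python
def error_rate(errors: int, sample_size: int) -> float:
    """Per-week error rate, e.g. 1 error in 10 cases -> 0.10."""
    return errors / sample_size

def rolling_rate(rows: list) -> float:
    """Pooled error rate across audit weeks: total errors / total sampled."""
    total_errors = sum(errors for errors, _ in rows)
    total_samples = sum(n for _, n in rows)
    return total_errors / total_samples

# (errors_found, sample_size) per week, newest first, from the table above
audit_log = [(0, 10), (1, 10), (0, 10), (0, 10), (1, 10)]
```

Pooling weights every sampled case equally, which is why the rolling rate (4%) sits well below the worst single week (10%).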
Corrections & contact
Spotted an error? Email corrections@ailawsuittracker.com or use the "Report an error" button on any case page. Every correction is logged with a public diff and re-verified by an editor other than the one who made the original entry.
Editorial independence
We do not editorialize on active cases. We state facts, link to sources, and let readers draw conclusions. We do not accept payment from parties or firms to influence how a case is covered. Sponsorships in the newsletter are clearly labeled and never determine editorial decisions.