How the Citation Verifier works.
The Citation Verifier is a multi-pass pipeline that reads a brief, motion, or memorandum and produces two deliverables: a marked-up source file (.docx with tracked changes) and a sign-off-ready form-check report. Every finding is anchored to a specific Bluebook 22e rule or table number so a drafting attorney can audit the call against the Bluebook itself.
The protocol it applies is published as a Claude Skill — Citation Verification Protocol v1.1 (MIT-licensed) — and is the authority for every classification, rule mapping, and severity decision. The agent does not invent rules; it applies the protocol.
Pipeline
Citation extraction
Pulls plain text from the upload — body and footnotes from .docx via mammoth + word/footnotes.xml. A small
set of regex patterns flag anything that looks like a citation: case cites by reporter abbreviation,
statutes (U.S.C. + state codes), regulations (C.F.R.), constitutional cites, and the short forms
(Id., supra, short-form case). High recall is the goal — Pass 2 is the actual
classifier.
Each candidate carries character offsets and footnote number so downstream markup lands at the right spot.
Classification + rule mapping
Each candidate is sent to Claude Sonnet 4.6 with the protocol skill cached as the system prompt — so every
additional candidate after the first reads the protocol at a 90% discount via Anthropic's prompt cache.
Sonnet decides the citation type, parses out the components (case name, volume, reporter, pin-cite, etc.),
and emits the governing rule and table pin-cites (e.g. BB R. 10; T1.1; T6; T7).
Candidates are batched 15 at a time per call to keep cache hit rate near 100% across a long document.
Existence check
Every classified case citation is queried against CourtListener's public search API. If the top result's
reporter, volume, and first page exactly match the cited components, the citation is marked
existence_verified. If no result is found or no result matches, the citation gets a
review-severity flag pointing at the CourtListener search URL so the drafting attorney can verify
manually before filing.
The strongest language we use, ever: "could not be located in CourtListener — please verify before filing." We never claim a citation is fake, hallucinated, or non-existent — only that we couldn't locate it. CourtListener does not include every state opinion, recent unpublished disposition, or sealed matter.
Bluebook table validators
Pure code, no LLM. The five validators run against the parsed citation components:
- T6 — case-name word abbreviations (must abbreviate "Corporation" → "Corp.", "Education" → "Educ.", etc.; rule 10.2.1(c) skips the first word of each party).
- Reporter currency — reporter year coverage (catches "100 F.3d 200 (2022)" — F.3d ended in 2021).
- Court parenthetical — flags missing or malformed designators ("2nd Cir." → "2d Cir.", "DC Cir." → "D.C. Cir.").
- T10 — geographical abbreviations ("Calif." → "Cal.", "Penn." → "Pa.").
- T13 — periodical abbreviations (long-form law-review names → canonical Bluebook abbrev).
Each validator emits zero or more flags with severity, rule pin-cite, table pin-cite, and a ready-to-paste suggested fix.
Cross-citation judgment
Runs once per document with the entire classified citation list in view (no document body — just the structured metadata). Sonnet checks for things one citation at a time can't catch:
- Signal ordering and authority weight in string cites (R. 1.3 / R. 1.4)
- Short-form propriety: id., supra, case short forms (R. 4 / R. 10.9)
- Id. chain integrity across footnote breaks (R. 4.1)
- Parenthetical placement and ordering (R. 1.5 / R. 10.6)
Output generation
Two artifacts, written to private storage with signed download URLs:
- Form-check report (.docx) — header, summary tier counts, per-citation findings, aggregate findings by category, corrective-action checklist, drafting sign-off block, full per-citation Appendix A, and a rule-map log Appendix B. Mirrors the protocol's report template structure exactly.
- Marked source — your original .docx file with tracked changes and Word comments. Each comment carries the rule pin-cite plus a plain-language explanation, plus the suggested fix when one is available.
What we never claim
- That a citation is fake, hallucinated, fictitious, incorrect, or wrong.
- That this report substitutes for the drafting attorney's own substantive review.
- That CourtListener is a complete index of U.S. case law (it isn't).
Privacy & privilege
- By default, no citation text is persisted to our database — only a SHA-256 hash. Opt in to "retain text" if you want full strings in your run history.
- The uploaded source file is deleted from storage after the pipeline finishes. Only the marked source and the form-check report are kept, in a private bucket scoped to your user.
- Pass 4 sees only metadata — citation type, components, page, footnote number — never the document body.
- Disclaimer acceptances are recorded with hashed IP + UA, never raw values.