OCR by Consensus

Five engines read it. One answer survives the vote.

Document extraction you can trust, because no single engine decides. Multiple independent OCR and language-model engines read the same document and reconcile every field by weighted vote, with a confidence tier on each value and a human review queue for anything they disagree about.

See it live → Talk to us

Five engines vote

Per-field confidence tier

Audit: every vote logged

Live at ocr.roderickc.com

Why one engine is not enough

A single OCR engine fails quietly. That is the dangerous part.

Any single OCR engine will sometimes corrupt characters, ordinals and table cells, and it does so without raising a flag. On a legal or regulatory document one wrong digit is a wrong citation, and a wrong citation read with full confidence is worse than no answer at all. Consensus turns disagreement into a signal instead of a silent error. When five independent readers split on a field, that split is exactly the place a person should look, and the service tells you so.

The five engines

Paid vision models and self-hosted engines, all reading the same pages.

Vision

Azure Document Intelligence

A purpose-built document model that reads layout, tables and key-value structure as well as raw characters.

Vision

Claude

A frontier language model that reads each page in context, resolving ambiguous characters from the surrounding words.

Vision

Grok

A second independent frontier model, so a single model's habit of misreading a glyph does not become the answer.

Self-hosted

PaddleOCR

An open detection and recognition engine running on our own infrastructure, with no per-page dependency on a paid vision API.

Self-hosted

Tesseract

The long-proven open OCR engine, a stable baseline vote that grounds the more aggressive readers.

How consensus works

From five readings to one trusted field.

Each engine reads independently

All five engines receive the same pages and extract the same document with no knowledge of each other. Independence is what makes the vote meaningful.
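As a rough illustration of independent reading, the sketch below sends the same pages to every engine in parallel and collects each result separately, so no engine sees another's output. The function `read_with_all` and the stand-in engines are hypothetical names for this example, not the service's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def read_with_all(engines: dict[str, Callable[[bytes], str]], pages: bytes) -> dict[str, str]:
    """Run every engine on the same pages; each call is isolated from the others."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(read, pages) for name, read in engines.items()}
        return {name: future.result() for name, future in futures.items()}

# Stand-in engines (assumptions): real ones would call an OCR library or an API.
engines = {
    "engine_a": lambda pages: "art. 12",
    "engine_b": lambda pages: "art. 17",
}
results = read_with_all(engines, b"fake page bytes")
```

Each reading is keyed by engine name, which keeps the raw outputs available for the later vote and for audit.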

Values are normalized

Before any comparison, each engine's output is normalized so that whitespace, casing and formatting differences do not masquerade as real disagreement.
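A minimal sketch of what such normalization could look like, assuming Unicode folding, whitespace collapsing and case folding are the rules applied. The function name `normalize_value` and the exact rules are assumptions; the service's own rules are not published here.

```python
import re
import unicodedata

def normalize_value(raw: str) -> str:
    """Canonicalize one engine's reading so that only real differences remain."""
    text = unicodedata.normalize("NFKC", raw)  # fold compatibility forms such as full-width digits
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace and trim the ends
    return text.casefold()                     # ignore casing differences

readings = ["  Invoice  No. 42 ", "invoice no. 42", "INVOICE NO. 42"]
normalized = {normalize_value(reading) for reading in readings}
```

Here the three raw readings collapse to one value, so they count as agreement rather than as three competing answers. A verbatim mode would likely need gentler rules, since casing can matter in an exact transcript.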

Engines vote per field

Reconciliation happens field by field, not page by page. Each field is decided by a weighted vote across the five readers, so one weak engine cannot drag down a value the others agree on.
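One way a per-field weighted vote might be written is sketched below. The weights in `ENGINE_WEIGHTS` are invented for illustration, as are the engine keys and the function name `vote_field`; the real weighting is not stated in this page.

```python
from collections import defaultdict

# Hypothetical weights (assumption): the actual values are not published.
ENGINE_WEIGHTS = {
    "azure": 1.0,
    "claude": 1.0,
    "grok": 1.0,
    "paddleocr": 0.8,
    "tesseract": 0.6,
}

def vote_field(readings: dict[str, str]) -> tuple[str, float]:
    """Return the winning value for one field and its share of the total weight."""
    totals: dict[str, float] = defaultdict(float)
    for engine, value in readings.items():
        totals[value] += ENGINE_WEIGHTS[engine]
    winner = max(totals, key=totals.get)  # on a tie, the first value seen wins
    share = totals[winner] / sum(totals.values())
    return winner, share

readings = {
    "azure": "art. 12",
    "claude": "art. 12",
    "grok": "art. 12",
    "paddleocr": "art. 12",
    "tesseract": "art. 17",
}
winner, share = vote_field(readings)
```

In this example four engines outvote one dissenting reader, and the winning value holds roughly 86% of the total weight. Running the function once per field is what keeps a misread in one field from affecting any other.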

The result gets a confidence tier

Every field is tagged with a tier that runs from unanimous, where all five engines agree, down to split, where the readers diverge. The tier travels with the data.
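The tiering can be pictured with a small sketch. Only the two ends of the scale, unanimous and split, are named above; the intermediate "majority" tier and the function `confidence_tier` are assumptions added to make the example complete.

```python
from collections import Counter

def confidence_tier(values: list[str]) -> str:
    """Map the engines' normalized readings of one field to a tier label."""
    # Assumes at least one reading is present.
    top_count = Counter(values).most_common(1)[0][1]
    if top_count == len(values):
        return "unanimous"  # every engine agrees
    if top_count * 2 > len(values):
        return "majority"   # hypothetical intermediate tier
    return "split"          # no value is backed by more than half the engines

tiers = [
    confidence_tier(["a", "a", "a", "a", "a"]),
    confidence_tier(["a", "a", "a", "b", "c"]),
    confidence_tier(["a", "a", "b", "b", "c"]),
]
```

Storing the label next to the value is what lets the tier travel with the data into whatever system consumes it.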

Low-agreement fields route to review

Anything below the agreement threshold lands in a human review queue with every engine's raw output shown side by side, so a person resolves it with full evidence instead of a guess.
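The routing step could look something like the following sketch. The threshold value, the `ReviewItem` structure and the unweighted agreement measure are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass

AGREEMENT_THRESHOLD = 0.8  # hypothetical cut-off, not the service's real setting

@dataclass
class ReviewItem:
    field_name: str
    raw_outputs: dict[str, str]  # every engine's reading, kept for side-by-side display

def needs_review(raw_outputs: dict[str, str]) -> bool:
    """True when the most common reading is backed by too few engines."""
    top_count = Counter(raw_outputs.values()).most_common(1)[0][1]
    return top_count / len(raw_outputs) < AGREEMENT_THRESHOLD

def build_review_queue(document: dict[str, dict[str, str]]) -> list[ReviewItem]:
    """Collect every low-agreement field together with all raw readings."""
    return [ReviewItem(name, outputs) for name, outputs in document.items() if needs_review(outputs)]

document = {
    "total": {"e1": "100.00", "e2": "100.00", "e3": "100.00", "e4": "100.00", "e5": "100.00"},
    "citation": {"e1": "art. 12", "e2": "art. 12", "e3": "art. 12", "e4": "art. 17", "e5": "art. 17"},
}
queue = build_review_queue(document)
```

Only the citation field reaches the queue here, and it arrives with all five readings attached so the reviewer sees the full disagreement.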

Two modes

Extract structured fields, or trust the text verbatim.

Structured field extraction

Point the service at a schema and it returns typed fields, each one voted across the five engines and each one carrying its own confidence tier. Built for forms, invoices, permits and records where you know the shape you need.
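To make the idea of typed fields concrete, here is a small sketch in which a schema maps each field name to a converter and every result keeps its tier. The schema format, the field names and `apply_schema` are hypothetical; the service's actual schema language is not described on this page.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Any, Callable

# Hypothetical schema: field name -> converter from voted text to a typed value.
SCHEMA: dict[str, Callable[[str], Any]] = {
    "permit_number": str,
    "fee": Decimal,
    "issued_on": date.fromisoformat,
}

@dataclass
class TypedField:
    name: str
    value: Any
    tier: str  # confidence tier carried alongside the value

def apply_schema(voted: dict[str, tuple[str, str]]) -> list[TypedField]:
    """Convert each voted (text, tier) pair into a typed field."""
    return [TypedField(name, SCHEMA[name](text), tier) for name, (text, tier) in voted.items()]

voted = {
    "permit_number": ("P-1042", "unanimous"),
    "fee": ("125.50", "unanimous"),
    "issued_on": ("2024-03-01", "split"),
}
fields = apply_schema(voted)
```

A consumer can then accept the unanimous fields directly and hold back the split date until a person has checked it.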

Verbatim full-text consensus

For documents like statutes, where the exact wording and the article ordinals must be trustworthy, the engines reconcile the full text itself. The output is the verbatim document with disagreements surfaced rather than silently averaged away.
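A deliberately simplified sketch of verbatim consensus follows. It assumes the engines' readings of a line already contain the same number of tokens, which real OCR output often does not, so a production version would need an alignment step first. The function `consensus_line` is a hypothetical name.

```python
from collections import Counter

def consensus_line(readings: list[str]) -> tuple[str, list[int]]:
    """Vote token by token and report the positions where engines disagreed."""
    token_rows = [reading.split() for reading in readings]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("this sketch needs readings with equal token counts")
    words: list[str] = []
    disputed: list[int] = []
    for position, column in enumerate(zip(*token_rows)):
        winner, count = Counter(column).most_common(1)[0]
        words.append(winner)
        if count < len(column):  # at least one engine read something else
            disputed.append(position)
    return " ".join(words), disputed

text, disputed = consensus_line([
    "Article 12 applies",
    "Article 12 applies",
    "Article 17 applies",
])
```

The output keeps the majority wording and flags token position 1, the article ordinal, as disputed instead of hiding the dissenting reading.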

It is live

This is not a prototype. It is running in production.

OCR by Consensus is running in production today at ocr.roderickc.com. It is already used inside RCI to unblock PDF-only data sources, turning documents that no system could query into structured, confidence-scored data. The same service is available to put to work on your documents.

Stop trusting one reader. Trust the vote.

See OCR by Consensus running live, then bring us the documents your current tools read wrong.
