Confidence Score

A numerical value (typically 0–100%) indicating how certain an AI model is about a specific output — an extracted field, a classification decision, or a match result. Confidence scores determine when to trust automation fully and when to route a result to a human for review.

What is a Confidence Score?

When an AI model extracts a value, classifies a document, or makes a decision, it does not simply return a result — it returns a result with an associated confidence score. A score of 97% means the model is highly certain. A score of 61% means the model produced a best guess but has real uncertainty. The score reflects the model's internal probability distribution: how strongly the evidence in the input supports this particular output versus other possible outputs.
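As a rough illustration of how a score can come out of a probability distribution, the sketch below applies a softmax to raw class scores and reports the top probability as the confidence. This is one common approach for classifiers, not necessarily how any particular product computes its scores; the labels and raw scores are made up for the example.

```python
import math

def softmax(raw_scores):
    """Convert raw model scores into probabilities that sum to 1."""
    highest = max(raw_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_label_with_confidence(labels, raw_scores):
    """Return the most probable label and its probability."""
    probs = softmax(raw_scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

# Hypothetical document classes and raw scores, for illustration only.
labels = ["invoice", "purchase_order", "receipt"]
label, conf = top_label_with_confidence(labels, [4.0, 0.5, 0.2])
print(f"{label}: {conf:.0%}")  # invoice: 95%
```

The closer the raw scores are to each other, the flatter the distribution and the lower the top probability, which is what a low confidence score expresses.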

Confidence scores are the mechanism that makes human-in-the-loop automation practical. Without them, you either trust everything the AI outputs (risky) or review everything manually (which defeats the purpose of automating). With them, you can set thresholds: for example, auto-accept at 92% and above, route to review between 70% and 92%, and escalate to a senior reviewer below 70%.
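The threshold logic described above can be sketched as a small routing function. The cut-offs are the example values from the text, expressed as fractions; the function name, the queue names, and the choice to auto-accept a score of exactly 92% are assumptions made for this illustration.

```python
AUTO_ACCEPT_THRESHOLD = 0.92  # example value from the text
ESCALATION_THRESHOLD = 0.70   # example value from the text

def route(confidence: float) -> str:
    """Map a confidence score (0.0 to 1.0) to a handling queue."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= AUTO_ACCEPT_THRESHOLD:
        return "auto_accept"
    if confidence >= ESCALATION_THRESHOLD:
        return "review"
    return "senior_review"

for score in (0.97, 0.85, 0.61):
    print(score, route(score))
```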

How Confidence Scores Work in Practice

Different AI tasks produce confidence scores differently. In document extraction, a score might reflect how clearly a field value was identified versus how much it had to be inferred from context. In classification, it reflects how distinctly the input matched one category versus others. In matching (invoice vs. PO), it reflects how many fields aligned versus how many diverged.
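For the matching case, a deliberately simple sketch is to score a pair of documents by the share of compared fields that agree. Real matchers typically weight fields and tolerate small differences; the field names and values here are hypothetical.

```python
def match_confidence(invoice: dict, purchase_order: dict, fields: list[str]) -> float:
    """Fraction of the compared fields that are present in both documents and equal."""
    if not fields:
        raise ValueError("at least one field is required")
    aligned = sum(
        1
        for name in fields
        if name in invoice
        and name in purchase_order
        and invoice[name] == purchase_order[name]
    )
    return aligned / len(fields)

# Hypothetical documents: only the quantity differs.
invoice = {"supplier_id": "S-104", "currency": "EUR", "total": 1250.00, "quantity": 10}
purchase_order = {"supplier_id": "S-104", "currency": "EUR", "total": 1250.00, "quantity": 12}

score = match_confidence(invoice, purchase_order, ["supplier_id", "currency", "total", "quantity"])
print(score)  # 0.75
```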

  • High confidence (90%+): Auto-process. No human needed.

  • Medium confidence (70% up to 90%): Flag for spot-check. A human reviews the result but does not need to re-extract from scratch.

  • Low confidence (below 70%): Route to full human review. Provide the extracted value as a suggestion, not a final result.

  • Calibration matters: A model that says 90% and is right 60% of the time is worse than useless — it generates false confidence. Thresholds must be validated against actual accuracy data.
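One way to validate a threshold against actual accuracy data is to take results that humans have already reviewed and measure how often the model was right among the cases at or above the threshold. The sketch below assumes such records exist as (confidence, was_correct) pairs; the sample data is invented to mirror the 90% versus 60% example above.

```python
def accuracy_at_or_above(records, threshold):
    """Observed accuracy and case count among records with confidence >= threshold.

    records: iterable of (confidence, was_correct) pairs from reviewed cases.
    Returns (None, 0) when no record reaches the threshold.
    """
    outcomes = [was_correct for confidence, was_correct in records if confidence >= threshold]
    if not outcomes:
        return None, 0
    return sum(outcomes) / len(outcomes), len(outcomes)

# Invented sample: five cases scored 90% or higher, only three of them correct.
reviewed = [
    (0.95, True), (0.93, False), (0.91, True), (0.90, True), (0.97, False),
    (0.72, True), (0.65, False),
]

observed, count = accuracy_at_or_above(reviewed, 0.90)
print(f"{count} cases at or above 90%, observed accuracy {observed:.0%}")
```

If the observed accuracy is well below the stated confidence, as in this sample, the threshold should be raised or the model recalibrated before results are auto-processed.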

Confidence Scores in Operations

In invoice processing, a confidence score below threshold on the VAT number means a human checks that field before posting to the ERP — not the entire invoice. In goods receipt matching, a low confidence score on a quantity field triggers a warehouse check before the receipt is confirmed. The practical goal is to minimize the volume of human review while ensuring that the cases that do reach a human are the ones that genuinely need judgment. At Lleverage, confidence thresholds are configured per client and per field type — the acceptable confidence for a total amount differs from the acceptable confidence for a line item description.
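Field-level review with per-field thresholds could look like the sketch below. The field names, threshold values, and default are hypothetical and do not describe Lleverage's actual configuration; the point is only that each field is compared against its own threshold and just the failing fields are sent to a human.

```python
# Hypothetical per-field thresholds: stricter for amounts than for descriptions.
FIELD_THRESHOLDS = {
    "total_amount": 0.98,
    "vat_number": 0.95,
    "line_item_description": 0.80,
}
DEFAULT_THRESHOLD = 0.90  # assumed fallback for fields without their own entry

def fields_needing_review(extraction: dict) -> list:
    """Return the names of fields whose confidence is below their threshold.

    extraction maps field name -> (extracted value, confidence between 0.0 and 1.0).
    """
    return [
        name
        for name, (_value, confidence) in extraction.items()
        if confidence < FIELD_THRESHOLDS.get(name, DEFAULT_THRESHOLD)
    ]

# Invented extraction result for a single invoice.
extraction = {
    "total_amount": ("1250.00", 0.99),
    "vat_number": ("NL123456789B01", 0.88),
    "line_item_description": ("Steel brackets", 0.85),
}

to_review = fields_needing_review(extraction)
print(to_review)  # ['vat_number']
```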

Turn your manual decisions into intelligent operations

See how we capture your decision intelligence and put it to work inside the systems you already have. Start with one workflow. See results in days.
