Field-level validation runs before any extracted data reaches your system. It covers format validation (invoice numbers matching your expected pattern, dates in valid ranges, amounts within plausible bounds), required field presence (any invoice missing a PO number routes to review rather than being processed with a blank field), and cross-field consistency (line item totals summing to the subtotal, tax calculated correctly against the applicable rate). Business rule validation then checks the extracted values against your reference data: vendor codes and supplier IDs are looked up against your approved vendor list, PO numbers are validated against open purchase orders in your ERP, and currency codes are checked against your accepted currencies. Discrepancies above a configurable tolerance threshold (e.g., ±1% for rounding on international invoices) surface for review rather than silently creating mismatches between the extracted values and the expected amounts.
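As a rough illustration of how these checks might fit together, here is a minimal Python sketch. The `Invoice` fields, the in-memory reference sets (`APPROVED_VENDORS`, `OPEN_POS`, `ACCEPTED_CURRENCIES`) and the choice to compare the invoice total against the PO amount are all assumptions made for the example. In a real deployment the reference data would come from your ERP or vendor master.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List

# Hypothetical reference data standing in for an ERP / vendor master lookup.
APPROVED_VENDORS = {"V-1001", "V-1002"}
OPEN_POS = {"PO-7781": Decimal("1200.00")}  # PO number -> expected amount
ACCEPTED_CURRENCIES = {"USD", "EUR", "GBP"}
TOLERANCE = Decimal("0.01")  # configurable, here 1%


@dataclass
class Invoice:
    vendor_code: str
    po_number: str
    currency: str
    line_totals: List[Decimal]
    subtotal: Decimal
    tax_rate: Decimal
    tax: Decimal
    total: Decimal


def validate(inv: Invoice) -> List[str]:
    """Return a list of failure reasons; an empty list means the invoice passed."""
    failures = []

    # Required field presence
    if not inv.po_number:
        failures.append("missing PO number")

    # Cross-field consistency
    if sum(inv.line_totals, Decimal("0")) != inv.subtotal:
        failures.append("line items do not sum to subtotal")
    expected_tax = (inv.subtotal * inv.tax_rate).quantize(Decimal("0.01"))
    if abs(expected_tax - inv.tax) > Decimal("0.01"):
        failures.append("tax inconsistent with applicable rate")

    # Business rules against reference data
    if inv.vendor_code not in APPROVED_VENDORS:
        failures.append("vendor not on approved list")
    if inv.currency not in ACCEPTED_CURRENCIES:
        failures.append("currency not accepted")

    expected = OPEN_POS.get(inv.po_number)
    if inv.po_number and expected is None:
        failures.append("PO number not found among open purchase orders")
    elif expected is not None and abs(inv.total - expected) > expected * TOLERANCE:
        failures.append("total outside tolerance of PO amount")

    return failures
```

Returning the full list of reasons, rather than stopping at the first failure, lets the review step show everything that is wrong with a document at once.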
Regex patterns enforce the structural format of each field type: invoice numbers typically follow a vendor-specific pattern (e.g., INV-[0-9]{6} or [A-Z]{2}[0-9]{8}), dates are normalised from ambiguous regional formats (01/02/2025 is parsed as DD/MM or MM/DD according to the vendor's locale), and amounts are stripped of currency symbols and thousands-separator commas before numeric validation. Confidence score thresholds are set per field based on the cost of a missed error: a misread invoice total routes to human review at confidence below 0.92, while a secondary address field might be accepted at 0.75. Cross-field validation catches the extraction errors that confidence scores miss: a line item unit price of $0.05 against a grand total of $5,000 signals either a quantity error or an extraction failure, and routes to review regardless of individual field confidence. Documents that fail validation are never silently discarded. They enter the human review queue with the specific validation failure reason displayed alongside the document, so reviewers can focus on the problem field rather than re-reading the entire document.
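The format and confidence checks described above could be sketched in Python roughly as follows. The two invoice-number patterns and the 0.92 / 0.75 thresholds come from the text. The locale-to-format map, the field names, the `DEFAULT_THRESHOLD` value and the assumption that amounts use a dot as the decimal separator are illustrative choices, not part of any particular product.

```python
import re
from datetime import date, datetime
from decimal import Decimal

# Example vendor-specific invoice number patterns from the text.
INVOICE_PATTERNS = [re.compile(r"INV-[0-9]{6}"), re.compile(r"[A-Z]{2}[0-9]{8}")]

# Hypothetical mapping from vendor locale to its day/month ordering.
DATE_FORMATS = {"en_GB": "%d/%m/%Y", "en_US": "%m/%d/%Y"}

# Per-field confidence thresholds; the default is an assumed fallback.
CONFIDENCE_THRESHOLDS = {"invoice_total": 0.92, "secondary_address": 0.75}
DEFAULT_THRESHOLD = 0.85


def valid_invoice_number(value: str) -> bool:
    """True if the whole string matches one of the expected patterns."""
    return any(p.fullmatch(value) for p in INVOICE_PATTERNS)


def parse_date(raw: str, vendor_locale: str) -> date:
    """Resolve an ambiguous date using the vendor's locale.

    Raises KeyError for an unknown locale and ValueError for an unparseable date.
    """
    return datetime.strptime(raw, DATE_FORMATS[vendor_locale]).date()


def clean_amount(raw: str) -> Decimal:
    """Strip currency symbols and thousands separators, then parse.

    Assumes '.' is the decimal separator; raises decimal.InvalidOperation
    if nothing numeric remains.
    """
    return Decimal(re.sub(r"[^0-9.\-]", "", raw))


def needs_review(field_name: str, confidence: float) -> bool:
    """True if the field's confidence falls below its threshold."""
    return confidence < CONFIDENCE_THRESHOLDS.get(field_name, DEFAULT_THRESHOLD)
```

For example, `parse_date("01/02/2025", "en_GB")` yields 1 February 2025 while the same string under `"en_US"` yields 2 January 2025, which is why the locale has to come from vendor data rather than being guessed from the document.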