Shipping document extraction combines OCR with LLM-based extraction to pull structured data out of logistics documents, whose layouts are unstructured and whose formats vary widely.

Document types handled:

- Bills of lading (BOL): shipper name and address, consignee name and address, notify party, carrier and SCAC, PRO number, commodity description, weight, piece count, freight class, special instructions, and hazmat information.
- Proofs of delivery (POD): delivery timestamp, recipient name and signature, exception notes, and condition notes.
- Customs and trade declarations (CBP Forms 3461 and 7501 for US imports, UK customs declarations, and EU Intrastat declarations): HS codes, declared values, country of origin, and tariff and duty classification.
- Commercial invoices: invoice number, line items with HS codes and values, Incoterms, and total invoice value.
- Carrier freight invoices, for three-way matching against TMS rate confirmations.

Input formats: PDF (both digitally generated and scanned), TIFF and JPEG images from driver mobile apps, and email attachments from carrier notifications.

Extraction pipeline:

1. Document classification: identify the document type before applying the type-specific extraction model.
2. OCR via Azure Document Intelligence or Amazon Textract, both of which handle skewed scans and handwritten annotations.
3. LLM extraction using GPT-4o or Claude in structured-output mode, to interpret ambiguous fields and normalise addresses, dates, and weights to your required format.

Every extracted field carries its own confidence score. Low-confidence fields are queued for human review, with the relevant region of the document highlighted, instead of being written into your TMS with errors nobody notices.

Output is delivered via API to your TMS (Oracle Transportation Management, SAP TM, MercuryGate) or ERP.
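A minimal sketch of the per-field confidence routing described above, assuming the OCR/LLM step returns, for each field, a value, a 0-1 confidence score, a page number, and a bounding box. The threshold (`REVIEW_THRESHOLD`), the field names, the `REQUIRED_FIELDS` sets, and kilograms as the "required format" for weight are all hypothetical choices for illustration, not the product's actual configuration. The TMS API call and the review queue itself are out of scope.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical values: the threshold, field names, and required-field sets are
# illustrative only and would be tuned per document type in practice.
REVIEW_THRESHOLD = 0.85
REQUIRED_FIELDS = {
    "bill_of_lading": {"shipper_name", "consignee_name", "scac", "pro_number", "weight"},
    "proof_of_delivery": {"delivery_timestamp", "recipient_name"},
}

LB_TO_KG = 0.45359237  # exact definition of the international pound
_WEIGHT_RE = re.compile(r"^\s*([\d.]+)\s*(lbs?|kgs?)\s*$", re.IGNORECASE)

BBox = tuple[float, float, float, float]  # normalised (left, top, right, bottom)


@dataclass
class ExtractedField:
    name: str
    value: Optional[str]
    confidence: float  # 0.0-1.0, as returned by the OCR/LLM step
    page: int
    bbox: BBox  # region a reviewer would see highlighted


@dataclass
class ReviewItem:
    field: str
    reason: str
    page: Optional[int] = None
    bbox: Optional[BBox] = None


def normalise_weight_kg(raw: str) -> float:
    """Convert strings such as '1,200 LBS' or '544 kg' to kilograms (assumed target unit)."""
    match = _WEIGHT_RE.match(raw.replace(",", ""))
    if match is None:
        raise ValueError(f"unrecognised weight: {raw!r}")
    amount, unit = float(match.group(1)), match.group(2).lower()
    return round(amount * LB_TO_KG, 2) if unit.startswith("lb") else amount


def route_fields(doc_type: str, fields: list[ExtractedField],
                 threshold: float = REVIEW_THRESHOLD) -> tuple[dict, list[ReviewItem]]:
    """Split fields into values safe to send to the TMS and items for human review."""
    accepted: dict = {}
    review: list[ReviewItem] = []

    # Required fields the extractor did not return at all go straight to review.
    present = {f.name for f in fields}
    for name in sorted(REQUIRED_FIELDS.get(doc_type, set()) - present):
        review.append(ReviewItem(name, "missing"))

    for f in fields:
        # Empty or low-confidence fields are held back, keeping the region to highlight.
        if f.value is None or f.confidence < threshold:
            reason = "no value" if f.value is None else f"confidence {f.confidence:.2f} below {threshold}"
            review.append(ReviewItem(f.name, reason, f.page, f.bbox))
            continue
        value: object = f.value
        if f.name == "weight":
            try:
                value = normalise_weight_kg(f.value)
            except ValueError:
                review.append(ReviewItem(f.name, "unparseable weight", f.page, f.bbox))
                continue
        accepted[f.name] = value
    return accepted, review


if __name__ == "__main__":
    fields = [
        ExtractedField("pro_number", "PRO12345", 0.97, 1, (0.10, 0.10, 0.30, 0.15)),
        ExtractedField("weight", "1,200 LBS", 0.92, 1, (0.50, 0.60, 0.70, 0.65)),
        ExtractedField("consignee_name", "Acme Corp", 0.41, 1, (0.10, 0.30, 0.40, 0.35)),
    ]
    accepted, review = route_fields("bill_of_lading", fields)
    print(accepted)  # {'pro_number': 'PRO12345', 'weight': 544.31}
    for item in review:
        print(item.field, item.reason)
```

In this sketch, missing required fields and low-confidence fields both land in the review list, so nothing below the threshold reaches the TMS. The bounding box travels with each review item so that a reviewer interface could highlight that region of the page.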