
AI Agent for AP Coding & Three-Way Matching Case Study

A $180M B2B services company deployed an AI agent that reads incoming invoices, suggests GL coding with confidence scores, performs three-way matching, and routes exceptions — all under a documented SOX-aware control framework with reviewer sign-off, evidence packs, and PCAOB-aligned governance.

Client profile: Composite case study based on a $180M B2B services company on NetSuite + Bill, ~3,800 invoices / month, ~3,200 active vendors, mix of PO and non-PO spend. PE-backed; SOX 404 readiness ahead of IPO; CFO mandate to deploy AI in finance with proper controls.

Company context

The client is a $180M B2B services company on NetSuite + Bill, processing ~3,800 invoices monthly across ~3,200 vendors. AP team of 6 was 80% transactional; CFO wanted to redirect that capacity to strategic AP work (vendor management, discount capture, working capital optimization). The CFO also wanted to demonstrate AI-in-finance maturity ahead of the IPO; investors expected it.

AI in AP has a control-design problem. The auditors' baseline question is: "What stops the agent from miscoding a $50K bill?" Answering it requires confidence thresholds, reviewer routing, evidence packs, change management, and PCAOB-aligned governance. Without those, the AI agent is an audit finding waiting to happen.

  • $180M B2B services
  • NetSuite GL + Bill (BILL.com)
  • ~3,800 invoices / month
  • ~3,200 active vendors
  • 6-person AP team
  • PE-backed; pre-IPO
  • SOX 404 readiness program in flight
  • CFO mandate: AI in finance with controls

Before — what was actually broken

  • Manual GL coding by AP team (80% of their time)
  • Three-way matching done manually with PO lookup
  • Exception triage entirely human
  • No discount capture beyond opportunistic
  • Vendor master cleanup sporadic
  • Audit fieldwork for AP took 3 days

What Ledger Summit implemented

  • AI coding agent: pre-trained on 18 months of historical invoice/coding pairs; confidence-scored suggestion per line item
  • Confidence threshold: ≥85% auto-code with reviewer sample-test; <85% routed to human reviewer
  • Three-way match agent: PO + receipt + invoice match with tolerance configuration; exception classification (price variance, qty variance, missing receipt, etc.)
  • Vendor matching: fuzzy match against vendor master with reviewer override on duplicates
  • Evidence pack per invoice: invoice image, AI suggestion + confidence, reviewer action, GL posting
  • Confidence threshold review: a reviewer checks 25 random sub-threshold invoices monthly for accuracy, informing threshold tuning
  • Change management: any rule / threshold / model change requires documented reason + reviewer approval + audit committee notification if material
  • SOX control framework: per-invoice trail; preparer ≠ approver; segregation of duties; quarterly testing
  • PCAOB-aligned governance: methodology memo, walkthrough script, design effectiveness testing
  • Auditor concurrence on design before deployment
  • Quarterly precision/recall metrics tracked: false-positive (AI codes when human wouldn't) vs. false-negative (AI doesn't code when it could)
  • Human-in-the-loop for high-risk categories (entertainment, advertising, capitalization decisions)
  • Continuous training: weekly model update from accepted/rejected suggestions
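
The ≥85% / <85% routing above can be sketched as follows. The 85% cutoff comes from the case study; the `suggestion` field names are illustrative assumptions, not the actual system's schema.

```python
# Minimal sketch of confidence-threshold routing for one invoice line item.
# Threshold from the case study; field names are illustrative assumptions.

AUTO_CODE_THRESHOLD = 0.85  # >= 85% -> auto-code (subject to monthly sample testing)

def route_line_item(suggestion: dict) -> str:
    """Decide how a single AI coding suggestion is handled.

    `suggestion` is assumed to carry the model's proposed GL account and a
    calibrated confidence score in [0, 1].
    """
    if suggestion["confidence"] >= AUTO_CODE_THRESHOLD:
        return "auto_code"    # posted automatically; sampled later for accuracy
    return "human_review"     # routed to an AP reviewer queue, never auto-posted
```

A usage example: `route_line_item({"gl_account": "6100", "confidence": 0.91})` returns `"auto_code"`, while a 0.72-confidence suggestion routes to `"human_review"`.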

AI agent governance mechanics

| Layer | Control |
| --- | --- |
| Scope definition | What the agent is allowed to code; account list whitelist; dollar threshold |
| Confidence threshold | ≥85% auto-code; <85% human review; threshold reviewed quarterly |
| Sample testing | Monthly review of 25 random AI-coded invoices for accuracy |
| Reviewer routing | Threshold-based + amount-based + GL-account-based routing rules |
| Exception queue | Three-way match exceptions classified and routed |
| Evidence pack | Invoice + AI suggestion + confidence + reviewer action + GL posting per transaction |
| Change management | Any rule / threshold change documented; reviewer approved; audit committee notified if material |
| SOX testing | Quarterly walkthrough; design + operating effectiveness testing |
| PCAOB alignment | Methodology memo; auditor concurrence on design; SOX 404 attestation walkthrough |
| Continuous improvement | Weekly model retrain from accepted/rejected suggestions; quarterly metrics review |
| Human-in-the-loop | High-risk categories always reviewed; complex modifications routed |
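
The three-way match layer above can be sketched as a tolerance check with classified exceptions. The tolerance values and dict fields here are illustrative assumptions, not the client's actual configuration.

```python
from typing import Optional

# Sketch of three-way matching: PO + receipt + invoice compared under
# configured tolerances, with failures classified for the exception queue.
# Tolerance values and record shapes are illustrative assumptions.

PRICE_TOLERANCE = 0.02  # allow 2% unit-price variance vs. the PO
QTY_TOLERANCE = 0       # allow no quantity variance vs. the receipt

def three_way_match(po: dict, receipt: Optional[dict], invoice: dict) -> str:
    """Return 'auto_pass' or a classified exception label for routing."""
    if receipt is None:
        return "exception:missing_receipt"
    if abs(invoice["qty"] - receipt["qty"]) > QTY_TOLERANCE:
        return "exception:qty_variance"
    price_delta = abs(invoice["unit_price"] - po["unit_price"]) / po["unit_price"]
    if price_delta > PRICE_TOLERANCE:
        return "exception:price_variance"
    return "auto_pass"
```

Each exception label maps to a routing rule in the exception queue, so a missing receipt and a price variance land with different resolvers.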

Implementation timeline

  • Weeks 1–2: Discovery: invoice volume analysis, GL coding patterns, vendor master audit
  • Weeks 3–4: AI agent design: model selection, training data preparation, confidence threshold determination
  • Weeks 5–6: Pilot deployment: 200 invoices / week shadow-mode alongside human coding
  • Weeks 7–8: Calibration: precision/recall measurement; threshold tuning; vendor master cleanup
  • Weeks 9–10: Full deployment with reviewer routing; SOX control framework activation
  • Weeks 11–12: Hypercare; monthly sample testing process; auditor walkthrough
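
The precision/recall measurement from the calibration weeks can be sketched as below: during shadow mode, each AI suggestion is compared against the human coder's decision. The pair format is an assumption for illustration.

```python
# Sketch of shadow-mode calibration metrics. Each pair is
# (human_gl_account, ai_gl_account or None if the AI abstained,
# i.e. its confidence fell below the threshold).

def coding_metrics(pairs):
    """Compute precision and recall over shadow-mode invoice lines."""
    coded = [(h, a) for h, a in pairs if a is not None]
    correct = sum(1 for h, a in coded if h == a)
    # Precision: of what the AI did code, how much matched the human coder.
    precision = correct / len(coded) if coded else 0.0
    # Recall: of all lines, how many the AI coded correctly (abstentions count against it).
    recall = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall}
```

Low precision argues for raising the threshold; low recall with high precision argues for lowering it, which is the trade-off tuned in weeks 7–8.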

Measured results

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Invoices auto-coded by AI | 0% | ~92% | +92 pp |
| AP team transactional time | 80% | 25% | −55 pp |
| Avg invoice cycle | 8 days | 1.4 days | −83% |
| Three-way match auto-pass | 62% | 88% | +26 pp |
| Discount capture | ~$8K / yr | ~$320K / yr | +$312K |
| Audit fieldwork days (AP) | 3 | 1.5 | −1.5 days |
| SOX walkthrough | N/A (manual process) | Clean | n/a |
| Quarterly model accuracy | n/a | Tracked; >95% precision | n/a |

Alternatives considered

| Option | Time | Cost | Strengths | Weaknesses |
| --- | --- | --- | --- | --- |
| Stampli (AI-native AP) | 3 months | $240K–$420K + license | Modern AI | License + replacement of Bill |
| Tipalti AI | 3 months | $240K–$420K + license | Strong globally | Replaces Bill |
| Bill native AI features | n/a | $0 incremental | Already in stack | Coverage thinner |
| Custom build (Anthropic Claude / OpenAI) | 5 months | $320K–$520K | Full control | Maintenance + governance |
| Ledger Summit + Bill + custom AI agent (selected) | 12 weeks | $180K–$280K | Right-sized; SOX-clean; preserves Bill | Maintenance ongoing |

When this approach fits

  • $50–500M companies with material invoice volume (1,000+ / month)
  • Existing AP automation (Bill, Tipalti, etc.) on top of which to add AI
  • SOX or audit-readiness pressure
  • PE-backed or pre-IPO with control framework expectations
  • Willingness to invest in governance + control framework
  • Auditor open to AI-in-finance walkthroughs

Lessons learned

  • Confidence threshold is the lever. 85% works for most coding; high-risk categories warrant 95% or human-only.
  • Sample testing is non-negotiable. 25 invoices / month is the audit-friendly cadence.
  • Auditor concurrence on design before deployment. Methodology buy-in saves remediation later.
  • Change management for thresholds and models. Documented reason for any change; audit committee notification if material.
  • Human-in-the-loop on high-risk. Entertainment, advertising, capitalization, related-party — always reviewed.
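
The per-category threshold lever described in the lessons above can be expressed as a small config: a default 85% cutoff with high-risk categories forced to human review. The category keys and config shape are illustrative assumptions.

```python
# Config sketch for per-category confidence thresholds.
# Default from the case study; overrides reflect the "human-only" rule
# for high-risk categories. Shape and keys are illustrative assumptions.

DEFAULT_THRESHOLD = 0.85
CATEGORY_OVERRIDES = {
    "entertainment": None,   # None -> always human-reviewed, never auto-coded
    "advertising": None,
    "capitalization": None,
    "related_party": None,
}

def route(category: str, confidence: float) -> str:
    """Apply the category override (if any), then the confidence cutoff."""
    threshold = CATEGORY_OVERRIDES.get(category, DEFAULT_THRESHOLD)
    if threshold is None or confidence < threshold:
        return "human_review"
    return "auto_code"
```

Keeping the override table as data rather than code also makes threshold changes easy to diff and approve under the change-management control.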

Frequently asked questions

How does the AI know how to code GL accounts?

Pre-trained on 18 months of historical invoice/coding pairs from your tenant; learns vendor patterns, line-item descriptions, and historical reviewer corrections.

What if the AI is wrong?

Confidence threshold catches uncertain cases; human reviewer signs off. Below threshold: never auto-coded.

How does this satisfy SOX?

Documented design (scope, threshold, routing); operating effectiveness testing (sample + walkthrough); evidence per invoice; auditor concurrence on methodology.

What about hallucination risk?

Output is validated against the known vendor master, GL accounts, and PO data; the confidence threshold catches uncertain cases. A hallucinated account or vendor fails validation and routes to human review rather than posting.
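
That validation step can be sketched as a reference-data check that runs before any posting, independent of the confidence score. The reference sets and field names below are illustrative assumptions.

```python
# Sketch of output validation against known reference data: a suggestion
# that names an account or vendor outside these sets can never auto-post,
# regardless of its confidence. Sets and field names are illustrative.

GL_WHITELIST = {"6100", "6200", "7300"}   # accounts the agent may code to
VENDOR_MASTER = {"V-1001", "V-1002"}      # active vendor IDs

def validate_suggestion(suggestion: dict) -> list:
    """Return a list of validation errors; non-empty -> route to human review."""
    errors = []
    if suggestion["gl_account"] not in GL_WHITELIST:
        errors.append("unknown_gl_account")
    if suggestion["vendor_id"] not in VENDOR_MASTER:
        errors.append("unknown_vendor")
    return errors
```

Because this check is deterministic, a hallucinated value is caught even when the model reports high confidence.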

Does this work in QuickBooks / Xero / smaller GLs?

Conceptually yes; integration depth varies by platform. The best fit today is NetSuite / Sage Intacct.

What about international tax / VAT?

Out of scope for this implementation; multi-currency handled per Bill / NetSuite native; VAT requires separate workstream.

How do you train new vendors?

Few-shot learning from initial reviewer corrections; full integration after ~10 invoices.

Can the AI suggest journal accruals at month-end?

Yes — accrual suggestion is a related workflow; same control framework applies.

What about IPO / public-company SOX 404(b)?

Yes — the framework supports auditor attestation. Methodology memo + design + operating testing required.

How does this compare to Stampli?

Stampli is AP automation with embedded AI; we layer AI agent on top of Bill (existing stack). Trade-off: maturity vs. integration depth.

Want AI in AP without giving up SOX-clean controls?

A 30-minute call walks through your AP volume and tells you what an AI agent + governance setup would look like.

Book a free call