Vision + LLM · Any visual document → structured, schema-aware JSON

Read what's in front of you.
Then make sense of it.

TextigoAI pulls structured context out of any visual document — invoices, contracts, IDs, business cards, handwritten notes, whiteboard photos, screenshots, scanned PDFs, even sketched diagrams. Not just text — entities, relationships, intent. Give it your JSON schema, get it back filled in.

20+ doc types · Schema-aware · PII-aware by design
Invoice
Acme Supply Co.
412 Mercer St, Brooklyn NY
No.: INV-2026-0481
Bill to: Demo Co. Ltd., 99 Field Rd, Austin TX
Date: 2026-06-14

Item                     Qty   Amt
Hex bolts 5/16 box        12   $ 84.00
Galv anchor plate          4   $112.00
Self-leveling compound     2   $ 46.00
Delivery & handling        1   $ 25.00
Subtotal                       $267.00
Tax (8.875%)                   $ 23.69
Total                          $290.69
output.json
{
  "vendor": "Acme Supply Co.",
  "invoice_no": "INV-2026-0481",
  "date": "2026-06-14",
  "bill_to": "Demo Co. Ltd.",
  "line_items": [ 4 ],
  "subtotal": 267.00,
  "total": 290.69
}
schema match conf 0.97
20+ doc types
invoices, IDs, forms, whiteboards, charts, handwriting…
schema-aware
bring your JSON shape — get it back filled in
PII-aware
redact, vault & audit sensitive fields by default
ms latency
sub-second for most documents

The OCR problem

Legacy OCR gives you a wall of text.
You wanted a record you can act on.

A character is not a number. A line of text is not a line item. A signature isn't a name, a date, and a binding party. TextigoAI reads the document the way a person does — layout, entities, relationships, intent — and hands you back exactly the shape your system expects.

What it sees

Every document. Any layout.

From a crisp PDF to a phone photo of a napkin sketch — one extraction surface, one schema-aware output.

Invoices & receipts

Vendor, dates, line items, tax, totals, payment terms — itemized and totals-reconciled, ready to post.

Contracts & forms

Clauses, effective & renewal dates, signatories, named entities and the binding obligations between them.

Business cards & IDs

vCard-grade contact extraction, MRZ/PDF417 parsing, role and organization with confidence scoring.

Handwriting & whiteboards

Phone-photo-grade messy in, clean structured out — meeting notes, intake forms, scribbled to-do lists.

Charts & diagrams

Pull the underlying data series from a chart image — bar, line, scatter, pie — with axes and labels.

Screenshots & UI

Scrape what you see on screen, including dark-mode dashboards, tables, sidebars, and modal forms.

The difference

Context, not just text.

Same input. Two very different outputs.

Legacy OCR

Text only
ACME SUPPLY CO 412 MERCER ST BROOKLYN NY INVOICE INV-2026-0481 BILL TO Demo Co. Ltd 99 Field Rd Austin TX DATE 2026-06-14 ITEM QTY AMT Hex bolts 5/16 box 12 84.00 Galv anchor plate 4 112.OO Self-leveling compound 2 46.OO Delivery and handling 1 25.00 Subtotal 267.OO Tax 8.875% 23.69 Total 290.69
  • Wall of strings — no fields, no types
  • "112.OO" — character confusion bleeds into your DB
  • Reading order broken — columns merged across rows

TextigoAI

Structured
{
  "document_type": "invoice",
  "vendor": {
    "name": "Acme Supply Co.",
    "address": "412 Mercer St, Brooklyn NY"
  },
  "invoice_no": "INV-2026-0481",
  "issued_at": "2026-06-14",
  "bill_to": "Demo Co. Ltd.",
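  "line_items": [
    { "desc": "Hex bolts 5/16 box", "qty": 12, "amt": 84.00 },
    { "desc": "Galv anchor plate", "qty": 4,  "amt": 112.00 },
    { "desc": "Self-leveling compound", "qty": 2, "amt": 46.00 },
    { "desc": "Delivery & handling", "qty": 1, "amt": 25.00 }
  ],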
  "subtotal": 267.00,
  "tax": 23.69,
  "total": 290.69,
  "reconciled": true
}
  • Typed fields, normalized dates, currency-aware numbers
  • Math reconciled — flagged when totals don't match line items
  • Layout & entity-aware — "bill to" vs "ship to" never confused
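
If you want to re-check the math on your side, a reconciliation pass might look something like the sketch below. It uses the four line items from the sample invoice and assumes the field names from the structured output (`line_items`, `amt`, `subtotal`, `tax`, `total`). The `reconcile` helper is hypothetical, not part of the TextigoAI API, and it says nothing about how the service computes its own `reconciled` flag.

```python
from decimal import Decimal


def reconcile(record: dict) -> dict:
    """Check that line items sum to the subtotal and subtotal + tax equals total."""

    def money(value) -> Decimal:
        # Go through str() so floats like 290.69 do not carry binary noise.
        return Decimal(str(value)).quantize(Decimal("0.01"))

    items_sum = sum(
        (money(item["amt"]) for item in record["line_items"]), Decimal("0.00")
    )
    subtotal = money(record["subtotal"])
    tax = money(record.get("tax", 0))
    total = money(record["total"])
    return {
        "items_match_subtotal": items_sum == subtotal,
        "total_matches": subtotal + tax == total,
    }


# Values copied from the sample invoice.
invoice = {
    "line_items": [
        {"desc": "Hex bolts 5/16 box", "qty": 12, "amt": 84.00},
        {"desc": "Galv anchor plate", "qty": 4, "amt": 112.00},
        {"desc": "Self-leveling compound", "qty": 2, "amt": 46.00},
        {"desc": "Delivery & handling", "qty": 1, "amt": 25.00},
    ],
    "subtotal": 267.00,
    "tax": 23.69,
    "total": 290.69,
}

result = reconcile(invoice)
print(result)
```

Using `Decimal` rather than floats keeps cent-level comparisons exact, which matters once a mismatch is supposed to block a posting.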

How it works

Three steps. One round trip.

Drag-and-drop in the playground or POST to the API — same engine, same output.

01

Upload

Drop any image, PDF, or photo into the playground — or POST it to /v1/extract. PNG, JPG, HEIC, TIFF, PDF, multi-page — all native.

02

Extract

Vision model + layout-aware OCR + entity recognition run in one pass. The model sees the page like a person — columns, headers, signatures, marks, hand-corrections.

03

Structure

Output to your JSON schema, your CSV columns, or fire it straight at your webhook. Confidence scores and a per-field audit trail come along for free.

Schema-aware output

Bring your own schema.

Tell us the shape you want back. TextigoAI binds the document to your fields — coercing types, normalizing units, picking the right entity for each slot — and gives you back a record that drops cleanly into your database, your CRM, your accounting system, your model input.

  • Any JSON Schema — we honor types, enums, required, and patterns.
  • Per-field confidence — gate writes on a threshold, fall back to human-in-the-loop.
  • Unit & format normalization — "Jun 14, '26" → 2026-06-14.
  • Multi-page & multi-doc — one schema, applied across batches.
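
As a rough illustration of the confidence gate and the date normalization above, here is a minimal sketch. The per-field response shape (`{"value": ..., "confidence": ...}`) is an assumption, since this page only shows a document-level `confidence`, and the confidence numbers and both helper names are made up for the example.

```python
from datetime import datetime


def gate_fields(fields: dict, threshold: float = 0.90):
    """Split extracted fields into auto-accepted values and ones needing review."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        if field["confidence"] >= threshold:
            accepted[name] = field["value"]
        else:
            # Keep the whole field so a reviewer can see the score too.
            needs_review[name] = field
    return accepted, needs_review


def normalize_date(text: str) -> str:
    """Turn a date written like "Jun 14, '26" into ISO 8601 (assumes English month names)."""
    return datetime.strptime(text, "%b %d, '%y").date().isoformat()


# Hypothetical per-field output; the shape and scores are illustrative only.
extracted = {
    "vendor": {"value": "Acme Supply Co.", "confidence": 0.99},
    "total": {"value": 290.69, "confidence": 0.97},
    "po_number": {"value": "PO-44812", "confidence": 0.62},
}

accepted, needs_review = gate_fields(extracted, threshold=0.90)
print(accepted)
print(sorted(needs_review))
print(normalize_date("Jun 14, '26"))
```

Fields that clear the threshold can be written straight through; anything else goes to a human-in-the-loop queue instead of your database.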
Your schema in
schema.json
{
  "type": "object",
  "required": ["vendor", "total", "issued_at"],
  "properties": {
    "vendor":     { "type": "string" },
    "issued_at":  { "type": "string", "format": "date" },
    "total":      { "type": "number", "minimum": 0 },
    "currency":   { "enum": ["USD","EUR","GBP"] },
    "po_number":  { "type": "string", "pattern": "^PO-\\d+$" }
  }
}
Filled out
conf 0.97
{
  "vendor":     "Acme Supply Co.",
  "issued_at":  "2026-06-14",
  "total":      290.69,
  "currency":   "USD",
  "po_number":  "PO-44812"
}
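
To sanity-check a filled record against the schema on your own side before writing it anywhere, something like the sketch below could work. It is deliberately tiny: the hypothetical `check` helper covers only the keywords used in the example schema (`required`, `type`, `minimum`, `enum`, `pattern`, `format: date`) and is not a full JSON Schema validator. A production setup would more likely lean on a dedicated validation library.

```python
import re
from datetime import date

# The example schema from this section, written as a Python dict.
SCHEMA = {
    "type": "object",
    "required": ["vendor", "total", "issued_at"],
    "properties": {
        "vendor": {"type": "string"},
        "issued_at": {"type": "string", "format": "date"},
        "total": {"type": "number", "minimum": 0},
        "currency": {"enum": ["USD", "EUR", "GBP"]},
        "po_number": {"type": "string", "pattern": r"^PO-\d+$"},
    },
}

FILLED = {
    "vendor": "Acme Supply Co.",
    "issued_at": "2026-06-14",
    "total": 290.69,
    "currency": "USD",
    "po_number": "PO-44812",
}


def check(record: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passes."""
    errors = [
        f"{name}: missing required field"
        for name in schema.get("required", [])
        if name not in record
    ]
    for name, rules in schema.get("properties", {}).items():
        if name not in record:
            continue
        value = record[name]
        kind = rules.get("type")
        if kind == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected a string")
            continue
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if kind == "number" and not is_number:
            errors.append(f"{name}: expected a number")
            continue
        if "minimum" in rules and is_number and value < rules["minimum"]:
            errors.append(f"{name}: below minimum {rules['minimum']}")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{name}: not one of {rules['enum']}")
        if "pattern" in rules and isinstance(value, str):
            # JSON Schema patterns are unanchored, hence search() rather than match().
            if not re.search(rules["pattern"], value):
                errors.append(f"{name}: does not match {rules['pattern']}")
        if rules.get("format") == "date" and isinstance(value, str):
            try:
                date.fromisoformat(value)
            except ValueError:
                errors.append(f"{name}: not an ISO 8601 date")
    return errors


print(check(FILLED, SCHEMA))
```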
Redaction in flight
on by default
Subject name vaulted · tok_a3f...
Card number redacted · **** 4221
SSN redacted · XXX-XX-XXXX
Address vaulted · tok_a3f...
Invoice total $290.69 · allowed
Audit log x-trace log_2ff1...9c4 · signed

PII & compliance

Sensitive fields, handled by design.

Documents are full of things you do not want sitting in plaintext in your data lake. TextigoAI flags, redacts, or vaults PII the moment it's extracted — before a single field is written downstream.

  • Redact PII before storage — configurable per field, per environment.
  • Vault sensitive values and exchange a reversible token to your app.
  • Signed audit logs — tamper-evident per-request trace, ready for SOC 2.
  • On-prem & VPC deployment available for regulated workloads.
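
To make the redact-versus-vault distinction concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Vault` class is an in-memory stand-in for a real token vault, the field names and policy format are hypothetical, and the sample values are fake. It is not how TextigoAI implements redaction internally.

```python
import re
import secrets


class Vault:
    """In-memory stand-in for a token vault; a real one would be a hardened service."""

    def __init__(self):
        self._store = {}

    def tokenize(self, value: str) -> str:
        # Hand back an opaque token and keep the original value out of the record.
        token = "tok_" + secrets.token_hex(8)
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._store[token]


def mask_card(number: str) -> str:
    """Keep only the last four digits, as in '**** 4221'."""
    digits = re.sub(r"\D", "", number)
    return "**** " + digits[-4:]


def mask_ssn(_ssn: str) -> str:
    """Drop the value entirely; redaction is one-way, unlike vaulting."""
    return "XXX-XX-XXXX"


# Hypothetical per-field policy; anything not listed is passed through.
POLICY = {
    "name": "vault",
    "address": "vault",
    "card_number": "mask_card",
    "ssn": "mask_ssn",
}


def apply_policy(record: dict, vault: Vault, policy: dict = POLICY) -> dict:
    safe = {}
    for key, value in record.items():
        action = policy.get(key, "allow")
        if action == "vault":
            safe[key] = vault.tokenize(value)
        elif action == "mask_card":
            safe[key] = mask_card(value)
        elif action == "mask_ssn":
            safe[key] = mask_ssn(value)
        else:
            safe[key] = value
    return safe


vault = Vault()
record = {
    "name": "Jane Example",
    "address": "99 Field Rd, Austin TX",
    "card_number": "4111 1111 1111 4221",
    "ssn": "000-00-0000",
    "total": 290.69,
}
safe = apply_policy(record, vault)
print(safe)
```

Redacted values cannot be recovered from the output. Vaulted values can, but only by a caller with access to the vault, which is what keeps the token safe to store downstream.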

API surface

A small, sharp surface.

REST · JSON in, JSON out · sync or async · idempotent.

Verb   Endpoint                   Purpose
POST   /v1/extract                Single doc: returns structured JSON synchronously.
POST   /v1/extract/batch          Bulk: queue many docs, get a job id back.
POST   /v1/extract/with-schema    Bind to your JSON Schema: typed, validated output.
GET    /v1/jobs/<id>              Async job status · webhook on completion.
POST   /v1/redact                 PII redaction only: pass-through, no extraction.
Request
# single doc → structured JSON
curl https://api.textigo.ai/v1/extract \
  -H "Authorization: Bearer $TXG_KEY" \
  -F "file=@invoice.pdf" \
  -F "redact_pii=true"
Response
{
  "id": "doc_2ff1c9...",
  "document_type": "invoice",
  "data": { "vendor": "Acme Supply Co.", ... },
  "confidence": 0.97,
  "latency_ms": 418
}
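
For teams calling the API from Python rather than curl, the same request could be sketched with the standard library alone. The URL, the bearer header, and the `file` and `redact_pii` form fields come from the curl example above. The helper names, the `application/octet-stream` part type, and the assumption that the response body is plain JSON are ours.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.textigo.ai/v1/extract"


def build_extract_request(
    filename: str, content: bytes, api_key: str, redact_pii: bool = True
) -> urllib.request.Request:
    """Build the multipart/form-data POST that mirrors the curl example."""
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = b"".join([
        # Part 1: the document itself, sent as the "file" field.
        f"--{boundary}".encode(), crlf,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(), crlf,
        b"Content-Type: application/octet-stream", crlf, crlf,
        content, crlf,
        # Part 2: the redact_pii flag as a plain form field.
        f"--{boundary}".encode(), crlf,
        b'Content-Disposition: form-data; name="redact_pii"', crlf, crlf,
        b"true" if redact_pii else b"false", crlf,
        # Closing boundary.
        f"--{boundary}--".encode(), crlf,
    ])
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract(path: str, api_key: str) -> dict:
    """Send one document and return the parsed JSON response."""
    with open(path, "rb") as fh:
        request = build_extract_request(os.path.basename(path), fh.read(), api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call would then look like `extract("invoice.pdf", os.environ["TXG_KEY"])`, with the returned dict carrying the `data`, `confidence`, and `latency_ms` fields shown in the response above.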

Where it ships

Wired into the rest of the family.

TextigoAI is the extraction layer under several Gridspin verticals — same engine, vertical-specific schemas.

Why teams switch

Replace your OCR. Keep the rest of your stack.

Ship in a day

A single endpoint, your schema, your webhook. No templates to author, no field-mapping spreadsheets to maintain.

One layer, many docs

Stop maintaining four parsers. Invoices, IDs, contracts, and screenshots all share the same engine.

PII-safe by default

Redaction, vaulting, and signed audit logs come standard — pass your next compliance review without rework.

Throw a document at it.

Open the playground and drag in your messiest invoice, contract, or photo. See it come back as the JSON you actually want, in under a second for most documents.