Azure-native legal contract intelligence · proof of concept · source on GitHub ↗

Contract Intelligence

contract review & clause comparison

A pipeline that ingests legal contracts, extracts structured metadata with page-level citations, answers natural-language questions, and compares clauses against a versioned gold standard — built so a legal team can trust every answer. SQL decides the facts; retrieval supplies the evidence; the model only explains what it can cite.

01 — Orchestration

A router that knows when not to spend

Most real questions over a contract corpus are old-school structured queries — “show me / list / how many”. Those should never cost an LLM call. The router resolves intent in layers, paying for language understanding only when the question actually needs it.

Tier 1 — deterministic

Rule-routed

A tiny regex shortcut catches canonical reporting phrasings and goes straight to SQL.

0 tokens · ~8 ms · $0

Tier 2 — SLM-first

LLM-routed → reporting

No rule match? A small classifier reads the intent — and can still land on the cheap SQL path for non-canonical phrasings like “show me expired”.

~600 router tokens → SQL

Tier 3 — full reasoning

Search & comparison

Genuine content questions route to RAG or a clause-vs-gold diff, where the token spend buys real work.

embeddings + reasoning LLM

The badge row under every answer makes the tier visible — routing, confidence, data sources, tokens, latency, cost.

02 — The application

What a reviewer actually sees

A pro-code web surface — full control over routing, citations, comparison, and the human-in-the-loop workflow. Click any frame to view it full size.

03 — Ingestion

Understand once, at write-time

The expensive AI work — layout parsing, metadata / clause / obligation extraction with per-field confidence, and embeddings — runs once as a document lands, populating a structured metadata database. Query time then reads that pre-built structure cheaply, instead of re-deriving facts from raw documents on every request.

→ blob
Upload
Contract lands in storage
layout
Page-tagged text
Document layout extraction
llm
Structured fields
Metadata · clauses · obligations + confidence
embed
Vectors
Summary & per-clause embeddings
sql + index
Source of truth
SQL upsert + two vector indexes

04 — Architecture

One codebase, two runtimes

Profile branching lives entirely in client factories, so the pipeline, router, and API never ask which environment they’re in. The production profile is Azure-native and managed-identity throughout; the local profile is a full docker-compose stack with zero cloud dependencies.

CapabilityAzure (production)Local (docker-compose)
DocumentsADLS Gen2 / BlobAzurite
LayoutDocument Intelligence (prebuilt-layout)unstructured.io
Extraction & reasoningAzure OpenAI · gpt-4.1-mini / gpt-4.1Ollama · qwen3
Embeddingstext-embedding-3-small (1536-d)mxbai-embed-large (1024-d)
Source of truthAzure SQL (serverless)SQL Server 2022
RetrievalAI Search Basic — 2 indexes, semantic rankerQdrant — vector + filter
Ingestion triggerEvent Grid → Functions (push)Polling watcher (~5 s)
Query APIAzure FunctionsFastAPI

05 — Evidence

A measurable playground

The local stack plus a synthetic corpus and an eval harness make this a sandbox for the full spread of questions a user actually asks — and a way to measure whether the answers hold up.

The model is first-draft data; the reviewer is the source of truth. Every human correction is also a labeled example — the audit trail doubles as a growing evaluation set.

06 — Generalization

The same machine, other domains

Strip away the legal vocabulary and what remains is a generic shape: ingest documents → extract structured fields and chunks → embed and index → answer via a router over RAG and structured lookup → compare snippets against a curated reference set. Roughly two-thirds of the code never knew it was about contracts.

Reused as-is

  • Profile-aware client factories — same Azure / local seam
  • Ingestion pipeline shape — idempotent hash-based upsert
  • Router + handler dispatch and the SQL builder
  • Vector-search abstraction — two indexes, filtered purge
  • Structured-output prompting, coercions, OpenAPI
  • Eval harness, both infra stacks, the audit pattern

Swapped per domain

  • Entity schema — the tables and what’s extracted
  • Extraction prompt & JSON schema
  • Router rules & intents, the filter parser
  • Reporting SQL projection and labels
  • Comparison-to-gold reference set
  • Synthetic corpus + manifest for the new domain

~75% reuse

Sales call notes

Action items, sentiment, next steps — measured against stage playbooks instead of gold clauses.

~70% reuse

Survey free-text

Themes, sentiment, risk flags against a canonical theme taxonomy; adds a trend / aggregation intent.

~65% reuse

Support transcripts

Resolution and escalation risk against an agent-behaviour rubric; the clauses index becomes a turns index.

Estimated effort to repurpose the substrate for a new domain: roughly two to three person-weeks.