LLMs · Document AI · RAG · OCR + Layout · Enterprise Search

Document Intelligence Copilot for Enterprise Knowledge Work

Built an enterprise document copilot that unifies OCR, layout-aware chunking, semantic retrieval, and grounded generation. The system answers policy and technical questions over PDFs, wikis, and SOPs while citing evidence spans and reducing unsupported claims in production workflows.

+29% Answer Accuracy
-64% Unsupported Claims
<1.5s Median Response Time
92% Citation Coverage

Why Document Grounding Matters

Large language models are powerful generators but unreliable narrators. They hallucinate facts, fabricate citations, and falter on domain-specific and multi-hop queries, precisely where factual precision is non-negotiable. In high-stakes domains like medical reasoning, legal analysis, and scientific research, a single hallucinated fact can propagate catastrophically through downstream decisions.

Pure Retrieval-Augmented Generation (RAG) introduces retrieval noise and struggles with implicit reasoning chains. Pure fine-tuning burns domain knowledge into weights but sacrifices the model's generality and can overfit to narrow distributions. Neither approach alone solves the core challenge: enabling a model to reason accurately over complex, interconnected facts.

Our hybrid approach combines the strengths of both: PEFT adapters teach the model how to reason with structured evidence, while RAG and Knowledge Graph injection provide the evidence itself at inference time. The result is a system that maintains generality while grounding its outputs in verifiable, up-to-date domain knowledge.

OCR-to-Answer Grounded Pipeline

The architecture routes each query through parallel retrieval paths before assembling a unified context for the fine-tuned LLM. Knowledge Graph sub-graphs provide structured relational facts, while the RAG pipeline surfaces relevant unstructured passages. Both are merged in a context assembler that feeds the PEFT-adapted model for chain-of-thought reasoning.

Ingest → Parse → Retrieve → Ground → Answer

Query Processing

User Query: natural-language question, potentially multi-hop. The system decomposes complex queries into reasoning sub-steps for targeted retrieval.

Query Analyzer (intent + NER): identifies intent, entities, and required reasoning hops, then generates structured sub-queries for both the KG and RAG retrieval paths.

Knowledge Graph

Neo4j KG (entity linking): domain-specific knowledge graph queried via entity linking. Sub-graph extraction retrieves relevant entity-relation triples within 2-3 hops of the query entities.

RAG Retriever

FAISS Index (dense retrieval): dense passage retrieval over a FAISS index finds the top-k relevant document chunks; a re-ranking model then scores chunks for relevance and factual density.

Context Assembly

Context Assembler (merge + prioritize): merges KG triples and RAG chunks into a structured context window, prioritizing the most relevant evidence and formatting it for the LLM.

Reasoning Engine

PEFT Fine-Tuned LLM (LoRA adapters): decoder-only LLM with LoRA adapters fine-tuned on domain QA pairs. Preserves general capabilities while adding domain expertise.

Chain-of-Thought: generates explicit reasoning steps, each referencing evidence from the context, yielding a traceable reasoning path for multi-hop queries.

Hallucination Filter: verifies generated claims against the retrieved evidence before the answer is released.

Output

Grounded Answer (with citations): final factual answer with cited sources and a confidence score, including the reasoning chain for full explainability.

Iterate: refine retrieval and model until convergence.
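The routing through parallel retrieval paths and a context assembler can be sketched as a minimal orchestration loop. This is an illustrative sketch, not the production implementation: the names (`Evidence`, `assemble_context`, `answer_query`) and the fixed evidence budget are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str   # "kg" for graph triples, "rag" for passage chunks
    text: str
    score: float  # retriever / re-ranker relevance score

def assemble_context(kg_evidence, rag_evidence, budget=4):
    """Merge both evidence streams and keep the highest-scoring items."""
    merged = sorted(kg_evidence + rag_evidence,
                    key=lambda e: e.score, reverse=True)
    return "\n".join(f"[{e.source}] {e.text}" for e in merged[:budget])

def answer_query(query, kg_retrieve, rag_retrieve, llm):
    """Route one query through both retrieval paths, then the grounded LLM."""
    context = assemble_context(kg_retrieve(query), rag_retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer with citations:"
    return llm(prompt)
```

In the real system the two retrievers run in parallel and the prompt carries explicit citation instructions; the sketch only shows the merge-and-ground control flow.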

Development Pipeline

The system was developed through an iterative five-stage pipeline, with each stage building on the outputs and learnings of the previous one. From constructing the domain knowledge graph to final evaluation, every component was designed for modularity and reproducibility.

Domain Knowledge Graph Construction

Built a comprehensive domain ontology and populated a Neo4j knowledge graph with entity-relation triples extracted from curated sources. Applied entity resolution and link prediction to fill gaps.
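Triple ingestion into Neo4j can be sketched as Cypher generation. The `Entity` label, the `$head`/`$tail` parameters, and the `triple_to_cypher` helper below are hypothetical simplifications, not the project's actual ontology:

```python
def triple_to_cypher(head, relation, tail):
    """Render one (head, relation, tail) triple as an idempotent Cypher MERGE.

    MERGE (rather than CREATE) keeps ingestion idempotent, which matters when
    entity resolution maps several surface forms onto one canonical node.
    """
    rel = relation.upper().replace(" ", "_")  # Cypher relationship types have no spaces
    return (
        f"MERGE (h:Entity {{name: $head}}) "
        f"MERGE (t:Entity {{name: $tail}}) "
        f"MERGE (h)-[:{rel}]->(t)"
    )

# With the official neo4j Python driver, execution would look roughly like:
# with driver.session() as session:
#     session.run(triple_to_cypher(h, r, t), head=h, tail=t)
```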

PEFT Fine-Tuning (LoRA Adapters)

Fine-tuned a base LLM using Low-Rank Adaptation (LoRA) on domain-specific QA pairs, teaching the model to reason with structured evidence without catastrophic forgetting of general capabilities.
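The low-rank update at the heart of LoRA can be shown in a few lines of NumPy. The dimensions, rank, and scaling factor below are illustrative (the actual training used the Hugging Face PEFT library on a full transformer, not a single linear layer):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 32, 8, 16

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

def lora_forward(x):
    """Base path plus scaled low-rank path; only A and B receive gradients."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

def merge_weights():
    """After training, fold the adapter into W for zero-overhead inference."""
    return W + (alpha / r) * (B @ A)
```

Zero-initializing B makes the adapter a no-op at step 0, so fine-tuning starts exactly from the pretrained model, which is one reason LoRA avoids catastrophic forgetting.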

RAG Pipeline with Sub-Graph Injection

Implemented dense passage retrieval via FAISS, augmented with extracted sub-graphs from the knowledge graph. Context ranking ensures the most relevant evidence is prioritized for the LLM.
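The retrieval step can be sketched with a small NumPy stand-in for a FAISS inner-product index (in production, `faiss.IndexFlatIP` over normalized embeddings plays this role); the `DenseIndex` class is an assumption made for this example:

```python
import numpy as np

class DenseIndex:
    """Minimal stand-in for a FAISS inner-product index over unit vectors."""

    def __init__(self, embeddings):
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        self.vecs = embeddings / norms  # unit-normalize -> cosine similarity

    def search(self, query, k=3):
        q = query / np.linalg.norm(query)
        scores = self.vecs @ q
        top = np.argsort(-scores)[:k]   # indices of the k best chunks
        return top, scores[top]
```

In the full pipeline the top-k chunks returned here are re-ranked and then concatenated with the extracted KG sub-graph triples before context assembly.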

Chain-of-Thought Optimization

Developed structured prompting templates that guide the model through explicit reasoning steps, citing retrieved evidence at each hop. This improves both accuracy and interpretability.
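A structured prompting template of this kind might look as follows; the evidence-id scheme (`[E1]`, `[E2]`, ...) and the `build_cot_prompt` helper are illustrative, not the exact production template:

```python
def build_cot_prompt(question, evidence):
    """Number the evidence, then instruct the model to cite an evidence id
    at every reasoning step so each hop stays traceable."""
    lines = [f"[E{i}] {text}" for i, text in enumerate(evidence, 1)]
    return (
        "Evidence:\n" + "\n".join(lines) + "\n\n"
        f"Question: {question}\n"
        "Think step by step. After each step, cite the evidence id "
        "(e.g. [E2]) that supports it, then state the final answer."
    )
```

Binding every step to an id is what lets the downstream filter check each claim against a concrete span instead of the whole context.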

Evaluation & Iteration

Evaluated on multi-hop QA benchmarks with custom metrics for hallucination rate, reasoning faithfulness, and answer accuracy. Iterated on retrieval strategies and prompt templates.
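Two of the custom metrics can be sketched in plain Python. Both functions are simplified assumptions: real exact match needs fuller normalization, and the unsupported-claim check here only looks for citation markers rather than verifying entailment:

```python
import re

def exact_match(pred, gold):
    """Normalized exact match: lowercase, strip punctuation and whitespace."""
    def norm(s):
        return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()
    return norm(pred) == norm(gold)

def unsupported_claim_rate(answers):
    """Fraction of answers containing no citation marker like [E3].

    A crude proxy: a faithful evaluation must also check that each cited
    span actually entails the claim attached to it.
    """
    uncited = [a for a in answers if not re.search(r"\[E\d+\]", a)]
    return len(uncited) / len(answers)
```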

Interactive Results

Comprehensive evaluation across multiple dimensions demonstrates the advantage of the hybrid PEFT+RAG+KG approach over individual baselines. The charts below break down how each component contributes to final system performance.

Answer Accuracy by Architecture

Exact-match performance on enterprise QA benchmark


Unsupported Claim Rate Over Iterations

Percentage of answers without evidence backing


Question Complexity vs Accuracy

Performance across single-hop to multi-document questions


Failure Mode Analysis

Breakdown of retrieval, reasoning, and parsing errors


Approach Comparison

Side-by-side comparison of all evaluated approaches across key performance dimensions, demonstrating the consistent advantage of the full hybrid system.

Approach                 | Single-hop Accuracy | Multi-hop Accuracy | Hallucination Rate | Latency
Base LLM                 | 61%                 | 23%                | 38%                | 0.4s
Fine-Tuned (PEFT)        | 72%                 | 41%                | 24%                | 0.5s
RAG Only                 | 69%                 | 38%                | 21%                | 1.2s
Our System (PEFT+RAG+KG) | 86%                 | 64%                | 8%                 | 1.8s

Key Outcomes

81%
QA Accuracy
Overall exact-match accuracy across all query complexities
-64%
Unsupported Claims
Reduction in answers lacking evidence backing vs baseline
92%
Queries Supported
Multi-hop reasoning chains with graceful degradation
<1.5s
Response Time
End-to-end latency including retrieval and generation

What We Learned

Hybrid Architecture

Combining PEFT fine-tuning with RAG and Knowledge Graph injection yields synergistic improvements that exceed any individual approach.

Reduced Hallucination

External knowledge grounding via sub-graph extraction and passage retrieval reduces hallucination by 78%, making the system viable for high-stakes factual reasoning tasks.

Domain Adaptable

The modular design allows rapid adaptation to new domains by swapping the knowledge graph and document store, requiring only lightweight LoRA re-training rather than full model fine-tuning.

Tech Stack
Python · PyTorch · PEFT / LoRA · LangChain · FAISS · Neo4j · Transformers · Hugging Face

Business Impact and Delivery Scope

Problem Solved

Knowledge-heavy teams struggle with inconsistent answers across policies, SOPs, and fragmented documentation.

What I Deliver

Document intelligence copilot with OCR-aware retrieval, grounded answers, and citation-linked traceability.

Expected Impact

Faster knowledge access, better answer trust, and reduced operational friction in analyst workflows.

Hire Me for Document AI and RAG

I can build grounded enterprise assistants over complex docs, reports, and internal knowledge systems.

MVP Delivery

Focused use-case assistant with ingestion, retrieval, and citation-ready output format.

Production Hardening

Evaluation loops, failure tracking, and rollout controls tied to business quality metrics.

Advisory + Build

Hands-on and strategic support for teams deploying document-centric copilots at scale.
