Introduction & Solution Approach
The problem we solve, why it matters at enterprise scale, and the six design principles that make our approach distinct.
Large IT enterprises often struggle with fragmented knowledge spread across PDFs, SOPs, support systems, and internal discussions — resulting in duplicate tickets, slow onboarding, repeated troubleshooting efforts, and unnecessary escalations. These inefficiencies reduce productivity, increase operational costs, and delay project delivery.

AI Transformer addresses these challenges by centralising enterprise knowledge, automating repetitive IT-related queries, and streamlining onboarding and knowledge transfer (KT) processes — enabling employees to focus on high-value work through secure, AI-driven automation. Our platform covers the complete employee and support workflow, from onboarding to ticket escalation, by integrating product documents, SOPs, policies, and KT session insights into a centralised knowledge ecosystem. With role-based access control, PII masking, and built-in guardrails, the solution ensures secure and relevant information access.

Unlike traditional RAG-based systems, our approach captures and summarises real-time KT discussions and major issue resolutions — continuously enriching the knowledge base with practical insights that improve ticket automation, reduce duplication, and enhance enterprise-wide efficiency.
Solution Architecture
Our solution uses a six-layer architecture to deliver secure, scalable, and intelligent knowledge access. Each layer handles a distinct responsibility, ensuring modularity and reliability:
- 💬 User & Access Layer: Interfaces with employees via chat, web, and email, enforcing authentication and RBAC.
- 🤖 Agentic Orchestration Layer: Coordinates multiple intelligent agents for retrieval, escalation, and personalized guidance.
- 🧠 LLM & Reasoning Layer: Processes queries with LLMs while applying guardrails, grounding, and confidence scoring.
- 🔍 Knowledge & Retrieval Layer: Retrieves relevant documents and context from vector stores and conversation memory.
- 📂 Ingestion & Document Pipeline: Collects, cleans, chunks, embeds, and indexes enterprise content continuously.
- 🔒 Security, Observability & Compliance Plane: Enforces PII masking, audit logging, RBAC, compliance, and system monitoring.
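To make the layering concrete, the sketch below traces a single query through the layers in order. This is a minimal illustration, not the actual implementation: every function is a hypothetical stub standing in for a full service, and the role names are assumptions.

```python
# Illustrative only: each stub stands in for a full service in the real system.
from dataclasses import dataclass, field

@dataclass
class QueryContext:
    user_id: str
    role: str
    query: str
    chunks: list = field(default_factory=list)
    answer: str = ""

def authenticate(ctx):   # User & Access Layer: JWT + RBAC (stubbed)
    assert ctx.role in {"engineer", "support", "admin"}, "RBAC denied"

def retrieve(ctx):       # Knowledge & Retrieval Layer: vector search (stubbed)
    ctx.chunks = [f"chunk relevant to: {ctx.query}"]

def generate(ctx):       # LLM & Reasoning Layer: guardrails + grounding (stubbed)
    ctx.answer = f"Grounded answer built from {len(ctx.chunks)} chunk(s)."

def audit(ctx):          # Security, Observability & Compliance Plane (stubbed)
    print(f"audit: user={ctx.user_id} query={ctx.query!r}")

def handle_query(ctx: QueryContext) -> str:
    # The Agentic Orchestration Layer coordinates these steps in the real
    # system; here the flow is flattened into one function for readability.
    authenticate(ctx)
    retrieve(ctx)
    generate(ctx)
    audit(ctx)
    return ctx.answer

print(handle_query(QueryContext("emp42", "engineer", "How do I rotate Kafka certs?")))
```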
Low-Level Design (LLD)
Six interdependent modules cover the complete runtime pipeline — from passive data collection through to secure response delivery. Each module owns a clearly bounded responsibility and exposes well-defined interfaces, making the system independently scalable and LLM-agnostic by design.
- IngestionScheduler — drives all offline data collection via configurable cron jobs across Jira, Confluence, and Zoom/Teams, with built-in failure retries and sync-event logging.
- DocumentPipeline — handles parsing, cleaning, chunking, PII detection and redaction, and role/project metadata tagging before any chunk reaches the vector store.
- EmbeddingService — manages dual-store indexing (FAISS for low-latency ANN search, pgvector for relational joins) and applies RBAC filtering before returning CrossEncoder-reranked top-k results (see the retrieval sketch after this list).
- AgentOrchestrator — runs the stateful ReAct loop, dispatches to specialised sub-agents (document, escalation, KT suggestion), and makes the escalate-vs-answer decision against a configurable confidence threshold (0.7 by default).
- LLMService — wraps the open-source LLM with input guardrails (prompt-injection + PII), generation, output guardrails, source citation, and hallucination flagging.
- SecurityLayer — enforces JWT auth, RBAC, prompt-injection detection, PII redaction, and immutable audit logging — cutting orthogonally across every module at each call boundary in support of DPDP Act and HIPAA compliance.
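A hedged sketch of the EmbeddingService's post-retrieval stage follows: RBAC filtering runs first, then CrossEncoder reranking, so restricted chunks never influence scoring or reach the LLM. The candidate structure, role tags, and the specific model name are illustrative assumptions.

```python
from sentence_transformers import CrossEncoder

# A common open-source reranker choice; the actual model is a config decision.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_with_rbac(query: str, candidates: list[dict], user_roles: set[str], k: int = 5):
    # Drop any chunk the user's roles do not grant access to *before* reranking.
    allowed = [c for c in candidates if c["acl_tags"] & user_roles]
    if not allowed:
        return []
    scores = reranker.predict([(query, c["text"]) for c in allowed])
    ranked = sorted(zip(allowed, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:k]]

# Example: a chunk tagged for "devops" stays invisible to a "support" user.
candidates = [
    {"text": "Restart the Kafka broker via systemctl...", "acl_tags": {"devops"}},
    {"text": "Raise a ticket if the VPN client fails...", "acl_tags": {"support", "devops"}},
]
print(rerank_with_rbac("VPN not connecting", candidates, user_roles={"support"}))
```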
Data Sources & Engineering Steps
Seven data sources feed the knowledge base — a mix of public datasets, synthetic enterprise documents, and evaluation sets. Each source goes through a tailored engineering pipeline before reaching the vector store.
- Public Tech Docs — Kafka, Kubernetes, Docker, and FastAPI docs downloaded as PDFs/HTML → PyMuPDF parse → chunk → embed → FAISS (parsing step sketched after this list).
- Internal SOP PDFs — 20–30 LLM-generated mock SOPs → PDF export → parse → PII mask → RBAC tag → chunk → embed.
- IT Support Tickets — 200–300 synthetic CSV tickets → pandas load → deduplicate → extract fields → vectorise → index.
- KT Recordings — Zoom/Teams audio → Whisper transcribe → LLM summarise → chunk → embed → stored with session ID.
- Confluence / Wiki — API crawl → HTML strip → chunk → spaCy NER tagging → embed.
- SQuAD Dataset — IT-relevant Q&A pairs filtered → used as ground-truth for precision/recall evaluation only; not indexed.
- StackOverflow (Kaggle) — top-voted Q&A → HTML tag strip → chunk → embed → index.
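As referenced in the first item, here is a minimal sketch of the PyMuPDF parse-and-chunk step, assuming fixed-size character chunks with overlap. The chunk sizes are illustrative, and the real pipeline continues on to embed and index the output.

```python
import fitz  # PyMuPDF

def pdf_to_chunks(path: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    doc = fitz.open(path)
    text = "\n".join(page.get_text() for page in doc)
    doc.close()
    # Sliding window keeps some overlap so a sentence split at a boundary
    # still appears intact in at least one chunk.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text), 1), step)]

chunks = pdf_to_chunks("kafka-docs.pdf")  # hypothetical local file
print(f"{len(chunks)} chunks ready for embedding")
```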
Data Model — Entity Relationship Diagram
The data model spans two databases — PostgreSQL for structured relational data and MongoDB for unstructured document content. Seventeen entities cover the full lifecycle from user authentication through document ingestion, query handling, and evaluation.
- Users & Sessions — stores user identity, role, department, and active session context with full audit columns on all rows.
- Documents & Chunks — each document is broken into chunks with ACL tags; chunks link to their embeddings stored via pgvector (modelled in the sketch after this list).
- Queries & Responses — every query is logged with PII flag, access-granted status, and confidence score; responses store citations and hallucination flag.
- Tickets & Escalations — low-confidence responses auto-create support tickets linked back to the originating query and assigned user.
- Evaluations & Feedback — F1, precision, recall, semantic similarity, and LLM-as-judge scores are stored per response alongside employee star ratings.
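A hedged sketch of the Documents/Chunks relation using SQLAlchemy with the pgvector extension. The table names, columns, and 768-dimension embedding size (BGE-base) are assumptions for illustration, not the actual schema.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text
from sqlalchemy.orm import declarative_base, relationship
from pgvector.sqlalchemy import Vector

Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    source = Column(String)                      # e.g. "sop_pdf", "confluence"
    chunks = relationship("Chunk", back_populates="document")

class Chunk(Base):
    __tablename__ = "chunks"
    id = Column(Integer, primary_key=True)
    document_id = Column(Integer, ForeignKey("documents.id"), nullable=False)
    content = Column(Text, nullable=False)
    acl_tags = Column(String)                    # role/project tags for RBAC filtering
    embedding = Column(Vector(768))              # assumes BGE-base (768 dims)
    document = relationship("Document", back_populates="chunks")
```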
Data Flow Diagram (DFD)
The DFD shows how data moves between external sources, internal processing stages, and the employee — across both the ingestion pipeline and the live query path. Two flows run in parallel: an offline ingestion flow and a real-time retrieval flow.
- Data Sources — PDFs/SOPs, Jira tickets, Zoom/Teams recordings, and Confluence pages feed into the ingestion pipeline via cron-triggered connectors.
- Ingestion & Chunking — raw content is cleaned, PII-masked, chunked, and metadata-tagged before reaching the embedding stage.
- Embedding & Vector Store — chunks are vectorised (BGE) and indexed into FAISS/pgvector; this is the knowledge base all queries search against.
- Query & Retrieve — employee query is embedded, top-k chunks are retrieved with RBAC filtering and reranked before being passed to the LLM.
- Answer or Escalate — LLM generates a cited response; high-confidence answers go to the employee, low-confidence queries are auto-escalated to Jira.
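The final branch can be sketched as below. The Jira URL, project key, and credentials are placeholders, and the ticket fields are assumptions; the 0.7 threshold matches the design above.

```python
import requests

CONFIDENCE_THRESHOLD = 0.7
JIRA_URL = "https://your-jira.example.com"   # placeholder

def answer_or_escalate(query: str, answer: str, confidence: float, auth: tuple):
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"type": "answer", "body": answer}
    # Low confidence: open a Jira issue carrying the full query context.
    resp = requests.post(
        f"{JIRA_URL}/rest/api/2/issue",
        json={"fields": {
            "project": {"key": "ITSUP"},              # assumed project key
            "summary": f"Escalated query: {query[:80]}",
            "description": f"Query: {query}\nModel confidence: {confidence:.2f}",
            "issuetype": {"name": "Task"},
        }},
        auth=auth,       # e.g. (username, api_token)
        timeout=10,
    )
    resp.raise_for_status()
    return {"type": "escalation", "ticket": resp.json()["key"]}
```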
Sequence Diagram
The sequence traces three runtime paths across six actors — Employee, API Gateway, Agent Orchestrator, Vector DB, LLM Service, and Jira/SME. Every path begins with JWT validation and RBAC enforcement at the gateway before any agent or data layer is touched (see the gateway sketch after this list).
- Happy Path — query is embedded → ACL-filtered top-k retrieved → CrossEncoder reranked → LLM generates with guardrails + citation → confidence ≥ 0.7 → cited answer returned to employee.
- Escalation Path — confidence < 0.7 triggers Jira ticket creation with full query context and priority; employee receives ticket ID and SME assignment instead of a direct answer.
- Async Ingestion Path — cron-triggered background jobs feed the Vector DB continuously; they are fully decoupled from the real-time query path and add no latency to user requests.
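The gateway check that opens every path might look like the following, assuming PyJWT with an HS256 shared secret; the secret, claim names, and role set are illustrative assumptions.

```python
import jwt  # PyJWT

SECRET = "replace-with-gateway-secret"   # placeholder
ALLOWED_ROLES = {"engineer", "support", "admin"}

def authorize(token: str) -> dict:
    try:
        claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError as exc:
        raise PermissionError(f"JWT validation failed: {exc}") from exc
    if claims.get("role") not in ALLOWED_ROLES:
        raise PermissionError(f"RBAC denied for role {claims.get('role')!r}")
    return claims  # downstream agents receive user_id + role from here
```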
State Transition Diagram
The state machine governs the complete lifecycle of a user query across nine distinct states. Guard conditions on every transition enforce security and correctness — no query reaches the LLM without passing auth, RBAC, and retrieval checks first.
- AUTHENTICATING — auth failure or RBAC denial transitions immediately to REJECTED; logged to the immutable audit trail before any data is touched.
- EMBEDDING → RETRIEVING — query is vectorised and top-k chunks retrieved with ACL filtering; if no chunks are found, a fallback prompt is applied rather than hard-failing.
- AGENT_LOOP — iterative ReAct cycle; agent calls tools (doc search, KT lookup, ticket query) multiple times until sufficient context is accumulated before moving to GENERATING.
- GENERATING — LLM runs with guardrails, citation, and hallucination detection; confidence score determines the final branch.
- ANSWERED — confidence ≥ 0.7; cited response delivered to the employee with source references.
- ESCALATING / FAILED — confidence < 0.7 creates a Jira ticket with SME assignment; any unhandled error in any state collapses to FAILED with a full audit log entry.
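The lifecycle can be encoded as an explicit transition table, sketched below. Guard conditions are reduced to set membership here for brevity; the table mirrors the states named above, with REJECTED, ANSWERED, ESCALATING, and FAILED as terminal states.

```python
from enum import Enum, auto

class QueryState(Enum):
    AUTHENTICATING = auto()
    REJECTED = auto()
    EMBEDDING = auto()
    RETRIEVING = auto()
    AGENT_LOOP = auto()
    GENERATING = auto()
    ANSWERED = auto()
    ESCALATING = auto()
    FAILED = auto()

# Legal transitions; anything else collapses to FAILED with an audit entry.
TRANSITIONS = {
    QueryState.AUTHENTICATING: {QueryState.EMBEDDING, QueryState.REJECTED},
    QueryState.EMBEDDING: {QueryState.RETRIEVING},
    QueryState.RETRIEVING: {QueryState.AGENT_LOOP},
    QueryState.AGENT_LOOP: {QueryState.AGENT_LOOP, QueryState.GENERATING},
    QueryState.GENERATING: {QueryState.ANSWERED, QueryState.ESCALATING},
}

def transition(current: QueryState, target: QueryState) -> QueryState:
    if target in TRANSITIONS.get(current, set()):
        return target
    return QueryState.FAILED  # unhandled transition: audit-log and fail safe

state = transition(QueryState.AUTHENTICATING, QueryState.EMBEDDING)  # -> EMBEDDING
```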
Open Source Libraries & Tools
The entire stack is built exclusively on open-source tools — zero proprietary licensing costs, full auditability, and complete vendor independence at every layer.
- LLM-agnostic by design — swap between LLaMA 3, Mistral, Qwen, or any lightweight model (Phi-3, Gemma 2B) with a single config change via Ollama + LiteLLM routing (see the routing sketch after this list).
- Observable by default — Langfuse captures every LLM trace, token usage, and latency; Prometheus + Grafana covers infra metrics; RAGAS + DeepEval scores RAG quality continuously.
- Dual vector store strategy — FAISS for low-latency ANN search and pgvector inside PostgreSQL for relational joins, giving flexibility to choose per query type.
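As referenced above, a minimal sketch of the single-config model swap, assuming a local Ollama daemon and LiteLLM's `ollama/<model>` naming convention; reading the model name from an environment variable is an illustrative choice.

```python
import os
from litellm import completion

# Swapping LLaMA 3 for Mistral/Qwen/Phi-3 is one config value, no code change.
MODEL = os.getenv("AI_TRANSFORMER_MODEL", "ollama/llama3")

response = completion(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarise today's KT session."}],
)
print(response.choices[0].message.content)
```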