Introduction & Solution Approach
The problem we solve, why it matters at enterprise scale, and the six design principles that make our approach distinct.
Large IT enterprises often struggle with fragmented knowledge spread across PDFs, SOPs, support systems, and internal discussions — resulting in duplicate tickets, slow onboarding, repeated troubleshooting efforts, and unnecessary escalations. These inefficiencies reduce productivity, increase operational costs, and delay project delivery.

AI Transformer addresses these challenges by centralising enterprise knowledge, automating repetitive IT-related queries, and streamlining onboarding and knowledge transfer (KT) processes — enabling employees to focus on high-value work through secure, AI-driven automation. Our platform covers the complete employee and support workflow, from onboarding to ticket escalation, by integrating product documents, SOPs, policies, and KT session insights into a centralised knowledge ecosystem. With role-based access control, PII masking, and built-in guardrails, the solution ensures secure and relevant information access.

Unlike traditional RAG-based systems, our approach captures and summarises real-time KT discussions and major issue resolutions — continuously enriching the knowledge base with practical insights that improve ticket automation, reduce duplication, and enhance enterprise-wide efficiency.
Solution Architecture
Our solution uses a six-layer architecture to deliver secure, scalable, and intelligent knowledge access. Each layer handles a distinct responsibility, ensuring modularity and reliability:
- 💬 User & Access Layer: Interfaces with employees via chat, web, and email, enforcing authentication and RBAC.
- 🤖 Agentic Orchestration Layer: Coordinates multiple intelligent agents for retrieval, escalation, and personalized guidance.
- 🧠 LLM & Reasoning Layer: Processes queries with LLMs while applying guardrails, grounding, and confidence scoring.
- 🔍 Knowledge & Retrieval Layer: Retrieves relevant documents and context from vector stores and conversation memory.
- 📂 Ingestion & Document Pipeline: Collects, cleans, chunks, embeds, and indexes enterprise content continuously.
- 🔒 Security, Observability & Compliance Plane: Enforces PII masking, audit logging, RBAC, compliance, and system monitoring.
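To make the layering concrete, the sketch below traces a single query through the layers in order. This is a minimal illustration, not the actual implementation: every function is a hypothetical stub standing in for a full service, and the role names are assumptions.

```python
# Illustrative only: each stub stands in for a full service in the real system.
from dataclasses import dataclass, field

@dataclass
class QueryContext:
    user_id: str
    role: str
    query: str
    chunks: list = field(default_factory=list)
    answer: str = ""

def authenticate(ctx):   # User & Access Layer: JWT + RBAC (stubbed)
    assert ctx.role in {"engineer", "support", "admin"}, "RBAC denied"

def retrieve(ctx):       # Knowledge & Retrieval Layer: vector search (stubbed)
    ctx.chunks = [f"chunk relevant to: {ctx.query}"]

def generate(ctx):       # LLM & Reasoning Layer: guardrails + grounding (stubbed)
    ctx.answer = f"Grounded answer built from {len(ctx.chunks)} chunk(s)."

def audit(ctx):          # Security, Observability & Compliance Plane (stubbed)
    print(f"audit: user={ctx.user_id} query={ctx.query!r}")

def handle_query(ctx: QueryContext) -> str:
    # The Agentic Orchestration Layer coordinates these steps in the real
    # system; here the flow is flattened into one function for readability.
    authenticate(ctx)
    retrieve(ctx)
    generate(ctx)
    audit(ctx)
    return ctx.answer

print(handle_query(QueryContext("emp42", "engineer", "How do I rotate Kafka certs?")))
```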
Low-Level Design (LLD)
Six interdependent modules cover the complete runtime pipeline — from passive data collection through to secure response delivery. Each module owns a clearly bounded responsibility and exposes well-defined interfaces, making the system independently scalable and LLM-agnostic by design.
- IngestionScheduler — drives all offline data collection via configurable cron jobs across Jira, Confluence, and Zoom/Teams, with built-in failure retries and sync-event logging.
- DocumentPipeline — handles parsing, cleaning, chunking, PII detection and redaction, and role/project metadata tagging before any chunk reaches the vector store.
- EmbeddingService — manages dual-store indexing (FAISS for low-latency ANN search, pgvector for relational joins) and applies RBAC filtering before returning CrossEncoder-reranked top-k results (see the retrieval sketch after this list).
- AgentOrchestrator — runs the stateful ReAct loop, dispatches to specialised sub-agents (document, escalation, KT suggestion), and makes the escalate-vs-answer decision against a configurable confidence threshold (0.7 by default).
- LLMService — wraps the open-source LLM with input guardrails (prompt-injection + PII), generation, output guardrails, source citation, and hallucination flagging.
- SecurityLayer — enforces JWT auth, RBAC, prompt-injection detection, PII redaction, and immutable audit logging — cutting orthogonally across every module at each call boundary in support of DPDP Act and HIPAA compliance.
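A hedged sketch of the EmbeddingService's post-retrieval stage follows: RBAC filtering runs first, then CrossEncoder reranking, so restricted chunks never influence scoring or reach the LLM. The candidate structure, role tags, and the specific model name are illustrative assumptions.

```python
from sentence_transformers import CrossEncoder

# A common open-source reranker choice; the actual model is a config decision.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_with_rbac(query: str, candidates: list[dict], user_roles: set[str], k: int = 5):
    # Drop any chunk the user's roles do not grant access to *before* reranking.
    allowed = [c for c in candidates if c["acl_tags"] & user_roles]
    if not allowed:
        return []
    scores = reranker.predict([(query, c["text"]) for c in allowed])
    ranked = sorted(zip(allowed, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:k]]

# Example: a chunk tagged for "devops" stays invisible to a "support" user.
candidates = [
    {"text": "Restart the Kafka broker via systemctl...", "acl_tags": {"devops"}},
    {"text": "Raise a ticket if the VPN client fails...", "acl_tags": {"support", "devops"}},
]
print(rerank_with_rbac("VPN not connecting", candidates, user_roles={"support"}))
```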
Data Sources & Engineering Steps
Seven data sources feed the knowledge base — a mix of public datasets, synthetic enterprise documents, and evaluation sets. Each source goes through a tailored engineering pipeline before reaching the vector store.
- Public Tech Docs — Kafka, Kubernetes, Docker, and FastAPI docs downloaded as PDFs/HTML → PyMuPDF parse → chunk → embed → FAISS (parsing step sketched after this list).
- Internal SOP PDFs — 20–30 LLM-generated mock SOPs → PDF export → parse → PII mask → RBAC tag → chunk → embed.
- IT Support Tickets — 200–300 synthetic CSV tickets → pandas load → deduplicate → extract fields → vectorise → index.
- KT Recordings — Zoom/Teams audio → Whisper transcribe → LLM summarise → chunk → embed → stored with session ID.
- Confluence / Wiki — API crawl → HTML strip → chunk → spaCy NER tagging → embed.
- SQuAD Dataset — IT-relevant Q&A pairs filtered → used as ground-truth for precision/recall evaluation only; not indexed.
- StackOverflow (Kaggle) — top-voted Q&A → HTML tag strip → chunk → embed → index.
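As referenced in the first item, here is a minimal sketch of the PyMuPDF parse-and-chunk step, assuming fixed-size character chunks with overlap. The chunk sizes are illustrative, and the real pipeline continues on to embed and index the output.

```python
import fitz  # PyMuPDF

def pdf_to_chunks(path: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    doc = fitz.open(path)
    text = "\n".join(page.get_text() for page in doc)
    doc.close()
    # Sliding window keeps some overlap so a sentence split at a boundary
    # still appears intact in at least one chunk.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text), 1), step)]

chunks = pdf_to_chunks("kafka-docs.pdf")  # hypothetical local file
print(f"{len(chunks)} chunks ready for embedding")
```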
Data Model — Entity Relationship Diagram
The data model spans two databases — PostgreSQL for structured relational data and MongoDB for unstructured document content. Seventeen entities cover the full lifecycle from user authentication through document ingestion, query handling, and evaluation.
- Users & Sessions — stores user identity, role, department, and active session context with full audit columns on all rows.
- Documents & Chunks — each document is broken into chunks with ACL tags; chunks link to their embeddings stored via pgvector (modelled in the sketch after this list).
- Queries & Responses — every query is logged with PII flag, access-granted status, and confidence score; responses store citations and hallucination flag.
- Tickets & Escalations — low-confidence responses auto-create support tickets linked back to the originating query and assigned user.
- Evaluations & Feedback — F1, precision, recall, semantic similarity, and LLM-as-judge scores are stored per response alongside employee star ratings.
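A hedged sketch of the Documents/Chunks relation using SQLAlchemy with the pgvector extension. The table names, columns, and 768-dimension embedding size (BGE-base) are assumptions for illustration, not the actual schema.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Text
from sqlalchemy.orm import declarative_base, relationship
from pgvector.sqlalchemy import Vector

Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    source = Column(String)                      # e.g. "sop_pdf", "confluence"
    chunks = relationship("Chunk", back_populates="document")

class Chunk(Base):
    __tablename__ = "chunks"
    id = Column(Integer, primary_key=True)
    document_id = Column(Integer, ForeignKey("documents.id"), nullable=False)
    content = Column(Text, nullable=False)
    acl_tags = Column(String)                    # role/project tags for RBAC filtering
    embedding = Column(Vector(768))              # assumes BGE-base (768 dims)
    document = relationship("Document", back_populates="chunks")
```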
Data Flow Diagram (DFD)
The DFD shows how data moves between external sources, internal processing stages, and the employee — across both the ingestion pipeline and the live query path. Two flows run in parallel: an offline ingestion flow and a real-time retrieval flow.
- Data Sources — PDFs/SOPs, Jira tickets, Zoom/Teams recordings, and Confluence pages feed into the ingestion pipeline via cron-triggered connectors.
- Ingestion & Chunking — raw content is cleaned, PII-masked, chunked, and metadata-tagged before reaching the embedding stage.
- Embedding & Vector Store — chunks are vectorised (BGE) and indexed into FAISS/pgvector; this is the knowledge base all queries search against.
- Query & Retrieve — employee query is embedded, top-k chunks are retrieved with RBAC filtering and reranked before being passed to the LLM.
- Answer or Escalate — LLM generates a cited response; high-confidence answers go to the employee, low-confidence queries are auto-escalated to Jira.
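The final branch can be sketched as below. The Jira URL, project key, and credentials are placeholders, and the ticket fields are assumptions; the 0.7 threshold matches the design above.

```python
import requests

CONFIDENCE_THRESHOLD = 0.7
JIRA_URL = "https://your-jira.example.com"   # placeholder

def answer_or_escalate(query: str, answer: str, confidence: float, auth: tuple):
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"type": "answer", "body": answer}
    # Low confidence: open a Jira issue carrying the full query context.
    resp = requests.post(
        f"{JIRA_URL}/rest/api/2/issue",
        json={"fields": {
            "project": {"key": "ITSUP"},              # assumed project key
            "summary": f"Escalated query: {query[:80]}",
            "description": f"Query: {query}\nModel confidence: {confidence:.2f}",
            "issuetype": {"name": "Task"},
        }},
        auth=auth,       # e.g. (username, api_token)
        timeout=10,
    )
    resp.raise_for_status()
    return {"type": "escalation", "ticket": resp.json()["key"]}
```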
Sequence Diagram
The sequence traces three runtime paths across six actors — Employee, API Gateway, Agent Orchestrator, Vector DB, LLM Service, and Jira/SME. Every path begins with JWT validation and RBAC enforcement at the gateway before any agent or data layer is touched (see the gateway sketch after this list).
- Happy Path — query is embedded → ACL-filtered top-k retrieved → CrossEncoder reranked → LLM generates with guardrails + citation → confidence ≥ 0.7 → cited answer returned to employee.
- Escalation Path — confidence < 0.7 triggers Jira ticket creation with full query context and priority; employee receives ticket ID and SME assignment instead of a direct answer.
- Async Ingestion Path — cron-triggered background jobs feed the Vector DB continuously; they are fully decoupled from the real-time query path and add no latency to user requests.
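The gateway check that opens every path might look like the following, assuming PyJWT with an HS256 shared secret; the secret, claim names, and role set are illustrative assumptions.

```python
import jwt  # PyJWT

SECRET = "replace-with-gateway-secret"   # placeholder
ALLOWED_ROLES = {"engineer", "support", "admin"}

def authorize(token: str) -> dict:
    try:
        claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError as exc:
        raise PermissionError(f"JWT validation failed: {exc}") from exc
    if claims.get("role") not in ALLOWED_ROLES:
        raise PermissionError(f"RBAC denied for role {claims.get('role')!r}")
    return claims  # downstream agents receive user_id + role from here
```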
State Transition Diagram
The state machine governs the complete lifecycle of a user query across nine distinct states. Guard conditions on every transition enforce security and correctness — no query reaches the LLM without passing auth, RBAC, and retrieval checks first.
- AUTHENTICATING — auth failure or RBAC denial transitions immediately to REJECTED; logged to the immutable audit trail before any data is touched.
- EMBEDDING → RETRIEVING — query is vectorised and top-k chunks retrieved with ACL filtering; if no chunks are found, a fallback prompt is applied rather than hard-failing.
- AGENT_LOOP — iterative ReAct cycle; agent calls tools (doc search, KT lookup, ticket query) multiple times until sufficient context is accumulated before moving to GENERATING.
- GENERATING — LLM runs with guardrails, citation, and hallucination detection; confidence score determines the final branch.
- ANSWERED — confidence ≥ 0.7; cited response delivered to the employee with source references.
- ESCALATING / FAILED — confidence < 0.7 creates a Jira ticket with SME assignment; any unhandled error in any state collapses to FAILED with a full audit log entry.
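The lifecycle can be encoded as an explicit transition table, sketched below. Guard conditions are reduced to set membership here for brevity; the table mirrors the states named above, with REJECTED, ANSWERED, ESCALATING, and FAILED as terminal states.

```python
from enum import Enum, auto

class QueryState(Enum):
    AUTHENTICATING = auto()
    REJECTED = auto()
    EMBEDDING = auto()
    RETRIEVING = auto()
    AGENT_LOOP = auto()
    GENERATING = auto()
    ANSWERED = auto()
    ESCALATING = auto()
    FAILED = auto()

# Legal transitions; anything else collapses to FAILED with an audit entry.
TRANSITIONS = {
    QueryState.AUTHENTICATING: {QueryState.EMBEDDING, QueryState.REJECTED},
    QueryState.EMBEDDING: {QueryState.RETRIEVING},
    QueryState.RETRIEVING: {QueryState.AGENT_LOOP},
    QueryState.AGENT_LOOP: {QueryState.AGENT_LOOP, QueryState.GENERATING},
    QueryState.GENERATING: {QueryState.ANSWERED, QueryState.ESCALATING},
}

def transition(current: QueryState, target: QueryState) -> QueryState:
    if target in TRANSITIONS.get(current, set()):
        return target
    return QueryState.FAILED  # unhandled transition: audit-log and fail safe

state = transition(QueryState.AUTHENTICATING, QueryState.EMBEDDING)  # -> EMBEDDING
```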
Open Source Libraries & Tools
The entire stack is built exclusively on open-source tools — zero proprietary licensing costs, full auditability, and complete vendor independence at every layer.
- LLM-agnostic by design — swap between LLaMA 3, Mistral, Qwen, or any lightweight model (Phi-3, Gemma 2B) with a single config change via Ollama + LiteLLM routing (see the routing sketch after this list).
- Observable by default — Langfuse captures every LLM trace, token usage, and latency; Prometheus + Grafana covers infra metrics; RAGAS + DeepEval scores RAG quality continuously.
- Dual vector store strategy — FAISS for low-latency ANN search and pgvector inside PostgreSQL for relational joins, giving flexibility to choose per query type.
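As referenced above, a minimal sketch of the single-config model swap, assuming a local Ollama daemon and LiteLLM's `ollama/<model>` naming convention; reading the model name from an environment variable is an illustrative choice.

```python
import os
from litellm import completion

# Swapping LLaMA 3 for Mistral/Qwen/Phi-3 is one config value, no code change.
MODEL = os.getenv("AI_TRANSFORMER_MODEL", "ollama/llama3")

response = completion(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarise today's KT session."}],
)
print(response.choices[0].message.content)
```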