Nasscom Hackathon · Stage 2 Submission · Team: AI Transformer

Enterprise Knowledge Copilot

This document presents the key technical artifacts and design considerations for our proposed RAG + Agentic Workflow solution. It includes architectural diagrams, data models, data flows, open-source technologies, and other project artifacts that demonstrate the design, functionality, and feasibility of the solution.

RAG Pipeline · Multi-Agent (ReAct) · PII Masking · RBAC
00

Introduction & Solution Approach

The problem we solve, why it matters at enterprise scale, and the six design principles that make our approach distinct.

Large IT enterprises often struggle with fragmented knowledge spread across PDFs, SOPs, support systems, and internal discussions — resulting in duplicate tickets, slow onboarding, repeated troubleshooting efforts, and unnecessary escalations. These inefficiencies reduce productivity, increase operational costs, and delay project delivery. AI Transformer addresses these challenges by centralising enterprise knowledge, automating repetitive IT-related queries, and streamlining onboarding and KT processes — enabling employees to focus on high-value work through secure, AI-driven automation.

Our platform covers the complete employee and support workflow, from onboarding to ticket escalation, by integrating product documents, SOPs, policies, and KT session insights into a centralised knowledge ecosystem. With role-based access control, PII masking, and built-in guardrails, the solution ensures secure and relevant information access.

Unlike traditional RAG-based systems, our unique approach captures and summarises real-time KT discussions and major issue resolutions — continuously enriching the knowledge base with practical insights that improve ticket automation, reduce duplication, and enhance enterprise-wide efficiency.

Our Approach & USP — Six Design Pillars
📈
Market Scale & Business Impact
India's enterprise IT services market exceeds $220B, yet knowledge silos silently erode 20–30% of productive hours per employee. Reducing mean ticket resolution time by just 15% translates to millions in annual savings per organisation — making this a high-ROI, immediately deployable platform with measurable impact from day one.
🔒
Privacy-First & Compliance-Ready
Automated PII masking via Microsoft Presidio, role-scoped retrieval, and immutable audit logs ensure full alignment with India's DPDP Act and healthcare HIPAA standards — no retrofitting needed. Privacy is a first-class architectural concern, not an afterthought.
🧩
Open-Source Stack & Horizontal Scalability
Zero proprietary licensing costs. FAISS, LangChain, sentence-transformers, and LLaMA give teams full auditability and freedom from vendor lock-in. Every service is containerised and independently scalable via Kubernetes — handling high data volumes and growing enterprise workloads without re-architecture.
🔄
Modular & LLM-Agnostic by Design
The LLM layer is fully interchangeable — swap Claude, LLaMA 3, Mistral, or any future model with a single config update. This modularity future-proofs the investment and lets enterprises choose models based on cost, performance, or data-residency requirements as the AI landscape evolves.
⏱️
Passive Data Collection via Cron Jobs
Background schedulers silently harvest Jira tickets, Confluence pages, and Zoom / Teams transcripts on configurable intervals — no manual uploads, always-fresh knowledge. This passive ingestion model keeps the knowledge base current with zero disruption to existing employee workflows.
Real-Time vs. Offline Processing
Urgent employee queries are handled by a live ReAct agent loop for near-instant responses; bulk ingestion, embedding updates, and KT summarisation run via async offline pipelines. This dual-mode architecture intelligently balances latency and throughput across both interactive and batch workloads.
01

Solution Architecture

Our solution uses a six-layer architecture to deliver secure, scalable, and intelligent knowledge access. Each layer handles a distinct responsibility, ensuring modularity and reliability:


① User & Access Layer
Chat / Web UI (React) · Slack / MS Teams Bot · Email Interface · REST / WebSocket (FastAPI) · Auth / SSO + JWT · RBAC Enforcement
② Agentic Orchestration Layer — ReAct / Plan–Execute
Orchestrator Agent · Document Search Agent · Escalation Agent · KT Suggestion Agent · Monitoring Agent · Personal Agent · Agent Registry · LangGraph / LangChain
③ LLM & Reasoning Layer
LLaMA 3 / Mistral / Qwen · Ollama (local serving) · Input Guardrails — PII / Injection · Output Guardrails + Citation · Hallucination Flag · Confidence Scoring · Grounding & Source Cite
④ Knowledge & Retrieval Layer
Retriever — Top-k + Filters · CrossEncoder Reranker · Role / Project / ACL Filter · Summarizer Tool · Conversation Memory · FAISS / pgvector / ChromaDB · SME & POC Lookup
⑤ Ingestion & Document Pipeline — Cron / Event-Driven
Loaders — PDF / PPT / Wiki / CSV / Audio · Whisper ASR (KT transcription) · PyMuPDF / pdfplumber · LangChain Chunker · Embedder — BGE / sentence-transformers · PII Redactor — Presidio · Metadata Tagger (Role / Project / ACL) · Jira REST Sync · Confluence Crawler · spaCy NER
⑥ Security, Observability & Compliance Plane
Presidio PII Masking · RBAC — Role-Tagged Metadata · Audit Logs — PostgreSQL · DPDP Act / HIPAA Compliance · Eval — F1 / Precision / Recall · LLM-as-Judge (DeepEval) · Drift / Threshold Monitoring · Prometheus + Grafana
⚡ Real-Time Path
Employee query → API Gateway → ReAct Agent Loop → Vector Retrieval (RBAC filtered) → LLM Generation → Confidence check → Answer + Citations or Jira Escalation
⏱️ Offline / Cron Path
Scheduled cron jobs → Pull from Jira / Confluence / Zoom → Transcribe (Whisper) → Clean → PII mask → Chunk → Embed → Index into Vector Store → Always-fresh knowledge base
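The offline path can be sketched end-to-end in a few lines. Every step below is a deliberately simplified stand-in (plain byte decoding instead of Whisper, a regex instead of Presidio, a plain list instead of the vector store); only the chunk size and overlap defaults come from the LLD:

```python
import re

def transcribe(audio: bytes) -> str:
    # Stand-in for Whisper ASR: the real path decodes speech, this decodes bytes.
    return audio.decode("utf-8", errors="ignore")

def mask_pii(text: str) -> str:
    # Stand-in for Presidio: redact anything shaped like an e-mail address.
    return re.sub(r"[\w.+-]+@[\w-]+\.\w+", "<EMAIL>", text)

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # LLD defaults: 500-char chunks with 50-char overlap.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(audio: bytes, index: list[str]) -> int:
    """Cron path: transcribe, clean, PII-mask, chunk, index."""
    text = transcribe(audio).strip()
    for c in chunk(mask_pii(text)):
        index.append(c)  # real system: embed with BGE, upsert into FAISS/pgvector
    return len(index)
```

Masking runs before chunking so that a PII span can never straddle a chunk boundary and leak partially redacted.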
Solution Components
🗃️ Enterprise Data Sources
Jira / ServiceNow — IT tickets & issues
Confluence / Wiki — KB & policies
Zoom / MS Teams — KT recordings
SOP / PDF / PPT — internal documents
CSV / Logs — structured data
⚙️ Ingestion & Processing
PyMuPDF / pdfplumber — PDF extraction
Whisper ASR — KT transcription
Microsoft Presidio — PII masking
spaCy NER — entity tagging
Cron / APScheduler — passive scheduling
🔍 Vector & Retrieval
FAISS — primary vector store
pgvector (PostgreSQL) — relational vectors
ChromaDB — alt vector store
BGE / sentence-transformers — embeddings
CrossEncoder (HF) — reranking
🤖 LLM & Agent Stack
LLaMA 3 / Mistral / Qwen — open-source LLMs
Ollama — local LLM serving
LangChain / LangGraph — agent framework
Guardrails AI — output safety
FastAPI — REST + WebSocket API
🗄️ Database Layer
PostgreSQL + pgvector — 11 relational tables
MongoDB — 6 document collections
Redis — query cache
Audit Logs (PG) — compliance trail
Conv. Memory (Mongo) — session context
📊 Eval, Monitoring & Deploy
RAGAS — RAG eval framework
DeepEval / LLM-as-Judge — answer quality
Prometheus + Grafana — drift & monitoring
Docker / Kubernetes — containerised deploy
Streamlit / React — user interface
02

Low-Level Design (LLD)

Six interdependent modules cover the complete runtime pipeline — from passive data collection through to secure response delivery. Each module owns a clearly bounded responsibility and exposes well-defined interfaces, making the system independently scalable and LLM-agnostic by design.

  • IngestionScheduler — drives all offline data collection via configurable cron jobs across Jira, Confluence, and Zoom/Teams, with built-in failure retries and sync-event logging.
  • DocumentPipeline — handles parsing, cleaning, chunking, PII detection and redaction, and role/project metadata tagging before any chunk reaches the vector store.
  • EmbeddingService — manages dual-store indexing (FAISS for low-latency ANN search, pgvector for relational joins) and applies RBAC filtering before returning CrossEncoder-reranked top-k results.
  • AgentOrchestrator — runs the stateful ReAct loop, dispatches to specialised sub-agents (document, escalation, KT suggestion), and makes the escalate-vs-answer decision against a configurable confidence threshold (default 0.7).
  • LLMService — wraps the open-source LLM with input guardrails (prompt-injection + PII), generation, output guardrails, source citation, and hallucination flagging.
  • SecurityLayer — enforces JWT auth, RBAC, prompt-injection detection, PII redaction, and immutable audit logging — cutting orthogonally across every module at each call boundary, ensuring DPDP Act and HIPAA compliance.
IngestionScheduler
schedule_sync(source, cron_expr) → job_id
fetch_documents(source_type, since_ts) → [raw_docs]
transcribe_audio(file_path) → text
trigger_pipeline(doc) → status
log_sync_event(source, ts, doc_count)
handle_failure(job_id, err) → retry_after
DocumentPipeline
load_file(path, file_type) → raw_text
clean_normalize(text) → cleaned
chunk(text, size=500, overlap=50) → [chunks]
detect_pii_entities(text) → [entities]
mask_pii(chunk) → safe_chunk
tag_metadata(chunk, doc_id, role, project, acl)
EmbeddingService
embed(chunks, model='bge-base') → [vectors]
store_faiss(vectors, metadata) → index_id
store_pgvector(vectors, metadata) → chunk_id
retrieve_topk(query_vec, k=5, acl) → [chunks]
rerank(chunks, query) → ranked_chunks
filter_rbac(chunks, user_role, project_id)
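A minimal sketch of `filter_rbac` from the EmbeddingService interface. The metadata shape is an illustrative assumption: each chunk is taken to carry the ingestion tagger's role/project ACL.

```python
def filter_rbac(chunks: list[dict], user_role: str, project_id: str) -> list[dict]:
    """Drop retrieved chunks the caller's role/project is not cleared for.

    Assumed chunk shape (hypothetical, set by the metadata tagger):
    {"text": ..., "acl": {"roles": [...], "projects": [...]}}
    """
    def allowed(chunk: dict) -> bool:
        acl = chunk.get("acl", {})
        return (user_role in acl.get("roles", [])
                and project_id in acl.get("projects", []))

    return [c for c in chunks if allowed(c)]
```

Filtering happens on retrieved candidates before reranking, so an out-of-scope chunk can never reach the LLM prompt.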
AgentOrchestrator
route_query(query, user_ctx) → agent_type
run_react_loop(query, tools, session_state)
call_tool(tool_name, args) → tool_output
summarize_kt(transcript) → kt_note
escalate_ticket(query, ctx) → ticket_id
check_confidence(score, thresh=0.7) → bool
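The orchestrator's escalate-vs-answer branch can be sketched as below. The 0.7 default comes from the LLD; the `create_ticket` callable is a hypothetical stand-in for the Jira REST call:

```python
THRESHOLD = 0.7  # configurable; LLD default

def check_confidence(score: float, thresh: float = THRESHOLD) -> bool:
    return score >= thresh

def answer_or_escalate(answer: str, score: float, create_ticket) -> dict:
    """Route high-confidence answers to the user, else open a HIGH-priority ticket."""
    if check_confidence(score):
        return {"type": "answer", "body": answer, "confidence": score}
    ticket_id = create_ticket(answer, priority="HIGH")  # real system: Jira REST sync
    return {"type": "escalation", "ticket_id": ticket_id}
```

Injecting the ticket creator as a callable keeps the decision logic testable without a live Jira instance.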
LLMService
apply_input_guardrails(query) → safe_query
build_prompt(template, ctx_chunks, history)
generate(prompt, model='llama3') → raw_answer
apply_output_guardrails(output) → safe
cite_sources(answer, chunks) → cited_answer
flag_hallucination(answer, chunks) → bool
score_confidence(output, ctx) → float[0–1]
SecurityLayer
authenticate(jwt_token) → user_claims
check_rbac(user_role, resource_id) → bool
detect_prompt_injection(query) → bool
redact_pii_response(text) → redacted
check_access_policy(user, doc_id) → bool
log_audit(user_id, action, resource, ip)
enforce_compliance(resp, standard) → bool
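As one illustration of the input guardrail, `detect_prompt_injection` can start as a phrase-list heuristic. The patterns below are illustrative only; a production deployment would layer a trained classifier on top:

```python
# Illustrative patterns; not an exhaustive or production-grade list.
INJECTION_PATTERNS = (
    "ignore previous instructions",
    "disregard the system prompt",
    "you are now",
    "reveal your prompt",
)

def detect_prompt_injection(query: str) -> bool:
    """Flag queries containing known jailbreak phrasings (case-insensitive)."""
    q = query.lower()
    return any(p in q for p in INJECTION_PATTERNS)
```

Flagged queries never reach the agent loop; the gateway rejects them and writes an audit-log entry instead.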
IngestionScheduler → DocumentPipeline → EmbeddingService  |  AgentOrchestrator → LLMService  |  SecurityLayer ⊥ all modules
03

Data Sources & Engineering Steps

Seven data sources feed the knowledge base — a mix of public datasets, synthetic enterprise documents, and evaluation sets. Each source goes through a tailored engineering pipeline before reaching the vector store.

  • Public Tech Docs — Kafka, Kubernetes, Docker, FastAPI docs downloaded as PDFs/HTML → PyMuPDF parse → chunk → embed → FAISS.
  • Internal SOP PDFs — 20–30 LLM-generated mock SOPs → PDF export → parse → PII mask → RBAC tag → chunk → embed.
  • IT Support Tickets — 200–300 synthetic CSV tickets → pandas load → deduplicate → extract fields → vectorise → index.
  • KT Recordings — Zoom/Teams audio → Whisper transcribe → LLM summarise → chunk → embed → stored with session ID.
  • Confluence / Wiki — API crawl → HTML strip → chunk → spaCy NER tagging → embed.
  • SQuAD Dataset — IT-relevant Q&A pairs filtered → used as ground-truth for precision/recall evaluation only; not indexed.
  • StackOverflow (Kaggle) — top-voted Q&A → HTML tag strip → chunk → embed → index.
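The deduplication step for the synthetic tickets can be sketched with the stdlib alone (the real pipeline uses pandas; the `summary` field name is an illustrative assumption):

```python
import csv
import io

def dedupe_tickets(csv_text: str, key: str = "summary") -> list[dict]:
    """Keep the first ticket per normalised summary; later duplicates drop."""
    seen: set[str] = set()
    unique: list[dict] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        k = row[key].strip().lower()  # normalise before comparing
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique
```

Normalising case and whitespace before comparison catches the near-duplicates that exact string matching misses.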
Data Source Diagram
04

Data Model — Entity Relationship Diagram

The data model spans two databases — PostgreSQL for structured relational data and MongoDB for unstructured document content. Seventeen entities cover the full lifecycle from user authentication through document ingestion, query handling, and evaluation.

  • Users & Sessions — stores user identity, role, department, and active session context with full audit columns on all rows.
  • Documents & Chunks — each document is broken into chunks with ACL tags; chunks link to their embeddings stored via pgvector.
  • Queries & Responses — every query is logged with PII flag, access-granted status, and confidence score; responses store citations and hallucination flag.
  • Tickets & Escalations — low-confidence responses auto-create support tickets linked back to the originating query and assigned user.
  • Evaluations & Feedback — F1, precision, recall, semantic similarity, and LLM-as-judge scores are stored per response alongside employee star ratings.
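The Queries & Responses entities above map naturally onto typed records. A hedged sketch follows; the field names are assumptions for illustration, not the final column names:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class QueryLog:
    """One row per user query (PostgreSQL side); names are illustrative."""
    user_id: str
    text: str
    pii_flag: bool
    access_granted: bool
    confidence: float
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class ResponseLog:
    """One row per generated response, linked back to its query."""
    query_id: str
    answer: str
    citations: list[str]
    hallucination_flag: bool
```

Keeping the confidence score and hallucination flag on every row is what lets the evaluation layer compute drift without re-running queries.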
SQL - Relational Diagram


NoSQL - Document Model Diagram
05

Data Flow Diagram (DFD)

The DFD shows how data moves between external sources, internal processing stages, and the employee — across both the ingestion pipeline and the live query path. Two flows run in parallel: an offline ingestion flow and a real-time retrieval flow.

  • Data Sources — PDFs/SOPs, Jira tickets, Zoom/Teams recordings, and Confluence pages feed into the ingestion pipeline via cron-triggered connectors.
  • Ingestion & Chunking — raw content is cleaned, PII-masked, chunked, and metadata-tagged before reaching the embedding stage.
  • Embedding & Vector Store — chunks are vectorised (BGE) and indexed into FAISS/pgvector; this is the knowledge base all queries search against.
  • Query & Retrieve — employee query is embedded, top-k chunks are retrieved with RBAC filtering and reranked before being passed to the LLM.
  • Answer or Escalate — LLM generates a cited response; high-confidence answers go to the employee, low-confidence queries are auto-escalated to Jira.
Data Flow Diagram
06

Sequence Diagram

The sequence traces three runtime paths across six actors — Employee, API Gateway, Agent Orchestrator, Vector DB, LLM Service, and Jira/SME. Every path begins with JWT validation and RBAC enforcement at the gateway before any agent or data layer is touched.

  • Happy Path — query is embedded → ACL-filtered top-k retrieved → CrossEncoder reranked → LLM generates with guardrails + citation → confidence ≥ 0.7 → cited answer returned to employee.
  • Escalation Path — confidence < 0.7 triggers Jira ticket creation with full query context and priority; employee receives ticket ID and SME assignment instead of a direct answer.
  • Async Ingestion Path — cron-triggered background jobs feed the Vector DB continuously; fully decoupled from the real-time query path and never adds latency to user requests.
Actors: Employee · API Gateway · Agent Orch. · Vector DB · LLM Service · Jira / SME

Employee → API Gateway: POST /ask (query + JWT)
API Gateway: auth · RBAC · JWT validation
API Gateway → Agent Orch.: route_query(user, query, role)
Agent Orch. → Vector DB: embed + search(k=5, acl_filter)
Vector DB → Agent Orch.: top-k chunks [RBAC filtered]
Agent Orch.: rerank + ACL filter
Agent Orch. → LLM Service: generate(prompt, ctx_chunks)
LLM Service: guard → LLM → guard
LLM Service → Agent Orch.: answer + citations + conf_score
Agent Orch.: confidence ≥ 0.7?

Success path (conf ≥ 0.7): Agent Orch. → API Gateway: answer + citations → Employee: response + sources
Escalation path (conf < 0.7): Agent Orch. → Jira / SME: create_ticket(query, ctx, priority=HIGH) → ticket_id + sme_assigned → Employee: escalation_response(ticket_id) — "Escalated → Ticket #xxx"
07

State Transition Diagram

The state machine governs the complete lifecycle of a user query across eight primary states, plus the REJECTED and FAILED error terminals. Guard conditions on every transition enforce security and correctness — no query reaches the LLM without passing auth, RBAC, and retrieval checks first.

  • AUTHENTICATING — auth failure or RBAC denial transitions immediately to REJECTED; logged to the immutable audit trail before any data is touched.
  • EMBEDDING → RETRIEVING — query is vectorised and top-k chunks retrieved with ACL filtering; if no chunks are found, a fallback prompt is applied rather than hard-failing.
  • AGENT_LOOP — iterative ReAct cycle; agent calls tools (doc search, KT lookup, ticket query) multiple times until sufficient context is accumulated before moving to GENERATING.
  • GENERATING — LLM runs with guardrails, citation, and hallucination detection; confidence score determines the final branch.
  • ANSWERED — confidence ≥ 0.7; cited response delivered to the employee with source references.
  • ESCALATING → FAILED — confidence < 0.7 creates a Jira ticket with SME assignment; any unhandled error at any state collapses to FAILED with a full audit log entry.
start → RECEIVED — log query · validate format
RECEIVED —submit→ AUTHENTICATING — JWT · RBAC · access policy
AUTHENTICATING —auth fail→ REJECTED — audit log · terminate
AUTHENTICATING —auth OK · role set→ EMBEDDING — query → vector (BGE)
EMBEDDING —embedded→ RETRIEVING — FAISS top-k · ACL filter · rerank (no chunks → fallback prompt)
RETRIEVING —chunks found→ AGENT_LOOP — ReAct · tool calls · state update (iterate)
AGENT_LOOP —ReAct done→ GENERATING — LLM · guardrails · cite · score
GENERATING —conf ≥ 0.7→ ANSWERED — response + citations (terminal)
GENERATING —conf < 0.7→ ESCALATING — Jira ticket · SME notify
any state —error→ FAILED — full audit log entry (terminal)
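The states and guarded transitions above can be sketched as a transition table. Guards are reduced to event labels here; the real guards run the auth, retrieval, and confidence checks:

```python
from enum import Enum, auto

class S(Enum):
    RECEIVED = auto()
    AUTHENTICATING = auto()
    REJECTED = auto()
    EMBEDDING = auto()
    RETRIEVING = auto()
    AGENT_LOOP = auto()
    GENERATING = auto()
    ANSWERED = auto()
    ESCALATING = auto()
    FAILED = auto()

# (state, guard event) -> next state
TRANSITIONS = {
    (S.RECEIVED, "submit"): S.AUTHENTICATING,
    (S.AUTHENTICATING, "auth_ok"): S.EMBEDDING,
    (S.AUTHENTICATING, "auth_fail"): S.REJECTED,
    (S.EMBEDDING, "embedded"): S.RETRIEVING,
    (S.RETRIEVING, "chunks_found"): S.AGENT_LOOP,
    (S.AGENT_LOOP, "react_done"): S.GENERATING,
    (S.GENERATING, "conf_high"): S.ANSWERED,
    (S.GENERATING, "conf_low"): S.ESCALATING,
}

def step(state: S, event: str) -> S:
    # Any unmodelled (state, event) pair collapses to FAILED, mirroring
    # the diagram's catch-all error transition (audit logging omitted here).
    return TRANSITIONS.get((state, event), S.FAILED)
```

Encoding the machine as a lookup table keeps the guard logic separate from the transition topology, so new states can be added without touching `step`.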
08

Open Source Libraries & Tools

The entire stack is built exclusively on open-source tools — zero proprietary licensing costs, full auditability, and complete vendor independence at every layer.

  • LLM-agnostic by design — swap between LLaMA 3, Mistral, Qwen, or any lightweight model (Phi-3, Gemma 2B) with a single config change via Ollama + LiteLLM routing.
  • Observable by default — Langfuse captures every LLM trace, token usage, and latency; Prometheus + Grafana covers infra metrics; RAGAS + DeepEval scores RAG quality continuously.
  • Dual vector store strategy — FAISS for low-latency ANN search and pgvector inside PostgreSQL for relational joins, giving flexibility to choose per query type.
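The single-config model swap can be sketched as below. The `litellm.completion` call shown in the comment is the assumed real wiring via Ollama routing; the body is an offline stand-in so the sketch stays self-contained:

```python
MODEL = "ollama/llama3"  # the only line to change when swapping models

def generate(prompt: str, model: str = MODEL) -> str:
    # Assumed real call:
    #   litellm.completion(model=model,
    #                      messages=[{"role": "user", "content": prompt}])
    # Offline stand-in: echo which model would serve the prompt.
    return f"{model} → <answer to: {prompt}>"
```

Because every call site reads the one `MODEL` constant (or its config-file equivalent), switching to Mistral, Qwen, or a lightweight edge model is a one-line change with no code-path edits.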
🤖 LLM & Agent Layer
LLaMA 3 / Mistral / Qwen — primary generation
Phi-3 / Gemma 2B — lightweight / edge LLM
Ollama — local LLM serving
LiteLLM — LLM routing & switching
LangChain / LangGraph — agent framework
Guardrails AI — output safety
🔍 Embeddings & Vector Store
BGE / sentence-transformers — embedding model
FAISS — primary ANN vector store
pgvector (PostgreSQL) — relational vector store
ChromaDB — alt vector store
CrossEncoder (HF) — reranking
📥 Data Ingestion
PyMuPDF / pdfplumber — PDF extraction
python-docx — Word documents
Whisper — KT audio transcription
Jira REST SDK — ticket sync
APScheduler — cron ingestion jobs
✂️ Chunking & Processing
LangChain TextSplitter — recursive chunking
spaCy — NLP / NER tagging
spaCy Anonymizer — PII detection & redaction
tiktoken — token counting
pandas — CSV / ticket data
⚙️ Backend & Infrastructure
FastAPI — REST + WebSocket API
PostgreSQL — structured data + audit log
MongoDB — document & chunk store
Redis — query cache
Docker / Kubernetes — containerised deploy
📊 Eval, Observability & UI
Langfuse — LLM tracing & observability
RAGAS — RAG evaluation framework
DeepEval — LLM-as-judge scoring
Prometheus + Grafana — infra monitoring
Streamlit / React — user interface