Projects

Governed Agent Runtime

Production LLM agent runtime with multi-step tool use, rate limiting, idempotency, audit logging, token metering, and OpenTelemetry tracing.

FastAPI, Python, Redis, Kafka, OpenAI API, Prometheus, Grafana, OpenTelemetry, Docker, GitHub Actions

GitHub
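The idempotency guarantee listed for the Governed Agent Runtime can be illustrated with a minimal sketch. It assumes results are cached by a caller-supplied key; the class name `IdempotentExecutor`, the key `req-123`, and the in-memory dictionary (standing in for a store such as Redis) are hypothetical and not taken from the project.

```python
from typing import Any, Callable, Dict


class IdempotentExecutor:
    """Runs a tool call at most once per idempotency key (hypothetical sketch)."""

    def __init__(self) -> None:
        # In-memory stand-in for a shared store such as Redis (assumption).
        self._results: Dict[str, Any] = {}

    def run(self, key: str, tool: Callable[[], Any]) -> Any:
        # A repeated key returns the stored result instead of re-running the tool.
        if key in self._results:
            return self._results[key]
        result = tool()
        self._results[key] = result
        return result


calls = []


def charge_card() -> dict:
    # Side-effecting tool used only to show that a retry does not repeat it.
    calls.append("charged")
    return {"status": "ok"}


executor = IdempotentExecutor()
first = executor.run("req-123", charge_card)
second = executor.run("req-123", charge_card)  # retry with the same key
```

In this sketch a retried agent step with the same key sees the first result and the side effect happens once.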

Hybrid Retrieval and Reranking RAG System

Production RAG pipeline with BM25 + dense retrieval, cross-encoder reranking, a RAGAS evaluation suite, and Prometheus metrics for faithfulness and latency monitoring.

Python, FastAPI, FAISS, BM25, sentence-transformers, RAGAS, Docker, Prometheus, Grafana

GitHub
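The description above does not say how the BM25 and dense result lists are merged before reranking. One common option is reciprocal rank fusion, sketched here as an assumption and not as the project's method; the document IDs and the constant `k = 60` are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["d1", "d2", "d3"]   # hypothetical lexical results
dense_ranking = ["d3", "d1", "d4"]  # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Documents that appear near the top of both lists rise in the fused order, which would then be passed to the cross-encoder.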

Parameter-Efficient Fine-Tuning Benchmark

Systematic benchmark comparing full fine-tuning, LoRA, and QLoRA on GPU memory, training cost, inference latency, and task performance, using instruction-following datasets.

Python, PyTorch, Hugging Face Transformers, PEFT, BitsAndBytes, Weights & Biases, Docker

GitHub
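To give a sense of why such a benchmark is worth running, the arithmetic below compares trainable parameters for one weight matrix under full fine-tuning and under LoRA. The 4096 x 4096 layer size and rank 8 are illustrative assumptions, not figures from the benchmark.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Full fine-tuning updates every entry of the d_out x d_in weight matrix."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA trains two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Illustrative sizes only (assumption): one square projection layer, rank 8.
d_in, d_out, rank = 4096, 4096, 8
full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
fraction = lora / full
```

For these sizes LoRA trains 65,536 parameters against 16,777,216, or 1/256 of the layer.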

Production ML Lifecycle Platform

End-to-end MLOps platform with experiment tracking, registry-based model promotion, canary serving with traffic splitting, statistical drift detection, and automated retraining triggers.

Python, MLflow, XGBoost, FastAPI, Docker, Kubernetes, Prometheus, Grafana, PostgreSQL, Redis

GitHub
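Canary serving with traffic splitting can be sketched as deterministic hash-based routing. This is a minimal illustration under assumptions: the function name `route`, the request IDs, and the 10% split are hypothetical, and the platform may well split traffic at a different layer.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Map a request ID to 'canary' or 'stable' using a stable hash bucket."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


# The same ID always lands on the same variant, so a user sees one model.
decisions = [route(f"user-{i}", 10) for i in range(10_000)]
canary_share = decisions.count("canary") / len(decisions)
```

Hashing the ID instead of drawing a random number keeps routing sticky across retries and replicas.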

Real-Time Feature Store

Point-in-time consistent feature store with online/offline parity, sub-10 ms serving from Redis, Kafka-based feature ingestion, and training-serving skew detection.

Python, Kafka, Redis, FastAPI, Parquet, Apache Arrow, Prometheus, Grafana, Docker

GitHub
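Point-in-time consistency means a training row may only use feature values that were known at the row's event time. A minimal sketch of that lookup follows; the function name, the integer timestamps, and the sample history are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def point_in_time_value(
    history: List[Tuple[int, float]], event_ts: int
) -> Optional[float]:
    """Return the latest feature value with timestamp <= event_ts.

    `history` must be sorted by timestamp. Returns None when no value
    existed yet, which avoids leaking future data into training rows.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_ts)
    return history[idx - 1][1] if idx > 0 else None


# Hypothetical feature history: (timestamp, value) pairs in ascending order.
history = [(100, 1.0), (200, 2.0), (300, 3.0)]
value_at_250 = point_in_time_value(history, 250)
```

Here the lookup at time 250 sees the value written at 200 and ignores the later write at 300.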

Two-Stage Recommender System

Two-tower retrieval model with FAISS approximate nearest neighbor search, LambdaMART reranking optimized on NDCG, and a feature pipeline that serves at sub-50 ms p99 latency.

Python, PyTorch, FAISS, LightGBM, FastAPI, Redis, Docker, Prometheus, Grafana

GitHub
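NDCG, the metric the reranker is optimized on, can be computed as below. This sketch assumes the exponential gain form (2^rel - 1); a linear gain is also common, and the description above does not say which variant the project uses.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top-k items, exponential gain."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg(relevances: Sequence[int], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance labels in the order the model ranked them.
perfect = ndcg([3, 2, 1], k=3)
reversed_order = ndcg([1, 2, 3], k=3)
```

A perfectly ordered list scores 1.0, and any misordering of graded labels lowers the score.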

LLM Gateway

Centralized LLM gateway with per-tenant policy enforcement, semantic caching, prompt injection detection, cost tracking per model and team, and unified observability across OpenAI and Anthropic.

Python, FastAPI, Redis, OpenAI, Anthropic, PostgreSQL, Prometheus, Grafana, Docker

GitHub
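Semantic caching returns a stored response when a new prompt is close enough in embedding space to an earlier one. The sketch below assumes embeddings are already computed and uses a cosine threshold of 0.95; the class, the threshold, and the toy two-dimensional vectors are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Linear-scan cache keyed by prompt embedding (hypothetical sketch)."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: Sequence[float], response: str) -> None:
        self._entries.append((list(embedding), response))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        # Find the most similar cached prompt, then apply the threshold.
        best_score, best_response = 0.0, None
        for cached_embedding, response in self._entries:
            score = cosine(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "cached answer")
hit = cache.get([0.99, 0.05])   # near-duplicate prompt embedding
miss = cache.get([0.0, 1.0])    # unrelated prompt embedding
```

A real gateway would likely replace the linear scan with a vector index, but the hit-or-miss decision has the same shape.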

More projects available on GitHub