
Point-in-time feature store with online/offline consistency and training-serving skew prevention.
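The core of online/offline consistency is the point-in-time join: for each training label, use only feature values recorded at or before the label's timestamp, never future ones. A minimal sketch with `pandas.merge_asof` (the data and column names are illustrative, not from the project):

```python
import pandas as pd

# Hypothetical label events and feature snapshots. A point-in-time join
# picks, for each label, the latest feature value written at or before
# the event timestamp -- preventing leakage and training-serving skew.
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_ts": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-01-10"]),
    "label": [0, 1, 0],
})
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "feature_ts": pd.to_datetime(["2024-01-01", "2024-01-15", "2024-01-03"]),
    "purchases_30d": [2, 5, 1],
})

def point_in_time_join(labels, features):
    # merge_asof needs sorted time keys; direction="backward" enforces
    # feature_ts <= event_ts for every matched row.
    return pd.merge_asof(
        labels.sort_values("event_ts"),
        features.sort_values("feature_ts"),
        left_on="event_ts",
        right_on="feature_ts",
        by="user_id",
        direction="backward",
    )

training_set = point_in_time_join(labels, features)
```

Serving the same feature definitions from an online store keyed by the latest `feature_ts` then gives training and inference identical inputs.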
Two-Tower retrieval, FAISS candidate generation, and LambdaMART reranking for low-latency recommendations.
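In this pattern, the two towers emit user and item embeddings, and candidate generation reduces to maximum inner-product search over the item corpus. A brute-force numpy stand-in for the index (FAISS's `IndexFlatIP` is the exact-search equivalent; embedding sizes here are illustrative):

```python
import numpy as np

# Stand-in embeddings: in the real system these come from the trained
# user and item towers. Shapes and values are hypothetical.
rng = np.random.default_rng(0)
item_embs = rng.normal(size=(1000, 64)).astype("float32")  # item tower output
user_emb = rng.normal(size=(64,)).astype("float32")        # user tower output

def top_k_candidates(user_emb, item_embs, k=10):
    scores = item_embs @ user_emb            # inner-product relevance scores
    top = np.argpartition(-scores, k)[:k]    # unordered top-k, O(n)
    return top[np.argsort(-scores[top])]     # order the k winners by score

candidates = top_k_candidates(user_emb, item_embs)
```

The returned candidate ids would then be scored by the LambdaMART reranker; swapping the brute-force scan for a FAISS index changes latency, not the interface.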
BM25 + dense retrieval with cross-encoder reranking and evaluation pipelines for grounded answers.
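Merging the sparse and dense result lists before cross-encoder reranking is commonly done with reciprocal rank fusion (RRF), which needs only ranks, not comparable scores. A sketch with hypothetical document ids:

```python
# Reciprocal rank fusion: each retriever contributes 1 / (k + rank) per
# document; documents ranked well by both lists rise to the top.
def rrf(rankings, k=60):
    # rankings: list of doc-id lists, each ordered best-first.
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]    # hypothetical sparse results
dense_hits = ["d1", "d9", "d3"]   # hypothetical dense results
fused = rrf([bm25_hits, dense_hits])
```

The fused list feeds the cross-encoder, which rescores only this short candidate set, keeping the expensive model off the full corpus.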
Experiment tracking, registry-based promotion, canary serving, drift detection, and automated retraining.
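One standard drift-detection statistic in such a pipeline is the Population Stability Index (PSI): bin a reference feature distribution, then compare live traffic against those bins. A numpy sketch (thresholds and distributions are illustrative, not from the project):

```python
import numpy as np

def psi(reference, live, bins=10):
    # Quantile bins from the reference window; open-ended outer edges
    # catch live values outside the reference range.
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    ref_pct = np.histogram(reference, edges)[0] / len(reference)
    live_pct = np.histogram(live, edges)[0] / len(live)
    eps = 1e-6  # avoid log(0) for empty bins
    return float(np.sum((ref_pct - live_pct)
                        * np.log((ref_pct + eps) / (live_pct + eps))))

rng = np.random.default_rng(1)
ref = rng.normal(0.0, 1.0, 10_000)       # training-time distribution
same = rng.normal(0.0, 1.0, 10_000)      # live traffic, no drift
shifted = rng.normal(0.5, 1.0, 10_000)   # live traffic, mean shift
```

A common rule of thumb flags PSI above roughly 0.2 as actionable drift, which can then trigger the automated retraining path.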
Compares full fine-tuning (Full FT), LoRA, and QLoRA across training cost, GPU memory, latency, and task performance.
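The memory gap in such a comparison follows directly from parameter counts: LoRA replaces the update to a d_out x d_in weight with two low-rank factors B (d_out x r) and A (r x d_in). A back-of-envelope check with illustrative dimensions (a 4096x4096 projection at rank 8, not figures from the project):

```python
# Trainable parameters: full fine-tuning updates the whole matrix,
# LoRA trains only the two low-rank factors B and A.
d_out, d_in, r = 4096, 4096, 8   # hypothetical layer shape and LoRA rank

full_ft_params = d_out * d_in            # 16.7M trainable params
lora_params = d_out * r + r * d_in       # 65K trainable params

savings = full_ft_params / lora_params   # 256x fewer trainable params
```

QLoRA keeps the same adapter math but stores the frozen base weights in 4-bit, shrinking memory further at some latency cost from dequantization.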
LLM agent runtime with rate limiting, idempotency, audit logging, token metering, and streaming.
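The rate-limiting piece of such a runtime is often a token bucket per caller: requests spend tokens, tokens refill at a fixed rate, bursts drain the bucket. A minimal sketch (capacity and refill rate are illustrative):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allows bursts up to `capacity`,
    then throttles to `refill_per_sec` sustained requests."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost=1.0):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# With refill disabled, a burst of 3 passes and the rest are rejected.
bucket = TokenBucket(capacity=3, refill_per_sec=0.0)
results = [bucket.allow() for _ in range(5)]
```

The same per-caller keying extends naturally to token metering (charge `cost` equal to the LLM tokens consumed) and pairs with idempotency keys so retried requests are not double-billed.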
More projects are available on GitHub.