Question 1

Should we use OpenAI / Anthropic / open-source models?

Accepted Answer

Honest answer first: for most use cases, Anthropic Claude or OpenAI GPT-4-class models reach production fastest and are cheapest at low-to-moderate volume. Self-hosted models (Llama 3 / Mistral) win on cost above roughly 5M tokens/day, or when data residency forces the choice. We design for portability: same prompts, swappable backend.
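One way to picture "same prompts, swappable backend" is a thin interface that every model provider is wrapped behind. The sketch below is illustrative only: `ChatBackend`, `EchoBackend`, `SUMMARY_PROMPT` and `summarise` are hypothetical names, and the stand-in backend just echoes its input where a real one would call a vendor SDK or a self-hosted inference server.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatBackend(Protocol):
    """Anything that can turn a prompt into text (hypothetical interface)."""

    def complete(self, system: str, user: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in backend for local tests; it only echoes the user text."""

    name: str

    def complete(self, system: str, user: str) -> str:
        return f"[{self.name}] {user}"


# The prompt lives in one place and does not mention any vendor.
SUMMARY_PROMPT = "Summarise the user's text in one sentence."


def summarise(backend: ChatBackend, text: str) -> str:
    # Only the backend object changes when the provider changes.
    return backend.complete(system=SUMMARY_PROMPT, user=text)


hosted = EchoBackend("hosted-api")
self_hosted = EchoBackend("self-hosted")
print(summarise(hosted, "hello"))
print(summarise(self_hosted, "hello"))
```

Because application code depends only on the interface, moving from a hosted API to a self-hosted model becomes a change at the point where the backend is constructed.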

Question 2

Do you do RAG?

Accepted Answer

Yes. We've shipped RAG (retrieval-augmented generation) systems for customer support, internal search, sales enablement, and legal document retrieval. We're opinionated about hybrid retrieval (BM25 + vector search + reranking) and about aggressive evaluation before production.
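To make the hybrid step concrete, here is a minimal sketch of merging a BM25 result list with a vector-search result list. It uses reciprocal rank fusion, which is one common fusion method and an assumption here; the answer above does not say which method is used. The document ids are made up, and the reranking stage that would follow is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more score the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by id so ties are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword results
vector_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that appear in both lists rise to the top, which is the point of combining keyword and embedding retrieval before a reranker looks at the shortlist.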

Question 3

How do you handle data residency for Indian clients?

Accepted Answer

For India-only data, we keep everything (vector DB, model inference, logs) in ap-south-1, the AWS Mumbai region. For LLM calls, Anthropic and OpenAI offer EU/US-only routing, so for strict data residency we self-host on Indian GPU infrastructure. We design this into the architecture rather than bolting it on.
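A residency rule like this can be checked mechanically. The sketch below assumes a hypothetical deployment manifest that maps each component to the region it runs in; the manifest format and the function name are illustrative and not taken from any specific tool.

```python
ALLOWED_REGION = "ap-south-1"

# Hypothetical manifest: component name -> region it runs in.
deployment = {
    "vector_db": "ap-south-1",
    "model_inference": "ap-south-1",
    "logs": "ap-south-1",
}


def residency_violations(manifest, allowed=ALLOWED_REGION):
    """Return the components running outside the allowed region, sorted by name."""
    return sorted(name for name, region in manifest.items() if region != allowed)


print(residency_violations(deployment))  # []
```

Running a check of this kind in the deployment pipeline is one way to keep a new component from drifting into another region unnoticed.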

Question 4

What about hallucinations in production?

Accepted Answer

We mitigate them with three layers: retrieval grounding (RAG with source citation), structured outputs with schema validation, and a dedicated eval suite that runs in CI (you cannot deploy a prompt change that regresses the eval set). For high-stakes outputs we add a human-in-the-loop checkpoint.
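Two of these layers are easy to sketch in a few lines. The example below assumes the model is asked to return JSON with an `answer` string and a non-empty `sources` list; that schema, the field names, and the pass-rate comparison are illustrative assumptions rather than the actual production setup.

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"answer": str, "sources": list}


def parse_answer(raw: str) -> dict:
    """Parse model output and reject anything that breaks the schema."""
    data = json.loads(raw)  # invalid JSON raises json.JSONDecodeError, a ValueError
    if not isinstance(data, dict):
        raise ValueError("output is not a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field {field!r} is missing or not {expected.__name__}")
    if not data["sources"]:
        raise ValueError("answer carries no source citation")
    return data


def eval_gate(baseline_pass_rate: float, candidate_pass_rate: float) -> bool:
    """True if the candidate prompt may ship: it must not score below the baseline."""
    return candidate_pass_rate >= baseline_pass_rate


ok = parse_answer('{"answer": "42", "sources": ["doc_a"]}')
print(ok["answer"])           # 42
print(eval_gate(0.90, 0.88))  # False: the prompt change regresses the eval set
```

In CI, a `False` from the gate would fail the build, which is what blocks a regressing prompt change from being deployed.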

Question 5

Can you fine-tune models?

Accepted Answer

Yes: LoRA / QLoRA on Llama 3 / Mistral / Phi. We recommend fine-tuning only when prompt engineering + RAG don't close the gap, because a fine-tuned model is harder to maintain. Roughly 20% of our AI engagements end up needing fine-tuning.
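For a sense of why LoRA is the default here: it freezes a weight matrix W of shape d_out x d_in and trains only two low-rank factors, B (d_out x r) and A (r x d_in), whose product is added to W. The arithmetic below uses an illustrative 4096 x 4096 layer and rank 8; both numbers are assumptions for the example, not settings from a specific engagement.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out = d_in = 4096  # illustrative layer size, not tied to a specific model
rank = 8             # illustrative LoRA rank

full = d_out * d_in                              # weights in the frozen matrix
lora = lora_trainable_params(d_out, d_in, rank)  # weights actually trained

print(full)         # 16777216
print(lora)         # 65536
print(lora / full)  # 0.00390625
```

For this layer, under 0.4% of the weights are trained, which is what makes adapter fine-tuning cheap compared with full fine-tuning.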

AI / ML development for Indian teams.
In production, not in a Jupyter notebook.

