The Eight Month Reality


The board meeting goes like this.

The CEO asks how long it would take to replace the vendor system. The engineering lead says three months. The product manager says six. The CTO stays quiet.

Everyone looks at the CTO.

If you’ve never built production RAG for high-stakes domains, you think three to six months is reasonable. You’ve built CRUD apps. Integrated APIs. How hard can search and summarization actually be?

This is the replacement cost fallacy. It destroys engineering roadmaps and burns through runway.

The Clean Slate Illusion

When you estimate a rebuild, you imagine starting fresh. No legacy code. No technical debt. Modern frameworks. Best practices.

That mental model is dangerously incomplete.

You’re not rebuilding features. You’re rebuilding institutional knowledge. Every chunking decision in the legacy system represents a tested hypothesis. Every prompt represents a refined interaction pattern. Every ranking threshold represents a resolved tradeoff between precision and recall.

When you start fresh, you lose all of that history. You have to relearn it through experimentation. And those experimentation cycles can’t be compressed.

You’ll try five chunking strategies before finding the right balance between context preservation and semantic granularity. You’ll benchmark three embedding models. You’ll test hybrid retrieval approaches. You’ll tune reranking thresholds. Each experiment takes days to implement and weeks to evaluate.

These steps are not parallelizable within a single domain. You cannot skip them.
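The shape of those cycles is simple even if the cycles themselves aren't. Here is a minimal sketch of a chunking grid search; the word-window chunker and the scoring function are illustrative stand-ins for a real splitter and a real retrieval benchmark:

```python
from itertools import product

def chunk(text, size, overlap):
    """Split text into fixed-size word windows with the given overlap."""
    words = text.split()
    step = max(size - overlap, 1)
    chunks = []
    for i in range(0, len(words), step):
        window = words[i:i + size]
        if window:
            chunks.append(" ".join(window))
        if i + size >= len(words):
            break
    return chunks

def run_grid(text, sizes, overlaps, score_fn):
    """Try every (size, overlap) pair; score_fn would be recall@k against
    a labeled query set in a real experiment, not a chunk-count heuristic."""
    results = []
    for size, overlap in product(sizes, overlaps):
        if overlap >= size:
            continue  # degenerate configuration
        results.append(((size, overlap), score_fn(chunk(text, size, overlap))))
    return max(results, key=lambda r: r[1])
```

The grid is the fast part. What takes weeks is building a `score_fn` you actually trust.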

The Two Engineer Problem

You can’t build this with one senior engineer.

You need two distinct profiles working in parallel. A Scientist and an Engineer.

The Scientist owns encoder experimentation, chunking strategy, retrieval quality optimization, reranking logic, prompt design, and evaluation methodology. This person focuses on accuracy and empirical performance. They’re comfortable running experiments that fail. They iterate on prompts that produce inconsistent results.

The Engineer owns ingestion pipelines, incremental processing, vector storage, environment setup, and artifact versioning. They build systems that fail predictably. When reindexing runs, it doesn’t corrupt the production corpus.
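One concrete version of "fails predictably": every reindex builds a new versioned artifact beside the live one, and promotion is an atomic pointer swap. This is a sketch assuming file-based storage; the `CURRENT` marker and function names are illustrative, not any particular product's API:

```python
import json
import uuid
from pathlib import Path

def reindex(corpus: dict, index_root: Path) -> Path:
    """Build a new index version next to the live one; never mutate in place."""
    new_dir = index_root / f"v-{uuid.uuid4().hex[:8]}"
    new_dir.mkdir(parents=True)
    (new_dir / "index.json").write_text(json.dumps(corpus))
    return new_dir

def validate(index_dir: Path, expected_docs: int) -> bool:
    """Cheap sanity gate before promotion; a real system checks quality too."""
    data = json.loads((index_dir / "index.json").read_text())
    return len(data) == expected_docs

def promote(index_root: Path, new_dir: Path) -> None:
    """Repoint the CURRENT marker via atomic rename, so readers see either
    the old index or the new one, never a half-built state."""
    tmp = index_root / "CURRENT.tmp"
    tmp.write_text(new_dir.name)
    tmp.replace(index_root / "CURRENT")  # rename is atomic on POSIX
```

If validation fails, you delete the new directory and the production corpus never noticed.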

If you assign both roles to one person, context switching kills velocity. If you fill the Scientist role with a mid-level engineer, experimentation takes twice as long and yields worse results. If you fill the Engineer role with a researcher, the infrastructure becomes brittle.

These are not interchangeable roles. Treating them as such is one of the most common and expensive mistakes in AI platform development.

The Hidden Phase Five

Most project plans have four phases. Architecture. Data ingestion. Retrieval implementation. Production hardening.

Production RAG requires a fifth phase that doesn’t fit into quarterly planning.

Evaluation and final tuning.

This phase runs six to eight weeks and happens after the system appears to work. During this phase, you replace intuition with measurable performance. You create evaluation suites. You implement LLM-as-judge frameworks. You generate golden datasets where possible.
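A golden dataset doesn't need to be elaborate to be useful. A minimal recall@k harness over a hypothetical labeled set; the queries and doc IDs below are invented for illustration:

```python
from statistics import mean

def recall_at_k(retrieved, relevant, k=5):
    """Fraction of clinician-confirmed relevant docs found in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

# Hypothetical golden dataset: each entry pairs a query with the doc IDs
# a domain expert confirmed as relevant.
GOLDEN = [
    {"query": "warfarin interactions", "relevant": {"doc_12", "doc_40"}},
    {"query": "pediatric dosing", "relevant": {"doc_7"}},
]

def evaluate(retrieve, k=5):
    """retrieve(query) -> ranked doc IDs from the system under test."""
    return mean(recall_at_k(retrieve(g["query"]), g["relevant"], k)
                for g in GOLDEN)
```

The hard part isn't the harness. It's getting experts to label enough queries that the mean stops moving.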

You discover that retrieval accuracy drops on edge cases involving rare disease interactions. You realize your evidence sufficiency scoring is too aggressive for certain query types. You adjust. You test again.

This phase is not bug fixing. It’s quality calibration. It determines whether clinicians trust the system or ignore it.

Skipping it gets you to launch faster. It also gets you to abandonment faster.

Parallel Execution

You can’t run these phases sequentially and hit a reasonable timeline.

While the Engineer builds the ingestion pipeline and vector storage infrastructure, the Scientist experiments with chunking strategies and encoder selection on sample data. These activities look independent but require synchronization.

The Scientist can’t finalize chunk sizes until the Engineer confirms the vector database can handle the segment configuration. The Engineer can’t finalize indexing logic until the Scientist determines whether metadata filtering happens at the database or application layer.
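The filtering decision in miniature, simulated with plain Python rather than any specific vector database. Pre-filtering at the database layer guarantees k results; post-filtering at the application layer is cheaper per query but can come up short when the filter is selective:

```python
def matches(doc, flt):
    """True if the document's metadata satisfies every filter clause."""
    return all(doc["meta"].get(key) == val for key, val in flt.items())

def top_k(docs, qvec, k):
    """Nearest neighbors by dot product (toy stand-in for ANN search)."""
    score = lambda d: sum(a * b for a, b in zip(d["vec"], qvec))
    return sorted(docs, key=score, reverse=True)[:k]

def db_layer_filter(store, qvec, flt, k):
    """Pre-filter: restrict the candidate set first, then search.
    Always returns k matching docs if k exist."""
    return top_k([d for d in store if matches(d, flt)], qvec, k)

def app_layer_filter(store, qvec, flt, k, overfetch=4):
    """Post-filter: search first with an overfetch margin, filter after.
    Can return fewer than k results if the filter is selective."""
    return [d for d in top_k(store, qvec, k * overfetch)
            if matches(d, flt)][:k]
```

Neither choice is wrong. But the Scientist and Engineer have to make it together, because it constrains both the index schema and the retrieval logic.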

These are real dependencies. They require explicit synchronization points. Architectural decision milestones. Evaluation checkpoints. Final tuning stages.

Without them, you build disconnected components that don’t integrate.

The Seniority Tax

Junior engineers can build RAG demos. They cannot build production medical RAG.

This isn’t a knock on junior engineers. It’s a domain reality: the hard parts are judgment calls, and judgment accumulates with seniority.

When you lose continuity, you lose more than velocity. You lose decision rationale. Why this embedding model? Why this chunk size? Why this reranking threshold? If the person who made those decisions leaves, the team second-guesses everything, or worse, follows obsolete patterns without understanding the original constraints.

Any loss of continuity or reduction in seniority materially impacts the timeline. That’s not HR speak. It’s architectural reality.

The Pricing Floor

When you’re evaluating whether to buy or build, you need a number that represents the floor. Not the purchase price. The rebuild cost.

This answers a specific question: what would it realistically cost to rebuild this system today, under disciplined scope, using senior talent and modern tooling?

This isn’t market valuation. It doesn’t capture strategic upside or time-to-market benefits. It’s simply the economic baseline below which building makes more sense than buying.
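Back-of-envelope version: the floor is headcount times duration times fully loaded cost. Every number below is a placeholder assumption, not a market quote; substitute your own rates:

```python
def rebuild_floor(months=8, loaded_monthly_cost=25_000,
                  team=("scientist", "engineer")):
    """Economic floor for a rebuild: two senior profiles (per the article's
    team composition) for the full timeline. loaded_monthly_cost is an
    assumed fully loaded figure per senior engineer; adjust to your market."""
    return len(team) * months * loaded_monthly_cost
```

With these placeholder inputs the floor lands at $400k before you account for opportunity cost, hiring lag, or the risk that the rebuild doesn't reach parity.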

For a CTO, this number is a negotiation anchor. It prevents the board from treating the vendor price as arbitrary. It demonstrates that walking away isn’t a choice between buying and saving money. It’s a choice between buying and spending eight months of engineering time with no guarantee of equivalence.

The Strategic Choice

Three months is a fantasy. Six months is optimistic. Eight months is realistic.

This is the honest timeline for rebuilding a system with controlled behavior, evidence validation, and production reliability in a high-stakes clinical environment.

You can argue for the shorter timeline. Hire faster. Cut scope. Skip the evaluation phase. These decisions feel like leadership. They’re actually risk accumulation.

When the CEO asks how long the rebuild takes, the CTO has to answer with full context. Not just the months. The team composition. The experimentation cycles. The parallel execution requirements. The seniority assumptions.

Eight months isn’t a padded estimate. It’s the honest cost of building something clinicians can actually trust.

Anything less is a different product entirely.

Architecture Review

Is your AI system ready for patient data?

Book an architecture review — we'll map your system end-to-end, identify every PHI exposure point, and give you a prioritized plan to fix, build, or scale with confidence.