Case study · Production AI

A RAG that answers farmers
in their own dialect.

An agriculture advisory assistant for Egypt. Users type Egyptian Arabic on their phones; the knowledge base is written in Modern Standard Arabic. On day one, retrieval found almost nothing. This is how it became something I'm willing to put in front of farmers.

Sector

Agriculture advisory · Egypt

My role

AI Engineer — pipeline end to end

Stack

RAG · pgvector · Postgres FTS · GPT-4o rewriting

Status

In production

01 The problem

RAG looks like magic in English demos. Drop in PDFs, retrieve, prompt the model, done. This one had to answer questions like "how do I water my courgette in summer" — typed as ازاى اروى الكوسة فى الصيف؟ — against documentation written in formal Arabic. The same word arrives with different alif shapes, with or without diacritics, with stray RTL marks from a phone keyboard. Dialect sits on top of all of that.

Most embedding models, even multilingual ones, can't tell that the dialect question and the MSA documentation mean the same thing. Day-one retrieval came back almost empty.

02The constraint that shaped everything

In agriculture, a wrong pesticide dose can kill a crop — or worse. The system could not be allowed to guess confidently. That single constraint drove the confidence architecture: every answer is scored twice, and anything uncertain goes to a human expert instead of the model.

"The 'I don't know, let me get someone' path isn't a fallback. It's a feature."

03 The pipeline

Normalize the query — never the docs

Strip diacritics, tatweel, RTL marks; unify alif variants. Ta-marbuta stays — it carries meaning in technical Arabic. The knowledge base is left untouched so the exact words farmers search for survive.
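A minimal sketch of that query-side normalization, assuming Python and the standard library only. The exact character sets are an assumption (the original rules live in the long-form write-up): diacritics are taken as U+064B to U+0652 plus the superscript alif, and the alif variants as madda, hamza above, hamza below, and wasla.

```python
import re

# Assumed character sets; the author's actual rules may differ.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")          # tashkeel + superscript alif
TATWEEL = "\u0640"                                          # elongation character
DIRECTION_MARKS = re.compile(r"[\u200E\u200F\u061C\u202A-\u202E\u2066-\u2069]")
ALIF_VARIANTS = re.compile(r"[\u0622\u0623\u0625\u0671]")   # madda, hamza above/below, wasla
BARE_ALIF = "\u0627"


def normalize_query(text: str) -> str:
    """Normalize a user query. Knowledge-base text is never passed through this."""
    text = DIRECTION_MARKS.sub("", text)
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALIF_VARIANTS.sub(BARE_ALIF, text)
    # Ta-marbuta (U+0629) is deliberately left alone.
    return " ".join(text.split())
```

Note what is missing on purpose: there is no ta-marbuta to ha mapping, which many generic Arabic normalizers apply by default.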

Rewrite dialect into MSA — with frozen nouns

A few-shot GPT-4o call converts Egyptian colloquial to MSA, explicitly told not to touch crop names, pesticide names, doses, dates, units. Rewrites are cached; on failure the system falls back to the original query and merges both result sets.
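The cache-and-fallback behaviour can be sketched as below. This is one reading of the step, not the production code: `rewrite` is a hypothetical callable standing in for the few-shot GPT-4o call, the cache is a plain dict, and the function returns the query variants whose result sets would then be merged.

```python
from typing import Callable, Dict, List


def queries_to_search(
    query: str,
    rewrite: Callable[[str], str],  # hypothetical wrapper around the LLM rewriter
    cache: Dict[str, str],
) -> List[str]:
    """Return the query variants to send to retrieval, original first."""
    rewritten = cache.get(query)
    if rewritten is None:
        try:
            rewritten = rewrite(query)
        except Exception:
            # Rewriter unavailable: search with the original query only.
            return [query]
        cache[query] = rewritten
    variants = [query]
    if rewritten and rewritten != query:
        variants.append(rewritten)
    return variants
```

Searching with both variants means a bad rewrite can only add candidates, never remove the ones the original query would have found.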

Hybrid retrieval with the Arabic tokenizer

pgvector handles semantics and Postgres full-text search handles keywords; the two result sets are merged per node, keeping the higher score. Switching Postgres FTS to the Arabic text search configuration was a one-line change that did more for recall than any prompt edit.
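A rough sketch of the two queries and the merge, in Python. The table and column names (`kb_nodes`, `id`, `content`, `embedding`) are hypothetical, and the SQL strings are illustrative rather than the production queries; `'arabic'` is the text search configuration that ships with recent PostgreSQL versions, and `<=>` is pgvector's cosine distance operator.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, float]  # (node_id, score)

# Keyword side: the 'arabic' configuration is the one-line change.
KEYWORD_SQL = """
SELECT id, ts_rank(to_tsvector('arabic', content),
                   plainto_tsquery('arabic', %(q)s)) AS score
FROM kb_nodes
WHERE to_tsvector('arabic', content) @@ plainto_tsquery('arabic', %(q)s)
"""

# Semantic side: cosine similarity = 1 - cosine distance.
VECTOR_SQL = """
SELECT id, 1 - (embedding <=> %(vec)s::vector) AS score
FROM kb_nodes
ORDER BY embedding <=> %(vec)s::vector
LIMIT %(k)s
"""


def merge_by_max(*result_sets: Iterable[Hit]) -> List[Hit]:
    """Merge result sets per node, keeping each node's highest score."""
    best: Dict[str, float] = {}
    for hits in result_sets:
        for node_id, score in hits:
            if score > best.get(node_id, float("-inf")):
                best[node_id] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

Taking the maximum only makes sense if both scores are on a comparable scale, so this sketch assumes they have been rescaled before merging.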

Recalibrated similarity thresholds

English-tuned 0.7 cosine thresholds rejected nearly every correct Arabic match (correct matches actually scored 0.20–0.45). A 20-query benchmark across Arabic, French, and English set an empirical floor and ceiling, and similarities between the two are remapped to a 0–1 confidence.
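The remapping is a clamped linear rescale. In the sketch below the default floor and ceiling are simply the 0.20–0.45 range quoted above, used as placeholders; the benchmark-derived values are not stated here and may differ.

```python
def similarity_to_confidence(
    similarity: float,
    floor: float = 0.20,    # placeholder: low end of the observed range
    ceiling: float = 0.45,  # placeholder: high end of the observed range
) -> float:
    """Map a raw cosine similarity onto a 0-1 confidence, clamped at both ends."""
    if ceiling <= floor:
        raise ValueError("ceiling must be greater than floor")
    scaled = (similarity - floor) / (ceiling - floor)
    return max(0.0, min(1.0, scaled))
```

With these placeholders, a similarity at or below 0.20 maps to 0, anything at or above 0.45 maps to 1, and a match at 0.325 lands halfway.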

Two-signal confidence, three outcomes

The LLM reports its own confidence; retrieval similarity is the safety net. High → answer. Medium → answer with disclaimer. Low → escalate to a human agronomist.
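One way to wire the two signals into three outcomes is to let the weaker signal decide, so a confident model cannot override weak retrieval. The combination rule (`min`) and the `high` and `low` cut-offs below are assumptions made for illustration, not the production values.

```python
from enum import Enum


class Outcome(Enum):
    ANSWER = "answer"
    ANSWER_WITH_DISCLAIMER = "answer_with_disclaimer"
    ESCALATE = "escalate"


def route(
    llm_confidence: float,
    retrieval_confidence: float,
    high: float = 0.75,  # hypothetical cut-off
    low: float = 0.40,   # hypothetical cut-off
) -> Outcome:
    """Pick an outcome from two 0-1 confidence signals."""
    # The weaker signal wins: retrieval acts as the safety net.
    combined = min(llm_confidence, retrieval_confidence)
    if combined >= high:
        return Outcome.ANSWER
    if combined >= low:
        return Outcome.ANSWER_WITH_DISCLAIMER
    return Outcome.ESCALATE
```

Under this rule, a model that reports 0.95 on top of retrieval scored 0.2 still goes to the agronomist.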

Every expert answer becomes part of the brain

Resolved escalations are embedded back into the knowledge base as single nodes, tagged and kept in the user's own language and register. The system grows its dialect knowledge one escalation at a time.
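A sketch of turning a resolved escalation into one node. The `KnowledgeNode` shape, the tag names, and the `embed` callable are all hypothetical; the point it illustrates is that the farmer's question is stored verbatim, without normalization or rewriting, so the dialect wording itself becomes retrievable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class KnowledgeNode:
    text: str
    embedding: List[float]
    tags: Dict[str, str] = field(default_factory=dict)


def node_from_escalation(
    question: str,
    expert_answer: str,
    embed: Callable[[str], Sequence[float]],  # hypothetical embedding function
    language: str,
) -> KnowledgeNode:
    """Build a single node from a resolved escalation, in the user's own wording."""
    text = f"{question}\n{expert_answer}"
    return KnowledgeNode(
        text=text,
        embedding=list(embed(text)),
        tags={"source": "expert_escalation", "language": language},
    )
```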

04 The outcome

Retrieval went from almost nothing on day one to a system in production in front of real farmers, without a single Arabic-specific embedding model. The wins came from a pipeline of cheap, honest steps, not a magic model. And because of the feedback loop, most of the dialect knowledge in the retriever today wasn't there at launch: the system picked it up from the expert answers to its own escalations.

~0 → usable

Day-one retrieval recall vs. production — measured on a 20-query multilingual benchmark

3 paths

Answer / answer-with-disclaimer / human expert — no confident guessing on crop-critical advice

6 fixes

Normalization, dialect rewriting, hybrid search, recalibration, dual confidence, feedback loop

Self-growing

Every expert escalation is embedded back — the knowledge base improves with use

Want the engineering detail — the actual normalization rules, the rewriter prompt design, the threshold numbers? It's all in the long-form write-up: Arabic Broke My RAG. Here's What Saved It.

Have a messy, non-English, real-world RAG problem? Let's talk ↗