Retrieval Augmented Generation (RAG) is a big deal™. Organisations can use it to supplement generative AI models with proprietary or third-party data, in order to generate more accurate or specific responses from an LLM.
Want to create a generative AI chatbot for customers that “understands” your business, or an LLM informed by private data for internal use cases? You will almost certainly have to use RAG. To spell it out: the R is retrieving your data of choice, the A is augmenting the prompt to your chosen large language model with that data, and the G is generating a response for your users or customers.
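To make that concrete, here is a bare-bones sketch of the pattern in Python. The embed, vector_store and llm objects are hypothetical stand-ins for whatever embedding model, vector database and LLM client a team actually uses; real frameworks wrap these steps for you, but the shape is the same.

```python
# Illustrative RAG loop. embed(), vector_store and llm are hypothetical
# placeholders, not any particular library's API.
def answer(question: str, vector_store, embed, llm) -> str:
    # R: Retrieve. Embed the question and pull back the most similar passages.
    query_vector = embed(question)
    passages = vector_store.search(query_vector, top_k=3)

    # A: Augment. Splice the retrieved passages into the prompt.
    context = "\n\n".join(p.text for p in passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # G: Generate. The LLM produces a response grounded in that context.
    return llm.complete(prompt)
```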
“With retrieval-augmented generation, users can essentially have conversations with data repositories, opening up new kinds of experiences. This means the applications for RAG could be multiple times the number of available datasets”, as one upbeat NVIDIA blog puts it.
So, is it difficult to do? On paper, for all the jargon around “embedding models” and “vector stores”, RAG is not hugely complicated. (It can take just five lines of code…) Yet as organisations move to proof-of-concept (POC) builds and beyond, they are increasingly running into complications, from data quality issues to immature toolchains. RAG is also a growing source of concern for data privacy and cybersecurity professionals.
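The “five lines” is not much of an exaggeration for the happy path: high-level frameworks such as LlamaIndex publish quick-starts in roughly that shape. Something close to the following sketch (assuming the llama_index package, an OpenAI API key configured in the environment, and a placeholder ./data folder of documents) will index a directory and answer questions over it.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # load the files to be retrieved
index = VectorStoreIndex.from_documents(documents)     # embed them into an in-memory vector store
query_engine = index.as_query_engine()                 # wire retrieval into an LLM prompt
print(query_engine.query("What does our returns policy say?"))
```

Everything beyond that quick-start, including chunking choices, data quality, evaluation and access control, is where the complications described above tend to surface.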
“Using LLMs to build customer service bots with RAG access to your data is not the low-hanging fruit it seems to be. It is, in fact, right in the weak spot of current LLMs - you risk both hallucinations and data exfiltration,” warns Professor Ethan Mollick, who teaches at the University of Pennsylvania.
(Not everyone agrees...)
So what do practitioners and executives need to be aware of? And what is being learned by those building out POCs at the enterprise coalface?