
This article is part of a larger series on using large language models in practice. In the previous post, we fine-tuned Mistral-7b-Instruct to respond to YouTube comments using QLoRA. Although the fine-tuned model successfully captured my style when responding to viewer feedback, its answers to technical questions didn't match my explanations. Here, I'll discuss how we can improve LLM performance using retrieval-augmented generation (i.e., RAG).

The original RAG system. Image from Canva.

Large language models (LLMs) have demonstrated an impressive ability to store and deploy vast knowledge in response to user queries. While this has enabled the creation of powerful AI systems like ChatGPT, compressing world knowledge in this way has two key limitations.

First, an LLM’s knowledge is static, i.e., not updated as new information becomes available. Second, LLMs may have an insufficient “understanding” of niche and specialized information that was not prominent in their training data. These limitations can result in undesirable (and even fictional) model responses to user queries.

One way we can mitigate these limitations is to augment a model via a specialized and mutable knowledge base, e.g., customer FAQs, software documentation, or product catalogs. This enables the creation of more…
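To make the idea concrete, here is a minimal sketch of the core RAG loop: embed a small knowledge base, retrieve the documents most similar to a user query, and prepend them to the prompt before calling the LLM. This is not the pipeline we'll build later in the series; the documents, the embedding model name, and the prompt template are all placeholder choices, and it assumes the sentence-transformers and numpy libraries are installed.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder knowledge base: a few documents the base model may not know well.
documents = [
    "QLoRA fine-tunes a quantized base model by training low-rank adapters.",
    "RAG retrieves relevant documents and injects them into the prompt as context.",
    "Mistral-7b-Instruct is a 7-billion-parameter instruction-tuned model.",
]

# Embed the knowledge base once (model name is an example choice).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    query_embedding = encoder.encode([query], normalize_embeddings=True)
    scores = doc_embeddings @ query_embedding.T  # dot products of unit vectors
    top_k = np.argsort(scores.ravel())[::-1][:k]
    return [documents[i] for i in top_k]

def build_prompt(query: str) -> str:
    """Prepend the retrieved context to the user's question for the LLM."""
    context = "\n".join(retrieve(query))
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

print(build_prompt("How does RAG improve an LLM's answers?"))
```

Because the knowledge base lives outside the model's weights, we can update or swap it at any time without retraining, which is exactly what addresses the static-knowledge and niche-knowledge limitations described above.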