LLMs with RAG: Information extraction made easy

Reading time: 4 min

For some time now, ChatGPT has offered a new feature: the uploading of documents or images and the possibility to ask questions afterwards. Thanks to this function, users can now efficiently search for relevant information in large amounts of text, which is extremely valuable in both a professional and academic context. Our software developer Dominic Hückmann explains how the new feature works.

In an increasingly digitalized world where information is abundant, the ability to extract and understand relevant data from documents is becoming more and more important, because we all strive for quick and easy access to our data.

Why content from documents could not simply be inserted into ChatGPT

The input in the chat window of ChatGPT is subject to a maximum number of characters. For users, this has so far meant splitting long texts into smaller sections before they could ask questions about them, a complex and time-consuming process. Unfortunately, it also often led to inaccurate results, as ChatGPT could not correctly interpret the overall context when generating answers.

In addition, the content in documents does not always consist of pure text. Tables, images and even formulas can appear in the documents, which cannot simply be copied into the chat.

How can we solve this problem without having to extensively train an AI every time?

One technique that has proven to be effective is RAG.

RAG stands for “Retrieval Augmented Generation” and was introduced by Meta AI in 2020. Using AI-based methods and models, the content is analyzed and only the relevant parts are taken into account in order to generate precise and context-related answers.

How RAG works

Retrieval Augmented Generation (RAG) is a concept that combines the idea of retrieving information and generating new information based on it. In a physical context, you could think of it as a visit to the library. The vector database represents the library in which information is stored in the form of books and the retrieval system represents the librarian. As a user, you enter the library and have a specific question or problem that you would like to solve.

Retrieval Augmented Generation (RAG) explained simply

R – Retrieval

Instead of going directly to the library shelves yourself to look for books, you ask the librarian for help. The librarian, like a retrieval system, specializes in finding the relevant books for you. You explain your question, and the librarian uses his knowledge to search for the best books (sources) that could help with your problem. In the digital version, this is done by comparing the semantic similarity between the word embeddings of the question and the word embeddings of the text sections.
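The "semantic similarity" the librarian relies on is usually measured as the cosine similarity between embedding vectors. Here is a minimal sketch in plain Python, using toy 3-dimensional vectors as stand-ins for real embeddings (models such as Aleph Alpha's produce vectors with hundreds of dimensions, but the comparison works the same way):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: the question vector points in almost the same
# direction as the first text section, so that section is retrieved.
question = [0.9, 0.1, 0.0]
sections = {
    "Section about opening hours": [0.8, 0.2, 0.1],
    "Section about parking fees":  [0.1, 0.9, 0.3],
}
best = max(sections, key=lambda name: cosine_similarity(question, sections[name]))
```

The section whose embedding has the highest cosine similarity to the question embedding is the one the "librarian" hands over.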

A – Augmented

The librarian "walks" through the shelves and compiles a list of books (sources) that are relevant to your problem or question. The librarian now has the specific books from which to give you the information. If necessary, this list can be expanded with additional books or sources to support the generation of the result in the final step.

G – Generation

Once the librarian has provided the books or sources, an AI, such as a large language model (LLM), is used. The AI uses the sources it receives to generate the information you need to answer your question.

In summary, the retrieval component (the librarian) helps you to find the relevant sources (books), whereas the generation component (the reading and summarizing capabilities) allows you to generate the required information from the selected sources.

In the digital world, it works in a similar way to our example in the library: First, a retrieval system retrieves the relevant information from a database. In the next step, a generation system uses this information to respond to the user’s specific question and generate a meaningful answer.
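In code, that two-step flow is just a retrieval call followed by a generation call. Below is a minimal end-to-end sketch with a stubbed-out language model; a real system would call an LLM API here, and both the keyword-overlap retrieval and the `fake_llm` function are purely illustrative assumptions:

```python
def retrieve(question, knowledge):
    # Naive retrieval: return sections that share at least one word
    # with the question (real systems compare embeddings instead).
    words = set(question.lower().split())
    return [s for s in knowledge if words & set(s.lower().split())]

def fake_llm(prompt):
    # Stand-in for a real language model: simply echoes the prompt.
    return "Answer based on: " + prompt

def rag_answer(question, knowledge):
    sections = retrieve(question, knowledge)
    prompt = f"Context: {' '.join(sections)}\nQuestion: {question}"
    return fake_llm(prompt)

knowledge = ["RAG combines retrieval and generation.",
             "The library opens at nine."]
answer = rag_answer("what is rag", knowledge)
```

Swapping the stubs for an embedding model, a vector database and an LLM turns this skeleton into a real RAG pipeline.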

Before RAG can be used, the document must first be prepared accordingly. One example:

To prepare for the retrieval step in RAG, the uploaded document was first broken down into short text sections and these sections were then converted into numerical vectors, also known as word embeddings.
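The chunking step can be as simple as splitting the document on paragraphs and capping the chunk length. A rough sketch follows; real pipelines often split on sentences or use overlapping windows, and the 300-character limit here is an arbitrary assumption:

```python
def split_into_chunks(text, max_chars=300):
    # Split on blank lines, then pack consecutive paragraphs into
    # chunks no longer than max_chars. A single paragraph longer than
    # max_chars is kept whole in this simple sketch.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) + 1 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = (current + "\n" + p) if current else p
    if current:
        chunks.append(current)
    return chunks

document = ("First paragraph about RAG.\n\n"
            "Second paragraph about embeddings.\n\n"
            + "A much longer third paragraph. " * 20)
chunks = split_into_chunks(document)
```

Each resulting chunk is then passed to the embedding model separately.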

In this simple representation, words with high similarity are positioned closer together. | Diagram: projector.tensorflow.org

These word embeddings are then stored in a kind of vector database or vector memory in order to find only the most similar text sections for a question.
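Such a vector store can be mimicked in a few lines: keep (vector, text) pairs and return the top-k entries most similar to a query vector. The toy version below uses 2-dimensional vectors and exhaustive search; production vector databases use approximate nearest-neighbor indexes to stay fast at scale:

```python
import math

class TinyVectorStore:
    def __init__(self):
        self.entries = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self.entries.append((vector, text))

    def search(self, query, k=2):
        # Rank all stored sections by cosine similarity to the query.
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a))
                          * math.sqrt(sum(x * x for x in b)))
        ranked = sorted(self.entries, key=lambda e: cos(query, e[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]

store = TinyVectorStore()
store.add([1.0, 0.0], "Section on invoices")
store.add([0.0, 1.0], "Section on holidays")
store.add([0.9, 0.2], "Section on payment terms")
results = store.search([1.0, 0.1], k=2)
```

Only the top-k sections returned by `search` are passed on to the language model, which keeps the prompt short and focused.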

Not only ChatGPT can do this!

There are a number of other large language models to which RAG (Retrieval Augmented Generation) can be effectively applied. One of these is the European answer to OpenAI, Aleph Alpha.

Aleph Alpha offers both a model for creating word embeddings and a language model for generating results.

To illustrate how RAG works, here is a simple example that we created in the Aleph Alpha Playground.

In this case, the text after "### Input:" is the text that was retrieved in the "R – Retrieval" step. The preparation is illustrated here by defining the retrieved text directly in the prompt, from which the model then generates the correct answer to the question.
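The Playground example follows a common prompt layout: the retrieved text is placed under an "### Input:" marker together with an instruction and the question, and the language model completes the "### Response:" part. A sketch of how such a prompt string could be assembled (the "### Input:" marker mirrors the example above; the surrounding instruction wording is an assumption):

```python
def build_rag_prompt(retrieved_sections, question):
    # Combine the retrieved text sections with the user's question
    # into a single prompt for the language model.
    context = "\n\n".join(retrieved_sections)
    return (
        "### Instruction:\n"
        "Answer the question using only the input below.\n\n"
        f"### Input:\n{context}\n\n"
        f"### Question:\n{question}\n\n"
        "### Response:"
    )

prompt = build_rag_prompt(
    ["RAG was introduced by Meta AI in 2020."],
    "When was RAG introduced?",
)
```

The model's completion after "### Response:" is the answer shown to the user.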

Quite simple really, isn’t it?

Not quite, because: The longer, more complex and unstructured the documents are, the more difficult it becomes to generate high-quality word embeddings and for the retrieval step to select the relevant sections. However, there are various techniques that can be used depending on the situation to significantly reduce the likelihood of false information in the output.

The author
Editor & copywriter
Huyen uses her flair for language to make texts lively, appealing and effective.