

LARGE LANGUAGE MODEL PLAYGROUND (PT 1)
Silas Liu - Sep. 22, 2024
Updated: Sep. 29, 2024
Large Language Models, Graphs
Recently, LLMs have dominated the tech world. As a Data Science professional, I believe it is essential to stay up to date with the latest techniques. With that in mind, I started this playground project, where I explore and implement functionalities that leverage LLMs to solve practical problems and create useful tools for everyday tasks.
One of the most common challenges I have observed is the need to summarize lengthy PDFs, extract specific details, or search for precise information. While tools for extracting text from PDFs have been around for a long time, they often miss important information embedded in figures and charts. To address this, I developed a solution that uses multimodal LLMs to extract both text and images from PDFs. The extracted data is then stored in a Graph Database and enriched with metadata, allowing Retrieval-Augmented Generation (RAG) techniques to retrieve information efficiently.

With the rapid advancement of LLM techniques, these models are becoming increasingly effective at core NLP tasks, especially summarization, keyword extraction, and information retrieval. That makes them a natural fit for the problem this system addresses: dealing with large PDFs when there is no time to read through them or locate the important information.
The first step was to integrate a multimodal LLM, a model capable of processing and understanding different types of data, such as text and images. This significantly broadens the scope of data that can be analyzed, enabling more comprehensive information extraction. With multimodal capabilities, the system is not only able to read text from PDFs but can also interpret visual elements like images, charts, and diagrams. This opens up possibilities for deeper insights and more robust content understanding, making the system much more versatile in real-world applications where data types vary.
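To make this concrete, here is a minimal sketch of how such an extraction step could look. It assumes PyMuPDF (fitz) to pull the text and embedded images from each page, and OpenAI's gpt-4o as the multimodal model; the actual libraries and prompts used in the project may differ.

```python
import base64
import fitz  # PyMuPDF
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_pdf_content(path: str) -> list[dict]:
    """Extract the text and a description of every embedded image, per page."""
    doc = fitz.open(path)
    pages = []
    for page_number, page in enumerate(doc, start=1):
        text = page.get_text()
        image_descriptions = []
        for image_info in page.get_images(full=True):
            xref = image_info[0]
            img = doc.extract_image(xref)
            b64 = base64.b64encode(img["image"]).decode()
            # Ask the multimodal model to describe the figure or chart,
            # so the visual information becomes searchable text later.
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text",
                         "text": "Describe this figure or chart in detail."},
                        {"type": "image_url",
                         "image_url": {"url": f"data:image/{img['ext']};base64,{b64}"}},
                    ],
                }],
            )
            image_descriptions.append(response.choices[0].message.content)
        pages.append({"page": page_number, "text": text,
                      "images": image_descriptions})
    return pages
```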

The next step was to store all the extracted data in a Graph Database, chosen for its ability to represent unstructured data in a way that reveals intricate relationships between entities. Unlike traditional relational databases, which rely on predefined schemas, graph databases are highly flexible and excel at modeling complex networks. They can uncover hidden patterns, such as communities or clusters of related entities, identify shortest paths between points, and even detect subtle connections that might otherwise go unnoticed. This makes them ideal for scenarios where understanding the relationships and interactions within a data ecosystem is just as important as the data itself.
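Here is a sketch of the storage step, assuming Neo4j and its official Python driver. The node labels and relationship names (Document, Page, HAS_PAGE) are illustrative, not necessarily the project's actual schema.

```python
from neo4j import GraphDatabase

# Connection details are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

def store_document(doc_title: str, pages: list[dict]) -> None:
    """Store a document and its pages as linked nodes in the graph."""
    with driver.session() as session:
        session.run("MERGE (d:Document {title: $title})", title=doc_title)
        for page in pages:  # e.g. the output of extract_pdf_content above
            session.run(
                """
                MATCH (d:Document {title: $title})
                MERGE (p:Page {doc: $title, number: $number})
                SET p.text = $text, p.images = $images
                MERGE (d)-[:HAS_PAGE]->(p)
                """,
                title=doc_title, number=page["page"],
                text=page["text"], images=page["images"],
            )
```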

Next, I developed a RAG system to make information retrieval more efficient. The final LLM intelligently decides when to apply RAG based on the user's query, autonomously determining which fields to search, such as article titles or specific pages, and optimizing the retrieval process according to the user's needs. When RAG is not necessary, the system seamlessly falls back to standard vector embeddings for simpler queries. This adaptive approach allows the model to handle both straightforward and complex questions, even cross-referencing data between different PDFs for deeper insights.
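The routing step could be sketched like this: the LLM itself is asked to classify the query and pick a retrieval strategy. The prompt wording, the JSON fields, and the model choice are all assumptions for illustration.

```python
import json
from openai import OpenAI

client = OpenAI()

ROUTER_PROMPT = """Decide how to answer the user's question.
Return JSON with two keys:
  "strategy": "graph" if the question targets a specific document field
              (e.g. an article title or a page number), else "vector";
  "field": the graph field to search, or null.
Question: {question}"""

def route_query(question: str) -> dict:
    """Ask the LLM which retrieval strategy fits this question."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{"role": "user",
                   "content": ROUTER_PROMPT.format(question=question)}],
    )
    return json.loads(response.choices[0].message.content)

# route_query("What does page 3 of the attention paper say?")
# -> {"strategy": "graph", "field": "page"}
```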

I then made some improvements, both in the RAG system and in the metadata retrieval. The system now captures the page sequence of each document, improving the retrieval of texts spread across consecutive pages, and it also shows which sources were used to produce the answer.
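Building on the Neo4j sketch above, the page-sequence improvement can be approximated by linking consecutive Page nodes; the NEXT relationship name is hypothetical. A retrieval hit on one page can then be expanded to its neighbors to recover text that spills across page breaks.

```python
def link_page_sequence(doc_title: str, num_pages: int) -> None:
    """Chain consecutive Page nodes so retrieval can follow page breaks."""
    with driver.session() as session:
        for n in range(1, num_pages):
            session.run(
                """
                MATCH (a:Page {doc: $title, number: $n})
                MATCH (b:Page {doc: $title, number: $next})
                MERGE (a)-[:NEXT]->(b)
                """,
                title=doc_title, n=n, next=n + 1,
            )
```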
In the images below, I show how two articles are represented in the graph, along with a search for information across the two sources.


Another improvement was the implementation of chat memory. Now the user can ask questions about previous context, and the LLM is capable of searching for relevant information in the memory. In the following example, I asked about previous papers and the LLM was able to answer correctly.
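A minimal sketch of what chat memory can look like: keep the running message history and resend it with every call, so the model can answer follow-up questions about earlier turns. The class and prompts here are illustrative, not the project's actual implementation.

```python
from openai import OpenAI

client = OpenAI()

class ChatSession:
    """Keeps the conversation history so follow-up questions have context."""

    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, question: str) -> str:
        self.messages.append({"role": "user", "content": question})
        response = client.chat.completions.create(
            model="gpt-4o", messages=self.messages,
        )
        answer = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": answer})
        return answer

# chat = ChatSession("Answer using the retrieved PDF context.")
# chat.ask("Summarize the first paper.")
# chat.ask("Which papers did I ask about before?")  # answered from memory
```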
