Integrating Factual Data for Enhanced Accuracy and Utility
This article presents an approach that combines web statistics, topic clustering analysis from LLMs, and search term statistics from SEO data to enhance the LLM’s understanding of the real world. By analyzing a website’s content with an LLM, extracting key topics and their relationships, and correlating these with search terms using a graph database, we can reveal patterns that help guide the LLM in a more accurate and relevant direction.
I have linked facts with the implicit knowledge base of an LLM by combining:
- web statistics from a website
- topic clustering analysis from an LLM
- search term statistics from SEO data
What is so special about it?
LLMs do not have explicit knowledge, and they are criticized for this. A common way to overcome this limitation is to enrich AI prompts with factual data from databases, either through retrieval-augmented generation (RAG) or through function calls.
The content of the website was analyzed with an LLM in two steps:
- Extract topics and evaluate their SEO relevance.
- Extract relationships between topics.
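The two steps can be sketched roughly as follows. This is a minimal illustration, not the prompts or code used for the article: the prompt wording, the JSON response shape, and the `llm` callable (any function that takes a prompt string and returns the model's text) are all assumptions.

```python
import json
from typing import Callable, Iterable

# Hypothetical prompt templates; the article does not state the prompts it used.
TOPIC_PROMPT = (
    "List the main topics of the following page as a JSON array of objects "
    'with the keys "topic" and "seo_relevance" (a number from 0 to 1).\n\n'
    "{text}"
)
RELATION_PROMPT = (
    "Given these topics: {topics}\n"
    "Return a JSON array of [topic_a, topic_b] pairs that are related "
    "in the following page.\n\n{text}"
)


def extract_topics(page_text: str, llm: Callable[[str], str]) -> dict[str, float]:
    """Step 1: ask the model for topics and an SEO relevance score per topic."""
    items = json.loads(llm(TOPIC_PROMPT.format(text=page_text)))
    return {
        item["topic"].strip().lower(): float(item["seo_relevance"])
        for item in items
    }


def extract_relationships(
    page_text: str, topics: Iterable[str], llm: Callable[[str], str]
) -> set[tuple[str, str]]:
    """Step 2: ask the model which of the known topics are related."""
    known = set(topics)
    prompt = RELATION_PROMPT.format(topics=", ".join(sorted(known)), text=page_text)
    pairs = set()
    for a, b in json.loads(llm(prompt)):
        a, b = a.strip().lower(), b.strip().lower()
        # Drop self-links and pairs that mention topics step 1 never produced.
        if a in known and b in known and a != b:
            pairs.add(tuple(sorted((a, b))))
    return pairs
```

Normalizing topic names and discarding pairs with unknown topics keeps the later graph free of nodes that exist only because the model phrased something differently in the second step.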
The result was loaded into a graph database, which made it possible to analyze which topics relate best to a page and to identify topic clusters within the content.
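To make the idea concrete, here is a small in-memory stand-in for the graph database. It is only a sketch: the article does not name the database or its schema, so the `TopicGraph` class, its method names, and the choice of connected components as the clustering rule are assumptions for illustration.

```python
from collections import defaultdict


class TopicGraph:
    """In-memory stand-in for a graph database (hypothetical schema)."""

    def __init__(self):
        self.page_topics = defaultdict(dict)  # page -> {topic: relevance}
        self.related = defaultdict(set)       # topic -> neighbouring topics

    def add_page_topic(self, page, topic, relevance):
        self.page_topics[page][topic] = relevance
        self.related[topic]  # register the topic even without relations

    def add_relation(self, topic_a, topic_b):
        self.related[topic_a].add(topic_b)
        self.related[topic_b].add(topic_a)

    def best_topics(self, page, limit=3):
        """Topics of a page, highest relevance first."""
        scores = self.page_topics.get(page, {})
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [topic for topic, _ in ranked[:limit]]

    def clusters(self):
        """Topic clusters, taken here as connected components of the graph."""
        seen, result = set(), []
        for start in sorted(self.related):
            if start in seen:
                continue
            stack, component = [start], set()
            while stack:
                topic = stack.pop()
                if topic in component:
                    continue
                component.add(topic)
                stack.extend(self.related[topic] - component)
            seen |= component
            result.append(component)
        return result
```

A real graph database would answer the same two questions with queries instead of Python loops, and would likely use a more refined clustering method than connected components.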
After loading the search statistics into the graph database, it became possible to relate search terms to topic clusters.
A pattern could be identified after analyzing the search term volumes and related terms. It became apparent that the related search terms correlated with topic clusters.
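One simple way to relate search terms to topic clusters is sketched below. The matching rule (shared words between a search term and the topic names of a cluster) and both function names are assumptions chosen for brevity; the article does not describe how the matching was actually done.

```python
def match_cluster(term, clusters):
    """Index of the cluster sharing the most words with the term, or None."""
    words = set(term.lower().split())
    best_index, best_overlap = None, 0
    for index, cluster in enumerate(clusters):
        cluster_words = {w for topic in cluster for w in topic.lower().split()}
        overlap = len(words & cluster_words)
        if overlap > best_overlap:
            best_index, best_overlap = index, overlap
    return best_index


def volume_per_cluster(search_volumes, clusters):
    """Sum search volume per cluster; also return terms that matched nothing."""
    totals = {index: 0 for index in range(len(clusters))}
    unmatched = []
    for term, volume in search_volumes.items():
        index = match_cluster(term, clusters)
        if index is None:
            unmatched.append(term)
        else:
            totals[index] += volume
    return totals, unmatched
```

The unmatched terms are worth keeping: they point to search demand that no current topic cluster covers.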
Combining strengths and capabilities
The hidden potential of large language models (LLMs) can be unlocked by integrating factual data with their implicit knowledge base. This approach combines web statistics, topic clustering analysis from LLMs, and search term statistics from SEO data. By analyzing a website’s content with an LLM, we extract key topics and their relationships and load them into a graph database. This allows us to correlate search terms with topic clusters, revealing patterns that help guide the LLM in a more accurate and relevant direction. The method bridges the gap between the LLM’s text-based learning and real-world understanding, enhancing its accuracy and utility.
- LLMs excel at learning from text, but they lack real-world understanding. This method injects factual data, such as website content and search trends, to bridge the gap.
- We analyze a website’s content with the LLM to identify key topics and how they connect.
- Search term data is then layered on, revealing patterns between what people search for and the website’s content.
- This combined picture helps the LLM understand how its knowledge relates to real-world searches and topics.
The cool part?
- The LLM can be nudged in the right direction, improving its accuracy and making its outputs more relevant.
- Factual data is a guide, helping the LLM navigate the vast sea of information it’s been trained on.
This approach promises to help LLMs become more grounded in the real world, leading to more accurate and helpful outputs.
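As a rough illustration of what nudging the LLM with factual data could look like, the sketch below turns cluster statistics into a context block that is placed before the question. The function name, the fact fields (`cluster`, `volume`, `top_term`), and the prompt wording are hypothetical.

```python
def build_grounded_prompt(question, cluster_facts, max_facts=3):
    """Prefix a question with the highest-volume cluster statistics."""
    ranked = sorted(cluster_facts, key=lambda fact: fact["volume"], reverse=True)
    lines = [
        "- {cluster}: {volume} searches, top term '{top_term}'".format(**fact)
        for fact in ranked[:max_facts]
    ]
    return (
        "Use the following site statistics as factual context.\n"
        + "\n".join(lines)
        + "\n\nQuestion: "
        + question
    )
```

Limiting the context to the top few clusters keeps the prompt short and puts the strongest signal first.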
Integration with tools and systems
The approach can be combined with:
- Vector RAG for similarity search
- Graph RAG for complex analysis
- Function calling
- Agentic workflows for automatic search term discovery
Conclusion
The LLM’s implicit knowledge base contains knowledge about the real world. Combining the LLM’s understanding with factual data from different sources allows us to automatically guide it and evaluate its accuracy.
Analyzing a website’s content with an LLM to extract key topics and their relationships, and then correlating these with search terms in a graph database, reveals patterns that can guide the LLM. This method bridges the gap between the LLM’s text-based learning and real-world understanding and enhances its accuracy and utility.
Sources:
- https://stackoverflow.blog/2024/02/26/even-llms-need-education-quality-data-makes-llms-overperform/
- https://datasciencecampus.ons.gov.uk/using-large-language-models-llms-to-improve-website-search-experience-with-statschat/
- https://lembergsolutions.com/blog/large-language-model-use-cases-and-implementation-insights
- https://foobar.agency/blog/harnessing-the-power-of-large-language-models-for-data-integration
- https://blog.vespa.ai/improving-text-ranking-with-few-shot-prompting/
- https://ahrefs.com/academy/how-to-use-ahrefs/keywords-explorer/related-terms
- https://insight.factset.com/using-large-language-models-to-converse-with-your-data