Are there any examples of when RAG powered by vector search works really well?
I tried best practices like having the LLM formulate an answer and using that answer for the search (instead of the question), trying different chunk sizes, and so on, but I never got results I would consider "good".
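For what it's worth, the "answer-first" step I mean looked roughly like the sketch below; the model names and the `index.search` call are placeholders for whatever stack you use, not a specific library API.

```python
# Sketch of answer-first retrieval: draft a hypothetical answer, embed it,
# and search the vector index with that instead of the raw question.
from openai import OpenAI

client = OpenAI()

def hypothetical_answer(question: str) -> str:
    # Ask the model for a plausible answer; we embed this draft, not the question.
    resp = client.chat.completions.create(
        model="gpt-4",  # illustrative; any chat model works
        messages=[{"role": "user",
                   "content": f"Write a short, plausible answer to: {question}"}],
    )
    return resp.choices[0].message.content

def retrieve(question: str, index, top_k: int = 5):
    draft = hypothetical_answer(question)
    emb = client.embeddings.create(model="text-embedding-3-small",
                                   input=draft).data[0].embedding
    # `index` stands in for your vector store; `search` is a placeholder method.
    return index.search(emb, top_k=top_k)
```

Even with that in place, retrieval quality depended heavily on the data itself.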
Maybe it was because of the type of data or the capabilities of the models at the time (GPT-3.5 and GPT-4)?
By now, context windows of some models are large enough to fit lots of context directly into the prompt, which is easier to do and yields better results. It is far more costly, but cost is dropping fast, so I wonder what this means for RAG + vector search going forward.
We built a RAG system for one of our clients in the aviation industry: >20M technical support messages and associated answers / documentation, and we're seeing 60-80% recall for the top 3 documents when testing. It definitely pays off to use as much of the structure you find in the data as possible, plus combining multiple strategies (knowledge graph for structured data, text embeddings across data types, filtering and boosting based on experts' experience, etc.). The baseline pure-RAG approach was under 25% recall.
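The combination step is conceptually simple; something in the spirit of the sketch below (field names, weights, and boosts are purely illustrative, not our production values):

```python
# Rough sketch of combining vector similarity, graph relevance, metadata
# filtering, and expert-defined boosts into one ranking score.
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    vector_score: float   # cosine similarity from the embedding index
    graph_score: float    # e.g. relevance derived from the knowledge graph
    doc_type: str         # "service_bulletin", "troubleshooting", ...

# Illustrative boosts encoding which document types experts trust most.
EXPERT_BOOSTS = {"service_bulletin": 1.2, "troubleshooting": 1.1}

def rank(candidates, allowed_types=None, w_vec=0.7, w_graph=0.3, top_k=3):
    def score(c: Candidate) -> float:
        base = w_vec * c.vector_score + w_graph * c.graph_score
        return base * EXPERT_BOOSTS.get(c.doc_type, 1.0)
    pool = [c for c in candidates
            if allowed_types is None or c.doc_type in allowed_types]
    return sorted(pool, key=score, reverse=True)[:top_k]
```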
Instead of performing RAG on the (vectorised) raw source texts, we create representations of elements / "context clusters" contained within the source, which are then vectorised and ranked. That's all I can disclose; hope that helps.
Thanks for your message. I should say that giving your comment to GPT-4, along with a request for a solution architecture that could produce good results based on it, yielded a very detailed, fascinating solution.
https://chat.openai.com/share/435a3855-bf02-4791-97b3-4531b8...
Maybe, but it expanded on the idea in the vague comment and in doing so introduced me to the idea of embedding each sentence, clustering the sentences, and then taking the cluster centroids as the embeddings to index/search against. I had not thought of doing that before.
After seeing raw-source-text performance, I agree that representation learning of higher-level semantic "context clusters", as you say, seems like an interesting direction.
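For anyone curious, the centroid idea reads roughly like the sketch below; the embedding model and cluster count are arbitrary placeholders, not a recommendation.

```python
# Sketch: embed each sentence, cluster the sentence embeddings, and keep
# one centroid per cluster to index/search against instead of raw chunks.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def centroid_embeddings(sentences: list[str], n_clusters: int = 5) -> np.ndarray:
    embs = model.encode(sentences)  # one vector per sentence
    km = KMeans(n_clusters=min(n_clusters, len(sentences)),
                n_init="auto").fit(embs)
    # Each centroid stands in for a "context cluster" of related sentences.
    return km.cluster_centers_
```

Whether the centroids or the per-cluster summaries work better probably depends on the data; I have only tried this as an experiment.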
I'm using it for an internal application and the results so far are amazing, considering it was hacked together in a few hours.
It helps a lot with discovery. We have some large PDFs and also a large number of smaller PDFs. Simply asking a question and getting an answer with the exact location in the PDF is really helpful.
In our experience, simple RAG is often not that helpful, as the questions themselves are not represented in the vector space (unless you use an FAQ dataset as input). Either preprocessing by an LLM or specific context handling needs to be done.
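The "exact location" part is mostly just carrying page metadata through the pipeline. A minimal sketch, assuming pypdf as the reader and a naive character-based split:

```python
# Sketch: chunk PDFs while keeping source file and page number with each
# chunk, so retrieved answers can point back to the exact spot in the PDF.
from pypdf import PdfReader

def chunk_pdf(path: str, chunk_chars: int = 1000):
    reader = PdfReader(path)
    chunks = []
    for page_no, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        for i in range(0, len(text), chunk_chars):
            chunks.append({
                "text": text[i:i + chunk_chars],
                "source": path,
                "page": page_no,  # returned alongside the answer for discovery
            })
    return chunks
```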
Where does it shine?