The Missing Piece in the AI Application Stack: RAG Index Sharing

Written by:
Alex Valente

I was never in love with the previous AI application stack. Models were highly specific and often focused on classification use cases: banks, for example, trained models on massive amounts of transaction data to improve fraud detection. One large, valuable use case meant one trained model. Models required large amounts of data to get started, and there were significant upfront costs for testing and learning.

Everything changed with LLMs. Models went from highly specific to highly generalisable, and developers built a series of frameworks that enabled the creation of impressive applications.

No more massive data science teams, no more big training bills, no more complex data labelling.

LangChain and LlamaIndex quickly built application frameworks that enable developers to build solutions. And people built solutions. We saw an explosion of applications attempting to automate and improve all types of functions, from back-office jobs at enterprise companies to consumer applications that help people plan holidays.

As applications attempted more complex goals, they required additional context, and that context was supplied via a process called Retrieval Augmented Generation (RAG). The way it worked was simple: when the LLM needed specific information, the application searched over some data, returned the results relevant to the query, and had the LLM analyse them before making its next decision.

RAG became an easy way to inject additional information into an LLM, no training required. When people wanted to add company-specific information, the AI engineering community found it was even better than fine-tuning. It was no wonder that it caused an explosion in all of the constituent pieces of the RAG pipeline, from embedding models to vector data stores.

How Redactive solves for different elements of the RAG stack.

It's not perfect, though: RAG isn't cheap. You need to take data, cut it into pieces (chunking), convert the pieces into embeddings, store the embeddings in an index, and then run queries against that index. Once people had created RAG systems that worked, they wanted to share them to get more value out of the work they had done. Accordingly, LlamaIndex has attempted to increase the utility of RAG indexes by making them shareable across “index networks”.
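The four steps above (chunk, embed, store, query) can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, and the "embedding" is a toy bag-of-words count vector standing in for a real embedding model. In practice, the embedding and storage steps are where most of the cost sits.

```python
import math
import re
from collections import Counter


def chunk(text, size=8):
    """Cut text into pieces of at most `size` words (toy chunking)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, size=8):
    """Chunk every document and store (chunk text, embedding) pairs."""
    return [(piece, embed(piece)) for doc in documents for piece in chunk(doc, size)]


def query(index, question, top_k=1):
    """Return the text of the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


docs = [
    "Refunds are processed within five business days of the request.",
    "The office is closed on public holidays.",
]
index = build_index(docs, size=20)
context = query(index, "When are refunds processed?")
# `context` would then be placed into the LLM prompt alongside the question.
```

Every application that builds its own copy of `index` repeats the chunking and embedding work, which is the cost that sharing an index avoids.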

Index networks are “a library extension makes it possible to build a network of RAGs built on top of external data sources and supplied by external actors”. Essentially, this is a method of taking expensive data indexes and reducing their per-query cost by making them available in more contexts. Instead of chunking and embedding Wikipedia and storing it in an index, just point to the Wikipedia index that someone already created. This turns application-specific indexes into Shareable Indexes.
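To make the "point to an existing index" idea concrete, here is a small sketch of a lookup that builds an index once and hands the same object to every application that asks for it. It does not use or represent the llama-index-networks API; `IndexRegistry`, `get_or_build`, and `build_wikipedia_index` are hypothetical names invented for this illustration.

```python
class IndexRegistry:
    """Hypothetical in-memory registry: one shared index per name."""

    def __init__(self):
        self._indexes = {}
        self.build_count = 0  # how many times the expensive build actually ran

    def get_or_build(self, name, builder):
        """Return the existing index for `name`, building it only if absent."""
        if name not in self._indexes:
            self._indexes[name] = builder()
            self.build_count += 1
        return self._indexes[name]


def build_wikipedia_index():
    # Stand-in for the expensive chunk / embed / store work.
    return {"source": "wikipedia", "chunks": ["placeholder chunk"]}


registry = IndexRegistry()

# Two different applications ask for the same index.
travel_app_index = registry.get_or_build("wikipedia", build_wikipedia_index)
support_bot_index = registry.get_or_build("wikipedia", build_wikipedia_index)

# Both hold a reference to one index, and the build cost was paid once.
shared = travel_app_index is support_bot_index
```

The build cost is then spread over the queries of every application using the index, which is what lowers the per-query cost.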

Querying a network of knowledge with llama-index-networks — LlamaIndex, Data Framework for LLM Applications

It becomes even more valuable when data is siloed on a per-user basis. Say, for example, you want to use a number of applications that require context about your Facebook and Twitter accounts. You can store one copy of your social index and point numerous applications to it. Now they know who you are, and you control the data.

There is a piece missing here, though.

The vast majority of valuable software applications exist in the B2B space. Enterprises are investing heavily in AI and have a pressing need to get their organisational context into their AI applications. At the same time, organisations can’t index their information separately for every AI application they want to use. We need to be able to create one organisational index that can be pointed at a variety of applications.

Organisational indexes are far more complex than user indexes or data source indexes. They hold a wide variety of information with a variety of permissions: some users can see some documents and other users can’t. To solve this for companies, we need to build Permission Aware Indexes.
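One way to picture a Permission Aware Index is an index where every chunk carries the groups allowed to see it, and every search filters on the caller's groups before ranking. The sketch below is an assumption-laden illustration, not Redactive's implementation: the class and field names are hypothetical, permissions are modelled as simple group names, and ranking is plain word overlap rather than embedding similarity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset  # groups permitted to read this chunk


class PermissionAwareIndex:
    """Hypothetical index that enforces permissions at query time."""

    def __init__(self):
        self._chunks = []

    def add(self, text, allowed_groups):
        self._chunks.append(Chunk(text, frozenset(allowed_groups)))

    def search(self, query, user_groups, top_k=3):
        groups = set(user_groups)
        # Filter first: chunks the user cannot see never reach ranking.
        visible = [c for c in self._chunks if c.allowed_groups & groups]
        terms = set(query.lower().split())
        scored = [(len(terms & set(c.text.lower().split())), c) for c in visible]
        scored = [(score, c) for score, c in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c.text for _, c in scored[:top_k]]


index = PermissionAwareIndex()
index.add("Q3 salary bands for engineering", {"hr"})
index.add("Engineering onboarding guide", {"hr", "engineering"})

# The same query returns different context depending on who is asking.
for_engineer = index.search("engineering salary", {"engineering"})
for_hr = index.search("engineering salary", {"hr"})
```

Filtering before ranking matters here: restricted text is never handed to the LLM on behalf of a user who cannot see the source document.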

How permissions and time sensitivity overlap to create different use cases

Some companies have tried to solve this problem before. Glean, for example, believes that powering RAG is its future. However, with a traditional enterprise sales motion, it is hampered from supporting the vast majority of new applications.

The product positioning of Glean, and of enterprise search players like it, means that developers are blocked from using the product to solve RAG problems. These products cannot be used as components in developers' applications because, to function, they require the end user to be an enterprise search customer. Permission Aware Indexes that aren’t easily created, shared and accessed are of less value to developers.

There needs to be an easy way to build a Permission Aware Shareable Index and connect it to applications. There needs to be a source of truth where applications can go to find these indexes.

Enter: Redactive.

Redactive looks to build these Permission Aware Shareable Indexes whenever they're needed. No enterprise sale required. If you are building an application and you need to connect to customers' data, Redactive will either point you to the existing Permission Aware Shareable Index or start indexing the customer information for you.

By embracing indexes that are both permission-aware and shareable, we're opening up new possibilities for developers. This approach lets companies integrate with the latest in AI technology, from agents and chatbots to sophisticated automation and AI applications. Our commitment is to facilitate secure and efficient connections between B2B SaaS tools and AI applications, so that every interaction is backed by permissioned data. At Redactive, we're enthusiastic about the future of AI application development and the role permissioned data plays in this landscape.