Demo: Pgvector

Terra firma: playing at scale

My last demo was impressive, but pitiable in a lot of ways

For the past year or so, I’ve been thinking about how people can benefit from LLMs, and I’ve been noodling out a design for sharing context in an interesting way with friends, coworkers, and customers. But I’ve hardly touched any code that actually does anything interesting with an LLM or any other typically AI-adjacent construct.

But just before that, I did code up a quick demo that showed off a simple RAG pipeline. It worked remarkably well, but it was dead simple: a lightweight model, a Chroma vector store, and some custom chunking code.
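For flavor, here’s roughly the shape of that pipeline, condensed into a sketch. The file name, chunking strategy, and query are placeholders, and Chroma’s default embedding function stands in for whatever lightweight model the real demo used.

```python
# A condensed sketch of that demo's shape, not the demo itself.
import chromadb

client = chromadb.Client()  # in-memory store, gone when the process exits
collection = client.create_collection("demo_docs")

# "Custom chunking": just split on blank lines for this sketch.
text = open("notes.txt").read()
chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
collection.add(
    documents=chunks,
    ids=[f"chunk-{i}" for i in range(len(chunks))],
)

# Retrieval: pull the closest chunks for a question.
results = collection.query(
    query_texts=["How do I rotate the API key?"],
    n_results=3,
)
for doc in results["documents"][0]:
    print(doc)
```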

A few weeks later, I built something similar at work to mine Basecamp conversations for support information. Again, though it was just a demo, the results were pretty badass.

There were some pretty obvious scaling limitations:

Scoping out the future

Because I know I’ll be crossing these bridges at some point, I’ve been eyeing a bunch of answers to the demo’s shortcomings. PostgreSQL is an easy choice, if it works, since I’ve been using it for years. Caching the model is an obvious upgrade too.
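To make that concrete, here’s a minimal sketch of what the Postgres side might look like, assuming the pgvector extension is installed in the database and its Python adapter is available. The table name, dimensions, and embedding model are stand-ins; loading the model once at startup is the “caching” upgrade.

```python
# A minimal sketch, assuming pgvector is installed in Postgres and the
# pgvector Python adapter is available. Names and dimensions are stand-ins.
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

# The "caching" upgrade: load the embedding model once, not per request.
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

conn = psycopg.connect("dbname=ragdemo", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # lets us pass numpy arrays as vector parameters
conn.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id bigserial PRIMARY KEY,
        content text NOT NULL,
        embedding vector(384)
    )
""")

def index_chunk(text: str) -> None:
    conn.execute(
        "INSERT INTO chunks (content, embedding) VALUES (%s, %s)",
        (text, model.encode(text)),
    )

def nearest(question: str, k: int = 5) -> list[str]:
    # <=> is pgvector's cosine-distance operator.
    rows = conn.execute(
        "SELECT content FROM chunks ORDER BY embedding <=> %s LIMIT %s",
        (model.encode(question), k),
    ).fetchall()
    return [r[0] for r in rows]
```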

Docling was a bit of an unknown, but it performed admirably, as did vLLM in a Docker container once I’d upgraded my drivers to the 580 series.
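Both pieces are easy to poke at from Python. A rough sketch, assuming the vLLM container is serving its OpenAI-compatible API on localhost:8000; the file path and model name are placeholders.

```python
# A rough sketch of both pieces. Assumes vLLM is serving its
# OpenAI-compatible API on localhost:8000; file and model are placeholders.
from docling.document_converter import DocumentConverter
from openai import OpenAI

# Docling: turn a source document into markdown, ready for chunking.
converter = DocumentConverter()
result = converter.convert("handbook.pdf")
markdown = result.document.export_to_markdown()

# vLLM: talk to the container through the OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
reply = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": f"Summarize this:\n\n{markdown[:4000]}"}],
)
print(reply.choices[0].message.content)
```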

This demo

Demo: Pgvector is simply a proof-of-concept that shows off the same sort of RAG pipeline and query solution I’d built in the past, but with some enhancements:

Reflections

Closing thoughts

There were no “Eureka!” moments with this demo, but it was pretty easy to get all the moving parts in place and working, and the future is bright:

What’s next?


Additional Notes

THERE ARE NO TESTS!! HERE BE DRAGONS! RUN AWAY!