HTAP databases are cool, but let's go further - part 4 of 4
Introduction
So far we have discussed the HTAP database category at length, and I gave a bunch of pseudocode to make it accessible enough for you to build your own version. But I want to go further and try some weird things, in the hope that we find something interesting. So what about incorporating vector DBs into our designs, swapping out the Row or Column store in one of these designs and replacing it with a Vector store? I think experimentation is the root of invention, so I hope you will join in on the fun. But first, let's start with one more HTAP database.
The above is a Primary Column Store with a Delta Row Store, where each component does as its name suggests: the column store keeps data in the format suited to analytics, and the delta row store keeps data in the format suited to transactions. Very nice. This is the typical architecture behind the famous HyPer database and SAP HANA. These are also HTAP databases, and they find application in fraud detection because they provide high data freshness. Note, however, that scalability could be better for this design, which is its main drawback. So what about that weird design I spoke about earlier?
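To make the split concrete, here is a minimal Python sketch of the idea, not how HyPer or SAP HANA actually implement it. The class name, the two-column schema (id, amount) and the method names are all assumptions made up for illustration. Writes land in a row-format delta, analytical scans read the columns plus the delta so they stay fresh, and a merge step folds the delta back into the columns.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnStoreWithDelta:
    """Toy primary column store with a row-format delta (hypothetical names)."""

    # Main store: one list per column, suited to analytical scans.
    columns: dict = field(default_factory=lambda: {"id": [], "amount": []})
    # Delta store: row id -> row, absorbs transactional writes.
    delta: dict = field(default_factory=dict)

    def upsert(self, row_id, amount):
        # Transactional path: a cheap row-format write into the delta.
        self.delta[row_id] = {"id": row_id, "amount": amount}

    def get(self, row_id):
        # Point lookup: the delta holds the freshest version, so check it first.
        if row_id in self.delta:
            return self.delta[row_id]
        if row_id in self.columns["id"]:
            i = self.columns["id"].index(row_id)
            return {"id": row_id, "amount": self.columns["amount"][i]}
        return None

    def total_amount(self):
        # Analytical path: scan the column, skip rows superseded in the delta,
        # then add the delta rows so the result reflects the latest writes.
        merged = sum(
            amount
            for row_id, amount in zip(self.columns["id"], self.columns["amount"])
            if row_id not in self.delta
        )
        return merged + sum(row["amount"] for row in self.delta.values())

    def merge(self):
        # Background step: fold the delta into the column store and clear it.
        position = {row_id: i for i, row_id in enumerate(self.columns["id"])}
        for row_id, row in self.delta.items():
            if row_id in position:
                self.columns["amount"][position[row_id]] = row["amount"]
            else:
                self.columns["id"].append(row_id)
                self.columns["amount"].append(row["amount"])
        self.delta.clear()
```

In this toy version every analytical query pays for reading the delta as well, and the merge is a single stop-the-world step. A real system would have to schedule that merge carefully, which hints at where the scalability cost of this design comes from.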
Well, perhaps it's not so weird after all. It's just a Row Store on disk, using io_uring for improved async IO, and a Vector Store that gets vector data passed to it after an embedding has been made from the transactional data. Note that the OLTP store and the Vector DB share the same memory address space, with the Row Store on disk and the Vector Store in a Grid Storage memory unit. We can assume here that freshness is high, and that we might pay scalability penalties for it. Also note how we use kernel-space threads to do our work instead of threading in user space.
The point is not that this is a correct or optimal design, but rather to play around so we can see what might be possible with a little imagination and some architecture. I will not do the pseudocode this time, as I have already shown you how you might do that for any system you encounter, and this system is easier to model than the previous ones you have seen. But I wish to mention that with just a little bit of drawing and some modelling we can perhaps design the next database category for the next 10 to 20 years. Perhaps I am too optimistic, but I err on the side of the dreamers who want to mold the world into the image I wish to see!
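Full pseudocode aside, the data flow of this design fits in a few lines of Python. This is only a sketch under loud assumptions: both stores are plain in-memory dicts, io_uring and kernel-space threading are not modelled at all, and embed() is a made-up stand-in for a real embedding model. The field names amount and items are hypothetical too. What it does show is the high-freshness property: every transactional insert also produces a vector in the same process, so a similarity query sees the row immediately.

```python
import math


def embed(row):
    # Hypothetical stand-in for a real embedding model: a tiny fixed-size
    # vector built from two numeric fields of the row.
    return [float(row["amount"]), float(row["items"])]


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class RowPlusVectorStore:
    """Toy row store and vector store living in one process (names assumed)."""

    def __init__(self):
        self.rows = {}     # stands in for the row store on disk
        self.vectors = {}  # stands in for the in-memory vector store

    def insert(self, row_id, row):
        # Transactional write, followed straight away by the embedding step.
        self.rows[row_id] = row
        self.vectors[row_id] = embed(row)

    def nearest(self, query_row, k=1):
        # Brute-force similarity search over every stored vector.
        q = embed(query_row)
        ranked = sorted(
            self.vectors,
            key=lambda row_id: cosine(q, self.vectors[row_id]),
            reverse=True,
        )
        return ranked[:k]


store = RowPlusVectorStore()
store.insert(1, {"amount": 100, "items": 1})
store.insert(2, {"amount": 10, "items": 10})
store.insert(3, {"amount": 1, "items": 100})
print(store.nearest({"amount": 90, "items": 2}, k=2))
```

Because the embedding happens inline with the write, freshness is as high as it gets, but every insert now also pays the embedding cost, which is one plausible source of the scalability penalty mentioned above.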
Analysis and Final thoughts
I hope you have learned something from this blog series. It was done in four parts, partly because it covers four major architectures and partly because it's too much for the reader to digest in a single sitting. Either way, I hope you share this with anyone who could find it useful.
And finally, for my shameless self-promotion: if you are interested in these types of databases, I am building a database called KestrelDB. Please check it out! It's written mostly in Rust. If you want to learn about databases in general, follow me on LinkedIn and check out my blogs. This blog space is my contribution to my favourite topic - Concurrency!
I should also mention that KestrelDB is no longer an HTAP database: I got feedback that it's not ideal to build an HTAP, as it's not as useful as it first appears. Thus I am investigating various other database architectures, with the aim of creating an innovative design that can be useful to the community. If you have any tips, let me know. Until next time, Happy Databasing and thank you for reading!! :)