Posts

Showing posts from February, 2025

HTAP databases are cool but lets go further - part 4 of 4

Introduction So far we have discussed the HTAP database category at length, and I gave a bunch of pseudocode to make it accessible enough to build your own version. But I want to go further and try some weird things, and perhaps we can find something interesting. So what about incorporating vector DBs into our designs, swapping out the Row or Column store in one of these designs and replacing it with a Vector type? I think experimentation is the root of invention, so I hope you will join in on the fun. But first, let's start with one more HTAP database. The above is a Primary Column store with a Delta Row store, where each component does as the name suggests: the column store holds data in a format suited to analytics and the row store in a format suited to transactions. Very nice. This is the typical architecture used by the famous HyPer database and SAP HANA. These are also HTAP-type DBs and find application in fraud detection because they provide high data freshness, but note the sca...

HTAP database - Distributed column storage - Part 3 of 4

Introduction This is the third in a 4-part series on HTAP (Hybrid Transaction and Analytical Processing) databases. In the previous blog post in this series I described the architecture known as "Distributed row store with a replicated column store" and discussed a simple design of such a system and the concurrency it would require. If you like distributed systems, go check that one out as well. I will now describe a separate system architecture for HTAP known as "Distributed column storage with Row store on a disk"; see the image below for the architecture. This blog will cover the strengths and weaknesses of this architecture, some notable implementations in production systems, and a pseudocode model in fake PlusCal to describe what I think it should look like. Again, this is my mental model; you are encouraged to make your own, and mine is purposely high level. As an aside, PlusCal is a math-based language f...

HTAP databases - lets get distributed - Part 2 of 4

Introduction This is the second in a 4-part series on HTAP (Hybrid Transaction and Analytical Processing) databases. In the previous blog post in this series I described the architecture known as "Primary-row store with an in-memory column store" and discussed a simple design of such a system and the concurrency it would require. I will now describe a separate system architecture for HTAP known as "Distributed row store with a replicated column store"; see the image below for the architecture. This blog will cover the strengths and weaknesses of this architecture, some notable implementations in production systems, and a pseudocode model in fake PlusCal to describe what I think it should look like. Again, this is my mental model; you are encouraged to make your own, and mine is purposely high level. As an aside, PlusCal is a math-based language for describing concurrent systems. I use it as it's more genera...

HTAP Databases and the processes that keep them going - Part 1 of 4

Introduction This is the first in a 4-part series on HTAP (Hybrid Transaction and Analytical Processing) databases. By now you know I am nuts about concurrency and its use in OSs, DBs, calculations and everything in between. I will cover a few strengths and weaknesses of these DB systems, some notable implementations in production systems, and then write a fake version of PlusCal to describe what I think an abstract code implementation should look like. Note that I use math because it's the most powerful language I know, and PlusCal because it's the best tool for describing these types of systems where concurrency is everywhere. So let's begin with a drawing of the Primary-row store with an in-memory column store. Please note this diagram is adapted from the paper "HTAP databases: A Survey", by Zhang et al. 2024. The contents of this blog are inspired by insights from this brilliant paper as well. However, I will not go into the depth that the paper provides. The reader ...