Concurrency Corner

Posts

HTAP database - Distributed column storage - Part 3 of 4

February 08, 2025

Introduction This is the 3rd in a 4-part series on HTAP (Hybrid Transaction and Analytical Processing) databases. In the previous blog post ( link opens in separate window ) in this series I described the architecture known as "Distributed row store with a replicated column store" and discussed a simple design of such a system and the concurrency that would be required when designing it. If you like distributed systems go check that one out as well. I will now describe a separate system architecture for HTAP known as "Distributed column storage with Row store on a disk" , see the image below for the architecture. This blog will cover weaknesses and strengths in this architecture, some notable implementations in production systems and some pseudocode model in fake Pluscal to describe what I think it should look like. Again this is my mental model, you are encouraged to make your own and mine is purposely high level. As an aside, Pluscal is a math based language f...

HTAP databases - lets get distributed - Part 2 of 4

February 05, 2025

Introduction This is the second in a 4-part series on HTAP (Hybrid Transaction and Analytical Processing) databases. In the previous blog post ( link opens in separate window ) in this series I described the architecture known as "Primary-row store with an in-memory column store" and discussed a simple design of such a system and the concurrency that would be required when designing it. I will now describe a separate system architecture for HTAP known as "Distributed row store with a replicated column store" , see the image below for the architecture. This blog will cover weaknesses and strengths in this architecture, some notable implemementations in production systems and some pseudocode model in fake Pluscal to describe what I think it should look like. Again this is my mental model, you are encouraged to make your own and mine is purposely high level. As an aside, Pluscal is a math based language for describing concurrent systems. I use it as its more genera...

HTAP Databases and the processes that keep them going - Part 1 of 4

February 02, 2025

Introduction This is the first in a 4-part series on HTAP (Hybrid Transaction and Analytical Processing) databases. By now you know I am nuts about concurrency and its use in OSs, DBs, calculations and everything in between. I will cover a few weaknesses and strengths in these DB systems, some notable implemementations in production systems and then write some fake version of Pluscal to describe what I think it should look like in an abstract code implementation. Note I use math as its the most powerful language I know and pluscal as its the best tool for describing these types of systems where concurrency is everywhere. So lets begin with a drawing of the Primary-row store with an in-memory column store . Please note this diagram is adapted from the paper "HTAP databases: A Survey", by Zhang et al. 2024. The contents of this blog are inspired by insights from this brilliant paper as well. However I will not go into the depth that the paper provides. The reader ...

Rate Limiters are everywhere: Distributed and Local

January 19, 2025

What is a rate limiter The Rate Limiter (RL) is a middleware (think Redis) or some software strategy to limit access of clients to resources that are limited or for which we want to ensure that fair usage and access is encouraged. In the context of websites the approach also provides security as it limits access to adversarial clients who wish to abuse the system. This approach also finds use in the context of databases - which I will give details on in this post. Note: adversaries are a superset of the user set (they can do what users do plus other things we dont normally want to allow) who behave badly, and could be external actors to the system or registered members. Rate Limiters in Distributed Systems The poster child for rate limiting - in my opinion - is Redis as it is the first thing that comes to mind from my background and experience. The reason is that they have made the use of rate limiting simple via their api. You should check out Redis if you have not alre...

Cache and Buffer Pool Manager

January 12, 2025

The present post is about Caches and the Buffer Pool Manager for a database named SparrowDB that I am building. The programming language is Rust. The DB is a SQL relational database I wish to take from scratch to distributed. If you wish to come along for the ride, I will post weekly on my thoughts, struggles and completed work. Disclaimer: A lot of what I know and some parts of this blog comes from lessons by Andy Pavlo from CMU who made a youtube course, and I highly recommend you learn from him if you desire to work with database internals. Cache Policies : There are several types of caches we could consider eg. course-grained and fine-grained, but we will only consider the former category. Furthermore we will only use LRU-K in this project, but I wish to give an honourable mention to frequency based cache policies such as Q2 and LFU. The LRU-K is an improvement on the LRU policy as it avoids thrashing in the database cache where we evict a page only to read it again in t...

Workers - what can they do and what do they look like? Lets make a model in Go.

November 04, 2024

Today I want to talk about work in software. So naturally/unnaturally I gravitate towards ... orchestrators, for reasons that will become clear shortly. Borg and K8's come to mind. What do these have in common and what do they contain that makes them indispensable to modern computing. I think its the ability to make distributed systems so easy to deploy (K8s). Perhaps also that it hides much of the complexity of failures, and writing software to handle these cases. In my opinion what I love about them is the concept of work. It's the same reason I like the MapReduce system, and the concept of doing work as a whole resonates with me. No this isn't an AI assistant writing for me, I use the word resonate because I like that word. So in this post I will model a Worker, and what does that actually mean for the system, ie. What do Workers do? Manager (M) , Workers (W) and KVstores (KV) are the system we will think about today. Lets start with the following: Worker comes to work...

Search This Blog