Rate Limiters are everywhere: Distributed and Local

What is a rate limiter?

A Rate Limiter (RL) is a piece of middleware (think Redis) or a software strategy that limits clients' access to resources that are scarce, or for which we want to encourage fair usage and access. In the context of websites the approach also provides security, as it limits access by adversarial clients who wish to abuse the system. The approach also finds use in databases, which I will detail later in this post. Note: adversaries are a superset of the user set (they can do what users do, plus other things we don't normally want to allow) who behave badly; they could be actors external to the system or registered members.

Rate Limiters in Distributed Systems

The poster child for rate limiting, in my opinion, is Redis: it is the first thing that comes to mind from my own background and experience, because its API makes rate limiting simple to use. You should check out Redis if you have not already.

Why should you care about RL?

In the context of websites and distributed systems, security is paramount, and an approach is needed to prohibit or limit access by adversaries who wish to bring the system down or abuse a feature of it, to the detriment of other users. Some of the security issues that are mitigated with the help of, among other tools, RL are given below:

  • DDoS attacks - a primary attack on online systems. Limiting access is an effective mitigation strategy.

  • Data scraping - free access to data has been an annoying thorn in the side of many companies who wish to sell that data to users or developers. By limiting the amount of data accessible from a single IP address, the user/developer is pushed toward paying for the premium data access experience.
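The per-IP limiting just described can be sketched as a simple fixed-window counter: each address gets a fresh budget of requests every time window. This is a minimal illustration of the idea, not Redis's actual implementation; the window length and request cap are assumptions I picked for the example.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Fixed-window counter: each IP gets at most `max_requests`
/// per `window` of wall-clock time. (Illustrative sketch only.)
struct IpRateLimiter {
    max_requests: u32,
    window: Duration,
    // ip -> (start of its current window, requests seen in that window)
    counters: HashMap<String, (Instant, u32)>,
}

impl IpRateLimiter {
    fn new(max_requests: u32, window: Duration) -> Self {
        Self { max_requests, window, counters: HashMap::new() }
    }

    /// Returns true if the request is allowed, false if rate-limited.
    fn allow(&mut self, ip: &str) -> bool {
        let now = Instant::now();
        let entry = self.counters.entry(ip.to_string()).or_insert((now, 0));
        if now.duration_since(entry.0) >= self.window {
            *entry = (now, 0); // window expired: start a fresh one
        }
        if entry.1 < self.max_requests {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}
```

With a cap of 3 requests per window, a fourth call to `allow` for the same IP inside the window is rejected, while other IPs keep their own budgets.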

What types of RL exist?

  • User-based - this strategy tries to track individual users; when a user shows adversarial behaviour, restrictions are applied to that user to limit access and discourage abuse of the system. The problem, of course, is that tracking a user is hard: it requires recognising the same user across different access sessions, so this RL strategy is not without issues.

  • Application-based - here the system needs to know what type of application is making requests. This strategy targets data scrapers, which use the application in a recognisable way, and it can be quite effective against applications that are obviously not registered users of the system. The problem is that a registered user could still be an adversary, so how does the system deal with that case? Of course it can simply limit access, but the user won't be happy. The point is that this strategy has issues as well.
  • Some honorable mentions: the IP-based, Leaky Bucket, Token Bucket, and Fixed Window counter strategies, all of which are valid solutions to the RL problem. A full analysis of these can be found on the Redis webpage for those interested.
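Of those honorable mentions, the Token Bucket is perhaps the easiest to picture: a bucket of some capacity refills at a fixed rate, each request spends one token, and an empty bucket means the request is rejected (or delayed). A minimal sketch of that idea, with the capacity and refill rate as parameters I chose for illustration:

```rust
use std::time::Instant;

/// Token bucket: refills `rate` tokens per second, up to `capacity`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate: f64, // tokens added per second
    last_refill: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, rate: f64) -> Self {
        // Start full, so a burst up to `capacity` is allowed immediately.
        Self { capacity, tokens: capacity, rate, last_refill: Instant::now() }
    }

    /// Try to spend one token; returns false when the bucket is empty.
    fn try_acquire(&mut self) -> bool {
        // Lazily add the tokens accrued since the last call, capped at capacity.
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

The bucket naturally allows short bursts (up to its capacity) while enforcing the refill rate as the long-run average, which is what distinguishes it from a fixed-window counter.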



Rate Limiters in Local Systems - in TiKV and SparrowDB

RL strategies (though perhaps not the exact same ones as in distributed systems) are equally effective in systems that run on a single machine and have nothing to do with adversaries abusing the system. A legitimate case for RL arises in databases where certain operations are more important than others. We might wish to limit an operation's access to resources when it is of low priority to the application at a given moment; conversely, high-priority operations should be given more generous access to those resources.

Here we go with the database talk again... Say we have a system that serves reads for analytic purposes and writes for transaction purposes, and it is under load from many analytic accesses due to some system requirement. Which operation is more important? In this case the analytic operation takes priority, so we should flag within the system that analytic operations need more resources (think faster speeds, without throttling their IO) to perform their work. The same approach applies to a write-heavy system under load for which analytics is not currently a priority: flag the write operations as higher priority and ensure they get more resources (do not limit their IO, so they complete faster).
Ok, with that out of the way, what would this look like in code? I will give pseudocode here; the actual implementation, in Rust, can be seen here at TiKV's GitHub repository. I only supply the parts relevant to the discussion, expressed in English/fake TLA+ so that we can describe what we want without the details, i.e. I am being deliberately abstract here.

    
    
type RateLimiter {
    mode ofType IoRateLimitMode,
    map_of_priorities ofType [TYPE_OF_IO -> Nat], // a mapping from type of operation to a natural number
    throughput_limiter ofType RateLimiterImplementation,
}

impl RateLimiter {
    init(mode: IoRateLimitMode) { // construct and set priority on the operations
        for (givenOp in SetOfOperations) {
            MAP (the operation to its priority score, a number in the natural number set)
            USE (the rate limiter to limit the rate of operations based on the map and the givenOp above)
        }
    }

    // Action to set or change the priority of a task on the fly
    pub fn set_rate_limit_for_operation(op: TYPE_OF_IO, rate: Nat) {
        SET (this object's rate limit for op, ensuring the next such operation runs at the correct rate)
    }
}
    
    

Please note that the present pseudocode does not begin to describe all the details that can be learned from the original at TiKV; I am distilling it for the busy reader who wants to glean some information for their own project.


In the rate limiter definition above we declare the rate limiter mode as a variable, define a mapping from IO types to the set of natural numbers, and use a predefined rate limiter implementation for the specific type of RL we wish to perform. The constructor (init in this case) builds the RL, and we also define a setter method that can change the rate limit for a specific operation on the fly. An additional detail not given here is, of course, a getter that lets us read the RL properties of a specific operation at runtime. At this level of detail it does not matter which language you program in; the intent is clear.
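Rendered in actual Rust, that shape (including the getter just mentioned) might look like the following. To be clear, this is my own compressed sketch rather than TiKV's real code: the `IoType` variants and the bytes-per-second budgets are invented for illustration, and the real implementation layers a throughput limiter underneath this map.

```rust
use std::collections::HashMap;

/// Kinds of IO the engine issues; these variants are illustrative only.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum IoType {
    ForegroundWrite,
    AnalyticRead,
    Compaction,
}

/// Maps each IO type to a bytes-per-second budget (its "priority").
struct RateLimiter {
    limits: HashMap<IoType, u64>, // io type -> allowed bytes/sec
}

impl RateLimiter {
    /// Constructor: assign a default budget to every known IO type.
    fn new() -> Self {
        let mut limits = HashMap::new();
        limits.insert(IoType::ForegroundWrite, 100_000_000);
        limits.insert(IoType::AnalyticRead, 50_000_000);
        limits.insert(IoType::Compaction, 10_000_000);
        Self { limits }
    }

    /// Setter: change one operation's budget on the fly.
    fn set_rate_limit_for_operation(&mut self, io: IoType, bytes_per_sec: u64) {
        self.limits.insert(io, bytes_per_sec);
    }

    /// Getter: read the current budget for an operation back at runtime.
    fn rate_limit_for_operation(&self, io: IoType) -> Option<u64> {
        self.limits.get(&io).copied()
    }
}
```

When the workload shifts, a supervisor task can call the setter to shrink, say, the compaction budget and grow the foreground one, which is exactly the "flag the priority operation" behaviour described above.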



As an aside, note that the same approach used by TiKV is affectionately adopted in the SparrowDB system that I am building. I chose it because it makes a lot of sense for the kinds of operations I want SparrowDB to perform. You can find the code for SparrowDB's implementation of rate limiting at the repository here, under the file system implementation. I also want to take this opportunity to thank the Redis docs team, who always provide great documentation in their glossary and guides. Go check them out!
