
Rate Limiter

A Rate Limiter caps how many requests a client can make inside a time window. RateLimiter is algorithm-agnostic. Pass an algorithm config to choose semantics. Everything else (API, RateLimitResult, backend registry, fail_open) is shared.

Why

  • Protect services from overload and abuse.
  • Enforce fair usage across clients.
  • Produce HTTP 429 responses with IETF RateLimit-* or legacy X-RateLimit-* headers.

Construction

For day-to-day Python code, use the factory classmethods. They keep the call site explicit and short:

from grelmicro.resilience import RateLimiter

auth_limiter = RateLimiter.gcra("auth", limit=5, window=60)
api_limiter = RateLimiter.token_bucket(
    "api",
    capacity=100,
    refill_rate=10,
)

Use RateLimiter.from_config(name, config) when the algorithm config already comes from a settings tree, YAML, or another declarative source.

from grelmicro.resilience import GCRAConfig, RateLimiter

cfg = GCRAConfig(limit=5, window=60)
limiter = RateLimiter.from_config("auth", cfg)

RateLimiter intentionally does not flatten both algorithms into one generic kwargs constructor. Token bucket and GCRA have different parameter vocabularies, and keeping one explicit entry point per behaviour makes the public API easier to read.

Choosing an algorithm

Pick the algorithm whose behaviour matches how operators describe the limit in runbooks and API docs. Both algorithms share the same Python API, backends, and RateLimitResult shape, so you can switch later.

Decision guide

  1. Are you throttling an HTTP API with RateLimit-* or X-RateLimit-* headers? Use GCRAConfig. Its sliding-window model matches the IETF RateLimit headers directly and produces precise limit, remaining, and reset_after values.
  2. Do you want "allow a burst of N, then 1 per second sustained"? Use TokenBucketConfig. The capacity and refill_rate parameters describe exactly that.
  3. Does a client need to send occasional spikes above the average rate? Use TokenBucketConfig. The capacity absorbs the spike. GCRA can allow bursts too, but the configuration is less direct.
  4. Did you search for "leaky bucket"? Use GCRAConfig. It is the leaky-bucket-as-meter formulation.
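GCRA's sliding-window behaviour (and why it answers the "leaky bucket" search) can be illustrated with the classic theoretical-arrival-time (TAT) formulation. The sketch below is a minimal illustration of the algorithm's semantics, not grelmicro's implementation; the class name and the explicit `now` parameter are ours.

```python
# Minimal GCRA (theoretical arrival time) sketch -- illustrates the
# sliding-window semantics of GCRAConfig(limit, window); not grelmicro's code.
class GCRASketch:
    def __init__(self, limit: int, window: float) -> None:
        self.interval = window / limit   # emission interval per request
        self.window = window
        self.tat: dict[str, float] = {}  # theoretical arrival time per key

    def acquire(self, key: str, now: float) -> bool:
        tat = max(self.tat.get(key, now), now)
        # Allow if the request fits inside the burst tolerance (the window).
        if tat - now <= self.window - self.interval:
            self.tat[key] = tat + self.interval
            return True
        return False


g = GCRASketch(limit=5, window=60)
# A burst of 5 at t=0 is allowed; the 6th is rejected.
results = [g.acquire("ip", now=0.0) for _ in range(6)]
print(results)  # [True, True, True, True, True, False]
# After window / limit = 12 seconds, one slot frees up again.
print(g.acquire("ip", now=12.0))  # True
```

Because the stored state is a single timestamp per key, the window "slides" continuously instead of resetting at fixed boundaries.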

Side-by-side

|                  | GCRAConfig | TokenBucketConfig |
|------------------|------------|-------------------|
| Mental model     | "N requests per sliding T-second window" | "A bucket holding N tokens that refills at R tokens/sec" |
| Parameters       | limit, window | capacity, refill_rate |
| Burst behaviour  | Up to limit requests if the window is empty | Up to capacity if the bucket is full |
| Sustained rate   | limit / window requests per second | refill_rate tokens per second |
| HTTP header fit  | Strong. reset_after is a true window boundary and maps directly to RateLimit-Reset. | Workable. retry_after is the time until the next token (continuous refill), not a window reset. |

Performance

Both algorithms run in O(1) per operation. End-to-end latency is dominated by the backend: a Redis round-trip costs far more than the algorithm itself. Per-key memory on the Memory backend differs between the two algorithms by roughly 15 MB per million keys. Choose based on behaviour, not compute cost.

Worked scenarios

  • "Limit each user to 100 API calls per minute." Use GCRAConfig(limit=100, window=60). The sliding window matches the natural description, and RateLimitResult.reset_after feeds directly into RateLimit-Reset.
  • "Allow a burst of 20 uploads, then 2 per second." Use TokenBucketConfig(capacity=20, refill_rate=2). Each word in the sentence maps to one parameter.
  • "Fair share. Every account gets 1 heavy job per 10 seconds but can queue up to 5." Use TokenBucketConfig(capacity=5, refill_rate=0.1).
  • "Throttle expensive webhook retries. At most 10 per minute per target." Use GCRAConfig(limit=10, window=60).
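The second scenario ("burst of 20 uploads, then 2 per second") can be simulated with a tiny token bucket driven by an explicit clock. This is a behavioural sketch, not grelmicro's implementation; the class name and `now` parameter are ours.

```python
# Tiny token-bucket sketch with an injected clock -- simulates the
# "burst of 20 uploads, then 2 per second" scenario.
class BucketSketch:
    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last = 0.0

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill continuously from elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


b = BucketSketch(capacity=20, refill_rate=2)
burst = [b.try_acquire(now=0.0) for _ in range(21)]
print(sum(burst))          # 20 -- the burst is absorbed, the 21st is denied
print(b.try_acquire(0.5))  # True -- 0.5 s at 2 tokens/s refills one token
```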

Note

There is no separate LeakyBucket algorithm because GCRA is the leaky-bucket-as-meter formulation. Operators searching for "leaky bucket" should use GCRAConfig.

Backend

Load a backend before using RateLimiter. The same backend serves every algorithm.

Install

The Redis backend needs the redis extra: pip install "grelmicro[redis]". See the installation guide for uv and poetry.

Redis

from grelmicro.resilience.redis import RedisRateLimiterBackend

backend = RedisRateLimiterBackend("redis://localhost:6379/0")

Memory

from grelmicro.resilience.memory import MemoryRateLimiterBackend

backend = MemoryRateLimiterBackend()

Warning

Store connection URLs in environment variables or another secrets mechanism rather than hard-coding them as in the example above.

|             | Redis | Memory |
|-------------|-------|--------|
| Use case    | Production | Testing / single-process |
| Multi-node  | Yes | No |
| Persistence | Yes (auto-expiring keys) | No |

The backend compiles the algorithm into a bound strategy at RateLimiter.__init__ through backend.bind(config). Runtime acquire, peek, and reset calls invoke that strategy directly. There is no algorithm dispatch on the request path.
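The bind-once pattern described above can be sketched in a few lines: the backend turns a config into a concrete strategy at construction time, so `acquire` on the request path is a direct call with no per-algorithm branching. All names below are illustrative, not grelmicro's internals; per-key state and refill are omitted for brevity.

```python
# Sketch of backend.bind(config): algorithm dispatch happens once at
# construction, never on the request path. Illustrative names only.
from typing import Protocol


class Strategy(Protocol):
    def __call__(self, key: str, cost: float) -> bool: ...


class MemoryBackendSketch:
    def bind(self, config: dict) -> Strategy:
        if config["algorithm"] == "token_bucket":
            # One shared counter; per-key state and refill omitted for brevity.
            tokens = {"value": float(config["capacity"])}

            def acquire(key: str, cost: float) -> bool:
                if tokens["value"] >= cost:
                    tokens["value"] -= cost
                    return True
                return False

            return acquire
        raise ValueError(f"unknown algorithm: {config['algorithm']}")


class LimiterSketch:
    def __init__(self, backend: MemoryBackendSketch, config: dict) -> None:
        self._strategy = backend.bind(config)  # dispatch happens once, here

    def acquire(self, key: str, cost: float = 1.0) -> bool:
        return self._strategy(key, cost)       # direct call on the hot path


limiter = LimiterSketch(MemoryBackendSketch(), {"algorithm": "token_bucket", "capacity": 2})
print([limiter.acquire("k") for _ in range(3)])  # [True, True, False]
```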

Usage

from grelmicro.resilience import RateLimiter

# GCRA for precise sliding-window API throttling.
auth_limiter = RateLimiter.gcra("auth", limit=5, window=60)

# Token bucket for burst-friendly "N then 1/sec" semantics.
api_limiter = RateLimiter.token_bucket("api", capacity=100, refill_rate=10)


async def login(ip: str) -> None:
    result = await auth_limiter.acquire(key=ip)
    if not result.allowed:
        print(f"Too many attempts, retry after {result.retry_after:.0f}s")
        return
    print(f"Login allowed, {result.remaining} attempts remaining")


async def api_call(user_id: str) -> None:
    # Raises RateLimitExceededError if the bucket is empty
    await api_limiter.acquire_or_raise(key=user_id)
    print("API call allowed")

Result fields

RateLimitResult is the same across algorithms and carries everything needed for HTTP rate limit headers. The HTTP header column shows the IETF RateLimit-* name first and the legacy X-RateLimit-* name second. Pick whichever convention your API already uses.

| Field       | Type  | Description | HTTP header |
|-------------|-------|-------------|-------------|
| allowed     | bool  | Whether the request is permitted | 200 vs 429 status |
| limit       | int   | Total quota (limit for GCRAConfig, int(capacity) for TokenBucketConfig) | RateLimit-Limit / X-RateLimit-Limit |
| remaining   | int   | Remaining requests / tokens | RateLimit-Remaining / X-RateLimit-Remaining |
| retry_after | float | Seconds until the next allowed request | Retry-After |
| reset_after | float | Seconds until the full quota resets | RateLimit-Reset / X-RateLimit-Reset |
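Translating these fields into response headers is mechanical. The sketch below uses a stand-in dataclass with the documented fields (`RateLimitResultSketch` and `to_headers` are our names, not grelmicro's API); the header names follow the table above.

```python
# Map a rate-limit result to HTTP headers. RateLimitResultSketch is a
# stand-in with the documented fields; not grelmicro's actual class.
from dataclasses import dataclass


@dataclass
class RateLimitResultSketch:
    allowed: bool
    limit: int
    remaining: int
    retry_after: float
    reset_after: float


def to_headers(result: RateLimitResultSketch, legacy: bool = False) -> dict[str, str]:
    prefix = "X-RateLimit-" if legacy else "RateLimit-"
    headers = {
        f"{prefix}Limit": str(result.limit),
        f"{prefix}Remaining": str(result.remaining),
        f"{prefix}Reset": str(round(result.reset_after)),
    }
    if not result.allowed:
        # Retry-After has no legacy variant; send it only on rejection.
        headers["Retry-After"] = str(round(result.retry_after))
    return headers


r = RateLimitResultSketch(allowed=False, limit=100, remaining=0,
                          retry_after=3.2, reset_after=42.7)
print(to_headers(r))
# {'RateLimit-Limit': '100', 'RateLimit-Remaining': '0',
#  'RateLimit-Reset': '43', 'Retry-After': '3'}
```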

Weighted requests

Use the cost parameter to consume multiple tokens per request.

# Bulk endpoint costs 10 tokens
result = await api_limiter.acquire(key=user_id, cost=10)

Peek (check without consuming)

Use peek() to inspect current state without consuming tokens.

from grelmicro.resilience import RateLimiter

invite_limiter = RateLimiter.gcra("invite", limit=5, window=3600)


async def is_locked(code: str) -> bool:
    result = await invite_limiter.peek(key=code)
    return not result.allowed

Reset

Use reset() to delete the state for a key, restoring its full quota.

from grelmicro.resilience import RateLimiter

auth_limiter = RateLimiter.gcra("auth", limit=5, window=60)


def verify_password(password: str) -> bool:
    return True


async def login(ip: str, password: str) -> None:
    await auth_limiter.acquire_or_raise(key=ip)

    if verify_password(password):
        # Successful login: clear the failure counter
        await auth_limiter.reset(key=ip)

Fail-open mode

Use fail_open=True when availability matters more than strictness. On backend errors (e.g. Redis down), the rate limiter returns an allowed result instead of raising.

from grelmicro.resilience import RateLimiter

# Non-critical limiter: prefer availability over strictness
limiter = RateLimiter.token_bucket(
    "analytics",
    capacity=100,
    refill_rate=10,
    fail_open=True,
)


def record_event(user_id: str) -> None: ...


async def track_event(user_id: str) -> None:
    # If Redis is down, the event is still tracked
    result = await limiter.acquire(key=user_id)
    if result.allowed:
        record_event(user_id)

Warning

Fail-open mode only catches backend infrastructure errors. Legitimate rate-limit rejections still work normally.
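The distinction can be sketched in a few lines: an infrastructure error on the backend call is swallowed and the request is allowed, while an ordinary `allowed=False` result passes through untouched. This is an illustration of the semantics, not grelmicro's internals.

```python
# Fail-open semantics sketch: backend errors yield "allowed",
# ordinary rejections pass through unchanged. Illustrative only.
def acquire_fail_open(check_backend, fail_open: bool) -> bool:
    try:
        return check_backend()      # normal path: True/False from the algorithm
    except ConnectionError:
        if fail_open:
            return True             # infrastructure error: let the request through
        raise


def backend_down() -> bool:
    raise ConnectionError("redis unreachable")


print(acquire_fail_open(backend_down, fail_open=True))    # True
print(acquire_fail_open(lambda: False, fail_open=True))   # False -- a real rejection
```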

Tip

The rate limiter uses the same backend registry pattern as the synchronization primitives. See Backend Architecture for details.

Standalone MemoryTokenBucket

MemoryTokenBucket is a standalone, synchronous, thread-safe, in-memory token-bucket primitive. Unlike RateLimiter, it is not pluggable and not async. Use it when you need a raw, zero-I/O bucket on a synchronous performance-critical path. It powers grelmicro.log.RateLimitFilter, which is the recommended way to rate-limit log records; call it directly for any other use case.

Usage

from grelmicro.resilience import MemoryTokenBucket

# Sync, thread-safe, zero-I/O primitive.
# Useful for CLI tools, shell helpers, and other sync hot paths
# where the async RateLimiter isn't appropriate.
bucket = MemoryTokenBucket(capacity=5, refill_rate=1)


def handle_event(event_id: str) -> None:
    if not bucket.try_acquire(key=event_id):
        return
    # ... process the event

API

| Method | Description |
|--------|-------------|
| try_acquire(key="", *, cost=1.0) -> bool | Consume cost tokens and return True if allowed. |
| peek(key="") -> float | Current token count (fractional). |
| reset(key="") -> None | Restore key to full capacity. |
| capacity / refill_rate | Read-only configuration. |
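The semantics of that API can be illustrated with a pure-Python sketch using an injectable clock for determinism. This shows only the documented behaviour (per-key state, continuous refill, fractional peek); it is not grelmicro's implementation, which additionally guarantees thread safety.

```python
# Pure-Python sketch of the MemoryTokenBucket semantics documented above.
# The injectable clock keeps the example deterministic; illustrative only.
import time


class TokenBucketSketch:
    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._state: dict[str, tuple[float, float]] = {}  # key -> (tokens, last_ts)

    def _refill(self, key: str) -> float:
        now = self._clock()
        tokens, last = self._state.get(key, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        self._state[key] = (tokens, now)
        return tokens

    def try_acquire(self, key: str = "", *, cost: float = 1.0) -> bool:
        tokens = self._refill(key)
        if tokens >= cost:
            self._state[key] = (tokens - cost, self._state[key][1])
            return True
        return False

    def peek(self, key: str = "") -> float:
        return self._refill(key)          # inspect without consuming

    def reset(self, key: str = "") -> None:
        self._state.pop(key, None)        # next acquire sees a full bucket


bucket = TokenBucketSketch(capacity=5, refill_rate=1, clock=lambda: 0.0)
print([bucket.try_acquire("job") for _ in range(6)])  # five allowed, then denied
print(bucket.peek("job"))  # 0.0
```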