Rate Limiter
A Rate Limiter caps how many requests a client can make inside a time window. RateLimiter is algorithm-agnostic. Pass an algorithm config to choose semantics. Everything else (API, RateLimitResult, backend registry, fail_open) is shared.
Why
- Protect services from overload and abuse.
- Enforce fair usage across clients.
- Produce HTTP 429 responses with RFC 9211 `RateLimit-*` or legacy `X-RateLimit-*` headers.
Construction
For day-to-day Python code, use the factory classmethods. They keep the call site explicit and short:
```python
from grelmicro.resilience import RateLimiter

auth_limiter = RateLimiter.gcra("auth", limit=5, window=60)
api_limiter = RateLimiter.token_bucket(
    "api",
    capacity=100,
    refill_rate=10,
)
```
Use RateLimiter.from_config(name, config) when the algorithm config already comes from a settings tree, YAML, or another declarative source.
```python
from grelmicro.resilience import GCRAConfig, RateLimiter

cfg = GCRAConfig(limit=5, window=60)
limiter = RateLimiter.from_config("auth", cfg)
```
RateLimiter intentionally does not flatten both algorithms into one generic kwargs constructor. Token bucket and GCRA have different parameter vocabularies, and keeping one explicit entry point per behaviour makes the public API easier to read.
Choosing an algorithm
Pick the algorithm whose behaviour matches how operators describe the limit in runbooks and API docs. Both algorithms share the same Python API, backends, and RateLimitResult shape, so you can switch later.
Decision guide
- Are you throttling an HTTP API with `RateLimit-*` or `X-RateLimit-*` headers? Use `GCRAConfig`. Its sliding-window model matches the IETF RateLimit headers directly and produces precise `limit`, `remaining`, and `reset_after` values.
- Do you want "allow a burst of N, then 1 per second sustained"? Use `TokenBucketConfig`. The `capacity` and `refill_rate` parameters describe exactly that.
- Does a client need to send occasional spikes above the average rate? Use `TokenBucketConfig`. The capacity absorbs the spike. GCRA can allow bursts too, but the configuration is less direct.
- Did you search for "leaky bucket"? Use `GCRAConfig`. It is the leaky-bucket-as-meter formulation.
Side-by-side
| | GCRAConfig | TokenBucketConfig |
|---|---|---|
| Mental model | "N requests per sliding T-second window" | "A bucket holding N tokens that refills at R tokens/sec" |
| Parameters | `limit`, `window` | `capacity`, `refill_rate` |
| Burst behaviour | Up to `limit` requests if the window is empty | Up to `capacity` if the bucket is full |
| Sustained rate | `limit / window` requests per second | `refill_rate` tokens per second |
| HTTP header fit | Strong. `reset_after` is a true window boundary and maps directly to `RateLimit-Reset`. | Workable. `retry_after` is the time until the next token (continuous refill), not a window reset. |
Performance
Both algorithms run in O(1) time per operation. End-to-end latency is dominated by the backend: a Redis round-trip costs far more than the algorithm itself. Per-key memory on the Memory backend differs between the two algorithms by about 15 MB per million keys. Choose based on behaviour, not compute cost.
Worked scenarios
- "Limit each user to 100 API calls per minute." Use `GCRAConfig(limit=100, window=60)`. The sliding window matches the natural description, and `RateLimitResult.reset_after` feeds directly into `RateLimit-Reset`.
- "Allow a burst of 20 uploads, then 2 per second." Use `TokenBucketConfig(capacity=20, refill_rate=2)`. Each word in the sentence maps to one parameter.
- "Fair share. Every account gets 1 heavy job per 10 seconds but can queue up to 5." Use `TokenBucketConfig(capacity=5, refill_rate=0.1)`.
- "Throttle expensive webhook retries. At most 10 per minute per target." Use `GCRAConfig(limit=10, window=60)`.
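The arithmetic behind these mappings is worth spelling out. This plain-Python snippet (no grelmicro imports) just restates the scenario numbers:

```python
# "100 API calls per minute": GCRA sustained rate = limit / window.
gcra_sustained = 100 / 60      # roughly 1.67 requests per second

# "1 heavy job per 10 seconds, queue up to 5": refill_rate = 0.1, capacity = 5.
job_interval = 1 / 0.1         # seconds between jobs once the burst is spent

# "Burst of 20 uploads, then 2 per second": capacity = 20, refill_rate = 2.
upload_burst, upload_rate = 20, 2
```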
Note
There is no separate LeakyBucket algorithm because GCRA is the leaky-bucket-as-meter formulation. Operators searching for "leaky bucket" should use GCRAConfig.
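For intuition, here is a minimal, self-contained sketch of the GCRA update rule in its leaky-bucket-as-meter form. It is illustrative only, not grelmicro's implementation; the function name `gcra_allow` is invented for this example.

```python
def gcra_allow(tat: float, now: float, limit: int, window: float) -> tuple[bool, float]:
    """One GCRA step. `tat` is the stored theoretical arrival time for the key.

    Per-key state is a single float, which is why the algorithm is O(1)
    in both time and memory.
    """
    interval = window / limit          # emission interval: one "slot" per request
    tat = max(tat, now)                # an idle key's TAT catches up to now
    if tat - now > window - interval:  # would exceed the burst allowance
        return False, tat
    return True, tat + interval        # consume one slot


# Allow a burst of 5, then refuse the 6th request at t=0.
tat, allowed_count = 0.0, 0
for _ in range(6):
    ok, tat = gcra_allow(tat, now=0.0, limit=5, window=60.0)
    allowed_count += ok
# allowed_count == 5
```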
Backend
Load a backend before using RateLimiter. The same backend serves every algorithm.
Install
The Redis backend needs the redis extra: `pip install "grelmicro[redis]"`. See the installation guide for uv and poetry.
```python
from grelmicro.resilience.redis import RedisRateLimiterBackend

backend = RedisRateLimiterBackend("redis://localhost:6379/0")
```

```python
from grelmicro.resilience.memory import MemoryRateLimiterBackend

backend = MemoryRateLimiterBackend()
```
Warning
Store connection URLs through a proper configuration mechanism, such as environment variables, rather than hard-coding them as in the example above.
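For example, read the URL from the environment. The `REDIS_URL` variable name and the helper function here are just conventions for this sketch, not something grelmicro requires:

```python
import os


def redis_url_from_env(default: str = "redis://localhost:6379/0") -> str:
    # Fall back to a local Redis only for development; production sets REDIS_URL.
    return os.environ.get("REDIS_URL", default)


# backend = RedisRateLimiterBackend(redis_url_from_env())
```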
| | Redis | Memory |
|---|---|---|
| Use case | Production | Testing / single-process |
| Multi-node | Yes | No |
| Persistence | Yes (auto-expiring keys) | No |
The backend compiles the algorithm into a bound strategy at RateLimiter.__init__ through backend.bind(config). Runtime acquire, peek, and reset calls invoke that strategy directly. There is no algorithm dispatch on the request path.
Usage
```python
from grelmicro.resilience import RateLimiter

# GCRA for precise sliding-window API throttling.
auth_limiter = RateLimiter.gcra("auth", limit=5, window=60)

# Token bucket for burst-friendly "N then 1/sec" semantics.
api_limiter = RateLimiter.token_bucket("api", capacity=100, refill_rate=10)


async def login(ip: str) -> None:
    result = await auth_limiter.acquire(key=ip)
    if not result.allowed:
        print(f"Too many attempts, retry after {result.retry_after:.0f}s")
        return
    print(f"Login allowed, {result.remaining} attempts remaining")


async def api_call(user_id: str) -> None:
    # Raises RateLimitExceededError if the bucket is empty
    await api_limiter.acquire_or_raise(key=user_id)
    print("API call allowed")
```
Result fields
RateLimitResult is the same across algorithms and carries everything needed for HTTP rate limit headers. The HTTP header column shows the RFC 9211 name first and the legacy X-RateLimit-* name second. Pick whichever convention your API already uses.
| Field | Type | Description | HTTP Header |
|---|---|---|---|
| `allowed` | `bool` | Whether the request is permitted | 200 vs 429 status |
| `limit` | `int` | Total quota (`limit` for GCRAConfig, `int(capacity)` for TokenBucketConfig) | `RateLimit-Limit` / `X-RateLimit-Limit` |
| `remaining` | `int` | Remaining requests / tokens | `RateLimit-Remaining` / `X-RateLimit-Remaining` |
| `retry_after` | `float` | Seconds until next allowed request | `Retry-After` |
| `reset_after` | `float` | Seconds until full quota resets | `RateLimit-Reset` / `X-RateLimit-Reset` |
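Turning a result into response headers is mechanical. In this sketch the `Result` dataclass is a stand-in for `RateLimitResult` carrying the fields from the table; the real class may differ in details:

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Stand-in for RateLimitResult with the documented fields."""
    allowed: bool
    limit: int
    remaining: int
    retry_after: float
    reset_after: float


def rate_limit_headers(result: Result) -> dict[str, str]:
    # RFC 9211 names; swap in the X-RateLimit-* names for the legacy convention.
    headers = {
        "RateLimit-Limit": str(result.limit),
        "RateLimit-Remaining": str(result.remaining),
        "RateLimit-Reset": f"{result.reset_after:.0f}",
    }
    if not result.allowed:  # only rejected (429) responses need Retry-After
        headers["Retry-After"] = f"{result.retry_after:.0f}"
    return headers
```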
Weighted requests
Use the cost parameter to consume multiple tokens per request.
```python
# Bulk endpoint costs 10 tokens
result = await api_limiter.acquire(key=user_id, cost=10)
```
Peek (check without consuming)
Use peek() to inspect current state without consuming tokens.
```python
from grelmicro.resilience import RateLimiter

invite_limiter = RateLimiter.gcra("invite", limit=5, window=3600)


async def is_locked(code: str) -> bool:
    result = await invite_limiter.peek(key=code)
    return not result.allowed
```
Reset
Use reset() to delete the state for a key, restoring its full quota.
```python
from grelmicro.resilience import RateLimiter

auth_limiter = RateLimiter.gcra("auth", limit=5, window=60)


def verify_password(password: str) -> bool:
    return True


async def login(ip: str, password: str) -> None:
    await auth_limiter.acquire_or_raise(key=ip)
    if verify_password(password):
        # Successful login: clear the failure counter
        await auth_limiter.reset(key=ip)
```
Fail-open mode
Use fail_open=True when availability matters more than strictness. On backend errors (e.g. Redis down), the rate limiter returns an allowed result instead of raising.
```python
from grelmicro.resilience import RateLimiter

# Non-critical limiter: prefer availability over strictness
limiter = RateLimiter.token_bucket(
    "analytics",
    capacity=100,
    refill_rate=10,
    fail_open=True,
)


def record_event(user_id: str) -> None: ...


async def track_event(user_id: str) -> None:
    # If Redis is down, the event is still tracked
    result = await limiter.acquire(key=user_id)
    if result.allowed:
        record_event(user_id)
```
Warning
Fail-open mode only catches backend infrastructure errors. Legitimate rate-limit rejections still work normally.
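Conceptually, fail-open wraps the backend call in a narrow exception handler. This is an illustrative sketch, not grelmicro's actual code; `BackendError` and `check_backend` are invented names:

```python
class BackendError(Exception):
    """Stand-in for an infrastructure error (e.g. a lost Redis connection)."""


def acquire_fail_open(check_backend, *, fail_open: bool) -> tuple[bool, bool]:
    """Return (allowed, degraded). `check_backend()` returns True/False for a
    normal rate-limit decision and raises BackendError on infrastructure failure."""
    try:
        return check_backend(), False
    except BackendError:
        if fail_open:
            return True, True   # degrade gracefully: allow the request
        raise                   # fail-closed: surface the backend error


def broken_backend() -> bool:
    raise BackendError("redis down")
```

Note that a normal `False` decision passes through untouched, which is why legitimate rejections keep working in fail-open mode.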
Tip
The rate limiter uses the same backend registry pattern as the synchronization primitives. See Backend Architecture for details.
Standalone MemoryTokenBucket
MemoryTokenBucket is a standalone, synchronous, thread-safe in-memory token-bucket primitive. Unlike RateLimiter, it is not pluggable and not async. Use it when you need a raw, zero-I/O bucket on a synchronous performance-critical path. It powers grelmicro.log.RateLimitFilter, which is the recommended way to use it for rate-limiting log records. Call it directly for any other use case.
Usage
```python
from grelmicro.resilience import MemoryTokenBucket

# Sync, thread-safe, zero-I/O primitive.
# Useful for CLI tools, shell helpers, and other sync hot paths
# where the async RateLimiter isn't appropriate.
bucket = MemoryTokenBucket(capacity=5, refill_rate=1)


def handle_event(event_id: str) -> None:
    if not bucket.try_acquire(key=event_id):
        return
    # ... process the event
```
API
| Method | Description |
|---|---|
| `try_acquire(key="", *, cost=1.0) -> bool` | Consume `cost` tokens and return `True` if allowed. |
| `peek(key="") -> float` | Current token count (fractional). |
| `reset(key="") -> None` | Restore key to full capacity. |
| `capacity` / `refill_rate` | Read-only configuration. |
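The semantics in the table can be pinned down with a minimal reference implementation. This sketch (`MiniTokenBucket` is an invented name) mirrors the documented behaviour, including per-key buckets, fractional tokens, and lock-based thread safety, but it is not grelmicro's code:

```python
import threading
import time


class MiniTokenBucket:
    """Illustrative per-key token bucket matching the documented semantics."""

    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity, self.refill_rate = capacity, refill_rate
        self._state: dict[str, tuple[float, float]] = {}  # key -> (tokens, last seen)
        self._lock = threading.Lock()

    def _refill(self, key: str, now: float) -> float:
        # Absent keys start full; otherwise add tokens for the elapsed time.
        tokens, last = self._state.get(key, (self.capacity, now))
        return min(self.capacity, tokens + (now - last) * self.refill_rate)

    def try_acquire(self, key: str = "", *, cost: float = 1.0) -> bool:
        with self._lock:
            now = time.monotonic()
            tokens = self._refill(key, now)
            if tokens < cost:
                self._state[key] = (tokens, now)
                return False
            self._state[key] = (tokens - cost, now)
            return True

    def peek(self, key: str = "") -> float:
        with self._lock:
            return self._refill(key, time.monotonic())

    def reset(self, key: str = "") -> None:
        with self._lock:
            self._state.pop(key, None)  # absent key == full capacity
```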