# Retry
A Retry policy repeats a failing call with backoff and jitter. Use it for transient errors that are safe to retry: network blips, brief overloads, deadline-related failures.
## Why
- Survive flaky network paths without changing the call site.
- Apply exponential backoff with jitter so retries do not stampede a recovering dependency.
- Stay safe by default: retries trigger only on the exception classes you allow.
## Usage

```python
import httpx

from grelmicro.resilience import retry


@retry(when=httpx.HTTPError, attempts=3)
async def fetch(client: httpx.AsyncClient, url: str) -> bytes:
    response = await client.get(url)
    response.raise_for_status()
    return response.content


async def main() -> bytes:
    async with httpx.AsyncClient() as client:
        return await fetch(client, "https://example.com")
```
The decorator works on async and sync functions. It auto-detects which kind it wraps.
For inline retries that span multiple statements, use the block form:
```python
import httpx

from grelmicro.resilience import retrying


async def submit(client: httpx.AsyncClient, url: str, payload: dict) -> dict:
    async for attempt in retrying(when=httpx.HTTPError, attempts=3):
        async with attempt:
            response = await client.post(url, json=payload)
            response.raise_for_status()
            return response.json()
    return {}


async def main() -> dict:
    async with httpx.AsyncClient() as client:
        return await submit(client, "https://example.com", {"k": "v"})
```
`when=` is required. There is no default. Pass a `Match` instance, or one of the shorthand forms (an exception class, a tuple of classes, or a predicate callable). See the next section for the full filter DSL.
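To make the shorthand forms concrete, here is an illustrative sketch of how they could all collapse into a single predicate. This is not grelmicro's actual implementation, and `to_predicate` is a hypothetical helper name; the real `Match` DSL also covers return values, while this toy only handles exceptions.

```python
from typing import Callable, Union

# Hypothetical sketch: normalize the `when=` shorthand forms
# (class, tuple of classes, predicate) into one predicate.
WhenShorthand = Union[type, tuple, Callable[[BaseException], bool]]


def to_predicate(when: WhenShorthand) -> Callable[[BaseException], bool]:
    if isinstance(when, type) and issubclass(when, BaseException):
        return lambda exc: isinstance(exc, when)  # single class
    if isinstance(when, tuple):
        return lambda exc: isinstance(exc, when)  # tuple of classes
    if callable(when):
        return when  # already a predicate
    raise TypeError(f"unsupported when= value: {when!r}")


# All three shorthands collapse to the same kind of check:
assert to_predicate(ValueError)(ValueError("x")) is True
assert to_predicate((ValueError, KeyError))(KeyError("k")) is True
assert to_predicate(lambda e: "boom" in str(e))(RuntimeError("boom")) is True
```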
## Filtering outcomes with Match

`Match` is the DSL every resilience strategy uses to decide whether an outcome (an exception OR a return value) should engage the strategy. The `when=` parameter on `Retry` accepts any `Match` instance, plus the bare-class shorthand for the simple case.
### Exception filtering

```python
import httpx

from grelmicro.resilience import Match, Retry

Retry("api", when=Match.exception(httpx.HTTPError))
Retry("api", when=Match.exception(httpx.HTTPError, OSError))
Retry("api", when=Match.exception(lambda e: e.status >= 500))
Retry("api", when=Match.exception_message(contains="timeout"))
Retry("api", when=Match.exception_message(regex=r"^5\d\d "))
Retry("api", when=Match.exception_cause(KeyError))
```
### Result filtering

```python
Retry("polling", when=Match.result(None))
Retry("polling", when=Match.result(False))
Retry("polling", when=Match.result(lambda r: r.status_code >= 500))
```
`Match.result(callable)` always treats the argument as a predicate. To match a function literal exactly, wrap it: `lambda r: r is my_fn`.
### Composition

```python
# OR
Retry("api", when=Match.exception(httpx.HTTPError) | Match.result(None))

# AND
Retry("api", when=Match.exception(httpx.HTTPError) & Match.exception(lambda e: e.status >= 500))

# NOT (one symmetric `not_*` per primitive)
Retry("api", when=Match.not_exception(ValueError))
Retry("api", when=Match.not_result(None))
Retry("api", when=Match.not_exception_message(contains="ok"))
Retry("api", when=Match.not_exception_cause(KeyError))
```
Use `|` for OR and `&` for AND. Each primitive (`exception`, `result`, `exception_message`, `exception_cause`) has a `not_*` twin for the negated form.
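The combinator pattern behind `|`, `&`, and `not_*` can be sketched in a few lines. This is an illustrative toy, not grelmicro's `Match` class: it matches exceptions only, whereas the real DSL operates on full outcomes.

```python
from dataclasses import dataclass
from typing import Callable


# Toy predicate combinator: `|` and `&` build new matchers from old
# ones, and negate() stands in for the not_* twins.
@dataclass(frozen=True)
class Matcher:
    test: Callable[[BaseException], bool]

    def __or__(self, other: "Matcher") -> "Matcher":
        return Matcher(lambda e: self.test(e) or other.test(e))

    def __and__(self, other: "Matcher") -> "Matcher":
        return Matcher(lambda e: self.test(e) and other.test(e))

    def negate(self) -> "Matcher":
        return Matcher(lambda e: not self.test(e))


is_key = Matcher(lambda e: isinstance(e, KeyError))
is_val = Matcher(lambda e: isinstance(e, ValueError))

assert (is_key | is_val).test(ValueError("v"))
assert not (is_key & is_val).test(ValueError("v"))
assert is_key.negate().test(ValueError("v"))
```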
### Worked example

```python
import httpx

from grelmicro.resilience import Match, retry


# Compose exception and result matchers with `|`.
# Retry on transient HTTP errors OR when the response carries a
# server-side soft-fail marker.
@retry(
    when=Match.exception(httpx.HTTPError)
    | Match.result(lambda r: r.headers.get("X-Soft-Fail") == "true"),
    attempts=5,
)
async def fetch(client: httpx.AsyncClient, url: str) -> httpx.Response:
    return await client.get(url)


# Polling-style: retry until the result is no longer None.
@retry(when=Match.result(None), attempts=20)
async def poll_job(client: httpx.AsyncClient, job_id: str) -> dict | None:
    response = await client.get(f"/jobs/{job_id}")
    payload = response.json()
    return payload if payload["status"] == "ready" else None
```
## What never retries

`asyncio.CancelledError`, `KeyboardInterrupt`, and `SystemExit` are `BaseException` subclasses outside `Exception`. They always propagate, regardless of the `Match` you pass. This is required for correct asyncio shutdown.
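The mechanism is simply which base class the retry loop catches. A minimal sketch (not grelmicro's implementation) shows why anything outside `Exception` escapes untouched:

```python
# Toy retry loop: it only catches Exception, so BaseException
# subclasses outside Exception (KeyboardInterrupt, SystemExit,
# asyncio.CancelledError) propagate on the first attempt.
def run_with_retry(fn, attempts=3):
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise


calls = 0


def interrupted():
    global calls
    calls += 1
    raise KeyboardInterrupt  # outside Exception -> never retried


try:
    run_with_retry(interrupted)
except KeyboardInterrupt:
    pass
assert calls == 1  # no retries happened
```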
## Backoff algorithms
Retry ships five algorithms. The default is exponential with full jitter. Pick by purpose.
| Algorithm | Use when |
|---|---|
| `ExponentialBackoff` (default) | Network and HTTP retries. Doubling delay with jitter avoids retry storms. |
| `ConstantBackoff` | Polling-style retries (waiting for a job). Fixed interval is predictable. |
| `LinearBackoff` | Steady, predictable growth without the early-attempt cluster of exponential. |
| `FibonacciBackoff` | Smoother growth than exponential, faster than linear. |
| `RandomBackoff` | Uniform random delay in a fixed range. Maximum spread, no growth. |
The factory classmethods build the right config for you:
```python
import httpx

from grelmicro.resilience import Retry

policy = Retry.exponential("payments", when=httpx.HTTPError, attempts=5)

# NotReady is an application-defined exception raised while polling.
polling = Retry.constant("wait_job", when=NotReady, attempts=20, delay=1.0)
```
### Exponential

The raw wait before retry N is `min(base_delay * 2 ** (N - 1), max_delay)`. It doubles each attempt until it reaches the cap. With the defaults (`base_delay=0.1`, `max_delay=30.0`), the raw wait is 0.1s, 0.2s, 0.4s, 0.8s, 1.6s, ..., capped at 30.0s.

Jitter then transforms each raw wait into the actual sleep (see Jitter below). The actual sleep may be smaller than the raw value.
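The capped-doubling formula above is short enough to verify directly. A sketch using the documented defaults:

```python
# Raw (pre-jitter) exponential wait: doubles each attempt, capped
# at max_delay. Defaults mirror the documented base_delay=0.1,
# max_delay=30.0.
def exponential_raw_wait(n: int, base_delay: float = 0.1, max_delay: float = 30.0) -> float:
    return min(base_delay * 2 ** (n - 1), max_delay)


schedule = [exponential_raw_wait(n) for n in range(1, 6)]
assert schedule == [0.1, 0.2, 0.4, 0.8, 1.6]
assert exponential_raw_wait(20) == 30.0  # cap reached
```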
### Jitter
Jitter is randomness added to the wait so concurrent clients do not retry at the same instant and overwhelm the recovering server.
Pick a mode by how much spread you want. Default: full.
| Jitter | Spread | When to use |
|---|---|---|
| `none` | none | Single client. Never with concurrency. |
| `full` (default) | maximum | The safe default. |
| `equal` | half | When you need predictable timing. |
| `decorrelated` | adaptive | High contention on a shared dependency. |
### Constant

A fixed delay between retries. One field: `delay` (seconds, default `1.0`).
## Configuration
Retry follows the three-paths configuration contract.
### Programmatic

```python
import httpx

from grelmicro.resilience import Retry

policy = Retry.exponential(
    "payments",
    when=httpx.HTTPError,
    attempts=5,
    base_delay=0.2,
    max_delay=10.0,
    jitter="full",
)


@policy
async def call_payments():
    pass
```
### Declarative

```python
import httpx

from grelmicro.resilience import (
    ExponentialBackoff,
    Match,
    Retry,
    RetryConfig,
)

config = RetryConfig(
    attempts=5,
    when=Match.exception(httpx.HTTPError),
    backoff=ExponentialBackoff(base_delay=0.2, max_delay=10.0, jitter="full"),
)
policy = Retry("payments", config=config)
```
### Environmental

Prefix: `GREL_RETRY_{NAME_UPPER}_`
| Env var | Field | Type | Default |
|---|---|---|---|
| `GREL_RETRY_{NAME_UPPER}_ATTEMPTS` | `attempts` | int (>= 1) | `3` |
| `GREL_RETRY_{NAME_UPPER}_WHEN` | `when` | CSV or JSON list of FQN strings (e.g. `httpx.HTTPError`). Coerced to `Match.exception(...)`. Predicate forms cannot come from env. | required |
| `GREL_RETRY_{NAME_UPPER}_BACKOFF` | `backoff` | JSON object with a `kind` field (see below) | `{"kind":"exponential"}` |
The full backoff config is a discriminated Pydantic union, so the env value is parsed as one JSON object. Each algorithm accepts the same fields it takes in code:
| `kind` | Fields |
|---|---|
| `exponential` | `base_delay`, `max_delay`, `jitter` (`none` / `full` / `equal` / `decorrelated`) |
| `constant` | `delay` |
| `linear` | `base_delay`, `max_delay` |
| `fibonacci` | `base_delay`, `max_delay` |
| `random` | `min_delay`, `max_delay` |
```python
import httpx

from grelmicro.resilience import Retry

# Reads the config from environment variables. The backoff field is
# a discriminated union, so pass it as a single JSON object:
#
# - GREL_RETRY_PAYMENTS_ATTEMPTS=5
# - GREL_RETRY_PAYMENTS_WHEN=httpx.HTTPError
# - GREL_RETRY_PAYMENTS_BACKOFF={"kind":"exponential","base_delay":0.2}
policy = Retry("payments", when=httpx.HTTPError)
```
The callable form of `when` cannot come from env. Use the FQN list for env-driven configs.
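Resolving an FQN string such as `httpx.HTTPError` back into the class is a standard `importlib` pattern. A hedged sketch (`resolve_fqn` is a hypothetical helper; grelmicro's real coercion lives in its config layer), demonstrated with a stdlib exception so it runs anywhere:

```python
import importlib


# Hypothetical sketch: turn "pkg.module.Name" into the object itself.
def resolve_fqn(path: str) -> type:
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


# json.JSONDecodeError stands in for httpx.HTTPError here:
assert resolve_fqn("json.JSONDecodeError").__name__ == "JSONDecodeError"
```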
## Composition with Circuit Breaker

Retry and Circuit Breaker compose by intent. When the breaker is OPEN, it raises `CircuitBreakerError`. Pick a narrow `when=` allowlist so the retry loop does not swallow that signal:
```python
import httpx

from grelmicro.resilience import CircuitBreaker, retry

cb = CircuitBreaker("payments")


# A narrow allowlist that excludes CircuitBreakerError. When the
# breaker is open it raises CircuitBreakerError, which is not in
# `when`, so the retry loop aborts immediately.
@retry(when=(httpx.ConnectError, httpx.TimeoutException), attempts=3)
async def call_payments(
    client: httpx.AsyncClient, url: str, payload: dict
) -> dict:
    async with cb:
        response = await client.post(url, json=payload)
        response.raise_for_status()
        return response.json()


async def main() -> dict:
    async with httpx.AsyncClient() as client:
        return await call_payments(client, "https://example.com", {"k": "v"})
```
A broad allowlist (`when=Exception`) would retry through the open breaker. The narrow allowlist lets the breaker do its job.
## Behavior on exhaustion

When `attempts` is exhausted, the underlying exception is re-raised with a PEP 678 note attached:

```python
try:
    await fetch(url)
except httpx.ConnectError as exc:
    print(exc.__notes__)
    # ['retry: 3/3 attempts exhausted in 1.40s (exponential backoff)']
```
Callers catch the underlying error type, unchanged. There is no `RetryError` wrapper class.
## Live reconfiguration

Retry inherits `Reconfigurable[RetryConfig]`. Calling `policy.reconfigure(new_config)` swaps the snapshot for future loops. An in-flight `async for attempt in policy:` keeps its snapshot until it completes. See Live reconfiguration.
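The snapshot semantics can be pictured with a toy policy object (not grelmicro's `Reconfigurable`): each loop reads the config once at entry, so a later `reconfigure()` swap never disturbs a loop already in flight.

```python
import threading

# Toy snapshot-swap sketch: reconfigure() atomically replaces the
# config; a loop that already captured a snapshot keeps using it.
class TogglePolicy:
    def __init__(self, attempts: int):
        self._lock = threading.Lock()
        self._config = {"attempts": attempts}

    def reconfigure(self, attempts: int) -> None:
        with self._lock:
            self._config = {"attempts": attempts}  # swap, never mutate

    def snapshot(self) -> dict:
        with self._lock:
            return self._config  # treated as immutable after the swap


policy = TogglePolicy(attempts=3)
snap = policy.snapshot()        # an in-flight loop captures this
policy.reconfigure(attempts=5)  # future loops see the new config
assert snap["attempts"] == 3    # the in-flight snapshot is untouched
assert policy.snapshot()["attempts"] == 5
```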
## Reference
See the API reference for every option.