API Rate Limits

Each API key is assigned a usage plan with the following default limits:
Limit                  Default
Requests per second    100
Burst                  200 requests
Monthly quota          10,000 requests
These limits are enforced at the API Gateway level, before your request reaches the application. Requests that exceed them receive a 429 response. Limits can be adjusted per API key. Book a call to discuss your needs.
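To stay under the per-second and burst limits on the client side, one option is a small token bucket. The sketch below assumes the gateway's rate and burst limits behave like a token bucket (the page does not say so), and `TokenBucket` and `try_acquire` are hypothetical names chosen for illustration. It does not track the monthly quota.

```python
import time


class TokenBucket:
    """Client-side throttle: refills at `rate` tokens/second, holds at most `burst`."""

    def __init__(self, rate: float = 100.0, burst: float = 200.0, clock=time.monotonic):
        self.rate = rate            # default: 100 requests per second
        self.burst = burst          # default: 200 requests
        self.clock = clock          # injectable so the bucket can be tested
        self.tokens = burst         # start full, so an initial burst is allowed
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Return True and consume one token if a request may be sent now."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never exceeding the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check `try_acquire()` before each request and wait briefly when it returns False. If your key has adjusted limits, pass them as `rate` and `burst`.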

Model Rate Limits

Each model has its own upstream provider limits. These are shared across all Consus Gateway customers and are separate from your API key limits.
Model                    Requests per Minute    Tokens per Minute
claude-3-7-sonnet:il5    125                    500,000
claude-sonnet-4-5:il5    10,000                 5,000,000
gemini-2-5-pro:il5       100                    4,000,000
gemini-2-5-flash:il5     2,000                  4,000,000
Model rate limits are set by the upstream provider and may change. If you receive a 429 error but are within your API key limits, you’ve hit a model-level limit. Consus Gateway automatically retries these once before returning the error.
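As a rough planning aid, the two model limits can be combined to estimate how many requests per minute a workload could sustain at a given average request size. This is only a sketch: `max_requests_per_minute` is a hypothetical helper, the figures are copied from the table above and may change, and how tokens are counted against the limit is an assumption. Because the limits are shared across all customers, the result is an upper bound rather than a guarantee.

```python
# (requests per minute, tokens per minute), copied from the model table.
MODEL_LIMITS = {
    "claude-3-7-sonnet:il5": (125, 500_000),
    "claude-sonnet-4-5:il5": (10_000, 5_000_000),
    "gemini-2-5-pro:il5": (100, 4_000_000),
    "gemini-2-5-flash:il5": (2_000, 4_000_000),
}


def max_requests_per_minute(model: str, avg_tokens_per_request: int) -> int:
    """Upper bound on requests/minute: whichever of the two limits binds first."""
    rpm, tpm = MODEL_LIMITS[model]
    return min(rpm, tpm // avg_tokens_per_request)
```

For example, at an average of 8,000 tokens per request, `claude-3-7-sonnet:il5` is bound by the token limit (62 requests per minute), while `gemini-2-5-pro:il5` is bound by the request limit (100 requests per minute).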

What Happens When You Hit a Limit

  1. API key limit exceeded: You get an immediate 429 from the API Gateway (no JSON body).
  2. Model limit exceeded: Consus Gateway retries once after a 2-second delay. If the retry also fails, you receive a 429 with type: "rate_limit_error" and code: "upstream_rate_limit".
The code field lets you tell the two cases apart: if code is "upstream_rate_limit", it’s a model-level limit. If there’s no JSON body at all, it’s your API key limit.
For client-side handling, retry on 429 with exponential backoff: start at 1 second, double each attempt, and cap at 5 retries.
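The handling described above could be sketched as follows. This is a minimal illustration, not an official client: `classify_429`, `backoff_delays`, and `call_with_backoff` are hypothetical names, `send` stands in for whatever function performs your HTTP request and returns `(status, body)`, and the exact position of the code field in the JSON body (top level or under an "error" object) is an assumption.

```python
import json
import time
from typing import Callable, List, Tuple

Response = Tuple[int, bytes]  # (HTTP status, raw response body)


def classify_429(body: bytes) -> str:
    """Return 'api_key' for a body that is not JSON, 'model' for an upstream limit."""
    try:
        payload = json.loads(body)
    except ValueError:  # empty or non-JSON body: API key limit
        return "api_key"
    if not isinstance(payload, dict):
        return "unknown"
    # Assumption: `code` is either top-level or nested under an "error" object.
    error = payload.get("error")
    fields = error if isinstance(error, dict) else payload
    if fields.get("code") == "upstream_rate_limit":
        return "model"
    return "unknown"


def backoff_delays(base: float = 1.0, retries: int = 5) -> List[float]:
    """Delays in seconds: start at `base`, double each attempt, `retries` entries."""
    return [base * 2 ** attempt for attempt in range(retries)]


def call_with_backoff(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
    base: float = 1.0,
    retries: int = 5,
) -> Response:
    """Call `send`, retrying on 429 with exponential backoff, up to `retries` retries."""
    response = send()
    for delay in backoff_delays(base, retries):
        if response[0] != 429:
            break
        sleep(delay)
        response = send()
    return response
```

With the defaults, the waits between attempts are 1, 2, 4, 8, and 16 seconds, and the last response is returned as-is if every retry still yields a 429. Calling `classify_429` on that final body tells you which limit to raise or wait out.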