API Rate Limits
Each API key is assigned a usage plan with the following default limits:
| Limit | Default |
|---|---|
| Requests per second | 100 |
| Burst | 200 requests |
| Monthly quota | 10,000 requests |
These limits are enforced at the API Gateway level, before your request reaches the application. Requests that exceed them receive a 429 response. Limits can be adjusted per API key. Book a call to discuss your needs.
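If you would rather throttle on the client than run into 429s, one option is a small token bucket that mirrors the default limits. This is a minimal sketch under an assumption this page does not confirm: that the per-second rate and burst values behave like a token bucket (tokens refill at the rate, up to the burst size). The `TokenBucket` class and its parameters are hypothetical names used for illustration only.

```python
import time


class TokenBucket:
    """Client-side throttle. Assumes (not confirmed here) token-bucket semantics."""

    def __init__(self, rate=100.0, burst=200.0, clock=time.monotonic):
        self.rate = rate          # tokens added per second (requests per second)
        self.capacity = burst     # maximum tokens that can accumulate (burst)
        self.tokens = burst       # start with a full bucket
        self.clock = clock        # injectable clock, useful for testing
        self.last = clock()

    def try_acquire(self) -> bool:
        """Return True if a request may be sent now, False if it should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Call `try_acquire()` before each request and pause briefly when it returns `False`. If your key has adjusted limits, pass those values as `rate` and `burst`.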
Model Rate Limits
Each model has its own upstream provider limits. These are shared across all Consus Gateway customers and are separate from your API key limits.
| Model | Requests per Minute | Tokens per Minute |
|---|---|---|
| claude-3-7-sonnet:il5 | 125 | 500,000 |
| claude-sonnet-4-5:il5 | 10,000 | 5,000,000 |
| gemini-2-5-pro:il5 | 100 | 4,000,000 |
| gemini-2-5-flash:il5 | 2,000 | 4,000,000 |
Model rate limits are set by the upstream provider and may change. If you receive a 429 error while you are within your API key limits, you’ve hit a model-level limit. Consus Gateway automatically retries such requests once before returning the error.
What Happens When You Hit a Limit
- API key limit exceeded: You get an immediate 429 from the API Gateway (no JSON body).
- Model limit exceeded: Consus Gateway retries once with a 2-second delay. If the retry also fails, you receive a 429 with type: "rate_limit_error" and code: "upstream_rate_limit".
The code field lets you tell the difference: if code is "upstream_rate_limit", it’s a model-level limit. If there’s no JSON body at all, it’s your API key limit.
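The distinction above can be sketched in a few lines of Python. The exact JSON layout of the error body is not shown on this page, so this sketch assumes the `code` field sits either at the top level or inside an `error` object; the function name and return labels are hypothetical.

```python
import json
from typing import Optional


def classify_429(status_code: int, body: str) -> Optional[str]:
    """Label a 429 as 'api_key_limit', 'model_limit', or 'unknown'.

    Returns None when the response is not a 429.
    """
    if status_code != 429:
        return None
    try:
        payload = json.loads(body) if body.strip() else None
    except json.JSONDecodeError:
        payload = None
    if payload is None:
        # No JSON body at all: the API Gateway rejected the request.
        return "api_key_limit"
    if not isinstance(payload, dict):
        return "unknown"
    # Assumption: fields are top-level or nested under an "error" object.
    error = payload.get("error")
    fields = error if isinstance(error, dict) else payload
    if fields.get("code") == "upstream_rate_limit":
        return "model_limit"
    return "unknown"
```

Pass in the status code and raw body text from whichever HTTP client you use.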
For client-side handling, retry on 429 with exponential backoff: start at 1 second, double each attempt, and cap at 5 retries.
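A minimal sketch of that backoff schedule follows. `send` is a hypothetical callable standing in for your own request function; it is assumed to return a `(status_code, body)` tuple. The `sleep` parameter is injectable only so the loop can be tested without waiting.

```python
import time
from typing import Callable, List, Tuple


def backoff_delays(max_retries: int = 5, base: float = 1.0) -> List[float]:
    """Delays in seconds: start at `base` and double on each attempt."""
    return [base * 2 ** attempt for attempt in range(max_retries)]


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send(), retrying on 429 up to `max_retries` times."""
    status, body = send()
    for delay in backoff_delays(max_retries, base):
        if status != 429:
            break
        sleep(delay)              # wait 1 s, then 2 s, 4 s, 8 s, 16 s
        status, body = send()
    return status, body
```

With the defaults, a request that keeps failing is sent six times in total (the initial attempt plus five retries) and waits 31 seconds overall before the final 429 is returned to the caller.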