Rate Limits

API Rate Limits

Each API key is assigned a usage plan with the following default limits:

Limit	Default
Requests per second	100
Burst	200 requests
Monthly quota	10,000 requests

These limits are enforced at the API Gateway level, before your request reaches the application. Requests that exceed them receive a 429 response. Limits can be adjusted per API key; book a call to discuss your needs.
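One way to stay under the default rate and burst values on the client side is a token bucket. The sketch below is illustrative only: it assumes the 100 requests per second and 200 request burst behave like a token bucket (this page does not say how the gateway implements them), and `TokenBucket` and `try_acquire` are hypothetical names, not part of any Consus Gateway SDK. It does not track the monthly quota.

```python
import time


class TokenBucket:
    """Client-side throttle: refills at `rate` tokens/second, holds at most `burst`.

    Assumption: mirrors the default plan (100 requests/second, burst of 200).
    """

    def __init__(self, rate=100.0, burst=200, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Consume one token and return True if a request may be sent now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check `bucket.try_acquire()` before each request and wait briefly when it returns `False`, rather than sending the request and receiving a 429.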

Model Rate Limits

Each model has its own upstream provider limits. These are shared across all Consus Gateway customers and are separate from your API key limits.

Model	Requests per Minute	Tokens per Minute
`claude-3-7-sonnet:il5`	125	500,000
`claude-sonnet-4-5:il5`	10,000	5,000,000
`gemini-2-5-pro:il5`	100	4,000,000
`gemini-2-5-flash:il5`	2,000	4,000,000

Model rate limits are set by the upstream provider and may change. If you receive a 429 error but are within your API key limits, you’ve hit a model-level limit. Consus Gateway automatically retries such requests once before returning the error.

What Happens When You Hit a Limit

API key limit exceeded: You get an immediate 429 from the API Gateway (no JSON body).
Model limit exceeded: Consus Gateway retries once after a 2-second delay. If the retry also fails, you receive a 429 with type: "rate_limit_error" and code: "upstream_rate_limit".
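The two cases above can be told apart in client code. The following is a minimal sketch, not an official client: `classify_429` is a hypothetical helper, and because this page does not show the full shape of the JSON error body, the code assumes the `code` field sits either at the top level or under an `error` object.

```python
import json


def classify_429(status_code, body_text):
    """Classify a response as 'not_rate_limited', 'api_key_limit',
    'model_limit', or 'unknown'."""
    if status_code != 429:
        return "not_rate_limited"
    try:
        payload = json.loads(body_text) if body_text and body_text.strip() else None
    except ValueError:
        payload = None  # body present but not valid JSON
    if not isinstance(payload, dict):
        # No JSON body: the 429 came from the API Gateway (API key limit).
        return "api_key_limit"
    # Assumption: error fields are top-level or nested under "error".
    error = payload.get("error")
    fields = error if isinstance(error, dict) else payload
    if fields.get("code") == "upstream_rate_limit":
        return "model_limit"
    return "unknown"
```

For example, `classify_429(429, "")` returns `"api_key_limit"`, while a body containing `"code": "upstream_rate_limit"` returns `"model_limit"`.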

The code field lets you tell the difference: if code is "upstream_rate_limit", it’s a model-level limit. If there’s no JSON body at all, it’s your API key limit. For client-side handling, retry on 429 with exponential backoff: start at 1 second, double each attempt, and cap at 5 retries.
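The backoff schedule described above (1 second, doubling, at most 5 retries) can be sketched as follows. This is an illustration under stated assumptions: `send` is a hypothetical zero-argument callable that performs the request and returns a `(status_code, body)` tuple, and `sleep` is injectable so the loop can be exercised without waiting.

```python
import time


def backoff_delays(base=1.0, factor=2.0, max_retries=5):
    """Delays in seconds before each retry: 1, 2, 4, 8, 16 by default."""
    return [base * factor ** i for i in range(max_retries)]


def call_with_backoff(send, sleep=time.sleep, max_retries=5):
    """Call `send()` and retry on 429 until it succeeds or retries run out.

    `send` is a hypothetical callable returning (status_code, body).
    Returns the last (status_code, body) observed.
    """
    status, body = send()
    for delay in backoff_delays(max_retries=max_retries):
        if status != 429:
            break  # success or a non-rate-limit error: stop retrying
        sleep(delay)
        status, body = send()
    return status, body
```

With the defaults, a request that keeps returning 429 is attempted six times in total (one initial call plus five retries) over roughly 31 seconds of waiting.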
