POST /v1/chat/completions

Creates a chat completion. This is the primary endpoint for generating AI responses. Requests are routed to the appropriate government cloud provider based on the model you specify. Supports text, multi-turn conversations, tool use, image input (vision), and document input (PDF).

Request

Headers

Header       | Required | Description
x-api-key    | Yes      | Your API key
Content-Type | Yes      | application/json

Body Parameters

Parameter      | Type             | Required | Description
model          | string           | Yes      | Model ID to use (see Models)
messages       | array            | Yes      | List of messages (max 256)
temperature    | float            | No       | Sampling temperature, 0.0 to 2.0
max_tokens     | integer          | No       | Maximum tokens to generate
top_p          | float            | No       | Nucleus sampling parameter, 0.0 to 1.0
stream         | boolean          | No       | Whether to stream the response via SSE
stream_options | object           | No       | Streaming options (e.g. {"include_usage": true})
stop           | string or array  | No       | Stop sequence(s) to end generation
tools          | array            | No       | Tools the model may call (max 128). See Tool Use.
tool_choice    | string or object | No       | Controls tool selection: auto, none, required, or a specific tool

Message Object

Field        | Type            | Required | Description
role         | string          | Yes      | One of system, user, assistant, or tool
content      | string or array | Yes      | The message content. Can be a plain string (max 1 MB) or an array of content parts (text and image_url). See Image Input.
tool_calls   | array           | No       | Tool calls made by the assistant (on assistant messages, max 128)
tool_call_id | string          | No       | ID of the tool call this message responds to (on tool messages, max 256 chars)

Response

Non-Streaming

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "claude-3-7-sonnet:il5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 12,
    "total_tokens": 22
  }
}

Streaming

Set stream: true to receive the response as Server-Sent Events (SSE). Each event is a JSON chunk prefixed with data: , and the stream ends with a final data: [DONE] event.
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"claude-3-7-sonnet:il5","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"claude-3-7-sonnet:il5","choices":[{"index":0,"delta":{"content":"Hello!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"claude-3-7-sonnet:il5","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
Set stream_options: {"include_usage": true} to receive token usage in the final chunk.
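The chunk format above can be consumed with a small line-oriented parser. A sketch (iter_chunks and collect_text are illustrative helper names, not part of any SDK):

```python
import json

def iter_chunks(lines):
    """Yield parsed JSON chunks from SSE lines, stopping at data: [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue                      # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)

def collect_text(lines):
    """Concatenate the delta.content fragments into the full assistant reply."""
    parts = []
    for chunk in iter_chunks(lines):
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```

Any HTTP client that exposes the response line by line works here; with httpx, for example, you can pass response.iter_lines() from a streaming request.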

Finish Reasons

Value          | Meaning
stop           | Natural end of response or stop sequence hit
length         | Hit max_tokens limit
tool_calls     | The model is invoking one or more tools
content_filter | Content was filtered by the provider’s safety system

Tool Use

Pass a tools array to let the model call functions. The model will respond with tool_calls when it wants to use a tool, and you send back the result in a tool message.

Tool Definition

Function name must match ^[a-zA-Z0-9_-]{1,64}$ (ASCII letters, digits, underscore, hyphen; 1 to 64 characters), matching the OpenAI and Anthropic tool name specs. Names outside this pattern return 400 invalid_request_error. description is limited to 65,536 characters.
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
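If you validate tool definitions client-side before sending, the name pattern is easy to check up front. A sketch (validate_tool_names is our own helper, not part of the gateway or any SDK):

```python
import re

# Gateway tool-name pattern, as documented above.
TOOL_NAME_RE = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")

def validate_tool_names(tools):
    """Return the function names in a tools array that would be rejected
    with 400 invalid_request_error by the gateway's name check."""
    return [
        tool["function"]["name"]
        for tool in tools
        if not TOOL_NAME_RE.fullmatch(tool["function"]["name"])
    ]
```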

Tool Call Response

When the model calls a tool, the response includes tool_calls instead of (or alongside) content:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Washington, DC\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Sending Tool Results

Include the tool call result in a follow-up message with role: "tool":
{
  "messages": [
    {"role": "user", "content": "What's the weather in DC?"},
    {"role": "assistant", "content": null, "tool_calls": [{"id": "call_abc123", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\": \"Washington, DC\"}"}}]},
    {"role": "tool", "tool_call_id": "call_abc123", "content": "72°F and sunny"}
  ]
}
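The round trip above can be sketched in a few lines of Python. run_tool_calls is a hypothetical helper, not part of any SDK: it appends the assistant turn, executes each tool call against a handlers mapping you supply, and appends the matching role: "tool" messages.

```python
import json

def run_tool_calls(messages, assistant_message, handlers):
    """Append the assistant turn, execute each tool call through the
    handlers mapping (tool name -> callable), and append one role:"tool"
    message per call, carrying the matching tool_call_id."""
    messages.append(assistant_message)
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"])   # arguments arrive as a JSON string
        result = handlers[fn["name"]](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": str(result),
        })
    return messages
```

Send the returned messages list in a second /v1/chat/completions request to get the model's final answer.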

Rejected Tool Schemas

The gateway rejects tool definitions whose parameter schemas include property names that clearly describe an outbound destination. Such schemas have the shape of a data-exfiltration tool, and accepting them at the gateway would be careless regardless of what the caller intends to do with the result. Rejected property names (case-insensitive):
  • Destination names: destination, destination_url, dest_url, dst_url
  • Webhook names: webhook, webhook_url, webhooks
  • Callback names: callback, callback_url
  • Send and forward names: forward_to, forward_url, send_to, post_to, push_to
  • Target names: target_url, target_host
  • Named sinks: upload_url, ingest_url, notification_url, notify_url, report_url, sink_url
  • Obvious intent: exfil_url, exfiltrate
The check walks the full JSON Schema tree, so hiding a denied name inside a nested property, an array items schema, a $defs entry, or a oneOf branch will not bypass it.

Ambiguous names that can legitimately read as well as write are not rejected: url, uri, endpoint, host, and hostname all pass at this layer. A database connection tool with host and port, or a tool that reads a record from an internal API by url, continues to work. The runtime check described in the next section handles what actually appears in the arguments.

A rejected request returns 400 invalid_request_error and identifies the offending tool and parameter:
{
  "error": {
    "type": "invalid_request_error",
    "message": "tools -> 0 -> function: Value error, Tool 'save_results' has a parameter named 'destination_url' that suggests an outbound destination. Tools with destination-bearing parameters are rejected by gateway governance to prevent data exfiltration via tool calls. If you have a legitimate use case for this parameter name, contact Consus support for an exception."
  }
}
If you have a real business case for a parameter that matches a rejected name, contact us and we can work through the exception together.
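If you want to catch rejections before a request ever leaves your application, a pre-flight check can mirror the rule. The denied list below is copied from this page; the gateway's actual implementation may differ, so treat this as a sketch rather than the authoritative check.

```python
# Denied property names, copied from the list above (all lowercase).
DENIED_NAMES = {
    "destination", "destination_url", "dest_url", "dst_url",
    "webhook", "webhook_url", "webhooks",
    "callback", "callback_url",
    "forward_to", "forward_url", "send_to", "post_to", "push_to",
    "target_url", "target_host",
    "upload_url", "ingest_url", "notification_url", "notify_url",
    "report_url", "sink_url",
    "exfil_url", "exfiltrate",
}

def find_denied_properties(schema):
    """Walk a JSON Schema and collect property names that match the denied
    list, case-insensitively. The generic recursion over dict values and
    list items reaches nested properties, array items schemas, $defs
    entries, and oneOf/anyOf/allOf branches."""
    hits = []
    if isinstance(schema, dict):
        for name in schema.get("properties", {}):
            if name.lower() in DENIED_NAMES:
                hits.append(name)
        for value in schema.values():
            hits.extend(find_denied_properties(value))
    elif isinstance(schema, list):
        for item in schema:
            hits.extend(find_denied_properties(item))
    return hits
```

Run it over each tool's parameters schema before sending; an empty result means the schema passes this layer.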

Tool Call Governance Metadata

When a model returns tool calls, the gateway scans each arguments payload for outbound destinations: URLs with a scheme (https://, ftp://, s3://, data:, mailto:) and raw IPv4 addresses. When any are found, the response includes an advisory field called x_consus_governance alongside the standard OpenAI body. The tool call itself is not modified; you still receive the real arguments so your application can run.

Response shape when destinations are detected:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "claude-3-7-sonnet:il5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "save_results",
              "arguments": "{\"url\": \"https://collector.example.com/ingest\", \"data\": \"...\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": { "prompt_tokens": 20, "completion_tokens": 30, "total_tokens": 50 },
  "x_consus_governance": {
    "flags": [
      {
        "tool_call_id": "call_abc123",
        "tool_name": "save_results",
        "destinations": ["https://collector.example.com/ingest"],
        "reason": "external_destination"
      }
    ]
  }
}
When no destinations are found, the field is absent from the response.

This is an advisory signal. We do not block or redact the tool call. The purpose is to give your application something structured to act on before you execute a tool call whose destination came from the model output. The typical handling pattern is: check for the field, look up each destination against whatever allowlist or policy your application runs under, and surface it to a human or your policy engine when it looks unfamiliar.

Streaming responses carry the same field in the final SSE chunk (the chunk that includes finish_reason). Clients that already parse the final chunk for usage can pick up x_consus_governance from the same place.
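That handling pattern can be sketched as a small helper. The function name and the host-based allowlist policy are our own illustration; your policy engine may compare destinations differently.

```python
from urllib.parse import urlparse

def unapproved_destinations(response, allowed_hosts):
    """Return (tool_call_id, destination) pairs from x_consus_governance
    whose host is not on the caller's allowlist. Raw IPv4 destinations
    carry no scheme, so the bare string is compared when parsing yields
    no hostname."""
    flagged = []
    for flag in response.get("x_consus_governance", {}).get("flags", []):
        for dest in flag["destinations"]:
            host = urlparse(dest).hostname or dest
            if host not in allowed_hosts:
                flagged.append((flag["tool_call_id"], dest))
    return flagged
```

An empty result means every flagged destination was on your allowlist (or nothing was flagged); anything else is a candidate for human review before you execute the tool call.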

Examples

Basic Completion

curl -X POST https://api.consus.io/v1/chat/completions \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-7-sonnet:il5",
    "messages": [
      {"role": "user", "content": "What is FedRAMP?"}
    ]
  }'

Streaming

curl -X POST https://api.consus.io/v1/chat/completions \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-7-sonnet:il5",
    "messages": [
      {"role": "user", "content": "What is FedRAMP?"}
    ],
    "stream": true
  }'

With System Prompt and Parameters

curl -X POST https://api.consus.io/v1/chat/completions \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5:il5",
    "messages": [
      {"role": "system", "content": "You are a helpful government compliance assistant."},
      {"role": "user", "content": "Summarize CMMC Level 2 requirements."}
    ],
    "temperature": 0.3,
    "max_tokens": 2048
  }'

Multi-Turn Conversation

curl -X POST https://api.consus.io/v1/chat/completions \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-7-sonnet:il5",
    "messages": [
      {"role": "user", "content": "What is an ATO?"},
      {"role": "assistant", "content": "An ATO (Authority to Operate) is a formal authorization..."},
      {"role": "user", "content": "How long does it typically take to get one?"}
    ]
  }'

Image Input (Vision)

All available models (claude-3-7-sonnet:il5, claude-sonnet-4-5:il5, gemini-2-5-pro:il5, gemini-2-5-flash:il5) accept images. Supported formats: jpeg, png, gif, webp

How images are sent

To include an image, set content to an array instead of a plain string. Each element is a content part with a type field, either "text" or "image_url".
The image_url field name is inherited from OpenAI’s API format. Despite the name, you do not pass a URL. You pass the raw image bytes encoded as a base64 data URI. External URLs (https://...) are rejected with 400 to prevent data exfiltration.
A base64 data URI looks like this:
data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...
│    │         │       │
│    │         │       └─ your base64-encoded image bytes
│    │         └─ encoding must be "base64"
│    └─ MIME type, must match your actual image (image/jpeg, image/png, image/gif, image/webp)
└─ always starts with "data:"
You encode the raw file bytes to base64, prefix with data:<mime-type>;base64,, and put the whole string in the url field.

Request Format

{
  "model": "claude-3-7-sonnet:il5",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What does this screenshot show?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,<YOUR_BASE64_BYTES_HERE>"
          }
        }
      ]
    }
  ]
}
  • content can contain any number of text and image_url parts in any order
  • Images are only valid in user messages. system and assistant messages must use a plain string for content.
  • Multiple images per message are supported (up to 20)

Size Limits

Limit                                 | Value
Per image (raw decoded)               | 3.5 MB
Total image data per message (base64) | 4.5 MB
Max images per message                | 20
Requests exceeding these limits are rejected with 400.
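To fail fast instead of waiting for the 400, you can check a message's image parts client-side. A sketch, assuming the limits above are binary megabytes (1024 × 1024 bytes); confirm the exact accounting with the gateway if you are near a boundary:

```python
import base64

MAX_IMAGE_RAW = int(3.5 * 1024 * 1024)    # per image, raw decoded bytes
MAX_MESSAGE_B64 = int(4.5 * 1024 * 1024)  # total base64 bytes per message
MAX_IMAGES = 20

def check_image_parts(parts):
    """Return a list of limit violations for a content array (empty = OK)."""
    problems = []
    total_b64 = 0
    images = [p for p in parts if p.get("type") == "image_url"]
    if len(images) > MAX_IMAGES:
        problems.append(f"{len(images)} images exceeds the 20-per-message cap")
    for i, part in enumerate(images):
        uri = part["image_url"]["url"]
        b64 = uri.split(",", 1)[1]            # drop the data:<mime>;base64, prefix
        total_b64 += len(b64)
        raw = len(base64.b64decode(b64))
        if raw > MAX_IMAGE_RAW:
            problems.append(f"image {i} is {raw} bytes decoded (limit 3.5 MB)")
    if total_b64 > MAX_MESSAGE_B64:
        problems.append(f"total base64 is {total_b64} bytes (limit 4.5 MB)")
    return problems
```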

Integrations handle this for you

If you’re using an integration like OpenCode or Cline, you don’t need to do any of this manually. Those tools encode images automatically when you paste a screenshot: just paste and send, and the base64 encoding and data URI formatting are handled behind the scenes. If you’re calling the API directly, read on.

Complete Examples

bash (curl)
# 1. Base64-encode your image file
IMAGE_B64=$(base64 -i screenshot.png)   # macOS
# IMAGE_B64=$(base64 -w 0 screenshot.png) # Linux

# 2. Send it. The mime type in the data URI must match your file (image/png, image/jpeg, etc.)
curl -X POST https://api.consus.io/v1/chat/completions \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"claude-3-7-sonnet:il5\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"What does this screenshot show?\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/png;base64,${IMAGE_B64}\"}}
      ]
    }]
  }"
Python
import base64
import httpx

# 1. Read and encode the image
with open("screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

# 2. Send. Set the mime type to match your file.
httpx.post(
    "https://api.consus.io/v1/chat/completions",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "model": "claude-3-7-sonnet:il5",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What does this screenshot show?"},
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
    },
)

Document Input (PDF)

All Claude and Gemini models support PDF document inputs. Supported file types: application/pdf

How documents are sent

To include a PDF, add a content part with "type": "file" to the content array. The file_data field must be a base64 data URI. External URLs are rejected for the same no-egress reason as images.
{
  "model": "claude-3-7-sonnet:il5",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Summarize the key findings in this report."
        },
        {
          "type": "file",
          "file": {
            "filename": "report.pdf",
            "file_data": "data:application/pdf;base64,<YOUR_BASE64_BYTES_HERE>"
          }
        }
      ]
    }
  ]
}
  • Documents are only valid in user messages. system and assistant messages must use plain strings.
  • filename is metadata passed to the model; it is treated as untrusted input and sanitized before use
  • Text, images, and files can be mixed in a single message’s content array

Size Limits

Limit                                                      | Value
Per file (raw decoded)                                     | 3.5 MB
Total base64 content per message (images + files combined) | 4.5 MB
Max files per message                                      | 5
The combined image + file budget is shared at 4.5 MB per message, a limit imposed by Lambda’s 6 MB payload ceiling.

curl example

PDF_B64=$(base64 -i report.pdf)   # macOS
# PDF_B64=$(base64 -w 0 report.pdf) # Linux

curl -X POST https://api.consus.io/v1/chat/completions \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"claude-3-7-sonnet:il5\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Summarize this document.\"},
        {\"type\": \"file\", \"file\": {\"filename\": \"report.pdf\", \"file_data\": \"data:application/pdf;base64,${PDF_B64}\"}}
      ]
    }]
  }"

Python example

import base64
import httpx

with open("report.pdf", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

httpx.post(
    "https://api.consus.io/v1/chat/completions",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "model": "claude-3-7-sonnet:il5",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Summarize this document."},
                    {"type": "file", "file": {"filename": "report.pdf", "file_data": f"data:application/pdf;base64,{b64}"}},
                ],
            }
        ],
    },
)