POST /v1/responses

Creates a response using OpenAI’s Responses API format. This is the endpoint Codex CLI targets when configured with wire_api = "responses". The endpoint is a thin proxy to Azure Government OpenAI’s native Responses API and is restricted to Azure-routed GPT models. For Claude, use /v1/messages. For multi-provider access via the OpenAI Chat shape, use /v1/chat/completions.

Request

Headers

| Header | Required | Description |
| --- | --- | --- |
| x-api-key | Yes | Your API key |
| Content-Type | Yes | application/json |

Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Composite model ID (e.g., gpt-5.1:il5). Must resolve to an Azure-routed GPT model. |
| input | string or array | Yes | Either a plain string or an array of typed input items (message, function_call, function_call_output). |
| instructions | string | No | System-style instructions, separate from input. |
| tools | array | No | Function tools the model may call. Responses-shape: {type: "function", name, description, parameters} (flat, not nested under a function key). |
| tool_choice | string or object | No | auto, none, required, or {type: "function", name: "..."}. |
| reasoning | object | No | Object with effort ("low", "medium", or "high") and summary ("auto"), for reasoning-capable models. |
| max_output_tokens | integer | No | Cap on output token count. |
| temperature, top_p, parallel_tool_calls, metadata | various | No | Forwarded to Azure unchanged. |
| stream | boolean | No | If true, returns SSE. See Streaming. |
| store | boolean | No | Must be false or omitted. See Stateless mode. |
| previous_response_id | string | No | Not accepted. See Stateless mode. |
Other fields not listed here forward to Azure unchanged. Azure is the authoritative validator for shape correctness.
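
The headers and body parameters above can be assembled in a few lines of Python. This is a minimal sketch using only the standard library: the URL is the one used by the curl examples in this page, build_request is a hypothetical helper (not part of any Consus SDK), and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint as used by the curl examples in this document.
GATEWAY_URL = "https://api.consus.io/v1/responses"


def build_request(api_key, model, input_items, **optional):
    """Build (but do not send) a POST /v1/responses request.

    `optional` carries any other body parameter, e.g. instructions,
    tools, reasoning, max_output_tokens, or stream.
    """
    # store is pinned to False because the gateway rejects store: true.
    body = {"model": model, "input": input_items, "store": False}
    body.update(optional)
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_request("my-key", "gpt-5.1:il5", "Say hi in three words.",
                    max_output_tokens=50)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To actually send it, pass req to urllib.request.urlopen and decode the JSON body of the result.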

Response

Returns Azure’s full response payload unchanged, with an optional x_consus_governance field appended when tool-output governance flags are present.
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1700000000,
  "status": "completed",
  "model": "gpt-5.1",
  "output": [
    {
      "type": "message",
      "id": "msg_1",
      "role": "assistant",
      "content": [{"type": "output_text", "text": "Hello!"}]
    }
  ],
  "usage": {
    "input_tokens": 10,
    "output_tokens": 5,
    "total_tokens": 15
  }
}
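
Reading the assistant text out of a payload shaped like the one above might look like the following sketch. output_text is a hypothetical helper name; it relies only on the fields shown in the sample (output, type, content, text).

```python
def output_text(response):
    """Concatenate every output_text part from message items in `output`."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip reasoning and function_call items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


# The sample response from this section.
sample = {
    "id": "resp_abc123",
    "object": "response",
    "status": "completed",
    "output": [
        {
            "type": "message",
            "id": "msg_1",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hello!"}],
        }
    ],
    "usage": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
}

print(output_text(sample))  # prints: Hello!
```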

Stateless Mode

Consus Gateway is stateless. The gateway does not store response payloads, conversation history, or any other state across requests. Two consequences:
  • store: true is rejected with 400 invalid_request_error. Omit store or set it to false.
  • previous_response_id is rejected with 400 invalid_request_error. Send the full input array each turn.
Codex CLI handles this automatically — it sets store: false on any base URL that doesn’t look like a direct Azure endpoint, and resends conversation context each turn. You don’t need to change anything in your Codex config; the rejection only matters if you’re using a custom client.
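
For a custom client, a pre-flight check can catch both rejected fields before the request leaves the process. The sketch below is an assumption about how a client might do this; check_stateless is a hypothetical name, and the gateway remains the authority on what it rejects.

```python
def check_stateless(body):
    """Raise ValueError for fields the gateway rejects with 400."""
    if body.get("store") is True:
        raise ValueError("store must be false or omitted")
    # The page does not say whether a null previous_response_id is
    # tolerated, so this sketch refuses the key whenever it is present.
    if "previous_response_id" in body:
        raise ValueError(
            "previous_response_id is not accepted; send the full input array"
        )
    return body


ok = check_stateless({"model": "gpt-5.1:il5", "input": "Say hi.", "store": False})
print(ok["model"])

try:
    check_stateless({"model": "gpt-5.1:il5", "input": "Say hi.", "store": True})
except ValueError as err:
    print("rejected:", err)
```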

Reasoning

gpt-5.1 supports extended reasoning. Pass reasoning: {effort: "high"} in the request body, or set model_reasoning_effort = "high" in your Codex config. Effort defaults to the catalog’s per-model default if omitted. Reasoning output items (type: "reasoning") appear in the output array alongside message and function_call items. Encrypted reasoning content for stateful multi-turn is not exposed (the gateway is stateless — see Stateless Mode).
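
Because reasoning items share the output array with message and function_call items, a client usually separates them by type. A minimal sketch follows; items_by_type is a hypothetical helper, and the reasoning item in the sample is illustrative only, since this page specifies nothing about it beyond type: "reasoning".

```python
from collections import defaultdict


def items_by_type(response):
    """Group the items in `output` by their `type` field."""
    grouped = defaultdict(list)
    for item in response.get("output", []):
        grouped[item.get("type", "unknown")].append(item)
    return dict(grouped)


# Illustrative payload: the "rs_1" id is made up for this example.
sample = {
    "output": [
        {"type": "reasoning", "id": "rs_1"},
        {
            "type": "message",
            "id": "msg_1",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hello!"}],
        },
    ]
}

grouped = items_by_type(sample)
print(sorted(grouped))  # prints: ['message', 'reasoning']
```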

Streaming

Streaming is not real-time today. Setting stream: true is accepted, but the gateway buffers the full upstream response before emitting SSE events. The wire shape is the standard OpenAI Responses event sequence (response.created → response.output_item.added → response.output_text.delta → response.output_item.done → response.completed) and Codex parses it correctly, but tokens do not arrive incrementally: all events flush together once the upstream model finishes. Incremental streaming is a planned optimization.
Each event is event: <type>\ndata: <json>\n\n. For a typical text response Codex sees:
  1. response.created
  2. response.in_progress
  3. response.output_item.added (for each output item)
  4. response.content_part.added → response.output_text.delta → response.output_text.done → response.content_part.done (for message items)
  5. response.function_call_arguments.delta → response.function_call_arguments.done (for function_call items)
  6. response.output_item.done
  7. response.completed — terminal event, carries the full response payload (including usage and x_consus_governance if applicable)
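
The event framing described above is simple enough to parse by hand. The sketch below splits a buffered SSE body into events and pulls out the text deltas and the final payload. It assumes, following OpenAI's Responses event format rather than anything stated on this page, that output_text.delta events carry a delta field and that response.completed nests the payload under a response key. The sample stream is abbreviated and hand-written.

```python
import json


def parse_sse(raw):
    """Split an SSE body into (event_type, data) pairs."""
    events = []
    for block in raw.split("\n\n"):
        event_type = None
        data_lines = []
        for line in block.split("\n"):
            if line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if event_type and data_lines:
            events.append((event_type, json.loads("\n".join(data_lines))))
    return events


def collect_text(events):
    """Join the text deltas (assumes a `delta` field on each delta event)."""
    return "".join(
        data.get("delta", "")
        for kind, data in events
        if kind == "response.output_text.delta"
    )


def final_response(events):
    """Return the payload of the terminal event, or None if it is missing."""
    for kind, data in events:
        if kind == "response.completed":
            return data.get("response")  # assumed nesting key
    return None


SAMPLE = (
    'event: response.created\ndata: {"type": "response.created"}\n\n'
    'event: response.output_text.delta\ndata: {"delta": "Hel"}\n\n'
    'event: response.output_text.delta\ndata: {"delta": "lo!"}\n\n'
    'event: response.completed\n'
    'data: {"response": {"id": "resp_abc123", "status": "completed"}}\n\n'
)

events = parse_sse(SAMPLE)
print(collect_text(events))            # prints: Hello!
print(final_response(events)["status"])  # prints: completed
```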

Tool Use

Tool definitions use the Responses-shape (flat, not nested under function):
{
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
      }
    }
  ]
}
When the model calls a tool, the response includes a function_call item in output:
{
  "type": "function_call",
  "id": "fc_1",
  "call_id": "call_abc",
  "name": "get_weather",
  "arguments": "{\"location\": \"Washington, DC\"}"
}
To send the result back, include a function_call_output item in the next request’s input array referencing the same call_id.

The same destination-bearing parameter screener used by /v1/chat/completions runs on tool definitions: schemas with property names like destination_url, webhook_url, etc., are rejected with 400. See the Chat Completions doc for the full list.
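
Sending a tool result back, as described above, can be sketched as follows. tool_result_items is a hypothetical helper. The output field name and the choice to resend the function_call item itself (needed because the gateway keeps no history) follow OpenAI's Responses input format and are assumptions here. The sketch also drops the server-assigned id on the assumption that it is optional on input; keep it if your client needs it.

```python
import json


def tool_result_items(function_call, result):
    """Return the items to append to `input` for the next request."""
    resent_call = {
        "type": "function_call",
        "call_id": function_call["call_id"],
        "name": function_call["name"],
        "arguments": function_call["arguments"],
    }
    return [
        resent_call,
        {
            "type": "function_call_output",
            "call_id": function_call["call_id"],  # must match the call
            "output": json.dumps(result),
        },
    ]


# The function_call item from the example above.
call = {
    "type": "function_call",
    "id": "fc_1",
    "call_id": "call_abc",
    "name": "get_weather",
    "arguments": json.dumps({"location": "Washington, DC"}),
}

# Made-up tool result, for illustration only.
items = tool_result_items(call, {"temp_f": 72})
next_input = [
    {"type": "message", "role": "user", "content": "Weather in DC?"},
] + items
print(json.dumps(next_input, indent=2))
```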

Governance Metadata

When a function_call item’s arguments field contains an outbound destination (a URL or IPv4 address), the response body includes an advisory x_consus_governance.flags field alongside the standard Responses payload. The function call itself is not modified.
{
  "id": "resp_xyz",
  "output": [
    {
      "type": "function_call",
      "id": "fc_1",
      "call_id": "call_abc",
      "name": "save_results",
      "arguments": "{\"url\": \"https://collector.example.com/ingest\", \"data\": \"...\"}"
    }
  ],
  "usage": {...},
  "x_consus_governance": {
    "flags": [
      {
        "tool_call_id": "call_abc",
        "tool_name": "save_results",
        "destinations": ["https://collector.example.com/ingest"],
        "reason": "external_destination"
      }
    ]
  }
}
This is advisory — the gateway does not block or redact the call. Compliant clients check the field before executing the tool. The streaming endpoint embeds x_consus_governance in the final response.completed event’s response payload.
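
One way a compliant client might act on the flags is to partition function calls into those that may run and those that need review. This is a sketch only: split_by_governance is a hypothetical name, and what to do with held calls (prompt the user, log, drop) is left to the client.

```python
import json


def split_by_governance(response):
    """Return (allowed, held) function_call items based on advisory flags."""
    governance = response.get("x_consus_governance") or {}
    flagged_ids = {flag["tool_call_id"] for flag in governance.get("flags", [])}
    allowed, held = [], []
    for item in response.get("output", []):
        if item.get("type") != "function_call":
            continue
        if item.get("call_id") in flagged_ids:
            held.append(item)
        else:
            allowed.append(item)
    return allowed, held


# The flagged response from this section (usage omitted).
sample = {
    "id": "resp_xyz",
    "output": [
        {
            "type": "function_call",
            "id": "fc_1",
            "call_id": "call_abc",
            "name": "save_results",
            "arguments": json.dumps(
                {"url": "https://collector.example.com/ingest", "data": "..."}
            ),
        }
    ],
    "x_consus_governance": {
        "flags": [
            {
                "tool_call_id": "call_abc",
                "tool_name": "save_results",
                "destinations": ["https://collector.example.com/ingest"],
                "reason": "external_destination",
            }
        ]
    },
}

allowed, held = split_by_governance(sample)
print(len(allowed), [item["name"] for item in held])  # prints: 0 ['save_results']
```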

Examples

Basic completion

curl -X POST https://api.consus.io/v1/responses \
  -H "x-api-key: $CONSUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1:il5",
    "input": "Say hi in three words.",
    "store": false,
    "max_output_tokens": 50
  }'

With reasoning

curl -X POST https://api.consus.io/v1/responses \
  -H "x-api-key: $CONSUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1:il5",
    "input": "How are cranes removed from skyscrapers?",
    "store": false,
    "reasoning": {"effort": "high"},
    "max_output_tokens": 4096
  }'

Streaming

curl -X POST https://api.consus.io/v1/responses \
  -H "x-api-key: $CONSUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1:il5",
    "input": "Say hi.",
    "store": false,
    "stream": true
  }'

Multi-turn (stateless)

Send the full input array each turn — previous_response_id is not supported.
curl -X POST https://api.consus.io/v1/responses \
  -H "x-api-key: $CONSUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1:il5",
    "store": false,
    "input": [
      {"type": "message", "role": "user", "content": "What is FedRAMP?"},
      {"type": "message", "role": "assistant", "content": "FedRAMP is the Federal Risk and Authorization Management Program..."},
      {"type": "message", "role": "user", "content": "And how does IL5 differ?"}
    ]
  }'
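
A client that talks to the gateway over several turns has to carry the history itself. The sketch below keeps the running input array in a plain list and rebuilds the full body each turn; message and next_turn are hypothetical helpers, and the message shape mirrors the curl example above.

```python
def message(role, text):
    """Build a message input item in the shape used by the curl example."""
    return {"type": "message", "role": role, "content": text}


def next_turn(model, history, user_text):
    """Return (body, updated_history) without mutating `history`."""
    updated = history + [message("user", user_text)]
    body = {"model": model, "store": False, "input": updated}
    return body, updated


history = []
body, history = next_turn("gpt-5.1:il5", history, "What is FedRAMP?")

# After receiving the reply, record it so the next turn resends it.
history.append(message(
    "assistant",
    "FedRAMP is the Federal Risk and Authorization Management Program...",
))

body, history = next_turn("gpt-5.1:il5", history, "And how does IL5 differ?")
print(len(body["input"]))  # prints: 3
```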