Responses - Consus Gateway

POST /v1/responses

Creates a response using OpenAI’s Responses API format. This is the endpoint Codex CLI targets when configured with wire_api = "responses". The endpoint is a thin proxy to Azure Government OpenAI’s native Responses API and is restricted to Azure-routed GPT models. For Claude, use /v1/messages. For multi-provider access via the OpenAI Chat shape, use /v1/chat/completions.

Request

Headers

| Header | Required | Description |
| --- | --- | --- |
| `x-api-key` | Yes | Your API key |
| `Content-Type` | Yes | `application/json` |

Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | Composite model ID (e.g., `gpt-5.1:il5`). Must resolve to an Azure-routed GPT model. |
| `input` | string or array | Yes | Either a plain string or an array of typed input items (`message`, `function_call`, `function_call_output`). |
| `instructions` | string | No | System-style instructions, separate from `input`. |
| `tools` | array | No | Function tools the model may call. Responses-shape: `{type: "function", name, description, parameters}` (flat, not nested under a `function` key). |
| `tool_choice` | string or object | No | `auto`, `none`, `required`, or `{type: "function", name: "..."}`. |
| `reasoning` | object | No | `{effort: "low" \| "medium" \| "high", summary: "auto"}` for reasoning-capable models. |
| `max_output_tokens` | integer | No | Cap on output token count. |
| `temperature`, `top_p`, `parallel_tool_calls`, `metadata` | various | No | Forwarded to Azure unchanged. |
| `stream` | boolean | No | If `true`, returns SSE. See Streaming. |
| `store` | boolean | No | Must be `false` or omitted. See Stateless Mode. |
| `previous_response_id` | string | No | Not accepted. See Stateless Mode. |

Other fields not listed here forward to Azure unchanged. Azure is the authoritative validator for shape correctness.
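As a rough illustration of how the headers and body parameters fit together, the sketch below assembles a request with Python's standard library without sending it. The helper name build_request and the placeholder API key are hypothetical; the URL is the one used in the curl examples in this document.

```python
import json
import urllib.request

# URL as it appears in the curl examples in this document.
RESPONSES_URL = "https://api.consus.io/v1/responses"


def build_request(api_key, model, user_input, **optional):
    """Assemble (but do not send) a POST /v1/responses request.

    `build_request` is a hypothetical helper, not part of any Consus SDK.
    """
    body = {"model": model, "input": user_input, "store": False}
    body.update(optional)  # e.g. instructions, tools, max_output_tokens
    return urllib.request.Request(
        RESPONSES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("YOUR_API_KEY", "gpt-5.1:il5", "Say hi in three words.",
                        max_output_tokens=50)
    print(req.get_method(), req.full_url)
    print(json.loads(req.data))
```

Passing the returned object to urllib.request.urlopen would send it; error handling and timeouts are left out of this sketch.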

Response

Returns Azure’s full response payload unchanged, with an optional x_consus_governance field appended when tool-output governance flags are present.

{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1700000000,
  "status": "completed",
  "model": "gpt-5.1",
  "output": [
    {
      "type": "message",
      "id": "msg_1",
      "role": "assistant",
      "content": [{"type": "output_text", "text": "Hello!"}]
    }
  ],
  "usage": {
    "input_tokens": 10,
    "output_tokens": 5,
    "total_tokens": 15
  }
}
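A client usually wants the assistant text rather than the whole payload. The following is a minimal sketch, assuming the payload shape shown above; output_text is a hypothetical helper name, and item types other than message are simply skipped.

```python
def output_text(response):
    """Concatenate every output_text part found in message items of `output`."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip function_call, reasoning, and any other item types
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


if __name__ == "__main__":
    sample = {
        "id": "resp_abc123",
        "status": "completed",
        "output": [
            {
                "type": "message",
                "id": "msg_1",
                "role": "assistant",
                "content": [{"type": "output_text", "text": "Hello!"}],
            }
        ],
        "usage": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
    }
    print(output_text(sample))
    print(sample["usage"]["total_tokens"])
```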

Stateless Mode

Consus Gateway is stateless. The gateway does not store response payloads, conversation history, or any other state across requests. Two consequences:

store: true is rejected with 400 invalid_request_error. Omit store or set it to false.
previous_response_id is rejected with 400 invalid_request_error. Send the full input array each turn.

Codex CLI handles this automatically — it sets store: false on any base URL that doesn’t look like a direct Azure endpoint, and resends conversation context each turn. You don’t need to change anything in your Codex config; the rejection only matters if you’re using a custom client.
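For a custom client, the two rejections can be caught before the request leaves the process. This is a sketch only, mirroring the rules stated above; enforce_stateless is a hypothetical helper and the gateway remains the authoritative check.

```python
def enforce_stateless(payload):
    """Return a copy of `payload` that satisfies the gateway's stateless rules.

    Raises ValueError for the two cases the gateway rejects with
    400 invalid_request_error.
    """
    if payload.get("store") is True:
        raise ValueError("store: true is rejected; omit store or set it to false")
    if "previous_response_id" in payload:
        raise ValueError("previous_response_id is rejected; send the full input array")
    checked = dict(payload)
    checked["store"] = False  # an explicit false is accepted, same as omitting it
    return checked


if __name__ == "__main__":
    print(enforce_stateless({"model": "gpt-5.1:il5", "input": "Say hi."}))
```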

Reasoning

gpt-5.1 supports extended reasoning. Pass reasoning: {effort: "high"} in the request body, or set model_reasoning_effort = "high" in your Codex config. Effort defaults to the catalog’s per-model default if omitted. Reasoning output items (type: "reasoning") appear in the output array alongside message and function_call items. Encrypted reasoning content for stateful multi-turn is not exposed (the gateway is stateless — see Stateless Mode).
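The sketch below shows one way a client might attach a reasoning setting and then pick reasoning items out of the output array. The helper names are hypothetical, the allowed effort values are the three listed under Body Parameters, and the internal fields of a reasoning item are not assumed beyond its type.

```python
VALID_EFFORTS = ("low", "medium", "high")


def with_reasoning(payload, effort, summary=None):
    """Return a copy of `payload` with a `reasoning` object attached."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    reasoning = {"effort": effort}
    if summary is not None:
        reasoning["summary"] = summary  # e.g. "auto"
    return {**payload, "reasoning": reasoning}


def reasoning_items(response):
    """Return the output items whose type is "reasoning"."""
    return [item for item in response.get("output", []) if item.get("type") == "reasoning"]


if __name__ == "__main__":
    base = {"model": "gpt-5.1:il5", "input": "How are cranes removed from skyscrapers?"}
    print(with_reasoning(base, "high", summary="auto"))
```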

Streaming

Streaming is not real-time today. Setting stream: true is accepted, but the gateway buffers the full upstream response before emitting SSE events. The wire shape is the standard OpenAI Responses event sequence (response.created → response.output_item.added → response.output_text.delta → response.output_item.done → response.completed) and Codex parses it correctly, but tokens do not arrive incrementally. All events flush together once the upstream model finishes. Incremental streaming is a planned optimization.

Each event takes the form event: <type>\ndata: <json>\n\n (an event line, a data line, then a blank line). For a typical text response, Codex sees:

response.created
response.in_progress
response.output_item.added (for each output item)
response.content_part.added → response.output_text.delta → response.output_text.done → response.content_part.done (for message items)
response.function_call_arguments.delta → response.function_call_arguments.done (for function_call items)
response.output_item.done
response.completed — terminal event, carries the full response payload (including usage and x_consus_governance if applicable)
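Because the events arrive in one flush, a custom client can read the whole body and parse it afterwards. The following is a minimal parser sketch for the framing described above; parse_sse and final_response are hypothetical names, and the assumption that the completed event nests the payload under a response key follows OpenAI's Responses event shape.

```python
import json


def parse_sse(raw):
    """Split a buffered SSE body into (event_type, data) pairs."""
    events = []
    for block in raw.replace("\r\n", "\n").split("\n\n"):
        event_type, data_lines = None, []
        for line in block.split("\n"):
            if line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if event_type and data_lines:
            events.append((event_type, json.loads("\n".join(data_lines))))
    return events


def final_response(events):
    """Return the payload carried by response.completed, or None if absent."""
    for event_type, data in reversed(events):
        if event_type == "response.completed":
            return data.get("response")  # assumed nesting, see lead-in
    return None


if __name__ == "__main__":
    raw = (
        'event: response.created\ndata: {"type": "response.created"}\n\n'
        'event: response.output_text.delta\ndata: {"type": "response.output_text.delta", "delta": "Hi"}\n\n'
        'event: response.completed\ndata: {"type": "response.completed", '
        '"response": {"id": "resp_abc123", "status": "completed"}}\n\n'
    )
    parsed = parse_sse(raw)
    print([name for name, _ in parsed])
    print(final_response(parsed))
```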

Tool Use

Tool definitions use the Responses-shape (flat, not nested under function):

{
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
      }
    }
  ]
}

When the model calls a tool, the response includes a function_call item in output:

{
  "type": "function_call",
  "id": "fc_1",
  "call_id": "call_abc",
  "name": "get_weather",
  "arguments": "{\"location\": \"Washington, DC\"}"
}

To send the result back, include a function_call_output item in the next request’s input array referencing the same call_id.

The same destination-bearing parameter screener used by /v1/chat/completions runs on tool definitions: schemas with property names such as destination_url or webhook_url are rejected with 400. See the Chat Completions doc for the full list.
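Returning a tool result means extending the input array for the next request. The sketch below is one way to do that under two assumptions: the result is carried in a string output field (as in OpenAI's Responses format), and the model's function_call item is resent as received because the gateway keeps no history. append_tool_result is a hypothetical helper.

```python
import json


def append_tool_result(input_items, function_call, result):
    """Return a new input array with the call and its result appended."""
    output_item = {
        "type": "function_call_output",
        "call_id": function_call["call_id"],  # must match the model's call_id
        # Assumption: results are sent as a string in `output`.
        "output": result if isinstance(result, str) else json.dumps(result),
    }
    # The original list is left untouched; the caller keeps its own history.
    return [*input_items, function_call, output_item]


if __name__ == "__main__":
    history = [{"type": "message", "role": "user", "content": "Weather in Washington, DC?"}]
    call = {
        "type": "function_call",
        "id": "fc_1",
        "call_id": "call_abc",
        "name": "get_weather",
        "arguments": "{\"location\": \"Washington, DC\"}",
    }
    for item in append_tool_result(history, call, {"temp_f": 72}):
        print(item)
```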

Governance Metadata

When the arguments of a function_call item contain an outbound destination (a URL or an IPv4 address), the response body includes an advisory x_consus_governance.flags field alongside the standard Responses payload. The function call itself is not modified.

{
  "id": "resp_xyz",
  "output": [
    {
      "type": "function_call",
      "id": "fc_1",
      "call_id": "call_abc",
      "name": "save_results",
      "arguments": "{\"url\": \"https://collector.example.com/ingest\", \"data\": \"...\"}"
    }
  ],
  "usage": {...},
  "x_consus_governance": {
    "flags": [
      {
        "tool_call_id": "call_abc",
        "tool_name": "save_results",
        "destinations": ["https://collector.example.com/ingest"],
        "reason": "external_destination"
      }
    ]
  }
}

This is advisory: the gateway does not block or redact the call. Compliant clients check the field before executing the tool. When stream is true, x_consus_governance is embedded in the response payload of the final response.completed event.
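A client-side check might look like the sketch below. It assumes the flag shape shown in the example above, where tool_call_id matches the call_id of a function_call item; the helper names and the approve callback are hypothetical, and the default policy of withholding every flagged call is a design choice, not gateway behavior.

```python
def flagged_calls(response):
    """Map each flagged call_id to the destinations reported for it."""
    flags = response.get("x_consus_governance", {}).get("flags", [])
    return {flag["tool_call_id"]: flag.get("destinations", []) for flag in flags}


def calls_to_execute(response, approve=lambda call, destinations: False):
    """Return function_call items that are unflagged or explicitly approved."""
    flagged = flagged_calls(response)
    allowed = []
    for item in response.get("output", []):
        if item.get("type") != "function_call":
            continue
        destinations = flagged.get(item["call_id"])
        if destinations is None or approve(item, destinations):
            allowed.append(item)
    return allowed


if __name__ == "__main__":
    response = {
        "id": "resp_xyz",
        "output": [{"type": "function_call", "id": "fc_1", "call_id": "call_abc",
                    "name": "save_results", "arguments": "{}"}],
        "x_consus_governance": {"flags": [{
            "tool_call_id": "call_abc",
            "tool_name": "save_results",
            "destinations": ["https://collector.example.com/ingest"],
            "reason": "external_destination",
        }]},
    }
    print(flagged_calls(response))
    print(calls_to_execute(response))
```

Passing an approve callback lets an application apply its own allowlist to the reported destinations before running a flagged call.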

Examples

Basic completion

curl -X POST https://api.consus.io/v1/responses \
  -H "x-api-key: $CONSUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1:il5",
    "input": "Say hi in three words.",
    "store": false,
    "max_output_tokens": 50
  }'

With reasoning

curl -X POST https://api.consus.io/v1/responses \
  -H "x-api-key: $CONSUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1:il5",
    "input": "How are cranes removed from skyscrapers?",
    "store": false,
    "reasoning": {"effort": "high"},
    "max_output_tokens": 4096
  }'

Streaming

curl -X POST https://api.consus.io/v1/responses \
  -H "x-api-key: $CONSUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1:il5",
    "input": "Say hi.",
    "store": false,
    "stream": true
  }'

Multi-turn (stateless)

Send the full input array each turn — previous_response_id is not supported.

curl -X POST https://api.consus.io/v1/responses \
  -H "x-api-key: $CONSUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1:il5",
    "store": false,
    "input": [
      {"type": "message", "role": "user", "content": "What is FedRAMP?"},
      {"type": "message", "role": "assistant", "content": "FedRAMP is the Federal Risk and Authorization Management Program..."},
      {"type": "message", "role": "user", "content": "And how does IL5 differ?"}
    ]
  }'
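The same pattern can be kept in a small client-side history object that rebuilds the full input array on every turn. This is an illustrative sketch; the Conversation class is hypothetical and simply mirrors the message items in the curl example above.

```python
class Conversation:
    """Client-side history that is resent in full on every request."""

    def __init__(self, model):
        self.model = model
        self.items = []

    def add(self, role, content):
        """Append one message item to the local history."""
        self.items.append({"type": "message", "role": role, "content": content})

    def payload(self):
        """Build the request body for the next turn."""
        return {"model": self.model, "store": False, "input": list(self.items)}


if __name__ == "__main__":
    convo = Conversation("gpt-5.1:il5")
    convo.add("user", "What is FedRAMP?")
    convo.add("assistant", "FedRAMP is the Federal Risk and Authorization Management Program...")
    convo.add("user", "And how does IL5 differ?")
    print(convo.payload())
```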
