Documentation Index
Fetch the complete documentation index at: https://gateway.consus.io/llms.txt
Use this file to discover all available pages before exploring further.
POST /v1/responses
Creates a response using OpenAI’s Responses API format. This is the endpoint Codex CLI targets when configured with wire_api = "responses".
The endpoint is a thin proxy to Azure Government OpenAI’s native Responses API and is restricted to Azure-routed GPT models. For Claude, use /v1/messages. For multi-provider access via the OpenAI Chat shape, use /v1/chat/completions.
Request
Headers
| Header | Required | Description |
|---|---|---|
x-api-key | Yes | Your API key |
Content-Type | Yes | application/json |
Body Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Composite model ID (e.g., gpt-5.1:il5). Must resolve to an Azure-routed GPT model. |
input | string or array | Yes | Either a plain string or an array of typed input items (message, function_call, function_call_output). |
instructions | string | No | System-style instructions, separate from input. |
tools | array | No | Function tools the model may call. Responses-shape: {type: "function", name, description, parameters} (flat, not nested under a function key). |
tool_choice | string or object | No | auto, none, required, or {type: "function", name: "..."}. |
reasoning | object | No | {effort: "low" | "medium" | "high", summary: "auto"} for reasoning-capable models. |
max_output_tokens | integer | No | Cap on output token count. |
temperature, top_p, parallel_tool_calls, metadata | various | No | Forwarded to Azure unchanged. |
stream | boolean | No | If true, returns SSE. See Streaming. |
store | boolean | No | Must be false or omitted. See Stateless mode. |
previous_response_id | string | No | Not accepted. See Stateless mode. |
Response
Returns Azure’s full response payload unchanged, with an optionalx_consus_governance field appended when tool-output governance flags are present.
Stateless Mode
Consus Gateway is stateless. The gateway does not store response payloads, conversation history, or any other state across requests. Two consequences:store: trueis rejected with400 invalid_request_error. Omitstoreor set it tofalse.previous_response_idis rejected with400 invalid_request_error. Send the full input array each turn.
store: false on any base URL that doesn’t look like a direct Azure endpoint, and resends conversation context each turn. You don’t need to change anything in your Codex config; the rejection only matters if you’re using a custom client.
Reasoning
gpt-5.1 supports extended reasoning. Pass reasoning: {effort: "high"} in the request body, or set model_reasoning_effort = "high" in your Codex config. Effort defaults to the catalog’s per-model default if omitted.
Reasoning output items (type: "reasoning") appear in the output array alongside message and function_call items. Encrypted reasoning content for stateful multi-turn is not exposed (the gateway is stateless — see Stateless Mode).
Streaming
Streaming is not real-time today. Setting
stream: true is accepted, but the gateway buffers the full upstream response before emitting SSE events. The wire shape is the standard OpenAI Responses event sequence (response.created → response.output_item.added → response.output_text.delta → response.output_item.done → response.completed) and Codex parses it correctly, but tokens do not arrive incrementally. All events flush together once the upstream model finishes. Incremental streaming is a planned optimization.event: <type>\ndata: <json>\n\n. For a typical text response Codex sees:
response.createdresponse.in_progressresponse.output_item.added(for each output item)response.content_part.added→response.output_text.delta→response.output_text.done→response.content_part.done(formessageitems)response.function_call_arguments.delta→response.function_call_arguments.done(forfunction_callitems)response.output_item.doneresponse.completed— terminal event, carries the full response payload (includingusageandx_consus_governanceif applicable)
Tool Use
Tool definitions use the Responses-shape (flat, not nested underfunction):
function_call item in output:
function_call_output item in the next request’s input array referencing the same call_id.
The same destination-bearing parameter screener used by /v1/chat/completions runs on tool definitions — schemas with property names like destination_url, webhook_url, etc., are rejected with 400. See the Chat Completions doc for the full list.
Governance Metadata
When afunction_call item’s arguments contains an outbound destination (URL, IPv4), the response body includes an advisory x_consus_governance.flags field alongside the standard Responses payload. The function call itself is not modified.
x_consus_governance in the final response.completed event’s response payload.
Examples
Basic completion
With reasoning
Streaming
Multi-turn (stateless)
Send the full input array each turn —previous_response_id is not supported.