requests
NexosAPIRequest
Bases: NullableBaseModel
Base class for all API requests to the NEXOS API. This class serves as a foundation for defining specific API request models. It can be extended to create more specific request models for different API endpoints.
ChatCompletionsRequest
Bases: NexosAPIRequest
Request model for the Nexos.ai Chat Completions API.
Use this model to serialize the HTTP request body for the chat completions endpoint. Fields mirror the public API and include validation where applicable.
Attributes:
| Name | Type | Description |
|---|---|---|
model |
str
|
The model ID to use for this completion (e.g., "6948fe4d-98ce-4f36-bc49-5f652cc07b65"). |
messages |
list[ChatMessage]
|
The ordered conversation so far (min length: 1). Depending on the model, different message modalities are supported (e.g., text, images, audio). Common roles include: - Developer/System: instructions the model should follow. With o1 models and newer, prefer a developer message instead of system. - User: end-user prompts or context. - Assistant: prior model responses. - Tool / Function: tool results routed back to the model. |
store |
bool | None
|
Whether to store the output of this request. |
metadata |
dict[str, str] | None
|
Developer-defined tags and values used for filtering completions in the dashboard. |
frequency_penalty |
float
|
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the likelihood of verbatim repetition. Default: 0. |
logit_bias |
dict[str, float] | None
|
Per-token bias added to the model's logits. Keys are tokenizer token IDs (as strings) and values are in [-100, 100]. Small magnitudes tweak likelihood; large magnitudes can effectively ban or force tokens. |
logprobs |
bool | None
|
Whether to return log probabilities of the output tokens. If true, returns log probabilities for each output token in the message content. |
top_logprobs |
int | None
|
The number of most likely tokens to return at each token position, each with an associated log probability. Range: [0, 20]. Requires |
max_completion_tokens |
int | None
|
Upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. |
n |
int
|
How many chat completion choices to generate for each input message. Range: [1, 128]. Costs scale with the number of generated tokens across all choices. Default: 1. |
modalities |
list[Literal['text', 'audio']]
|
Output types to generate. Most models support "text". To request both text and audio: ["text", "audio"]. |
prediction |
PredictionType | None
|
Configuration for a Predicted Output, which can improve response times when large parts of the response are known ahead of time (e.g., static content). |
presence_penalty |
float
|
Number between -2.0 and 2.0. Positive values penalize tokens that already appeared, nudging the model toward new topics. Default: 0. |
audio |
AudioConfiguration | None
|
Parameters for audio output. Required when |
response_format |
dict[str, Any] | None
|
Constrains the output format. Supported values include: - {"type": "text"} - {"type": "json_object"} - {"type": "json_schema", "json_schema": {...}} (Structured Outputs) When using {"type": "json_object"}, also instruct the model (via messages) to produce JSON; otherwise it may stream whitespace until the token limit. Note: message content may be truncated if |
seed |
int | None
|
Best-effort deterministic sampling seed. Range: [-9223372036854776000, 9223372036854776000]. Determinism is not guaranteed; monitor changes via |
service_tier |
Literal['auto', 'default']
|
Latency tier selection. - "auto": Uses scale tier if enabled; otherwise default tier. - "default": Uses the default tier (lower uptime SLA, no latency guarantee). The response may include the service tier utilized. Default: "auto". |
stop |
str | list[str] | None
|
Up to 4 stop sequences at which token generation will halt. |
stream |
bool | None
|
If true, sends partial message deltas as server-sent events (SSE). The stream terminates with: |
stream_options |
dict[str, Any] | None
|
Options for streaming responses. Only set when |
temperature |
float
|
Sampling temperature in [0, 2]. Higher values increase randomness; lower values increase determinism. Default: 1. Generally tune either |
top_p |
float
|
Nucleus sampling probability mass in (0, 1]. For example, 0.1 considers only the tokens comprising the top 10% probability mass. Default: 1. Generally tune either |
tools |
list[dict[str, Any]] | None
|
A list of tools the model may call (max 128). Supported tool types: "function", "web_search", "rag", "tika_ocr". Provide tool-specific payloads under their respective keys. |
tool_choice |
str | dict[str, Any] | None
|
Controls tool invocation. - "none": do not call tools (generate a message instead) - "auto": model may choose to generate a message or call tools - "required": model must call one or more tools - Object to force a specific tool, e.g. {"type": "function", "function": {"name": "my_function"}} Defaults: "none" if no tools; "auto" if tools are present. |
parallel_tool_calls |
bool | None
|
Whether to enable parallel function calling during tool use. API default: true. |
thinking |
ChatThinkingModeConfiguration | None
|
Reasoning/thinking configuration. Common fields: - type (e.g., "enabled") - budget_tokens (e.g., 1024) Notes ----- - |
StorageDownloadRequest
StorageGetRequest
StorageDeleteRequest
TeamApiKeyCreateRequest
TeamApiKeyUpdateRequest
TeamApiKeyDeleteRequest
TeamApiKeyRegenerateRequest
Bases: NexosAPIRequest
Request to regenerate an API key for a team. This request does not require any additional parameters. It simply triggers the regeneration of the API key.