Skip to content

Chat Completions Endpoint

This page documents the usage of the Chat Completions endpoint controller and its request builder methods. Each method is type-hinted for IDE autocompletion and safe usage.


Overview

The ChatCompletionsEndpointController exposes a request field with a rich set of builder methods for constructing and sending chat completion requests. All methods support method chaining for a fluent interface.


Request Builder Methods

prepare

Initialize a new request with a dictionary or model:

chat.completions.request.prepare({
    "model": "6948fe4d-98ce-4f36-bc49-5f652cc07b65",
    "messages": [{"content": "Hello!", "role": "user"}]
})

with_model

Set the model to use for the request:

chat.completions.request.with_model("6948fe4d-98ce-4f36-bc49-5f652cc07b65")

add_text_message

Add a user or assistant message:

chat.completions.request.add_text_message("What's the weather in Paris?", role="user")

add_image_to_last_message

Attach an image to the last message:

chat.completions.request.add_image_to_last_message(image_url="https://sketchok.com/images/articles/06-anime/003-pokemon/01/10.jpg")

with_search_engine_tool

Enable the web search tool for the request:

from nexosapi.domain.metadata import WebSearchToolOptions, WebSearchUserLocation

search_options = WebSearchToolOptions(
    search_context_size="medium",
    user_location=WebSearchUserLocation(city="Paris", country="France")
)
chat.completions.request.with_search_engine_tool(options=search_options)

with_rag_tool

Enable the RAG tool for retrieval-augmented generation:

from nexosapi.domain.metadata import RAGToolOptions

rag_options = RAGToolOptions(
    collection_uuid="my-corpus-uuid",
    query="What is the capital of France?",
    threshold=0.8,
    top_n=5,
    model_uuid="model-uuid"
)
chat.completions.request.with_rag_tool(options=rag_options)

with_ocr_tool

Enable the OCR tool for extracting text from images:

from nexosapi.domain.metadata import OCRToolOptions

ocr_options = OCRToolOptions(file_id="file-uuid")
chat.completions.request.with_ocr_tool(options=ocr_options)

with_parallel_tool_calls

Enable or disable parallel tool calls:

chat.completions.request.with_parallel_tool_calls(enabled=True)

with_thinking

Enable or disable the thinking mode:

from nexosapi.domain.metadata import ChatThinkingModeConfiguration

thinking_config = ChatThinkingModeConfiguration(type="enabled", budget_tokens=1024)
chat.completions.request.with_thinking(config=thinking_config)

with_tool_choice

Specify which tool to use (e.g., auto or a specific function):

chat.completions.request.with_tool_choice("auto")
chat.completions.request.with_tool_choice("name:my_function")

set_response_structure

Define the expected response schema (Pydantic model or dict):

from pydantic import BaseModel
class MyResponse(BaseModel):
    result: str
chat.completions.request.set_response_structure(MyResponse)

dump

Show the current request payload:

print(chat.completions.request.dump())

send

Send the request asynchronously and get the response:

response = await chat.completions.request.send()
print(response.model_dump())

reload_last

Reload the last request for reuse:

chat.completions.request.reload_last()

Example: Full Request Flow

from nexosapi.api.endpoints import chat
from nexosapi.domain.metadata import WebSearchToolOptions, WebSearchUserLocation
from pydantic import BaseModel

class WeatherResponse(BaseModel):
    temperature: float
    description: str

chat.completions.request.prepare({
    "model": "6948fe4d-98ce-4f36-bc49-5f652cc07b65",
    "messages": [{"content": "What's the weather in Paris?", "role": "user"}]
})\
.with_search_engine_tool(options=WebSearchToolOptions(
    search_context_size="medium",
    user_location=WebSearchUserLocation(city="Paris", country="France")
))\
.set_response_structure(WeatherResponse)

response = await chat.completions.request.send()
print(response.model_dump())