
# Chat completion

The Chat completion API generates responses based on a given conversation. It supports both text-based and multimodal inputs.

<Note>
  Please see the [Text generation](/en/features/text-generation) capabilities document for additional usage information.
</Note>

## Endpoint

```
POST https://api.sambanova.ai/v1/chat/completions
```

<Note>
  For SambaStack, developers should check with their system administrator for the correct URL.
</Note>
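
A minimal sketch of calling this endpoint with Python's standard library. It assumes the API key is sent as a bearer token in the `Authorization` header, so confirm the authentication scheme for your account or SambaStack deployment. The helper names `build_chat_request` and `send_chat_request` are illustrative, and `base_url` can be set to the URL your administrator provides.

```python
import json
import urllib.request

# Default SambaCloud base URL; SambaStack deployments use their own URL.
DEFAULT_BASE_URL = "https://api.sambanova.ai/v1"


def build_chat_request(body: dict, api_key: str, base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to the chat completions endpoint."""
    url = base_url.rstrip("/") + "/chat/completions"
    data = json.dumps(body).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: the API key is passed as a bearer token.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def send_chat_request(request: urllib.request.Request) -> dict:
    """Send a non-streaming request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For a non-streaming request, `send_chat_request(build_chat_request(body, api_key=key))` returns the decoded response object. Streamed requests return a sequence of chunks instead, so they must be read incrementally rather than decoded with a single `json.loads`.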

## Request parameters

The following tables list the parameters for a chat completion request, along with each parameter's type and description.

### Required parameters

| Parameter  | Type   | Description                                                                                                                                                                  |
| :--------- | ------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`    | String | The name of the model to query. Refer to the [SambaCloud models list](/en/models/sambacloud-models).                                                                         |
| `messages` | Array  | The conversation history. Each message has a `role` and `content`. See [message object structure](/api-reference/endpoints/chat/#message-object-structure) for more details. |

### Message object structure

Each message object within the `messages` array consists of `role` and `content`.

| Field     | Type   | Description                                                                                                                                                                                                                                                            |
| --------- | ------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `role`    | String | The role of the message author. Choices: `system`, `user`, or `assistant`.                                                                                                                                                                                             |
| `content` | Mixed  | The message content. A string for text-only messages, or an array for multimodal content. See examples of [string content](/api-reference/endpoints/chat/#example-string-content) and [multimodal content](/api-reference/endpoints/chat/#example-multimodal-content). |
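
The table above can be turned into a quick client-side check before a request is sent. This is a hypothetical helper that enforces only what the table states (three roles, and content that is either a string or an array); the server remains the authority on what it accepts.

```python
from typing import Any

# Roles listed in the message object table.
VALID_ROLES = {"system", "user", "assistant"}


def validate_messages(messages: list[dict[str, Any]]) -> None:
    """Raise ValueError if a message does not match the documented structure."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for i, message in enumerate(messages):
        role = message.get("role")
        if role not in VALID_ROLES:
            raise ValueError(f"message {i}: unsupported role {role!r}")
        content = message.get("content")
        # A string for text-only messages, or a list of parts for multimodal content.
        if not isinstance(content, (str, list)):
            raise ValueError(f"message {i}: content must be a string or a list")
```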

#### Example string content

```
"content": "Answer the question in a couple sentences."
```

#### Example multimodal content

```
[
  { "type": "text", "text": "What's in this image?" },
  { "type": "image_url", "image_url": { "url": "base64 encoded string of image" } }
]
```
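
A sketch of building the multimodal message above from raw image bytes. The example above shows the `url` field as a base64-encoded string; wrapping that string in a `data:<mime type>;base64,` URL, as many OpenAI-compatible APIs expect, is an assumption here, so check the vision documentation for the exact format. `image_message` is a hypothetical helper name.

```python
import base64


def image_message(question: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a user message that pairs a text question with a base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            # Assumption: the base64 string is sent inside a data URL.
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }
```

For example, pass `pathlib.Path("photo.jpg").read_bytes()` as `image_bytes` and append the returned dictionary to `messages`.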

### Optional parameters

The following table lists optional parameters that control the model's generation behavior, along with each parameter's type, description, and accepted range or default value.

| **Parameter**    | **Type**            | **Description**                                                                                                  | **Values**       |
| ---------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------- | ---------------- |
| `max_tokens`     | Integer             | Maximum number of tokens to generate. Limited by model context length.                                           | None             |
| `temperature`    | Float               | Controls the randomness of the response. Higher values increase randomness.                                      | `0` to `1`       |
| `top_p`          | Float               | Nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches `top_p`. Lower values make output more focused. | `0` to `1`       |
| `top_k`          | Integer             | Limits sampling to the `top_k` most likely next tokens.                                                          | `1` to `100`     |
| `stop`           | String, Array, Null | Specifies up to four sequences where the API should stop generating responses. This helps control output length. | Default: `null`  |
| `stream`         | Boolean, Null       | Enables streaming responses when set to `true`. If `false`, the full response is returned after completion.      | Default: `false` |
| `stream_options` | Object, Null        | Specifies additional streaming options (only when `stream: true`). Available option: `include_usage: boolean`.   | Default: `null`  |
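
The ranges in this table can be checked locally before a request is sent. This hypothetical `optional_params` helper returns only the parameters you set. It does not check `max_tokens` against the model's context length, which varies by model, and raising an error when `stream_options` is requested without streaming is a design choice of this sketch.

```python
from typing import Optional, Union


def optional_params(
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    stop: Union[str, list[str], None] = None,
    stream: bool = False,
    include_usage: Optional[bool] = None,
) -> dict:
    """Return only the optional parameters that were set, after checking documented ranges."""
    params: dict = {}
    if max_tokens is not None:
        # The upper bound depends on the model's context length, so it is not checked here.
        params["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        params["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        params["top_p"] = top_p
    if top_k is not None:
        if not 1 <= top_k <= 100:
            raise ValueError("top_k must be between 1 and 100")
        params["top_k"] = top_k
    if stop is not None:
        stops = [stop] if isinstance(stop, str) else stop
        if len(stops) > 4:
            raise ValueError("at most four stop sequences are allowed")
        params["stop"] = stop
    if stream:
        params["stream"] = True
    if include_usage is not None:
        # stream_options only applies when stream is true.
        if not stream:
            raise ValueError("stream_options requires stream=True")
        params["stream_options"] = {"include_usage": include_usage}
    return params
```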

### Function calling parameters

Models that support function calling accept the following three parameters. For details about these parameters and the list of supported models, see the [function calling](/en/features/function-calling) page.

| **Parameter**     | **Type**       | **Description**                                                                                                                                                                                                                  | **Values**      |
| ----------------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------- |
| `tools`           | Array          | Defines external tools the model can call (currently only functions are supported). See the [tools parameter usage](/api-reference/endpoints/chat/#example-usage-of-tools-parameter) table.                                      | None            |
| `response_format` | Object         | Ensures the output is valid JSON. Use `{"type": "json_object"}` for JSON responses, or `{"type": "json_schema", "json_schema": {...}}` to enable structured outputs, which ensure the output matches your supplied JSON schema. | None            |
| `tool_choice`     | String, Object | Controls tool usage: `auto`, `required`, or a specific function. See the [accepted tool\_choice values](/api-reference/endpoints/chat/#accepted-values-for-tool-choice) table.                                                 | Default: `auto` |
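
A sketch of JSON mode with `response_format`: the request adds `{"type": "json_object"}`, and the assistant message content, which should then be valid JSON, is decoded with `json.loads`. It assumes the non-streaming response shape documented on this page; `parse_json_reply` is a hypothetical helper.

```python
import json

# Request fields that ask for a JSON object; merge them into the request body.
json_mode = {"response_format": {"type": "json_object"}}


def parse_json_reply(completion: dict) -> dict:
    """Decode the assistant message of a JSON-mode completion into a Python object."""
    content = completion["choices"][0]["message"]["content"]
    return json.loads(content)
```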

#### Example usage of tools parameter

The following table outlines the structure of the `tools` parameter.

| **Type** | **Object fields**                               | **Description**                                                                                                            |
| -------- | ----------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| Function | `name` (`string`)                               | The name of the function to call.                                                                                          |
|          | `description` (`string`)                        | A short description of what the function does.                                                                             |
|          | `parameters` (`object`)                         | Defines the function parameters.                                                                                           |
|          | `parameters.type` (`string`)                    | The data type of the parameters object (always `"object"`).                                                                |
|          | `parameters.properties` (`object`)              | Defines the function parameters and their properties.                                                                      |
|          | `parameters.properties.<param_name>` (`object`) | Each function parameter is defined as an object with: `type` (data type) and `description` (description of the parameter). |
|          | `parameters.required` (`array`)                 | A list of required parameters for the function.                                                                            |
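
A hypothetical tool definition that fills in the fields from the table above for a `solve_quadratic` function. Wrapping the function fields in `{"type": "function", "function": {...}}` is assumed from the shape that `tool_choice` uses to name a specific function; the parameter names `a`, `b`, and `c` are illustrative.

```python
# Hypothetical tool definition following the field layout in the table above.
solve_quadratic_tool = {
    "type": "function",
    "function": {
        "name": "solve_quadratic",
        "description": "Return the real roots of a*x^2 + b*x + c = 0.",
        "parameters": {
            "type": "object",  # always "object"
            "properties": {
                "a": {"type": "number", "description": "Coefficient of x^2."},
                "b": {"type": "number", "description": "Coefficient of x."},
                "c": {"type": "number", "description": "Constant term."},
            },
            "required": ["a", "b", "c"],
        },
    },
}

# Fields to merge into the request body.
request_fields = {"tools": [solve_quadratic_tool], "tool_choice": "auto"}
```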

#### Accepted values for tool choice

The following table illustrates how the `tool_choice` parameter controls the model's interaction with external functions.

| **Value**  | **Description**                                                                                                                                                                        |
| ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `auto`     | The model chooses whether to generate a message or call a function. This is the default behavior when `tool_choice` is not specified.                                                  |
| `required` | Forces the model to generate a function call. The model always selects one or more functions to call.                                                                                  |
| `function` | Forces a call to a specific function. For example, set `tool_choice` to `{"type": "function", "function": {"name": "solve_quadratic"}}` so that the model only uses the `solve_quadratic` function. |
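
A small sketch that produces each `tool_choice` value from the table. `tool_choice` here is a hypothetical helper, and it accepts only the values the table lists.

```python
from typing import Optional, Union


def tool_choice(mode: str = "auto", function_name: Optional[str] = None) -> Union[str, dict]:
    """Return a tool_choice value: "auto", "required", or a forced function call."""
    if function_name is not None:
        # Forcing a specific function uses the object form.
        return {"type": "function", "function": {"name": function_name}}
    if mode not in ("auto", "required"):
        raise ValueError("mode must be 'auto' or 'required'")
    return mode
```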

## Example requests

The following sample request body asks a text model for a streamed response.

```json Example text model request
{
   "messages": [
      {"role": "system", "content": "Answer the question in a couple sentences."},
      {"role": "user", "content": "Share a happy story with me"}
   ],
   "max_tokens": 800,
   "stop": ["[INST", "[INST]", "[/INST", "[/INST]"],
   "model": "Meta-Llama-3.1-8B-Instruct",
   "stream": true,
   "stream_options": {"include_usage": true}
}
```

## Example response format

The API returns a chat completion object or, if the request is streamed, a sequence of chat completion chunk objects.

### Chat completion response

A chat completion response returned by the model for the provided input.

```json Chat completion response
{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "model": "Llama-3-8b-chat",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\nHello there, how may I assist you today?"
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ]
}
```
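
A sketch of reading a non-streamed response: decode the body, confirm the `object` type, and take the text and `finish_reason` from the single entry in `choices`. Stripping the leading newlines from `content` is an optional choice in this sketch, and `extract_reply` is a hypothetical helper.

```python
import json


def extract_reply(raw_body: str) -> tuple[str, str]:
    """Decode a non-streamed response body and return (assistant text, finish reason)."""
    completion = json.loads(raw_body)
    if completion.get("object") != "chat.completion":
        raise ValueError(f"unexpected object type: {completion.get('object')!r}")
    # The choices list contains a single chat completion.
    choice = completion["choices"][0]
    # Strip surrounding whitespace such as the leading "\n\n" in the example.
    return choice["message"]["content"].strip(), choice["finish_reason"]
```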

### Streaming response (chunked)

A streamed (chunked) response returned by the model for the provided input. The example below is the final chunk of a stream: its `delta` is empty and its `finish_reason` is `stop`.

```json Streaming chat response (chunked)
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "Llama-3-8b-chat",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "logprobs": null,
      "finish_reason": "stop"
    }
  ]
}
```
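
A sketch of assembling a streamed reply from its chunks. This page does not specify the wire format, so the sketch assumes the Server-Sent Events framing common to OpenAI-compatible APIs: each chunk is a `data: {...}` line, the stream ends with `data: [DONE]`, and text fragments arrive in `delta.content`. It also keeps the `usage` object from the last chunk when `stream_options: {"include_usage": true}` is set. `collect_stream` is a hypothetical helper.

```python
import json
from typing import Iterable, Optional


def collect_stream(lines: Iterable[bytes]) -> tuple[str, Optional[dict]]:
    """Join streamed delta content and return (full text, usage from the last chunk)."""
    parts: list[str] = []
    usage: Optional[dict] = None
    for raw_line in lines:
        line = raw_line.decode("utf-8").strip()
        # Assumption: chunks arrive as Server-Sent Events lines prefixed with "data:".
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Assumption: deltas carry text in "content"; the final delta is empty.
            parts.append(choice.get("delta", {}).get("content") or "")
        # With include_usage, usage is null except in the last chunk.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage
```

Iterating over the object returned by `urllib.request.urlopen` yields raw lines as bytes, so it can be passed to `collect_stream` directly.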

## Response fields

The following table lists the key response properties, with each property's type and description.

<Note>
  If a request fails, the response body provides a JSON object with details about the error. For more information on errors, please see the [API error codes](/api-reference/using-the-api/api-error-codes) page.
</Note>

| Property                       | Type    | Description                                                                                                                                                                                                      |
| ------------------------------ | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `id`                           | String  | A unique identifier for the chat completion.                                                                                                                                                                     |
| `choices`                      | Array   | A list containing a single chat completion.                                                                                                                                                                      |
| `created`                      | Integer | The Unix timestamp (in seconds) of when the chat completion was created. When streaming, every chunk has the same timestamp.                                                                                     |
| `model`                        | String  | The model used to generate the completion.                                                                                                                                                                       |
| `object`                       | String  | The object type: `chat.completion` for a complete response, or `chat.completion.chunk` for each chunk of a streamed response.                                                                                    |
| `usage`                        | Object  | An optional field present when `stream_options: {"include_usage": true}` is set. When present, it contains a null value except for the last chunk, which includes token usage statistics for the entire request. |
| `throughput_after_first_token` | Float   | The rate (tokens per second) at which output tokens are generated after the first token has been delivered.                                                                                                      |
| `time_to_first_token`          | Float   | The time (in seconds) the model takes to generate the first token.                                                                                                                                               |
| `model_execution_time`         | Float   | The time (in seconds) required to generate a complete response or all tokens.                                                                                                                                    |
| `output_tokens_count`          | Integer | Number of tokens generated in the response.                                                                                                                                                                      |
| `input_tokens_count`           | Integer | Number of tokens in the input prompt.                                                                                                                                                                            |
| `total_tokens_count`           | Integer | The sum of input and output tokens.                                                                                                                                                                              |
| `queue_time`                   | Float   | The time (in seconds) a request spends waiting in the queue before being processed by the model.                                                                                                                 |
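
A sketch of pulling the timing and token statistics out of a response. The table does not say whether these properties appear at the top level or inside the `usage` object, so this hypothetical helper checks `usage` first and falls back to the top level. It also checks the documented relationship `total_tokens_count = input_tokens_count + output_tokens_count`.

```python
from typing import Any

# Statistics listed in the response fields table.
METRIC_KEYS = (
    "input_tokens_count",
    "output_tokens_count",
    "total_tokens_count",
    "time_to_first_token",
    "throughput_after_first_token",
    "model_execution_time",
    "queue_time",
)


def summarize_usage(response: dict[str, Any]) -> dict[str, Any]:
    """Pick the timing and token-count fields out of a response or final stream chunk."""
    # Assumption: the statistics may sit inside "usage" or at the top level.
    stats = response.get("usage") or response
    summary = {key: stats[key] for key in METRIC_KEYS if key in stats}
    counts = ("input_tokens_count", "output_tokens_count", "total_tokens_count")
    if all(key in summary for key in counts):
        expected = summary["input_tokens_count"] + summary["output_tokens_count"]
        if summary["total_tokens_count"] != expected:
            raise ValueError("total_tokens_count does not equal input + output tokens")
    return summary
```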
