Using SambaNova-supported vision models, users can process multimodal inputs consisting of text and images. These models understand and analyze images, then generate text based on that context. This page shows how to query SambaNova vision models using OpenAI's Python client.
Make a query with an image
On SambaNova, vision model requests follow OpenAI's multimodal input format, which accepts both text and image inputs in a structured payload. The call is similar to Text Generation, but it additionally includes an encoded image file, referenced via the image_path variable. A helper function converts this image into a base64 string, allowing it to be passed alongside the text in the request.
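For reference, the message payload takes the following shape. This is a sketch of OpenAI's multimodal message format; `base64_image` is assumed to already hold the base64-encoded file, and the prompt text is illustrative.

```python
# Sketch of a multimodal message: text plus an image passed as a base64 data URI.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
            },
        ],
    }
]
```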
Step 1
Create a new Python file and copy the code below.
This example uses the Llama-4-Maverick-17B-128E-Instruct model.
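The following is a minimal sketch of such a script, assuming the OpenAI Python client (`pip install openai`). The image path and prompt text are illustrative; the credential placeholders are replaced in Step 2.

```python
import base64

from openai import OpenAI

# Replace the placeholder strings as described in Step 2.
client = OpenAI(
    api_key="your-sambanova-api-key",
    base_url="your-sambanova-base-url",
)


def encode_image(image_path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Illustrative path; point this at any local image file.
image_path = "path/to/your/image.jpg"
base64_image = encode_image(image_path)

# Send the text prompt and the encoded image in a single request.
response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```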
Step 2
Use your SambaNova API key and base URL from the API keys and URLs page to replace the string fields "your-sambanova-api-key" and "your-sambanova-base-url" in the construction of the client.