# Llama Stack

In this guide, you'll learn how to set up and use **Llama Stack**, a standardized framework that simplifies AI application development. We'll walk you through building the SambaNova distribution server, installing the client, and running your first model inference. Whether you're prototyping or scaling up, Llama Stack gives you a modular architecture built on best practices from the Llama ecosystem.

## Components of Llama Stack

Llama Stack includes two main components:

* Server – A running distribution of Llama Stack that hosts various adaptors.
* Client – A consumer of the server's API, interacting with the hosted adaptors.

## Get your SambaCloud API key

1. Create a [SambaCloud](https://cloud.sambanova.ai/apis) account.
2. Navigate to the API key section.
3. Generate a new key (if you don’t already have one).
4. Copy the key and store it securely.
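Rather than pasting the key into scripts, you can keep it in the `SAMBANOVA_API_KEY` environment variable (the same name used when running the server) and read it at runtime. The sketch below assumes you want to fail early with a clear message when the variable is unset or still holds the `your-api-key-here` placeholder; `get_sambanova_api_key` is a hypothetical helper name, not part of any SambaNova or Llama Stack package.

```python
import os


def get_sambanova_api_key(env_var: str = "SAMBANOVA_API_KEY") -> str:
    """Return the SambaCloud API key from the environment (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    # Reject an empty value or the placeholder used in this guide's examples.
    if not key or key == "your-api-key-here":
        raise RuntimeError(
            f"{env_var} is not set; export your SambaCloud API key first."
        )
    return key
```

Reading the key this way keeps it out of source control and lets the same script run unchanged on machines with different keys.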

## Build the SambaNova Llama Stack server

* Set up a Python virtual environment

```bash
python -m venv .venv
source .venv/bin/activate
```
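To confirm that the activated interpreter is the virtual environment and not your system Python, you can compare `sys.prefix` with `sys.base_prefix`: inside an environment created with `python -m venv`, `sys.prefix` points at `.venv` while `sys.base_prefix` still points at the base installation. A small sketch (`in_virtualenv` is a hypothetical helper):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv created with `python -m venv`."""
    # In a venv, sys.prefix is the environment directory; sys.base_prefix
    # remains the directory of the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```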

* Install required dependencies

```bash
pip install uv
pip install llama-stack
```

## Run the SambaNova distribution server

* Export required environment variables

```bash
export LLAMA_STACK_PORT=8321
export ENABLE_SAMBANOVA=sambanova
export SAMBANOVA_API_KEY="your-api-key-here"
```

* Run the server with Docker

```bash
docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-starter \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY
```
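If you prefer to start the server from Python (for example, in a test harness), the sketch below mirrors the `docker run` command above as an argument list and then polls the published port until the container accepts TCP connections. The helper names `build_docker_command`, `wait_for_port`, and `start_server` are hypothetical. The `-it` flags are dropped because a background process has no terminal attached, and an open port only means the server is listening, not necessarily that every provider has finished initializing.

```python
import os
import socket
import subprocess
import time


def build_docker_command(port: int) -> list[str]:
    """Mirror the `docker run` command above as an argument list (hypothetical)."""
    api_key = os.environ.get("SAMBANOVA_API_KEY", "")
    return [
        "docker", "run",
        "-p", f"{port}:{port}",
        # Expand ~ ourselves, since no shell is involved here.
        "-v", f"{os.path.expanduser('~/.llama')}:/root/.llama",
        "llamastack/distribution-starter",
        "--port", str(port),
        "--env", f"SAMBANOVA_API_KEY={api_key}",
    ]


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: nothing is listening yet.
            time.sleep(0.5)
    return False


def start_server(port: int = 8321, timeout: float = 120.0) -> subprocess.Popen:
    """Launch the container in the background and block until its port opens."""
    proc = subprocess.Popen(build_docker_command(port))
    if not wait_for_port("localhost", port, timeout):
        proc.terminate()
        raise TimeoutError(f"Server did not open port {port} within {timeout} s")
    return proc
```

For example, `proc = start_server(int(os.environ.get("LLAMA_STACK_PORT", "8321")))` launches the server on the port exported earlier; keep the returned handle so you can stop the process when you are done.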

## Install the Llama Stack client

In the same or another environment, run:

```bash
pip install llama-stack-client
```
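Before running the client code, you can check which packages ended up in the active environment. This sketch uses the standard-library `importlib.metadata` module and assumes the PyPI distribution names `llama-stack` and `llama-stack-client` from the `pip install` commands in this guide; `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for dist in ("llama-stack", "llama-stack-client"):
    found = installed_version(dist)
    print(f"{dist}: {found if found else 'not installed'}")
```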

## Use the client to interact with the server

The following Python code demonstrates basic usage:

```python
from llama_stack_client import LlamaStackClient

LLAMA_STACK_PORT = 8321
client = LlamaStackClient(base_url=f"http://localhost:{LLAMA_STACK_PORT}")

# List all available models
models = client.models.list()
print("--- Available models: ---")
for m in models:
    print(f"- {m.identifier}")
print()

# Choose a model from the list
model = "sambanova/sambanova/Meta-Llama-3.3-70B-Instruct"

# Run chat completion
response = client.inference.chat_completion(
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Write a two-sentence poem about llama."},
    ],
    model_id=model,
)

print(response.completion_message.content)
```

This example demonstrates the full client-server loop: connecting to the server, listing models, and running inference.
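The example hardcodes the model identifier. Because the identifiers a server exposes can vary between distribution versions, you may prefer to pick one from the `identifier` values returned by `client.models.list()`. This is a sketch under stated assumptions: `pick_model` is a hypothetical helper, and filtering on a `sambanova/` prefix plus a case-insensitive substring hint is an assumption based on the identifier used above.

```python
from typing import Iterable, Optional


def pick_model(
    identifiers: Iterable[str],
    hint: str = "Llama-3.3-70B",
    provider_prefix: str = "sambanova/",
) -> Optional[str]:
    """Return the first SambaNova-served identifier containing `hint`, else None."""
    for identifier in identifiers:
        # Keep only models routed through the SambaNova provider.
        if not identifier.startswith(provider_prefix):
            continue
        if hint.lower() in identifier.lower():
            return identifier
    return None
```

With the client from the example, `model = pick_model(m.identifier for m in client.models.list())` selects a matching model, and a `None` result tells you to check the printed model list before calling `chat_completion`.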

Explore the [SambaNova Llama Stack integration repo](https://github.com/sambanova/integrations/tree/main/llama_stack) for more use cases built on the SambaNova distribution's LLM, embedding, tool, and agent adaptors.

### Llama Stack documentation

Refer to the [Llama Stack](https://llama-stack.readthedocs.io/en/latest/) docs to:

* Understand core concepts
* Dive into sample apps
* Learn how to extend and customize the framework