Quick Start

Quick start CLI, Config, Docker

LiteLLM Server manages:

Unified Interface: Calling 100+ LLMs Huggingface/Bedrock/TogetherAI/etc. in the OpenAI ChatCompletions & Completions format
Load Balancing: between Multiple Models + Deployments of the same model - LiteLLM proxy can handle 1.5k+ requests/second during load tests.
Cost tracking: Authentication & Spend Tracking Virtual Keys

View all the supported args for the Proxy CLI here

$ pip install litellm[proxy]

If this fails try running

$ pip install 'litellm[proxy]'

Quick Start - LiteLLM Proxy CLI

Run the following command to start the litellm proxy

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000

Test

In a new shell, run, this will make an openai.chat.completions request. Ensure you're using openai v1.0.0+

litellm --test

This will now automatically route any requests for gpt-3.5-turbo to bigcode starcoder, hosted on huggingface inference endpoints.

Using LiteLLM Proxy - Curl Request, OpenAI Package, Langchain, Langchain JS

Curl Request
OpenAI v1.0.0+
Langchain
Langchain Embeddings

curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
      "model": "gpt-3.5-turbo",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ],
    }
'

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:8000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage

chat = ChatOpenAI(
    openai_api_base="http://0.0.0.0:8000", # set openai_api_base to the LiteLLM Proxy
    model = "gpt-3.5-turbo",
    temperature=0.1
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that im using to make a test request to."
    ),
    HumanMessage(
        content="test from litellm. tell me why it's amazing in 1 sentence"
    ),
]
response = chat(messages)

print(response)

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="sagemaker-embeddings", openai_api_base="http://0.0.0.0:8000", openai_api_key="temp-key")

text = "This is a test document."

query_result = embeddings.embed_query(text)

print(f"SAGEMAKER EMBEDDINGS")
print(query_result[:5])

embeddings = OpenAIEmbeddings(model="bedrock-embeddings", openai_api_base="http://0.0.0.0:8000", openai_api_key="temp-key")

text = "This is a test document."

query_result = embeddings.embed_query(text)

print(f"BEDROCK EMBEDDINGS")
print(query_result[:5])

embeddings = OpenAIEmbeddings(model="bedrock-titan-embeddings", openai_api_base="http://0.0.0.0:8000", openai_api_key="temp-key")

text = "This is a test document."

query_result = embeddings.embed_query(text)

print(f"TITAN EMBEDDINGS")
print(query_result[:5])

Supported LLMs

All LiteLLM supported LLMs are supported on the Proxy. Seel all supported llms

$ export AWS_ACCESS_KEY_ID=
$ export AWS_REGION_NAME=
$ export AWS_SECRET_ACCESS_KEY=

$ litellm --model bedrock/anthropic.claude-v2

$ export AZURE_API_KEY=my-api-key
$ export AZURE_API_BASE=my-api-base

$ litellm --model azure/my-deployment-name

$ export OPENAI_API_KEY=my-api-key

$ litellm --model gpt-3.5-turbo

$ export OPENAI_API_KEY=my-api-key

$ litellm --model openai/<your model name> --api_base <your-api-base> # e.g. http://0.0.0.0:3000

$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]

$ litellm --model huggingface/<your model name> --api_base <your-api-base> # e.g. http://0.0.0.0:3000

$ litellm --model huggingface/<your model name> --api_base http://0.0.0.0:8001

export AWS_ACCESS_KEY_ID=
export AWS_REGION_NAME=
export AWS_SECRET_ACCESS_KEY=

$ litellm --model sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b

$ export ANTHROPIC_API_KEY=my-api-key

$ litellm --model claude-instant-1

Assuming you're running vllm locally

$ litellm --model vllm/facebook/opt-125m

$ export TOGETHERAI_API_KEY=my-api-key

$ litellm --model together_ai/lmsys/vicuna-13b-v1.5-16k

$ export REPLICATE_API_KEY=my-api-key

$ litellm \
  --model replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3

$ litellm --model petals/meta-llama/Llama-2-70b-chat-hf

$ export PALM_API_KEY=my-palm-key

$ litellm --model palm/chat-bison

$ export AI21_API_KEY=my-api-key

$ litellm --model j2-light

$ export COHERE_API_KEY=my-api-key

$ litellm --model command-nightly

Quick Start - LiteLLM Proxy + Config.yaml

The config allows you to create a model list and set api_base, max_tokens (all litellm params). See more details about the config here

Create a Config for LiteLLM Proxy

Example config

model_list: 
  - model_name: gpt-3.5-turbo # user-facing model alias
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: azure/<your-deployment-name>
      api_base: <your-azure-api-endpoint>
      api_key: <your-azure-api-key>
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: <your-azure-api-key>
  - model_name: vllm-model
    litellm_params:
      model: openai/<your-model-name>
      api_base: <your-api-base> # e.g. http://0.0.0.0:3000

Run proxy with config

litellm --config your_config.yaml

More Info

Server Endpoints

POST /chat/completions - chat completions endpoint to call 100+ LLMs
POST /completions - completions endpoint
POST /embeddings - embedding endpoint for Azure, OpenAI, Huggingface endpoints
GET /models - available models on server
POST /key/generate - generate a key to access the proxy

Gunicorn + Proxy

Command:

cmd = f"gunicorn litellm.proxy.proxy_server:app --workers {num_workers} --worker-class uvicorn.workers.UvicornWorker --bind {host}:{port}"

Code

Quick Start Docker Image: Github Container Registry

Pull the litellm ghcr docker image

See the latest available ghcr docker image here: https://github.com/berriai/litellm/pkgs/container/litellm

docker pull ghcr.io/berriai/litellm:main-v1.10.1

Run the Docker Image

docker run ghcr.io/berriai/litellm:main-v1.10.0

Run the Docker Image with LiteLLM CLI args

See all supported CLI args here:

Here's how you can run the docker image and pass your config to litellm

docker run ghcr.io/berriai/litellm:main-v1.10.0 --config your_config.yaml

Here's how you can run the docker image and start litellm on port 8002 with num_workers=8

docker run ghcr.io/berriai/litellm:main-v1.10.0 --port 8002 --num_workers 8

Run the Docker Image using docker compose

Step 1

(Recommended) Use the example file docker-compose.example.yml given in the project root. e.g. https://github.com/BerriAI/litellm/blob/main/docker-compose.example.yml
Rename the file docker-compose.example.yml to docker-compose.yml.

Here's an example docker-compose.yml file

version: "3.9"
services:
  litellm:
    image: ghcr.io/berriai/litellm:main
    ports:
      - "8000:8000" # Map the container port to the host, change the host port if necessary
    volumes:
      - ./litellm-config.yaml:/app/config.yaml # Mount the local configuration file
    # You can change the port or number of workers as per your requirements or pass any new supported CLI augument. Make sure the port passed here matches with the container port defined above in `ports` value
    command: [ "--config", "/app/config.yaml", "--port", "8000", "--num_workers", "8" ]

# ...rest of your docker-compose config if any

Step 2

Create a litellm-config.yaml file with your LiteLLM config relative to your docker-compose.yml file.

Check the config doc here

Step 3

Run the command docker-compose up or docker compose up as per your docker installation.

Use -d flag to run the container in detached mode (background) e.g. docker compose up -d

Your LiteLLM container should be running now on the defined port e.g. 8000.

Using with OpenAI compatible projects

Set base_url to the LiteLLM Proxy server

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:8000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

Start the LiteLLM proxy

litellm --model gpt-3.5-turbo

#INFO: Proxy running on http://0.0.0.0:8000

1. Clone the repo

git clone https://github.com/danny-avila/LibreChat.git

2. Modify Librechat's `docker-compose.yml`

LiteLLM Proxy is running on port 8000, set 8000 as the proxy below

OPENAI_REVERSE_PROXY=http://host.docker.internal:8000/v1/chat/completions

3. Save fake OpenAI key in Librechat's `.env`

Copy Librechat's .env.example to .env and overwrite the default OPENAI_API_KEY (by default it requires the user to pass a key).

OPENAI_API_KEY=sk-1234

4. Run LibreChat:

docker compose up

Continue-Dev brings ChatGPT to VSCode. See how to install it here.

In the config.py set this as your default model.

  default=OpenAI(
      api_key="IGNORED",
      model="fake-model-name",
      context_length=2048, # customize if needed for your model
      api_base="http://localhost:8000" # your proxy server url
  ),

Credits @vividfog for this tutorial.

$ pip install aider 

$ aider --openai-api-base http://0.0.0.0:8000 --openai-api-key fake-key

pip install pyautogen

from autogen import AssistantAgent, UserProxyAgent, oai
config_list=[
    {
        "model": "my-fake-model",
        "api_base": "http://localhost:8000",  #litellm compatible endpoint
        "api_type": "open_ai",
        "api_key": "NULL", # just a placeholder
    }
]

response = oai.Completion.create(config_list=config_list, prompt="Hi")
print(response) # works fine

llm_config={
    "config_list": config_list,
}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent("user_proxy")
user_proxy.initiate_chat(assistant, message="Plot a chart of META and TESLA stock price change YTD.", config_list=config_list)

Credits @victordibia for this tutorial.

A guidance language for controlling large language models. https://github.com/guidance-ai/guidance

NOTE: Guidance sends additional params like stop_sequences which can cause some models to fail if they don't support it.

Fix: Start your proxy using the --drop_params flag

litellm --model ollama/codellama --temperature 0.3 --max_tokens 2048 --drop_params

import guidance

# set api_base to your proxy
# set api_key to anything
gpt4 = guidance.llms.OpenAI("gpt-4", api_base="http://0.0.0.0:8000", api_key="anything")

experts = guidance('''
{{#system~}}
You are a helpful and terse assistant.
{{~/system}}

{{#user~}}
I want a response to the following question:
{{query}}
Name 3 world-class experts (past or present) who would be great at answering this?
Don't answer the question yet.
{{~/user}}

{{#assistant~}}
{{gen 'expert_names' temperature=0 max_tokens=300}}
{{~/assistant}}
''', llm=gpt4)

result = experts(query='How can I be more productive?')
print(result)

Debugging Proxy

Run the proxy with --debug to easily view debug logs

litellm --model gpt-3.5-turbo --debug

When making requests you should see the POST request sent by LiteLLM to the LLM on the Terminal output

POST Request Sent from LiteLLM:
curl -X POST \
https://api.openai.com/v1/chat/completions \
-H 'content-type: application/json' -H 'Authorization: Bearer sk-qnWGUIW9****************************************' \
-d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "this is a test request, write a short poem"}]}'

Quick Start

Quick Start - LiteLLM Proxy CLI​

Test​

Using LiteLLM Proxy - Curl Request, OpenAI Package, Langchain, Langchain JS​

Supported LLMs​

Quick Start - LiteLLM Proxy + Config.yaml​

Create a Config for LiteLLM Proxy​

Run proxy with config​

Server Endpoints​

Gunicorn + Proxy​

Quick Start Docker Image: Github Container Registry​

Pull the litellm ghcr docker image​

Run the Docker Image​

Run the Docker Image with LiteLLM CLI args​

Run the Docker Image using docker compose​

Using with OpenAI compatible projects​

Start the LiteLLM proxy​

1. Clone the repo​

2. Modify Librechat's docker-compose.yml​

3. Save fake OpenAI key in Librechat's .env​

4. Run LibreChat:​

Debugging Proxy​