After uploading files to an assistant, you can chat with the assistant.
This page shows you how to chat with an assistant using the OpenAI-compatible chat interface. This interface is based on the widely adopted OpenAI Chat Completion API. It is useful if you need inline citations or OpenAI-compatible responses, but it has limited functionality compared to the standard chat interface.
The standard chat interface is the recommended way to chat with an assistant, as it offers more functionality and control over the assistant’s responses and references. For details, see Chat with an assistant.
The OpenAI-compatible chat interface can return responses in two different formats:
- Default response: The assistant returns a response in a single string field, which includes citation information.
- Streaming response: The assistant returns the response as a text stream.
Default response
The following example sends a message and requests a response in the default format:
The content parameter in the request cannot be empty.
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message
pc = Pinecone(api_key="YOUR_API_KEY")
# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant", 
)
# Chat with the assistant.
chat_context = [Message(role="user", content='What is the maximum height of a red pine?')]
response = assistant.chat_completions(messages=chat_context)
print(response)
{"chat_completion":
  {
    "id":"chatcmpl-9OtJCcR0SJQdgbCDc9JfRZy8g7VJR",
    "choices":[
      {
        "finish_reason":"stop",
        "index":0,
        "message":{
          "role":"assistant",
          "content":"The maximum height of a red pine (Pinus resinosa) is up to 25 meters."
        }
      }
    ],
    "model":"my_assistant"
  }
}
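Because the content parameter cannot be empty, a small check before sending can catch empty messages early. The following is a minimal sketch: `check_messages` is a hypothetical helper, not part of the Pinecone SDK, and it assumes only that each message exposes a `content` attribute, as the `Message` objects above do. It also treats whitespace-only content as empty, which is stricter than the documented rule.

```python
from types import SimpleNamespace


def check_messages(messages):
    """Raise ValueError if any message has empty or whitespace-only content.

    Hypothetical helper (not part of the Pinecone SDK). Assumes each message
    exposes a `content` attribute, like the Message objects used above.
    """
    for position, message in enumerate(messages):
        content = getattr(message, "content", None)
        # Whitespace-only content is rejected too; this is an assumption
        # that goes beyond the documented "cannot be empty" rule.
        if not isinstance(content, str) or not content.strip():
            raise ValueError(f"Message {position} has empty content")
    return messages


# Stand-in object so the sketch runs without the SDK installed.
ok = [SimpleNamespace(role="user", content="What is the maximum height of a red pine?")]
check_messages(ok)  # Passes silently and returns the same list.
```

With the SDK installed, the same call could wrap the list passed as `messages`, for example `assistant.chat_completions(messages=check_messages(chat_context))`.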
Streaming response
The following example sends a message and requests a streaming response:
The content parameter in the request cannot be empty.
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message
pc = Pinecone(api_key="YOUR_API_KEY")
# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant" 
)
# Streaming chat with the Assistant.
chat_context = [Message(role="user", content="What is the maximum height of a red pine?")]
response = assistant.chat_completions(messages=chat_context, stream=True)
for data in response:
    if data:
        print(data)
{
  'id': '000000000000000009de65aa87adbcf0', 
  'choices': [
      {
      'index': 0, 
      'delta': 
        {
        'role': 'assistant', 
        'content': 'The'
        }, 
      'finish_reason': None
      }
    ], 
  'model': 'gpt-4o-2024-05-13'
}
...
{
  'id': '00000000000000007a927260910f5839',
  'choices': [
      {
      'index': 0,
      'delta':
        {
          'role': '', 
          'content': 'The'
        }, 
      'finish_reason': None
      }
    ], 
  'model': 'gpt-4o-2024-05-13'
}
...
{
  'id': '00000000000000007a927260910f5839', 
  'choices': [
    {
      'index': 0, 
      'delta': 
        {
        'role': None, 
        'content': None
        }, 
      'finish_reason': 'stop'
      }
    ], 
  'model': 'gpt-4o-2024-05-13'
}
A streaming response contains three types of messages:
- Message start: Includes "role":"assistant", which indicates that the assistant is responding to the user’s message.
- Content: Includes a value in the content field (e.g., "content":"The"), which is part of the assistant’s streamed response to the user’s message.
- Message end: Includes "finish_reason":"stop", which indicates that the assistant has finished responding to the user’s message.
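The three message types can be told apart by inspecting each chunk's fields. The sketch below is illustrative only: `classify_chunk` is a hypothetical helper, and it assumes chunks are plain dictionaries shaped like the printed output above. The SDK's actual chunk objects may expose these fields as attributes instead.

```python
def classify_chunk(chunk):
    """Return 'start', 'content', 'end', or 'other' for one streamed chunk.

    Hypothetical helper. Assumes `chunk` is a plain dict shaped like the
    printed chunks above; real SDK objects may use attribute access.
    """
    choice = chunk["choices"][0]
    delta = choice.get("delta") or {}
    if choice.get("finish_reason") == "stop":
        return "end"
    if delta.get("role") == "assistant":
        # A start chunk may also carry the first piece of content.
        return "start"
    if delta.get("content"):
        return "content"
    return "other"


# Sample chunks copied from the shape of the output above (ids and model omitted).
sample_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "The"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"role": "", "content": "The"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"role": None, "content": None}, "finish_reason": "stop"}]},
]

print([classify_chunk(c) for c in sample_chunks])  # ['start', 'content', 'end']
```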
In the assistant’s response, the message string is contained in the following JSON object:
- choices[0].message.content for the default chat response
- choices[0].delta.content for the streaming chat response
You can extract the message content and print it to the console. For example, for the default response:
print(str(response.choices[0].message.content))
A red pine, scientifically known as *Pinus resinosa*, is a medium-sized tree that can grow up to 25 meters high and 75 centimeters in diameter. [1, pp. 1]
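For a streaming response, the same idea applies to choices[0].delta.content, collected across chunks. The following is a minimal sketch under stated assumptions: `collect_stream_text` and `make_chunk` are hypothetical helpers, and the code assumes each chunk exposes `choices`, `delta`, `content`, and `finish_reason` as attributes, in the same way the default response exposes `choices[0].message.content`. Stand-in objects are used so it runs without the SDK.

```python
from types import SimpleNamespace


def collect_stream_text(chunks):
    """Concatenate choices[0].delta.content across streamed chunks.

    Hypothetical helper. Assumes attribute access on each chunk; stops
    reading once a chunk reports finish_reason == "stop".
    """
    parts = []
    for chunk in chunks:
        if not chunk or not chunk.choices:
            continue
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason == "stop":
            break
    return "".join(parts)


def make_chunk(content, finish_reason=None):
    """Build a stand-in chunk for illustration (not an SDK object)."""
    delta = SimpleNamespace(role=None, content=content)
    choice = SimpleNamespace(index=0, delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


# Placeholder text, not real assistant output.
demo_stream = [make_chunk("The"), make_chunk(" answer"), make_chunk(None, "stop")]
print(collect_stream_text(demo_stream))  # The answer
```

With a real stream, passing the `response` iterator from the streaming example to `collect_stream_text` should yield the assembled reply, provided the chunk objects match the attribute layout assumed here.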
Choose a model
Pinecone Assistant supports the following models:
- gpt-4o (default)
- gpt-4.1
- o4-mini
- claude-3-5-sonnet
- claude-3-7-sonnet
- gemini-2.5-pro
To choose a non-default model for your assistant, set the model parameter in the request:
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message
pc = Pinecone(api_key="YOUR_API_KEY")
# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant", 
)
# Chat with the assistant.
chat_context = [Message(role="user", content="What is the maximum height of a red pine?")]
response = assistant.chat_completions(
    messages=chat_context, 
    model="gpt-4.1"
)
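To catch a mistyped model name before a request is sent, the supported list above can be checked locally. This is a sketch only: `SUPPORTED_MODELS`, `DEFAULT_MODEL`, and `resolve_model` are hypothetical names, not part of the Pinecone SDK, and the list is copied from this page, so it may fall out of date as supported models change.

```python
# Model names copied from the list on this page; not fetched from the API.
SUPPORTED_MODELS = (
    "gpt-4o",
    "gpt-4.1",
    "o4-mini",
    "claude-3-5-sonnet",
    "claude-3-7-sonnet",
    "gemini-2.5-pro",
)
DEFAULT_MODEL = "gpt-4o"


def resolve_model(name=None):
    """Return a supported model name, falling back to the default.

    Hypothetical helper (not part of the Pinecone SDK).
    """
    if name is None:
        return DEFAULT_MODEL
    if name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model {name!r}; expected one of {', '.join(SUPPORTED_MODELS)}"
        )
    return name


print(resolve_model())           # gpt-4o
print(resolve_model("gpt-4.1"))  # gpt-4.1
```

The result could then be passed as the model parameter, for example `model=resolve_model("gpt-4.1")`.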
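Filter chat with metadata
You can use the filter parameter to limit which files the assistant draws on. The following example requests a streaming response that uses only files with the metadata "resource": "encyclopedia".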
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message
pc = Pinecone(api_key="YOUR_API_KEY")
# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant", 
)
# Chat with the assistant.
chat_context = [Message(role="user", content="What is the maximum height of a red pine?")]
response = assistant.chat_completions(messages=chat_context, stream=True, filter={"resource": "encyclopedia"})
Set the sampling temperature
This is available in API versions 2025-04 and later.
To control the sampling temperature, set the temperature parameter in the request. If a model does not support a temperature parameter, the parameter is ignored.
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message
pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")
msg = Message(role="user", content="Who is the CFO of Netflix?")
response = assistant.chat_completions(
    messages=[msg], 
    temperature=0.8
)
print(response)