Multi-Turn Chat
The Chat API is built on LMDeploy; refer to the LMDeploy documentation if you want to deploy the same API privately.
🚀 News
Intern-S1 Released: Intern-S1 is a state-of-the-art open-source multimodal model built on a Mixture-of-Experts (MoE) architecture, with enhanced scientific reasoning capabilities. It currently delivers the best overall performance among open-source models.
- To use this model via the Chat API, set the model field to intern-s1. By default, the model operates in standard (non-deep-thinking) mode.
- You can enable deep thinking by setting the optional thinking_mode field (boolean) to true when using the intern-s1 model.
Rate Limit
- API Rate Limit: By default, each user is limited to 30 requests per minute. If you require a higher request limit, you can apply for an upgraded rate limit configuration at Rate Limiting Policy.
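If you automate calls close to this limit, it is worth backing off and retrying when the server rejects a request for rate reasons. A minimal Python sketch of exponential backoff; the RateLimited exception and call_with_backoff helper are illustrative names, not part of the API:

```python
import time

class RateLimited(Exception):
    """Illustrative exception; raise it when the API answers with an HTTP 429 status."""

def call_with_backoff(send, max_attempts=4, base_delay=1.0):
    """Retry send() with exponential backoff: base_delay, then 2x, 4x, ..."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)
```

Wrap each request-sending callable with this helper so transient rate-limit rejections do not abort a batch job.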
Request Examples
Currently, the Intern Chat API is compatible with a subset of the OpenAI Python SDK methods, and more adaptations are in progress. We still recommend using native Python requests or curl to access the Intern API.
For users opting to use the OpenAI SDK, please install it first:
pip install openai
Non-Streaming Request
(1) Python Example
- Python Requests
import requests
import json

url = 'https://chat.intern-ai.org.cn/api/v1/chat/completions'
headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer eyJ0eXBlIjoiSl...please provide a valid token!'
}
data = {
    "model": "intern-latest",
    "messages": [{
        "role": "user",
        "content": "Hello!"
    }],
    "n": 1,
    "temperature": 0.8,
    "top_p": 0.9
}
res = requests.post(url, headers=headers, data=json.dumps(data))
print(res.status_code)
print(res.json())
print(res.json()["choices"][0]["message"]["content"])
- Using the OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    api_key="eyJ0eXBlIjoiSl...please provide a valid token!",  # Token is passed here without 'Bearer'
    base_url="https://chat.intern-ai.org.cn/api/v1/",
)
chat_rsp = client.chat.completions.create(
    model="intern-latest",
    messages=[{"role": "user", "content": "hello"}],
)
for choice in chat_rsp.choices:
    print(choice.message.content)
Parameter Description:
- Supported parameters: model, messages, n, temperature, top_p, stream, max_tokens, tools
- Other parameters are not supported yet
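Putting the supported parameters together, a full non-streaming request body might be assembled like this sketch (all values are illustrative):

```python
import json

# All parameters the API currently accepts; anything else is unsupported.
payload = {
    "model": "intern-latest",
    "messages": [{"role": "user", "content": "Hello!"}],
    "n": 1,               # number of completions to generate
    "temperature": 0.8,   # sampling temperature
    "top_p": 0.9,         # nucleus-sampling threshold
    "stream": False,      # set to True for incremental output
    "max_tokens": 512,    # cap on output length; 0 means the maximum allowed
}
body = json.dumps(payload)  # serialize before POSTing with requests
```

A tools array may also be added; see the Tool Call Request Example below.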
(2) CLI Example
openai -b "https://chat.intern-ai.org.cn/api/v1/" \
-k "eyJ0eXBlIjoiSl...please provide a valid token!" \
api chat.completions.create \
-m "intern-latest" \
-g user hello
Note:
- Supported parameters: -g ROLE CONTENT -m MODEL [-n N] [-t TEMPERATURE] [-P TOP_P]
- --stop STOP is currently not supported
(3) curl Example
curl --location 'https://chat.intern-ai.org.cn/api/v1/chat/completions' \
--header 'Authorization: Bearer xxxxxxx' \
--header 'Content-Type: application/json' \
--data '{
    "model": "intern-latest",
    "messages": [{
        "role": "user",
        "content": "Do you know Liu Cixin?"
    }, {
        "role": "assistant",
        "content": "As an AI assistant, I know Liu Cixin. He is a renowned Chinese science fiction writer and engineer, having won multiple awards, including the Hugo and Nebula awards."
    }, {
        "role": "user",
        "content": "Which of his works won the Hugo Award?"
    }],
    "temperature": 0.8,
    "top_p": 0.9
}'
Streaming Request Example
(1) Python Example
- Python Requests
import requests
import json

url = 'https://chat.intern-ai.org.cn/api/v1/chat/completions'
headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer eyJ0eXBlIjoiSl...please fill in the correct token!'
}
data = {
    "model": "intern-latest",
    "messages": [{
        "role": "user",
        "content": "Hello~"
    }],
    "n": 1,
    "temperature": 0.8,
    "top_p": 0.9,
    "stream": True,
}
response = requests.post(url, headers=headers, data=json.dumps(data), stream=True)
for chunk in response.iter_lines(chunk_size=8192, decode_unicode=False, delimiter=b'\n'):
    if not chunk:
        continue
    decoded = chunk.decode('utf-8')
    if not decoded.startswith("data:"):
        raise Exception(f"error message: {decoded}")
    # Note: str.strip("data:") removes characters, not a prefix; slice instead.
    decoded = decoded[len("data:"):].strip()
    if decoded == "[DONE]":
        print("finish!")
        break
    output = json.loads(decoded)
    if output["object"] == "error":
        raise Exception(f"logic error: {output}")
    print(output["choices"][0]["delta"]["content"])
- Using the OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    api_key="eyJ0eXBlIjoiSl...please fill in the correct token!",  # Pass token here, without 'Bearer'
    base_url="https://chat.intern-ai.org.cn/api/v1/",
)
chat_rsp = client.chat.completions.create(
    model="intern-latest",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)
for chunk in chat_rsp:
    if chunk.choices and chunk.choices[0].delta.content:  # delta.content can be None on the final chunk
        print(chunk.choices[0].delta.content)
(2) CLI Example
openai -b "https://chat.intern-ai.org.cn/api/v1/" \
-k "eyJ0eXBlIjoiSl...please fill in the correct token!" \
api chat.completions.create \
-m "intern-latest" \
--stream \
-g user hello
Note:
- Supported parameters: -g ROLE CONTENT -m MODEL [-n N] [-t TEMPERATURE] [-P TOP_P] --stream
- [--stop STOP] is currently not supported
(3) curl Example
curl --location 'https://chat.intern-ai.org.cn/api/v1/chat/completions' \
--header 'Authorization: Bearer xxxxxxx' \
--header 'Content-Type: application/json' \
--data '{
    "model": "intern-latest",
    "messages": [{
        "role": "user",
        "content": "Do you know Liu Cixin?"
    }, {
        "role": "assistant",
        "content": "As an AI assistant, I know Liu Cixin. He is a famous Chinese science fiction writer and engineer, having won several awards including the Hugo and Nebula Awards."
    }, {
        "role": "user",
        "content": "Which of his works won the Hugo Award?"
    }],
    "temperature": 0.8,
    "top_p": 0.9,
    "stream": true
}'
Image-Text Request Example
The following example shows the HTTP request body. For SDK-specific usage, refer to the Python, CLI, or curl examples above.
// Request
{
    "model": "internvl2.5-latest",
    "messages": [
        {
            "role": "user",
            "content": "Hello"
        },
        {
            "role": "assistant",
            "content": "Hello, I am internvl"
        },
        {
            "role": "user",
            "content": [ // user's input with image and text (array)
                {
                    "type": "text", // type field supports text/image_url
                    "text": "Describe the image please"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://static.openxlab.org.cn/internvl/demo/visionpro.png" // image URL or base64 image
                    }
                },
                {
                    "type": "image_url", // multiple images per round are supported
                    "image_url": {
                        "url": "data:image/jpeg;base64,{<encode_image(image_path)>}" // replace <encode_image(image_path)> with your base64-encoded image
                    }
                }
            ]
        }
    ],
    "temperature": 0.8, // float [0,1], default=0.5
    "top_p": 0.9, // float [0,1], default=1
    "max_tokens": 100 // default=0
}
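To build the base64 form of the image_url field, a local file can be encoded like this sketch (the helper names are illustrative, and the data-URL media type should match your file format):

```python
import base64

def encode_image(image_path):
    """Read a local image file and return its contents base64-encoded as ASCII text."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def image_data_url(image_path, mime="image/jpeg"):
    """Build the data URL expected by the image_url field of the request body."""
    return f"data:{mime};base64,{encode_image(image_path)}"
```

The returned string goes directly into "url" in place of the {<encode_image(image_path)>} placeholder above.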
Tool Call Request Example
// Request
{
    "model": "intern-latest", // Default model is the latest version
    "messages": [
        {
            "role": "user",
            "content": "What's the weather today?"
        },
        {
            "role": "assistant",
            "content": "I need to use the get_current_weather API to check today's weather in Shanghai",
            "tool_calls": [
                {
                    "id": "97102",
                    "type": "function",
                    "function": {
                        "name": "get_current_weather",
                        "arguments": "{'location': 'shanghai', 'unit': 'celsius'}"
                    }
                }
            ]
        },
        {
            "role": "tool",
            "content": "{'location': 'shanghai', 'temperature': '40', 'unit': 'celsius'}",
            "tool_call_id": "97102"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        }
                    },
                    "required": ["location"]
                }
            }
        }
    ],
    "temperature": 0.8, // float [0,1], default=0.5
    "top_p": 0.9, // float [0,1], default=1
    "max_tokens": 100 // default=0
}
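The tool-call flow above has three steps: the model returns tool_calls, the client runs the named function locally, and the result goes back as a role "tool" message whose tool_call_id matches the call's id. A sketch of the client-side dispatch, assuming the arguments string arrives as valid JSON; the local get_current_weather implementation is illustrative:

```python
import json

def get_current_weather(location, unit="celsius"):
    # Illustrative local implementation; a real client would query a weather service.
    return {"location": location, "temperature": "40", "unit": unit}

def run_tool_call(tool_call):
    """Execute one entry of the model's tool_calls and build the follow-up message."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    result = {"get_current_weather": get_current_weather}[fn["name"]](**args)
    return {
        "role": "tool",
        "content": json.dumps(result),   # result is sent back as a JSON string
        "tool_call_id": tool_call["id"],  # must match the id the model produced
    }
```

Append the returned message to the conversation and send a new request so the model can produce its final answer.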
Response Examples
Example of Non-streaming Response
// Request
Schema: HTTP
Path: /api/v1/chat/completions
Method: POST
Header:
Authorization: $BearerToken
// Body
{
    "model": "intern-latest", // Default model is the latest version
    "messages": [{
        "role": "user", // Supported roles: user/assistant/system/tool
        "content": "Do you know Liu Cixin?"
    }, {
        "role": "assistant",
        "content": "As an AI assistant, I know Liu Cixin. He is a famous Chinese science fiction writer and engineer, having won several awards including the Hugo and Nebula Awards."
    }, {
        "role": "user",
        "content": "Which of his works won the Hugo Award?"
    }],
    "temperature": 0.8, // float [0,1], default=0.5
    "top_p": 0.9, // float [0,1], default=1
    "n": 1 // default=1
}
// Response
Status Code: 200
Body:
{
    "id": "chatcmpl-123", // Unique identifier for this response
    "model": "intern-latest", // Model ID
    "created": 1677652288, // Timestamp
    "choices": [{ // Model's response content
        "index": 0, // First choice
        "message": {
            "role": "assistant",
            "content": "Liu Cixin's 'The Three-Body Problem' series won the Hugo Award for Best Novel in 2015. This was also the first time an Asian science fiction writer won this prestigious award...." // Response from Puyu
        },
        "finish_reason": "stop" // Options: length / stop / tool_calls; length means the result exceeded max_tokens
    }],
    "moderation": { // Content moderation, currently omitted
        "sensitive": false,
        "category": ""
    }
}
Example of Streaming Response
// Request
Schema: HTTP
Path: /api/v1/chat/completions
Method: POST
Header:
Authorization: $BearerToken
// Body
{
    "model": "intern-latest", // Default model is the latest version
    "messages": [{
        "role": "user", // Supported roles: user/assistant/system/tool
        "content": "Do you know Liu Cixin?"
    }, {
        "role": "assistant",
        "content": "As an AI assistant, I know Liu Cixin. He is a famous Chinese science fiction writer and engineer, having won several awards including the Hugo and Nebula Awards."
    }, {
        "role": "user",
        "content": "Which of his works won the Hugo Award?"
    }],
    "n": 1,
    "temperature": 0.8,
    "top_p": 0.9,
    "stream": true
}
// Response
Status Code: 200
Body:
{
    "id": "chatcmpl-123", // Unique identifier for this response
    "model": "intern-latest", // Model ID
    "created": 1677652288, // Timestamp
    "choices": [{ // Model's response content
        "index": 0, // First choice
        "delta": {
            "role": "assistant",
            "content": "Liu Cixin's 'The Three-Body Problem' series won the Hugo Award for Best Novel in 2015" // Response from Puyu
        },
        "finish_reason": "" // Options: length / stop / empty string; length means the result exceeded max_tokens, and an empty string means output is still in progress
    }],
    "moderation": { // Content moderation, currently omitted
        "sensitive": false,
        "category": ""
    }
}
Example of Tool Call Response
Status Code: 200
Body:
{
    "id": "chatcmpl-123", // Unique identifier for this response
    "model": "intern-latest", // Model ID
    "created": 1677652288, // Creation timestamp
    "choices": [{ // Model's response content
        "index": 0, // Entry 0
        "message": {
            "role": "assistant",
            "content": "get_current_weather",
            "tool_calls": [
                {
                    "id": "97102",
                    "type": "function",
                    "function": {
                        "name": "get_current_weather",
                        "arguments": "{'location': 'shanghai', 'unit': 'celsius'}"
                    }
                }
            ]
        },
        "finish_reason": "tool_calls"
    }],
    "moderation": {
        "sensitive": false,
        "category": ""
    }
}
Parameter Explanation
The API sets a 120-second timeout on each model call: if the model has not finished its output within that window, generation is interrupted and the partial result is returned.
Request Parameters Explanation
Parameter | Type | Example | Description |
---|---|---|---|
model | string | intern-latest | Name of the model being called |
messages | array | LLM: [{"role":"user","content":"Hello"}]; multimodal model: see "Image-Text Messages" below | Conversation history and the current query. Supported roles: user/assistant/system/tool. "role": "system" is not supported for the internthinker-beta model |
thinking_mode | Optional, boolean | true | Only effective when model is set to intern-s1; controls whether the model responds in deep-thinking mode. When thinking_mode is enabled, setting a system message is not allowed. |
tools | Optional, array | See tools example below | An array of functions available for the model to call. Based on this array, the model will respond with the corresponding function's JSON object. |
tool_choice | Optional, string or object | {"type": "function", "function": {"name": "my_function"}} | Options: none (forces no function call), auto (automatic function call), required (must call a function), or provide a specific function object that must be called. |
temperature | Optional, float | 0.8 | Sampling temperature |
top_p | Optional, float | 0.9 | The probability threshold for candidate tokens |
max_tokens | Optional, int | 100 | The maximum number of tokens for the output. Default 0 adjusts to the maximum allowed length |
stream | Optional, bool | true | Stream incremental results. Does not support streaming with tool calls. |
- Image-Text Messages
{
    "model": "internvl-latest",
    "messages": [
        {
            "role": "user",
            "content": "Hello"
        },
        {
            "role": "assistant",
            "content": "Hello, I am internvl"
        },
        {
            "role": "user",
            "content": [ // user's input with image and text (array)
                {
                    "type": "text", // type field supports text/image_url
                    "text": "Describe the image please"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://static.openxlab.org.cn/internvl/demo/visionpro.png" // image URL or base64 image
                    }
                },
                {
                    "type": "image_url", // multiple images per round are supported
                    "image_url": {
                        "url": "https://static.openxlab.org.cn/puyu/demo/000-2x.jpg"
                    }
                }
            ]
        }
    ],
    "temperature": 0.8, // float [0,1], default=0.5
    "top_p": 0.9, // float [0,1], default=1
    "max_tokens": 100 // default=0
}
- Tools Example
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]
Response Parameters Explanation
Parameter | Type | Example | Description |
---|---|---|---|
id | string | chatcmpl-123 | Unique session ID |
model | string | intern-latest | Model name used |
created | int | 1677652288 | Creation timestamp |
choices | array | [{"index":0,"message":{"role":"assistant","content":"hi"},"finish_reason":"stop"}] | Model's reply |
moderation | object | {"sensitive":false,"category":""} | Content moderation info (not yet supported) |
Explanation of Related Structures
- Choice Structure Explanation
Parameter | Type | Example | Description |
---|---|---|---|
index | int | 0 | Index of the result in case of multiple generations |
message | object | {"role":"assistant", "content":"Hello"} | Represents a query or response. Only exists in non-streaming requests |
delta | object | {"role":"assistant", "content":"Hello"} | Represents a query or response. Only exists in streaming requests, and does not support tool calls during streaming. |
finish_reason | string | stop | Options: length / stop / tool_calls. length indicates that the response exceeds max_tokens. |
- Message Structure Explanation
Parameter | Type | Example | Description |
---|---|---|---|
role | string | user | Indicates user query (user) or model response (assistant) |
content | string | Hello | Dialogue content. When sending a "tool" request, the content is the JSON string of the function call's response |
tool_call_id | string | 97102 | Required when sending a "tool" request or if the model generates tool_calls content. These IDs must match. |
tool_calls | array | See tools_calls example below | The JSON-formatted function call returned by the model. |
- Tool Calls Example
tool_calls = [
    {
        "id": "97102",
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "arguments": "{'location': 'shanghai', 'unit': 'celsius'}"
        }
    }
]
- Moderation Structure Explanation
Parameter | Type | Example | Description |
---|---|---|---|
sensitive | bool | false | Indicates whether content is sensitive |
category | string | ad | Classification of sensitive content |
Common Errors
Error Code | Description | Suggested Solution |
---|---|---|
-10002 | Invalid parameter | Check if the query is empty |
-20004 | Non-whitelisted user | Apply for access through the website |
-20013 | Requested model does not exist | Verify if the model parameter is supported |
-20018 | Incorrect message format | Verify the message format |
-20035 | Account not linked to a phone number | Bind a phone number via the website |
-20053 | Exceeded frequency limit (req/min or tokens/min) | Ensure the request frequency is within allowed limits, or apply for higher request frequency |
A0202 | User authentication failed | Verify if the provided token is correct and formatted consistently (including Bearer) |
A0211 | Token expired | Confirm if the token has expired |
C1114 | Token is not for this service | Verify if the token is meant for this service |
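When automating requests, it can help to branch on these documented codes. A rough sketch of a client-side classifier; how the code is surfaced in the error payload is not specified here, so inspect the actual response body before wiring this up:

```python
# Map the documented Intern API error codes to a coarse client-side reaction.
RETRYABLE = {"-20053"}                     # rate limit exceeded: back off and retry
AUTH_ERRORS = {"A0202", "A0211", "C1114"}  # token problems: refresh or replace credentials

def classify_error(code):
    """Return a coarse recovery action for a documented error code (as a string)."""
    if code in RETRYABLE:
        return "retry"
    if code in AUTH_ERRORS:
        return "fix-token"
    return "fix-request"  # e.g. -10002, -20013, -20018: correct the payload itself
```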