
Claude-like API

Version: v1.0
Last updated: 2025-10-24
API endpoint: https://chat.intern-ai.org.cn/v1/messages




Quick Start

Your First Request

Python Example

import requests

url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "your-api-key",
    "anthropic-version": "2023-06-01"
}

data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Hello, please introduce the InternLM model"}
    ]
}

response = requests.post(url, headers=headers, json=data)
result = response.json()

# Extract the reply content
if response.status_code == 200:
    reply = result["content"][0]["text"]
    print(f"Model reply: {reply}")
else:
    print(f"Error: {result}")

cURL Example

curl -X POST https://chat.intern-ai.org.cn/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "intern-s1",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, please introduce the InternLM model"}
    ]
  }'

Response Example

{
  "id": "msg_01XYZ...",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! I am InternLM, a large language model developed by Shanghai AI Laboratory..."
    }
  ],
  "model": "intern-s1",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 120
  }
}

Authentication

All API requests must include authentication information in the request headers.

Request Headers

| Header | Type | Required | Description |
|---|---|---|---|
| Content-Type | string | Yes | Must be application/json |
| x-api-key | string | Yes | Your API key, in the form sk-xxxxx |
| anthropic-version | string | No | API version (optional) |

Example

headers = {
    "Content-Type": "application/json",
    "x-api-key": "sk-your-api-key-here",
    "anthropic-version": "2023-06-01"
}

API Endpoints

Create a Message

Creates a new conversation message and returns the model's response.

Endpoint: POST /v1/messages

Request Body Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Model name, e.g. intern-s1 |
| max_tokens | integer | Yes | - | Maximum number of tokens to generate, range: 1-32000 |
| messages | array | Yes | - | Array of conversation messages |
| system | string | No | - | System prompt defining the assistant's behavior and role |
| temperature | number | No | 0.7 | Sampling temperature, range: 0.0-1.0; higher values give more random output |
| top_p | number | No | 1.0 | Nucleus sampling parameter, range: 0.0-1.0 |
| top_k | integer | No | -1 | Top-K sampling parameter |
| stream | boolean | No | false | Whether to enable streaming output |
| stop_sequences | array | No | [] | Stop sequences; generation stops when any of them is encountered |
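Putting the table together, a request body that combines the required fields with a few of the optional controls might look like the sketch below. The parameter values are purely illustrative, not recommendations:

```python
# Build a request payload combining required and optional parameters.
# All values here are illustrative.
payload = {
    # Required
    "model": "intern-s1",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
    # Optional
    "system": "You are a concise assistant.",
    "temperature": 0.3,         # lower = more deterministic
    "top_p": 0.9,
    "stop_sequences": ["END"],  # stop generating when "END" appears
}
```

This dict can be passed directly as the `json=` argument of `requests.post`.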

Request Format

The messages Parameter

messages is an array of message objects. Each message contains the following fields:

| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | Message role; allowed values: user, assistant |
| content | string/array | Yes | Message content; either a string or an array of content blocks |

Basic format (string content)

{
  "messages": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello! How can I help you?"},
    {"role": "user", "content": "Tell me about yourself"}
  ]
}

Advanced format (array content, supports multimodal input)

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/jpeg",
            "data": "/9j/4AAQSkZJRg..."
          }
        }
      ]
    }
  ]
}

The system Parameter

The system parameter defines the assistant's behavior, role, and constraints.

Example

{
  "system": "You are a professional Python programming assistant, skilled at explaining code and recommending best practices. Keep your answers professional and concise."
}

Response Format

Successful Response (200 OK)

Response body structure

| Field | Type | Description |
|---|---|---|
| id | string | Unique message identifier |
| type | string | Response type, always message |
| role | string | Role, always assistant |
| content | array | Array of content blocks |
| model | string | Name of the model used |
| stop_reason | string | Reason generation stopped, see the table below |
| usage | object | Token usage statistics |

stop_reason values

| Value | Description |
|---|---|
| end_turn | The model finished its reply naturally |
| max_tokens | The max_tokens limit was reached |
| stop_sequence | A stop sequence was encountered |

content array elements

| Field | Type | Description |
|---|---|---|
| type | string | Content type; allowed values: text, thinking (reasoning trace) |
| text | string | Text content (when type is text) |
| thinking | string | Reasoning content (when type is thinking) |

usage object

| Field | Type | Description |
|---|---|---|
| input_tokens | integer | Number of tokens in the input messages |
| output_tokens | integer | Number of tokens in the generated output |

Full response example

{
  "id": "msg_01XYZ123ABC",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "The user is asking about InternLM; I should briefly cover its key features..."
    },
    {
      "type": "text",
      "text": "InternLM is a large language model developed by Shanghai AI Laboratory..."
    }
  ],
  "model": "intern-s1",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 158,
    "output_tokens": 256
  }
}

Streaming Responses

Enabling streaming lets you receive generated content in real time, improving the user experience.

Enabling Streaming

Set "stream": true in the request.

Python Example

import requests
import json

url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "your-api-key",
    "anthropic-version": "2023-06-01"
}

data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "stream": True,
    "messages": [{"role": "user", "content": "Tell me a story"}]
}

response = requests.post(url, headers=headers, json=data, stream=True)

for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('data: '):
            data_str = line_str[6:]
            if data_str == '[DONE]':
                break
            try:
                chunk = json.loads(data_str)
                # Handle a streamed data chunk
                if chunk.get('type') == 'content_block_delta':
                    text = chunk.get('delta', {}).get('text', '')
                    print(text, end='', flush=True)
            except json.JSONDecodeError:
                pass

cURL Example

curl -X POST https://chat.intern-ai.org.cn/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -N \
  -d '{
    "model": "intern-s1",
    "max_tokens": 1024,
    "stream": true,
    "messages": [{"role": "user", "content": "Tell me a story"}]
  }'

Stream Event Types

| Event type | Description |
|---|---|
| message_start | Message started |
| content_block_start | Content block started |
| content_block_delta | Incremental content update (carries the actual text) |
| content_block_stop | Content block finished |
| message_delta | Message metadata update |
| message_stop | Message finished |

Streaming Response Example

event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC","type":"message","role":"assistant"}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Once upon"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" a time"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":150}}

event: message_stop
data: {"type":"message_stop"}
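The content_block_delta events carry the reply in fragments; they can be stitched back into the full text with a small helper. This is a sketch that only inspects the data: payloads and ignores the event: lines:

```python
import json

def collect_text(sse_lines):
    """Accumulate text from content_block_delta events in an SSE stream.

    `sse_lines` is an iterable of already-decoded lines.
    """
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip "event: ..." lines and blanks
        try:
            event = json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            continue
        if event.get("type") == "content_block_delta":
            parts.append(event.get("delta", {}).get("text", ""))
    return "".join(parts)

lines = [
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Once upon"}}',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" a time"}}',
    'data: {"type":"message_stop"}',
]
print(collect_text(lines))  # → Once upon a time
```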

Error Handling

Error Response Format

All error responses share the following structure:

{
  "error": {
    "type": "error type",
    "code": "error code",
    "message": "error description",
    "param": "related parameter name (optional)"
  }
}

Common Error Codes

| HTTP status | Error type | Error code | Description |
|---|---|---|---|
| 400 | invalid_request_error | invalid_request | Malformed request or invalid parameters |
| 400 | invalid_request_error | -20009 | Model service unavailable (usually a bad parameter combination) |
| 401 | authentication_error | invalid_api_key | Invalid API key |
| 403 | permission_error | permission_denied | Insufficient permissions |
| 429 | rate_limit_error | rate_limit_exceeded | Rate limit exceeded |
| 500 | api_error | internal_server_error | Internal server error |

Error Example

{
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_request",
    "message": "max_tokens is a required parameter",
    "param": "max_tokens"
  }
}

Python Error Handling Example

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    result = response.json()
    print(result["content"][0]["text"])
elif response.status_code == 400:
    error = response.json()["error"]
    print(f"Request error: {error['message']}")
elif response.status_code == 401:
    print("Authentication failed: invalid API key")
elif response.status_code == 429:
    print("Too many requests; please retry later")
else:
    print(f"Unknown error: {response.status_code}")
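For the 429 case, retrying with exponential backoff is a common pattern. Below is a minimal sketch: `with_retries` is a hypothetical helper (not part of this API), and production code should also honor any Retry-After header the server returns:

```python
import time

def with_retries(send, max_retries=3, base_delay=1.0):
    """Call send() and retry on HTTP 429 with exponential backoff.

    `send` should return an object with a status_code attribute,
    e.g. the result of requests.post.
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429 or attempt == max_retries:
            return response
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage: `response = with_retries(lambda: requests.post(url, headers=headers, json=data))`.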

Differences from the OpenAI API

Overview

| Feature | OpenAI API | InternLM Claude-like API |
|---|---|---|
| Endpoint | /v1/chat/completions | /v1/messages |
| Auth header | Authorization: Bearer sk-xxx | x-api-key: sk-xxx |
| API version | Not needed | anthropic-version (optional) |
| Required parameters | model, messages | model, messages, max_tokens |
| System prompt | Inside the messages array | Separate system parameter |
| Response format | choices array | content array |
| Token usage fields | prompt_tokens, completion_tokens | input_tokens, output_tokens |
| Stop reason | finish_reason | stop_reason |
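These differences are mechanical enough to automate. The sketch below converts an OpenAI-style request body into the Claude-like shape; `openai_to_claude_like` is a hypothetical helper, and the hard-coded "intern-s1" target model is an assumption:

```python
def openai_to_claude_like(openai_payload, default_max_tokens=1024):
    """Convert an OpenAI-style chat request into the Claude-like shape:
    pull system messages out of `messages` into the `system` parameter
    and add the required max_tokens."""
    messages = openai_payload["messages"]
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    out = {
        "model": "intern-s1",  # assumed target model
        "max_tokens": openai_payload.get("max_tokens", default_max_tokens),
        "messages": [m for m in messages if m["role"] != "system"],
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    if "temperature" in openai_payload:
        out["temperature"] = openai_payload["temperature"]
    return out
```

Response parsing still needs updating by hand (choices[0].message.content → content[0].text).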

Detailed Differences

1. Endpoint

OpenAI

POST https://api.openai.com/v1/chat/completions

InternLM Claude-like

POST https://chat.intern-ai.org.cn/v1/messages

2. Authentication

OpenAI

headers = {
    "Authorization": "Bearer sk-xxxxx",
    "Content-Type": "application/json"
}

InternLM Claude-like

headers = {
    "x-api-key": "sk-xxxxx",
    "Content-Type": "application/json"
    # "anthropic-version": "2023-06-01"  # optional
}

3. Request Parameters

OpenAI request

{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7
}

InternLM Claude-like request

{
  "model": "intern-s1",
  "max_tokens": 1024,
  "system": "You are a helpful assistant",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7
}

Key differences

  • In the InternLM API, max_tokens is a required parameter
  • In the InternLM API, the system prompt goes in a separate system parameter
  • In the InternLM API, the messages array must not contain a system role

4. Response Format

OpenAI response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}

InternLM Claude-like response

{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you?"
    }
  ],
  "model": "intern-s1",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 20
  }
}

Extracting the reply text

# OpenAI
text = response["choices"][0]["message"]["content"]

# InternLM Claude-like
text = response["content"][0]["text"]

5. Streaming

OpenAI streaming

data: {"choices":[{"delta":{"content":"Hello"}}]}
data: [DONE]

InternLM Claude-like streaming

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}

Migration Guide

If you are migrating from the OpenAI API to the InternLM Claude-like API, note the following:

  1. Update the endpoint URL
  2. Change the auth header (from Authorization to x-api-key)
  3. Add the max_tokens parameter (required)
  4. Move the system prompt (from the messages array to the system parameter)
  5. Update response parsing (from choices to content)

Migration Code Example

# Original OpenAI code
import openai

openai.api_key = "sk-xxxxx"
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello"}
    ]
)
text = response.choices[0].message.content

# Migrated InternLM API code
import requests

url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "x-api-key": "sk-xxxxx",
    "Content-Type": "application/json"
}
data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "system": "You are helpful",
    "messages": [{"role": "user", "content": "Hello"}]
}
response = requests.post(url, headers=headers, json=data)
text = response.json()["content"][0]["text"]

Complete Examples

Example 1: Basic Conversation

Python

import requests

def chat_with_intern(user_message):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }

    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "messages": [
            {"role": "user", "content": user_message}
        ]
    }

    response = requests.post(url, headers=headers, json=data)

    if response.status_code == 200:
        result = response.json()
        return result["content"][0]["text"]
    else:
        return f"Error: {response.json()}"

# Usage
reply = chat_with_intern("Introduce the InternLM model")
print(reply)

Example 2: Multi-Turn Conversation

Python

import requests

def multi_turn_chat():
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }

    # Conversation history
    messages = [
        {"role": "user", "content": "I want to learn Python"},
        {"role": "assistant", "content": "Great! Python is a very beginner-friendly language. Where would you like to start?"},
        {"role": "user", "content": "Let's start with data types"}
    ]

    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "messages": messages
    }

    response = requests.post(url, headers=headers, json=data)
    result = response.json()

    return result["content"][0]["text"]

reply = multi_turn_chat()
print(reply)

Example 3: A Specialized Assistant with a System Prompt

Python

import requests

def code_reviewer(code):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }

    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "system": "You are a senior code reviewer. Carefully check the code for: 1) correctness 2) performance 3) readability 4) best practices",
        "messages": [
            {"role": "user", "content": f"Please review the following code:\n\n```python\n{code}\n```"}
        ]
    }

    response = requests.post(url, headers=headers, json=data)
    return response.json()["content"][0]["text"]

# Usage
code = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
"""

review = code_reviewer(code)
print(review)

Example 4: Streaming Output

Python

import json
import requests

def stream_chat(user_message):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }

    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "stream": True,
        "messages": [{"role": "user", "content": user_message}]
    }

    response = requests.post(url, headers=headers, json=data, stream=True)

    print("Model reply: ", end='')
    for line in response.iter_lines():
        if line:
            line_str = line.decode('utf-8')
            if line_str.startswith('data: '):
                data_str = line_str[6:]
                if data_str == '[DONE]':
                    break
                try:
                    chunk = json.loads(data_str)
                    if chunk.get('type') == 'content_block_delta':
                        text = chunk.get('delta', {}).get('text', '')
                        print(text, end='', flush=True)
                except json.JSONDecodeError:
                    pass
    print()

# Usage
stream_chat("Write a poem about spring")

FAQ

Q1: Why is max_tokens a required parameter?

A: The InternLM Claude-like API requires an explicit maximum generation length, which helps to:

  • Control response cost
  • Prevent unexpectedly long responses
  • Keep response times predictable

Suggested values

  • Short answers: 512-1024
  • General conversation: 1024-2048
  • Long-form generation: 2048-4096
  • Very long content: 4096-32000

Q2: What is the difference between the system parameter and a system role in messages?

A: In the InternLM Claude-like API:

  • Use the separate system parameter (recommended)
  • Do not put {"role": "system"} in the messages array

Correct

{
  "system": "You are a helpful assistant",
  "messages": [
    {"role": "user", "content": "Hello"}
  ]
}

Incorrect

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"}
  ]
}

Q3: How do I handle thinking content returned by the model?

A: InternLM models may return content blocks of type thinking, which contain the model's reasoning process:

result = response.json()
for block in result["content"]:
    if block["type"] == "text":
        print("Reply:", block["text"])
    elif block["type"] == "thinking":
        print("Thinking:", block["thinking"])

You can choose to:

  • Show it to users (for transparency)
  • Log it for debugging only
  • Ignore it entirely

Q4: How do I calculate request cost?

A: Use the usage field in the response:

result = response.json()
input_tokens = result["usage"]["input_tokens"]
output_tokens = result["usage"]["output_tokens"]
total_tokens = input_tokens + output_tokens

print(f"Input: {input_tokens} tokens")
print(f"Output: {output_tokens} tokens")
print(f"Total: {total_tokens} tokens")

Q5: What stop sequences are supported?

A: You can specify multiple stop sequences; the model stops generating when it encounters any of them:

{
  "stop_sequences": ["\n\n", "结束", "END"]
}

Q6: How do I choose temperature, top_p, and top_k?

A:

| Parameter | Range | Purpose | Suggestions |
|---|---|---|---|
| temperature | 0.0-1.0 | Controls randomness | 0.7-1.0 for creative tasks; 0.0-0.3 for precise tasks |
| top_p | 0.0-1.0 | Nucleus sampling | 0.9-1.0 for diversity; 0.5-0.9 for balance |
| top_k | integer | Top-K sampling | 40-100 typical; -1 to disable |

Example

{
  "temperature": 0.9,
  "top_p": 0.95,
  "top_k": 40
}

Q7: How do I handle long conversations?

A: When the conversation history grows long:

  1. Keep only the most recent messages (recommended)

# Keep the last 10 turns
recent_messages = messages[-20:]  # 2 messages per turn

  2. Summarize older turns

# Summarize the old conversation and put it in the system prompt
system = f"Conversation summary: {summary}\n\nYou are a helpful assistant."

  3. Monitor token usage

if result["usage"]["input_tokens"] > 30000:
    # Trim the conversation history
    messages = messages[-10:]
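Strategy 1 can be wrapped in a small helper. This is a sketch with one assumption worth noting: it keeps the window starting on a user turn, since the messages array is expected to alternate user/assistant beginning with user:

```python
def trim_history(messages, max_messages=20):
    """Keep the most recent messages, ensuring the window still
    starts with a user turn."""
    recent = messages[-max_messages:]
    # Drop a leading assistant message left over from a cut-off turn.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent
```

Usage: call `trim_history(messages)` before each request instead of sending the full history.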