Claude-like API
Version: v1.0
Last Updated: 2025-10-24
API Endpoint: https://chat.intern-ai.org.cn/v1/messages
Table of Contents
- Quick Start
- Authentication
- API Endpoints
- Request Format
- Response Format
- Streaming
- Error Handling
- Differences from OpenAI API
- Complete Examples
- FAQ
Quick Start
Your First Request
Python Example
import requests
import json
url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "your-api-key",
    "anthropic-version": "2023-06-01"
}
data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Hello, please introduce InternLM"}
    ]
}
response = requests.post(url, headers=headers, json=data)
result = response.json()
# Extract the reply
if response.status_code == 200:
    reply = result["content"][0]["text"]
    print(f"Model reply: {reply}")
else:
    print(f"Error: {result}")
cURL Example
curl -X POST https://chat.intern-ai.org.cn/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: your-api-key" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "intern-s1",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello, please introduce InternLM"}
]
}'
Response Example
{
"id": "msg_01XYZ...",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello! I'm InternLM, a large language model developed by Shanghai AI Laboratory..."
}
],
"model": "intern-s1",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 15,
"output_tokens": 120
}
}
Authentication
All API requests must include authentication information in the request headers.
Header Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| Content-Type | string | ✅ | Must be application/json |
| x-api-key | string | ✅ | Your API key, format: sk-xxxxx |
| anthropic-version | string | ❌ | API version (optional) |
Example
headers = {
"Content-Type": "application/json",
"x-api-key": "sk-your-api-key-here",
"anthropic-version": "2023-06-01"
}
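Hard-coding the key in source code makes it easy to leak. The following is a minimal sketch, assuming the key is kept in an environment variable; the variable name `INTERN_API_KEY` and the helper `build_headers` are hypothetical and not part of the API.

```python
import os


def build_headers(api_key=None, version="2023-06-01"):
    """Return request headers for the /v1/messages endpoint.

    INTERN_API_KEY is a hypothetical environment variable name chosen
    for this sketch; use whatever secret store your deployment provides.
    """
    key = api_key or os.environ.get("INTERN_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    headers = {"Content-Type": "application/json", "x-api-key": key}
    if version:  # anthropic-version is optional per the table above
        headers["anthropic-version"] = version
    return headers
```

Passing `version=None` omits the optional anthropic-version header entirely.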
API Endpoints
Create Message
Create a new conversation message and get a model response.
Endpoint: POST /v1/messages
Request Body Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | ✅ | - | Model name, e.g., intern-s1 |
| max_tokens | integer | ✅ | - | Maximum number of tokens to generate, range: 1-32000 |
| messages | array | ✅ | - | Array of conversation messages |
| system | string | ❌ | - | System prompt defining assistant behavior and role |
| temperature | number | ❌ | 0.7 | Sampling temperature, range: 0.0-1.0, higher = more random |
| top_p | number | ❌ | 1.0 | Nucleus sampling parameter, range: 0.0-1.0 |
| top_k | integer | ❌ | -1 | Top-K sampling parameter |
| stream | boolean | ❌ | false | Enable streaming output |
| stop_sequences | array | ❌ | [] | Stop sequences that will halt generation |
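Checking a request body locally before sending it can catch the most common 400 errors early. The sketch below mirrors the required fields and ranges from the table above; `validate_request` is a hypothetical helper, and the server remains the final authority on what it accepts.

```python
def validate_request(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    # Required fields per the parameter table.
    for field in ("model", "max_tokens", "messages"):
        if field not in body:
            problems.append(f"missing required field: {field}")
    # max_tokens must be an integer within the documented range.
    max_tokens = body.get("max_tokens")
    if max_tokens is not None and not (
        isinstance(max_tokens, int) and 1 <= max_tokens <= 32000
    ):
        problems.append("max_tokens must be an integer in 1-32000")
    # temperature and top_p share the documented 0.0-1.0 range.
    for name in ("temperature", "top_p"):
        value = body.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be in 0.0-1.0")
    return problems
```

An empty list means the body passed these local checks; any other result lists what to fix before calling the endpoint.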
Request Format
Messages Parameter Details
messages is an array of message objects, each containing:
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | ✅ | Message role, values: user or assistant |
| content | string/array | ✅ | Message content, can be string or array of content blocks |
Basic Format (String Content)
{
"messages": [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi! How can I help you?"},
{"role": "user", "content": "Tell me about yourself"}
]
}
Advanced Format (Array Content, Multi-modal Support)
{
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/jpeg",
"data": "/9j/4AAQSkZJRg..."
}
}
]
}
]
}
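The data field holds the base64-encoded image bytes. As a rough sketch of how such a message could be assembled in Python (the helper names `image_block` and `image_question` are illustrative, not part of the API):

```python
import base64


def image_block(image_bytes, media_type="image/jpeg"):
    """Wrap raw image bytes in an image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }


def image_question(question, image_bytes, media_type="image/jpeg"):
    """Build a user message pairing a text question with one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            image_block(image_bytes, media_type),
        ],
    }
```

In practice the bytes would come from a file opened in binary mode, and the resulting message is placed in the messages array as usual.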
System Parameter Details
The system parameter defines the assistant's behavior, role, and constraints.
Example
{
"system": "You are a professional Python programming assistant, skilled at explaining code and providing best practices. Keep responses professional and concise."
}
Response Format
Success Response (200 OK)
Response Structure
| Field | Type | Description |
|---|---|---|
| id | string | Unique message identifier |
| type | string | Response type, always message |
| role | string | Role, always assistant |
| content | array | Array of content blocks |
| model | string | Model name used |
| stop_reason | string | Reason for stopping, see table below |
| usage | object | Token usage statistics |
stop_reason Values
| Value | Description |
|---|---|
| end_turn | Model naturally finished the response |
| max_tokens | Reached max_tokens limit |
| stop_sequence | Encountered a stop sequence |
content Array Elements
| Field | Type | Description |
|---|---|---|
| type | string | Content type, values: text, thinking |
| text | string | Text content (when type is text) |
| thinking | string | Thinking content (when type is thinking) |
usage Object
| Field | Type | Description |
|---|---|---|
| input_tokens | integer | Number of tokens in input messages |
| output_tokens | integer | Number of tokens generated |
Complete Response Example
{
"id": "msg_01XYZ123ABC",
"type": "message",
"role": "assistant",
"content": [
{
"type": "thinking",
"thinking": "The user is asking about InternLM, I need to briefly introduce its features..."
},
{
"type": "text",
"text": "InternLM is a large language model developed by Shanghai AI Laboratory..."
}
],
"model": "intern-s1",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 158,
"output_tokens": 256
}
}
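In the response above the first content block has type thinking, so indexing content[0]["text"] would raise a KeyError. A more defensive way to read the reply is to walk all blocks and pick by type. The helper name `split_content` below is hypothetical; this is a sketch, not an official client.

```python
def split_content(result):
    """Separate a response's content blocks into reply text and thinking."""
    text_parts, thinking_parts = [], []
    for block in result.get("content", []):
        if block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        elif block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
    return "".join(text_parts), "".join(thinking_parts)
```

Checking `result["stop_reason"] == "max_tokens"` alongside this is a simple way to detect a truncated reply.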
Streaming
Enable streaming to receive model-generated content incrementally as it is produced, rather than waiting for the complete response.
Enable Streaming
Set "stream": true in your request
Python Example
import requests
import json
url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "your-api-key",
    "anthropic-version": "2023-06-01"
}
data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "stream": True,
    "messages": [{"role": "user", "content": "Tell me a story"}]
}
response = requests.post(url, headers=headers, json=data, stream=True)
for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('data: '):
            data_str = line_str[6:]
            if data_str == '[DONE]':
                break
            try:
                chunk = json.loads(data_str)
                # Process streaming chunk
                if chunk.get('type') == 'content_block_delta':
                    text = chunk.get('delta', {}).get('text', '')
                    print(text, end='', flush=True)
            except json.JSONDecodeError:
                pass
cURL Example
curl -X POST https://chat.intern-ai.org.cn/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: your-api-key" \
-H "anthropic-version: 2023-06-01" \
-N \
-d '{
"model": "intern-s1",
"max_tokens": 1024,
"stream": true,
"messages": [{"role": "user", "content": "Tell me a story"}]
}'
Streaming Event Types
| Event Type | Description |
|---|---|
| message_start | Message started |
| content_block_start | Content block started |
| content_block_delta | Content block delta (contains actual text) |
| content_block_stop | Content block stopped |
| message_delta | Message metadata update |
| message_stop | Message stopped |
Streaming Response Example
event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC","type":"message","role":"assistant"}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Once"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" upon"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":150}}
event: message_stop
data: {"type":"message_stop"}
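The event sequence above can be consumed by a small parser that keeps only the data lines, joins the text deltas, and records the stop reason. This is a sketch based solely on the event shapes shown in this section; the function name `collect_stream_text` is hypothetical, and it takes already-decoded lines so it can be exercised without a network connection.

```python
import json


def collect_stream_text(lines):
    """Accumulate text deltas from an iterable of decoded SSE lines.

    Returns (text, stop_reason). Lines that do not start with "data: "
    (such as "event: ..." lines and blank separators) are skipped.
    """
    parts = []
    stop_reason = None
    for line in lines:
        if not line.startswith("data: "):
            continue
        try:
            event = json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            continue  # ignore any non-JSON payload
        kind = event.get("type")
        if kind == "content_block_delta":
            parts.append(event.get("delta", {}).get("text", ""))
        elif kind == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason")
        elif kind == "message_stop":
            break
    return "".join(parts), stop_reason
```

With requests, the input could be produced by decoding each non-empty item from `response.iter_lines()`.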
Error Handling
Error Response Format
All error responses contain the following structure:
{
"error": {
"type": "error_type",
"code": "error_code",
"message": "Error description",
"param": "related_parameter (optional)"
}
}
Common Error Codes
| HTTP Status | Error Type | Error Code | Description |
|---|---|---|---|
| 400 | invalid_request_error | invalid_request | Request format error or invalid parameters |
| 400 | invalid_request_error | -20009 | Model service unavailable (usually parameter combination issue) |
| 401 | authentication_error | invalid_api_key | Invalid API key |
| 403 | permission_error | permission_denied | Permission denied |
| 429 | rate_limit_error | rate_limit_exceeded | Rate limit exceeded |
| 500 | api_error | internal_server_error | Internal server error |
Error Example
{
"error": {
"type": "invalid_request_error",
"code": "invalid_request",
"message": "max_tokens is a required parameter",
"param": "max_tokens"
}
}
Python Error Handling Example
response = requests.post(url, headers=headers, json=data)
if response.status_code == 200:
    result = response.json()
    print(result["content"][0]["text"])
elif response.status_code == 400:
    error = response.json()["error"]
    print(f"Request error: {error['message']}")
elif response.status_code == 401:
    print("Authentication failed: Invalid API key")
elif response.status_code == 429:
    print("Rate limit exceeded, please retry later")
else:
    print(f"Unknown error: {response.status_code}")
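The example above reports a 429 but leaves the retry to the caller. One common approach is exponential backoff, sketched below. Treating 429 and 500 as retryable, the delay schedule, and the helper name `call_with_retry` are all assumptions of this sketch rather than documented behavior of the API.

```python
import time


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable response.

    send is any zero-argument callable returning an object with a
    status_code attribute, e.g. lambda: requests.post(url, ...).
    Retrying on 429 and 500 is an assumption of this sketch.
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in (429, 500):
            return response
        if attempt < max_attempts - 1:
            # Double the wait after every failed attempt.
            sleep(base_delay * 2 ** attempt)
    return response  # still failing after the last attempt
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.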
Differences from OpenAI API
Comparison Overview
| Feature | OpenAI API | InternLM Claude-like API |
|---|---|---|
| Endpoint | /v1/chat/completions | /v1/messages |
| Auth Header | Authorization: Bearer sk-xxx | x-api-key: sk-xxx |
| API Version | Not required | anthropic-version (optional) |
| Required Params | model, messages | model, messages, max_tokens |
| System Prompt | In messages array | Separate system parameter |
| Response Format | choices array | content array |
| Token Stats | prompt_tokens, completion_tokens | input_tokens, output_tokens |
| Stop Reason | finish_reason | stop_reason |
Detailed Differences
1. Endpoint Difference
OpenAI
POST https://api.openai.com/v1/chat/completions
InternLM Claude-like
POST https://chat.intern-ai.org.cn/v1/messages
2. Authentication Difference
OpenAI
headers = {
"Authorization": "Bearer sk-xxxxx",
"Content-Type": "application/json"
}
InternLM Claude-like
headers = {
"x-api-key": "sk-xxxxx",
"Content-Type": "application/json"
# "anthropic-version": "2023-06-01" # Optional
}
3. Request Parameters Difference
OpenAI Request
{
"model": "gpt-4",
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Hello"}
],
"temperature": 0.7
}
InternLM Claude-like Request
{
"model": "intern-s1",
"max_tokens": 1024,
"system": "You are a helpful assistant",
"messages": [
{"role": "user", "content": "Hello"}
],
"temperature": 0.7
}
Key Differences:
- ✅ InternLM API requires the max_tokens parameter
- ✅ InternLM API uses a separate system parameter
- ✅ InternLM API messages array does not include a system role
4. Response Format Difference
OpenAI Response
{
"id": "chatcmpl-123",
"object": "chat.completion",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 20,
"total_tokens": 30
}
}
InternLM Claude-like Response
{
"id": "msg_123",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Hello! How can I help you?"
}
],
"model": "intern-s1",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 10,
"output_tokens": 20
}
}
Extracting Response Content
# OpenAI
text = response["choices"][0]["message"]["content"]
# InternLM Claude-like
text = response["content"][0]["text"]
5. Streaming Difference
OpenAI Streaming
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: [DONE]
InternLM Claude-like Streaming
event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}
event: message_stop
data: {"type":"message_stop"}
Migration Guide
If you're migrating from OpenAI API to InternLM Claude-like API:
- Update endpoint URL
- Modify authentication header (from Authorization to x-api-key)
- Add max_tokens parameter (required)
- Adjust system prompt (move from messages to system parameter)
- Update response parsing (from choices to content)
Migration Code Example
# Original OpenAI code
import openai
openai.api_key = "sk-xxxxx"
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello"}
    ]
)
text = response.choices[0].message.content
# Migrated InternLM code
import requests
url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "x-api-key": "sk-xxxxx",
    "Content-Type": "application/json"
}
data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "system": "You are helpful",
    "messages": [{"role": "user", "content": "Hello"}]
}
response = requests.post(url, headers=headers, json=data)
text = response.json()["content"][0]["text"]
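When many call sites need migrating, the request-side steps can be gathered into one conversion function. The following is a sketch under stated assumptions: the function name `openai_to_messages_request` is hypothetical, the fallback max_tokens of 1024 is an arbitrary choice, and only string message content is handled.

```python
def openai_to_messages_request(openai_body, model="intern-s1", max_tokens=1024):
    """Convert an OpenAI-style chat request body to the /v1/messages shape."""
    system_parts = []
    messages = []
    for message in openai_body.get("messages", []):
        if message["role"] == "system":
            # System prompts move out of the messages array.
            system_parts.append(message["content"])
        else:
            messages.append(
                {"role": message["role"], "content": message["content"]}
            )
    body = {
        "model": model,
        # max_tokens is required here; fall back to the sketch's default.
        "max_tokens": openai_body.get("max_tokens", max_tokens),
        "messages": messages,
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    if "temperature" in openai_body:
        body["temperature"] = openai_body["temperature"]
    return body
```

The temperature is copied unchanged, so values outside the 0.0-1.0 range listed for this API would still need adjusting by the caller.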
Complete Examples
Example 1: Basic Chat
Python
import requests
def chat_with_intern(user_message):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }
    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "messages": [
            {"role": "user", "content": user_message}
        ]
    }
    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 200:
        result = response.json()
        return result["content"][0]["text"]
    else:
        return f"Error: {response.json()}"
# Usage
reply = chat_with_intern("Tell me about InternLM")
print(reply)
Example 2: Multi-turn Conversation
Python
import requests
def multi_turn_chat():
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }
    # Conversation history
    messages = [
        {"role": "user", "content": "I want to learn Python"},
        {"role": "assistant", "content": "Great! Python is perfect for beginners. Where would you like to start?"},
        {"role": "user", "content": "Let's start with data types"}
    ]
    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "messages": messages
    }
    response = requests.post(url, headers=headers, json=data)
    result = response.json()
    return result["content"][0]["text"]
reply = multi_turn_chat()
print(reply)
Example 3: Professional Assistant with System Prompt
Python
import requests
def code_reviewer(code):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }
    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "system": "You are a senior code review expert. Carefully check code for: 1) Correctness 2) Performance 3) Readability 4) Best practices",
        "messages": [
            {"role": "user", "content": f"Please review this code:\n\n```python\n{code}\n```"}
        ]
    }
    response = requests.post(url, headers=headers, json=data)
    return response.json()["content"][0]["text"]
# Usage
code = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
"""
review = code_reviewer(code)
print(review)
Example 4: Streaming Output
Python
import json
import requests
def stream_chat(user_message):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }
    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "stream": True,
        "messages": [{"role": "user", "content": user_message}]
    }
    response = requests.post(url, headers=headers, json=data, stream=True)
    print("Model reply: ", end='')
    for line in response.iter_lines():
        if line:
            line_str = line.decode('utf-8')
            if line_str.startswith('data: '):
                data_str = line_str[6:]
                if data_str == '[DONE]':
                    break
                try:
                    chunk = json.loads(data_str)
                    if chunk.get('type') == 'content_block_delta':
                        text = chunk.get('delta', {}).get('text', '')
                        print(text, end='', flush=True)
                except json.JSONDecodeError:
                    # Skip payloads that are not valid JSON
                    pass
    print()
# Usage
stream_chat("Write a poem about spring")
FAQ
Q1: Why is max_tokens a required parameter?
A: InternLM Claude-like API requires explicitly specifying the maximum generation length to:
- Control response cost
- Prevent unexpectedly long responses
- Ensure predictable response time
Recommended values:
- Short answers: 512-1024
- General conversation: 1024-2048
- Long text generation: 2048-4096
- Very long content: 4096-32000
Q2: What's the difference between system parameter and system role in messages?
A: In InternLM Claude-like API:
- ✅ Use the separate system parameter (recommended)
- ❌ Do NOT use {"role": "system"} in the messages array
Correct:
{
"system": "You are a helpful assistant",
"messages": [
{"role": "user", "content": "Hello"}
]
}
Incorrect:
{
"messages": [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Hello"}
]
}
Q3: How to handle thinking content returned by the model?
A: InternLM may return thinking content blocks showing the model's reasoning:
result = response.json()
for block in result["content"]:
    if block["type"] == "text":
        print("Reply:", block["text"])
    elif block["type"] == "thinking":
        print("Thinking:", block["thinking"])
You can choose to:
- Display to users (increases transparency)
- Log for debugging only
- Ignore completely
Q4: How to calculate request cost?
A: Use the usage field in the response:
result = response.json()
input_tokens = result["usage"]["input_tokens"]
output_tokens = result["usage"]["output_tokens"]
total_tokens = input_tokens + output_tokens
print(f"Input: {input_tokens} tokens")
print(f"Output: {output_tokens} tokens")
print(f"Total: {total_tokens} tokens")
Q5: What stop sequences are supported?
A: You can specify multiple stop sequences; the model stops when it encounters any:
{
"stop_sequences": ["\n\n", "END", "STOP"]
}
Q6: How to choose temperature, top_p, top_k?
A:
| Parameter | Range | Purpose | Recommendations |
|---|---|---|---|
| temperature | 0.0-1.0 | Control randomness | 0.7-1.0: Creative tasks; 0.0-0.3: Precise tasks |
| top_p | 0.0-1.0 | Nucleus sampling | 0.9-1.0: Diversity; 0.5-0.9: Balance |
| top_k | integer | Top-K sampling | 40-100: Common; -1: Disable |
Example:
{
"temperature": 0.9,
"top_p": 0.95,
"top_k": 40
}
Q7: How to handle long conversations?
A: When conversation history gets long:
- Keep recent messages (recommended)
# Keep last 10 turns
recent_messages = messages[-20:] # 2 messages per turn
- Use summarization
# Summarize old conversation for system prompt
system = f"Conversation summary: {summary}\n\nYou are a helpful assistant."
- Monitor token usage
if result["usage"]["input_tokens"] > 30000:
    # Trim conversation history
    messages = messages[-10:]
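Slicing the history at a fixed offset can leave an assistant message at the front of the list. The sketch below trims to a maximum length and then drops leading assistant messages. That the server expects the history to open with a user message is an assumption here, not something this document states, and `trim_history` is a hypothetical helper.

```python
def trim_history(messages, max_messages=20):
    """Keep at most max_messages recent messages, starting on a user turn.

    max_messages is expected to be a positive integer.
    """
    recent = messages[-max_messages:]
    # Drop leading assistant messages so the history opens with a user
    # message (an assumption about what the server expects).
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent
```

The default of 20 messages corresponds to the 10 turns kept in the first strategy above.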