
Claude-like API

Version: v1.0
Last Updated: 2025-10-24
API Endpoint: https://chat.intern-ai.org.cn/v1/messages


Quick Start

Your First Request

Python Example

import requests
import json

url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "your-api-key",
    "anthropic-version": "2023-06-01"
}

data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Hello, please introduce InternLM"}
    ]
}

response = requests.post(url, headers=headers, json=data)
result = response.json()

# Extract the reply (a thinking block, if present, may precede the text block)
if response.status_code == 200:
    reply = "".join(
        block["text"] for block in result["content"] if block["type"] == "text"
    )
    print(f"Model reply: {reply}")
else:
    print(f"Error: {result}")

cURL Example

curl -X POST https://chat.intern-ai.org.cn/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "intern-s1",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, please introduce InternLM"}
    ]
  }'

Response Example

{
  "id": "msg_01XYZ...",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! I'm InternLM, a large language model developed by Shanghai AI Laboratory..."
    }
  ],
  "model": "intern-s1",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 120
  }
}

Authentication

All API requests must include authentication information in the request headers.

Header Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| Content-Type | string | Yes | Must be application/json |
| x-api-key | string | Yes | Your API key, format: sk-xxxxx |
| anthropic-version | string | No | API version (optional) |

Example

headers = {
    "Content-Type": "application/json",
    "x-api-key": "sk-your-api-key-here",
    "anthropic-version": "2023-06-01"
}
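Hard-coding the key in source files makes it easy to leak. The following is a minimal sketch that reads the key from an environment variable instead; the variable name `INTERN_API_KEY` and the helper `build_headers` are illustrative assumptions, not part of the API.

```python
import os


def build_headers(api_key=None):
    """Build request headers from an explicit key or the environment.

    INTERN_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = api_key or os.environ.get("INTERN_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set INTERN_API_KEY")
    return {
        "Content-Type": "application/json",
        "x-api-key": key,
        "anthropic-version": "2023-06-01",  # optional header
    }
```

Calling `build_headers()` with no argument falls back to the environment, so the same code can run locally and in deployment without edits.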

API Endpoints

Create Message

Create a new conversation message and get a model response.

Endpoint: POST /v1/messages

Request Body Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Model name, e.g., intern-s1 |
| max_tokens | integer | Yes | - | Maximum number of tokens to generate, range: 1-32000 |
| messages | array | Yes | - | Array of conversation messages |
| system | string | No | - | System prompt defining assistant behavior and role |
| temperature | number | No | 0.7 | Sampling temperature, range: 0.0-1.0, higher = more random |
| top_p | number | No | 1.0 | Nucleus sampling parameter, range: 0.0-1.0 |
| top_k | integer | No | -1 | Top-K sampling parameter (-1 disables it) |
| stream | boolean | No | false | Enable streaming output |
| stop_sequences | array | No | [] | Stop sequences that will halt generation |
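The required parameters and ranges listed above can be checked on the client before sending a request. This is a rough sketch only: `validate_request` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
def validate_request(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    # model, max_tokens and messages are the required parameters
    for name in ("model", "max_tokens", "messages"):
        if name not in body:
            problems.append(f"missing required parameter: {name}")
    max_tokens = body.get("max_tokens")
    if max_tokens is not None and not (
        isinstance(max_tokens, int) and 1 <= max_tokens <= 32000
    ):
        problems.append("max_tokens must be an integer in 1-32000")
    # temperature and top_p share the documented 0.0-1.0 range
    for name in ("temperature", "top_p"):
        value = body.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be in 0.0-1.0")
    return problems
```

An empty list means the body passed these local checks; it does not guarantee the request will succeed.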

Request Format

Messages Parameter Details

messages is an array of message objects, each containing:

| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | Message role, values: user or assistant |
| content | string/array | Yes | Message content, either a string or an array of content blocks |

Basic Format (String Content)

{
  "messages": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help you?"},
    {"role": "user", "content": "Tell me about yourself"}
  ]
}

Advanced Format (Array Content, Multi-modal Support)

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/jpeg",
            "data": "/9j/4AAQSkZJRg..."
          }
        }
      ]
    }
  ]
}
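The `data` field in the image block holds the base64-encoded image bytes. A small sketch of building such a message in Python follows; `image_block` and `image_message` are hypothetical helper names, and which media types the service accepts beyond image/jpeg is not stated here.

```python
import base64


def image_block(image_bytes, media_type="image/jpeg"):
    """Wrap raw image bytes in a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }


def image_message(question, image_bytes, media_type="image/jpeg"):
    """Build a user message containing a text block followed by an image block."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            image_block(image_bytes, media_type),
        ],
    }
```

Typical usage would read the bytes with `open(path, "rb").read()` and place the result of `image_message(...)` in the `messages` array.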

System Parameter Details

The system parameter defines the assistant's behavior, role, and constraints.

Example

{
  "system": "You are a professional Python programming assistant, skilled at explaining code and providing best practices. Keep responses professional and concise."
}

Response Format

Success Response (200 OK)

Response Structure

| Field | Type | Description |
|---|---|---|
| id | string | Unique message identifier |
| type | string | Response type, always message |
| role | string | Role, always assistant |
| content | array | Array of content blocks |
| model | string | Model name used |
| stop_reason | string | Reason for stopping, see table below |
| usage | object | Token usage statistics |

stop_reason Values

| Value | Description |
|---|---|
| end_turn | Model naturally finished the response |
| max_tokens | Reached max_tokens limit |
| stop_sequence | Encountered a stop sequence |

content Array Elements

| Field | Type | Description |
|---|---|---|
| type | string | Content type, values: text, thinking |
| text | string | Text content (when type is text) |
| thinking | string | Thinking content (when type is thinking) |

usage Object

| Field | Type | Description |
|---|---|---|
| input_tokens | integer | Number of tokens in input messages |
| output_tokens | integer | Number of tokens generated |

Complete Response Example

{
  "id": "msg_01XYZ123ABC",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "The user is asking about InternLM, I need to briefly introduce its features..."
    },
    {
      "type": "text",
      "text": "InternLM is a large language model developed by Shanghai AI Laboratory..."
    }
  ],
  "model": "intern-s1",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 158,
    "output_tokens": 256
  }
}
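Because a thinking block can come before the text block, as in the response above, indexing `content[0]["text"]` is not always safe. One possible way to read such a response is sketched below; `parse_message` is a hypothetical helper name.

```python
def parse_message(result):
    """Split a response into reply text, thinking text, and a truncation flag."""
    blocks = result.get("content", [])
    # Join all text blocks; skip any other block type
    text = "".join(b.get("text", "") for b in blocks if b.get("type") == "text")
    thinking = "".join(
        b.get("thinking", "") for b in blocks if b.get("type") == "thinking"
    )
    # stop_reason "max_tokens" means the reply was cut off at the limit
    truncated = result.get("stop_reason") == "max_tokens"
    return text, thinking, truncated
```

When `truncated` is true, the caller may want to retry with a larger `max_tokens` or tell the user the answer is incomplete.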

Streaming

Enable streaming output to receive content as the model generates it, so users see text appear immediately instead of waiting for the full response.

Enable Streaming

Set "stream": true in the request body.

Python Example

import requests
import json

url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "your-api-key",
    "anthropic-version": "2023-06-01"
}

data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "stream": True,
    "messages": [{"role": "user", "content": "Tell me a story"}]
}

response = requests.post(url, headers=headers, json=data, stream=True)

for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('data: '):
            data_str = line_str[6:]
            if data_str == '[DONE]':
                break
            try:
                chunk = json.loads(data_str)
                # Process streaming chunk
                if chunk.get('type') == 'content_block_delta':
                    text = chunk.get('delta', {}).get('text', '')
                    print(text, end='', flush=True)
            except json.JSONDecodeError:
                pass

cURL Example

curl -X POST https://chat.intern-ai.org.cn/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -N \
  -d '{
    "model": "intern-s1",
    "max_tokens": 1024,
    "stream": true,
    "messages": [{"role": "user", "content": "Tell me a story"}]
  }'

Streaming Event Types

| Event Type | Description |
|---|---|
| message_start | Message started |
| content_block_start | Content block started |
| content_block_delta | Content block delta (contains actual text) |
| content_block_stop | Content block stopped |
| message_delta | Message metadata update |
| message_stop | Message stopped |

Streaming Response Example

event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC","type":"message","role":"assistant"}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Once"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" upon"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":150}}

event: message_stop
data: {"type":"message_stop"}
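The event sequence above can be reduced to the final text and stop reason with a small parser. The sketch below works on an iterable of already-decoded lines, so it can be tried without a network connection; `collect_stream_text` is a hypothetical name, and only the event types shown in this section are handled.

```python
import json


def collect_stream_text(lines):
    """Accumulate text deltas from an iterable of SSE lines (str).

    Returns (full_text, stop_reason).
    """
    parts = []
    stop_reason = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip "event:" lines and blank separators
        payload = line[len("data: "):]
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore non-JSON payloads such as a [DONE] marker
        kind = event.get("type")
        if kind == "content_block_delta":
            parts.append(event.get("delta", {}).get("text", ""))
        elif kind == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason")
        elif kind == "message_stop":
            break  # end of the message
    return "".join(parts), stop_reason
```

With `requests`, the lines could be supplied as `(raw.decode("utf-8") for raw in response.iter_lines())`.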

Error Handling

Error Response Format

All error responses contain the following structure:

{
  "error": {
    "type": "error_type",
    "code": "error_code",
    "message": "Error description",
    "param": "related_parameter (optional)"
  }
}

Common Error Codes

| HTTP Status | Error Type | Error Code | Description |
|---|---|---|---|
| 400 | invalid_request_error | invalid_request | Request format error or invalid parameters |
| 400 | invalid_request_error | -20009 | Model service unavailable (usually a parameter combination issue) |
| 401 | authentication_error | invalid_api_key | Invalid API key |
| 403 | permission_error | permission_denied | Permission denied |
| 429 | rate_limit_error | rate_limit_exceeded | Rate limit exceeded |
| 500 | api_error | internal_server_error | Internal server error |

Error Example

{
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_request",
    "message": "max_tokens is a required parameter",
    "param": "max_tokens"
  }
}

Python Error Handling Example

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    result = response.json()
    # Print only text blocks; a thinking block may come first
    print("".join(b["text"] for b in result["content"] if b["type"] == "text"))
elif response.status_code == 400:
    error = response.json()["error"]
    print(f"Request error: {error['message']}")
elif response.status_code == 401:
    print("Authentication failed: Invalid API key")
elif response.status_code == 429:
    print("Rate limit exceeded, please retry later")
else:
    print(f"Unknown error: {response.status_code}")
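For rate limits and transient server errors, retrying with an increasing delay is a common approach. The sketch below assumes that 429 and 500 are safe to retry, which this document does not state explicitly; `post_with_retry` and its parameters are hypothetical.

```python
import time

# Assumption: these statuses are transient and worth retrying
RETRYABLE = {429, 500}


def post_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning an object with .status_code,
    e.g. lambda: requests.post(url, headers=headers, json=data).
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s
    return response
```

Passing the request as a callable keeps the retry logic independent of the HTTP library and makes it easy to exercise with a stub.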

Differences from OpenAI API

Comparison Overview

| Feature | OpenAI API | InternLM Claude-like API |
|---|---|---|
| Endpoint | /v1/chat/completions | /v1/messages |
| Auth Header | Authorization: Bearer sk-xxx | x-api-key: sk-xxx |
| API Version | Not required | anthropic-version (optional) |
| Required Params | model, messages | model, messages, max_tokens |
| System Prompt | In messages array | Separate system parameter |
| Response Format | choices array | content array |
| Token Stats | prompt_tokens, completion_tokens | input_tokens, output_tokens |
| Stop Reason | finish_reason | stop_reason |

Detailed Differences

1. Endpoint Difference

OpenAI

POST https://api.openai.com/v1/chat/completions

InternLM Claude-like

POST https://chat.intern-ai.org.cn/v1/messages

2. Authentication Difference

OpenAI

headers = {
    "Authorization": "Bearer sk-xxxxx",
    "Content-Type": "application/json"
}

InternLM Claude-like

headers = {
    "x-api-key": "sk-xxxxx",
    "Content-Type": "application/json",
    # "anthropic-version": "2023-06-01"  # Optional
}

3. Request Parameters Difference

OpenAI Request

{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7
}

InternLM Claude-like Request

{
  "model": "intern-s1",
  "max_tokens": 1024,
  "system": "You are a helpful assistant",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7
}

Key Differences:

  • ✅ InternLM API requires max_tokens parameter
  • ✅ InternLM API uses separate system parameter
  • ✅ InternLM API messages does not include system role

4. Response Format Difference

OpenAI Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}

InternLM Claude-like Response

{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you?"
    }
  ],
  "model": "intern-s1",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 20
  }
}

Extracting Response Content

# OpenAI
text = response["choices"][0]["message"]["content"]

# InternLM Claude-like
text = response["content"][0]["text"]

5. Streaming Difference

OpenAI Streaming

data: {"choices":[{"delta":{"content":"Hello"}}]}
data: [DONE]

InternLM Claude-like Streaming

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}

Migration Guide

If you're migrating from OpenAI API to InternLM Claude-like API:

  1. Update endpoint URL
  2. Modify authentication header (from Authorization to x-api-key)
  3. Add max_tokens parameter (required)
  4. Adjust system prompt (move from messages to system parameter)
  5. Update response parsing (from choices to content)

Migration Code Example

# Original OpenAI code (legacy interface of the openai SDK, before version 1.0)
import openai

openai.api_key = "sk-xxxxx"
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello"}
    ]
)
text = response.choices[0].message.content

# Migrated InternLM code
import requests

url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "x-api-key": "sk-xxxxx",
    "Content-Type": "application/json"
}
data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "system": "You are helpful",
    "messages": [{"role": "user", "content": "Hello"}]
}
response = requests.post(url, headers=headers, json=data)
text = response.json()["content"][0]["text"]
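Steps 3 and 4 of the migration list (adding max_tokens and moving the system prompt) can also be done mechanically on an existing OpenAI-style request body. The following is a sketch under simple assumptions: `to_intern_request` is a hypothetical helper, multiple system messages are joined with blank lines, and sampling parameters are deliberately not copied because their ranges should be reviewed by hand.

```python
def to_intern_request(openai_body, model="intern-s1", max_tokens=1024):
    """Convert an OpenAI-style chat body into the /v1/messages request shape."""
    system_parts = []
    messages = []
    for message in openai_body.get("messages", []):
        if message.get("role") == "system":
            # System prompts move out of the messages array
            system_parts.append(message["content"])
        else:
            messages.append(message)
    body = {
        "model": model,
        # max_tokens is required here; reuse the caller's value if present
        "max_tokens": openai_body.get("max_tokens", max_tokens),
        "messages": messages,
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body
```

The returned dictionary can be passed as `json=` to `requests.post` against the /v1/messages endpoint.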

Complete Examples

Example 1: Basic Chat

Python

import requests

def chat_with_intern(user_message):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }

    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "messages": [
            {"role": "user", "content": user_message}
        ]
    }

    response = requests.post(url, headers=headers, json=data)

    if response.status_code == 200:
        result = response.json()
        # Join text blocks only; a thinking block may come first
        return "".join(b["text"] for b in result["content"] if b["type"] == "text")
    else:
        return f"Error: {response.json()}"

# Usage
reply = chat_with_intern("Tell me about InternLM")
print(reply)

Example 2: Multi-turn Conversation

Python

import requests

def multi_turn_chat():
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }

    # Conversation history
    messages = [
        {"role": "user", "content": "I want to learn Python"},
        {"role": "assistant", "content": "Great! Python is perfect for beginners. Where would you like to start?"},
        {"role": "user", "content": "Let's start with data types"}
    ]

    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "messages": messages
    }

    response = requests.post(url, headers=headers, json=data)
    result = response.json()

    # Join text blocks only; a thinking block may come first
    return "".join(b["text"] for b in result["content"] if b["type"] == "text")

reply = multi_turn_chat()
print(reply)
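The example above sends a fixed history. In an interactive session the history has to grow after every turn, which can be sketched as a small class. `Conversation` and its `send` parameter are hypothetical: `send` is any callable that takes a request body and returns the parsed response dictionary.

```python
class Conversation:
    """Keep message history across turns and append each reply to it."""

    def __init__(self, send, system=None, model="intern-s1"):
        self.send = send      # callable: request body -> parsed response dict
        self.system = system
        self.model = model
        self.messages = []

    def ask(self, user_text, max_tokens=2048):
        self.messages.append({"role": "user", "content": user_text})
        body = {
            "model": self.model,
            "max_tokens": max_tokens,
            "messages": self.messages,
        }
        if self.system:
            body["system"] = self.system
        result = self.send(body)
        # Store only the text blocks as the assistant turn
        reply = "".join(
            b["text"] for b in result["content"] if b["type"] == "text"
        )
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

A real `send` could be `lambda body: requests.post(url, headers=headers, json=body).json()`; keeping it injectable lets the class be tested with a stub.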

Example 3: Professional Assistant with System Prompt

Python

import requests

def code_reviewer(code):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }

    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "system": "You are a senior code review expert. Carefully check code for: 1) Correctness 2) Performance 3) Readability 4) Best practices",
        "messages": [
            {"role": "user", "content": f"Please review this code:\n\n```python\n{code}\n```"}
        ]
    }

    response = requests.post(url, headers=headers, json=data)
    result = response.json()
    # Join text blocks only; a thinking block may come first
    return "".join(b["text"] for b in result["content"] if b["type"] == "text")

# Usage
code = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
"""

review = code_reviewer(code)
print(review)

Example 4: Streaming Output

Python

import json
import requests

def stream_chat(user_message):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }

    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "stream": True,
        "messages": [{"role": "user", "content": user_message}]
    }

    response = requests.post(url, headers=headers, json=data, stream=True)

    print("Model reply: ", end='')
    for line in response.iter_lines():
        if line:
            line_str = line.decode('utf-8')
            if line_str.startswith('data: '):
                data_str = line_str[6:]
                if data_str == '[DONE]':
                    break
                try:
                    chunk = json.loads(data_str)
                    if chunk.get('type') == 'content_block_delta':
                        text = chunk.get('delta', {}).get('text', '')
                        print(text, end='', flush=True)
                except json.JSONDecodeError:
                    # Skip payloads that are not valid JSON
                    pass
    print()

# Usage
stream_chat("Write a poem about spring")

FAQ

Q1: Why is max_tokens a required parameter?

A: InternLM Claude-like API requires explicitly specifying the maximum generation length to:

  • Control response cost
  • Prevent unexpectedly long responses
  • Ensure predictable response time

Recommended values:

  • Short answers: 512-1024
  • General conversation: 1024-2048
  • Long text generation: 2048-4096
  • Very long content: 4096-32000

Q2: What's the difference between system parameter and system role in messages?

A: In InternLM Claude-like API:

  • ✅ Use separate system parameter (recommended)
  • Do NOT use {"role": "system"} in messages array

Correct:

{
  "system": "You are a helpful assistant",
  "messages": [
    {"role": "user", "content": "Hello"}
  ]
}

Incorrect:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"}
  ]
}

Q3: How to handle thinking content returned by the model?

A: InternLM may return thinking content blocks showing the model's reasoning:

result = response.json()
for block in result["content"]:
    if block["type"] == "text":
        print("Reply:", block["text"])
    elif block["type"] == "thinking":
        print("Thinking:", block["thinking"])

You can choose to:

  • Display to users (increases transparency)
  • Log for debugging only
  • Ignore completely

Q4: How to calculate request cost?

A: Use the usage field in the response:

result = response.json()
input_tokens = result["usage"]["input_tokens"]
output_tokens = result["usage"]["output_tokens"]
total_tokens = input_tokens + output_tokens

print(f"Input: {input_tokens} tokens")
print(f"Output: {output_tokens} tokens")
print(f"Total: {total_tokens} tokens")
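To turn token counts into a cost figure across several requests, the counts can be accumulated and multiplied by per-token prices. This document does not list prices, so the rates in the sketch below are caller-supplied placeholders, and `UsageTracker` is a hypothetical name.

```python
class UsageTracker:
    """Accumulate token usage over multiple responses."""

    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0
        self.requests = 0

    def add(self, result):
        """Record the usage field of one parsed response."""
        usage = result.get("usage", {})
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)
        self.requests += 1

    def cost(self, input_price_per_million, output_price_per_million):
        """Estimate cost; both prices are assumptions supplied by the caller."""
        return (
            self.input_tokens * input_price_per_million
            + self.output_tokens * output_price_per_million
        ) / 1_000_000
```

Call `tracker.add(response.json())` after each successful request, then `tracker.cost(...)` with the rates that apply to your account.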

Q5: What stop sequences are supported?

A: You can specify multiple stop sequences; the model stops when it encounters any:

{
  "stop_sequences": ["\n\n", "END", "STOP"]
}

Q6: How to choose temperature, top_p, top_k?

A:

| Parameter | Range | Purpose | Recommendations |
|---|---|---|---|
| temperature | 0.0-1.0 | Control randomness | 0.7-1.0: Creative tasks; 0.0-0.3: Precise tasks |
| top_p | 0.0-1.0 | Nucleus sampling | 0.9-1.0: Diversity; 0.5-0.9: Balance |
| top_k | integer | Top-K sampling | 40-100: Common; -1: Disable |

Example:

{
  "temperature": 0.9,
  "top_p": 0.95,
  "top_k": 40
}

Q7: How to handle long conversations?

A: When conversation history gets long:

  1. Keep recent messages (recommended)
     # Keep last 10 turns
     recent_messages = messages[-20:]  # 2 messages per turn
  2. Use summarization
     # Summarize old conversation for system prompt
     system = f"Conversation summary: {summary}\n\nYou are a helpful assistant."
  3. Monitor token usage
     if result["usage"]["input_tokens"] > 30000:
         # Trim conversation history
         messages = messages[-10:]
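Slicing by a fixed count can leave the trimmed history starting with an assistant message. The sketch below keeps the last few turns and then drops leading assistant messages; whether the service requires the first message to come from the user is an assumption here, and `trim_history` is a hypothetical helper.

```python
def trim_history(messages, max_turns=10):
    """Keep at most the last max_turns user/assistant pairs."""
    if max_turns <= 0:
        return []
    recent = messages[-2 * max_turns:]  # 2 messages per turn
    # Assumption: the history should begin with a user message
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent
```

The function returns a new list and leaves the original history untouched, so the full transcript can still be logged or summarized separately.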