Claude-like API

Version: v1.0
Last Updated: 2025-10-24
API Endpoint: https://chat.intern-ai.org.cn/v1/messages

Quick Start
Authentication
API Endpoints
Request Format
Response Format
Streaming
Error Handling
Differences from OpenAI API
Complete Examples
FAQ

Quick Start

Your First Request

Python Example

import requests
import json

url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "your-api-key",
    "anthropic-version": "2023-06-01"
}

data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Hello, please introduce InternLM"}
    ]
}

response = requests.post(url, headers=headers, json=data)
result = response.json()

# Extract the reply
if response.status_code == 200:
    reply = result["content"][0]["text"]
    print(f"Model reply: {reply}")
else:
    print(f"Error: {result}")

cURL Example

curl -X POST https://chat.intern-ai.org.cn/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "intern-s1",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, please introduce InternLM"}
    ]
  }'

Response Example

{
  "id": "msg_01XYZ...",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! I'm InternLM, a large language model developed by Shanghai AI Laboratory..."
    }
  ],
  "model": "intern-s1",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 120
  }
}

Authentication

All API requests must include authentication information in the request headers.

Header Parameters

Parameter	Type	Required	Description
`Content-Type`	string	✅	Must be `application/json`
`x-api-key`	string	✅	Your API key, format: `sk-xxxxx`
`anthropic-version`	string	❌	API version (optional)

Example

headers = {
    "Content-Type": "application/json",
    "x-api-key": "sk-your-api-key-here",
    "anthropic-version": "2023-06-01"
}
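Hard-coding the key in source files makes it easy to leak. As a minimal sketch, assuming the key is kept in an environment variable, the headers could be built as shown here. The variable name `INTERN_API_KEY` and the helper `build_headers` are hypothetical and not part of the API.

```python
import os


def build_headers(api_key=None, version="2023-06-01"):
    """Build request headers; the key falls back to an environment variable."""
    # INTERN_API_KEY is an assumed variable name, not defined by the API.
    key = api_key or os.environ.get("INTERN_API_KEY")
    if not key:
        raise ValueError("API key missing: pass api_key or set INTERN_API_KEY")
    headers = {"Content-Type": "application/json", "x-api-key": key}
    if version:  # anthropic-version is optional per the header table
        headers["anthropic-version"] = version
    return headers
```

Passing `version=None` omits the optional `anthropic-version` header.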

API Endpoints

Create Message

Create a new conversation message and get a model response.

Endpoint: POST /v1/messages

Request Body Parameters

Parameter	Type	Required	Default	Description
`model`	string	✅	-	Model name, e.g., `intern-s1`
`max_tokens`	integer	✅	-	Maximum number of tokens to generate, range: 1-32000
`messages`	array	✅	-	Array of conversation messages
`system`	string	❌	-	System prompt defining assistant behavior and role
`temperature`	number	❌	0.7	Sampling temperature, range: 0.0-1.0, higher = more random
`top_p`	number	❌	1.0	Nucleus sampling parameter, range: 0.0-1.0
`top_k`	integer	❌	-1	Top-K sampling parameter
`stream`	boolean	❌	false	Enable streaming output
`stop_sequences`	array	❌	[]	Stop sequences that will halt generation
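
The required fields and ranges in the table can be checked on the client before a request is sent. The following is a sketch only: `validate_request` is a hypothetical helper, and the server's own validation may be stricter or differ in detail.

```python
def validate_request(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    # model, max_tokens and messages are marked as required in the table.
    for name in ("model", "max_tokens", "messages"):
        if name not in body:
            problems.append(f"missing required parameter: {name}")
    max_tokens = body.get("max_tokens")
    if max_tokens is not None and not (
        isinstance(max_tokens, int) and 1 <= max_tokens <= 32000
    ):
        problems.append("max_tokens must be an integer in 1-32000")
    # temperature and top_p share the documented 0.0-1.0 range.
    for name in ("temperature", "top_p"):
        value = body.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be in 0.0-1.0")
    return problems
```

An empty list means the body passed these client-side checks; it does not guarantee that the server accepts the request.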

Request Format

Messages Parameter Details

messages is an array of message objects, each containing:

Field	Type	Required	Description
`role`	string	✅	Message role, values: `user` or `assistant`
`content`	string/array	✅	Message content, can be string or array of content blocks

Basic Format (String Content)

{
  "messages": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help you?"},
    {"role": "user", "content": "Tell me about yourself"}
  ]
}

Advanced Format (Array Content, Multi-modal Support)

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/jpeg",
            "data": "/9j/4AAQSkZJRg..."
          }
        }
      ]
    }
  ]
}
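
The `data` field in the image block holds the base64-encoded image bytes. A minimal sketch of building such a message in Python follows; `image_block` and `image_question` are hypothetical helper names, and the block layout simply mirrors the JSON above.

```python
import base64


def image_block(image_bytes, media_type="image/jpeg"):
    """Wrap raw image bytes in a base64 image content block."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }


def image_question(question, image_bytes, media_type="image/jpeg"):
    """Build a user message holding a text block followed by an image block."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            image_block(image_bytes, media_type),
        ],
    }
```

In practice the bytes would typically come from a file opened in binary mode, and the returned dictionary goes into the `messages` array.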

System Parameter Details

The system parameter defines the assistant's behavior, role, and constraints.

Example

{
  "system": "You are a professional Python programming assistant, skilled at explaining code and providing best practices. Keep responses professional and concise."
}

Response Format

Success Response (200 OK)

Response Structure

Field	Type	Description
`id`	string	Unique message identifier
`type`	string	Response type, always `message`
`role`	string	Role, always `assistant`
`content`	array	Array of content blocks
`model`	string	Model name used
`stop_reason`	string	Reason for stopping, see table below
`usage`	object	Token usage statistics

stop_reason Values

Value	Description
`end_turn`	Model naturally finished the response
`max_tokens`	Reached max_tokens limit
`stop_sequence`	Encountered a stop sequence

content Array Elements

Field	Type	Description
`type`	string	Content type, values: `text`, `thinking`
`text`	string	Text content (when type is text)
`thinking`	string	Thinking content (when type is thinking)

usage Object

Field	Type	Description
`input_tokens`	integer	Number of tokens in input messages
`output_tokens`	integer	Number of tokens generated

Complete Response Example

{
  "id": "msg_01XYZ123ABC",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "The user is asking about InternLM, I need to briefly introduce its features..."
    },
    {
      "type": "text",
      "text": "InternLM is a large language model developed by Shanghai AI Laboratory..."
    }
  ],
  "model": "intern-s1",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 158,
    "output_tokens": 256
  }
}
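
Because a `thinking` block can come before the `text` block, as in the response above, reading `content[0]["text"]` is not always safe. A more defensive approach is to select blocks by type. This is a sketch, and `split_content` is a hypothetical helper name.

```python
def split_content(message):
    """Return (text, thinking) joined from the content blocks of a response."""
    text_parts, thinking_parts = [], []
    for block in message.get("content", []):
        if block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        elif block.get("type") == "thinking":
            thinking_parts.append(block.get("thinking", ""))
    return "".join(text_parts), "".join(thinking_parts)
```

Given the parsed response as `result`, `text, thinking = split_content(result)` yields the reply text whether or not a thinking block is present.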

Streaming

Streaming returns model output incrementally as it is generated, so users see text appear without waiting for the full response.

Enable Streaming

Set "stream": true in the request body.

Python Example

import requests
import json

url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "Content-Type": "application/json",
    "x-api-key": "your-api-key",
    "anthropic-version": "2023-06-01"
}

data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "stream": True,
    "messages": [{"role": "user", "content": "Tell me a story"}]
}

response = requests.post(url, headers=headers, json=data, stream=True)

for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('data: '):
            data_str = line_str[6:]
            if data_str == '[DONE]':
                break
            try:
                chunk = json.loads(data_str)
                # Process streaming chunk
                if chunk.get('type') == 'content_block_delta':
                    text = chunk.get('delta', {}).get('text', '')
                    print(text, end='', flush=True)
            except json.JSONDecodeError:
                pass

cURL Example

curl -X POST https://chat.intern-ai.org.cn/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -H "anthropic-version: 2023-06-01" \
  -N \
  -d '{
    "model": "intern-s1",
    "max_tokens": 1024,
    "stream": true,
    "messages": [{"role": "user", "content": "Tell me a story"}]
  }'

Streaming Event Types

Event Type	Description
`message_start`	Message started
`content_block_start`	Content block started
`content_block_delta`	Content block delta (contains actual text)
`content_block_stop`	Content block stopped
`message_delta`	Message metadata update
`message_stop`	Message stopped

Streaming Response Example

event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC","type":"message","role":"assistant"}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Once"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" upon"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":150}}

event: message_stop
data: {"type":"message_stop"}
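
The event sequence above can be folded into a final text and stop reason. The sketch below assumes the lines have already been decoded to strings and handles only `text_delta` payloads; whether thinking content is streamed under a different delta type is not covered here. `collect_stream` is a hypothetical helper name.

```python
import json


def collect_stream(lines):
    """Accumulate text and stop_reason from an iterable of SSE lines (str)."""
    text_parts = []
    stop_reason = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip "event:" lines and blank separators
        payload = line[len("data: "):]
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore payloads that are not valid JSON
        kind = event.get("type")
        if kind == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                text_parts.append(delta.get("text", ""))
        elif kind == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason", stop_reason)
        elif kind == "message_stop":
            break  # the message is complete
    return "".join(text_parts), stop_reason
```

With `requests`, the input could be supplied as `(raw.decode("utf-8") for raw in response.iter_lines())`.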

Error Handling

Error Response Format

All error responses contain the following structure:

{
  "error": {
    "type": "error_type",
    "code": "error_code",
    "message": "Error description",
    "param": "related_parameter (optional)"
  }
}

Common Error Codes

HTTP Status	Error Type	Error Code	Description
400	`invalid_request_error`	`invalid_request`	Request format error or invalid parameters
400	`invalid_request_error`	`-20009`	Model service unavailable (usually parameter combination issue)
401	`authentication_error`	`invalid_api_key`	Invalid API key
403	`permission_error`	`permission_denied`	Permission denied
429	`rate_limit_error`	`rate_limit_exceeded`	Rate limit exceeded
500	`api_error`	`internal_server_error`	Internal server error

Error Example

{
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_request",
    "message": "max_tokens is a required parameter",
    "param": "max_tokens"
  }
}

Python Error Handling Example

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    result = response.json()
    print(result["content"][0]["text"])
elif response.status_code == 400:
    error = response.json()["error"]
    print(f"Request error: {error['message']}")
elif response.status_code == 401:
    print("Authentication failed: Invalid API key")
elif response.status_code == 429:
    print("Rate limit exceeded, please retry later")
else:
    print(f"Unknown error: {response.status_code}")
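
Rate-limit and server errors are often transient, so a caller may want to retry them with a growing delay. The sketch below rests on the assumption that 429 and 500 responses are safe to retry, which the API reference does not state; `send_with_retry` and `RETRYABLE_STATUS` are hypothetical names.

```python
import time

RETRYABLE_STATUS = {429, 500}  # assumption: these statuses are worth retrying


def send_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning an object with .status_code,
    for example: lambda: requests.post(url, headers=headers, json=data)
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE_STATUS:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    return response  # last response after exhausting all attempts
```

Passing the request as a callable keeps the helper independent of any HTTP library and makes it easy to exercise with a stub.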

Differences from OpenAI API

Comparison Overview

Feature	OpenAI API	InternLM Claude-like API
Endpoint	`/v1/chat/completions`	`/v1/messages`
Auth Header	`Authorization: Bearer sk-xxx`	`x-api-key: sk-xxx`
API Version	Not required	`anthropic-version` (optional)
Required Params	`model`, `messages`	`model`, `messages`, `max_tokens`
System Prompt	In `messages` array	Separate `system` parameter
Response Format	`choices` array	`content` array
Token Stats	`prompt_tokens`, `completion_tokens`	`input_tokens`, `output_tokens`
Stop Reason	`finish_reason`	`stop_reason`

Detailed Differences

1. Endpoint Difference

OpenAI

POST https://api.openai.com/v1/chat/completions

InternLM Claude-like

POST https://chat.intern-ai.org.cn/v1/messages

2. Authentication Difference

OpenAI

headers = {
    "Authorization": "Bearer sk-xxxxx",
    "Content-Type": "application/json"
}

InternLM Claude-like

headers = {
    "x-api-key": "sk-xxxxx",
    "Content-Type": "application/json"
    # "anthropic-version": "2023-06-01"  # Optional
}

3. Request Parameters Difference

OpenAI Request

{
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7
}

InternLM Claude-like Request

{
  "model": "intern-s1",
  "max_tokens": 1024,
  "system": "You are a helpful assistant",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7
}

Key Differences:

✅ InternLM API requires max_tokens parameter
✅ InternLM API uses separate system parameter
✅ InternLM API messages does not include system role

4. Response Format Difference

OpenAI Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}

InternLM Claude-like Response

{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you?"
    }
  ],
  "model": "intern-s1",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 20
  }
}

Extracting Response Content

# OpenAI
text = response["choices"][0]["message"]["content"]

# InternLM Claude-like
text = response["content"][0]["text"]

5. Streaming Difference

OpenAI Streaming

data: {"choices":[{"delta":{"content":"Hello"}}]}
data: [DONE]

InternLM Claude-like Streaming

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}

Migration Guide

If you're migrating from OpenAI API to InternLM Claude-like API:

Update endpoint URL
Modify authentication header (from Authorization to x-api-key)
Add max_tokens parameter (required)
Adjust system prompt (move from messages to system parameter)
Update response parsing (from choices to content)

Migration Code Example

# Original OpenAI code (legacy openai<1.0 SDK interface)
import openai

openai.api_key = "sk-xxxxx"
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Hello"}
    ]
)
text = response.choices[0].message.content

# Migrated InternLM code
import requests

url = "https://chat.intern-ai.org.cn/v1/messages"
headers = {
    "x-api-key": "sk-xxxxx",
    "Content-Type": "application/json"
}
data = {
    "model": "intern-s1",
    "max_tokens": 1024,
    "system": "You are helpful",
    "messages": [{"role": "user", "content": "Hello"}]
}
response = requests.post(url, headers=headers, json=data)
text = response.json()["content"][0]["text"]
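
The request-side migration steps can also be expressed as a conversion function. This is a sketch under stated assumptions: `to_claude_like` is a hypothetical helper, the default `max_tokens` of 1024 is an arbitrary choice, and only `temperature`, `top_p`, and `stream` are carried over. OpenAI accepts temperatures above 1.0, which would fall outside the 0.0-1.0 range documented here; the sketch does not clamp them.

```python
def to_claude_like(openai_request, model="intern-s1", default_max_tokens=1024):
    """Convert an OpenAI-style chat request dict into a Claude-like request dict."""
    system_parts = []
    messages = []
    for message in openai_request.get("messages", []):
        if message["role"] == "system":
            # System messages move to the separate top-level system parameter.
            system_parts.append(message["content"])
        else:
            messages.append({"role": message["role"], "content": message["content"]})
    converted = {
        "model": model,
        # max_tokens is required here, so fall back to a default if absent.
        "max_tokens": openai_request.get("max_tokens", default_max_tokens),
        "messages": messages,
    }
    if system_parts:
        converted["system"] = "\n\n".join(system_parts)
    for name in ("temperature", "top_p", "stream"):
        if name in openai_request:
            converted[name] = openai_request[name]
    return converted
```

Applied to the OpenAI request shown in section 3, this produces the InternLM Claude-like request shown there.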

Complete Examples

Example 1: Basic Chat

Python

import requests

def chat_with_intern(user_message):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }
    
    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "messages": [
            {"role": "user", "content": user_message}
        ]
    }
    
    response = requests.post(url, headers=headers, json=data)
    
    if response.status_code == 200:
        result = response.json()
        return result["content"][0]["text"]
    else:
        return f"Error: {response.json()}"

# Usage
reply = chat_with_intern("Tell me about InternLM")
print(reply)

Example 2: Multi-turn Conversation

Python

import requests

def multi_turn_chat():
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }
    
    # Conversation history
    messages = [
        {"role": "user", "content": "I want to learn Python"},
        {"role": "assistant", "content": "Great! Python is perfect for beginners. Where would you like to start?"},
        {"role": "user", "content": "Let's start with data types"}
    ]
    
    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "messages": messages
    }
    
    response = requests.post(url, headers=headers, json=data)
    result = response.json()
    
    return result["content"][0]["text"]

reply = multi_turn_chat()
print(reply)

Example 3: Professional Assistant with System Prompt

Python

import requests

def code_reviewer(code):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }
    
    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "system": "You are a senior code review expert. Carefully check code for: 1) Correctness 2) Performance 3) Readability 4) Best practices",
        "messages": [
            {"role": "user", "content": f"Please review this code:\n\n```python\n{code}\n```"}
        ]
    }
    
    response = requests.post(url, headers=headers, json=data)
    return response.json()["content"][0]["text"]

# Usage
code = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
"""

review = code_reviewer(code)
print(review)

Example 4: Streaming Output

Python

import json
import requests

def stream_chat(user_message):
    url = "https://chat.intern-ai.org.cn/v1/messages"
    headers = {
        "Content-Type": "application/json",
        "x-api-key": "your-api-key",
        "anthropic-version": "2023-06-01"
    }

    data = {
        "model": "intern-s1",
        "max_tokens": 2048,
        "stream": True,
        "messages": [{"role": "user", "content": user_message}]
    }

    response = requests.post(url, headers=headers, json=data, stream=True)

    print("Model reply: ", end='')
    for line in response.iter_lines():
        if line:
            line_str = line.decode('utf-8')
            # Only "data:" lines carry JSON payloads; "event:" lines are skipped
            if line_str.startswith('data: '):
                data_str = line_str[6:]
                if data_str == '[DONE]':
                    break
                try:
                    chunk = json.loads(data_str)
                    if chunk.get('type') == 'content_block_delta':
                        text = chunk.get('delta', {}).get('text', '')
                        print(text, end='', flush=True)
                except json.JSONDecodeError:
                    # Skip payloads that are not valid JSON
                    pass
    print()

# Usage
stream_chat("Write a poem about spring")

FAQ

Q1: Why is max_tokens a required parameter?

A: The InternLM Claude-like API requires you to specify the maximum generation length explicitly in order to:

Control response cost
Prevent unexpectedly long responses
Ensure predictable response time

Recommended values:

Short answers: 512-1024
General conversation: 1024-2048
Long text generation: 2048-4096
Very long content: 4096-32000

Q2: What's the difference between system parameter and system role in messages?

A: In InternLM Claude-like API:

✅ Use separate system parameter (recommended)
❌ Do NOT use {"role": "system"} in messages array

Correct:

{
  "system": "You are a helpful assistant",
  "messages": [
    {"role": "user", "content": "Hello"}
  ]
}

Incorrect:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hello"}
  ]
}

Q3: How to handle thinking content returned by the model?

A: InternLM may return thinking content blocks showing the model's reasoning:

result = response.json()
for block in result["content"]:
    if block["type"] == "text":
        print("Reply:", block["text"])
    elif block["type"] == "thinking":
        print("Thinking:", block["thinking"])

You can choose to:

Display to users (increases transparency)
Log for debugging only
Ignore completely

Q4: How to calculate request cost?

A: Use the usage field in the response:

result = response.json()
input_tokens = result["usage"]["input_tokens"]
output_tokens = result["usage"]["output_tokens"]
total_tokens = input_tokens + output_tokens

print(f"Input: {input_tokens} tokens")
print(f"Output: {output_tokens} tokens")
print(f"Total: {total_tokens} tokens")
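
Token counts can be turned into a cost estimate once per-token prices are known. This document does not list prices, so the sketch below takes them as arguments; `estimate_cost` is a hypothetical helper and any prices passed to it are placeholders.

```python
def estimate_cost(usage, input_price_per_million, output_price_per_million):
    """Estimate cost from a usage object; prices are per one million tokens."""
    input_cost = usage["input_tokens"] * input_price_per_million / 1_000_000
    output_cost = usage["output_tokens"] * output_price_per_million / 1_000_000
    return input_cost + output_cost
```

Input and output tokens are priced separately because providers commonly charge different rates for each; substitute the actual rates for your account.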

Q5: What stop sequences are supported?

A: You can specify multiple stop sequences; the model stops when it encounters any:

{
  "stop_sequences": ["\n\n", "END", "STOP"]
}

Q6: How to choose temperature, top_p, top_k?

Parameter	Range	Purpose	Recommendations
`temperature`	0.0-1.0	Control randomness	0.7-1.0: creative tasks; 0.0-0.3: precise tasks
`top_p`	0.0-1.0	Nucleus sampling	0.9-1.0: diversity; 0.5-0.9: balance
`top_k`	integer	Top-K sampling	40-100: common; -1: disable

Example:

{
  "temperature": 0.9,
  "top_p": 0.95,
  "top_k": 40
}

Q7: How to handle long conversations?

A: When conversation history gets long:

Keep recent messages (recommended)

# Keep last 10 turns
recent_messages = messages[-20:]  # 2 messages per turn

Use summarization

# Summarize old conversation for system prompt
system = f"Conversation summary: {summary}\n\nYou are a helpful assistant."

Monitor token usage

if result["usage"]["input_tokens"] > 30000:
    # Trim conversation history
    messages = messages[-10:]
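
A plain slice can leave the history starting with an assistant message. The sketch below trims to the most recent messages and then drops leading assistant messages. It assumes that a history should open with a user turn, which this document does not state explicitly; `trim_history` is a hypothetical helper.

```python
def trim_history(messages, max_messages=20):
    """Keep the most recent messages, starting from a user message."""
    recent = messages[-max_messages:] if max_messages > 0 else []
    # Drop leading assistant messages so the trimmed history opens with a user turn.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent
```

The result may hold fewer than `max_messages` entries when leading assistant messages are dropped.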
