API Documentation
API Base URL
/api/v1

Compatible with the OpenAI SDK: just replace base_url and api_key to use it.
Chat Completions
POST /api/v1/chat/completions

Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID |
| messages | array | Yes | Message list |
| stream | boolean | No | Stream output |
| temperature | number | No | Temperature 0-2 |
| max_tokens | integer | No | Max output tokens |
Response Format
```json
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello!"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}
```

Python Example

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="/api/v1"
)

# Non-streaming
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=1000
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Embeddings
POST /api/v1/embeddings

Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model ID |
| input | string \| array | Yes | Input text |
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="/api/v1"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world"
)
print(f"Dimensions: {len(response.data[0].embedding)}")
print(f"Vector: {response.data[0].embedding[:5]}...")
```

Model List
GET /api/v1/models

List all available models. Returns an OpenAI-compatible format.
```bash
curl /api/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Error Codes
| HTTP Status | Description | Handling |
|---|---|---|
| 200 | Success | - |
| 400 | Bad request (missing model/messages or other required fields) | Check request body format |
| 401 | Invalid, expired, or disabled API Key | Check Authorization header |
| 402 | Insufficient balance or wallet suspended | Recharge and retry |
| 403 | No access to this model (model allowlist restriction) | Contact admin to enable model access |
| 429 | Rate limit exceeded / Token quota exhausted / Daily limit reached | Reduce request frequency or contact admin to increase limit |
| 500 | Internal server error | Retry later |
| 502 | Upstream provider unavailable | Retry later or switch model |
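As a rough illustration (not part of any SDK), the handling column above can be collapsed into a small dispatch helper. The function and category names here are made up for the sketch:

```python
# Hypothetical helper: map an HTTP status code from this API to a coarse
# handling strategy, following the error-code table above.
RETRYABLE = {429, 500, 502}  # transient conditions: back off and retry

def handling_for(status: int) -> str:
    if status == 200:
        return "ok"
    if status in (400, 401, 403):
        return "fix-request"   # bad body, bad key, or no model access
    if status == 402:
        return "recharge"      # top up the wallet, then retry
    if status in RETRYABLE:
        return "retry"
    return "unknown"
```

A client can branch on the returned category instead of scattering status-code checks through its code.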
Rate Limiting
API requests are protected by rate limiting. Limit info is returned via response headers:
| Response Header | Description |
|---|---|
| X-RateLimit-Limit | Max requests per minute |
| X-RateLimit-Remaining | Remaining requests in current window |
| X-RateLimit-Reset | Limit reset time (ISO 8601) |
When the limit is exceeded, the API returns a 429 status code and a Retry-After: 60 response header.
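Honoring these headers client-side can look like the following sketch. The header names are as documented above; the helper itself is hypothetical:

```python
from datetime import datetime, timezone

def seconds_until_reset(headers: dict) -> float:
    """Hypothetical helper: seconds to wait before the next request."""
    # Retry-After (seconds) accompanies 429 responses; prefer it when present.
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    # Otherwise fall back to the ISO 8601 reset timestamp.
    # (Replace a trailing "Z" so fromisoformat works on Python < 3.11.)
    raw = headers["X-RateLimit-Reset"].replace("Z", "+00:00")
    reset = datetime.fromisoformat(raw)
    return max(0.0, (reset - datetime.now(timezone.utc)).total_seconds())
```

Sleeping for `seconds_until_reset(...)` after a 429 keeps a client within its window without hard-coding a delay.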
Frequently Asked Questions
What should I do if I get a 401 error?
A 401 means your API Key is invalid, expired, or disabled. Check: 1) Authorization header format is "Bearer YOUR_API_KEY"; 2) The key is active in your dashboard; 3) The key hasn't expired.
How do I handle a 402 (insufficient balance) error?
A 402 means your wallet balance is insufficient or suspended. Log in to the dashboard to top up your wallet, or contact your agent/admin for a recharge. Service resumes immediately after top-up.
How do I handle 429 rate limit errors?
A 429 means you've exceeded the rate limit. Solutions: 1) Add exponential backoff retry logic; 2) Check the X-RateLimit-Remaining response header to throttle requests; 3) Contact admin to increase your limits.
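One way to implement the backoff suggestion, sketched with a generic callable and a stand-in exception rather than a specific SDK error type:

```python
import random
import time

def with_backoff(call, max_retries=5, base=1.0):
    """Retry `call` with exponential backoff plus jitter.

    Hypothetical sketch: `call` is assumed to raise on a 429 response
    (e.g. the SDK's RateLimitError); RuntimeError stands in for that here.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Wait base, 2*base, 4*base, ... plus a little jitter.
            time.sleep(base * (2 ** attempt) + random.random() * 0.1)
```

Usage would be `with_backoff(lambda: client.chat.completions.create(...))`, with the except clause swapped for the real rate-limit exception.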
What's the difference between streaming and non-streaming? Which should I use?
Non-streaming (default) waits for the complete response before returning — good for batch processing. Streaming (stream: true) returns tokens as they're generated — ideal for chat UIs where users see real-time output. Both cost the same.
Can I use multiple models at the same time?
Yes. The same API Key can specify different model parameters in different requests. Switch freely between models without creating separate keys for each.