API Documentation

API Base URL

/api/v1

This API is compatible with the OpenAI SDK: point your client's base_url at this service and supply your api_key, and existing OpenAI client code works unchanged.

Chat Completions

POST /api/v1/chat/completions

Request Parameters

Parameter    Type     Required  Description
model        string   Yes       Model ID
messages     array    Yes       Message list
stream       boolean  No        Stream output
temperature  number   No        Sampling temperature (0-2)
max_tokens   integer  No        Max output tokens

Response Format

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello!"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}
Python Example

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="/api/v1"  # use the full URL of your deployment, e.g. https://your-host/api/v1
)

# Non-streaming
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=1000
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
for chunk in stream:
    # Some chunks (e.g. the final usage chunk) may carry no choices or an empty delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Embeddings

POST /api/v1/embeddings

Request Parameters

Parameter  Type            Required  Description
model      string          Yes       Embedding model ID
input      string | array  Yes       Input text
Python Example

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="/api/v1"  # use the full URL of your deployment, e.g. https://your-host/api/v1
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world"
)

print(f"Dimensions: {len(response.data[0].embedding)}")
print(f"Vector: {response.data[0].embedding[:5]}...")

Model List

GET /api/v1/models

Lists all available models in an OpenAI-compatible format.

# Replace your-host with your deployment's domain
curl https://your-host/api/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

Error Codes

HTTP Status  Description                                                         Handling
200          Success                                                             -
400          Bad request (missing model/messages or other required fields)       Check request body format
401          Invalid, expired, or disabled API Key                               Check Authorization header
402          Insufficient balance or wallet suspended                            Recharge and retry
403          No access to this model (model allowlist restriction)               Contact admin to enable model access
429          Rate limit exceeded, token quota exhausted, or daily limit reached  Reduce request frequency or contact admin to raise limits
500          Internal server error                                               Retry later
502          Upstream provider unavailable                                       Retry later or switch model
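
The table above maps cleanly onto client-side handling. A minimal sketch (the action strings are illustrative, not part of the API):

```python
# Map each documented status code to the handling the table suggests.
# These action labels are illustrative, not defined by the API.
ACTIONS = {
    400: "check request body",
    401: "check API key",
    402: "recharge wallet",
    403: "request model access",
    429: "back off and retry",
    500: "retry later",
    502: "retry later or switch model",
}

def handling_for(status):
    """Return the suggested client-side handling for an HTTP status code."""
    if 200 <= status < 300:
        return "ok"
    return ACTIONS.get(status, "unexpected status; see docs")
```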

Rate Limiting

API requests are protected by rate limiting. Limit info is returned via response headers:

Response Header        Description
X-RateLimit-Limit      Max requests per minute
X-RateLimit-Remaining  Remaining requests in current window
X-RateLimit-Reset      Limit reset time (ISO 8601)

When the limit is exceeded, the API returns a 429 status code together with a Retry-After: 60 response header.
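
A small sketch of reading these headers from a response. The helpers below are illustrative; the headers argument is any dict-like mapping, such as response.headers from the requests library:

```python
def retry_after_seconds(headers, default=60):
    """Seconds to wait after a 429, from the Retry-After header (falls back to 60)."""
    try:
        return int(headers.get("Retry-After", default))
    except (TypeError, ValueError):
        return default

def remaining_requests(headers):
    """Requests left in the current window, or None if the header is absent."""
    value = headers.get("X-RateLimit-Remaining")
    return int(value) if value is not None else None
```

Throttling proactively when remaining_requests() nears zero avoids hitting the 429 in the first place.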

Frequently Asked Questions

What should I do if I get a 401 error?

A 401 means your API Key is invalid, expired, or disabled. Check: 1) Authorization header format is "Bearer YOUR_API_KEY"; 2) The key is active in your dashboard; 3) The key hasn't expired.
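
Most 401s come down to a malformed Authorization header. A tiny helper to build and sanity-check it (function names are illustrative):

```python
def auth_header(api_key):
    """Build the Authorization header in the form the API expects."""
    return {"Authorization": f"Bearer {api_key}"}

def looks_well_formed(header_value):
    """Catch the most common mistake: a missing 'Bearer ' prefix or an empty key."""
    prefix = "Bearer "
    return header_value.startswith(prefix) and len(header_value) > len(prefix)
```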

How do I handle a 402 (insufficient balance) error?

A 402 means your wallet balance is insufficient or suspended. Log in to the dashboard to top up your wallet, or contact your agent/admin for a recharge. Service resumes immediately after top-up.

How do I handle 429 rate limit errors?

A 429 means you've exceeded the rate limit. Solutions: 1) Add exponential backoff retry logic; 2) Check the X-RateLimit-Remaining response header to throttle requests; 3) Contact admin to increase your limits.
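
The exponential backoff mentioned in point 1 can be sketched like this. The sketch is framework-agnostic; with the OpenAI SDK you would pass a predicate matching openai.RateLimitError:

```python
import random
import time

def backoff_delays(max_retries=5, base=1.0, cap=60.0):
    """Exponential schedule: base * 2**i seconds per attempt, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(max_retries)]

def with_retry(call, is_rate_limited, max_retries=5, base=1.0):
    """Run `call()`; when `is_rate_limited(exc)` is true, back off and retry."""
    for delay in backoff_delays(max_retries, base):
        try:
            return call()
        except Exception as exc:
            if not is_rate_limited(exc):
                raise
            time.sleep(delay + random.uniform(0, delay))  # jitter spreads out retries
    raise RuntimeError("still rate limited after retries")
```

Usage with the OpenAI SDK would look like with_retry(lambda: client.chat.completions.create(...), lambda e: isinstance(e, openai.RateLimitError)).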

What's the difference between streaming and non-streaming? Which should I use?

Non-streaming (default) waits for the complete response before returning — good for batch processing. Streaming (stream: true) returns tokens as they're generated — ideal for chat UIs where users see real-time output. Both cost the same.

Can I use multiple models at the same time?

Yes. The same API Key works with every model you have access to — simply set a different model parameter in each request. There is no need to create a separate key for each model.