Introduction

We provide an API service compatible with the OpenAI API standard, making it easy to integrate into your existing applications.

Supported APIs

API Base URL

https://api.ppinfra.com/v3/openai

All currently supported large language models

The DeepSeek R1 and V3 community versions are offered for trial use only. They are the same full-parameter, full-strength models, with no difference in stability or output quality; for high-volume calls, please top up your balance and switch to the non-community versions.

Please visit https://ppinfra.com/model-api/product/llm-api

Python Client Examples

pip install 'openai>=1.0.0'
  • ChatCompletion API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ppinfra.com/v3/openai",
    # Get your API key at https://ppinfra.com/settings#key-management.
    api_key="{{API Key}}",
)

model = "deepseek/deepseek-r1"
stream = True  # or False
max_tokens = 512

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": "You are a professional AI documentation assistant.",
        },
        {
            "role": "user",
            "content": "What scenarios can the GPU cloud products offered by 派欧算力云 be used for?",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
  • Completion API
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ppinfra.com/v3/openai",
    # Get your API key at https://ppinfra.com/settings#key-management.
    api_key="{{API Key}}",
)

model = "deepseek/deepseek-r1"
stream = True  # or False
max_tokens = 512

completion_res = client.completions.create(
    model=model,
    prompt="What scenarios can the GPU cloud products offered by 派欧算力云 be used for?",
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in completion_res:
        print(chunk.choices[0].text or "", end="")
else:
    print(completion_res.choices[0].text)

Curl Client Examples

  • ChatCompletion API
# Get your API key at https://ppinfra.com/settings#key-management.
export API_KEY="{API Key}"

curl "https://api.ppinfra.com/v3/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "deepseek/deepseek-r1",
    "messages": [
        {
            "role": "system",
            "content": "You are a professional AI documentation assistant."
        },
        {
            "role": "user",
            "content": "What scenarios can the GPU cloud products offered by 派欧算力云 be used for?"
        }
    ],
    "max_tokens": 512
}'
  • Completion API
# Get your API key at https://ppinfra.com/settings#key-management.
export API_KEY="{API Key}"

curl "https://api.ppinfra.com/v3/openai/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "model": "deepseek/deepseek-r1",
    "prompt": "What scenarios can the GPU cloud products offered by 派欧算力云 be used for?",
    "max_tokens": 512
}'
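
Both endpoints also accept "stream": true, in which case the response arrives as data-only server-sent events: each event line is prefixed with `data: `, and the stream ends with a `data: [DONE]` message. The sketch below parses a few hypothetical sample chunks in that format; the chunk contents are made-up examples, not real API output.

```python
import json

# Hypothetical raw SSE lines, as a streaming ChatCompletion response
# would deliver them (illustrative values only).
sse_lines = [
    'data: {"choices": [{"delta": {"content": "GPU "}}]}',
    'data: {"choices": [{"delta": {"content": "cloud"}}]}',
    "data: [DONE]",
]

text = ""
for line in sse_lines:
    if not line.startswith("data: "):
        continue  # skip blank keep-alive lines and comments
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(payload)
    # delta.content may be absent or null on some chunks
    text += chunk["choices"][0]["delta"].get("content") or ""

print(text)  # -> GPU cloud
```

The OpenAI Python client shown earlier performs this parsing for you; doing it by hand is only needed when consuming the raw HTTP stream.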

If you are already using OpenAI's ChatCompletion or Completion API, simply set the base URL to https://api.ppinfra.com/v3/openai, obtain and configure your API key (see the API key tutorial), and update the model name as needed. Once these steps are complete, you have successfully connected to the 派欧算力云 large language model API service.
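
Because the compatibility is at the HTTP wire level, the same request can even be assembled with nothing but the Python standard library. This sketch only constructs the request object without sending it; the API key is a placeholder:

```python
import json
import urllib.request

BASE_URL = "https://api.ppinfra.com/v3/openai"
API_KEY = "{{API Key}}"  # placeholder; substitute your real key

# The same JSON body the curl examples above send.
body = json.dumps({
    "model": "deepseek/deepseek-r1",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 512,
}).encode("utf-8")

req = urllib.request.Request(
    BASE_URL + "/chat/completions",
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)
# urllib.request.urlopen(req) would actually send it; omitted here.
print(req.full_url)  # -> https://api.ppinfra.com/v3/openai/chat/completions
```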

Model Parameters

ChatCompletions and Completions:

  • model, find all the models we support here: https://ppinfra.com/model-api/product/llm-api.
  • messages, the list of messages comprising the conversation, with roles system, user, and assistant. (only available for the ChatCompletion endpoint)
    • content, the contents of the message.
    • role, the role of the message's author: system, user, or assistant.
    • name, an optional name for the participant. Provides the model information to differentiate between participants of the same role.
  • prompt, the prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays. (only available for the Completion endpoint)
  • max_tokens, the maximum number of tokens that can be generated in the chat completion.
  • stream, if set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
  • temperature, what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
  • top_p, an alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
  • top_k, integer that controls the number of top tokens to consider. Set to -1 to consider all tokens.
  • min_p, float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
  • stop, up to 4 sequences where the API will stop generating further tokens.
  • n, how many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
  • presence_penalty, float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
  • frequency_penalty, float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
  • repetition_penalty, float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens.
  • logit_bias, an optional parameter that modifies the likelihood of specified tokens appearing in a model generated output.
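
Several of the sampling parameters above can be combined in a single request body. A sketch of such a payload follows; the values are arbitrary illustrations, not tuning recommendations, and per the guidance above it sets temperature while leaving top_p at its default:

```python
import json

# Example ChatCompletion request body exercising several sampling parameters.
payload = {
    "model": "deepseek/deepseek-r1",
    "messages": [{"role": "user", "content": "Name three GPU workloads."}],
    "max_tokens": 256,
    "temperature": 0.8,       # higher -> more random; alter this OR top_p, not both
    "top_k": 40,              # consider only the 40 most likely tokens
    "min_p": 0.05,            # drop tokens below 5% of the top token's probability
    "presence_penalty": 0.5,  # > 0 nudges the model toward unused tokens
    "stop": ["\n\n"],         # stop at a blank line (up to 4 sequences allowed)
    "n": 1,                   # one choice keeps generated-token costs minimal
}

print(json.dumps(payload, indent=2))
```

The same dictionary can be passed directly as keyword arguments to `client.chat.completions.create(...)` in the Python examples above.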