Large Language Model API
Introduction
We provide an API service compatible with the OpenAI API standard, so you can easily integrate it into your existing applications.
Supported APIs
- ChatCompletion, supporting both streaming and standard modes.
- Completion, supporting both streaming and standard modes.
API Base URL
https://api.ppinfra.com/v3/openai
All currently supported large language models
The DeepSeek R1 and V3 community versions are offered for trial use. They are the same full-parameter models, with no difference in stability or output quality; for high-volume usage, please top up your account and switch to the non-community versions.
Please visit https://ppinfra.com/model-api/product/llm-api.
Python client examples
- ChatCompletion API
- Completion API
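As a sketch of the Python client examples listed above, the following stdlib-only code builds authenticated requests against the OpenAI-compatible endpoint. The model name deepseek/deepseek-r1 and the YOUR_API_KEY placeholder are assumptions; check the supported-model list and substitute your real key.

```python
"""Minimal ChatCompletion / Completion request builder (sketch)."""
import json
import urllib.request

BASE_URL = "https://api.ppinfra.com/v3/openai"


def build_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build an authenticated POST request for the OpenAI-compatible API."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# ChatCompletion body: a messages list with system / user / assistant roles.
chat_payload = {
    "model": "deepseek/deepseek-r1",  # assumed name; see the model list page
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 512,
}

# Completion body: takes a raw prompt instead of a messages list.
completion_payload = {
    "model": "deepseek/deepseek-r1",
    "prompt": "Once upon a time",
    "max_tokens": 512,
}

req = build_request("chat/completions", chat_payload, api_key="YOUR_API_KEY")
# Sending requires a valid key:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the service follows the OpenAI API standard, the official OpenAI SDK works the same way once you point its base_url at the endpoint above.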
Curl client examples
- ChatCompletion API
- Completion API
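A curl sketch of the same ChatCompletion call; the model name is an assumption and $PPINFRA_API_KEY is a placeholder environment variable for your real key. The JSON payload is validated locally before the (commented-out) request is sent.

```shell
# Sketch: assumed model name; export PPINFRA_API_KEY before sending.
BASE_URL="https://api.ppinfra.com/v3/openai"
PAYLOAD=$(cat <<'EOF'
{
  "model": "deepseek/deepseek-r1",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 512
}
EOF
)
# Check the payload is valid JSON before making a billable call.
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"
# Send the request (requires a valid API key):
# curl -s "$BASE_URL/chat/completions" \
#   -H "Content-Type: application/json" \
#   -H "Authorization: Bearer $PPINFRA_API_KEY" \
#   -d "$PAYLOAD"
```

For the Completion endpoint, replace chat/completions with completions and the messages array with a prompt string.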
If you are already using OpenAI's ChatCompletion or Completion API, you only need to set the base URL to https://api.ppinfra.com/v3/openai, obtain and set your API key (see the API key tutorial), and update the model name as needed. Once these steps are complete, you have successfully connected to the 派欧算力云 (PPInfra) platform's large language model API service.
Model parameters
ChatCompletions and Completions:
- model: the model to use; find all the models we support here: https://ppinfra.com/model-api/product/llm-api.
- messages: the list of messages in the conversation, with the roles system, user, and assistant (only available for the ChatCompletion endpoint). Each message contains:
  - content: the contents of the message.
  - role: the role of the message's author.
  - name: an optional name for the participant. Provides the model information to differentiate between participants of the same role.
- prompt: the prompt(s) to generate completions for, encoded as a string, array of strings, array of tokens, or array of token arrays (only available for the Completion endpoint).
- max_tokens: the maximum number of tokens that can be generated in the chat completion.
- stream: if set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
- temperature: what sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
- top_p: an alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
- top_k: integer that controls the number of top tokens to consider. Set to -1 to consider all tokens.
- min_p: float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
- stop: up to 4 sequences where the API will stop generating further tokens.
- n: how many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
- presence_penalty: float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
- frequency_penalty: float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.
- repetition_penalty: float that penalizes new tokens based on whether they appear in the prompt and the generated text so far. Values > 1 encourage the model to use new tokens, while values < 1 encourage the model to repeat tokens.
- logit_bias: an optional parameter that modifies the likelihood of specified tokens appearing in a model generated output.
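The stream parameter above states that tokens arrive as data-only server-sent events terminated by a data: [DONE] message. As an illustration of that wire format, here is a small parser sketch; the chunk field names follow the OpenAI response format, and the sample chunks are fabricated for demonstration.

```python
"""Sketch: extracting content deltas from a streamed (SSE) response."""
import json


def iter_stream_deltas(lines):
    """Yield content deltas from raw SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # server signals end of stream
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]


# Fabricated chunks in the documented wire format:
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_deltas(sample)))  # -> Hello
```

In a real client you would feed this generator the response body line by line instead of a prepared list.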