API Overview
Gemini can be called in two ways:
| Method | Path | Use case |
|---|---|---|
| OpenAI-compatible | /v1/chat/completions | Recommended; easy to switch between providers |
| Native API | /v1beta/models/{model}:generateContent | When Gemini-specific features are needed |
All endpoints authenticate with an Authorization: Bearer <Token> header.
The OpenAI-compatible API is recommended: the same code can be switched between model providers with no changes.
Quick Start
OpenAI-compatible (recommended)
curl -X POST "$BASE_URL/v1/chat/completions" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.5-flash",
"messages": [
{"role": "system", "content": "你是一个友好的助手"},
{"role": "user", "content": "你好"}
]
}'
Native API
curl -X POST "$BASE_URL/v1beta/models/gemini-2.5-flash:generateContent" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [{"text": "你好"}]
}]
}'
Streaming Output
OpenAI-compatible
curl -N "$BASE_URL/v1/chat/completions" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-2.5-flash",
"messages": [{"role": "user", "content": "写一首诗"}],
"stream": true
}'
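To consume the stream in code, here is a minimal Python sketch using the OpenAI SDK, reusing the placeholder credentials from the SDK examples at the end of this page:

from openai import OpenAI

# Placeholder credentials; substitute your own token.
client = OpenAI(api_key="your-token", base_url="https://models.kapon.cloud/v1")

# stream=True yields chunks as server-sent events arrive.
stream = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    # Each chunk carries an incremental delta; content may be None on some chunks.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)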
Native API
curl -N "$BASE_URL/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"contents": [{"parts": [{"text": "写一首诗"}]}]
}'
Multimodal Input
Image understanding
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,<BASE64>"}}
      ]
    }]
  }'
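To fill the <BASE64> placeholder from a local file, a minimal Python sketch (the file name photo.png is hypothetical):

import base64
from openai import OpenAI

client = OpenAI(api_key="your-token", base_url="https://models.kapon.cloud/v1")

# Hypothetical local file; encode it as a base64 data URL.
with open("photo.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)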
Audio understanding
curl -X POST "$BASE_URL/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Transcribe this audio"},
        {"inline_data": {"mime_type": "audio/mp3", "data": "<BASE64>"}}
      ]
    }]
  }'
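The same request from Python using the requests library against the native endpoint; BASE_URL and TOKEN mirror the environment variables in the curl examples, and speech.mp3 is a hypothetical file:

import base64
import os
import requests

BASE_URL = os.environ["BASE_URL"]  # e.g. https://models.kapon.cloud
TOKEN = os.environ["TOKEN"]

with open("speech.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    f"{BASE_URL}/v1beta/models/gemini-2.5-flash:generateContent",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "contents": [{
            "parts": [
                {"text": "Transcribe this audio"},
                {"inline_data": {"mime_type": "audio/mp3", "data": audio_b64}},
            ]
        }]
    },
)
resp.raise_for_status()
# Native responses nest text under candidates -> content -> parts.
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])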
Video analysis (gemini-3-pro-preview-file)
The gemini-3-pro-preview-file model supports analyzing video files directly from a URL. See the Video Analysis documentation for detailed usage.
Thinking Mode (Reasoning)
The Gemini 2.5 and 3 series support step-by-step thinking and reasoning.
OpenAI-compatible
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Reason step by step: what is 23*47?"}],
    "reasoning_effort": "medium"
  }'
Valid reasoning_effort values: low, medium, high.
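The same parameter from the OpenAI Python SDK, assuming a recent SDK version where reasoning_effort is a named argument:

from openai import OpenAI

client = OpenAI(api_key="your-token", base_url="https://models.kapon.cloud/v1")

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Reason step by step: what is 23*47?"}],
    reasoning_effort="medium",  # one of "low", "medium", "high"
)
print(response.choices[0].message.content)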
Native API
The Gemini 3 series uses thinkingLevel:
{
  "generationConfig": {
    "thinkingConfig": {
      "thinkingLevel": "high"
    }
  }
}
The Gemini 2.5 series uses thinkingBudget:
{
  "generationConfig": {
    "thinkingConfig": {
      "thinkingBudget": 8192
    }
  }
}
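These snippets show only the generationConfig fragment; in a full request it sits alongside contents. A minimal Gemini 2.5 sketch with Python requests, under the same BASE_URL/TOKEN assumptions as the curl examples:

import os
import requests

BASE_URL = os.environ["BASE_URL"]
TOKEN = os.environ["TOKEN"]

resp = requests.post(
    f"{BASE_URL}/v1beta/models/gemini-2.5-flash:generateContent",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "contents": [{"parts": [{"text": "Reason step by step: what is 23*47?"}]}],
        # thinkingBudget caps the tokens spent on internal reasoning.
        "generationConfig": {"thinkingConfig": {"thinkingBudget": 8192}},
    },
)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])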
Returning the thinking process
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
    "reasoning_effort": "high",
    "extra_body": {
      "google": {
        "thinking_config": {"include_thoughts": true}
      }
    }
  }'
The thinking content is returned in the response's reasoning_content field.
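From the OpenAI Python SDK, the Google-specific fields pass through the SDK's extra_body argument; since reasoning_content is an extension field rather than a typed SDK attribute, this sketch reads it defensively:

from openai import OpenAI

client = OpenAI(api_key="your-token", base_url="https://models.kapon.cloud/v1")

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Explain quantum entanglement"}],
    reasoning_effort="high",
    # extra_body merges additional JSON fields into the request body.
    extra_body={"google": {"thinking_config": {"include_thoughts": True}}},
)

message = response.choices[0].message
# reasoning_content is an extension field, so it may not exist on every response.
print(getattr(message, "reasoning_content", None))
print(message.content)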
Tool Calling
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the weather like in Beijing today?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
        }
      }
    }]
  }'
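When the model decides to call the function, the response carries tool_calls instead of plain content. A minimal sketch of the follow-up round trip; the local weather result here is hypothetical:

import json
from openai import OpenAI

client = OpenAI(api_key="your-token", base_url="https://models.kapon.cloud/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather like in Beijing today?"}]
response = client.chat.completions.create(
    model="gemini-2.5-flash", messages=messages, tools=tools
)
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)

# Hypothetical local implementation; replace with a real weather lookup.
result = {"city": args["city"], "weather": "sunny", "temp_c": 25}

# Feed the tool result back so the model can produce the final answer.
messages.append(response.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
final = client.chat.completions.create(
    model="gemini-2.5-flash", messages=messages, tools=tools
)
print(final.choices[0].message.content)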
JSON Mode
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Generate a user-profile JSON object"}],
    "response_format": {"type": "json_object"}
  }'
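JSON mode returns the object as a raw string in message.content, so the client still parses it; a minimal sketch:

import json
from openai import OpenAI

client = OpenAI(api_key="your-token", base_url="https://models.kapon.cloud/v1")

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Generate a user-profile JSON object"}],
    response_format={"type": "json_object"},
)
# The content is valid JSON but arrives as a string.
user = json.loads(response.choices[0].message.content)
print(user)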
Generation Parameters
| Parameter | Description | Default |
|---|---|---|
| temperature | Randomness, range 0-2 | 1.0 |
| max_tokens | Maximum output tokens | Model default |
| top_p | Nucleus sampling probability | 0.95 |
| stop | Stop sequences | - |
For Gemini 3, keep temperature at 1.0; setting it lower can degrade reasoning performance.
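A sketch combining these parameters in one request; the values are illustrative:

from openai import OpenAI

client = OpenAI(api_key="your-token", base_url="https://models.kapon.cloud/v1")

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Write a poem"}],
    temperature=1.0,   # randomness, 0-2; keep at 1.0 for Gemini 3
    max_tokens=1024,   # cap on output tokens
    top_p=0.95,        # nucleus sampling probability
    stop=["\n\n"],     # illustrative stop sequence
)
print(response.choices[0].message.content)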
SDK Examples
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="your-token",
    base_url="https://models.kapon.cloud/v1"
)
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
Python (Google SDK)
from google import genai

client = genai.Client(
    api_key="your-token",
    http_options={"base_url": "https://models.kapon.cloud"}
)
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Hello"
)
print(response.text)