17 Commits

Author SHA1 Message Date
fawney19
3bbc1c6b66 feat: add provider compatibility error detection for intelligent failover
- Introduce ProviderCompatibilityException for unsupported parameter/feature errors
- Add COMPATIBILITY_ERROR_PATTERNS to detect provider-specific limitations
- Implement _is_compatibility_error() method in ErrorClassifier
- Prioritize compatibility error checking before client error validation
- Remove 'max_tokens' from CLIENT_ERROR_PATTERNS as it can indicate compatibility issues
- Enable automatic failover when provider doesn't support requested features
- Improve error classification accuracy with pattern matching for common compatibility issues
2025-12-19 13:28:26 +08:00
fawney19
c69a0a8506 refactor: remove stream smoothing config from system settings and improve base image caching
- Remove stream_smoothing configuration from SystemConfigService (moved to handler default)
- Remove stream smoothing UI controls from admin settings page
- Add AdminClearSingleAffinityAdapter for targeted cache invalidation
- Add clearSingleAffinity API endpoint to clear specific affinity cache entries
- Include global_model_id in affinity list response for UI deletion support
- Improve CI/CD workflow with hash-based base image change detection
- Add hash label to base image for reliable cache invalidation detection
- Use remote image inspection to determine if base image rebuild is needed
- Include Dockerfile.base in hash calculation for proper dependency tracking
2025-12-19 13:09:56 +08:00
fawney19
1fae202bde Merge pull request #30 from AAEE86/master
chore: Modify the order of API format enumeration
2025-12-19 12:34:22 +08:00
fawney19
b9a26c4550 fix: add SETUPTOOLS_SCM_PRETEND_VERSION for CI builds 2025-12-19 12:01:19 +08:00
AAEE86
e42bd35d48 chore: Modify the order of API format enumeration - Move CLAUDE_CLI before OPENAI 2025-12-19 11:44:10 +08:00
fawney19
f22a073fd9 fix: rebuild app image when base image changes during deployment
- Track BASE_REBUILT flag to detect base image rebuilds
- Force app image rebuild when base image is rebuilt
- Prevents stale app images built with outdated base images
- Ensures consistent deployment when base dependencies change
2025-12-19 11:32:43 +08:00
fawney19
5c7ad089d2 fix: disable nginx buffering for streaming responses
- Add X-Accel-Buffering: no header to prevent nginx from buffering streamed content
- Ensures immediate delivery of each chunk without proxy buffering delays
- Improves real-time streaming performance and responsiveness
- Applies to both production and local Dockerfiles
2025-12-19 11:26:15 +08:00
fawney19
97425ac68f refactor: make stream smoothing parameters configurable and add models cache invalidation
- Move stream smoothing parameters (chunk_size, delay_ms) to database config
- Remove hardcoded stream smoothing constants from StreamProcessor
- Simplify dynamic delay calculation by using config values directly
- Add invalidate_models_list_cache() function to clear /v1/models endpoint cache
- Call cache invalidation on model create, update, delete, and bulk operations
- Update admin UI to allow runtime configuration of smoothing parameters
- Improve model listing freshness when models are modified
2025-12-19 11:03:46 +08:00
fawney19
912f6643e2 tune: adjust stream smoothing parameters for better user experience
- Increase chunk size from 5 to 20 characters for fewer delays
- Reduce min delay from 15ms to 8ms for faster playback
- Reduce max delay from 24ms to 15ms for better responsiveness
- Adjust text thresholds to better differentiate content types
- Apply parameter tuning to both StreamProcessor and _LightweightSmoother
2025-12-19 09:51:09 +08:00
fawney19
6c0373fda6 refactor: simplify text splitting logic in stream processor
- Remove complex conditional logic for short/medium/long text differentiation
- Unify text splitting to always use consistent CHUNK_SIZE-based splitting
- Rely on dynamic delay calculation for output speed adjustment
- Reduce code complexity in both main smoother and lightweight smoother
2025-12-19 09:48:11 +08:00
fawney19
070121717d refactor: consolidate stream smoothing into StreamProcessor with intelligent timing
- Move StreamSmoother functionality directly into StreamProcessor for better integration
- Create ContentExtractor strategy pattern for format-agnostic content extraction
- Implement intelligent dynamic delay calculation based on text length
- Support three text length tiers: short (char-by-char), medium (chunked), long (chunked)
- Remove manual chunk_size and delay_ms configuration - now auto-calculated
- Simplify admin UI to single toggle switch with auto timing adjustment
- Extract format detection logic to reusable content_extractors module
- Improve code maintainability with cleaner architecture
2025-12-19 09:46:22 +08:00
fawney19
85fafeacb8 feat: add stream smoothing feature for improved user experience
- Implement StreamSmoother class to split large content chunks into smaller pieces with delay
- Support OpenAI, Claude, and Gemini API response formats for smooth playback
- Add stream smoothing configuration to system settings (enable, chunk size, delay)
- Create streamlined API for stream smoothing with StreamSmoothingConfig dataclass
- Add admin UI controls for configuring stream smoothing parameters
- Use batch configuration loading to minimize database queries
- Enable typing effect simulation for better user experience in streaming responses
2025-12-19 03:15:19 +08:00
fawney19
daf8b870f0 fix: include Dockerfile.base.local in dependency hash calculation
- Add Dockerfile.base.local to deps hash to detect Docker configuration changes
- Ensures deployment rebuilds when nginx proxy settings are modified
- Prevents stale Docker images from being reused after config changes
2025-12-19 02:38:46 +08:00
fawney19
880fb61c66 fix: disable gzip compression in nginx proxy configuration
- Add gzip off directive to prevent nginx from compressing proxied responses
- Ensures stream integrity for chunked transfer encoding
- Applies to both production and local Dockerfiles
2025-12-19 02:17:07 +08:00
fawney19
7e792dabfc refactor: use background task for client disconnection monitoring
- Replace time-based throttling with background task for disconnect checks
- Remove time.monotonic() and related throttling logic
- Prevent blocking of stream transmission during connection checks
- Properly clean up background task with try/finally block
- Improve throughput and responsiveness of stream processing
2025-12-19 01:59:56 +08:00
fawney19
cd06169b2f fix: detect OpenAI format stream completion via finish_reason
- Add detection of finish_reason in OpenAI API responses to mark stream completion
- Ensures OpenAI API streams are properly marked as complete even without explicit completion events
- Complements existing completion event detection for other API formats
2025-12-19 01:44:35 +08:00
fawney19
50ffd47546 fix: handle client disconnection after stream completion gracefully
- Check has_completion flag before marking client disconnection as failure
- Allow graceful termination if response already completed when client disconnects
- Change logging level to info for post-completion disconnections
- Prevent false error reporting when client closes connection after receiving full response
2025-12-19 01:36:20 +08:00
18 changed files with 931 additions and 44 deletions

View File

@@ -15,6 +15,8 @@ env:
REGISTRY: ghcr.io
BASE_IMAGE_NAME: fawney19/aether-base
APP_IMAGE_NAME: fawney19/aether
# Files that affect base image - used for hash calculation
BASE_FILES: "Dockerfile.base pyproject.toml frontend/package.json frontend/package-lock.json"
jobs:
check-base-changes:
@@ -23,8 +25,13 @@ jobs:
base_changed: ${{ steps.check.outputs.base_changed }}
steps:
- uses: actions/checkout@v4
- name: Log in to Container Registry
uses: docker/login-action@v3
with:
fetch-depth: 2
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Check if base image needs rebuild
id: check
@@ -34,10 +41,26 @@ jobs:
exit 0
fi
# Check if base-related files changed
if git diff --name-only HEAD~1 HEAD | grep -qE '^(Dockerfile\.base|pyproject\.toml|frontend/package.*\.json)$'; then
# Calculate current hash of base-related files
CURRENT_HASH=$(cat ${{ env.BASE_FILES }} 2>/dev/null | sha256sum | cut -d' ' -f1)
echo "Current base files hash: $CURRENT_HASH"
# Try to get hash label from remote image config
# Pull the image config and extract labels
REMOTE_HASH=""
if docker pull ${{ env.REGISTRY }}/${{ env.BASE_IMAGE_NAME }}:latest 2>/dev/null; then
REMOTE_HASH=$(docker inspect ${{ env.REGISTRY }}/${{ env.BASE_IMAGE_NAME }}:latest --format '{{ index .Config.Labels "org.opencontainers.image.base.hash" }}' 2>/dev/null) || true
fi
if [ -z "$REMOTE_HASH" ] || [ "$REMOTE_HASH" == "<no value>" ]; then
# No remote image or no hash label, need to rebuild
echo "No remote base image or hash label found, need rebuild"
echo "base_changed=true" >> $GITHUB_OUTPUT
elif [ "$CURRENT_HASH" != "$REMOTE_HASH" ]; then
echo "Hash mismatch: remote=$REMOTE_HASH, current=$CURRENT_HASH"
echo "base_changed=true" >> $GITHUB_OUTPUT
else
echo "Hash matches, no rebuild needed"
echo "base_changed=false" >> $GITHUB_OUTPUT
fi
@@ -61,6 +84,12 @@ jobs:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Calculate base files hash
id: hash
run: |
HASH=$(cat ${{ env.BASE_FILES }} 2>/dev/null | sha256sum | cut -d' ' -f1)
echo "hash=$HASH" >> $GITHUB_OUTPUT
- name: Extract metadata for base image
id: meta
uses: docker/metadata-action@v5
@@ -69,6 +98,8 @@ jobs:
tags: |
type=raw,value=latest
type=sha,prefix=
labels: |
org.opencontainers.image.base.hash=${{ steps.hash.outputs.hash }}
- name: Build and push base image
uses: docker/build-push-action@v5

View File

@@ -19,7 +19,7 @@ RUN apt-get update && apt-get install -y \
# Python 依赖(安装到系统,不用 -e 模式)
COPY pyproject.toml README.md ./
RUN mkdir -p src && touch src/__init__.py && \
pip install --no-cache-dir .
SETUPTOOLS_SCM_PRETEND_VERSION=0.1.0 pip install --no-cache-dir .
# 前端依赖
COPY frontend/package*.json /tmp/frontend/
@@ -70,6 +70,8 @@ RUN printf '%s\n' \
' proxy_cache off;' \
' proxy_request_buffering off;' \
' chunked_transfer_encoding on;' \
' gzip off;' \
' add_header X-Accel-Buffering no;' \
' proxy_connect_timeout 600s;' \
' proxy_send_timeout 600s;' \
' proxy_read_timeout 600s;' \

View File

@@ -74,6 +74,8 @@ RUN printf '%s\n' \
' proxy_cache off;' \
' proxy_request_buffering off;' \
' chunked_transfer_encoding on;' \
' gzip off;' \
' add_header X-Accel-Buffering no;' \
' proxy_connect_timeout 600s;' \
' proxy_send_timeout 600s;' \
' proxy_read_timeout 600s;' \

View File

@@ -21,9 +21,9 @@ HASH_FILE=".deps-hash"
CODE_HASH_FILE=".code-hash"
MIGRATION_HASH_FILE=".migration-hash"
# 计算依赖文件的哈希值
# 计算依赖文件的哈希值(包含 Dockerfile.base.local
calc_deps_hash() {
cat pyproject.toml frontend/package.json frontend/package-lock.json 2>/dev/null | md5sum | cut -d' ' -f1
cat pyproject.toml frontend/package.json frontend/package-lock.json Dockerfile.base.local 2>/dev/null | md5sum | cut -d' ' -f1
}
# 计算代码文件的哈希值
@@ -162,25 +162,32 @@ git pull
# 标记是否需要重启
NEED_RESTART=false
BASE_REBUILT=false
# 检查基础镜像是否存在,或依赖是否变化
if ! docker image inspect aether-base:latest >/dev/null 2>&1; then
echo ">>> Base image not found, building..."
build_base
BASE_REBUILT=true
NEED_RESTART=true
elif check_deps_changed; then
echo ">>> Dependencies changed, rebuilding base image..."
build_base
BASE_REBUILT=true
NEED_RESTART=true
else
echo ">>> Dependencies unchanged."
fi
# 检查代码是否变化
# 检查代码是否变化,或者 base 重建了app 依赖 base
if ! docker image inspect aether-app:latest >/dev/null 2>&1; then
echo ">>> App image not found, building..."
build_app
NEED_RESTART=true
elif [ "$BASE_REBUILT" = true ]; then
echo ">>> Base image rebuilt, rebuilding app image..."
build_app
NEED_RESTART=true
elif check_code_changed; then
echo ">>> Code changed, rebuilding app image..."
build_app

View File

@@ -66,6 +66,7 @@ export interface UserAffinity {
key_name: string | null
key_prefix: string | null // Provider Key 脱敏显示前4...后4
rate_multiplier: number
global_model_id: string | null // 原始的 global_model_id用于删除
model_name: string | null // 模型名称(如 claude-haiku-4-5-20250514
model_display_name: string | null // 模型显示名称(如 Claude Haiku 4.5
api_format: string | null // API 格式 (claude/openai)
@@ -119,6 +120,18 @@ export const cacheApi = {
await api.delete(`/api/admin/monitoring/cache/users/${userIdentifier}`)
},
/**
* 清除单条缓存亲和性
*
* @param affinityKey API Key ID
* @param endpointId Endpoint ID
* @param modelId GlobalModel ID
* @param apiFormat API 格式 (claude/openai)
*/
async clearSingleAffinity(affinityKey: string, endpointId: string, modelId: string, apiFormat: string): Promise<void> {
await api.delete(`/api/admin/monitoring/cache/affinity/${affinityKey}/${endpointId}/${modelId}/${apiFormat}`)
},
/**
* 清除所有缓存
*/

View File

@@ -142,32 +142,37 @@ async function resetAffinitySearch() {
await fetchAffinityList()
}
async function clearUserCache(identifier: string, displayName?: string) {
const target = identifier?.trim()
if (!target) {
showError('无法识别标识符')
async function clearSingleAffinity(item: UserAffinity) {
const affinityKey = item.affinity_key?.trim()
const endpointId = item.endpoint_id?.trim()
const modelId = item.global_model_id?.trim()
const apiFormat = item.api_format?.trim()
if (!affinityKey || !endpointId || !modelId || !apiFormat) {
showError('缓存记录信息不完整,无法删除')
return
}
const label = displayName || target
const label = item.user_api_key_name || affinityKey
const modelLabel = item.model_display_name || item.model_name || modelId
const confirmed = await showConfirm({
title: '确认清除',
message: `确定要清除 ${label} 的缓存吗?`,
message: `确定要清除 ${label} 在模型 ${modelLabel} 上的缓存亲和性吗?`,
confirmText: '确认清除',
variant: 'destructive'
})
if (!confirmed) return
clearingRowAffinityKey.value = target
clearingRowAffinityKey.value = affinityKey
try {
await cacheApi.clearUserCache(target)
await cacheApi.clearSingleAffinity(affinityKey, endpointId, modelId, apiFormat)
showSuccess('清除成功')
await fetchCacheStats()
await fetchAffinityList(tableKeyword.value.trim() || undefined)
} catch (error) {
showError('清除失败')
log.error('清除用户缓存失败', error)
log.error('清除单条缓存失败', error)
} finally {
clearingRowAffinityKey.value = null
}
@@ -618,7 +623,7 @@ onBeforeUnmount(() => {
class="h-7 w-7 text-muted-foreground/70 hover:text-destructive"
:disabled="clearingRowAffinityKey === item.affinity_key"
title="清除缓存"
@click="clearUserCache(item.affinity_key, item.user_api_key_name || item.affinity_key)"
@click="clearSingleAffinity(item)"
>
<Trash2 class="h-3.5 w-3.5" />
</Button>
@@ -668,7 +673,7 @@ onBeforeUnmount(() => {
variant="ghost"
class="h-7 w-7 text-muted-foreground/70 hover:text-destructive shrink-0"
:disabled="clearingRowAffinityKey === item.affinity_key"
@click="clearUserCache(item.affinity_key, item.user_api_key_name || item.affinity_key)"
@click="clearSingleAffinity(item)"
>
<Trash2 class="h-3.5 w-3.5" />
</Button>

View File

@@ -464,6 +464,7 @@
</div>
</div>
</CardSection>
</div>
<!-- 导入配置对话框 -->

View File

@@ -186,6 +186,30 @@ async def clear_user_cache(
return await pipeline.run(adapter=adapter, http_request=request, db=db, mode=adapter.mode)
@router.delete("/affinity/{affinity_key}/{endpoint_id}/{model_id}/{api_format}")
async def clear_single_affinity(
affinity_key: str,
endpoint_id: str,
model_id: str,
api_format: str,
request: Request,
db: Session = Depends(get_db),
) -> Any:
"""
Clear a single cache affinity entry
Parameters:
- affinity_key: API Key ID
- endpoint_id: Endpoint ID
- model_id: Model ID (GlobalModel ID)
- api_format: API format (claude/openai)
"""
adapter = AdminClearSingleAffinityAdapter(
affinity_key=affinity_key, endpoint_id=endpoint_id, model_id=model_id, api_format=api_format
)
return await pipeline.run(adapter=adapter, http_request=request, db=db, mode=adapter.mode)
@router.delete("")
async def clear_all_cache(
request: Request,
@@ -655,6 +679,7 @@ class AdminListAffinitiesAdapter(AdminApiAdapter):
"key_name": key.name if key else None,
"key_prefix": provider_key_masked,
"rate_multiplier": key.rate_multiplier if key else 1.0,
"global_model_id": affinity.get("model_name"), # 原始的 global_model_id
"model_name": (
global_model_map.get(affinity.get("model_name")).name
if affinity.get("model_name") and global_model_map.get(affinity.get("model_name"))
@@ -817,6 +842,65 @@ class AdminClearUserCacheAdapter(AdminApiAdapter):
raise HTTPException(status_code=500, detail=f"清除失败: {exc}")
@dataclass
class AdminClearSingleAffinityAdapter(AdminApiAdapter):
affinity_key: str
endpoint_id: str
model_id: str
api_format: str
async def handle(self, context: ApiRequestContext) -> Dict[str, Any]: # type: ignore[override]
db = context.db
try:
redis_client = get_redis_client_sync()
affinity_mgr = await get_affinity_manager(redis_client)
# 直接获取指定的亲和性记录(无需遍历全部)
existing_affinity = await affinity_mgr.get_affinity(
self.affinity_key, self.api_format, self.model_id
)
if not existing_affinity:
raise HTTPException(status_code=404, detail="未找到指定的缓存亲和性记录")
# 验证 endpoint_id 是否匹配
if existing_affinity.endpoint_id != self.endpoint_id:
raise HTTPException(status_code=404, detail="未找到指定的缓存亲和性记录")
# 失效单条记录
await affinity_mgr.invalidate_affinity(
self.affinity_key, self.api_format, self.model_id, endpoint_id=self.endpoint_id
)
# 获取用于日志的信息
api_key = db.query(ApiKey).filter(ApiKey.id == self.affinity_key).first()
api_key_name = api_key.name if api_key else None
logger.info(
f"已清除单条缓存亲和性: affinity_key={self.affinity_key[:8]}..., "
f"endpoint_id={self.endpoint_id[:8]}..., model_id={self.model_id[:8]}..."
)
context.add_audit_metadata(
action="cache_clear_single",
affinity_key=self.affinity_key,
endpoint_id=self.endpoint_id,
model_id=self.model_id,
)
return {
"status": "ok",
"message": f"已清除缓存亲和性: {api_key_name or self.affinity_key[:8]}",
"affinity_key": self.affinity_key,
"endpoint_id": self.endpoint_id,
"model_id": self.model_id,
}
except HTTPException:
raise
except Exception as exc:
logger.exception(f"清除单条缓存亲和性失败: {exc}")
raise HTTPException(status_code=500, detail=f"清除失败: {exc}")
class AdminClearAllCacheAdapter(AdminApiAdapter):
async def handle(self, context: ApiRequestContext) -> Dict[str, Any]: # type: ignore[override]
try:

View File

@@ -9,6 +9,7 @@ from fastapi import APIRouter, Depends, Request
from sqlalchemy.orm import Session, joinedload
from src.api.base.admin_adapter import AdminApiAdapter
from src.api.base.models_service import invalidate_models_list_cache
from src.api.base.pipeline import ApiRequestPipeline
from src.core.exceptions import InvalidRequestException, NotFoundException
from src.core.logger import logger
@@ -419,4 +420,8 @@ class AdminBatchAssignModelsToProviderAdapter(AdminApiAdapter):
f"Batch assigned {len(success)} GlobalModels to provider {provider.name} by {context.user.username}"
)
# 清除 /v1/models 列表缓存
if success:
await invalidate_models_list_cache()
return BatchAssignModelsToProviderResponse(success=success, errors=errors)

View File

@@ -55,6 +55,23 @@ async def _set_cached_models(api_formats: list[str], models: list["ModelInfo"])
logger.warning(f"[ModelsService] 缓存写入失败: {e}")
async def invalidate_models_list_cache() -> None:
"""
清除所有 /v1/models 列表缓存
在模型创建、更新、删除时调用,确保模型列表实时更新
"""
# 清除所有格式的缓存
all_formats = ["CLAUDE", "OPENAI", "GEMINI"]
for fmt in all_formats:
cache_key = f"{_CACHE_KEY_PREFIX}:{fmt}"
try:
await CacheService.delete(cache_key)
logger.debug(f"[ModelsService] 已清除缓存: {cache_key}")
except Exception as e:
logger.warning(f"[ModelsService] 清除缓存失败 {cache_key}: {e}")
@dataclass
class ModelInfo:
"""统一的模型信息结构"""

View File

@@ -1114,6 +1114,8 @@ class CliMessageHandlerBase(BaseMessageHandler):
async for chunk in stream_generator:
yield chunk
except asyncio.CancelledError:
# 如果响应已完成,不标记为失败
if not ctx.has_completion:
ctx.status_code = 499
ctx.error_message = "Client disconnected"
raise

View File

@@ -0,0 +1,274 @@
"""
流式内容提取器 - 策略模式实现
为不同 API 格式OpenAI、Claude、Gemini提供内容提取和 chunk 构造的抽象。
StreamSmoother 使用这些提取器来处理不同格式的 SSE 事件。
"""
import copy
import json
from abc import ABC, abstractmethod
from typing import Optional
class ContentExtractor(ABC):
"""
流式内容提取器抽象基类
定义从 SSE 事件中提取文本内容和构造新 chunk 的接口。
每种 API 格式OpenAI、Claude、Gemini需要实现自己的提取器。
"""
@abstractmethod
def extract_content(self, data: dict) -> Optional[str]:
"""
从 SSE 数据中提取可拆分的文本内容
Args:
data: 解析后的 JSON 数据
Returns:
提取的文本内容,如果无法提取则返回 None
"""
pass
@abstractmethod
def create_chunk(
self,
original_data: dict,
new_content: str,
event_type: str = "",
is_first: bool = False,
) -> bytes:
"""
使用新内容构造 SSE chunk
Args:
original_data: 原始 JSON 数据
new_content: 新的文本内容
event_type: SSE 事件类型(某些格式需要)
is_first: 是否是第一个 chunk用于保留 role 等字段)
Returns:
编码后的 SSE 字节数据
"""
pass
class OpenAIContentExtractor(ContentExtractor):
"""
OpenAI 格式内容提取器
处理 OpenAI Chat Completions API 的流式响应格式:
- 数据结构: choices[0].delta.content
- 只在 delta 仅包含 role/content 时允许拆分,避免破坏 tool_calls 等结构
"""
def extract_content(self, data: dict) -> Optional[str]:
if not isinstance(data, dict):
return None
choices = data.get("choices")
if not isinstance(choices, list) or len(choices) != 1:
return None
first_choice = choices[0]
if not isinstance(first_choice, dict):
return None
delta = first_choice.get("delta")
if not isinstance(delta, dict):
return None
content = delta.get("content")
if not isinstance(content, str):
return None
# 只有 delta 仅包含 role/content 时才允许拆分
# 避免破坏 tool_calls、function_call 等复杂结构
allowed_keys = {"role", "content"}
if not all(key in allowed_keys for key in delta.keys()):
return None
return content
def create_chunk(
self,
original_data: dict,
new_content: str,
event_type: str = "",
is_first: bool = False,
) -> bytes:
new_data = original_data.copy()
if "choices" in new_data and new_data["choices"]:
new_choices = []
for choice in new_data["choices"]:
new_choice = choice.copy()
if "delta" in new_choice:
new_delta = {}
# 只有第一个 chunk 保留 role
if is_first and "role" in new_choice["delta"]:
new_delta["role"] = new_choice["delta"]["role"]
new_delta["content"] = new_content
new_choice["delta"] = new_delta
new_choices.append(new_choice)
new_data["choices"] = new_choices
return f"data: {json.dumps(new_data, ensure_ascii=False)}\n\n".encode("utf-8")
class ClaudeContentExtractor(ContentExtractor):
"""
Claude 格式内容提取器
处理 Claude Messages API 的流式响应格式:
- 事件类型: content_block_delta
- 数据结构: delta.type=text_delta, delta.text
"""
def extract_content(self, data: dict) -> Optional[str]:
if not isinstance(data, dict):
return None
# 检查事件类型
if data.get("type") != "content_block_delta":
return None
delta = data.get("delta", {})
if not isinstance(delta, dict):
return None
# 检查 delta 类型
if delta.get("type") != "text_delta":
return None
text = delta.get("text")
if not isinstance(text, str):
return None
return text
def create_chunk(
self,
original_data: dict,
new_content: str,
event_type: str = "",
is_first: bool = False,
) -> bytes:
new_data = original_data.copy()
if "delta" in new_data:
new_delta = new_data["delta"].copy()
new_delta["text"] = new_content
new_data["delta"] = new_delta
# Claude 格式需要 event: 前缀
event_name = event_type or "content_block_delta"
return f"event: {event_name}\ndata: {json.dumps(new_data, ensure_ascii=False)}\n\n".encode(
"utf-8"
)
class GeminiContentExtractor(ContentExtractor):
"""
Gemini 格式内容提取器
处理 Gemini API 的流式响应格式:
- 数据结构: candidates[0].content.parts[0].text
- 只有纯文本块才拆分
"""
def extract_content(self, data: dict) -> Optional[str]:
if not isinstance(data, dict):
return None
candidates = data.get("candidates")
if not isinstance(candidates, list) or len(candidates) != 1:
return None
first_candidate = candidates[0]
if not isinstance(first_candidate, dict):
return None
content = first_candidate.get("content", {})
if not isinstance(content, dict):
return None
parts = content.get("parts", [])
if not isinstance(parts, list) or len(parts) != 1:
return None
first_part = parts[0]
if not isinstance(first_part, dict):
return None
text = first_part.get("text")
# 只有纯文本块(只有 text 字段)才拆分
if not isinstance(text, str) or len(first_part) != 1:
return None
return text
def create_chunk(
self,
original_data: dict,
new_content: str,
event_type: str = "",
is_first: bool = False,
) -> bytes:
new_data = copy.deepcopy(original_data)
if "candidates" in new_data and new_data["candidates"]:
first_candidate = new_data["candidates"][0]
if "content" in first_candidate:
content = first_candidate["content"]
if "parts" in content and content["parts"]:
content["parts"][0]["text"] = new_content
return f"data: {json.dumps(new_data, ensure_ascii=False)}\n\n".encode("utf-8")
# 提取器注册表
_EXTRACTORS: dict[str, type[ContentExtractor]] = {
"openai": OpenAIContentExtractor,
"claude": ClaudeContentExtractor,
"gemini": GeminiContentExtractor,
}
def get_extractor(format_name: str) -> Optional[ContentExtractor]:
"""
根据格式名获取对应的内容提取器实例
Args:
format_name: 格式名称openai, claude, gemini
Returns:
对应的提取器实例,如果格式不支持则返回 None
"""
extractor_class = _EXTRACTORS.get(format_name.lower())
if extractor_class:
return extractor_class()
return None
def register_extractor(format_name: str, extractor_class: type[ContentExtractor]) -> None:
"""
注册新的内容提取器
Args:
format_name: 格式名称
extractor_class: 提取器类
"""
_EXTRACTORS[format_name.lower()] = extractor_class
def get_extractor_formats() -> list[str]:
"""
获取所有已注册的格式名称列表
Returns:
格式名称列表
"""
return list(_EXTRACTORS.keys())

View File

@@ -6,16 +6,22 @@
2. 响应流生成
3. 预读和嵌套错误检测
4. 客户端断开检测
5. 流式平滑输出
"""
import asyncio
import codecs
import json
import time
from dataclasses import dataclass
from typing import Any, AsyncGenerator, Callable, Optional
import httpx
from src.api.handlers.base.content_extractors import (
ContentExtractor,
get_extractor,
get_extractor_formats,
)
from src.api.handlers.base.parsers import get_parser_for_format
from src.api.handlers.base.response_parser import ResponseParser
from src.api.handlers.base.stream_context import StreamContext
@@ -25,11 +31,20 @@ from src.models.database import Provider, ProviderEndpoint
from src.utils.sse_parser import SSEEventParser
@dataclass
class StreamSmoothingConfig:
"""流式平滑输出配置"""
enabled: bool = False
chunk_size: int = 20
delay_ms: int = 8
class StreamProcessor:
"""
流式响应处理器
负责处理 SSE 流的解析、错误检测响应生成。
负责处理 SSE 流的解析、错误检测响应生成和平滑输出
从 ChatHandlerBase 中提取,使其职责更加单一。
"""
@@ -40,6 +55,7 @@ class StreamProcessor:
on_streaming_start: Optional[Callable[[], None]] = None,
*,
collect_text: bool = False,
smoothing_config: Optional[StreamSmoothingConfig] = None,
):
"""
初始化流处理器
@@ -48,11 +64,17 @@ class StreamProcessor:
request_id: 请求 ID用于日志
default_parser: 默认响应解析器
on_streaming_start: 流开始时的回调(用于更新状态)
collect_text: 是否收集文本内容
smoothing_config: 流式平滑输出配置
"""
self.request_id = request_id
self.default_parser = default_parser
self.on_streaming_start = on_streaming_start
self.collect_text = collect_text
self.smoothing_config = smoothing_config or StreamSmoothingConfig()
# 内容提取器缓存
self._extractors: dict[str, ContentExtractor] = {}
def get_parser_for_provider(self, ctx: StreamContext) -> ResponseParser:
"""
@@ -127,6 +149,13 @@ class StreamProcessor:
if event_type in ("response.completed", "message_stop"):
ctx.has_completion = True
# 检查 OpenAI 格式的 finish_reason
choices = data.get("choices", [])
if choices and isinstance(choices, list) and len(choices) > 0:
finish_reason = choices[0].get("finish_reason")
if finish_reason is not None:
ctx.has_completion = True
async def prefetch_and_check_error(
self,
byte_iterator: Any,
@@ -369,7 +398,7 @@ class StreamProcessor:
sse_parser: SSE 解析器
line: 原始行数据
"""
# SSEEventParser 以去掉换行符的单行文本作为输入;这里统一剔除 CR/LF
# SSEEventParser 以"去掉换行符"的单行文本作为输入;这里统一剔除 CR/LF
# 避免把空行误判成 "\n" 并导致事件边界解析错误。
normalized_line = line.rstrip("\r\n")
events = sse_parser.feed_line(normalized_line)
@@ -400,32 +429,201 @@ class StreamProcessor:
响应数据块
"""
try:
# 断连检查频率:每次 await 都会引入调度开销,过于频繁会让流式"发一段停一段"
# 这里按时间间隔节流,兼顾及时停止上游读取与吞吐平滑性。
next_disconnect_check_at = 0.0
disconnect_check_interval_s = 0.25
# 使用后台任务检查断连,完全不阻塞流式传输
disconnected = False
async for chunk in stream_generator:
now = time.monotonic()
if now >= next_disconnect_check_at:
next_disconnect_check_at = now + disconnect_check_interval_s
async def check_disconnect_background() -> None:
nonlocal disconnected
while not disconnected and not ctx.has_completion:
await asyncio.sleep(0.5)
if await is_disconnected():
logger.warning(f"ID:{self.request_id} | Client disconnected")
ctx.status_code = 499 # Client Closed Request
ctx.error_message = "client_disconnected"
disconnected = True
break
yield chunk
except asyncio.CancelledError:
# 启动后台检查任务
check_task = asyncio.create_task(check_disconnect_background())
try:
async for chunk in stream_generator:
if disconnected:
# 如果响应已完成,客户端断开不算失败
if ctx.has_completion:
logger.info(
f"ID:{self.request_id} | Client disconnected after completion"
)
else:
logger.warning(f"ID:{self.request_id} | Client disconnected")
ctx.status_code = 499
ctx.error_message = "client_disconnected"
break
yield chunk
finally:
check_task.cancel()
try:
await check_task
except asyncio.CancelledError:
pass
except asyncio.CancelledError:
# 如果响应已完成,不标记为失败
if not ctx.has_completion:
ctx.status_code = 499
ctx.error_message = "client_disconnected"
raise
except Exception as e:
ctx.status_code = 500
ctx.error_message = str(e)
raise
async def create_smoothed_stream(
self,
stream_generator: AsyncGenerator[bytes, None],
) -> AsyncGenerator[bytes, None]:
"""
创建平滑输出的流生成器
如果启用了平滑输出,将大 chunk 拆分成小块并添加微小延迟。
否则直接透传原始流。
Args:
stream_generator: 原始流生成器
Yields:
平滑处理后的响应数据块
"""
if not self.smoothing_config.enabled:
# 未启用平滑输出,直接透传
async for chunk in stream_generator:
yield chunk
return
# 启用平滑输出
buffer = b""
is_first_content = True
async for chunk in stream_generator:
buffer += chunk
# 按双换行分割 SSE 事件(标准 SSE 格式)
while b"\n\n" in buffer:
event_block, buffer = buffer.split(b"\n\n", 1)
event_str = event_block.decode("utf-8", errors="replace")
# 解析事件块
lines = event_str.strip().split("\n")
data_str = None
event_type = ""
for line in lines:
line = line.rstrip("\r")
if line.startswith("event: "):
event_type = line[7:].strip()
elif line.startswith("data: "):
data_str = line[6:]
# 没有 data 行,直接透传
if data_str is None:
yield event_block + b"\n\n"
continue
# [DONE] 直接透传
if data_str.strip() == "[DONE]":
yield event_block + b"\n\n"
continue
# 尝试解析 JSON
try:
data = json.loads(data_str)
except json.JSONDecodeError:
yield event_block + b"\n\n"
continue
# 检测格式并提取内容
content, extractor = self._detect_format_and_extract(data)
# 只有内容长度大于 1 才需要平滑处理
if content and len(content) > 1 and extractor:
# 获取配置的延迟
delay_seconds = self._calculate_delay()
# 拆分内容
content_chunks = self._split_content(content)
for i, sub_content in enumerate(content_chunks):
is_first = is_first_content and i == 0
# 使用提取器创建新 chunk
sse_chunk = extractor.create_chunk(
data,
sub_content,
event_type=event_type,
is_first=is_first,
)
yield sse_chunk
# 除了最后一个块,其他块之间加延迟
if i < len(content_chunks) - 1:
await asyncio.sleep(delay_seconds)
is_first_content = False
else:
# 不需要拆分,直接透传
yield event_block + b"\n\n"
if content:
is_first_content = False
# 处理剩余数据
if buffer:
yield buffer
def _get_extractor(self, format_name: str) -> Optional[ContentExtractor]:
"""获取或创建格式对应的提取器(带缓存)"""
if format_name not in self._extractors:
extractor = get_extractor(format_name)
if extractor:
self._extractors[format_name] = extractor
return self._extractors.get(format_name)
def _detect_format_and_extract(
self, data: dict
) -> tuple[Optional[str], Optional[ContentExtractor]]:
"""
检测数据格式并提取内容
依次尝试各格式的提取器,返回第一个成功提取内容的结果。
Returns:
(content, extractor): 提取的内容和对应的提取器
"""
for format_name in get_extractor_formats():
extractor = self._get_extractor(format_name)
if extractor:
content = extractor.extract_content(data)
if content is not None:
return content, extractor
return None, None
def _calculate_delay(self) -> float:
"""获取配置的延迟(秒)"""
return self.smoothing_config.delay_ms / 1000.0
def _split_content(self, content: str) -> list[str]:
"""
按块拆分文本
"""
chunk_size = self.smoothing_config.chunk_size
text_length = len(content)
if text_length <= chunk_size:
return [content]
# 按块拆分
chunks = []
for i in range(0, text_length, chunk_size):
chunks.append(content[i : i + chunk_size])
return chunks
async def _cleanup(
self,
response_ctx: Any,
@@ -440,3 +638,128 @@ class StreamProcessor:
await http_client.aclose()
except Exception:
pass
async def create_smoothed_stream(
stream_generator: AsyncGenerator[bytes, None],
chunk_size: int = 20,
delay_ms: int = 8,
) -> AsyncGenerator[bytes, None]:
"""
独立的平滑流生成函数
供 CLI handler 等场景使用,无需创建完整的 StreamProcessor 实例。
Args:
stream_generator: 原始流生成器
chunk_size: 每块字符数
delay_ms: 每块之间的延迟毫秒数
Yields:
平滑处理后的响应数据块
"""
processor = _LightweightSmoother(chunk_size=chunk_size, delay_ms=delay_ms)
async for chunk in processor.smooth(stream_generator):
yield chunk
class _LightweightSmoother:
"""
轻量级平滑处理器
只包含平滑输出所需的最小逻辑,不依赖 StreamProcessor 的其他功能。
"""
def __init__(self, chunk_size: int = 20, delay_ms: int = 8) -> None:
self.chunk_size = chunk_size
self.delay_ms = delay_ms
self._extractors: dict[str, ContentExtractor] = {}
def _get_extractor(self, format_name: str) -> Optional[ContentExtractor]:
if format_name not in self._extractors:
extractor = get_extractor(format_name)
if extractor:
self._extractors[format_name] = extractor
return self._extractors.get(format_name)
def _detect_format_and_extract(
self, data: dict
) -> tuple[Optional[str], Optional[ContentExtractor]]:
for format_name in get_extractor_formats():
extractor = self._get_extractor(format_name)
if extractor:
content = extractor.extract_content(data)
if content is not None:
return content, extractor
return None, None
def _calculate_delay(self) -> float:
return self.delay_ms / 1000.0
def _split_content(self, content: str) -> list[str]:
text_length = len(content)
if text_length <= self.chunk_size:
return [content]
return [content[i : i + self.chunk_size] for i in range(0, text_length, self.chunk_size)]
async def smooth(
self, stream_generator: AsyncGenerator[bytes, None]
) -> AsyncGenerator[bytes, None]:
buffer = b""
is_first_content = True
async for chunk in stream_generator:
buffer += chunk
while b"\n\n" in buffer:
event_block, buffer = buffer.split(b"\n\n", 1)
event_str = event_block.decode("utf-8", errors="replace")
lines = event_str.strip().split("\n")
data_str = None
event_type = ""
for line in lines:
line = line.rstrip("\r")
if line.startswith("event: "):
event_type = line[7:].strip()
elif line.startswith("data: "):
data_str = line[6:]
if data_str is None:
yield event_block + b"\n\n"
continue
if data_str.strip() == "[DONE]":
yield event_block + b"\n\n"
continue
try:
data = json.loads(data_str)
except json.JSONDecodeError:
yield event_block + b"\n\n"
continue
content, extractor = self._detect_format_and_extract(data)
if content and len(content) > 1 and extractor:
delay_seconds = self._calculate_delay()
content_chunks = self._split_content(content)
for i, sub_content in enumerate(content_chunks):
is_first = is_first_content and i == 0
sse_chunk = extractor.create_chunk(
data, sub_content, event_type=event_type, is_first=is_first
)
yield sse_chunk
if i < len(content_chunks) - 1:
await asyncio.sleep(delay_seconds)
is_first_content = False
else:
yield event_block + b"\n\n"
if content:
is_first_content = False
if buffer:
yield buffer

View File

@@ -10,8 +10,8 @@ class APIFormat(Enum):
"""API格式枚举 - 决定请求/响应的处理方式"""
CLAUDE = "CLAUDE" # Claude API 格式
OPENAI = "OPENAI" # OpenAI API 格式
CLAUDE_CLI = "CLAUDE_CLI" # Claude CLI API 格式(使用 authorization: Bearer
OPENAI = "OPENAI" # OpenAI API 格式
OPENAI_CLI = "OPENAI_CLI" # OpenAI CLI/Responses API 格式(用于 Claude Code 等客户端)
GEMINI = "GEMINI" # Google Gemini API 格式
GEMINI_CLI = "GEMINI_CLI" # Gemini CLI API 格式

View File

@@ -442,6 +442,36 @@ class EmbeddedErrorException(ProviderException):
self.error_status = error_status
class ProviderCompatibilityException(ProviderException):
"""Provider 兼容性错误异常 - 应该触发故障转移
用于处理因 Provider 不支持某些参数或功能导致的错误。
这类错误不是用户请求本身的问题,换一个 Provider 可能就能成功,应该触发故障转移。
常见场景:
- Unsupported parameter不支持的参数
- Unsupported model不支持的模型
- Unsupported feature不支持的功能
"""
def __init__(
self,
message: str,
provider_name: Optional[str] = None,
status_code: int = 400,
upstream_error: Optional[str] = None,
request_metadata: Optional[Any] = None,
):
self.upstream_error = upstream_error
super().__init__(
message=message,
provider_name=provider_name,
request_metadata=request_metadata,
)
# 覆盖状态码为 400保持与上游一致
self.status_code = status_code
class UpstreamClientException(ProxyException):
"""上游返回的客户端错误异常 - HTTP 4xx 错误,不应该重试

View File

@@ -13,6 +13,7 @@ from src.core.exceptions import InvalidRequestException, NotFoundException
from src.core.logger import logger
from src.models.api import ModelCreate, ModelResponse, ModelUpdate
from src.models.database import Model, Provider
from src.api.base.models_service import invalidate_models_list_cache
from src.services.cache.invalidation import get_cache_invalidation_service
from src.services.cache.model_cache import ModelCacheService
@@ -75,6 +76,10 @@ class ModelService:
)
logger.info(f"创建模型成功: provider={provider.name}, model={model.provider_model_name}, global_model_id={model.global_model_id}")
# 清除 /v1/models 列表缓存
asyncio.create_task(invalidate_models_list_cache())
return model
except IntegrityError as e:
@@ -197,6 +202,9 @@ class ModelService:
cache_service = get_cache_invalidation_service()
cache_service.on_model_changed(model.provider_id, model.global_model_id)
# 清除 /v1/models 列表缓存
asyncio.create_task(invalidate_models_list_cache())
logger.info(f"更新模型成功: id={model_id}, 最终 supports_vision: {model.supports_vision}, supports_function_calling: {model.supports_function_calling}, supports_extended_thinking: {model.supports_extended_thinking}")
return model
except IntegrityError as e:
@@ -261,6 +269,9 @@ class ModelService:
cache_service = get_cache_invalidation_service()
cache_service.on_model_changed(cache_info["provider_id"], cache_info["global_model_id"])
# 清除 /v1/models 列表缓存
asyncio.create_task(invalidate_models_list_cache())
logger.info(f"删除模型成功: id={model_id}, provider_model_name={cache_info['provider_model_name']}, "
f"global_model_id={cache_info['global_model_id'][:8] if cache_info['global_model_id'] else 'None'}...")
except Exception as e:
@@ -295,6 +306,9 @@ class ModelService:
cache_service = get_cache_invalidation_service()
cache_service.on_model_changed(model.provider_id, model.global_model_id)
# 清除 /v1/models 列表缓存
asyncio.create_task(invalidate_models_list_cache())
status = "可用" if is_available else "不可用"
logger.info(f"更新模型可用状态: id={model_id}, status={status}")
return model
@@ -358,6 +372,9 @@ class ModelService:
for model in created_models:
db.refresh(model)
logger.info(f"批量创建 {len(created_models)} 个模型成功")
# 清除 /v1/models 列表缓存
asyncio.create_task(invalidate_models_list_cache())
except IntegrityError as e:
db.rollback()
logger.error(f"批量创建模型失败: {str(e)}")

View File

@@ -15,6 +15,7 @@ from src.core.enums import APIFormat
from src.core.exceptions import (
ConcurrencyLimitError,
ProviderAuthException,
ProviderCompatibilityException,
ProviderException,
ProviderNotAvailableException,
ProviderRateLimitException,
@@ -81,7 +82,9 @@ class ErrorClassifier:
"context_length_exceeded", # 上下文长度超限
"content_length_limit", # 请求内容长度超限 (Claude API)
"content_length_exceeds", # 内容长度超限变体 (AWS CodeWhisperer)
"max_tokens", # token 数超限
# 注意:移除了 "max_tokens",因为 max_tokens 相关错误可能是 Provider 兼容性问题
# 如 "Unsupported parameter: 'max_tokens' is not supported with this model"
# 这类错误应由 COMPATIBILITY_ERROR_PATTERNS 处理
"invalid_prompt", # 无效的提示词
"content too long", # 内容过长
"input is too long", # 输入过长 (AWS)
@@ -136,6 +139,19 @@ class ErrorClassifier:
"CONTENT_POLICY_VIOLATION",
)
# Provider 兼容性错误模式 - 这类错误应该触发故障转移
# 因为换一个 Provider 可能就能成功
COMPATIBILITY_ERROR_PATTERNS: Tuple[str, ...] = (
"unsupported parameter", # 不支持的参数
"unsupported model", # 不支持的模型
"unsupported feature", # 不支持的功能
"not supported with this model", # 此模型不支持
"model does not support", # 模型不支持
"parameter is not supported", # 参数不支持
"feature is not supported", # 功能不支持
"not available for this model", # 此模型不可用
)
def _parse_error_response(self, error_text: Optional[str]) -> Dict[str, Any]:
"""
解析错误响应为结构化数据
@@ -261,6 +277,25 @@ class ErrorClassifier:
search_text = f"{parsed['message']} {parsed['raw']}".lower()
return any(pattern.lower() in search_text for pattern in self.CLIENT_ERROR_PATTERNS)
def _is_compatibility_error(self, error_text: Optional[str]) -> bool:
"""
检测错误响应是否为 Provider 兼容性错误(应触发故障转移)
这类错误是因为 Provider 不支持某些参数或功能导致的,
换一个 Provider 可能就能成功。
Args:
error_text: 错误响应文本
Returns:
是否为兼容性错误
"""
if not error_text:
return False
search_text = error_text.lower()
return any(pattern.lower() in search_text for pattern in self.COMPATIBILITY_ERROR_PATTERNS)
def _extract_error_message(self, error_text: Optional[str]) -> Optional[str]:
"""
从错误响应中提取错误消息
@@ -425,6 +460,16 @@ class ErrorClassifier:
),
)
# 400 错误:先检查是否为 Provider 兼容性错误(应触发故障转移)
if status == 400 and self._is_compatibility_error(error_response_text):
logger.info(f"检测到 Provider 兼容性错误,将触发故障转移: {extracted_message}")
return ProviderCompatibilityException(
message=extracted_message or "Provider 不支持此请求",
provider_name=provider_name,
status_code=400,
upstream_error=error_response_text,
)
# 400 错误:检查是否为客户端请求错误(不应重试)
if status == 400 and self._is_client_error(error_response_text):
logger.info(f"检测到客户端请求错误,不进行重试: {extracted_message}")

View File

@@ -12,7 +12,6 @@ from src.core.logger import logger
from src.models.database import Provider, SystemConfig
class LogLevel(str, Enum):
"""日志记录级别"""
@@ -94,6 +93,35 @@ class SystemConfigService:
return default
@classmethod
def get_configs(cls, db: Session, keys: List[str]) -> Dict[str, Any]:
"""
批量获取系统配置值
Args:
db: 数据库会话
keys: 配置键列表
Returns:
配置键值字典
"""
result = {}
# 一次查询获取所有配置
configs = db.query(SystemConfig).filter(SystemConfig.key.in_(keys)).all()
config_map = {c.key: c.value for c in configs}
# 填充结果,不存在的使用默认值
for key in keys:
if key in config_map:
result[key] = config_map[key]
elif key in cls.DEFAULT_CONFIGS:
result[key] = cls.DEFAULT_CONFIGS[key]["value"]
else:
result[key] = None
return result
@staticmethod
def set_config(db: Session, key: str, value: Any, description: str = None) -> SystemConfig:
"""设置系统配置值"""
@@ -111,6 +139,7 @@ class SystemConfigService:
db.commit()
db.refresh(config)
return config
@staticmethod
@@ -153,8 +182,8 @@ class SystemConfigService:
for config in configs
]
@staticmethod
def delete_config(db: Session, key: str) -> bool:
@classmethod
def delete_config(cls, db: Session, key: str) -> bool:
"""删除系统配置"""
config = db.query(SystemConfig).filter(SystemConfig.key == key).first()
if config: