Commit Graph

94 Commits

fawney19
97425ac68f refactor: make stream smoothing parameters configurable and add models cache invalidation
- Move stream smoothing parameters (chunk_size, delay_ms) to database config
- Remove hardcoded stream smoothing constants from StreamProcessor
- Simplify dynamic delay calculation by using config values directly
- Add invalidate_models_list_cache() function to clear /v1/models endpoint cache
- Call cache invalidation on model create, update, delete, and bulk operations
- Update admin UI to allow runtime configuration of smoothing parameters
- Improve model listing freshness when models are modified
2025-12-19 11:03:46 +08:00
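
A minimal sketch of the /v1/models cache invalidation described in this commit; the cache store, hook point, and names other than invalidate_models_list_cache() are assumptions, not the project's actual implementation.

```python
# Hypothetical sketch: an in-process cache for the /v1/models response that is
# cleared whenever a model is created, updated, deleted, or bulk-imported.
from typing import Any

_models_list_cache: dict[str, Any] = {}


def invalidate_models_list_cache() -> None:
    """Drop any cached /v1/models payloads so the next request rebuilds them."""
    _models_list_cache.clear()


def on_model_changed(operation: str) -> None:
    # Assumed hook called from create/update/delete/bulk handlers.
    invalidate_models_list_cache()
```
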
fawney19
912f6643e2 tune: adjust stream smoothing parameters for better user experience
- Increase chunk size from 5 to 20 characters for fewer delays
- Reduce min delay from 15ms to 8ms for faster playback
- Reduce max delay from 24ms to 15ms for better responsiveness
- Adjust text thresholds to better differentiate content types
- Apply parameter tuning to both StreamProcessor and _LightweightSmoother
2025-12-19 09:51:09 +08:00
fawney19
6c0373fda6 refactor: simplify text splitting logic in stream processor
- Remove complex conditional logic for short/medium/long text differentiation
- Unify text splitting to always use consistent CHUNK_SIZE-based splitting
- Rely on dynamic delay calculation for output speed adjustment
- Reduce code complexity in both main smoother and lightweight smoother
2025-12-19 09:48:11 +08:00
fawney19
070121717d refactor: consolidate stream smoothing into StreamProcessor with intelligent timing
- Move StreamSmoother functionality directly into StreamProcessor for better integration
- Create ContentExtractor strategy pattern for format-agnostic content extraction
- Implement intelligent dynamic delay calculation based on text length
- Support three text length tiers: short (char-by-char), medium (chunked), long (chunked)
- Remove manual chunk_size and delay_ms configuration - now auto-calculated
- Simplify admin UI to single toggle switch with auto timing adjustment
- Extract format detection logic to reusable content_extractors module
- Improve code maintainability with cleaner architecture
2025-12-19 09:46:22 +08:00
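
A rough sketch of the three-tier dynamic timing idea described in this commit (short text streamed character by character, medium and long text in chunks); the tier boundaries, chunk sizes, and delays below are illustrative assumptions, not the values used by StreamProcessor.

```python
# Illustrative length-based emission plan; real thresholds may differ.
def plan_emission(text: str) -> tuple[int, float]:
    """Return (chunk_size, delay_seconds) for a piece of streamed text."""
    length = len(text)
    if length <= 10:      # short: char-by-char for a typing effect
        return 1, 0.015
    if length <= 200:     # medium: small chunks, moderate delay
        return 5, 0.010
    return 20, 0.005      # long: bigger chunks, minimal delay
```
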
fawney19
85fafeacb8 feat: add stream smoothing feature for improved user experience
- Implement StreamSmoother class to split large content chunks into smaller pieces with delay
- Support OpenAI, Claude, and Gemini API response formats for smooth playback
- Add stream smoothing configuration to system settings (enable, chunk size, delay)
- Create streamlined API for stream smoothing with StreamSmoothingConfig dataclass
- Add admin UI controls for configuring stream smoothing parameters
- Use batch configuration loading to minimize database queries
- Enable typing effect simulation for better user experience in streaming responses
2025-12-19 03:15:19 +08:00
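
The smoothing feature could plausibly look like the sketch below: a config dataclass plus an async generator that re-emits one large content chunk as smaller pieces with a delay. Apart from StreamSmoothingConfig, which the commit names, field defaults and the smooth() helper are assumptions.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class StreamSmoothingConfig:
    enabled: bool = True
    chunk_size: int = 5    # characters per emitted piece (assumed default)
    delay_ms: int = 15     # pause between pieces (assumed default)


async def smooth(text: str, cfg: StreamSmoothingConfig) -> AsyncIterator[str]:
    """Split one large content chunk into smaller pieces with a delay."""
    if not cfg.enabled or len(text) <= cfg.chunk_size:
        yield text
        return
    for i in range(0, len(text), cfg.chunk_size):
        yield text[i : i + cfg.chunk_size]
        await asyncio.sleep(cfg.delay_ms / 1000)
```
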
fawney19
7e792dabfc refactor: use background task for client disconnection monitoring
- Replace time-based throttling with background task for disconnect checks
- Remove time.monotonic() and related throttling logic
- Prevent blocking of stream transmission during connection checks
- Properly clean up background task with try/finally block
- Improve throughput and responsiveness of stream processing
2025-12-19 01:59:56 +08:00
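
One common way to implement the background disconnect check this commit describes, using Starlette's Request.is_disconnected(); the polling interval and function name are assumptions.

```python
import asyncio
from starlette.requests import Request


async def stream_with_disconnect_watch(request: Request, upstream):
    """Forward upstream chunks while a side task watches for client disconnect."""
    disconnected = asyncio.Event()

    async def watch() -> None:
        # Poll the connection state off the hot path (interval is assumed).
        while not await request.is_disconnected():
            await asyncio.sleep(0.5)
        disconnected.set()

    watcher = asyncio.create_task(watch())
    try:
        async for chunk in upstream:
            if disconnected.is_set():
                break
            yield chunk
    finally:
        watcher.cancel()  # always clean up the background task
```
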
fawney19
cd06169b2f fix: detect OpenAI format stream completion via finish_reason
- Add detection of finish_reason in OpenAI API responses to mark stream completion
- Ensures OpenAI API streams are properly marked as complete even without explicit completion events
- Complements existing completion event detection for other API formats
2025-12-19 01:44:35 +08:00
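
Detecting OpenAI-format stream completion via finish_reason might look roughly like this; the surrounding SSE parsing is simplified and the helper name is assumed.

```python
import json


def is_openai_stream_done(sse_line: str) -> bool:
    """Return True when an OpenAI-format SSE line signals completion."""
    if not sse_line.startswith("data:"):
        return False
    payload = sse_line[len("data:"):].strip()
    if payload == "[DONE]":
        return True
    try:
        chunk = json.loads(payload)
    except json.JSONDecodeError:
        return False
    choices = chunk.get("choices") or []
    return any(c.get("finish_reason") for c in choices)
```
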
fawney19
50ffd47546 fix: handle client disconnection after stream completion gracefully
- Check has_completion flag before marking client disconnection as failure
- Allow graceful termination if response already completed when client disconnects
- Change logging level to info for post-completion disconnections
- Prevent false error reporting when client closes connection after receiving full response
2025-12-19 01:36:20 +08:00
fawney19
5f0c1fb347 refactor: remove unused response normalizer module
- Delete unused ResponseNormalizer class and its initialization logic
- Remove response_normalizer and enable_response_normalization parameters from handlers
- Simplify chat adapter base initialization by removing normalizer setup
- Clean up unused imports in handler modules
2025-12-19 01:20:30 +08:00
fawney19
7b932d7afb refactor: optimize middleware with pure ASGI implementation and enhance security measures
- Replace BaseHTTPMiddleware with pure ASGI implementation in plugin middleware for better streaming response handling
- Add trusted proxy count configuration for client IP extraction in reverse proxy environments
- Implement audit log cleanup scheduler with configurable retention period
- Replace plaintext token logging with SHA256 hash fingerprints for security
- Fix database session lifecycle management in middleware
- Improve request tracing and error logging throughout the system
- Add comprehensive tests for pipeline architecture
2025-12-18 19:07:20 +08:00
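
Replacing plaintext token logging with a SHA256 fingerprint, as mentioned above, typically amounts to something like the following; the truncation length is an assumption.

```python
import hashlib


def token_fingerprint(token: str, length: int = 12) -> str:
    """Return a short SHA256 hash of the token, safe to put in logs."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:length]

# e.g. logger.info("auth ok, token=%s", token_fingerprint(raw_token))
```
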
fawney19
293bb592dc fix: enhance proxy configuration with password preservation and UI improvements
- Add 'enabled' field to ProxyConfig for preserving config when disabled
- Mask proxy password in API responses (return '***' instead of actual password)
- Preserve existing password on update when new password not provided
- Add URL encoding for proxy credentials (handle special chars like @, :, /)
- Enhanced URL validation: block SOCKS4, require valid host, forbid embedded auth
- UI improvements: use Switch component, dynamic password placeholder
- Add confirmation dialog for orphaned credentials (URL empty but has username/password)
- Prevent browser password autofill with randomized IDs and CSS text-security
- Unify ProxyConfig type definition in types.ts
2025-12-18 16:14:37 +08:00
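
URL-encoding proxy credentials so characters such as @, : and / survive, and masking the password in API responses, might be done along these lines; the field names and function signatures are assumptions.

```python
from urllib.parse import quote


def build_proxy_url(scheme: str, host: str, port: int,
                    username: str | None, password: str | None) -> str:
    """Assemble a proxy URL, percent-encoding credentials with special chars."""
    auth = ""
    if username:
        auth = quote(username, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"


def mask_proxy_config(config: dict) -> dict:
    """Never return the real password to the admin UI."""
    masked = dict(config)
    if masked.get("password"):
        masked["password"] = "***"
    return masked
```
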
fawney19
3e50c157be feat: add HTTP/SOCKS5 proxy support for API endpoints
- Add proxy field to ProviderEndpoint database model with migration
- Add ProxyConfig Pydantic model for proxy URL validation
- Extend HTTP client pool with create_client_with_proxy method
- Integrate proxy configuration in chat_handler_base.py and cli_handler_base.py
- Update admin API endpoints to support proxy configuration CRUD
- Add proxy configuration UI in frontend EndpointFormDialog

Fixes #28
2025-12-18 14:46:47 +08:00
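
A minimal sketch in the spirit of the create_client_with_proxy method named above, building an httpx client that routes traffic through a per-endpoint proxy; note that depending on the httpx version the keyword is proxy or proxies, and SOCKS5 support needs the httpx[socks] extra.

```python
import httpx


def create_client_with_proxy(proxy_url: str | None,
                             timeout: float = 60.0) -> httpx.AsyncClient:
    """Build an async HTTP client, optionally routed through an HTTP/SOCKS5 proxy."""
    if proxy_url:
        # Recent httpx accepts `proxy=`; older versions use `proxies=`.
        return httpx.AsyncClient(proxy=proxy_url, timeout=timeout)
    return httpx.AsyncClient(timeout=timeout)
```
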
fawney19
21587449c8 fix: improve error classification and logging system
- Enhance error classifier to properly handle API key failures with fallback support
- Add error reason/code parsing for better AWS and multi-provider compatibility
- Improve error message structure detection for non-standard formats
- Refactor file logging with size-based rotation (100MB) instead of daily
- Optimize production logging by disabling backtrace and diagnose
- Clean up model validation and remove redundant configurations
2025-12-18 10:57:31 +08:00
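
Size-based rotation with backtrace/diagnose disabled, as described above, is directly supported by loguru; a minimal configuration could look like this (the sink path and retention are assumptions).

```python
from loguru import logger

# Rotate when the file reaches 100 MB instead of rotating daily;
# disable backtrace/diagnose to keep production logs lean and safe.
logger.add(
    "logs/app.log",      # assumed sink path
    rotation="100 MB",
    retention=10,        # assumed: keep the last 10 rotated files
    backtrace=False,
    diagnose=False,
    enqueue=True,
)
```
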
fawney19
3d0ab353d3 refactor: migrate Pydantic Config to v2 ConfigDict 2025-12-18 02:20:53 +08:00
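
The Pydantic v1→v2 migration referenced above replaces the inner Config class with model_config = ConfigDict(...); the model below is illustrative only.

```python
from pydantic import BaseModel, ConfigDict

# Pydantic v1 style (deprecated in v2):
# class ModelOut(BaseModel):
#     class Config:
#         orm_mode = True

# Pydantic v2 style:
class ModelOut(BaseModel):
    model_config = ConfigDict(from_attributes=True)  # was orm_mode in v1

    id: int
    name: str
```
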
fawney19
b2a857c164 refactor: consolidate transaction management and remove legacy modules
- Remove unused context.py module (replaced by request.state)
- Remove provider_cache.py (no longer needed)
- Unify environment loading in config/settings.py instead of __init__.py
- Add deprecation warning for get_async_db() (consolidating on sync Session)
- Enhance database.py documentation with comprehensive transaction strategy
- Simplify audit logging to reuse request-level Session (no separate connections)
- Extract UsageService._build_usage_params() helper to reduce code duplication
- Update model and user cache implementations with refined transaction handling
- Remove unnecessary sessionmaker from pipeline
- Clean up audit service exception handling
2025-12-18 01:59:40 +08:00
fawney19
4d1d863916 refactor: improve authentication and user data handling
- Replace user cache queries with direct database queries to ensure data consistency
- Fix token_type parameter in verify_token calls (access token verification)
- Fix role-based permission check using dictionary ranking instead of string comparison
- Fix logout operation to use correct JWT claim name (user_id instead of sub)
- Simplify user authentication flow by removing unnecessary cache layer
- Optimize session initialization in main.py using create_session helper
- Remove unused imports and exception variables
2025-12-18 01:09:22 +08:00
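
The role-permission fix above (rank roles with a dictionary rather than comparing role strings) can be illustrated as follows; the role names and ranks are assumptions.

```python
# Comparing strings ("admin" > "user") orders roles alphabetically, which is
# wrong; an explicit ranking map makes the intent unambiguous.
ROLE_RANK = {"user": 1, "admin": 2, "super_admin": 3}  # assumed role set


def has_permission(user_role: str, required_role: str) -> bool:
    return ROLE_RANK.get(user_role, 0) >= ROLE_RANK.get(required_role, 0)
```
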
fawney19
b579420690 refactor: optimize database session lifecycle and middleware architecture
- Improve database pool capacity logging with detailed configuration parameters
- Optimize database session dependency injection with middleware-managed lifecycle
- Simplify plugin middleware by delegating session creation to FastAPI dependencies
- Fix import path in auth routes (relative to absolute)
- Add safety checks for database session management across middleware exception handlers
- Ensure session cleanup only when not managed by middleware (avoid premature cleanup)
2025-12-18 00:35:46 +08:00
fawney19
9d5c84f9d3 refactor: add scheduling mode support and optimize system settings UI
- Add fixed_order and cache_affinity scheduling modes to CacheAwareScheduler
- Only apply cache affinity in cache_affinity mode; use fixed order otherwise
- Simplify Dialog components with title/description props
- Remove unnecessary button shadows in SystemSettings
- Optimize import dialog UI structure
- Update ModelAliasesTab shadow styling
- Fix fallback orchestrator type hints
- Add scheduling_mode configuration in system config
2025-12-17 19:15:08 +08:00
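
A sketch of the scheduling_mode switch described above: in cache_affinity mode the scheduler prefers endpoints with warm caches, otherwise it keeps the fixed configured order. The data shapes and ordering heuristic are assumptions.

```python
from typing import Sequence


def order_endpoints(endpoints: Sequence[str], mode: str,
                    cache_hits: dict[str, int]) -> list[str]:
    """Return endpoints in scheduling order for the given mode."""
    if mode == "cache_affinity":
        # Prefer endpoints with more cache hits; sorted() keeps ties stable.
        return sorted(endpoints, key=lambda e: -cache_hits.get(e, 0))
    # "fixed_order": keep the configured order untouched.
    return list(endpoints)
```
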
fawney19
1dac4cb156 refactor: optimize provider query and stats aggregation logic 2025-12-17 16:41:10 +08:00
fawney19
d24c3885ab feat(admin): add config and user data import/export functionality
Add comprehensive import/export endpoints for:
- Provider and model configuration (with key decryption for export)
- User data and API keys (preserving encrypted data)

Includes merge modes (skip/overwrite/error) for conflict handling,
10MB size limit for imports, and automatic cache invalidation.

Also fix optional field in GlobalModelResponse tiered_pricing.
2025-12-16 18:33:14 +08:00
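
The merge modes mentioned above (skip / overwrite / error) for conflicting records during import could be handled roughly like this; the record shape and function names are assumptions.

```python
def resolve_conflict(existing: dict | None, incoming: dict, mode: str) -> dict | None:
    """Decide what to write when an imported record already exists."""
    if existing is None:
        return incoming            # no conflict: always import
    if mode == "skip":
        return None                # keep the existing record untouched
    if mode == "overwrite":
        return incoming            # replace the existing record
    raise ValueError(f"import conflict for {incoming.get('name')!r}")  # mode == "error"
```
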
fawney19
46ff5a1a50 refactor(models): enhance model management with official provider marking and extended metadata
- Add OFFICIAL_PROVIDERS set to mark first-party vendors in models.dev
- Implement official provider marking function with cache compatibility
- Extend model metadata with family, context_limit, output_limit fields
- Improve frontend model selection UI with wider panel and better search
- Add dark mode support for provider logos
- Optimize scrollbar styling for model lists
- Update deployment documentation with clearer migration steps
2025-12-16 17:28:40 +08:00
fawney19
33265b4b13 refactor(global-model): migrate model metadata to flexible config structure
Consolidate model configuration from multiple fixed fields (description, official_url, icon_url, default_supports_*, etc.)
into a single flexible config JSON field to improve extensibility. Also improve the frontend model creation form,
allowing a model to be selected directly from the models-dev list for quick fill-in.

Main changes:
- Backend: model table migration, storing model capabilities and metadata in a config JSON column
- Frontend: GlobalModelFormDialog supports two creation modes (select from list / manual entry)
- API types updated to align with the new data structure
2025-12-16 12:21:21 +08:00
fawney19
4e2ba0e57f feat(usage): add first_byte_time_ms tracking to usage statistics
- Enhance usage service to capture and store first byte latency metrics
- Update usage API routes to include new timing information
2025-12-16 02:39:36 +08:00
fawney19
a3df41d63d refactor(cli-handler): improve stream handling and response processing
- Refactor CLI handler base for better stream context management
- Optimize request/response handling for Claude, OpenAI, and Gemini CLI adapters
- Enhance telemetry tracking across CLI handlers
2025-12-16 02:39:20 +08:00
fawney19
ad1c8c394c refactor(handler): optimize stream processing and telemetry pipeline
- Enhance stream context for better token and latency tracking
- Refactor stream processor for improved performance metrics
- Improve telemetry integration with first_byte_time_ms support
- Add comprehensive stream context unit tests
2025-12-16 02:39:03 +08:00
fawney19
f3a69a6160 refactor(handler): implement defensive token update strategy and extract cache creation token utility
- Add extract_cache_creation_tokens utility to handle new/old cache creation token formats
- Implement defensive update strategy in StreamContext to prevent zero values overwriting valid data
- Simplify cache creation token parsing in Claude handler using new utility
- Add comprehensive test suite for cache creation token extraction
- Improve type hints in handler classes
2025-12-16 00:02:49 +08:00
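
The defensive update strategy mentioned above (never let a zero usage value overwrite a valid one captured earlier in the stream) might look like this sketch; the StreamContext fields are simplified.

```python
from dataclasses import dataclass


@dataclass
class StreamContext:
    input_tokens: int = 0
    output_tokens: int = 0

    def update_tokens(self, input_tokens: int | None,
                      output_tokens: int | None) -> None:
        """Accept only non-zero values; later chunks sometimes report 0."""
        if input_tokens:
            self.input_tokens = input_tokens
        if output_tokens:
            self.output_tokens = output_tokens
```
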
fawney19
cf67160821 feat(cache): enhance cache monitoring endpoints and handler integrations 2025-12-15 23:12:48 +08:00
fawney19
f2cd96c34c feat(api): add model mapping cache management endpoints 2025-12-15 20:39:51 +08:00
fawney19
88e37594cf refactor(backend): update handlers, utilities and core modules after models restructure 2025-12-15 14:30:53 +08:00
fawney19
56fb6bf36c refactor(backend): update model catalog and provider APIs after mappings removal 2025-12-15 14:30:10 +08:00
fawney19
728f9bb126 refactor(backend): remove model mappings module 2025-12-15 14:30:00 +08:00
fawney19
beae7a2616 feat(api): add unified Models API endpoint
- Add models_service.py with model query logic and caching
- Add models.py unified endpoint supporting Claude/OpenAI/Gemini formats
- Auto-detect API format based on request headers
- Support /v1/models and /v1beta/models (Gemini) paths
- Update route registration and comments
2025-12-14 20:01:19 +08:00
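
One plausible way to auto-detect the caller's API format from request headers, as the unified Models endpoint above does; the exact header rules used by the project are not documented here, so treat this as an assumption.

```python
def detect_api_format(headers: dict[str, str]) -> str:
    """Guess whether the client speaks the Claude, Gemini, or OpenAI dialect."""
    h = {k.lower(): v for k, v in headers.items()}
    if "x-api-key" in h or "anthropic-version" in h:
        return "claude"
    if "x-goog-api-key" in h:
        return "gemini"
    return "openai"  # default: Authorization: Bearer <key>
```
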
fawney19
393d4d13ff fix(system): fix timezone handling in dashboard and stats services
- Use app timezone instead of UTC for date calculations in dashboard routes
- Ensure consistency between stats_daily.date and timezone-aware comparisons
- Fix date calculations in cleanup scheduler to handle DST correctly
- Update log message in stats aggregator to use business date
2025-12-13 23:50:59 +08:00
fawney19
a73e0d51db fix: correct request-trace key display and default selection logic
1. Fix key masking: decrypt before masking so the UI no longer shows encrypted base64 data
2. Improve default selection in the detail view: prefer the last valid result (success/failure) over not-executed/skipped entries
2025-12-12 18:15:46 +08:00
fawney19
53bf74429e refactor: restructure stream processing, extracting StreamContext/Processor/Telemetry
- Split the streaming logic in chat_handler_base.py into three independent modules:
  - StreamContext: type-safe streaming context dataclass replacing the previous ctx dict
  - StreamProcessor: SSE parsing, read-ahead, nested error detection
  - StreamTelemetryRecorder: statistics recording (Usage/Audit/Candidate)
- Move hardcoded configuration into settings.py with environment variable overrides:
  - HTTP timeout settings (connect/write/pool)
  - Streaming settings (read-ahead line count, stats delay)
  - Concurrency control settings (slot TTL, cache reservation ratio)
2025-12-12 15:42:45 +08:00
fawney19
859c699e90 fix: adjust limit caps on the interval-timeline endpoints
- Raise the admin endpoint limit cap from 5000 to 50000
- Raise the user endpoint limit cap from 5000 to 20000
- Change the default hours from 168 to 24
2025-12-11 19:39:51 +08:00
fawney19
abc41c7d3c feat: add cache monitoring and usage statistics API endpoints 2025-12-11 17:47:59 +08:00
fawney19
22ea0e245d refactor: unify nested error detection in response parsing
- Extract a _check_nested_error function to handle multiple error formats
- Detect top-level error objects, type=error payloads, and errors nested inside chunks
- Simplify error handling code in OpenAIResponseParser and ClaudeResponseParser
- Improve code reuse and maintainability
2025-12-11 11:33:07 +08:00
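
The unified nested-error check described above could be sketched as follows; the exact payload shapes handled by _check_nested_error are assumptions based on the commit message.

```python
from typing import Any


def check_nested_error(payload: dict[str, Any]) -> dict[str, Any] | None:
    """Return the first error object found in a parsed response chunk, if any."""
    if isinstance(payload.get("error"), dict):      # top-level error
        return payload["error"]
    if payload.get("type") == "error":              # Claude-style error event
        return payload.get("error") or payload
    for chunk in payload.get("chunks") or []:       # errors nested inside chunks
        if isinstance(chunk, dict) and isinstance(chunk.get("error"), dict):
            return chunk["error"]
    return None
```
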
fawney19
8f914d89bb fix: increase write timeout to support large request bodies
- Increase the write timeout in chat_handler_base from 10s to 60s
- Increase the write timeout in cli_handler_base from 10s to 60s
- Increase the write timeout in http_client from 10s to 60s
- Support long conversation requests carrying large payloads (e.g. images)
2025-12-11 11:21:46 +08:00
fawney19
d6994316f1 fix: make failed-request filtering compatible with legacy data
The failed-request filter now applies both the new and the old criteria:
- New: status = "failed"
- Old: status_code >= 400 or error_message is not empty
2025-12-11 10:52:12 +08:00
fawney19
323a514f77 refactor: improve the active request status query logic
- Rename get_active_requests to get_active_requests_status
- Support reading the timeout from the endpoint configuration
- Add a content_length_limit error type
2025-12-11 10:45:06 +08:00
fawney19
0474f63403 refactor: complete type annotations in handler base classes and streaming status updates
- Add full type annotations to BaseMessageHandler and MessageTelemetry
- Add an _update_usage_to_streaming method to asynchronously set the Usage status to streaming
- Improve type hints in chat/cli handlers for maintainability
- Fix type-checking warnings so mypy passes
2025-12-11 10:05:06 +08:00
fawney19
913a87d7f3 refactor: move active request query logic into UsageService
- Add a get_active_requests method to UsageService to centralize active request queries
- Automatically clean up timed-out pending requests (default 5 minutes)
- Reuse the method in both the admin and user endpoints to reduce duplication
- Support querying by ID list or querying all active requests
2025-12-11 10:04:15 +08:00
fawney19
f784106826 Initial commit 2025-12-10 20:52:44 +08:00