mirror of
https://github.com/DayuanJiang/next-ai-draw-io.git
synced 2026-01-02 14:22:28 +08:00
* feat: add XML auto-fix and improve validator accuracy

- Add autoFixXml() to automatically repair common XML issues:
  - CDATA wrapper removal
  - Duplicate attribute removal
  - Unescaped & and < character escaping
  - Invalid entity reference fixing
  - Unclosed tag completion
  - Nested mxCell flattening
  - Duplicate ID renaming
- Improve validateMxCellStructure() with DOM + regex approach:
  - Use DOMParser for syntax error detection (94% recall)
  - Add regex checks for edge cases
  - Stateful parser for handling > in attribute values
- Integrate validateAndFixXml() in chat-message-display and diagram-context
  - Auto-repair invalid XML before loading
  - Log fixes applied for debugging

Metrics: 99.77% accuracy, 94.06% recall, 94.4% auto-fix success rate

* fix: improve XML auto-fix from 58.7% to 99% fix rate

Key improvements:
- Reorder CDATA removal to run before text-before-root check (+35 cases)
- Implement Gemini's backslash-quote fix with regex backreference
  Handles attr="value", value="text\"inner\"more", and mixed patterns
- Add aggressive drop-broken-cells fix for unfixable mxCell elements
  Iteratively removes cells causing DOM parse errors (up to 50)

Results on 9,411 XML dataset:
- 206 invalid XMLs detected
- 204 successfully fixed (99.0% fix rate)
- 2 unfixable (completely broken, need regeneration)

* refactor: extract XML validation/fix helpers and add constants

- Add constants: MAX_XML_SIZE (1MB), MAX_DROP_ITERATIONS (10),
  STRUCTURAL_ATTRS, VALID_ENTITIES
- Extract parseXmlTags helper for shared tag parsing logic
- Extract validation helpers: checkDuplicateAttributes, checkDuplicateIds,
  checkTagMismatches, checkCharacterReferences, checkEntityReferences,
  checkNestedMxCells
- Simplify validateMxCellStructure from ~200 lines to ~55 lines
- Add logging to empty catch block in DOMParser section
- Add size warning for large XML documents
- Remove unused variables (isSelfClose, duplicate idPattern)

* fix: improve XML auto-fix with malformed quote pattern

- Fix =&quot;...&quot; pattern where &quot; was used as delimiter instead of
  actual quotes
- Common in dashPattern attributes like dashPattern=&quot;1 1;&quot;
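
The malformed-quote fix in the last commit can be illustrated with a minimal sketch. It assumes the bad delimiter is the `&quot;` entity written in place of a real quote character; the function name `fixEntityQuoteDelimiters` is hypothetical and this is not the repository's actual implementation. It scans tags with a small state machine so that `&quot;` inside a properly quoted value (for example an escaped HTML label) is left alone. Comments and CDATA sections are not treated specially.

```typescript
// Hypothetical helper (not the repository's code): rewrite attributes whose
// delimiters were emitted as the &quot; entity, e.g.
//   dashPattern=&quot;1 1;&quot;  ->  dashPattern="1 1;"
function fixEntityQuoteDelimiters(xml: string): string {
    const ENTITY = "&quot;"
    let out = ""
    let inTag = false
    let quote: string | null = null // active quote character, if any
    let i = 0

    while (i < xml.length) {
        const ch = xml.charAt(i)

        if (quote !== null) {
            // Inside a properly quoted value: copy verbatim until it closes.
            if (ch === quote) quote = null
            out += ch
            i++
        } else if (!inTag) {
            // Text content: only watch for the start of a tag.
            if (ch === "<") inTag = true
            out += ch
            i++
        } else if (ch === '"' || ch === "'") {
            quote = ch
            out += ch
            i++
        } else if (ch === ">") {
            inTag = false
            out += ch
            i++
        } else if (xml.startsWith("=" + ENTITY, i)) {
            // Entity used as the opening delimiter: find the closing one.
            const start = i + 1 + ENTITY.length
            const end = xml.indexOf(ENTITY, start)
            if (end === -1) {
                // No closing entity; leave the text untouched.
                out += ch
                i++
            } else {
                out += '="' + xml.slice(start, end) + '"'
                i = end + ENTITY.length
            }
        } else {
            out += ch
            i++
        }
    }
    return out
}

const broken = '<mxCell id="2" dashPattern=&quot;1 1;&quot; vertex="1"/>'
console.log(fixEntityQuoteDelimiters(broken))
// prints: <mxCell id="2" dashPattern="1 1;" vertex="1"/>
```

Tracking quote state rather than using a single global regex is a deliberate choice in this sketch: a plain `name=&quot;...&quot;` replacement would also rewrite legitimate entity-quoted text inside attribute values.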