Files
next-ai-draw-io/lib
Dayuan Jiang b33e09be05 feat: add XML auto-fix with refined validation logic (#247)
* feat: add XML auto-fix and improve validator accuracy

- Add autoFixXml() to automatically repair common XML issues:
  - CDATA wrapper removal
  - Duplicate attribute removal
  - Unescaped & and < character escaping
  - Invalid entity reference fixing
  - Unclosed tag completion
  - Nested mxCell flattening
  - Duplicate ID renaming

- Improve validateMxCellStructure() with DOM + regex approach:
  - Use DOMParser for syntax error detection (94% recall)
  - Add regex checks for edge cases
  - Stateful parser for handling > in attribute values

- Integrate validateAndFixXml() in chat-message-display and diagram-context
  - Auto-repair invalid XML before loading
  - Log fixes applied for debugging

Metrics: 99.77% accuracy, 94.06% recall, 94.4% auto-fix success rate

* fix: improve XML auto-fix from 58.7% to 99% fix rate

Key improvements:
- Reorder CDATA removal to run before text-before-root check (+35 cases)
- Implement Gemini's backslash-quote fix with regex backreference
  Handles attr="value", value="text\"inner\"more", and mixed patterns
- Add aggressive drop-broken-cells fix for unfixable mxCell elements
  Iteratively removes cells causing DOM parse errors (up to 50)

Results on 9,411 XML dataset:
- 206 invalid XMLs detected
- 204 successfully fixed (99.0% fix rate)
- 2 unfixable (completely broken, need regeneration)

* refactor: extract XML validation/fix helpers and add constants

- Add constants: MAX_XML_SIZE (1MB), MAX_DROP_ITERATIONS (10), STRUCTURAL_ATTRS, VALID_ENTITIES
- Extract parseXmlTags helper for shared tag parsing logic
- Extract validation helpers: checkDuplicateAttributes, checkDuplicateIds, checkTagMismatches, checkCharacterReferences, checkEntityReferences, checkNestedMxCells
- Simplify validateMxCellStructure from ~200 lines to ~55 lines
- Add logging to empty catch block in DOMParser section
- Add size warning for large XML documents
- Remove unused variables (isSelfClose, duplicate idPattern)

* fix: improve XML auto-fix with malformed quote pattern

- Fix =&quot;...&quot; pattern where &quot; was used as delimiter instead of actual quotes
- Common in dashPattern attributes like dashPattern=&quot;1 1;&quot;
2025-12-13 23:31:01 +09:00
..