Files
ragflow/web/src/utils/citation-utils.ts
Attili-sys 21bc1ab7ec Feature rtl support (#13118)
### What problem does this PR solve?

This PR adds comprehensive **Right-to-Left (RTL) language support**,
primarily targeting Arabic and other RTL scripts (Hebrew, Persian, Urdu,
etc.).

Previously, RTL content had multiple rendering issues:

- Incorrect sentence splitting for Arabic punctuation in citation logic
- Misaligned text in chat messages and markdown components  
- Improper positioning of blockquotes and “think” sections  
- Incorrect table alignment  
- Citation placement ambiguity in RTL prompts  
- UI layout inconsistencies when mixing LTR and RTL text  

This PR introduces backend and frontend improvements to properly detect,
render, and style RTL content while preserving existing LTR behavior.

#### Backend
- Updated sentence boundary regex in `rag/nlp/search.py` to include
Arabic punctuation:
  - `،` (comma)
  - `؛` (semicolon)
  - `؟` (question mark)
  - `۔` (Arabic full stop)
- Ensures citation insertion works correctly in RTL sentences.
- Updated citation prompt instructions to clarify citation placement
rules for RTL languages.

#### Frontend
- Introduced a new utility: `text-direction.ts`
  - Detects text direction based on Unicode ranges.
  - Supports Arabic, Hebrew, Syriac, Thaana, and related scripts.
  - Provides `getDirAttribute()` for automatic `dir` assignment.

- Applied dynamic `dir` attributes across:
  - Markdown rendering
  - Chat messages
  - Search results
  - Tables
  - Hover cards and reference popovers

- Added proper RTL styling in LESS:
  - Text alignment adjustments
  - Blockquote border flipping
  - Section indentation correction
  - Table direction switching
  - Use of `<bdi>` for figure labels to prevent bidirectional conflicts

#### DevOps / Environment
- Added Windows backend launch script with retry handling.
- Updated dependency metadata.
- Adjusted development-only React debugging behavior.

---

### Type of change

- [x] Bug Fix (non-breaking change which fixes RTL rendering and
citation issues)
- [x] New Feature (non-breaking change which adds RTL detection and
dynamic direction handling)

---------

Co-authored-by: 6ba3i <isbaaoui09@gmail.com>
Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
Co-authored-by: Ahmad Intisar <168020872+ahmadintisar@users.noreply.github.com>
Co-authored-by: Liu An <asiro@qq.com>
2026-03-02 13:03:44 +08:00

25 lines
801 B
TypeScript
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

export const normalizeCitationDigits = (text: string) => {
if (!text) return text;
return text.replace(/[٠-٩۰-۹]/g, (char) => {
const code = char.charCodeAt(0);
if (code >= 0x0660 && code <= 0x0669) {
return String.fromCharCode(code - 0x0660 + 0x30);
}
if (code >= 0x06f0 && code <= 0x06f9) {
return String.fromCharCode(code - 0x06f0 + 0x30);
}
return char;
});
};
export const parseCitationIndex = (value: string) => {
const normalized = normalizeCitationDigits(value);
const markerMatch = normalized.match(/\[(?:ID:)?(\d+)\]/);
if (markerMatch) return Number(markerMatch[1]);
if (/^\d+$/.test(normalized)) return Number(normalized);
return Number.NaN;
};
export const citationMarkerReg =
/\[(?:ID:)?([0-9\u0660-\u0669\u06F0-\u06F9]+)\]/g;