ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-03-20 06:07:41 +08:00

Author	SHA1	Message	Date
Daniil Sivak	60ad32a0c2	Feat: support epub parsing (#13650 ) Closes #1398 ### What problem does this PR solve? Adds native support for EPUB files. EPUB content is extracted in spine (reading) order and parsed using the existing HTML parser. No new dependencies required. ### Type of change - [x] New Feature (non-breaking change which adds functionality) To check this parser manually: ```python uv run --python 3.12 python -c " from deepdoc.parser import EpubParser with open('$HOME/some_epub_book.epub', 'rb') as f: data = f.read() sections = EpubParser()(None, binary=data, chunk_token_num=512) print(f'Got {len(sections)} sections') for i, s in enumerate(sections[:5]): print(f'\n--- Section {i} ---') print(s[:200]) " ```	2026-03-17 20:14:06 +08:00
Yongteng Lei	382458ace7	Feat: advanced markdown parsing (#9607 ) ### What problem does this PR solve? Using AST parsing to handle markdown more accurately, preventing components from being cut off by chunking. #9564 <img width="1746" height="993" alt="image" src="https://github.com/user-attachments/assets/4aaf4bf6-5714-4d48-a9cf-864f59633f7f" /> <img width="1739" height="982" alt="image" src="https://github.com/user-attachments/assets/dc00233f-7a55-434f-bbb7-74ce7f57a6cf" /> <img width="559" height="100" alt="image" src="https://github.com/user-attachments/assets/4a556b5b-d9c6-4544-a486-8ac342bd504e" /> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-08-21 09:36:18 +08:00
Jin Hai	3894de895b	Update comments (#4569 ) ### What problem does this PR solve? Add license statement. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-01-21 20:52:28 +08:00
Zhichang Yu	0d68a6cd1b	Fix errors detected by Ruff (#3918 ) ### What problem does this PR solve? Fix errors detected by Ruff ### Type of change - [x] Refactoring	2024-12-08 14:21:12 +08:00
黄腾	ede733e130	add support for eml file parser (#1768 ) ### What problem does this PR solve? add support for eml file parser #1363 ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Zhedong Cen <cenzhedong2@126.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2024-08-06 16:42:14 +08:00
Zhedong Cen	a95c1d45f0	Support table for markdown file in general parser (#1278 ) ### What problem does this PR solve? Support extracting table for markdown file in general parser ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-06-27 14:38:35 +08:00
Wang Baoling	18f4a6b35c	feat: support json file (#1217 ) ### What problem does this PR solve? feat: support json file. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: KevinHuSh <kevinhu.sh@gmail.com>	2024-06-21 10:42:29 +08:00
Jin Hai	cdea1d0a85	Update readme and add license (#1018 ) ### What problem does this PR solve? - Update readme - Add license ### Type of change - [x] Documentation Update --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2024-06-01 16:24:10 +08:00
Zhedong Cen	8dd45459be	Add support for HTML file (#973 ) ### What problem does this PR solve? Add support for HTML file ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2024-05-30 09:12:55 +08:00
KevinHuSh	9d60a84958	refactor code (#583 ) ### What problem does this PR solve? ### Type of change - [x] Refactoring	2024-04-28 13:19:54 +08:00
KevinHuSh	fd7fcb5baf	apply pep8 formalize (#155 )	2024-03-27 11:33:46 +08:00
KevinHuSh	f6aee7f230	add use layout or not option (#145 ) * add use layout or not option * trival	2024-03-22 19:21:09 +08:00
KevinHuSh	7fd1eca582	init README of deepdoc, add picture processer. (#71 ) * init README of deepdoc, add picture processer. * add resume parsing	2024-02-23 18:28:12 +08:00
KevinHuSh	cacd36c5e1	use onnx models, new deepdoc (#68 )	2024-02-21 16:32:38 +08:00

14 Commits