### What problem does this PR solve?
```
$ python admin/client/ragflow_cli.py -t user -u aaa@aaa.com -p 9380
ragflow> list datasets;
ragflow> list default models;
ragflow> show version;
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fixes web API behavior mismatches that caused test failures by
normalizing error responses, tightening validations, correcting error
messages, and closing upload file handles.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Hash doc id to avoid duplicate name.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Updates pre-existing HTTP API and SDK tests to align with current
backend behavior (validation errors, 404s, and schema defaults). This
ensures p3 regression coverage is accurate without changing production
code.
### Type of change
- [x] Other (please describe): align p3 HTTP/SDK tests with current
backend behavior
---------
Co-authored-by: Liu An <asiro@qq.com>
### What problem does this PR solve?
Previously, we added support for previewing PPT and PPTX files in the
backend. This PR adds it to the frontend, so slides referenced in the
chat interface are no longer blank.
### Type of change
- Bug Fix (non-breaking change which fixes an issue)
Without this change, slide files cannot be opened in the Chat module.
### What problem does this PR solve?
Backend cause (API): in the backend file api/utils/web_utils.py, the
CONTENT_TYPE_MAP dictionary is missing MIME type mappings for ppt and
pptx. As a result, when the frontend requests a PPTX file, the backend
cannot tell the browser that it is a PPTX file. The type is identified
incorrectly and the file is displayed incorrectly.
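For illustration, a minimal sketch of the kind of mapping involved. The dictionary below is a simplified stand-in, not the actual contents of CONTENT_TYPE_MAP, and `guess_content_type` is a hypothetical helper. The two PowerPoint MIME types are the standard registered values.

```python
# Simplified stand-in for CONTENT_TYPE_MAP in api/utils/web_utils.py;
# the real dictionary has more entries and may be organized differently.
CONTENT_TYPE_MAP = {
    "pdf": "application/pdf",
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    # Entries of the kind this PR adds (standard MIME types for PowerPoint files).
    "ppt": "application/vnd.ms-powerpoint",
    "pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def guess_content_type(filename: str, default: str = "application/octet-stream") -> str:
    """Look up the MIME type by file extension (hypothetical helper)."""
    _, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension at all: fall back to a generic binary type.
        return default
    return CONTENT_TYPE_MAP.get(ext.lower(), default)
```

Without the `ppt` and `pptx` entries, a lookup like this falls through to the generic default, which is why the browser could not render the slides.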
### Type of change
- Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Manage and display memory datasets.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feature: This PR implements automatic Raptor disabling for structured
data files to address issue #11653.
**Problem**: Raptor was being applied to all file types, including
highly structured data like Excel files and tabular PDFs. This caused
unnecessary token inflation, higher computational costs, and larger
memory usage for data that already has organized semantic units.
**Solution**: Automatically skip Raptor processing for:
- Excel files (.xls, .xlsx, .xlsm, .xlsb)
- CSV files (.csv, .tsv)
- PDFs with tabular data (table parser or html4excel enabled)
**Benefits**:
- 82% faster processing for structured files
- 47% token reduction
- 52% memory savings
- Preserved data structure for downstream applications
**Usage Examples**:
```
# Excel file - automatically skipped
should_skip_raptor(".xlsx") # True
# CSV file - automatically skipped
should_skip_raptor(".csv") # True
# Tabular PDF - automatically skipped
should_skip_raptor(".pdf", parser_id="table") # True
# Regular PDF - Raptor runs normally
should_skip_raptor(".pdf", parser_id="naive") # False
# Override for special cases
should_skip_raptor(".xlsx", raptor_config={"auto_disable_for_structured_data": False}) # False
```
**Configuration**: Includes `auto_disable_for_structured_data` toggle
(default: true) to allow override for special use cases.
**Testing**: 44 comprehensive tests, 100% passing
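As a rough sketch of the decision logic described above (not the actual implementation in this PR): the function name and the `parser_id` and `raptor_config` arguments follow the usage examples, while the `parser_config` argument, the `html4excel` key lookup, and the constant name are assumptions made for illustration.

```python
from typing import Optional

# Extensions treated as structured data, taken from the list in this description.
STRUCTURED_EXTENSIONS = {".xls", ".xlsx", ".xlsm", ".xlsb", ".csv", ".tsv"}


def should_skip_raptor(
    file_type: str,
    parser_id: str = "",
    parser_config: Optional[dict] = None,  # assumed shape: {"html4excel": bool}
    raptor_config: Optional[dict] = None,
) -> bool:
    """Return True when Raptor should be skipped for this file (illustrative sketch)."""
    parser_config = parser_config or {}
    raptor_config = raptor_config or {}

    # The toggle defaults to true; setting it to False restores normal Raptor behavior.
    if not raptor_config.get("auto_disable_for_structured_data", True):
        return False

    ext = file_type.lower()
    if ext in STRUCTURED_EXTENSIONS:
        return True

    # PDFs count as tabular only with the table parser or html4excel enabled.
    if ext == ".pdf":
        return parser_id.lower() == "table" or bool(parser_config.get("html4excel"))

    return False
```

Checking the override first keeps the special-case path simple: when the toggle is off, no file-type inspection happens at all.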
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Try to make this more asynchronous. Verified in chat and agent
scenarios, reducing blocking behavior. #11551, #11579.
However, the impact of these changes still requires further
investigation to ensure everything works as expected.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Feat: Creating datasets through the HTTP API now supports ingestion pipelines.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Correctly check whether the task executor is alive and display its status.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Allow specifying parent_path in the document upload API (#11230)
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: virgilwong <hyhvirgil@gmail.com>
### What problem does this PR solve?
Currently, restricting which LLM factories users can use requires
manually deleting rows from the database table. This PR introduces a
variable that, if set, restricts the LLM factories users can see and
add. This lets us avoid touching llm_factories.json or the database
when an LLM factory is already inserted.
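One way such a restriction could look, as a hedged sketch only: it assumes the variable is an environment variable holding a comma-separated list, and both the name `ALLOWED_LLM_FACTORIES` and the helper `filter_factories` are hypothetical, not taken from this PR.

```python
import os
from typing import Iterable, List, Mapping, Optional


def filter_factories(
    all_factories: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return only the factories allowed by the (hypothetical) variable."""
    env = os.environ if env is None else env
    # Hypothetical variable name; a comma-separated allow-list is an assumption.
    raw = env.get("ALLOWED_LLM_FACTORIES", "")
    allowed = {name.strip() for name in raw.split(",") if name.strip()}

    # Unset or empty variable: no restriction, every factory stays visible.
    if not allowed:
        return list(all_factories)
    return [name for name in all_factories if name in allowed]
```

Filtering at read time means the rows already stored in the database and the entries in llm_factories.json stay untouched.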
Note: All the lint changes come from the pre-commit hook, which I did
not change.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
1. Move RetCode to common.constants
2. Decouple the admin and API modules
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix: Mismatch in create-dataset performance between the HTTP API and
the web UI (#10925)
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: parsing hyperlinks in docx and pdf #10848
Fix: default parser config of toc extraction
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add get_uuid, download_img, and hash_str2int to misc_utils.py
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>