mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-05-24 09:57:36 +08:00
## Problem
When using MinerU with the `vlm-http-client` backend, the parser fails to
find the output files: they are written to a `vlm/` subdirectory, which the
`_read_output` method does not check.
## Error Message
    [ERROR]MinerU not found.
    [MinerU] Missing output file, tried: ...

## Root Cause
The MinerU API with `vlm-http-client` backend returns output files in
the following structure:

    output_dir/
      vlm/
        filename_content_list.json
        filename.md
        images/

However, the `_read_output` method in `mineru_parser.py` only checks:
1. `output_dir/filename_content_list.json`
2. `output_dir/sanitized_filename_content_list.json`
3. `output_dir/sanitized_filename/sanitized_filename_content_list.json`
It doesn't check the `vlm/` subdirectory.
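
The mismatch can be reproduced without MinerU. The sketch below is a minimal illustration, not the actual `_read_output` code: `original_candidates` and `build_vlm_layout` are hypothetical helpers, and the sanitized filename is passed in as a parameter because the real sanitization rule is not shown here.

```python
import tempfile
from pathlib import Path


def original_candidates(output_dir: Path, name: str, safe_name: str) -> list[Path]:
    """The three locations listed above (hypothetical helper, for illustration)."""
    return [
        output_dir / f"{name}_content_list.json",
        output_dir / f"{safe_name}_content_list.json",
        output_dir / safe_name / f"{safe_name}_content_list.json",
    ]


def build_vlm_layout(output_dir: Path, name: str) -> Path:
    """Create the vlm/ layout described above and return the JSON path."""
    vlm_dir = output_dir / "vlm"
    (vlm_dir / "images").mkdir(parents=True)
    content_list = vlm_dir / f"{name}_content_list.json"
    content_list.write_text("[]")
    (vlm_dir / f"{name}.md").write_text("")
    return content_list


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    actual = build_vlm_layout(out, "report")
    hits = [p for p in original_candidates(out, "report", "report") if p.exists()]
    print("file exists:", actual.exists())         # the file is on disk
    print("found by original lookup:", len(hits))  # but no candidate matches
```

None of the three candidates points into `vlm/`, so the lookup comes back empty even though the file exists.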
## Solution
Added two additional fallback paths to check the `vlm/` subdirectory:
- `output_dir/vlm/filename_content_list.json`
- `output_dir/vlm/sanitized_filename_content_list.json`
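
The extended lookup can be sketched roughly as follows. This is an illustration under assumptions rather than the code from `mineru_parser.py`: `find_content_list` is a hypothetical name, the sanitized filename is again taken as a parameter, and the order in which the candidates are tried is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_content_list(output_dir: Path, name: str, safe_name: str) -> Optional[Path]:
    """Return the first existing *_content_list.json, or None (hypothetical helper)."""
    candidates = [
        # Locations that were already checked.
        output_dir / f"{name}_content_list.json",
        output_dir / f"{safe_name}_content_list.json",
        output_dir / safe_name / f"{safe_name}_content_list.json",
        # New fallbacks for the vlm-http-client backend.
        output_dir / "vlm" / f"{name}_content_list.json",
        output_dir / "vlm" / f"{safe_name}_content_list.json",
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "vlm").mkdir()
    (out / "vlm" / "report_content_list.json").write_text("[]")
    found = find_content_list(out, "report", "report")
    print(found.relative_to(out) if found else "not found")
```

Appending the `vlm/` paths after the existing ones leaves the behavior for other backends unchanged: the fallbacks are consulted only when none of the earlier locations exists.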
## Testing
Tested with MinerU API using `vlm-http-client` backend. The parser now
successfully finds and processes the output files.
## Related
This issue occurs specifically when using:
- MinerU backend: `vlm-http-client`
- MinerU server URL configured for remote vLLM inference