mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-05-22 00:50:10 +08:00
## Summary

This fixes a missing authorization check in the beta API document download endpoint:

- **CWE:** CWE-862 (Missing Authorization)
- **Severity:** Medium
- **Affected route/file:** `GET /api/v1/documents/<document_id>` in `api/apps/sdk/doc.py`
- **Data flow:** the route reads a bearer beta API token, resolves the token with `APIToken.query(beta=token)`, accepts `document_id` directly from the URL, loads the document with `DocumentService.query(id=document_id)`, and then fetches the backing object through `File2DocumentService.get_storage_address()` / `settings.STORAGE_IMPL.get()`.

Before this change, that flow verified that the API token was valid, but it did not verify that the token's tenant owned the document's knowledge base. A caller with any valid beta API token and a known document ID could therefore reach storage for a document belonging to another tenant.

## Fix

The endpoint now takes the tenant ID from the resolved API token and checks the document's knowledge base with:

```python
KnowledgebaseService.query(id=doc[0].kb_id, tenant_id=tenant_id)
```

If the knowledge base is not owned by the token tenant, the request returns an access error before any storage lookup occurs. This mirrors the tenant-scoped ownership checks used by the dataset-scoped document download path and keeps the patch small.

## Tests

Added unit coverage for `download_doc()` to assert that:

- the beta token tenant ID is used in the knowledge-base ownership lookup;
- cross-tenant access returns `You do not have access to this document.`;
- storage resolution is not called before tenant authorization succeeds;
- the existing same-tenant empty-file and successful-download paths still run after the authorization gate passes.

I also verified the final patch is limited to `api/apps/sdk/doc.py` and the related document SDK route unit test.
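
For reviewers who want to see the asserted ordering in isolation (ownership check first, storage read second), here is a minimal, self-contained sketch. It is not the code in `api/apps/sdk/doc.py`: `download_doc_sketch`, `FakeStorage`, `Doc`, `DOCS`, `KB_OWNER`, and the not-found message are hypothetical in-memory stand-ins for the real services. Only the access-denied message is taken from this PR.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document row."""
    id: str
    kb_id: str


# Hypothetical in-memory data: which tenant owns which knowledge base,
# and which knowledge base each document belongs to.
KB_OWNER = {"kb-a": "tenant-a", "kb-b": "tenant-b"}
DOCS = {"doc-a": Doc("doc-a", "kb-a"), "doc-b": Doc("doc-b", "kb-b")}


class FakeStorage:
    """Hypothetical storage stub that records every read."""

    def __init__(self):
        self.calls = []

    def get(self, kb_id, doc_id):
        self.calls.append((kb_id, doc_id))
        return b"file-bytes"


def download_doc_sketch(tenant_id, document_id, storage):
    """Return file bytes only if the token's tenant owns the document's KB."""
    doc = DOCS.get(document_id)
    if doc is None:
        # Hypothetical message; the real route's wording may differ.
        return {"ok": False, "message": "Document not found."}

    # Authorization gate: bind the requested document back to the token tenant.
    if KB_OWNER.get(doc.kb_id) != tenant_id:
        return {"ok": False, "message": "You do not have access to this document."}

    # Storage is touched only after the gate has passed.
    return {"ok": True, "data": storage.get(doc.kb_id, doc.id)}


# Cross-tenant request: denied, and storage is never read.
storage = FakeStorage()
denied = download_doc_sketch("tenant-a", "doc-b", storage)
assert denied["ok"] is False and storage.calls == []

# Same-tenant request: allowed, and storage is read exactly once.
allowed = download_doc_sketch("tenant-a", "doc-a", storage)
assert allowed["ok"] is True and storage.calls == [("kb-a", "doc-a")]
```

The stub records calls instead of returning early so that a test can assert on the absence of a storage read, which is the property the cross-tenant case depends on.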
A local `pytest` invocation could not complete in this checkout because the shared test fixture attempts to log in to a RAGFlow server at `127.0.0.1:9380`, which was not running in the local environment.

## Security analysis

This is exploitable when an attacker has a valid beta API token for their own tenant and obtains or guesses a document ID from another tenant. The token alone should not grant access to other tenants' files, but the direct document route previously authorized only the token itself and not the requested resource. The new tenant-scoped knowledge-base check binds the requested document back to the token tenant before storage is accessed, preventing cross-tenant document downloads through this endpoint.

Before submitting, we attempted to disprove this by checking whether existing dataset-scoped routes, token validation, or framework protections already enforced ownership. They do not apply to this direct document-ID route: it bypassed the dataset path parameter and used only `DocumentService.query(id=document_id)` before reading storage.

cc @lewiswigmore