Commit Graph

10 Commits

Author SHA1 Message Date
d1afcc9e71 feat(seafile): add library and directory sync scope support (#13153)
### What problem does this PR solve?

The SeaFile connector currently synchronises the entire account — every
library
visible to the authenticated user. This is impractical for users who
only need
a subset of their data indexed, especially on large SeaFile instances
with many
shared libraries.

This PR introduces granular sync scope support, allowing users to choose
between
syncing their entire account, a single library, or a specific directory
within a
library. It also adds support for SeaFile library-scoped API tokens
(`/api/v2.1/via-repo-token/` endpoints), enabling tighter access control
without
exposing account-level credentials.


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Test

```
from seafile_connector import SeaFileConnector
import logging
import os

logging.basicConfig(level=logging.DEBUG)

URL = os.environ.get("SEAFILE_URL", "https://seafile.example.com")
TOKEN = os.environ.get("SEAFILE_TOKEN", "")
REPO_ID = os.environ.get("SEAFILE_REPO_ID", "")
SYNC_PATH = os.environ.get("SEAFILE_SYNC_PATH", "/Documents")
REPO_TOKEN = os.environ.get("SEAFILE_REPO_TOKEN", "")

def _test_scope(scope, repo_id=None, sync_path=None):
    print(f"\n{'='*50}")
    print(f"Testing scope: {scope}")
    print(f"{'='*50}")

    creds = {"seafile_token": TOKEN} if TOKEN else {}
    if REPO_TOKEN and scope in ("library", "directory"):
        creds["repo_token"] = REPO_TOKEN

    connector = SeaFileConnector(
        seafile_url=URL,
        batch_size=5,
        sync_scope=scope,
        include_shared = False,
        repo_id=repo_id,
        sync_path=sync_path,
    )
    connector.load_credentials(creds)
    connector.validate_connector_settings()

    count = 0
    for batch in connector.load_from_state():
        for doc in batch:
            count += 1
            print(f"  [{count}] {doc.semantic_identifier} "
                  f"({doc.size_bytes} bytes, {doc.extension})")

    print(f"\n-> {scope} scope: {count} document(s) found.\n")

# 1. Account scope
if TOKEN:
    _test_scope("account")
else:
    print("\nSkipping account scope (set SEAFILE_TOKEN)")

# 2. Library scope
if REPO_ID and (TOKEN or REPO_TOKEN):
    _test_scope("library", repo_id=REPO_ID)
else:
    print("\nSkipping library scope (set SEAFILE_REPO_ID + token)")

# 3. Directory scope
if REPO_ID and SYNC_PATH and (TOKEN or REPO_TOKEN):
    _test_scope("directory", repo_id=REPO_ID, sync_path=SYNC_PATH)
else:
    print("\nSkipping directory scope (set SEAFILE_REPO_ID + SEAFILE_SYNC_PATH + token)")
```
2026-02-28 10:24:28 +08:00
5903d1c8f1 Feat: GitHub connector (#12314)
### What problem does this PR solve?

Feat: GitHub connector

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-30 15:09:52 +08:00
1812491679 Feat: add Airtable connector and integration for data synchronization (#12211)
### What problem does this PR solve?
change:
add Airtable connector and integration for data synchronization
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-25 17:50:41 +08:00
42f9ac997f Remove Chinese comments and fix function arguments errors (#12052)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-22 12:59:37 +08:00
7a4044b05f Feat: use filepath for files with the same name for all data source types (#11819)
### What problem does this PR solve?

When there are multiple files with the same name the file would just
duplicate, making it hard to distinguish between the different files.
Now if there are multiple files with the same name, they will be named
after their folder path in the storage unit.

This was done for the webdav connector and with this PR also for Notion,
Confluence and S3 Storage.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Contribution by RAGcon GmbH, visit us [here](https://www.ragcon.ai/)
2025-12-18 17:42:43 +08:00
43f51baa96 Fix errors (#11804)
### What problem does this PR solve?

1. typos
2. grammar errors.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-08 12:21:18 +08:00
12979a3f21 feat: improve metadata handling in connector service (#11421)
### What problem does this PR solve?

- Update sync data source to handle metadata properly

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-11-26 19:55:48 +08:00
df16a80f25 Feat: add initial Google Drive connector support (#11147)
### What problem does this PR solve?

This feature is primarily ported from the
[Onyx](https://github.com/onyx-dot-app/onyx) project with necessary
modifications. Thanks for such a brilliant project.

Minor: consistently use `google_drive` rather than `google_driver`.

<img width="566" height="731" alt="image"
src="https://github.com/user-attachments/assets/6f64e70e-881e-42c7-b45f-809d3e0024a4"
/>

<img width="904" height="830" alt="image"
src="https://github.com/user-attachments/assets/dfa7d1ef-819a-4a82-8c52-0999f48ed4a6"
/>

<img width="911" height="869" alt="image"
src="https://github.com/user-attachments/assets/39e792fb-9fbe-4f3d-9b3c-b2265186bc22"
/>

<img width="947" height="323" alt="image"
src="https://github.com/user-attachments/assets/27d70e96-d9c0-42d9-8c89-276919b6d61d"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-10 19:15:02 +08:00
465a140727 Feat: refine Confluence connector (#10994)
### What problem does this PR solve?

Refine Confluence connector.
#10953

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2025-11-04 17:29:11 +08:00
3e5a39482e Feat: Support multiple data sources synchronizations (#10954)
### What problem does this PR solve?
#10953

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-03 19:59:18 +08:00