Commit Graph

254 Commits

Author SHA1 Message Date
7011a5029e merge main 2024-12-31 14:12:35 +08:00
8d15c8cfbf fix: improve error handling in NotionExtractor data fetching (#12182)
Signed-off-by: -LAN- <laipz8200@outlook.com>
2024-12-29 11:53:09 +08:00
dae1b5a619 fix: import jieba.analyse (#12133)
Signed-off-by: -LAN- <laipz8200@outlook.com>
2024-12-27 11:37:55 +08:00
6175f8c16f merge main 2024-12-26 16:35:11 +08:00
811e4bd0cf fix unstructured setting (#12116) 2024-12-26 12:08:36 +08:00
84ac004772 py lint (#12102)
Signed-off-by: -LAN- <laipz8200@outlook.com>
Co-authored-by: -LAN- <laipz8200@outlook.com>
2024-12-26 00:16:35 +08:00
9231fdbf4c Feat/support parent child chunk (#12092) 2024-12-25 19:49:07 +08:00
56e15d09a9 feat: mypy for all type check (#10921) 2024-12-24 18:38:51 +08:00
c3f3b79b79 merge main 2024-12-23 15:33:08 +08:00
599d410d99 fix: validate reranking model attributes before processing (#11930)
Signed-off-by: -LAN- <laipz8200@outlook.com>
2024-12-21 21:23:12 +08:00
8c559d6231 fix(retrieval_service): avoid to use exception (#11925)
Signed-off-by: -LAN- <laipz8200@outlook.com>
2024-12-21 21:19:46 +08:00
7b03a0316d fix: better memory usage from 800+ to 500+ (#11796)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2024-12-20 14:51:43 +08:00
463fbe2680 fix: better gard nan value from numpy for issue #11827 (#11864)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2024-12-20 09:28:32 +08:00
5a8a901560 fix: float values are not json for nan value close #11827 (#11840)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2024-12-19 20:50:20 +08:00
ad17ff9a92 Lindorm vdb bug-fix (#11790)
Co-authored-by: jiangzhijie <jiangzhijie.jzj@alibaba-inc.com>
2024-12-18 15:19:20 +08:00
171cc88a0d merge main 2024-12-16 11:09:13 +08:00
924b4fe742 test: run vdb tests on TiDB Vector with docker in CI tests (#11645) 2024-12-15 17:16:40 +08:00
22258fb0bf fix: filter bug for keywork cause code can not reach (#11666)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2024-12-15 17:12:06 +08:00
36cb25b341 fix: support mdx files close #11557 (#11565)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2024-12-12 13:37:56 +08:00
0d04cdc323 Lindorm vdb (#11574)
Co-authored-by: jiangzhijie <jiangzhijie.jzj@alibaba-inc.com>
2024-12-12 09:43:27 +08:00
9b7adcd4d9 update tidb batch get endpoint to basic mode (#11426) 2024-12-06 17:06:46 +08:00
Yi
b8f9747849 Merge branch "main" into feat/plugins 2024-12-05 15:08:09 +08:00
d7c1f43b49 fix tidb full-text-search vector missed (#11337) 2024-12-04 16:13:23 +08:00
f634a4488f merge main 2024-12-04 11:06:00 +08:00
c58d2fce89 roll back rerank topn setting (#11297) 2024-12-03 17:34:56 +08:00
e686f12317 fix: better handle error (#11265)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2024-12-03 09:15:38 +08:00
9601102885 fix(word_extractor): Fix type error and remove stream in ssrf_proxy (#11241)
Signed-off-by: -LAN- <laipz8200@outlook.com>
2024-12-02 10:24:03 +08:00
f9c2aa7689 feat: add retireval_top_n to config in env (#11132) 2024-11-30 11:14:45 +08:00
2d6865d421 Ensure consistent float type for cached embedding return values (#10185) 2024-11-29 09:18:41 +08:00
d7160ee563 fix: typo in upstashVector if id is always true, also fix some type hint (#11183)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2024-11-28 14:05:25 +08:00
9789905a1f chore(*): Removes debugging print statements (#11145)
Signed-off-by: -LAN- <laipz8200@outlook.com>
2024-11-26 22:03:19 +08:00
a0873a956f merge main 2024-11-26 10:27:48 +08:00
6c8e208ef3 chore: bump minimum supported Python version to 3.11 (#10386) 2024-11-24 13:28:46 +08:00
ed55de888a fix: rules should not be None for in (#10977)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2024-11-22 23:04:20 +08:00
cb0c55daa7 fix weight rerank of knowledge retrieval (#10931) 2024-11-21 17:53:20 +08:00
99ffe43e91 merge main 2024-11-20 18:24:03 +08:00
58a9d9eb9a fix: better WeightRerankRunner run logic use O(1) and delete unused code (#10849)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2024-11-19 20:12:13 +08:00
14f3d44c37 refactor: improve handling of leading punctuation removal (#10761) 2024-11-18 21:32:33 +08:00
873e9720e9 feat: AnalyticDB vector store supports invocation via SQL. (#10802)
Co-authored-by: 璟义 <yangshangpo.ysp@alibaba-inc.com>
2024-11-18 19:29:54 +08:00
51db59622c chore(lint): cleanup repeated cause exception in logging.exception replaced by helpful message (#10425) 2024-11-15 15:41:40 +08:00
6fbff78b9c merge 2024-11-15 15:16:53 +08:00
0b2d51d859 add the index field for elasticsearch (#10592) 2024-11-12 21:43:16 +08:00
a1543b7da0 fix(extractor): temporary file (#10543) 2024-11-11 17:31:27 +08:00
a9de7f24a2 merge main 2024-11-08 13:55:39 +08:00
c9f785e00f Feat/tools/gitlab (#10407) 2024-11-08 09:53:03 +08:00
574c4a264f chore(lint): Use logging.exception instead of logging.error (#10415) 2024-11-07 21:13:02 +08:00
e90a06a7b7 fix the ssrf of docx file extractor external images (#10237) 2024-11-07 17:38:31 +08:00
31445c3782 Add Lindorm as a VDB choice (#10202)
Co-authored-by: jiangzhijie <jiangzhijie.jzj@alibaba-inc.com>
2024-11-07 17:38:30 +08:00
602f75bb30 fix: avoid unexpected error when create knowledge base with baidu vector database and wenxin embedding model (#10130) 2024-11-07 17:38:15 +08:00
1b645c1cc9 fix issue: query is none when doing retrieval (#10129) 2024-11-07 17:38:14 +08:00