youngkingdom/vllm - vllm

Author	SHA1	Message	Date
bnellnm	f9c069c85e	Modularize fused experts and integrate PPLX kernels (#15956 )	2025-05-14 13:11:54 -07:00
Ekagra Ranjan	418d2f8bfb	[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326 ) Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-14 12:31:46 -07:00
Ecthlion_zyy	33011318c2	Fix broken example: examples/offline_inference/profiling at scheduler_config (#18117 )	2025-05-13 23:19:14 -07:00
Tao He	60f7624334	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844 )	2025-05-12 19:52:47 -07:00
Harry Mellor	72a3f6b898	Construct `KVTransferConfig` properly from Python instead of using JSON blobs without CLI (#17994 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-12 11:25:33 -07:00
Isotr0py	021c16c7ca	[Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-11 17:56:30 -07:00
Mark McLoughlin	7e3571134f	[V1][Spec Decoding] Include bonus tokens in mean acceptance length (#17908 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-09 13:32:36 -07:00
Cyrus Leung	a1e19b635d	[Doc] Fix a typo in the file name (#17836 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-08 18:04:18 +08:00
Harry Mellor	646a31e51e	Fix and simplify `deprecated=True` CLI `kwarg` (#17781 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-07 16:51:06 +01:00
Satyajith Chilappagari	043e4c4955	Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling (#16357 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> Co-authored-by: Aaron Dou <yzdou@amazon.com> Co-authored-by: Shashwat Srijan <sssrijan@amazon.com> Co-authored-by: Chongming Ni <chongmni@amazon.com> Co-authored-by: Amulya Ballakur <amulyaab@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Lin Lin Pan <tailinpa@amazon.com> Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com>	2025-05-07 00:07:30 -07:00
Jee Jee Li	ba7703e659	[Misc] Remove qlora_adapter_name_or_path (#17699 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-05-06 23:10:37 -07:00
Jevin Jiang	621ca2c0ab	[TPU] Increase block size and reset block shapes (#16458 )	2025-05-06 13:55:04 -04:00
Cyrus Leung	5b8c390747	[Bugfix] Fix modality limits in vision language example (#17721 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-06 16:12:28 +00:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Cyrus Leung	d7543862bd	[Misc] Rename assets for testing (#17575 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 03:29:25 -07:00
Cyrus Leung	f89d0e11bf	[Misc] Continue refactoring model tests (#17573 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-01 22:06:08 -07:00
Isotr0py	88c8304104	[Model] Refactor Ovis2 to support original tokenizer (#17537 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-01 11:00:53 -07:00
Marco	54072f315f	[MODEL ADDITION] Ovis2 Model Addition (#15826 ) Signed-off-by: Marco <121761685+mlinmg@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-04-30 07:33:29 +00:00
Bryan Lu	70788bdbdc	[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE (#17211 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-29 21:10:00 +00:00
Alex Brooks	fa93cd9f60	[Model] Add Granite Speech Support (#16246 ) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-04-28 10:05:00 +00:00
Isotr0py	8c1c926d00	[Bugfix] Fix missing int type for `-n` in multi-image example (#17223 )	2025-04-26 08:49:52 +00:00
Yihua Cheng	5e83a7277f	[v1] [P/D] Adding LMCache KV connector for v1 (#16625 )	2025-04-26 03:03:38 +00:00
Benjamin Chislett	a0e619e62a	[V1][Spec Decode] EAGLE-3 Support (#16937 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-25 15:43:07 -07:00
Cyrus Leung	205d84aaa9	[VLM] Clean up models (#16873 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-19 12:13:06 +00:00
Isotr0py	83f3c3bd91	[Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-19 02:26:11 -07:00
Yang Fan	2c1bd848a6	[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130 ) Signed-off-by: fyabc <suyang.fy@alibaba-inc.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Xiong Wang <wangxiongts@163.com>	2025-04-18 23:14:36 -07:00
Cyrus Leung	aadb656562	[Misc] Clean up Kimi-VL (#16833 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-18 05:15:09 -07:00
Harry Mellor	e78587a64c	Improve-mm-and-pooler-and-decoding-configs (#16789 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-17 22:13:32 -07:00
Chauncey	7a4a5de729	[Misc] Update outdated note: LMCache now supports chunked prefill (#16697 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-18 05:12:42 +00:00
Yihua Cheng	3408e47159	[P/D][V1] KV Connector API V1 (#15960 ) Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Signed-off-by: remi <remi@mistral.ai> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-04-17 13:22:40 -07:00
Reid	99ed526101	[Misc] refactor examples series - lmcache (#16758 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-17 11:02:35 +00:00
Richard Liaw	8cac35ba43	[Ray] Improve documentation on batch inference (#16609 ) Signed-off-by: Richard Liaw <rliaw@berkeley.edu>	2025-04-16 22:19:26 -07:00
Isotr0py	cb072ce93b	[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-17 04:17:39 +00:00
Reid	7168920491	[Misc] refactor examples series (#16708 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-16 10:16:36 +00:00
Reid	6ae996a873	[Misc] refactor argument parsing in examples (#16635 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-15 08:05:30 +00:00
courage17340	b1308b84a3	[Model][VLM] Add Kimi-VL model support (#16387 ) Signed-off-by: courage17340 <courage17340@163.com>	2025-04-14 21:41:48 +00:00
Reid	7cbfc10943	[Misc] refactor examples (#16563 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-14 09:59:15 +00:00
Jee Jee Li	3cdc57669f	[Misc] Delete redundant code (#16530 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-04-12 11:21:37 +00:00
Cyrus Leung	d9fc8cd9da	[V1] Enable multi-input by default (#15799 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-12 08:52:39 +00:00
wang.yuqi	fbf722c6e6	[Frontend] support matryoshka representation / support embedding API dimensions (#16331 )	2025-04-11 23:23:10 -07:00
Isotr0py	93195146ea	[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-11 04:57:16 +00:00
Lily Liu	e8224f3dca	[V1][Spec Decode] Eagle Model loading (#16035 ) Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>	2025-04-10 11:21:48 -07:00
Ye (Charlotte) Qi	61de3ef74b	[Model] Remove image mm limit for LLaMa4 (#16365 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-04-10 09:36:27 +00:00
Reid	1bff42c4b7	[Misc] refactor Structured Outputs example (#16322 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-09 23:32:42 +00:00
zh Wang	a25866ac8d	[Bugfix] Fix profiling.py (#16202 ) Signed-off-by: zh Wang <rekind133@outlook.com>	2025-04-09 17:03:34 +00:00
Chauncey	102bf967f0	[Model] Add smolvlm support (#16017 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-04-08 19:12:17 -07:00
Russell Bryant	2755c34a8f	[V1] Update structured output offline inference example (#15721 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-04-08 22:34:09 +00:00
Cyrus Leung	4ebc0b9640	[Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-08 09:45:21 -07:00
wang.yuqi	1f5d13ab9f	[New Model]: jinaai/jina-embeddings-v3 (#16120 )	2025-04-08 08:39:12 -07:00
Reid	7f00899ff7	[Misc] format and refactor some examples (#16252 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-04-08 10:42:32 +00:00

132 Commits