youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Pooya Davoodi	1efce68605	[Bugfix] Use runner_type instead of task in GritLM (#11144 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2024-12-13 04:09:53 +00:00
Luka Govedič	30870b4f66	[torch.compile] Dynamic fp8 + rms_norm fusion (#10906 ) Signed-off-by: luka <luka@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-12-13 03:19:23 +00:00
Cody Yu	78ed8f57d8	[Misc][V1] Fix type in v1 prefix caching (#11151 )	2024-12-13 00:57:40 +00:00
shangmingc	db6c264a1e	[Bugfix] Fix value unpack error of simple connector for KVCache transfer. (#11058 ) Signed-off-by: ShangmingCai <csmthu@gmail.com>	2024-12-12 21:19:17 +00:00
Jeremy Arnold	9f3974a319	Fix logging of the vLLM Config (#11143 )	2024-12-12 12:05:57 -08:00
Cody Yu	2c97eca1ff	[Misc] Validate grammar and fail early (#11119 )	2024-12-12 18:34:26 +00:00
Jeff Cook	5d712571af	[Bugfix] Quick fix to make Pixtral-HF load correctly again after `39e227c7ae`. (#11024 )	2024-12-12 18:09:20 +00:00
Ramon Ziai	d4d5291cc2	fix(docs): typo in helm install instructions (#11141 ) Signed-off-by: Ramon Ziai <ramon.ziai@bettermarks.com>	2024-12-12 17:36:32 +00:00
Roger Wang	4816d20aa4	[V1] Fix torch profiling for offline inference (#11125 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-12-12 15:51:53 +00:00
Jiaxin Shan	85362f028c	[Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094 ) Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-12 09:25:16 +00:00
youkaichao	62de37a38e	[core][distributed] initialization from StatelessProcessGroup (#10986 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-12 09:04:19 +00:00
Sanju C Sudhakaran	8195824206	[Hardware][Intel-Gaudi] Enable LoRA support for Intel Gaudi (HPU) (#10565 ) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>	2024-12-12 08:09:28 +00:00
Woosuk Kwon	f092153fbe	[V1] Use more persistent buffers to optimize input preparation overheads (#11111 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-11 23:14:20 -08:00
Pooya Davoodi	1da8f0e1dd	[Model] Add support for embedding model GritLM (#10816 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>	2024-12-12 06:39:16 +00:00
Russell Bryant	ccede2b264	[Core] cleanup zmq ipc sockets on exit (#11115 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-11 19:12:24 -08:00
Yuan Tang	24a36d6d5f	Update link to LlamaStack remote vLLM guide in serving_with_llamastack.rst (#11112 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-12-12 02:39:21 +00:00
Simon Mo	8fb26dac61	[Docs] Add media kit (#11121 )	2024-12-11 17:33:11 -08:00
Clayton	7439a8b5fc	[Bugfix] Multiple fixes to tool streaming with hermes and mistral (#10979 ) Signed-off-by: cedonley <clayton@donley.io>	2024-12-12 01:10:12 +00:00
Alexander Matveev	4e11683368	[V1] VLM preprocessor hashing (#11020 ) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Alexander Matveev <alexm@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-12-12 00:55:30 +00:00
Tyler Michael Smith	452a723bf2	[V1][Core] Remove should_shutdown to simplify core process termination (#11113 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-11 23:34:54 +00:00
Cyrus Leung	d1e21a979b	[CI/Build] Split up VLM tests (#11083 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-12 06:18:16 +08:00
Rui Qiao	72ff3a9686	[core] Bump ray to use _overlap_gpu_communication in compiled graph tests (#10410 ) Signed-off-by: Rui Qiao <ubuntu@ip-172-31-15-128.us-west-2.compute.internal> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Rui Qiao <ubuntu@ip-172-31-15-128.us-west-2.compute.internal>	2024-12-11 11:36:35 -08:00
youkaichao	66aaa7722d	[torch.compile] remove graph logging in ci (#11110 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-11 10:59:50 -08:00
Woosuk Kwon	d643c2aba1	[V1] Use input_ids as input for text-only models (#11032 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-11 10:49:23 -08:00
youkaichao	91642db952	[torch.compile] use depyf to dump torch.compile internals (#10972 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-11 10:43:05 -08:00
bingps	fd22220687	[Doc] Installed version of llmcompressor for int8/fp8 quantization (#11103 ) Signed-off-by: Guangda Liu <bingps@users.noreply.github.com> Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>	2024-12-11 15:43:24 +00:00
hissu-hyvarinen	b2f775456e	[CI/Build] Enable prefix caching test for AMD (#11098 ) Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>	2024-12-11 15:23:37 +00:00
Cyrus Leung	cad5c0a6ed	[Doc] Update docs to refer to pooling models (#11093 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 13:36:27 +00:00
Cyrus Leung	8f10d5e393	[Misc] Split up pooling tasks (#10820 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 01:28:00 -08:00
Rafael Vasquez	40766ca1b8	[Bugfix]: Clamp `-inf` logprob values in prompt_logprobs (#11073 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-12-11 01:27:39 -08:00
B-201	2e32f5d28d	[Bugfix] Fix Idefics3 fails during multi-image inference (#11080 ) Signed-off-by: B-201 <Joy25810@foxmail.com>	2024-12-11 01:27:07 -08:00
Russell Bryant	61b1d2f6ae	[Core] v1: Use atexit to handle engine core client shutdown (#11076 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-11 01:26:36 -08:00
Kevin H. Luu	9974fca047	[ci/build] Fix entrypoints test and pin outlines version (#11088 )	2024-12-11 01:01:53 -08:00
Kevin H. Luu	3fb4b4f163	[ci/build] Fix AMD CI dependencies (#11087 )	2024-12-11 00:39:53 -08:00
Cyrus Leung	2e33fe4191	[CI/Build] Check transformers v4.47 (#10991 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 05:02:02 +00:00
Maximilien de Bayser	e39400a4b6	Fix streaming for granite tool call when <|tool_call|> is present (#11069 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-12-11 04:51:40 +00:00
Mor Zusman	ffa48c9146	[Model] PP support for Mamba-like models (#10992 ) Signed-off-by: mzusman <mor.zusmann@gmail.com>	2024-12-10 21:53:37 -05:00
Aurick Qiao	d5c5154fcf	[Misc] LoRA + Chunked Prefill (#9057 )	2024-12-11 10:09:20 +08:00
Tyler Michael Smith	9a93973708	[Bugfix] Fix Mamba multistep (#11071 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-12-11 00:16:22 +00:00
Woosuk Kwon	134810b3d9	[V1][Bugfix] Always set enable_chunked_prefill = True for V1 (#11061 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-10 14:41:23 -08:00
youkaichao	75f89dc44c	[torch.compile] add a flag to track batchsize statistics (#11059 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-12-10 12:40:52 -08:00
Russell Bryant	e739194926	[Core] Update to outlines >= 0.1.8 (#10576 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-12-10 12:08:16 -08:00
Flávia Béo	250ee65d72	[BUG] Remove token param #10921 (#11022 ) Signed-off-by: Flavia Beo <flavia.beo@ibm.com>	2024-12-10 17:38:15 +00:00
Joe Runde	9b9cef3145	[Bugfix] Backport request id validation to v0 (#11036 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-12-10 16:38:23 +00:00
Jee Jee Li	d05f88679b	[Misc][LoRA] Add PEFTHelper for LoRA (#11003 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-10 11:12:01 +00:00
Travis Johnson	beb16b2c81	[Bugfix] Handle <|tool_call|> token in granite tool parser (#11039 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-12-10 10:27:11 +00:00
Maxime Fournioux	fe2e10c71b	Add example of helm chart for vllm deployment on k8s (#9199 ) Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>	2024-12-10 09:19:27 +00:00
Gene Der Su	82c73fd510	[Bugfix] cuda error running llama 3.2 (#11047 )	2024-12-10 07:41:11 +00:00
Diego Marinho	bfd610430c	Update README.md (#11034 )	2024-12-09 23:08:10 -08:00
Jeff Cook	e35879c276	[Bugfix] Fix xgrammar failing to read a vocab_size from LlavaConfig on PixtralHF. (#11043 )	2024-12-10 14:54:22 +08:00

3782 Commits