youngkingdom/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Mor Zusman	fdd9daafa3	[Kernel/Model] Migrate mamba_ssm and causal_conv1d kernels to vLLM (#7651)	2024-08-28 15:06:52 -07:00
Stas Bekman	8c56e57def	[Doc] fix 404 link (#7966)	2024-08-28 13:54:23 -07:00
Woosuk Kwon	eeffde1ac0	[TPU] Upgrade PyTorch XLA nightly (#7967)	2024-08-28 13:10:21 -07:00
rasmith	e5697d161c	[Kernel] [Triton] [AMD] Adding Triton implementations awq_dequantize and awq_gemm to support AWQ (#7386)	2024-08-28 15:37:47 -04:00
Pavani Majety	b98cc28f91	[Core][Kernels] Use FlashInfer backend for FP8 KV Cache when available. (#7798) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-08-28 10:01:22 -07:00
Cyrus Leung	ef9baee3c5	[Bugfix][VLM] Fix incompatibility between #7902 and #7230 (#7948)	2024-08-28 08:11:18 -07:00
Stas Bekman	98c12cffe5	[Doc] fix the autoAWQ example (#7937)	2024-08-28 12:12:32 +00:00
youkaichao	f52a43a8b9	[ci][test] fix pp test failure (#7945)	2024-08-28 01:27:07 -07:00
Cody Yu	e3580537a4	[Performance] Enable chunked prefill and prefix caching together (#7753)	2024-08-28 00:36:31 -07:00
Alexander Matveev	f508e03e7f	[Core] Async_output_proc: Add virtual engine support (towards pipeline parallel) (#7911)	2024-08-28 00:02:30 -07:00
Cyrus Leung	51f86bf487	[mypy][CI/Build] Fix mypy errors (#7929)	2024-08-27 23:47:44 -07:00
bnellnm	c166e7e43e	[Bugfix] Allow ScalarType to be compiled with pytorch 2.3 and add checks for registering FakeScalarType and dynamo support. (#7886)	2024-08-27 23:13:45 -04:00
youkaichao	bc6e42a9b1	[hardware][rocm] allow rocm to override default env var (#7926)	2024-08-27 19:50:06 -07:00
Peter Salas	fab5f53e2d	[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt (#7902)	2024-08-28 01:53:56 +00:00
Jonathan Berkhahn	9c71c97ae2	[mypy] Enable mypy type checking for `vllm/core` (#7229)	2024-08-28 07:11:14 +08:00
zifeitong	5340a2dccf	[Model] Add multi-image input support for LLaVA-Next offline inference (#7230)	2024-08-28 07:09:02 +08:00
Philipp Schmid	345be0e244	[benchmark] Update TGI version (#7917)	2024-08-27 15:07:53 -07:00
Dipika Sikka	fc911880cc	[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7766) Co-authored-by: ElizaWszola <eliza@neuralmagic.com>	2024-08-27 15:07:09 -07:00
youkaichao	ed6f002d33	[cuda][misc] error on empty CUDA_VISIBLE_DEVICES (#7924)	2024-08-27 12:06:11 -07:00
Isotr0py	b09c755be8	[Bugfix] Fix phi3v incorrect image_idx when using async engine (#7916)	2024-08-27 17:36:09 +00:00
alexeykondrat	42e932c7d4	[CI/Build][ROCm] Enabling tensorizer tests for ROCm (#7237)	2024-08-27 10:09:13 -07:00
Kunshang Ji	076169f603	[Hardware][Intel GPU] Add intel GPU pipeline parallel support. (#7810)	2024-08-27 10:07:02 -07:00
Isotr0py	9db642138b	[CI/Build][VLM] Cleanup multiple images inputs model test (#7897)	2024-08-27 15:28:30 +00:00
Patrick von Platen	6fc4e6e07a	[Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739)	2024-08-27 12:40:02 +00:00
Cody Yu	9606c7197d	Revert #7509 (#7887)	2024-08-27 00:16:31 -07:00
youkaichao	64cc644425	[core][torch.compile] discard the compile for profiling (#7796)	2024-08-26 21:33:58 -07:00
Nick Hill	39178c7fbc	[Tests] Disable retries and use context manager for openai client (#7565)	2024-08-26 21:33:17 -07:00
Megha Agarwal	2eedede875	[Core] Asynchronous Output Processor (#7049) Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>	2024-08-26 20:53:20 -07:00
Dipika Sikka	015e6cc252	[Misc] Update compressed tensors lifecycle to remove `prefix` from `create_weights` (#7825)	2024-08-26 18:09:34 -06:00
omrishiv	760e9f71a8	[Bugfix] neuron: enable tensor parallelism (#7562) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-08-26 15:13:13 -07:00
youkaichao	05826c887b	[misc] fix custom allreduce p2p cache file generation (#7853)	2024-08-26 15:02:25 -07:00
Dipika Sikka	dd9857f5fa	[Misc] Update `gptq_marlin_24` to use vLLMParameters (#7762) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-26 17:44:54 -04:00
Dipika Sikka	665304092d	[Misc] Update `qqq` to use vLLMParameters (#7805)	2024-08-26 13:16:15 -06:00
Cody Yu	2deb029d11	[Performance][BlockManagerV2] Mark prefix cache block as computed after schedule (#7822)	2024-08-26 11:24:53 -07:00
Cyrus Leung	029c71de11	[CI/Build] Avoid downloading all HF files in `RemoteOpenAIServer` (#7836)	2024-08-26 05:31:10 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	0b769992ec	[Bugfix]: Use float32 for base64 embedding (#7855) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2024-08-26 03:16:38 +00:00
Nick Hill	1856aff4d6	[Spec Decoding] Streamline batch expansion tensor manipulation (#7851)	2024-08-25 15:45:14 -07:00
youkaichao	70c094ade6	[misc][cuda] improve pynvml warning (#7852)	2024-08-25 14:30:09 -07:00
Isotr0py	2059b8d9ca	[Misc] Remove snapshot_download usage in InternVL2 test (#7835)	2024-08-25 15:53:09 +00:00
Isotr0py	8aaf3d5347	[Model][VLM] Support multi-images inputs for Phi-3-vision models (#7783)	2024-08-25 11:51:20 +00:00
zifeitong	80162c44b1	[Bugfix] Fix Phi-3v crash when input images are of certain sizes (#7840)	2024-08-24 18:16:24 -07:00
youkaichao	aab0fcdb63	[ci][test] fix RemoteOpenAIServer (#7838)	2024-08-24 17:31:28 +00:00
youkaichao	ea9fa160e3	[ci][test] exclude model download time in server start time (#7834)	2024-08-24 01:03:27 -07:00
youkaichao	7d9ffa2ae1	[misc][core] lazy import outlines (#7831)	2024-08-24 00:51:38 -07:00
Tyler Rockwood	d81abefd2e	[Frontend] add json_schema support from OpenAI protocol (#7654)	2024-08-23 23:07:24 -07:00
Pooya Davoodi	8da48e4d95	[Frontend] Publish Prometheus metrics in run_batch API (#7641)	2024-08-23 23:04:22 -07:00
Pooya Davoodi	6885fde317	[Bugfix] Fix run_batch logger (#7640)	2024-08-23 13:58:26 -07:00
Alexander Matveev	9db93de20c	[Core] Add multi-step support to LLMEngine (#7789)	2024-08-23 12:45:53 -07:00
Simon Mo	09c7792610	Bump version to v0.5.5 (#7823) v0.5.5	2024-08-23 11:35:33 -07:00
Dipika Sikka	f1df5dbfd6	[Misc] Update `marlin` to use vLLMParameters (#7803)	2024-08-23 14:30:52 -04:00

2490 Commits