vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters — CVE-2026-44223
ID
CVE-2026-44223
Source
GitHub
Vendor
GitHub
Threat
medium
CVSS
6.5
EPSS
0.0004
Summary
The `extract_hidden_states` speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a `RuntimeError` that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (`repetition_penalty`, `frequency_penalty`, or `presence_penalty`). A single request with a penalty parameter (e.g.,…
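To gauge exposure, it can help to see which incoming requests carry any of the three penalty parameters named above. The following is a minimal sketch, not part of vLLM: the helper `active_penalties` and the example request body are hypothetical, and the neutral values (1.0 for `repetition_penalty`, 0.0 for the other two) are an assumption about which settings leave sampling unpenalized. The summary does not say whether a neutral value also triggers the crash, so treat the result as a rough signal only.

```python
# Hypothetical helper for log or gateway inspection; not a vLLM API.
# Assumed neutral values: 1.0 for repetition_penalty, 0.0 for the others.
NEUTRAL_PENALTIES = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def active_penalties(request_body: dict) -> dict:
    """Return the penalty parameters in a request that differ from neutral."""
    found = {}
    for name, neutral in NEUTRAL_PENALTIES.items():
        value = request_body.get(name)
        if value is not None and float(value) != neutral:
            found[name] = value
    return found


# Illustrative request body (model name and prompt are placeholders).
example = {"model": "my-model", "prompt": "Hello", "frequency_penalty": 0.5}
print(active_penalties(example))
```

A non-empty result marks a request that, per the summary, could crash a server running the `extract_hidden_states` proposer.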
Product
pip: vllm
What to do
General, cautious steps (verify details in the official source):
- Review exposure and plan remediation based on risk and environment.
- Identify affected product versions in your inventory and verify whether you are impacted.
- Apply vendor patches/updates or recommended mitigations as soon as available.
- Read the official advisory for exact affected versions and remediation steps.
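For the inventory step, the installed `vllm` version in a given Python environment can be read from package metadata. This is a minimal sketch using the standard library; the helper name `installed_version` is hypothetical, and the affected version range is deliberately not encoded here because it must come from the official advisory.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "vllm"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("vllm is not installed in this environment")
else:
    # Compare this value against the affected range in the official advisory.
    print(f"vllm {found} is installed")
```

Run it once per environment or container image that serves models, since each may pin a different version.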