vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters — CVE-2026-44223

ID: CVE-2026-44223
Source: GitHub
Vendor: GitHub
Threat: medium
CVSS: 6.5
EPSS: 0.0004

Summary

The `extract_hidden_states` speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a `RuntimeError` that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (`repetition_penalty`, `frequency_penalty`, or `presence_penalty`). A single request with a penalty parameter (e.g.,…
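To gauge whether existing traffic could hit the condition described above, it may help to scan logged request bodies for the three penalty parameters. The following is a minimal sketch, not taken from the advisory: the helper name `uses_penalties` is hypothetical, and the neutral values (1.0 for `repetition_penalty`, 0.0 for the other two) are an assumption about what "not using a penalty" means. The summary does not say whether explicitly passing a neutral value also triggers the crash, so treat a `False` result as a hint rather than proof of safety.

```python
from typing import Any, Mapping

# Assumed neutral values: a request that leaves these unchanged is treated
# here as "not using a penalty". This is an assumption, not from the advisory.
NEUTRAL_PENALTIES = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def uses_penalties(payload: Mapping[str, Any]) -> bool:
    """Return True if the request body sets any penalty to a non-neutral value."""
    for name, neutral in NEUTRAL_PENALTIES.items():
        value = payload.get(name)
        # A missing or null field counts as "not set".
        if value is not None and float(value) != neutral:
            return True
    return False
```

Running this over a sample of request logs gives a rough count of how many requests carry an active penalty, which can inform how urgently to prioritize remediation.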

Product

pip: vllm

What to do

General, cautious steps (verify details in the official source):

  • Review exposure and plan remediation based on risk and environment.
  • Identify affected product versions in your inventory and verify whether you are impacted.
  • Apply vendor patches/updates or recommended mitigations as soon as available.
  • Read the official advisory for exact affected versions and remediation steps.
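As a starting point for the inventory step above, the installed `vllm` version can be read from package metadata using the standard library. This sketch only reports the version; it deliberately does not compare against an affected range, because this page does not state one. The function name `installed_version` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "vllm") -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("vllm is not installed in this environment")
    else:
        # Compare this value against the affected versions listed in the
        # official advisory; the range is not reproduced here.
        print(f"vllm {found} is installed")
```

Run it inside each environment (virtualenv, container image) that serves models, since different deployments may pin different versions.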
