Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

VILA Lab, Department of Machine Learning, MBZUAI
M-Attack-V2 improvement over M-Attack

M-Attack-V2 significantly improves attack success rate (ASR) and keyword matching rate (KMR) over M-Attack across state-of-the-art commercial black-box models including Claude 4, Gemini 2.5, and GPT-5.

Unexpectedly Low Gradient Similarity

Local-level matching methods show near-zero cosine similarity between gradients of consecutive iterations, even when the crops overlap substantially. Two factors cause this. The first is the translation sensitivity of ViTs. The second is an overlooked asymmetry: source crops reshape the pixel-space gradient landscape, while target crops only shift the feature-space reference.

Gradient cosine similarity analysis.

(a) Gradient similarity vs. IoU between two crops. (b) Cosine similarity of consecutive source gradients across iterations.
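The two quantities on the axes of panel (a) can be computed with a couple of small helpers. The sketch below is illustrative only. It assumes crops are axis-aligned boxes `(x0, y0, x1, y1)` and that the pixel-space gradients were already obtained elsewhere, for example by autodiff through a surrogate encoder. The names `crop_iou` and `grad_cosine` are hypothetical.

```python
import numpy as np

def crop_iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned crops given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap region (zero if the crops are disjoint).
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def grad_cosine(g1, g2, eps=1e-12):
    """Cosine similarity between two pixel-space gradients, flattened to vectors."""
    a, b = np.ravel(g1), np.ravel(g2)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```

For example, two 10×10 crops offset by half their width give `crop_iou((0, 0, 10, 10), (5, 0, 15, 10)) == 1/3`. Under the observation above, `grad_cosine` applied to gradients from such overlapping crops would still be close to zero.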

Asymmetric Matching over Expectation

We reformulate the objective as an expectation over local transformations within an asymmetric framework:

$$\min_{\lVert \mathbf{X}_\text{sou} \rVert_p \le \epsilon} \mathbb{E}_{\mathcal{T} \sim \mathcal{D},\, y \sim \mathcal{Y}} \left[ \mathcal{L}\!\left(f\!\left(\mathcal{T}(\mathbf{X}_{\text{sou}})\right),\, y\right) \right]$$

where $\mathcal{D}$ is the distribution of local transformations and $\mathcal{Y}$ is the target semantic distribution. The formulation makes the asymmetry explicit: the target content $y$ must be embedded into a locally transformed source $\mathcal{T}(\mathbf{X}_{\text{sou}})$. Our two enhancements, Multi-Crop Alignment (MCA) and Auxiliary Target Alignment (ATA), improve the estimate of this expectation and the quality of the samples drawn from $\mathcal{Y}$, respectively.
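For a fixed $y$, the expectation over $\mathcal{D}$ can be approximated by Monte Carlo sampling of $K$ transformations. This is a standard estimator, shown here as a sketch. It assumes that the gradient may be exchanged with the expectation and that the $\mathcal{T}_k$ are drawn i.i.d. Under those assumptions, the sample-mean gradient is unbiased and its covariance shrinks as $1/K$. Setting $K = 1$ corresponds to single-crop alignment.

$$\nabla_{\mathbf{X}_\text{sou}} \mathbb{E}_{\mathcal{T} \sim \mathcal{D}} \left[ \mathcal{L}\!\left(f\!\left(\mathcal{T}(\mathbf{X}_{\text{sou}})\right),\, y\right) \right] \approx \hat{\mathbf{g}}_K = \frac{1}{K} \sum_{k=1}^{K} \mathbf{g}_k, \qquad \mathbf{g}_k = \nabla_{\mathbf{X}_\text{sou}} \mathcal{L}\!\left(f\!\left(\mathcal{T}_k(\mathbf{X}_{\text{sou}})\right),\, y\right), \quad \mathcal{T}_k \overset{\text{i.i.d.}}{\sim} \mathcal{D}$$

$$\mathbb{E}\big[\hat{\mathbf{g}}_K\big] = \mathbb{E}\big[\mathbf{g}_1\big], \qquad \operatorname{Cov}\big[\hat{\mathbf{g}}_K\big] = \frac{1}{K}\operatorname{Cov}\big[\mathbf{g}_1\big]$$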

Gradient Denoising via Multi-Crop Alignment

MCA averages gradients from $K$ independent crops per iteration, yielding a low-variance estimate of the expected gradient. This produces smoother gradient patterns and accelerates convergence compared to single-crop alignment.

Multi-Crop Alignment comparison.

(a) Optimization trajectories with different K. (b) Gradient patterns: single-crop (M-Attack) vs. multi-crop (M-Attack-V2).
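The averaging step in MCA can be illustrated on a toy problem. This is a minimal sketch, not the authors' implementation. It makes three assumptions. First, a "crop" is modeled as a binary mask that keeps one random window, so every crop has the same shape. Second, the encoder is a hypothetical linear map `W`. Third, the loss is a squared distance, chosen because its gradient has a closed form. The functions `random_crop_mask`, `crop_loss_grad`, and `mca_grad` are hypothetical names. A linear toy encoder is not translation-sensitive the way a ViT is, so the toy only shows how averaging over $K$ crops reduces the variance of the gradient estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_mask(h, w, rng, min_frac=0.5):
    """Hypothetical local transform T: keep one random window (>= min_frac per side), zero the rest."""
    ch = rng.integers(int(min_frac * h), h + 1)
    cw = rng.integers(int(min_frac * w), w + 1)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    mask = np.zeros((h, w))
    mask[top:top + ch, left:left + cw] = 1.0
    return mask

def crop_loss_grad(x, mask, W, y):
    """Gradient w.r.t. x of L = ||W vec(mask * x) - y||^2 for a toy linear encoder W."""
    z = W @ (mask * x).ravel()                   # toy embedding of the cropped source
    dz = 2.0 * (z - y)                           # dL/dz
    return mask * (W.T @ dz).reshape(x.shape)    # chain rule back through the mask

def mca_grad(x, W, y, K, rng):
    """Multi-Crop Alignment: average the gradients of K independently sampled crops."""
    grads = [crop_loss_grad(x, random_crop_mask(*x.shape, rng), W, y) for _ in range(K)]
    return np.mean(grads, axis=0)

# Toy usage: how consistent are two consecutive gradient estimates for K = 1 vs. K = 16?
x = rng.standard_normal((16, 16))
W = rng.standard_normal((8, 16 * 16))
y = rng.standard_normal(8)
for K in (1, 16):
    g1, g2 = mca_grad(x, W, y, K, rng), mca_grad(x, W, y, K, rng)
    cos = float(np.sum(g1 * g2) / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12))
    print(f"K={K:2d}  cosine(g_t, g_t+1) = {cos:.3f}")
```

In a real attack, `crop_loss_grad` would be replaced by autodiff through the surrogate encoder(s), and the masks by actual crop-and-resize transforms.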

Auxiliary Target Alignment

Selecting a representative target embedding $y \in \mathcal{Y}$ is challenging because $\mathcal{Y}$ is unobservable. M-Attack explores $\mathcal{Y}$ through transformed views of the target image. However, aggressive crops drift too far from the target semantics, while conservative ones provide little new signal.

ATA introduces $P$ auxiliary images $\{\mathbf{X}_\text{aux}^{(p)}\}_{p=1}^P$ as additional semantic anchors. With mild transformations $\tilde{\mathcal{T}} \sim \tilde{\mathcal{D}}$ applied to each anchor, the combined objective becomes:

$$\hat{\mathcal{L}} = \frac{1}{K} \sum_{k=1}^{K} \Big[ \mathcal{L}(f(\mathcal{T}_k(\mathbf{X}_\text{sou})), y_0) + \frac{\lambda}{P} \sum_{p=1}^{P} \mathcal{L}(f(\mathcal{T}_k(\mathbf{X}_{\text{sou}})), \tilde{y}_p) \Big]$$

where $y_0 = f(\hat{\mathcal{T}}_0(\mathbf{X}_\text{tar}))$, $\tilde{y}_p = f(\tilde{\mathcal{T}}_p(\mathbf{X}_\text{aux}^{(p)}))$, and $\lambda \in [0,1]$ trades off target fidelity against auxiliary diversity ($\lambda = 0$ recovers target-only alignment). ATA achieves a better exploration-exploitation balance. Rather than spending its shift budget on aggressive target crops, it directs exploration toward semantically meaningful neighbors supplied by the auxiliary set.
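The combined objective $\hat{\mathcal{L}}$ can be evaluated directly from precomputed embeddings. The sketch below is an illustration under one stated assumption: the per-pair loss $\mathcal{L}$ is taken to be one minus cosine similarity, since the text above does not specify $\mathcal{L}$. The function names `cos_loss` and `ata_objective` are hypothetical. Rows of `src_emb` stand for $f(\mathcal{T}_k(\mathbf{X}_\text{sou}))$, and rows of `aux_emb` stand for $\tilde{y}_p$.

```python
import numpy as np

def cos_loss(a, b, eps=1e-12):
    """Assumed per-pair loss L: 1 - cosine similarity along the last axis (broadcasts)."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return 1.0 - np.sum(a * b, axis=-1)

def ata_objective(src_emb, y0, aux_emb, lam):
    """Evaluate L_hat for the ATA objective.

    src_emb: (K, d) embeddings of K transformed source crops
    y0:      (d,)   target embedding
    aux_emb: (P, d) auxiliary-anchor embeddings
    lam:     weight in [0, 1] on the auxiliary term
    """
    target_term = cos_loss(src_emb, y0[None, :])                    # (K,)
    aux_term = cos_loss(src_emb[:, None, :], aux_emb[None, :, :])   # (K, P)
    per_crop = target_term + lam * aux_term.mean(axis=1)            # (lam / P) * sum over p
    return float(per_crop.mean())                                   # (1 / K) * sum over k
```

For instance, a single source crop that matches $y_0$ exactly and is orthogonal to a single auxiliary anchor gives $\hat{\mathcal{L}} \approx \lambda$. Using `mean` over the auxiliary axis implements the $\lambda / P$ weighting without passing $P$ explicitly.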

Algorithm: M-Attack-V2 pipeline.
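The toy pieces can be combined into an end-to-end attack loop. This is a heavily simplified sketch, not the pipeline shown in the algorithm figure. It relies on the following assumptions:

- A hypothetical linear encoder `W` stands in for the surrogate models.
- Crops are binary masks.
- $\mathcal{L}$ is one minus cosine similarity, with its gradient written out in closed form.
- The target and auxiliary embeddings are held fixed rather than re-sampled every iteration.
- The update is a signed gradient step projected onto an $L_\infty$ ball of radius `eps`. This update rule is common in transfer attacks, but it is assumed here rather than taken from the source.

The defaults for `eps`, `alpha`, `steps`, `K`, and `lam` are illustrative only. `m_attack_v2_sketch` and `cos_loss_and_grad` are hypothetical names.

```python
import numpy as np

def cos_loss_and_grad(z, t, eps=1e-12):
    """Assumed loss L = 1 - cos(z, t) and its closed-form gradient dL/dz."""
    nz, nt = np.linalg.norm(z) + eps, np.linalg.norm(t) + eps
    cos = float(z @ t) / (nz * nt)
    return 1.0 - cos, -(t / (nz * nt) - cos * z / nz**2)

def m_attack_v2_sketch(x_clean, W, y0, aux, eps=8/255, alpha=1/255,
                       steps=50, K=4, lam=0.3, seed=0):
    """Toy MCA + ATA loop with signed steps projected onto an L_inf ball (assumed)."""
    rng = np.random.default_rng(seed)
    h, w = x_clean.shape
    x = x_clean.copy()
    P = len(aux)
    for _step in range(steps):
        g = np.zeros_like(x)
        for _k in range(K):  # MCA: average gradients over K random crops
            ch, cw = rng.integers(h // 2, h + 1), rng.integers(w // 2, w + 1)
            top, left = rng.integers(0, h - ch + 1), rng.integers(0, w - cw + 1)
            m = np.zeros_like(x)
            m[top:top + ch, left:left + cw] = 1.0
            z = W @ (m * x).ravel()                  # toy embedding of the crop
            _, dz = cos_loss_and_grad(z, y0)         # target term
            for y_p in aux:                          # ATA: auxiliary terms weighted by lam / P
                _, dz_p = cos_loss_and_grad(z, y_p)
                dz = dz + (lam / P) * dz_p
            g += m * (W.T @ dz).reshape(x.shape) / K
        x = x - alpha * np.sign(g)                   # descend on the combined loss
        x = np.clip(x, x_clean - eps, x_clean + eps) # project onto the L_inf ball
        x = np.clip(x, 0.0, 1.0)                     # keep a valid pixel range
    return x
```

Because `x_clean` lies in $[0, 1]$, clipping to the pixel range after the ball projection keeps the result within `eps` of the clean image.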

Experimental Results

M-Attack-V2 consistently outperforms all existing methods across GPT-5, Claude 4.0-thinking, and Gemini 2.5-Pro, achieving the highest attack success rates with strong imperceptibility.

Main results comparison table.

Comparison with state-of-the-art approaches on commercial black-box LVLMs.

Visualization of Adversarial Samples

Visual comparison across methods. M-Attack-V2 produces perturbations that are both more effective and less perceptible.

Adversarial sample visualization.

BibTeX

@article{zhao2026pushingfrontierblackboxlvlm,
  title={Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting},
  author={Zhao, Xiaohan and Li, Zhaoyi and Luo, Yaxin and Cui, Jiacheng and Shen, Zhiqiang},
  journal={arXiv preprint arXiv:2602.17645},
  year={2026}
}