A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1

Mohamed bin Zayed University of Artificial Intelligence


teaser image

Figure 1: Example responses from commercial LVLMs to targeted attacks generated by our method.

Overview of our method

teaser image

Figure 2: Illustration of our proposed framework. Our method is based on two components: Local-to-Global or Local-to-Local Matching (LM) and Model Ensemble (ENS). LM is the core of our approach, which helps to refine the local semantics of the perturbation. ENS helps to avoid overly relying on single models embedding similarity, thus improving attack transferability.

Experimental Results

teaser image

Table 1: Comparison with the state-of-the-art approaches.

teaser image

Table 2: Ablation study on the impact of ϵ.

Visualization of perturbations and adversarial samples

teaser image

Figure 3: Left: visualization of perturbations generated by models with local-to-global matching. The ensemble integrates these complementary strengths for high-quality perturbation. Right: visualization of perturbation generated by other methods.

teaser image

Figure 4: Visualization of adversarial samples generated by different methods.

BibTeX


    @article{li2025frustratingly,
      title={A frustratingly simple yet highly effective attack baseline: Over 90\% success rate against the strong black-box models of gpt-4.5/4o/o1},
      author={Li, Zhaoyi and Zhao, Xiaohan and Wu, Dong-Dong and Cui, Jiacheng and Shen, Zhiqiang},
      journal={arXiv preprint arXiv:2503.10635},
      year={2025}
    }