A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1

Mohamed bin Zayed University of Artificial Intelligence

Paper Code Hugging Face

Figure 1: Example responses from commercial LVLMs to targeted attacks generated by our method.

Overview of our method

Figure 2: Illustration of our proposed framework. Our method is based on two components: Local-to-Global or Local-to-Local Matching (LM) and Model Ensemble (ENS). LM is the core of our approach, which helps to refine the local semantics of the perturbation. ENS helps to avoid overly relying on single models embedding similarity, thus improving attack transferability.

Experimental Results

Table 1: Comparison with the state-of-the-art approaches.

Table 2: Ablation study on the impact of ϵ.

Visualization of perturbations and adversarial samples

Figure 3: Left: visualization of perturbations generated by models with local-to-global matching. The ensemble integrates these complementary strengths for high-quality perturbation. Right: visualization of perturbation generated by other methods.

Figure 4: Visualization of adversarial samples generated by different methods.

BibTeX


    @article{li2025frustratingly,
      title={A frustratingly simple yet highly effective attack baseline: Over 90\% success rate against the strong black-box models of gpt-4.5/4o/o1},
      author={Li, Zhaoyi and Zhao, Xiaohan and Wu, Dong-Dong and Cui, Jiacheng and Shen, Zhiqiang},
      journal={arXiv preprint arXiv:2503.10635},
      year={2025}
    }