Latest papers

3,770 papers
attack arXiv Apr 10, 2026 · 4d ago

XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers

Israt Jahan Mouri, Muhammad Ridowan, Muhammad Abdullah Adnan · Bangladesh University of Engineering and Technology · TigerIT Bangladesh Ltd.

Non-collusive federated learning poisoning attack where compromised clients independently craft malicious updates without coordination

Data Poisoning Attack federated-learning
PDF
defense The 64th Annual Meeting of the... Apr 10, 2026 · 4d ago

Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection

Zedian Shao, Hongbin Liu, Yuepeng Hu et al. · Georgia Institute of Technology · Duke University

Embeds imperceptible adversarial perturbations in images that force MLLMs to refuse analysis requests, protecting user privacy

Input Manipulation Attack Prompt Injection vision multimodal
PDF
attack arXiv Apr 10, 2026 · 4d ago

Unreal Thinking: Chain-of-Thought Hijacking via Two-stage Backdoor

Wenhan Chang, Tianqing Zhu, Ping Xiong et al. · Zhongnan University of Economics and Law · City University of Macau

Backdoor attack embedding triggers in lightweight adapters that hijack LLM reasoning chains to display malicious thought processes

Model Poisoning AI Supply Chain Attacks Prompt Injection nlp
PDF Code
defense arXiv Apr 10, 2026 · 4d ago

Precise Shield: Explaining and Aligning VLLM Safety via Neuron-Level Guidance

Enyi Shi, Fei Shen, Shuyi Miao et al. · Nanjing University of Science and Technology · National University of Singapore +2 more

Neuron-level defense identifying and fine-tuning safety-critical neurons to improve VLLM robustness against cross-lingual multimodal jailbreaks

Input Manipulation Attack Prompt Injection multimodal nlp vision
PDF
attack arXiv Apr 10, 2026 · 4d ago

Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization

Yuqin Lan, Gen Li, Yuanze Hu et al. · Beihang University · Huazhong University of Science and Technology

Gradient-based ensemble jailbreak attack on closed-source VLMs via multi-view image perturbations and surrogate model aggregation

Input Manipulation Attack Prompt Injection multimodal vision nlp
PDF Code
defense arXiv Apr 10, 2026 · 4d ago

Detecting Diffusion-generated Images via Dynamic Assembly Forests

Mengxin Fu, Yuezun Li · Ocean University of China

Lightweight forest-based detector for diffusion-generated images achieving competitive accuracy with 100x fewer parameters than CNNs

Output Integrity Attack vision generative
PDF Code
defense arXiv Apr 10, 2026 · 4d ago

AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models

Mintong Kang, Chen Fang, Bo Li · University of Illinois Urbana-Champaign

Comprehensive audio safety guardrail detecting harmful sounds, voice impersonation, child voice misuse, and risky voice-content combinations

Input Manipulation Attack Output Integrity Attack Prompt Injection audio nlp multimodal
PDF
benchmark arXiv Apr 10, 2026 · 4d ago

Spectral Geometry of LoRA Adapters Encodes Training Objective and Predicts Harmful Compliance

Roi Paul

Spectral analysis of LoRA weight deltas identifies which fine-tuning objective was used and predicts harmful compliance rates

Transfer Learning Attack Prompt Injection nlp
PDF
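A minimal sketch of the kind of spectral feature this line of work extracts. The function names and statistics here are illustrative assumptions, not the paper's actual pipeline: a LoRA update factors as dW = B @ A, so its singular-value spectrum (and summary statistics such as spectral entropy) can serve as a fingerprint of how the adapter was trained.

```python
import numpy as np

def lora_spectrum(A, B):
    """Singular-value spectrum of a LoRA weight delta dW = B @ A.
    Summary statistics of this spectrum are the kind of feature a
    classifier could use to fingerprint the fine-tuning objective."""
    dW = B @ A                                 # (d_out, d_in) low-rank update
    s = np.linalg.svd(dW, compute_uv=False)
    s = s[s > 1e-12]                           # drop numerically zero modes
    p = s / s.sum()
    entropy = -(p * np.log(p)).sum()           # spectral entropy: how spread the update is
    return s, entropy

rng = np.random.default_rng(0)
r, d_in, d_out = 8, 64, 64                     # hypothetical LoRA rank and layer shape
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
s, H = lora_spectrum(A, B)
# the nonzero spectrum has at most r = 8 modes, since rank(B @ A) <= r
```

Because dW is rank-at-most-r by construction, the whole geometry lives in a small number of modes, which is what makes adapter deltas cheap to analyze.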
defense arXiv Apr 10, 2026 · 4d ago

CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation

Yushi Feng, Junye Du, Qifan Wang et al. · The University of Hong Kong · The Chinese University of Hong Kong +1 more

Conformal risk control framework that provides statistical safety guarantees for autonomous mobile GUI agents against harmful actions

Prompt Injection Excessive Agency multimodal vision
PDF Code
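A minimal sketch of the split-conformal ingredient behind frameworks like this, under assumed names and data: calibrate a risk threshold on held-out benign agent actions so that, by the standard finite-sample conformal guarantee, at most a chosen fraction alpha of comparable actions exceed it. This is not CORA's actual procedure, only the generic quantile calibration it builds on.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile: block any action whose risk score exceeds
    the finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n     # finite-sample correction
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

rng = np.random.default_rng(0)
cal = rng.uniform(0.0, 1.0, 500)               # hypothetical risk scores of benign calibration actions
tau = conformal_threshold(cal, alpha=0.1)
# at deployment: execute an action only if its risk score is <= tau,
# otherwise defer to the user
```

The appeal of the conformal framing is that the ≤ alpha error guarantee holds for any underlying risk scorer, as long as calibration and deployment actions are exchangeable.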
defense arXiv Apr 10, 2026 · 4d ago

Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs

Jinqi Luo, Jinyu Yang, Tal Neiman et al. · University of Pennsylvania · Amazon +1 more

Activation steering defense using sparse autoencoders and concept dictionaries to safeguard multimodal LLMs against jailbreaks

Prompt Injection nlp vision multimodal
PDF
attack arXiv Apr 10, 2026 · 4d ago

GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking

Yunqiang Wang, Hengyuan Na, Di Wu et al. · Sun Yat-Sen University

Frequency-selective adversarial audio attack jailbreaking ALLMs while preserving transcription quality via gradient-ratio band masking

Input Manipulation Attack Prompt Injection audio multimodal nlp
PDF
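A toy illustration of frequency-selective gradient masking, with all names and the ratio heuristic assumed rather than taken from the paper: keep the adversarial perturbation only in frequency bins where the attack gradient is large relative to the utility (transcription) gradient, so utility-critical bands are left untouched.

```python
import numpy as np

def band_mask(grad_attack, grad_utility, keep_ratio=0.5):
    """Hypothetical gradient-ratio masking sketch: retain only the
    frequency bins where the attack gradient dominates the utility
    gradient, then return the masked gradient in the time domain."""
    Ga = np.abs(np.fft.rfft(grad_attack))
    Gu = np.abs(np.fft.rfft(grad_utility))
    ratio = Ga / (Gu + 1e-8)                   # per-bin attack/utility ratio
    k = int(len(ratio) * keep_ratio)
    keep = np.argsort(ratio)[-k:]              # bins safest to perturb
    mask = np.zeros_like(ratio)
    mask[keep] = 1.0
    masked_spec = np.fft.rfft(grad_attack) * mask
    return np.fft.irfft(masked_spec, n=len(grad_attack))

x = np.random.default_rng(1).normal(size=256)  # stand-in attack gradient
masked = band_mask(x, np.roll(x, 3))           # stand-in utility gradient
# zeroing bins can only shrink the signal (Parseval), so the masked
# gradient is a strictly band-limited version of the original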
attack arXiv Apr 10, 2026 · 4d ago

BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning

Guiyao Tie, Jiawen Shi, Pan Zhou et al. · Huazhong University of Science and Technology · Lehigh University

Backdoor attack embedding trojaned classifiers in agent skills that activate malicious payloads via semantic trigger combinations in routine parameters

Model Poisoning AI Supply Chain Attacks Excessive Agency nlp
PDF
defense arXiv Apr 10, 2026 · 4d ago

Efficient Unlearning through Maximizing Relearning Convergence Delay

Khoa Tran, Simon S. Woo · Sungkyunkwan University · Secure Machines Lab

Machine unlearning method evaluated by measuring how long it takes adversaries to relearn forgotten data from unlearned models

Model Inversion Attack vision
PDF
defense arXiv Apr 10, 2026 · 4d ago

CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

Akshit Jindal, Saket Anand, Chetan Arora et al. · IIIT Delhi · IIT Delhi

Detects backdoors in prompt-tuned CLIP via OOD trigger inversion, achieving 94% detection accuracy and enabling model repair

Model Poisoning Transfer Learning Attack vision multimodal
PDF
benchmark arXiv Apr 10, 2026 · 4d ago

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism

Hadas Orgad, Boyi Wei, Kaden Zheng et al. · Harvard University · Princeton University +2 more

Discovers that LLM harmful content generation relies on a compact, unified set of weights distinct from benign capabilities, explaining jailbreak brittleness and emergent misalignment

Transfer Learning Attack Prompt Injection nlp
PDF
attack arXiv Apr 9, 2026 · 5d ago

Phantasia: Context-Adaptive Backdoors in Vision Language Models

Nam Duong Tran, Phi Le Nguyen · Hanoi University of Science and Technology

Context-adaptive backdoor attack on VLMs that generates semantically coherent malicious responses, evading output-based defenses that detect fixed patterns

Model Poisoning Input Manipulation Attack Prompt Injection multimodal vision nlp
PDF
defense arXiv Apr 9, 2026 · 5d ago

SyncBreaker: Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation

Wenli Zhang, Xianglong Shi, Sirui Zhao et al. · University of Science and Technology of China · Beijing University of Technology

Multimodal adversarial protection that perturbs both portrait images and audio to break lip-sync in deepfake talking-head videos

Input Manipulation Attack Output Integrity Attack multimodal audio vision generative
PDF Code
attack arXiv Apr 9, 2026 · 5d ago

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

Hanzhi Liu, Chaofan Shou, Hongbo Wen et al. · University of California · Fuzzland +1 more

Malicious LLM API routers inject code into tool calls and steal credentials from agent frameworks in the wild

AI Supply Chain Attacks Insecure Plugin Design Sensitive Information Disclosure nlp
PDF
defense arXiv Apr 9, 2026 · 5d ago

TADP-RME: A Trust-Adaptive Differential Privacy Framework for Enhancing Reliability of Data-Driven Systems

Labani Halder, Payel Sadhukhan, Sarbani Palit · Indian Statistical Institute · Army Institute of Management

Trust-adaptive differential privacy with geometric transformation to defend against inference attacks on ML training data

Model Inversion Attack Membership Inference Attack tabular
PDF
attack arXiv Apr 9, 2026 · 5d ago

Preference Redirection via Attention Concentration: An Attack on Computer Use Agents

Dominik Seip, Matthias Hein · University of Tübingen

Adversarial visual patch attack on computer use agents that manipulates attention to redirect product selection decisions on trusted websites

Input Manipulation Attack Prompt Injection multimodal vision
PDF