This repository provides a summary of recent advancements in the security landscape surrounding Language Models for Code (also known as Neural Code Models), including backdoor, adversarial attacks, corresponding defenses and so on.
NOTE: We collect the original code from the papers and the code we have reproduced. While, our reproduced code is not guaranteed to be fully accurate and is for reference only. For specific issues, please consult the original authors.
Language Models for Code (CodeLMs) have significantly advanced code-related tasks, they are excel in programming language understanding and generation. Despite their success, CodeLMs are prone to security vulnerabilities, which have become a growing concern. While existing research has explored various attacks and defenses for CodeLMs. To address a systematic review of CodeLM security, this repository organizes the current knowledge on Security Threats and Defense Strategies.
NOTE: Our paper is labeled with 🚩.
The survey analyzes security threats to CodeLMs, categorizing existing attack types such as backdoor and adversarial attacks, and explores their implications for code intelligence tasks.
Year | Conf./Jour. | Paper |
---|---|---|
2024 | CoRR | Security of Language Models for Code: A Systematic Literature Review 🚩 |
2024 | 《软件学报》 | 深度代码模型安全综述 🚩 |
2024 | CoRR | Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code. |
2023 | CoRR | A Survey of Trojans in Neural Models of Source Code: Taxonomy and Techniques. |
According to the document, security threats in CodeLMs are mainly classified into two categories: backdoor attacks and adversarial attacks. Backdoor attacks occur during the training phase, where attackers implant hidden backdoors in the model, allowing it to function normally on benign inputs but behave maliciously when triggered by specific patterns. In contrast, adversarial attacks happen during the testing phase, where carefully crafted perturbations are added to the input, causing the model to make incorrect predictions with high confidence while remaining undetectable to humans.
Backdoor attacks inject malicious behavior into the model during training, allowing the attacker to trigger it at inference time using specific triggers:
- Data poisoning attacks: Slight changes to the training data that cause backdoor behavior.
Year | Conf./Jour. | Paper | Code Repository | Reproduced Repository |
---|---|---|---|---|
2024 | ISSTA | FDI: Attack Neural Code Generation Systems through User Feedback Channel. | ||
2024 | TSE | Stealthy Backdoor Attack for Code Models. | ||
2024 | SP | Trojanpuzzle: Covertly Poisoning Code-Suggestion Models. | ||
2024 | TOSEM | Poison Attack and Poison Detection on Deep Source Code Processing Models. | ||
2023 | ICPC | Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks. | ||
2023 | ACL | Backdooring Neural Code Search. 🚩 | ||
2022 | ICPR | Backdoors in Neural Models of Source Code. | ||
2022 | FSE | You See What I Want You to See: Poisoning Vulnerabilities in Neural Code Search. | ||
2021 | USENIX Security | Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers. | ||
2021 | USENIX Security | You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion. |
- Model poisoning attacks: Changes that do not alter the functionality of the code but trick the model.
Year | Conf./Jour. | Paper | Code Repository | Reproduced Repository |
---|---|---|---|---|
2024 | Internetware | LateBA: Latent Backdoor Attack on Deep Bug Search via Infrequent Execution Codes. | ||
2023 | CoRR | BadCS: A Backdoor Attack Framework for Code search. | ||
2023 | ACL | Multi-target Backdoor Attacks for Code Pre-trained Models. | ||
2023 | USENIX Security | PELICAN: Exploiting Backdoors of Naturally Trained Deep Learning Models In Binary Code Analysis. | ||
2021 | USENIX Security | You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion. |
These attacks manipulate the input data to deceive the model into making incorrect predictions. Including two categories:
- White-box attacks: Attackers have complete knowledge of the target model, including model structure, weight parameters, and training data.
Year | Conf./Jour. | Paper | Code Repository | Reproduced Repository |
---|---|---|---|---|
2023 | BdCloud | AdvBinSD: Poisoning the Binary Code Similarity Detector via Isolated Instruction Sequences. | ||
2023 | CoRR | Adversarial Attacks against Binary Similarity Systems. | ||
2023 | SANER | How Robust Is a Large Pre-trained Language Model for Code Generation𝑓 A Case on Attacking GPT2. | ||
2022 | TOSEM | Towards Robustness of Deep Program Processing Models - Detection, Estimation, and Enhancement. | ||
2022 | ICECCS | Generating Adversarial Source Programs Using Important Tokens-based Structural Transformations. | ||
2021 | ICLR | Generating Adversarial Computer Programs using Optimized Obfuscations. | ||
2020 | OOPSLA | Adversarial Examples for Models of Code. | ||
2020 | ICML | Adversarial Robustness for Code. | ||
2018 | CoRR | Adversarial Binaries for Authorship Identification. |
- Black-box attacks: Adversaries attackers can only generate adversarial examples by obtaining limited model outputs through model queries.
In response to the growing security threats, researchers have proposed various defense mechanisms:
Methods for defending against backdoor attacks include:
Year | Conf./Jour. | Paper | Code Reporisty | Reproduced Reporisty |
---|---|---|---|---|
2024 | TOSEM | Poison Attack and Poison Detection on Deep Source Code Processing Models. | ||
2024 | CoRR | Eliminating Backdoors in Neural Code Models via Trigger Inversion. 🚩 | ||
2024 | CoRR | Defending Code Language Models against Backdoor Attacks with Deceptive Cross-Entropy Loss. | ||
2023 | CoRR | Occlusion-based Detection of Trojan-triggering Inputs in Large Language Models of Code. | ||
2022 | ICPR | Backdoors in Neural Models of Source Code. |
Approaches to counter adversarial attacks include:
Year | Conf./Jour. | Paper | Code Reporisty | Reproduced Reporisty |
---|---|---|---|---|
2024 | TOSEM | How Important Are Good Method Namesin NeuralCode Generation? A Model Robustness Perspective. | ||
2023 | ICSE | RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation. | ||
2023 | PACM PL | Discrete Adversarial Attack to Models of Code. | ||
2023 | CCS | Large Language Models for Code: Security Hardening and Adversarial Testing. | ||
2023 | CoRR | Enhancing Robustness of AI Offensive Code Generators via Data Augmentation. | ||
2022 | SANER | Semantic Robustness of Models of Source Code. | ||
2022 | COLING | Semantic-Preserving Adversarial Code Comprehension. |
If you find this repository useful for your work, please include the following citation:
@article{2024-Security-of-CodeLMs,
title={Security of Language Models for Code: A Systematic Literature Review},
author={Yuchen Chen and Weisong Sun and Chunrong Fang and Zhenpeng Chen and Yifei Ge and Tingxu Han and Quanjun Zhang and Yang Liu and Zhenyu Chen and Baowen Xu},
journal={arXiv preprint arXiv:2410.15631},
year={2024}
}