In July 2023, we evaluated ChatGPT's smart contract vulnerability detection capabilities using 134 smart contracts (compiled in this GitHub repo) that were known to contain exploitable vulnerabilities. We sorted the contracts into 41 groups based on the types of vulnerabilities they contained, then fed them into ChatGPT and prompted it to find the vulnerabilities*.
How well did ChatGPT perform?
ChatGPT always successfully identified the following types of smart contract vulnerabilities:
Impressive? Not Really...
In most of these cases, the vulnerability could have been identified simply by looking for certain functions or code patterns. For example, scanning code for the use of tx.origin or for a particular pragma version would identify certain types of vulnerabilities. Overall, ChatGPT's detection rate was around 75% across all vulnerability types and all tested smart contracts. However, we found that ChatGPT identifies vulnerabilities better when asked whether a code sample contains a specific vulnerability (e.g., reentrancy) than when asked to find all vulnerabilities in a piece of code.
In fact, the specific vulnerability prompt increased ChatGPT-4's detection accuracy from 76.1% to 86.6%!
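To illustrate how mechanical the pattern-based checks mentioned above can be, here is a minimal sketch of a scanner that flags tx.origin and pragma lines. The rule names, the function scan_source, the sample contract, and the "older than 0.8" threshold are assumptions made for illustration and were not part of the study. The check is purely textual, so it would also match occurrences inside comments.

```python
import re

# Illustrative patterns only; a real analyzer would parse the Solidity source.
TX_ORIGIN = re.compile(r"\btx\.origin\b")
PRAGMA = re.compile(r"pragma\s+solidity\s+([^;]+);")
VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def scan_source(source):
    """Return (line number, finding) pairs for naive text-level checks."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if TX_ORIGIN.search(line):
            findings.append((lineno, "tx.origin used"))
        pragma = PRAGMA.search(line)
        if pragma:
            spec = pragma.group(1).strip()
            # A caret or range operator means the compiler version is not pinned.
            if spec.startswith("^") or spec.startswith(">"):
                findings.append((lineno, "floating pragma"))
            version = VERSION.search(spec)
            # Assumed threshold for this sketch: treat anything below 0.8 as old.
            if version and (int(version.group(1)), int(version.group(2))) < (0, 8):
                findings.append((lineno, "compiler older than 0.8"))
    return findings


# Hypothetical contract used only to exercise the scanner.
SAMPLE = """pragma solidity ^0.4.24;
contract Wallet {
    address owner;
    function withdraw() public {
        require(tx.origin == owner);
    }
}
"""

for finding in scan_source(SAMPLE):
    print(finding)
```

A check like this needs no understanding of what the contract does, which is why always detecting these vulnerability classes is a low bar.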
While ChatGPT is very effective at finding certain types of vulnerabilities, it struggles to understand how Solidity and the EVM work.
In general, there are certain types of smart contract vulnerabilities that ChatGPT struggles to find, regardless of the ChatGPT version tested:
Abuse of Global Semantic
Insufficient Gas Griefing
Storage Collisions
Hash collisions with multiple variable length arguments
Reference to an external malicious contract
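One of the categories above, hash collisions with multiple variable-length arguments, is easier to see with a small simulation. The sketch below is not Solidity and was not part of the study: encode_packed is a hypothetical helper that imitates a packed encoding (dynamic values concatenated with no length information), encode_with_lengths is a hypothetical length-prefixed alternative, and SHA-256 stands in for the EVM's keccak256, which Python's hashlib does not provide.

```python
import hashlib


def encode_packed(*parts):
    """Concatenate byte strings with no length prefix or padding."""
    return b"".join(parts)


def encode_with_lengths(*parts):
    """Prefix every part with its length so argument boundaries are preserved."""
    return b"".join(len(p).to_bytes(32, "big") + p for p in parts)


def digest(data):
    # Stand-in hash for illustration; the EVM itself uses keccak256.
    return hashlib.sha256(data).hexdigest()


# Two different argument lists that produce the same packed bytes.
first = (b"ab", b"c")
second = (b"a", b"bc")

print(digest(encode_packed(*first)) == digest(encode_packed(*second)))
print(digest(encode_with_lengths(*first)) == digest(encode_with_lengths(*second)))
```

The first comparison is True because both argument lists collapse to the same bytes before hashing; the second is False because the length prefixes keep the boundaries distinct. Spotting this requires reasoning about how the arguments are encoded, not just matching a keyword.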
Furthermore, different versions of ChatGPT struggle with different vulnerabilities.
ChatGPT-3.5 struggled to identify:
ChatGPT-4 struggled to identify:
ChatGPT was able to identify certain vulnerabilities, such as variable shadowing or bad randomness, with 100% accuracy within smart contracts. However, it tends to struggle with different variations of an attack. For example, most instances of read-only and cross-function reentrancy were not detected during the study. Similarly, ChatGPT overlooked DoS by external calls without gas stipends in 2 out of 3 prompts.
CTF, or Capture The Flag, is a type of cybersecurity exercise where participants engage in solving security-related challenges, simulating real-world scenarios to enhance their skills and knowledge in protecting systems against cyber threats.
In our study, we found that ChatGPT could completely solve CTF challenges 43.3% of the time.
These results depend largely on the complexity of the CTF, with more complex challenges having lower success rates.
Also, ChatGPT has much higher success rates if the CTF and its solution were published before ChatGPT was trained and, therefore, were part of the tool's training data set.
Top 3 Tips for Using ChatGPT to Detect Smart Contract Vulnerabilities:
Because ChatGPT was trained on data from 2021 and earlier, it cannot be relied on to identify all issues within a smart contract.
Organizations should always work with security experts like Halborn to supplement and enhance their protection.
Run multiple versions of ChatGPT when analyzing a smart contract to improve the probability of detection.
Be specific with your prompts: ask ChatGPT if a code sample contains a specific vulnerability (e.g., reentrancy). This not only increases the accuracy of its findings but also speeds up the process of identifying a vulnerability in the code.
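The last two tips can be combined in a small harness: ask one targeted question per vulnerability class, put it to every available model version, and record which versions flagged which class. The sketch below is a hypothetical outline, not the tooling used in the study. build_prompt, detect, the prompt wording, and the two stub models are all assumptions; each stub is a plain function standing in for a call to a real model so that the example runs offline.

```python
def build_prompt(vulnerability, source):
    """Ask about one vulnerability class instead of 'find everything'."""
    return (
        f"Does the following smart contract contain a {vulnerability} vulnerability? "
        "Answer YES or NO on the first line, then explain.\n\n" + source
    )


def detect(models, vulnerabilities, source):
    """Map each vulnerability to the set of model names that answered YES."""
    flagged = {}
    for vulnerability in vulnerabilities:
        prompt = build_prompt(vulnerability, source)
        for name, ask in models.items():
            answer = ask(prompt)
            if answer.strip().upper().startswith("YES"):
                flagged.setdefault(vulnerability, set()).add(name)
    return flagged


# Stub "models" for illustration; they only look at the question line.
def stub_model_a(prompt):
    question = prompt.splitlines()[0]
    return "YES\nState is updated after the external call." if "reentrancy" in question else "NO"


def stub_model_b(prompt):
    question = prompt.splitlines()[0]
    if "reentrancy" in question or "tx.origin" in question:
        return "YES"
    return "NO"


models = {"model-a": stub_model_a, "model-b": stub_model_b}
checks = ["reentrancy", "tx.origin authentication", "bad randomness"]
result = detect(models, checks, "contract Wallet { /* source under review */ }")
print(result)
```

Keeping the per-model answers, rather than a single merged verdict, makes it easy to see where versions disagree. Those disagreements are natural candidates for manual review.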
Don't let your platform fall victim to smart contract vulnerabilities or other security threats. Contact Halborn now for professional security advisory and auditing services.