ChatGPT Vulnerability Detection Report

Can ChatGPT detect Smart Contract vulnerabilities?

In July 2023, we evaluated ChatGPT's smart contract vulnerability detection capabilities using 134 examples of smart contracts (compiled in this GitHub repo) that were known to contain exploitable vulnerabilities. We broke them up into 41 groups based on the types of vulnerabilities they contained, then fed the smart contracts into ChatGPT and prompted it to find the vulnerabilities*.


Where Does ChatGPT excel?

ChatGPT always successfully identified the following types of smart contract vulnerabilities:

Bad randomness
Use of deprecated functions
Right-to-left override control character
Integer overflow
Missing protection against signature replay
Typographical error
Variable shadowing
Arbitrary jump with function type variable
Logical error
Authorization through tx.origin
Presence of unused variables
Block values as a proxy for time
Default visibility
Asserting EOA from code side
Numerical Precision/Floating points
Outdated compiler versions
Message call with hardcoded gas amount

Chart: number of vulnerability types in each range of detection accuracy (100%; between 75% and 99%; between 50% and 75%; between 25% and 49%; between 0% and 24%).

Impressive? Not Really...

In most of these cases, the presence of the vulnerability could have been easily identified by looking for certain functions or code patterns. For example, simply scanning code for the use of tx.origin or for the pragma version would identify certain types of vulnerabilities. Overall, ChatGPT's detection rate was around 75% across all vulnerability types and all tested smart contracts. However, we discovered that ChatGPT identifies vulnerabilities better when asked whether a code sample contains a specific vulnerability (e.g., reentrancy) than when asked to find all vulnerabilities in a piece of code.

In fact, the specific vulnerability prompt increased ChatGPT-4's detection accuracy from 76.1% to 86.6%!
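The kind of pattern matching mentioned above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the study: the regular expressions, the scan function, the SAMPLE contract, and the 0.8.0 version threshold are all assumptions chosen for the example.

```python
import re

# Illustrative patterns only (assumptions, not the checks used in the study).
TX_ORIGIN = re.compile(r"\btx\.origin\b")
DEPRECATED_CALL = re.compile(r"\b(suicide|sha3)\s*\(")
PRAGMA = re.compile(r"pragma\s+solidity\s*[^0-9;]*(\d+)\.(\d+)\.(\d+)")

# Assumed threshold: anything older than 0.8.0 is reported as outdated.
MIN_VERSION = (0, 8, 0)


def scan(source):
    """Return one finding per matching line of Solidity source text."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if TX_ORIGIN.search(line):
            findings.append(f"line {number}: tx.origin used")
        deprecated = DEPRECATED_CALL.search(line)
        if deprecated:
            findings.append(
                f"line {number}: deprecated function {deprecated.group(1)}"
            )
        pragma = PRAGMA.search(line)
        if pragma:
            version = tuple(int(part) for part in pragma.groups())
            if version < MIN_VERSION:
                text = ".".join(pragma.groups())
                findings.append(
                    f"line {number}: outdated compiler version {text}"
                )
    return findings


# Hypothetical contract used only to exercise the scanner.
SAMPLE = """pragma solidity ^0.4.24;
contract Wallet {
    address owner;
    function withdraw() public {
        require(tx.origin == owner);
        suicide(owner);
    }
}
"""

if __name__ == "__main__":
    for finding in scan(SAMPLE):
        print(finding)
```

A scanner like this only sees surface patterns. It has no notion of control flow or contract interaction, which is why it says nothing about the harder vulnerability classes discussed in the rest of this report.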

Where ChatGPT Falls Short

While ChatGPT is very effective at finding certain types of vulnerabilities, it struggles to understand how Solidity and the EVM work.

In general, there are certain types of smart contract vulnerabilities that ChatGPT struggles to find, regardless of the ChatGPT version tested:

Abuse of Global Semantic
Insufficient Gas Griefing
Storage Collisions
Hash collisions with multiple variable length arguments
Reference to an external malicious contract
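To make one of these harder cases concrete, the sketch below imitates in Python how a hash collision with multiple variable-length arguments can arise when arguments are concatenated without length information, as Solidity's abi.encodePacked does for dynamic types such as strings. The helper names packed, length_prefixed, and digest are hypothetical. The length-prefixed encoding is a simplified stand-in rather than the real ABI encoding, and hashlib.sha3_256 is used in place of Ethereum's Keccak-256, which is a different function.

```python
import hashlib


def packed(*parts):
    """Concatenate arguments with no length prefix or padding.

    This mimics how variable-length arguments are packed back to back.
    """
    return b"".join(part.encode("utf-8") for part in parts)


def length_prefixed(*parts):
    """Simplified stand-in for an unambiguous encoding (not the real ABI)."""
    encoded = b""
    for part in parts:
        raw = part.encode("utf-8")
        encoded += len(raw).to_bytes(32, "big") + raw
    return encoded


def digest(data):
    # Assumption: SHA3-256 stands in for Keccak-256, which hashlib lacks.
    return hashlib.sha3_256(data).hexdigest()


if __name__ == "__main__":
    # Two different argument lists produce the same packed bytes.
    print(digest(packed("ab", "c")) == digest(packed("a", "bc")))
    # With a length prefix per argument, the encodings differ.
    print(
        digest(length_prefixed("ab", "c"))
        == digest(length_prefixed("a", "bc"))
    )
```

Spotting this issue requires reasoning about how the arguments are encoded before hashing, not just recognizing a keyword, which may be part of why it is harder to detect than the pattern-based vulnerabilities listed earlier.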

Furthermore, different versions of ChatGPT struggle with different vulnerabilities.

ChatGPT-3.5 struggled to identify:

Forced reception of Ether
Unencrypted private data on-chain
Short address attacks

ChatGPT-4 struggled to identify:

Delegated call to an untrusted callee
Signature malleability
Write to arbitrary storage location

ChatGPT was able to identify certain vulnerabilities within smart contracts, such as variable shadowing or bad randomness, with 100% accuracy. However, it tends to struggle with variations of a given attack. For example, most instances of read-only and cross-function reentrancy were not detected during the study. Similarly, ChatGPT overlooked DoS by external calls without gas stipends in 2 out of 3 prompts.


Can ChatGPT solve CTFs?

What is CTF?

CTF, or Capture The Flag, is a type of cybersecurity exercise where participants engage in solving security-related challenges, simulating real-world scenarios to enhance their skills and knowledge in protecting systems against cyber threats.

In our study, we found that ChatGPT could completely solve CTF challenges 43.3% of the time.

ChatGPT also offered a partial solution in an additional 20% of cases.

These results depend largely on the complexity of the CTF, with more complex challenges having lower success rates.

Ethernaut

In Ethernaut, ChatGPT was able to solve most of the challenges, excelling in the first level with 100% accuracy. As the difficulty increased, however, it clearly started to struggle more. At a difficulty level of 4, its effectiveness decreased to 25%.

Capture The Ether

In Capture the Ether, ChatGPT correctly solved only 37.5% of the challenges, versus 55% in Ethernaut.

Damn Vulnerable DeFi

This percentage drops even further for Damn Vulnerable DeFi's CTFs, falling to 26.7%. These challenges are often more complex because they typically require an analysis of multiple contracts and how they interact with each other, whereas challenges in the previous repositories usually center on just one contract.

Also, ChatGPT has much higher success rates if the CTF and its solution were published before ChatGPT was trained and were, therefore, part of the tool's training data set.

Top 3 Tips for Using ChatGPT to Detect Smart Contract Vulnerabilities:

Because ChatGPT was trained on data from 2021 and prior, it cannot be used to identify ALL issues within a smart contract. Organizations should always work with security experts like Halborn to supplement and enhance their protection.
Run multiple versions of ChatGPT when analyzing a smart contract to improve the probability of detection.
Be specific with your prompts: ask ChatGPT if a code sample contains a specific vulnerability (e.g., reentrancy). This not only increases the accuracy of its findings but also speeds up the process of identifying a vulnerability in the code.
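The last two tips can be combined in a small harness, sketched below in Python. Everything here is an assumption made for illustration: the prompt wording is not the wording used in the study, and each model is represented by a plain function that takes a prompt and returns a text answer, so that no particular API is implied.

```python
def general_prompt(source):
    """Open-ended prompt: ask for every vulnerability at once."""
    return "Find all vulnerabilities in this smart contract:\n\n" + source


def specific_prompt(source, vulnerability):
    """Targeted prompt: ask about one named vulnerability."""
    return (
        f"Does this smart contract contain a {vulnerability} "
        "vulnerability? Answer yes or no, then explain.\n\n" + source
    )


def audit(source, models, vulnerabilities):
    """Ask every model one targeted question per vulnerability type.

    models maps a label to a callable (prompt -> answer text). These
    callables are placeholders for whatever client is actually used.
    Returns, for each vulnerability, the labels of the models that
    answered yes.
    """
    flagged = {}
    for vulnerability in vulnerabilities:
        prompt = specific_prompt(source, vulnerability)
        flagged[vulnerability] = [
            label
            for label, ask in models.items()
            if ask(prompt).strip().lower().startswith("yes")
        ]
    return flagged


if __name__ == "__main__":
    # Stub models standing in for two ChatGPT versions (hypothetical).
    def older_model(prompt):
        return "Yes, it does." if "reentrancy" in prompt else "No."

    def newer_model(prompt):
        return "Yes." if "short address" in prompt else "No."

    report = audit(
        "contract C {}",
        {"older": older_model, "newer": newer_model},
        ["reentrancy", "short address", "signature malleability"],
    )
    print(report)
```

Treating a vulnerability as worth reviewing when any model flags it trades more false positives for a better chance of detection. Anything flagged, and anything not flagged, should still be checked by a human reviewer.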

Need Help? We've got your back!

Don't let your platform fall victim to smart contract vulnerabilities or other security threats. Contact Halborn now for professional security advisory and auditing services.