John Saigle
October 13th, 2022
Most developers don’t dream of writing elaborate and exhaustive error-handling code.
Instead, they like to focus on the “happy path,” that is, the way a program flows when every little bit of logic is proceeding according to plan.
As security engineers and hackers, we are focused on the opposite. Instead of a pleasant stroll along the happy path, we have in mind the dangers that lie in wait just off the road for those who take a wrong turn. And we’re here to help you stay on the sunny side.
A convenient tool for a developer focused on the happy path is a helpful programming construct called panic.
This post will explain why using Panic (and other similar error-handling syntax) can have devastating security impacts.
Developers can run into a wide variety of errors while developing. Part of creating robust and healthy software is writing code to handle these errors in a graceful way. This means that when errors occur, the program will make a note of it and inform the user or developer of the issue. If appropriate, the code will continue executing its logic even after encountering an error.
However, sometimes we find ourselves in a hurry. We want to focus on creating interesting features and solving complex problems. Thinking about all the ways the code might go wrong is a distracting irritation. For times like these, there’s panic.
Panic is a construct used in Go and Rust to handle errors. It’s a powerful tool that can be a great help while debugging. When a panic is triggered, the program will crash immediately. Rather than recover gracefully and continue, the program will halt and catch fire.
The easiest way to spot a panic is simply by looking for the word “panic”.
Go panics look like this:
```go
panic("Error!")
```
However, there are more subtle cases where panics can be triggered by developers.
Some Go methods will issue a panic in certain cases. For example, the logging function log.Panic will result in a panic. Other functions contain the word Must in their names and will panic if they run into an error case. This is a common convention in many Go standard packages as well as in the Cosmos SDK.
Rust panics look like this:
```rust
panic!("Error!");
```
In Rust, there are many other macros in addition to panic! which will also result in a panic. These include:
unreachable!
unimplemented!
assert_eq!
debug_assert_eq!
debug_assert_ne!
todo!
Each of these macros can signal something different in terms of error-handling. For example, using todo! or unimplemented! communicates that a feature simply does not exist yet, rather than indicating an error condition. However, while these macros mean something different to a human reading them, under the hood they all issue a panic.
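To see that these macros really do boil down to the same panic machinery, here is a small, self-contained sketch. It uses std::panic::catch_unwind purely so the demo can observe the panics without crashing:

```rust
use std::panic;

fn main() {
    // Quiet the default panic output so the demo stays readable.
    panic::set_hook(Box::new(|_| {}));

    // todo! reads as "not implemented yet" to a human,
    // but under the hood it issues a panic like any other.
    let result = panic::catch_unwind(|| {
        todo!("feature coming soon");
    });
    assert!(result.is_err()); // the closure panicked

    // A failing assert_eq! also panics.
    let result = panic::catch_unwind(|| {
        assert_eq!(1 + 1, 3, "math is broken");
    });
    assert!(result.is_err());

    println!("both macros panicked");
}
```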
Rust also includes methods for error-handling, including .unwrap() and .expect(). These methods are used when calling a function.
In Rust, functions can return Result types. A Result “wraps” other types. When everything is going well, a Result will contain the expected return value of the function, which will have a type like String for a string. On the other hand, when an error is triggered, the Result will wrap an error type, Err. When the calling code receives the value, it can “unwrap” the Result type to get either the function’s return value or an error Err. A Rust developer will write conditional statements to handle all the expected types or variants of Err that a function can return.
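As a sketch of what that explicit handling looks like (parse_port here is a hypothetical example function, not from any particular codebase):

```rust
use std::num::ParseIntError;

// parse_port returns a Result: Ok with the parsed number on the
// happy path, or Err wrapping a ParseIntError on invalid input.
fn parse_port(input: &str) -> Result<u16, ParseIntError> {
    input.trim().parse::<u16>()
}

fn main() {
    // Handle both variants explicitly rather than panicking.
    match parse_port("8080") {
        Ok(port) => println!("listening on port {}", port),
        Err(e) => eprintln!("invalid port: {}", e),
    }

    match parse_port("not-a-port") {
        Ok(port) => println!("listening on port {}", port),
        Err(e) => eprintln!("invalid port: {}", e), // graceful, no crash
    }
}
```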
The .unwrap() method is a shortcut that a developer can take instead of writing all those conditional statements. It will unwrap the Result and panic if it gets an Err type. The .expect() method behaves the same way but also allows the developer to print a custom message when the code panics.
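A minimal sketch of both methods side by side. The Result values are contrived for illustration, and catch_unwind is used only so the demo can observe the panic instead of crashing:

```rust
use std::panic;

fn main() {
    // Quiet the default panic output so only our messages appear.
    panic::set_hook(Box::new(|_| {}));

    // .unwrap() returns the inner value when the Result is Ok...
    let ok: Result<i32, String> = Ok(42);
    assert_eq!(ok.unwrap(), 42);

    // ...and panics when it is an Err. .expect() does the same, but
    // lets the developer attach a custom message to the panic.
    let err: Result<i32, String> = Err("disk full".to_string());
    let caught = panic::catch_unwind(|| {
        err.expect("failed to read config") // panics: the Result is an Err
    });
    assert!(caught.is_err());

    println!("expect() panicked as expected");
}
```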
When prototyping code, using panic as a placeholder for error-handling can save time.
It can be distracting and time-consuming to create robust error-handling for every single edge-case when you’re just getting started with a new project.
A panic will produce a stack trace containing a lot of information about the error involved, including the locations of relevant code on the file system. This is helpful for tracking down errors.
Panics also occur accidentally, such as when a process runs out of memory, overflows the stack, divides by zero, and so on. Depending on the context, these kinds of issues can be considered unrecoverable errors. In other words, something has gone terribly wrong. In this case, the best course of action is to stop execution, and a panic is an appropriate way to do so.
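When such a case is in fact recoverable, Rust offers checked arithmetic as a graceful alternative; a small sketch showing checked_div and checked_add returning an Option instead of panicking:

```rust
fn main() {
    let numerator: i64 = 10;
    let divisor: i64 = 0;

    // Plain integer division by zero panics at runtime.
    // checked_div returns None instead, so we can handle it gracefully.
    match numerator.checked_div(divisor) {
        Some(q) => println!("quotient: {}", q),
        None => eprintln!("refusing to divide by zero"),
    }

    // Overflow is similar: checked_add returns None on overflow, rather
    // than panicking (debug builds) or silently wrapping (release builds).
    assert_eq!(i64::MAX.checked_add(1), None);
    assert_eq!(2_i64.checked_add(3), Some(5));
}
```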
The power of panic also brings risks. For this reason, it should not be used in software running in a production environment.
Stack traces can aid an attacker in crafting an exploit. They also provide information about the underlying file system that can be used as part of sophisticated attacks that chain multiple small pieces of information together into a dangerous hack. If a panic can be predictably triggered in a piece of software, this on its own can be enough to perform a denial-of-service (DoS) attack.
This is true for all software. However, the problem with using panics in production can be much worse than an information leak or a temporary reduction in service availability due to DoS.
In blockchain software in particular, the use of panic can have devastating consequences. This is because a crashing program may affect not only the availability of a service, but also the integrity of the state. In other words, a panic may cost you, literally.
This is a concern for any projects using Go and Rust in their software stack. This includes projects that use either of these languages for development of applications or chains.
For example, Cosmos applications are written in Go and Solana programs are written in Rust. Panics can also affect other chains that use Go or Rust for their node software, such as go-ethereum or Lighthouse.
To better understand the impacts of panicking code in the context of blockchains, let’s explore an example in Cosmos.
In the Cosmos ecosystem, it is possible to create code that runs automatically on a per-block basis. This is done via the BeginBlocker and EndBlocker methods within a given module. As the documentation notes, it is possible to cause major slowdowns or even a chain halt if these functions aren’t used carefully.
This can open up a path for an attacker. If user-controlled data can trigger a panic and that panic occurs in the context of code that handles consensus, the attacker could cause major issues. The validators will try to run code that panics, and from there will be unable to coordinate the state of the chain. This could halt the entire chain!
More subtle side effects could occur as well. An attacker might be able to crash some validators, but not others. With fewer validators active, it becomes easier to, for example, create a malicious governance proposal and have it pass. This could sway a governance vote and allow a malicious subset of validators to take over the network!
As security experts, naturally our minds go to scenarios where there is a malicious actor involved. But problems can arise even without an attacker involved. If the code can reach a state where it panics ‘on its own’ – that is, without the panic being caused by user-controlled data – an issue like a chain halt could still occur.
In the example above, we can use Cosmos custom errors instead of panics.
The Cosmos SDK also allows you to configure ‘invariants’ which are statements that should always be true within a protocol. As an example, a user’s supply of a Token should never be greater than the total supply of that token. Rather than use a panic, invariants can be defined and handled gracefully.
Defining invariants allows you to make use of simulations, a Cosmos feature that allows for built-in fuzz testing. You can run these simulations during development and catch edge-cases before you ship.
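The invariant machinery itself is Go code in the Cosmos SDK, but the underlying idea is language-agnostic. Here is a minimal Rust sketch, with hypothetical names, of an invariant check that reports a violation through a Result rather than a panic:

```rust
// Hypothetical ledger state for illustration; not Cosmos SDK code.
struct Ledger {
    total_supply: u128,
    balances: Vec<u128>,
}

// The invariant: no single balance may exceed the total supply.
// Returns an Err describing the violation instead of panicking,
// so the caller decides how to react (alert, halt the module, etc.).
fn check_supply_invariant(ledger: &Ledger) -> Result<(), String> {
    for (i, &balance) in ledger.balances.iter().enumerate() {
        if balance > ledger.total_supply {
            return Err(format!(
                "invariant broken: account {} holds {} but total supply is {}",
                i, balance, ledger.total_supply
            ));
        }
    }
    Ok(())
}

fn main() {
    let ledger = Ledger { total_supply: 1_000, balances: vec![250, 500] };
    assert!(check_supply_invariant(&ledger).is_ok());

    let bad = Ledger { total_supply: 1_000, balances: vec![2_000] };
    assert!(check_supply_invariant(&bad).is_err());

    println!("invariant checks behaved as expected");
}
```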
In general, regardless of what language or ecosystem you use, always make sure that error conditions are checked and properly handled.
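In Rust, for instance, the ? operator propagates an error to the caller instead of panicking on it; a minimal sketch (read_config and the path are hypothetical):

```rust
use std::fs;
use std::io;

// Propagate errors to the caller with `?` instead of unwrapping.
// The caller decides how to handle the failure; nothing panics.
fn read_config(path: &str) -> Result<String, io::Error> {
    let contents = fs::read_to_string(path)?; // an Err returns early here
    Ok(contents)
}

fn main() {
    match read_config("/definitely/missing/config.toml") {
        Ok(cfg) => println!("loaded {} bytes", cfg.len()),
        Err(e) => eprintln!("could not load config: {}", e), // graceful
    }
}
```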
If using a panic is truly necessary in the case of an unrecoverable error, think carefully about how this may affect consensus operations. Evaluate whether an attacker could trigger a panic predictably and what side effects this may have.
Paying close attention to the potential error-states of the program and handling them carefully is an essential practice. Error-handling is not the most exciting task as a developer; on the other hand, a lack of robust error-handling may make for a very exciting day for an attacker. It’s always worth the time and effort to think carefully about how code might behave given unusual inputs, and to handle those scenarios sooner rather than later.
If your company is having trouble debugging or testing, or simply wants to make its protocol more secure, connect with our Web3 security experts at halborn@protonmail.com for professional security advice.