John Saigle
December 19th, 2022
Are you a Cosmos developer wondering how to secure your project? At Halborn we look at a lot of Cosmos projects to make sure that the protocol works as expected, has robust error detection, and safely handles user funds.
We wanted to share some of our experience with performing code review on a wide range of Cosmos projects and distill our knowledge into some helpful tips that will assist you during the development process.
Below are the top five most common vulnerabilities and issues that we look for when checking out a new Cosmos project.
When developing a project, typically a developer wants to spend their time developing new features and error handling comes second. In the heat of the moment when developing a hot new feature, it’s tempting to skip error handling, at least temporarily, in order to focus on completing the feature. A developer in a hurry might use the panic function in Go to tell the code to abort execution when handling an error case. In some cases this is appropriate: if there is truly an unrecoverable error, then it may be correct to stop execution entirely.
However, this can be problematic for a number of reasons. Panic does not allow the program to handle errors gracefully. In many cases it’s better to handle the error, log it, and print a helpful error message for the user or the validator software indicating what went wrong. Ideally, execution should continue where possible to ensure a smooth user experience.
In general, panic can cause security problems in Go code because it indicates that a serious problem has been encountered. From an attacker’s perspective, a panicking program can indicate a potential attack vector. When developing exploits, an attacker’s methodology will begin with discovering a crash and escalating this into a full exploit.
In the context of Blockchain software, a panic can have a very pronounced impact. For more information on this topic we invite you to check out another article we wrote, ‘Don’t “Panic”: How Improper Error-Handling Can Lead to Blockchain Hacks’, that goes into detail about why you should avoid using panic.
We recommend using proper error handling instead of panic. Panic should be used only in exceptional situations where there’s absolutely no way for code execution to continue safely.
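As a minimal sketch of this recommendation, the function below reports a failure through Go's error return value instead of calling panic. The names `Withdraw` and `ErrInsufficientFunds` are hypothetical and exist only for this illustration; they are not part of the Cosmos SDK.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInsufficientFunds is a hypothetical sentinel error for this sketch.
var ErrInsufficientFunds = errors.New("insufficient funds")

// Withdraw returns an error instead of panicking when the balance is too low,
// so the caller can log the failure and keep running.
func Withdraw(balance, amount uint64) (uint64, error) {
	if amount > balance {
		return balance, fmt.Errorf("withdraw %d from balance %d: %w", amount, balance, ErrInsufficientFunds)
	}
	return balance - amount, nil
}

func main() {
	newBalance, err := Withdraw(50, 80)
	if err != nil {
		// Handle the error gracefully: report it and continue.
		fmt.Println("withdrawal rejected:", err)
	}
	fmt.Println("balance:", newBalance)
}
```

Because the error wraps a sentinel value, a caller can test for it with `errors.Is` and decide how to react, which is not possible once the program has aborted.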
A classical piece of security wisdom is that all software should be kept up-to-date. The reason for this is that outdated or unsupported software will not include the latest performance improvements or security patches compared to newer versions.
In the context of Cosmos, these components include:
As software ages, it typically becomes vulnerable as hackers and security researchers examine it for flaws. New techniques will be developed that render software vulnerable to attack even if it was believed to be secure when it was released.
This is especially important in a Blockchain context because new attack vectors are being developed all the time. These can range from relatively simple bugs to critical issues that can have serious effects on the health of a chain. A timely example includes a vulnerability discovered in the CosmosSDK software following the Binance hack. The Cosmos developers issued an urgent security notice informing all validators to update their software in light of this critical issue.
There are a variety of tools that can be used to monitor your project’s dependencies for outdated components, and we recommend using them as well as enforcing their use within a CI/CD environment.
It is also suggested to monitor the repository of CosmosSDK and other popular sub-components for the latest software releases and bug fixes. Joining a project’s social media communities and paying attention to their published announcements is also an effective way to stay informed of any potential issues and fixes.
It is imperative that Blockchain software has a deterministic state. This is because all validators must be able to perform the same calculations and reach the same output so that they can agree on the state of the network. When consensus fails, the network will halt as there is no safe way to coordinate state among the validators.
A common example of non-determinism in Cosmos projects occurs when iterating over a ‘map’ which is analogous to a dictionary or hashmap in other programming languages. Map orderings are not deterministic. This means that if a loop is used to list the elements of a map, the loop could traverse the elements of the map in a different order for every execution even when the data in the map has not changed. If this happens in a consensus method or storage, such as BeginBlocker and EndBlocker, the validators may not be able to agree on state and the chain could halt.
Earlier this year, a bug in Cosmos allowed an attacker to halt the Juno network. The attacker was able to submit malicious transactions via CosmWasm contracts that caused non-determinism in the network.
To understand this a little better, let’s look at an example. Consider the following code:
```go
// Iterating over a Go map: the traversal order is not deterministic.
for lpAddress, rewards := range liquidityProvidersMap {
	coinToTransfer := sdk.NewCoin("HAL", sdk.NewIntFromBigInt(rewards.BigInt()))
	err := k.bankKeeper.SendCoinsFromModuleToAccount(ctx, "RewardModule", lpAddress, sdk.NewCoins(coinToTransfer))
	if err != nil {
		// Do error handling here...
	}
}
```
Here we want to distribute some rewards from our module “RewardModule” to a group of liquidity providers. In order to do so, we iterate over a map that pairs liquidity provider addresses with the rewards they are owed and pay them one by one.
A problem can arise here if our RewardModule has an insufficient balance and cannot pay all of the LPs. Let’s say we have three LPs and each is owed 100 units of the coin “HAL” but the module only has a balance of 200 HAL. Our loop will pay two of the LPs 100 HAL each and then trigger an error when trying to pay the third LP because the module does not hold enough HAL to be able to complete the transfer.
This error could escalate into a chain halt because of non-determinism. Since the map liquidityProvidersMap is not ordered, different validators may traverse it in a different order. With three LPs there are in principle 3! = 6 possible orderings, and validators are not guaranteed to agree on which one they use.
As the validators run the code, they will pay the rewards to different LPs and reach an error when trying to pay the third one in their given ordering. There is a high chance that they will pay rewards to different LPs and so when it comes time to establish a consensus about the state of the chain, they will report different results. When this happens, the chain will halt.
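One common way to remove the ordering problem is to collect the map keys, sort them, and iterate over the sorted slice. The sketch below uses only the standard library; `payRewards` and `sortedKeys` are hypothetical stand-ins for the bank transfer loop above, with plain strings and integers in place of SDK address and coin types.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order so that every
// validator walks the map in the same sequence.
func sortedKeys(m map[string]uint64) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// payRewards pays each LP in sorted order and stops at the first one the
// module cannot afford. It returns the addresses that were paid.
func payRewards(moduleBalance uint64, rewards map[string]uint64) (paid []string, err error) {
	for _, addr := range sortedKeys(rewards) {
		owed := rewards[addr]
		if owed > moduleBalance {
			return paid, fmt.Errorf("cannot pay %s: need %d, have %d", addr, owed, moduleBalance)
		}
		moduleBalance -= owed
		paid = append(paid, addr)
	}
	return paid, nil
}

func main() {
	rewards := map[string]uint64{"lp3": 100, "lp1": 100, "lp2": 100}
	paid, err := payRewards(200, rewards)
	fmt.Println(paid, err)
}
```

With sorted keys, every validator pays the same two LPs and fails on the same third one, so they all reach the same state. Checking the total owed against the module balance before paying anyone would additionally avoid a partial payout.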
Non-determinism can occur in other ways as well. Another common example is using the time.Now() function to read the system time from the operating system. Because the validators execute this function at slightly different times, it returns a different result on each validator. This too can lead to consensus issues if some logic in a consensus function relies on the result of the time function. In this scenario, it’s recommended to use the block time of the chain to represent time rather than the system clock.
Many of the issues we’ve discussed so far arise from the program reaching a bad error state that causes a problem in the app chain operation. In contrast, logic issues arise when a program executes ‘correctly’ and yet behaves in a way that is against the intentions of the developers.
When logic issues come up, there will be no error message printed to developers or users informing them that a problem has occurred. Why? Simply because, to the computer, it looks like everything is fine. It is only when we humans interpret the result that we can identify whether an issue has occurred.
We can explore logic errors by considering the following scenario: what happens if an attacker is able to create a liquidity pool in a DeFi protocol that uses a spoofed token?
Say this protocol has its own token ‘SwapCoin’ and it allows liquidity providers to create pools that pair SwapCoin with other tokens. To enforce this, a developer might write a function that checks, whenever a liquidity provider creates a new pool by submitting amounts of a pair of tokens, that one of these tokens has the name ‘SwapCoin’.
That function might look like this:
```go
func (h CreatePoolHandler) handleCreatePool(ctx cosmos.Context, msg CreatePool) error {
	assetX, assetY, err := h.mgr.Keeper().ParseCreatePoolMsg(msg)
	if err != nil {
		return err
	}
	// Check that either assetX or assetY is SwapCoin
	if !strings.EqualFold("SwapCoin", assetX.Denom) &&
		!strings.EqualFold("SwapCoin", assetY.Denom) {
		return errors.New("pool must include SwapCoin")
	}
	// ... create the pool ...
	return nil
}
```
At first this code may look safe, but there is actually a logic issue here. The EqualFold function in the strings package performs a case-insensitive comparison. This means that it will accept values that resemble ‘SwapCoin’ with different capitalization, e.g. ‘sWapCoin’, ‘swapcoin’ and so on.
This behavior allows an attacker to attack the protocol. This attacker could create a token called ‘sWapCoin’ and use it in place of the native token. This has the effect of allowing them to forge tokens and perform any actions within the project’s liquidity pools as though they really had authentic SwapCoins.
To profit from this, an attacker could drain value from an existing pool that pairs SwapCoin with Atom. They could use ‘sWapCoin’ to swap for Atom and then walk away from the project. In the end, the attacker walks away with the funds while simultaneously undermining the economics of the project as they have effectively inflated the supply of SwapCoins, at least for the purposes of trading between liquidity pools.
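A small sketch makes the difference concrete. It compares the case-insensitive check from the example with an exact, case-sensitive comparison. The helper names `isNativeLoose` and `isNativeStrict` are hypothetical and only illustrate the idea; whether an exact match is the right rule depends on how the protocol defines its denominations.

```go
package main

import (
	"fmt"
	"strings"
)

const nativeDenom = "SwapCoin" // denom taken from the example above

// isNativeLoose mirrors the vulnerable check: it ignores case.
func isNativeLoose(denom string) bool {
	return strings.EqualFold(nativeDenom, denom)
}

// isNativeStrict accepts only the exact, case-sensitive denom.
func isNativeStrict(denom string) bool {
	return denom == nativeDenom
}

func main() {
	for _, d := range []string{"SwapCoin", "sWapCoin", "swapcoin"} {
		fmt.Printf("%-9s loose=%t strict=%t\n", d, isNativeLoose(d), isNativeStrict(d))
	}
}
```

The loose check accepts all three spellings, while the strict check accepts only the first.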
This is just one example. Logic errors by definition are hard to generalize about because there are many ways in which they manifest. Thus, it is also difficult to provide a general recommendation for fixing logic errors because they are highly dependent on what the developer wishes to occur.
The best way to handle these sorts of problems is to write extensive unit tests. Unit tests can encode a developer’s intentions by supplying example values and ensuring that the code does the right thing. It can be helpful to compare software implementations to design documents to ensure that the code does what it’s supposed to.
We advise writing “adversarial tests” in addition to ordinary unit tests. This means providing strings where numbers are expected (and vice-versa), using signed integers in place of unsigned integers, and so on. The developer should also try especially high or low values, switch around character sets for strings (from ASCII to UTF-8), etc. The sky’s the limit! The more imaginative these tests are, the more likely it is that a developer will discover a logic error before an attacker does.
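The sketch below shows one way such adversarial cases could be organized as a table. The parser `parseAmount` is a hypothetical function written for this illustration; in a real project these cases would normally live in a `_test.go` file and run under `go test`.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseAmount is a hypothetical input parser used only for this sketch.
func parseAmount(s string) (uint64, error) {
	return strconv.ParseUint(s, 10, 64)
}

// adversarialCase pairs a hostile input with whether it must be rejected.
type adversarialCase struct {
	name    string
	input   string
	wantErr bool
}

var cases = []adversarialCase{
	{"plain number", "100", false},
	{"empty string", "", true},
	{"text instead of number", "abc", true},
	{"negative value", "-1", true},
	{"one past uint64 max", "18446744073709551616", true},
	{"full-width digits", "１００", true},
	{"leading space", " 100", true},
}

// runCases returns the names of the cases whose outcome was unexpected.
func runCases() []string {
	var failed []string
	for _, c := range cases {
		_, err := parseAmount(c.input)
		if (err != nil) != c.wantErr {
			failed = append(failed, c.name)
		}
	}
	return failed
}

func main() {
	fmt.Println("unexpected outcomes:", runCases())
}
```

Adding a new hostile input is a one-line change to the table, which keeps the cost of being imaginative low.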
In the long term, developers should also implement fuzz testing. Fuzzing is a technique for testing software that involves automatically generating mangled data and providing it as inputs to various functions within the code base. This has the effect of unearthing logic errors and strange crashes during the development process.
Cosmos provides simulations which embed a form of fuzzing within your project without the need for external tools. Simulations work by testing against invariants, i.e., rules in your program that should never be broken. An example of an invariant is ensuring that a user’s supply of a given token does not exceed the total supply of that token throughout the whole ecosystem. Clearly, this should never happen in well-functioning code.
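To illustrate the idea of checking an invariant under random operations, here is a toy sketch that uses only the Go standard library. It is not the Cosmos SDK simulation framework and does not use its API; the `ledger` type, the account names, and the invariant (all balances sum to the total supply, a close relative of the example above) are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// ledger is a toy token ledger, not a Cosmos SDK type.
type ledger struct {
	totalSupply uint64
	balances    map[string]uint64
}

// transfer moves amount between accounts and rejects overdrafts.
func (l *ledger) transfer(from, to string, amount uint64) error {
	if l.balances[from] < amount {
		return fmt.Errorf("%s cannot send %d", from, amount)
	}
	l.balances[from] -= amount
	l.balances[to] += amount
	return nil
}

// supplyInvariant reports whether the balances still sum to the total supply.
func (l *ledger) supplyInvariant() bool {
	var sum uint64
	for _, b := range l.balances {
		sum += b
	}
	return sum == l.totalSupply
}

// simulate applies n random transfers and returns false as soon as the
// invariant breaks. The seed makes a failing run reproducible.
func simulate(seed int64, n int) bool {
	rng := rand.New(rand.NewSource(seed))
	accounts := []string{"alice", "bob", "carol"}
	l := &ledger{
		totalSupply: 300,
		balances:    map[string]uint64{"alice": 100, "bob": 100, "carol": 100},
	}
	for i := 0; i < n; i++ {
		from := accounts[rng.Intn(len(accounts))]
		to := accounts[rng.Intn(len(accounts))]
		_ = l.transfer(from, to, uint64(rng.Intn(150))) // rejected transfers are expected
		if !l.supplyInvariant() {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("invariant held:", simulate(42, 1000))
}
```

If a bug let `transfer` credit the receiver without debiting the sender, the invariant check would fail within a few random operations, and the fixed seed would let a developer replay the exact sequence.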
Extensive use of simulations and invariants will go a long way to writing more secure Cosmos software.
Time for a bit of a math lesson before we return to security.
Integers in computers can only increase to a certain limit because they are represented by a fixed number of bits in a computer. If you are using an integer with 8 bits for example, the maximum value for that number is 255 which corresponds to all of the bits being set to 1. This is because all we can do is change bits from 0 to 1; if they’re all set to 1, there’s no way to increase the number.
Now, what happens when we add two numbers together? Say we try to add 200 to 200. In the normal world, we know this is 400. But when working with integers with a size of 8 bits, we can’t store values higher than 255. So what happens?
This scenario is called an overflow. Like an odometer in a car, the number wraps around once we reach the maximum. We add numbers to 200 until we reach 255, then start over at 0. The result is the remainder of (x + y) divided by the number of values the integer can represent, which is 256 for 8 bits. In this case, (200 + 200) / 256 gives us a remainder of 144.
To sum up, when adding two numbers that are close to the maximum size that an integer can hold, we can sometimes end up with a much lower number than we expected. The same is true for subtraction, but with the opposite result: subtracting one number from another will give us a very high result rather than a negative result. This is called underflow.
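The wraparound is easy to observe in Go, where `uint8` arithmetic is performed modulo 2^8 = 256. The helper names in this short sketch are hypothetical.

```go
package main

import "fmt"

// addUint8 adds two 8-bit unsigned integers; Go wraps silently on overflow.
func addUint8(x, y uint8) uint8 {
	return x + y
}

// subUint8 subtracts two 8-bit unsigned integers; Go wraps silently on underflow.
func subUint8(x, y uint8) uint8 {
	return x - y
}

func main() {
	fmt.Println("200 + 200 as uint8:", addUint8(200, 200))
	fmt.Println("(200 + 200) % 256: ", (200+200)%256)
	fmt.Println("5 - 10 as uint8:   ", subUint8(5, 10))
}
```

Neither operation produces an error or a warning at run time; the program simply continues with the wrapped value.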
The above is true for unsigned integers. We can represent negative numbers by interpreting a set of bits as a signed integer. When we do this, a single bit is used to represent whether the number is positive or negative. Since this bit is now representing some information about the number rather than the number itself, the range of what numbers we can represent changes.
In the example where we are using 8 bits for our numbers, the range for an unsigned integer is 0 to 255, and the range for a signed integer is -128 to 127. In both cases we can represent a total of 256 different values. Check out this Wikipedia link for more information on how numbers are represented at the bit-level.
An important consequence here is that a developer can specify whether a number should be treated as signed or unsigned. For example, if a value of 255 in an 8-bit unsigned integer is treated as a signed integer, then its value becomes -1, and a value of 128 becomes -128!
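In Go, converting between `uint8` and `int8` keeps the same eight bits and only changes how they are interpreted. The sketch below, with hypothetical helper names, shows the effect on a few boundary values.

```go
package main

import "fmt"

// asSigned reinterprets the bits of an 8-bit unsigned value as signed.
func asSigned(u uint8) int8 {
	return int8(u)
}

// asUnsigned reinterprets the bits of an 8-bit signed value as unsigned.
func asUnsigned(i int8) uint8 {
	return uint8(i)
}

func main() {
	for _, u := range []uint8{127, 128, 255} {
		fmt.Printf("uint8 %3d -> int8 %4d\n", u, asSigned(u))
	}
	fmt.Printf("int8 %d -> uint8 %d\n", int8(-1), asUnsigned(-1))
}
```

Values up to 127 are unchanged by the conversion, while anything from 128 upward turns negative, and a negative signed value turns into a large unsigned one.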
If we put all of this together, we can see that operations like addition and subtraction are much more complicated than you might think. We also must be careful when telling the program whether we want a value to be considered signed or unsigned. In both cases we may end up with a very large number when we expected a negative number, and vice versa.
Let’s think about a scenario where a user can only perform an action in the protocol as long as they have a sufficient amount of tokens. If they have a low balance and it is then decreased for some reason, it is possible that the value could underflow. Then if it is treated as an unsigned integer, the code could interpret them as having a very high balance. This could have huge consequences for the integrity of the program.
Complicating things further, Go will not warn you if an overflow or underflow occurs. It also won’t be able to suggest to you whether you should treat a number as signed or not. CosmosSDK does provide helper functions that will panic rather than allow numbers to overflow or underflow. However, a developer has to use them in order to benefit from this. As we’ve seen, a panic could also cause other problems.
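An alternative to both silent wraparound and panicking is to check the operands first and return an error. The helpers below are a hypothetical sketch using only the standard library; they are not the Cosmos SDK's own helper functions.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

var (
	errOverflow  = errors.New("integer overflow")
	errUnderflow = errors.New("integer underflow")
)

// safeAdd returns x + y, or an error if the sum does not fit in a uint64.
func safeAdd(x, y uint64) (uint64, error) {
	if x > math.MaxUint64-y {
		return 0, errOverflow
	}
	return x + y, nil
}

// safeSub returns x - y, or an error if the result would be negative.
func safeSub(x, y uint64) (uint64, error) {
	if y > x {
		return 0, errUnderflow
	}
	return x - y, nil
}

func main() {
	balance := uint64(5)
	if _, err := safeSub(balance, 10); err != nil {
		fmt.Println("rejected:", err) // the balance is left untouched
	}
	sum, err := safeAdd(math.MaxUint64, 1)
	fmt.Println(sum, err)
}
```

Returning an error leaves the decision to the caller, who can reject the transaction without aborting the whole program.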
When writing code for a Cosmos project, it’s crucial to be aware of whether you want to work with an unsigned or signed integer and to check for overflows and underflows. In general it’s a good idea to avoid frequently switching between unsigned and signed integers so that it’s easier to reason about the state of the program.
These are just some of the complications that arise in Go and Cosmos code. We hope this survey of some of the most common, high-impact considerations has helped you to advance in your journey to write effective and secure code.
We’re happy to help you review your source code for these types of errors and more. To talk to one of our blockchain experts about Cosmos source code reviews or any other Web3 security topics, drop us a line at halborn@protonmail.com.