Rob Behnke
March 7th, 2023
In this article, we’ll provide an overview of blockchain hash functions, their properties and types, their uses, and application in different blockchains. By the end of the article, you will have a clear understanding of the working principles of the hash functions, their attack vectors, and their importance to blockchain technology.
Let’s dive right in!
In the most basic sense, a hash function is a function that accepts any size input and creates a fixed-size digest which is “random” looking. The working mechanism is simple: you have a certain data, then you run it through a hash function, and then get an output digest. This fixed-sized digest can be 32-bit, 64-bit, 128-bit, or 256-bit depending on the hash function adopted.
Hash functions are cryptographic algorithms that are extensively used in blockchain consensus mechanisms. You have probably heard of or used any of the popular hash functions like MD5, SHA-1, SHA-256, BLAKE2, etc. In the next section, we will take a look at the underlying properties of these algorithms and how they work.
Hashing consists of several properties, including:
The Avalanche Effect: This connotes that changing 1 bit of the input results in a change in about half of the output bits.
Preimage Resistance: This connotes that given a value y, you can’t find any x such that hash(x)=y. This is why a hash (short for hash functions) is called a unidirectional function, which is a very important property in hashing algorithms.
Collision Resistance: This connotes that there is no x, z pair such that x!=z but hash(x)==hash(z). This means that it must be computationally intractable to have two different input messages x and z such that the hash of x is the same as the hash of z.
Collision resistance has been broken for popular algorithms like SHA-1 and the MD5, but the Pre-image Resistance property still remains intact. Another important property is the speed required to produce the digest message output.
In the most fundamental way, the hash function f(M) of the arbitrarily long message [M] is a function that receives [M], runs a hash on it which compresses [M] to a fixed length, and obtains an output (message digest or hash value) of a determined fixed length [M`].
Let’s see how to implement SHA-256 in Rust, to show the properties of a normal hash function.
In a Rust file, paste the following code and run it:
use sha256{digest, try_digest};
fn main() {
let message = "hello";
let hash_value = digest(message);
println!("{}", hash_value);
}
This returns a fixed-size string:
70de66401b1399d79b843521ee726dcec1e9a8cb5708ec1520f1f3bb4b1dd984
To test the avalanche effect, let’s change the message to “helo” instead. The output will be:
f4e454f802b88d2f64168ff1742e8cf413fd677d38b87cbefb45821f8981b912
You can also confirm that the length of the first hash digest is equal to the length of the second hash digest, and it takes a fraction of a second to compute this.
Popular hash functions are designed using a method called the iterated hash function method. This method involves dividing the data to be hashed into equal block sizes, then processing them iteratively with compression functions. Examples of these methods include:
Merkle-Damgard construction
Sponge construction
The steps involved in the design of this hash functions includes:
Pad the data to convert it into 512 bits — the required block size — if it is not up to.
Parse the data into blocks to divide the data into equal blocks of 512 bits.
Set the initial hash value, which ensures there are no backdoors in the algorithm.
The blocks are processed sequentially, for up to 64 dissimilar rounds, after which the hash digest is complete.
A message schedule is prepared, and then eight working variables are initialized.
The hash value is calculated for all blocks and the respective outputs for each are concatenated to produce the hash digest.
There are different kinds of attacks possible to hash functions, and these attacks are responsible for the discontinuation in adoption of some “now insecure” hash functions. Let’s briefly take a look at the various kinds of hash function attacks.
This attack tries to break the Collision Resistance property of hash functions by finding two different messages or inputs (M1 and M2) such that hash(M1)==hash(M2).
Similarly, this attack is aimed at the Pre-image Resistance property of hash functions, where attackers try to recover the message from the output hash value.
This attack involves using a message M1 to find another entirely different message M2 (i.e. M1 ≠ M2) such that hash(M1)=hash(M2).
All the attacks we have seen before now are examples of Brute Force attacks. Now, we are going to explore an example of a Cryptanalytic Attack.
The Length Extension Attack is also called the Padding Attack. It works by exploiting the weakness of the Merkle-Damgard construction. Given that digest=hash(M1), it is possible to calculate M1 and the hash function, such that digest=hash(M1||M2) (even for unknown M1 but with known length of M1). The Length Extension Attack works by making use of hash(M1) as an internal hash for calculating hash(M1||M2).
Now that we have gained a basic understanding of hash functions, let’s see what hash functions are typically used for:
Building blockchains — a blockchain fundamentally a chain of hashes. Furthermore, hash functions are key to preserving integrity and immutability on the blockchain.
Creating digital signatures, and Message Authentication Codes (MACs).
TLS/SSL certificate signature.
Naming files (when immutability is important)
Password protection in online website databases
Verifying integrity
Public key encryption
As references or pointers in algorithms
As commitments like zero-knowledge proofs
For creating Merkle Trees — a binary tree of hashes, where you take data, and hash it again and again. It’s a useful data structure.
Identifying similar files and files that have been modified, in cloud storage systems
Network based intrusion detection systems
Identifying files in a Git repository, etc.
There are a lot of applications for hash functions that are probably not dreamt of today, so this list can get inexhaustible. However, let’s look at the types of hash functions and their implementations.
We have seen the properties of hash functions; these properties are very important in their adoption in blockchain engineering. Moreover, the following are reasons why hash functions are important in blockchain:
Since transaction time must be kept minimal, hash functions used on the blockchain are computationally efficient in nature. In other words, miners must be able to finish the computational task tied to their “work” in a relatively short time.
Hash functions used in blockchain engineering are deterministic. They will generate the same hash value or digest message when the same input is fed into them. This fosters integrity on the blockchain.
Blockchains require one-way encryption functions, and hash functions solve this need. Hash functions might look like encryption algorithms, but the big difference is that they cannot be decrypted to see the initial input.
Now that we’ve seen the importance of hash functions in blockchain, let’s look at how they are applied in blockchain. Just as we can use hash functions to hash a password, a file, a text message, etc., we can use it to hash an entire block on a blockchain.
Hashing a block creates a snapshot of the entire blockchain state from the moment the block was created. When this happens, the nodes on the network refer to the block hash to confirm that they have the most recent copy of the blockchain.
The Proof-of-Work (PoW) algorithm powers the Bitcoin blockchain. It is used to decentralize the block creation process, ensuring that no one party has complete control over the history of the blockchain.
Ideally, every 10 minutes, a miner confirms a newly created block of transaction data. A block is considered valid if it::
Contains only valid transactions
Has a header hash that is lower than the current network target.
The block hash on Bitcoin is generated by using SHA-256 hash twice on the Block_Header property:
SHA256(SHA256( Block_Header ))
This has the following data fields:
Version: the block version number
hashPrevBlock which is a 256-bit has of the Previous header
hashMerkleRoot which is a 256-bit hash that is the root of the Merkle Tree summarizing the blockchain’s transactions
Nonce which is a 32-bit number that starts at 0
Time or the block’s timestamp in seconds
Bits of the current target.
When a new block is created, it links back to the previous block cryptographically using a hash pointer. The previous block hash contains the hash of the previous block header. The transaction on the previous block is confirmed a second time on the new block, while the new block receives its first confirmation.
The code implementation of these fields look like this:
pub struct Block {
pub id: u64,
pub hash: String,
pub previous_hash: String,
pub timestamp: i64,
pub data: String,
pub nonce: u64,
}
A block header is only valid if it has a hash less than the current target. A blockchain miner can modify the nonce value which, due to the Avalanche Effect, has a dramatic impact on the hash of the header. By trying different nonce values, a miner may eventually find one that creates a hash less than the target. When this happens, the block is broadcast to the rest of the blockchain network, and, after validating that the block is correct, miners should start working on building the next block in the chain.
The target is a number that changes at regular intervals (every 2016 block or about 2 weeks) and must be higher than a valid block hash. This target value is based on the hashrate of the blockchain over the previous interval and is designed so that a new block is created every ten minutes on average.
Typically, the higher the hash power — the processing power — of the miner’s computer, the easier it is to discover the new block.
There are several types of hash functions or algorithms, and we will look at some of them in this article, including how secure they are in blockchain applications. Let’s take a look at these blockchain based hash functions.
The Secure Hash Algorithm (SHA) family of hash functions are built by the National Security Agency (NSA). It has three main categories: the SHA-1, SHA-2, and SHA-3. The SHA-1 family is labeled insecure by internet browsers, hence only the SHA-2 functions are in use. Under the SHA-2 category, there are the SHA-224, SHA-256, SHA-384, and SHA-512 functions, the most popular of which are the SHA-256 and SHA-512.
The SHA-256 in particular is the hash function used in Bitcoin’s Proof-of-Work algorithm in cryptocurrency mining.
How it works on Bitcoin
SHA-256 works on Bitcoin in three ways:
It is used in the Proof-of-Work consensus algorithm, where the miners on the network calculate the hash of new blocks.
Every new block in the Bitcoin blockchain contains a SHA-256 hash of the previous block.
It is used for the creation of addresses/digital signatures, where the message is hashed with SHA-256 and encrypted with their private key to get the digital signature.
Ethash was used in the Ethereum Proof-of-Work consensus algorithm built using the SHA-3/Keccak hash function for mining Ethereum and other ETH-based cryptocurrencies.
Ethash is no longer in use, as Ethereum has moved to the Proof-of-Stake protocol adopting the LMD GHOST algorithm, but the Keccak hash function is worth taking a look into. The Keccak is completely different from the SHA-1 and SHA-2 families in several ways:
It is resistant to length extension attacks
It has a completely different construction method than the others. It uses a sponge and squeeze construction approach.
You can learn about the strong points of the Keccak algorithm here.
The Skrypt hash function is also similar to the ones above. It is used in the Proof-of-Work mining algorithm, which is adopted by Litecoin, Hashcash, and Dogecoin. It is very fast and lightweight, yet it offers a high level of security that is adjustable. It is very resistant to brute force attacks like those mentioned above.