Pay attention to the Ethereum hash collision problem from the "Stealing coins" incident
Author : Kai Song(exp-sky) , hearmen , salt , sekaiwu of Tencent Security Xuanwu Lab
“Stealing coins”
On November 6th, we observed that such a contract appeared on Ethereum. After investigation, it was found that a blockchain security vendor issued a contract to let everyone “Stealing coins”.
1 | pragma solidity ^0.4.21; |
After observing, we found the security issue of an EVM storage we studied earlier in this contract, namely the hash collision problem in EVM storage.
First, for the above contract, if x == uint256(keccak256(block.blockhash(map[uint256(msg.sender)+x))))))))
, the ether in the contract can be obtained in the lottery
method. But the value of x
can only be obtained through constant guessing, and the probability is minimal.
Then, we found that in the fallback
function of the contract, there is also a selfdestruct
function that can help us complete the “Stealing coins” task, but requires the balance of this contract address in the token contract to be 0.
Based on our previous analysis of EVM storage, we found that there is an assignment to the arbitrary offset of the map type data in the guess
function map[uint256(msg.sender)+x] = blockNum;
.Because in EVM, the data storage in the map
type which’s address is calculated as address(map_data) = sha(key,slot)+offset
, which causes an arbitrary address to be written. If we can overwrite the token
variable, we can write the contract we constructed to the token
, guaranteeing The DVPgame contract has a balance of 0 in our construction contract, so that the selfdestruct
function of the DVPgame contract can be executed to complete the “Stealing coins”.
The address of the token variable is 0. This value can be reached after overflow. That is, we need to construct sha(msg.sender, slot)+x==2**256 (after overflow it will be 0)
.
In-depth analysis
In fact, as early as the end of June, after preliminary research on ETH and its runtime environment EVM, we have found some problems at the contract level and the virtual machine level respectively. Variable coverage and Hash collision problems are very typical two examples.
Variable coverage
In some contracts, we have found that modifying temporary variables of type struct inside a function will overwrite existing global variables in some cases.
1 | pragma solidity ^0.4.23; |
The source code of the contract is as shown above. Under normal circumstances, since the contract does not provide an interface to modify the unlocked
, it is unlikely to modify it. But in fact, we found in the test that you can modify unlocked
by calling the register
method of the contract.
Hash collision
After analyzing the storage structure of EVM, we found that in the design of EVM, potential hash collisions may occur when storing some complex variables, overwriting existing variables, and generating unpredictable problems.
1 | pragma solidity ^0.4.23; |
The above code has a similar hash collision problem. Looking at the contract source code, you can see that the balances field can only be accessed through the Reslove
interface. Under normal circumstances, the value stored in balance
cannot be modified. But in this contract, calling the function Rewrite
may overwrite the data in balances when operating on stateVar
.
background analysis
There are three ways to store in EVM, namely memory, storage, and stack.
- memory: life cycle is only during the execution of the entire method, the function is recycled after the call, because only the temporary variables are saved, so the GAS overhead is very small
- storage : permanently stored in the blockchain, GAS overhead is also the largest because the contract state variables are permanently saved.
- stack : store some local value type variables, almost free memory, but there are a limit
First, let’s analyze the storage and access of various object structures in EVM.
Map
First analyze the storage of map
1 | struct NameRecord { |
When we debug the map
structure in storage, we find that the storage address of the data in the map
is actually the hash value of map.key and the map_slot where the map is located. This value is a uint256. which is
1 | address(map_data) = sha(key,slot) |
And we also found that if the data stored in the map is a structure, the members in the structure are sequentially stored in the storage, and the storage location is sha(key, slot) + offset
.That is, the member is directly the offset in the structure is added to the previously calculated hash value as the storage location.
This hash + offset
struct storage method directly causes the hash of the sha3 algorithm to lose its meaning. In some cases, it produces sha(key1,slot) + offset == sha(key2,slot)
, which is a hash collision.
Array
Next, let’s take a look at the Array.
A fixed-length Array of global variables found in debugging is arranged in storage in the order of index.
If we use the new keyword to request a variable-length array, look at its runtime storage
1 | function GetSome() returns(uint){ |
Debugging found that if it is a variable length array, the storage location of the array member is selected according to the hash value, and the storage location of the array is sha3(address(array_object))+index
. The slot
in the array itself is only the length of the array, so it is well understood why variable-length arrays stored in storage can be incremented by adjusting the length
property.
Variable-length arrays are still stored in the same way as hash + offset
. There is also the possibility of a hash collision.
Array + Struct
If the array
and struct
are combined, how will the index of the data in storage be determined?
1 | struct Person { |
The object of type Person p
which’s first member is a dynamic array addr
. When storing p
object, first store the dynamic array in the map
:
1 | storage[hash(msg_sender,people_slot)] = storage[p+slot] |
Then store the dynamic array contents in turn:
1 | storage[hash(hash(msg_sender,people_slot))] = storage[hash(p_slot)]; storage[hash(hash(msg_sender,people_slot))+1] = storage[hash(p_slot)+1]; |
Finally store funds:
1 | storage[hash(msg_sender,people_slot)+1] |
Similarly, the struct
storage in the array is similar.
problem analysis
Variable coverage
1 | pragma solidity ^0.4.23; |
The unlocked
variable in this contract is stored in an offset of 1 in storage. In the debugging, it is found that the index position of the newRecord
object in the storage part is also 0, which overlaps with the global unlocked
, so when accessing newRecord
, it will be modified to unlocked
.
In debugging, we found that all temporary variables are stored from the 0 position of storage. If we set a few temporary variables, we will find that all temporary variables have a slot value of 0 when the function starts to select the slot.
Causes
We download the source code of the solidity compiler to view the reasons for the problem here. The source code can be found here, compile the source code directly using cmake, compile tutorial. The source code for solidity needs to reference the boost
library. If it is not installed before, you need to install boost
first. The compilation process will not be described again, and will eventually generate three executable files (the compilation on Windows will be a bit problematic, the dependent header files can’t be added automatically to the project, you need to add them manually, and there will be some characters to express the problem)
solc\solc
lllc\lllc
test\soltest
Solc
can compile sol source code into bytecode that EVM can run.
Debug Solc to see the compilation of the struct
as a temporary variable
1 | contract Project |
The key code call stack is as follows
1 | > solc.exe!dev::solidity::ContractCompiler::appendStackVariableInitialisation(const dev::solidity::VariableDeclaration & _variable) Line 951 C++ |
The key function is appendStackVariableInitialisation
. You can see that the call to pushZeroValue
records temporary variable information. If the function finds that value exists in Storage, then directly press PUSH 0
and press 0 directly. All temporary variables pass this path. In other words, all temporary variables are 0.
1 | void ContractCompiler::appendStackVariableInitialisation(VariableDeclaration const& _variable) |
I still can’t understand the reason for this design. The guess may be because of the sparse array relationship of storage itself. It is not convenient to control the slot position by other extra variables. However, with the current implementation, the problem should be more.
Compiled with the global variables, the function call stack is as follows
1 | > solc.exe!dev::solidity::ContractCompiler::initializeStateVariables(const dev::solidity::ContractDefinition & _contract) Line 403 C++ |
The key function is StorageItem::StorageItem
, the function gets the slot of the global variable in storage from storageLocationOfVariable
1 | StorageItem::StorageItem(CompilerContext& _compilerContext, VariableDeclaration const& _declaration): |
Hash collision
As mentioned earlier, smart contracts using struct
and array
have the potential for a hash collision.
In general, the hash returned by the sha3 method will not collide, but there is no guarantee that hash(mem1)+n
will not conflict with other hash(mem2)
. For example, there are two map
.
1 | struct Account{ |
Calculate sha3(key1,0) = hash1; Storage[hash1] = value1
when storing balances[key1] = value1
.
Calculate sha3(key2,1) = hash2;
when storing userTable[key2] = account
.
Hash1 and hash2 are not the same, but hash1 and hash2 are likely to be adjacent, with a small difference, we assume that they differ by 4 .
At this time, when the account
is actually stored, Account.name, Account.ID, Account.amount, Account.priceLimit
, and Account.total
are stored in storage in the hash2, hash2+1, hash2+2, hash2+3, and hash2+4
position. And hash2+4
is exactly equal to hash1
, then the value of Account.total
will overwrite the content value1
previously stored in balances.
However, it is only theoretically possible to attack by struct. It is very difficult to find sha3 with a small difference in practice. But if you turn the problem into array
, it’s possible to achieve a real attack.
Because in array
, the length of the array is controlled by the data stored in the first byte of the array object, as long as the value is large enough, the attacker can override the hash data of any gap.
1 | pragma solidity ^0.4.23; |
The current sender address is 0x14723a09acff6d2a60dcdf7aa4aff308fddc160c
, and the balance[msg.sender]
location is 0x51fb309f06bafadda6dd60adbce5b127369a3463545911e6444ab4017280494d
. Call the Resize
method to modify the length of the array stateVar
. The storage location of the array is 0x405787fa12a823e0f2b7631cc41b3ba8828b3321ca811111fa75cd3aa3bb5ace
.
Finally call the contract method Rewrite
assigns an value to the array, which overrides the contents of the balance
and overwrites the value of the sender
.
Actual memory
Finally, let’s take a look at the actual memory management. Regardless of how high-level technology of the Ethereum blockchain is, the memory needs to be settled. Ultimately, the data needs to be stored in the actual physical memory. So we actually analyze the storage of the storage part through the source code. The source code for EVM is at https://github.com/ethereum/cpp-ethereum
Process analysis
The return value of EVM is passed through EVM. Generally, the return value address is stored at the memory offset 0x40. This address holds the true return value.
Storage at the bottom of the implementation is an STL implementation sparse array, the slot value as a key to store the value
Maps and variable-length Arrays in Storage are all based on the hash value as the index of the lowest-level sparse array. The variable length array is indexed as
hash(array_slot) + index
and the Map index ishash(map_slot, key)
. When Value isStruct
, theStruct
members are stored separately. The index of each member ishash(map_slot, key) +offset
Code analysis
Storage
The storage part of the memory is the memory stored in the block together with the contract code, so the storage memory consumes a relatively large amount of gas back. We use the SLOAD
instruction to check how Storage is stored on the block.
The SLOAD
instruction is processed in the function interpretCases
. When the EVM resolves to the SLOAD
instruction, it first gets the top element of the stack as the key of the storage access from the stack, and then calls the function getStorage
for the actual access.
1 | case SLOAD: |
The getStorage
function receives four parameters, the first parameter is the return address, the second parameter is the context of the current call, the third parameter is the destination address of the transaction information, that is, the contract address, and the fourth parameter is the index of the storage,Key.
The function first verifies the address, ensuring that the current context is in the space of the contract address, and then calling env.store
to actually get the data.
1 | void getStorage( |
The final work comes to State::storage
1 | u256 State::storage(Address const& _id, u256 const& _key) const |
The function first gets the corresponding Account
object based on address
1 | Account* State::account(Address const& _addr) |
The following comment is a description of the partial Account
object, the Account
object is used to represent the state of an Ethernet account, and the Account
object and addr
are stored in the State object through the Map
. Each Account
account contains a storage trie
to index its nodes in the entire StateDB
. Account’s operations on storage are first performed on the map of storageOverlay
, and the data is updated to trie
when needed later.
1 | /** |
Go back to the State::storage
function. After getting the Account
, check whether the value of the specified key is in the storageOverlay
of the Account
. If not, go to the DB
to find it. Take Account->m_storageRoot
as the root and get a db from State->m_db
. Look in this tire
‘s copy and format it after it is in m_storageOverlay
.
It can be seen that before the actual data is synchronized to the block, EVM provides a level 2 cache mechanism for both storage
and account
to improve the efficiency of the memory:
- storage: Level 1 cache ->
account->m_storageOverlay
; Level 2 cache ->state -> m_db
- account: level 1 cache ->
state -> m_cache
; level 2 cache ->state -> m_state
Similarly, we start from the storage storage entry point SSTORE
, the main function is VM::interpretCases
, SSTORE
opcode will eventually access a hash table of type unordered_map
1 | void VM::interpretCases(){ |
memory
Still starting with MSTORE
, see the processing of memory in EVM
1 | CASE(MSTORE) |
It can be seen that memory is only valid in the current running environment, and is not stored in any position related to state, so memory is only valid in the current running environment, that is, Memory only takes effect in one transaction.
code
Code is similar to storage and is also related to Account, so code is also stored in the corresponding structure of Account
, the first level cache is account->m_codeCache
; the second level cache is stored in state->m_db[codehash]
1 | void State::setCode(Address const& _address, bytes&& _code) |
summary
Although the problem of hash collisions appears in a CTF-like “Stealing coins” competition, we should also pay attention to variable coverage and hash collisions due to EVM storage design issues. I hope that developers of smart contracts will In the development, we pay attention to the data storage in the code to avoid the loss caused by such problems.
Timeline
June 28th - found variable coverage and hash collision problems
November 6th - Contract found to have a hash collision problem
Reference
[1] https://github.com/ethereum/solidity/issues/1550