Pay attention to the Ethereum hash collision problem from the "Stealing coins" incident

Author : Kai Song(exp-sky) , hearmen , salt , sekaiwu of Tencent Security Xuanwu Lab

“Stealing coins”

On November 6th, we observed that such a contract appeared on Ethereum. After investigation, it was found that a blockchain security vendor issued a contract to let everyone “Stealing coins”.

pragma solidity ^0.4.21;
contract DVPgame {
    ERC20 public token;
    uint256[] map;
    using SafeERC20 for ERC20;
    using SafeMath for uint256;
    constructor(address addr) payable{
        token = ERC20(addr);
    }
    function (){
        if(map.length>=uint256(msg.sender)){
            require(map[uint256(msg.sender)]!=1);
        }
        if(token.balanceOf(this)==0){
            //airdrop is over
            selfdestruct(msg.sender);
        }else{
            token.safeTransfer(msg.sender,100);

            if (map.length <= uint256(msg.sender)) {
                map.length = uint256(msg.sender) + 1;
            }
            map[uint256(msg.sender)] = 1;  

        }
    }
    //Guess the value(param:x) of the keccak256 value modulo 10000 of the future block (param:blockNum)
    function guess(uint256 x,uint256 blockNum) public payable {
        require(msg.value == 0.001 ether || token.allowance(msg.sender,address(this))>=1*(10**18));
        require(blockNum>block.number);
        if(token.allowance(msg.sender,address(this))>0){
            token.safeTransferFrom(msg.sender,address(this),1*(10**18));
        }
        if (map.length <= uint256(msg.sender)+x) {
            map.length = uint256(msg.sender)+x + 1;
        }

        map[uint256(msg.sender)+x] = blockNum;
    }
    //Run a lottery
    function lottery(uint256 x) public {
        require(map[uint256(msg.sender)+x]!=0);
        require(block.number > map[uint256(msg.sender)+x]);
        require(block.blockhash(map[uint256(msg.sender)+x])!=0);
        uint256 answer = uint256(keccak256(block.blockhash(map[uint256(msg.sender)+x])))%10000;
        if (x == answer) {
            token.safeTransfer(msg.sender,token.balanceOf(address(this)));
            selfdestruct(msg.sender);
        }
    }
}

After observing, we found the security issue of an EVM storage we studied earlier in this contract, namely the hash collision problem in EVM storage.

First, for the above contract, if x == uint256(keccak256(block.blockhash(map[uint256(msg.sender)+x)))))))), the ether in the contract can be obtained in the lottery method. But the value of x can only be obtained through constant guessing, and the probability is minimal.

Then, we found that in the fallback function of the contract, there is also a selfdestruct function that can help us complete the “Stealing coins” task, but requires the balance of this contract address in the token contract to be 0.

Based on our previous analysis of EVM storage, we found that there is an assignment to the arbitrary offset of the map type data in the guess function map[uint256(msg.sender)+x] = blockNum;.Because in EVM, the data storage in the map type which’s address is calculated as address(map_data) = sha(key,slot)+offset, which causes an arbitrary address to be written. If we can overwrite the token variable, we can write the contract we constructed to the token, guaranteeing The DVPgame contract has a balance of 0 in our construction contract, so that the selfdestruct function of the DVPgame contract can be executed to complete the “Stealing coins”.

The address of the token variable is 0. This value can be reached after overflow. That is, we need to construct sha(msg.sender, slot)+x==2**256 (after overflow it will be 0).

In-depth analysis

In fact, as early as the end of June, after preliminary research on ETH and its runtime environment EVM, we have found some problems at the contract level and the virtual machine level respectively. Variable coverage and Hash collision problems are very typical two examples.

Variable coverage

In some contracts, we have found that modifying temporary variables of type struct inside a function will overwrite existing global variables in some cases.

pragma solidity ^0.4.23; 
contract Locked {
    bool public unlocked = false;    
    struct NameRecord { 
        bytes32 name;
        address mappedAddress;
    }
    mapping(address => NameRecord) public registeredNameRecord; 
    mapping(bytes32 => address) public resolve;
    function register(bytes32 _name, address _mappedAddress) public {
        NameRecord newRecord;
        newRecord.name = _name;
        newRecord.mappedAddress = _mappedAddress; 
        resolve[_name] = _mappedAddress;
        registeredNameRecord[msg.sender] = newRecord; 
        require(unlocked); 
    }
}

The source code of the contract is as shown above. Under normal circumstances, since the contract does not provide an interface to modify the unlocked, it is unlikely to modify it. But in fact, we found in the test that you can modify unlocked by calling the register method of the contract.

Hash collision

After analyzing the storage structure of EVM, we found that in the design of EVM, potential hash collisions may occur when storing some complex variables, overwriting existing variables, and generating unpredictable problems.

pragma solidity ^0.4.23; 
    
contract Project
{
    mapping(address => uint) public balances; // records who registered names 
    mapping(bytes32 => address) public resolve; // resolves hashes to addresses

    uint[] stateVar;

    function Resolve() returns (bytes32){
        balances[msg.sender] = 10000000;   
        return sha3(bytes32(msg.sender),bytes32(0));
    }

    function Resize(uint i){
        stateVar.length = i;
    }

    function Rewrite(uint i){
        stateVar[i] = 0x10adbeef; 
    }

}

The above code has a similar hash collision problem. Looking at the contract source code, you can see that the balances field can only be accessed through the Reslove interface. Under normal circumstances, the value stored in balance cannot be modified. But in this contract, calling the function Rewrite may overwrite the data in balances when operating on stateVar.

background analysis

There are three ways to store in EVM, namely memory, storage, and stack.

memory: life cycle is only during the execution of the entire method, the function is recycled after the call, because only the temporary variables are saved, so the GAS overhead is very small
storage : permanently stored in the blockchain, GAS overhead is also the largest because the contract state variables are permanently saved.
stack : store some local value type variables, almost free memory, but there are a limit

First, let’s analyze the storage and access of various object structures in EVM.

Map

First analyze the storage of map

struct NameRecord { 
    bytes32 name; 
    address mappedAddress;
}
mapping(bytes32 => address) public resolve; 
function register(bytes32 _name, address _mappedAddress) public {
    NameRecord newRecord;
    newRecord.name = _name;
    newRecord.mappedAddress = _mappedAddress; 
    resolve[_name] = _mappedAddress;
}

When we debug the map structure in storage, we find that the storage address of the data in the map is actually the hash value of map.key and the map_slot where the map is located. This value is a uint256. which is

1	address(map_data) = sha(key,slot)

And we also found that if the data stored in the map is a structure, the members in the structure are sequentially stored in the storage, and the storage location is sha(key, slot) + offset.That is, the member is directly the offset in the structure is added to the previously calculated hash value as the storage location.

This hash + offset struct storage method directly causes the hash of the sha3 algorithm to lose its meaning. In some cases, it produces sha(key1,slot) + offset == sha(key2,slot), which is a hash collision.

Array

Next, let’s take a look at the Array.

A fixed-length Array of global variables found in debugging is arranged in storage in the order of index.

If we use the new keyword to request a variable-length array, look at its runtime storage

function GetSome() returns(uint){
    stateVar = new uint[](2);
    stateVar[1] = 0x10adbeef;
    //stateVar = [1,2,4,5,6]; // This way is the same as `new` method
    return stateVar[1];
}

Debugging found that if it is a variable length array, the storage location of the array member is selected according to the hash value, and the storage location of the array is sha3(address(array_object))+index. The slot in the array itself is only the length of the array, so it is well understood why variable-length arrays stored in storage can be incremented by adjusting the length property.

Variable-length arrays are still stored in the same way as hash + offset. There is also the possibility of a hash collision.

Array + Struct

If the array and struct are combined, how will the index of the data in storage be determined?

struct Person {
    address[] addr;
    uint funds;
}    
mapping(address => Person) public people;   
function f() {
    Person p;
    p.addr = [0xca35b7d915458ef540ade6068dfe2f44e8fa733c,0x14723a09acff6d2a60dcdf7aa4aff308fddc160c];
    p.funds = 0x10af;

    people[msg.sender] = p;
}

The object of type Person p which’s first member is a dynamic array addr. When storing p object, first store the dynamic array in the map:

1	storage[hash(msg_sender,people_slot)] = storage[p+slot]

Then store the dynamic array contents in turn:

1	storage[hash(hash(msg_sender,people_slot))] = storage[hash(p_slot)]; storage[hash(hash(msg_sender,people_slot))+1] = storage[hash(p_slot)+1];

Finally store funds：

1	storage[hash(msg_sender,people_slot)+1]

Similarly, the struct storage in the array is similar.

problem analysis

Variable coverage

pragma solidity ^0.4.23; 
contract Locked {
    bool public unlocked = false;    
    struct NameRecord { 
        bytes32 name;
        address mappedAddress;
    }
    mapping(address => NameRecord) public registeredNameRecord; 
    mapping(bytes32 => address) public resolve;
    function register(bytes32 _name, address _mappedAddress) public {
        NameRecord newRecord;
        newRecord.name = _name;
        newRecord.mappedAddress = _mappedAddress; 
        resolve[_name] = _mappedAddress;
        registeredNameRecord[msg.sender] = newRecord; 
        require(unlocked); 
    }
}

The unlocked variable in this contract is stored in an offset of 1 in storage. In the debugging, it is found that the index position of the newRecord object in the storage part is also 0, which overlaps with the global unlocked, so when accessing newRecord, it will be modified to unlocked.

In debugging, we found that all temporary variables are stored from the 0 position of storage. If we set a few temporary variables, we will find that all temporary variables have a slot value of 0 when the function starts to select the slot.

Causes

We download the source code of the solidity compiler to view the reasons for the problem here. The source code can be found here, compile the source code directly using cmake, compile tutorial. The source code for solidity needs to reference the boost library. If it is not installed before, you need to install boost first. The compilation process will not be described again, and will eventually generate three executable files (the compilation on Windows will be a bit problematic, the dependent header files can’t be added automatically to the project, you need to add them manually, and there will be some characters to express the problem)

solc\solc
lllc\lllc
test\soltest

Solc can compile sol source code into bytecode that EVM can run.

Debug Solc to see the compilation of the struct as a temporary variable

contract Project
{
    uint a= 12345678;
    struct Leak{
        uint s1;
    }
    function f(uint i) returns(uint) {
        Leak l;
        return l.s1;
    }

}

The key code call stack is as follows

>   solc.exe!dev::solidity::ContractCompiler::appendStackVariableInitialisation(const dev::solidity::VariableDeclaration & _variable) Line 951  C++
    solc.exe!dev::solidity::ContractCompiler::visit(const dev::solidity::FunctionDefinition & _function) Line 445   C++
    solc.exe!dev::solidity::FunctionDefinition::accept(dev::solidity::ASTConstVisitor & _visitor) Line 206  C++
    solc.exe!dev::solidity::ContractCompiler::appendMissingFunctions() Line 870 C++
    solc.exe!dev::solidity::ContractCompiler::compileContract(const dev::solidity::ContractDefinition & _contract, const std::map<dev::solidity::ContractDefinition const *,dev::eth::Assembly const *,std::less<dev::solidity::ContractDefinition const *>,std::allocator<std::pair<dev::solidity::ContractDefinition const * const,dev::eth::Assembly const *> > > & _contracts) Line 75  C++
    solc.exe!dev::solidity::Compiler::compileContract(const dev::solidity::ContractDefinition & _contract, const std::map<dev::solidity::ContractDefinition const *,dev::eth::Assembly const *,std::less<dev::solidity::ContractDefinition const *>,std::allocator<std::pair<dev::solidity::ContractDefinition const * const,dev::eth::Assembly const *> > > & _contracts, const std::vector<unsigned char,std::allocator<unsigned char> > & _metadata) Line 39 C++
    solc.exe!dev::solidity::CompilerStack::compileContract(const dev::solidity::ContractDefinition & _contract, std::map<dev::solidity::ContractDefinition const *,dev::eth::Assembly const *,std::less<dev::solidity::ContractDefinition const *>,std::allocator<std::pair<dev::solidity::ContractDefinition const * const,dev::eth::Assembly const *> > > & _compiledContracts) Line 730  C++
    solc.exe!dev::solidity::CompilerStack::compile() Line 309   C++
    solc.exe!dev::solidity::CommandLineInterface::processInput() Line 837   C++
    solc.exe!main(int argc, char * * argv) Line 59  C++

The key function is appendStackVariableInitialisation. You can see that the call to pushZeroValue records temporary variable information. If the function finds that value exists in Storage, then directly press PUSH 0 and press 0 directly. All temporary variables pass this path. In other words, all temporary variables are 0.

void ContractCompiler::appendStackVariableInitialisation(VariableDeclaration const& _variable)
{
    CompilerContext::LocationSetter location(m_context, _variable);
    m_context.addVariable(_variable);
    CompilerUtils(m_context).pushZeroValue(*_variable.annotation().type);
}

I still can’t understand the reason for this design. The guess may be because of the sparse array relationship of storage itself. It is not convenient to control the slot position by other extra variables. However, with the current implementation, the problem should be more.

Compiled with the global variables, the function call stack is as follows

>   solc.exe!dev::solidity::ContractCompiler::initializeStateVariables(const dev::solidity::ContractDefinition & _contract) Line 403    C++
    solc.exe!dev::solidity::ContractCompiler::appendInitAndConstructorCode(const dev::solidity::ContractDefinition & _contract) Line 146    C++
    solc.exe!dev::solidity::ContractCompiler::packIntoContractCreator(const dev::solidity::ContractDefinition & _contract) Line 165 C++
    solc.exe!dev::solidity::ContractCompiler::compileConstructor(const dev::solidity::ContractDefinition & _contract, const std::map<dev::solidity::ContractDefinition const *,dev::eth::Assembly const *,std::less<dev::solidity::ContractDefinition const *>,std::allocator<std::pair<dev::solidity::ContractDefinition const * const,dev::eth::Assembly const *> > > & _contracts) Line 89   C++
    solc.exe!dev::solidity::Compiler::compileContract(const dev::solidity::ContractDefinition & _contract, const std::map<dev::solidity::ContractDefinition const *,dev::eth::Assembly const *,std::less<dev::solidity::ContractDefinition const *>,std::allocator<std::pair<dev::solidity::ContractDefinition const * const,dev::eth::Assembly const *> > > & _contracts, const std::vector<unsigned char,std::allocator<unsigned char> > & _metadata) Line 44 C++
    solc.exe!dev::solidity::CompilerStack::compileContract(const dev::solidity::ContractDefinition & _contract, std::map<dev::solidity::ContractDefinition const *,dev::eth::Assembly const *,std::less<dev::solidity::ContractDefinition const *>,std::allocator<std::pair<dev::solidity::ContractDefinition const * const,dev::eth::Assembly const *> > > & _compiledContracts) Line 730  C++
    solc.exe!dev::solidity::CompilerStack::compile() Line 309   C++
    solc.exe!dev::solidity::CommandLineInterface::processInput() Line 837   C++
    solc.exe!main(int argc, char * * argv) Line 59  C++

The key function is StorageItem::StorageItem , the function gets the slot of the global variable in storage from storageLocationOfVariable

StorageItem::StorageItem(CompilerContext& _compilerContext, VariableDeclaration const& _declaration):
    StorageItem(_compilerContext, *_declaration.annotation().type)
{
    auto const& location = m_context.storageLocationOfVariable(_declaration);
    m_context << location.first << u256(location.second);
}

Hash collision

As mentioned earlier, smart contracts using struct and array have the potential for a hash collision.

In general, the hash returned by the sha3 method will not collide, but there is no guarantee that hash(mem1)+n will not conflict with other hash(mem2). For example, there are two map.

struct Account{
    string name；
    uint ID;
    uint amount;
    uint priceLimit;
    uint total;
}

map<address, uint> balances;     // slot 0  
map<string, Account> userTable;    // slot 1

Calculate sha3(key1,0) = hash1; Storage[hash1] = value1 when storing balances[key1] = value1.

Calculate sha3(key2,1) = hash2; when storing userTable[key2] = account.

Hash1 and hash2 are not the same, but hash1 and hash2 are likely to be adjacent, with a small difference, we assume that they differ by 4 .

At this time, when the account is actually stored, Account.name, Account.ID, Account.amount, Account.priceLimit, and Account.total are stored in storage in the hash2, hash2+1, hash2+2, hash2+3, and hash2+4 position. And hash2+4 is exactly equal to hash1 , then the value of Account.totalwill overwrite the content value1 previously stored in balances.

However, it is only theoretically possible to attack by struct. It is very difficult to find sha3 with a small difference in practice. But if you turn the problem into array, it’s possible to achieve a real attack.

Because in array, the length of the array is controlled by the data stored in the first byte of the array object, as long as the value is large enough, the attacker can override the hash data of any gap.

pragma solidity ^0.4.23; 
contract Project
{
    mapping(address => uint) public balances; // records who registered names 
    mapping(bytes32 => address) public resolve; // resolves hashes to addresses

    uint[] stateVar;

    function Resolve() returns (bytes32){
        balances[msg.sender] = 10000000;   // 0x14723a09acff6d2a60dcdf7aa4aff308fddc160c ->  0x51fb309f06bafadda6dd60adbce5b127369a3463545911e6444ab4017280494d 

        return sha3(bytes32(msg.sender),bytes32(0));
    }

    function Resize(uint i){
        stateVar.length = 0x92b6e4f83ec43f4bc9069880e92f6ea53e45d964038b04cc518a923857c1b79c; // 0x405787fa12a823e0f2b7631cc41b3ba8828b3321ca811111fa75cd3aa3bb5ace
    }

    function Rewrite(uint i){
        stateVar[i] = 0x10adbeef; // 0x11a3a8a4f412d6fcb425fd90f8ca757eb40f014189d800d449d4e6c6cec4ee7f = 0x51fb309f06bafadda6dd60adbce5b127369a3463545911e6444ab4017280494d - 0x405787fa12a823e0f2b7631cc41b3ba8828b3321ca811111fa75cd3aa3bb5ace
    }

}

The current sender address is 0x14723a09acff6d2a60dcdf7aa4aff308fddc160c , and the balance[msg.sender] location is 0x51fb309f06bafadda6dd60adbce5b127369a3463545911e6444ab4017280494d. Call the Resize method to modify the length of the array stateVar. The storage location of the array is 0x405787fa12a823e0f2b7631cc41b3ba8828b3321ca811111fa75cd3aa3bb5ace.

Finally call the contract method Rewrite assigns an value to the array, which overrides the contents of the balance and overwrites the value of the sender.

Actual memory

Finally, let’s take a look at the actual memory management. Regardless of how high-level technology of the Ethereum blockchain is, the memory needs to be settled. Ultimately, the data needs to be stored in the actual physical memory. So we actually analyze the storage of the storage part through the source code. The source code for EVM is at https://github.com/ethereum/cpp-ethereum

Process analysis

The return value of EVM is passed through EVM. Generally, the return value address is stored at the memory offset 0x40. This address holds the true return value.
Storage at the bottom of the implementation is an STL implementation sparse array, the slot value as a key to store the value
Maps and variable-length Arrays in Storage are all based on the hash value as the index of the lowest-level sparse array. The variable length array is indexed as hash(array_slot) + index and the Map index is hash(map_slot, key). When Value is Struct, the Struct members are stored separately. The index of each member is hash(map_slot, key) +offset

Code analysis

Storage

The storage part of the memory is the memory stored in the block together with the contract code, so the storage memory consumes a relatively large amount of gas back. We use the SLOAD instruction to check how Storage is stored on the block.

The SLOAD instruction is processed in the function interpretCases. When the EVM resolves to the SLOAD instruction, it first gets the top element of the stack as the key of the storage access from the stack, and then calls the function getStorage for the actual access.

case SLOAD:
    evmc_uint256be key = toEvmC(m_SP[0]);
    evmc_uint256be value;
    m_context->fn_table->get_storage(&value, m_context, &m_message->destination, &key);
    m_SPP[0] = fromEvmC(value);


evmc_context_fn_table const fnTable = {
    accountExists,
    getStorage,
    setStorage,
    getBalance,
    getCodeSize,
    copyCode,
    selfdestruct,
    eth::call,
    getTxContext,
    getBlockHash,
    eth::log,
};

The getStorage function receives four parameters, the first parameter is the return address, the second parameter is the context of the current call, the third parameter is the destination address of the transaction information, that is, the contract address, and the fourth parameter is the index of the storage,Key.

The function first verifies the address, ensuring that the current context is in the space of the contract address, and then calling env.store to actually get the data.

void getStorage(
    evmc_uint256be* o_result,
    evmc_context* _context,
    evmc_address const* _addr,
    evmc_uint256be const* _key
) noexcept
{
    (void) _addr;
    auto& env = static_cast<ExtVMFace&>(*_context);
    assert(fromEvmC(*_addr) == env.myAddress);
    u256 key = fromEvmC(*_key);
    *o_result = toEvmC(env.store(key));
}


virtual u256 store(u256 _n) override final { return m_s.storage(myAddress, _n); }

The final work comes to State::storage

 u256 State::storage(Address const& _id, u256 const& _key) const
{
    if (Account const* a = account(_id))
    {
        auto mit = a->storageOverlay().find(_key);
        if (mit != a->storageOverlay().end())
            return mit->second;

        // Not in the storage cache - go to the DB.
        SecureTrieDB<h256, OverlayDB> memdb(const_cast<OverlayDB*>(&m_db), a->baseRoot());          // promise we won't change the overlay! :)
        string payload = memdb.at(_key);
        u256 ret = payload.size() ? RLP(payload).toInt<u256>() : 0;
        a->setStorageCache(_key, ret);
        return ret;
    }
    else
        return 0;
}

The function first gets the corresponding Account object based on address

Account* State::account(Address const& _addr)
{
    auto it = m_cache.find(_addr);   
    if (it != m_cache.end())
        return &it->second;

    if (m_nonExistingAccountsCache.count(_addr))  
        return nullptr;

    // Populate basic info.
    string stateBack = m_state.at(_addr);  
    if (stateBack.empty())
    {
        m_nonExistingAccountsCache.insert(_addr);
        return nullptr;
    }

    clearCacheIfTooLarge();

    RLP state(stateBack);  
    auto i = m_cache.emplace(
        std::piecewise_construct,
        std::forward_as_tuple(_addr),
        std::forward_as_tuple(state[0].toInt<u256>(), state[1].toInt<u256>(), state[2].toHash<h256>(), state[3].toHash<h256>(), Account::Unchanged)
    );  
    m_unchangedCacheEntries.push_back(_addr);
    return &i.first->second;  
}

The following comment is a description of the partial Account object, the Account object is used to represent the state of an Ethernet account, and the Account object and addr are stored in the State object through the Map. Each Account account contains a storage trie to index its nodes in the entire StateDB. Account’s operations on storage are first performed on the map of storageOverlay, and the data is updated to trie when needed later.

/**
 * Models the state of a single Ethereum account.
 * Used to cache a portion of the full Ethereum state. State keeps a mapping of Address's to Accounts.
 *
 * Aside from storing the nonce and balance, the account may also be "dead" (where isAlive() returns false).
 * This allows State to explicitly store the notion of a deleted account in it's cache. kill() can be used
 * for this.
 *
 * For the account's storage, the class operates a cache. baseRoot() specifies the base state of the storage
 * given as the Trie root to be looked up in the state database. Alterations beyond this base are specified
 * in the overlay, stored in this class and retrieved with storageOverlay(). setStorage allows the overlay
 * to be altered.
 *

Go back to the State::storage function. After getting the Account, check whether the value of the specified key is in the storageOverlay of the Account. If not, go to the DB to find it. Take Account->m_storageRoot as the root and get a db from State->m_db. Look in this tire‘s copy and format it after it is in m_storageOverlay.

It can be seen that before the actual data is synchronized to the block, EVM provides a level 2 cache mechanism for both storage and account to improve the efficiency of the memory:

storage: Level 1 cache -> account->m_storageOverlay; Level 2 cache -> state -> m_db
account: level 1 cache -> state -> m_cache; level 2 cache -> state -> m_state

Similarly, we start from the storage storage entry point SSTORE, the main function is VM::interpretCases, SSTORE opcode will eventually access a hash table of type unordered_map

void VM::interpretCases(){
        // .....
        CASE(SSTORE)
        {
            ON_OP();
            if (m_message->flags & EVMC_STATIC)
                throwDisallowedStateChange();

            updateSSGas();
            updateIOGas();

            evmc_uint256be key = toEvmC(m_SP[0]);
            evmc_uint256be value = toEvmC(m_SP[1]);
            m_context->fn_table->set_storage(m_context, &m_message->destination, &key, &value);
        }
        NEXT
        // .....
}

|-
    evmc_context_fn_table const fnTable = {
        accountExists,
        getStorage,
        setStorage,
        getBalance,
        getCodeSize,
        copyCode,
        selfdestruct,
        eth::call,
        getTxContext,
        getBlockHash,
        eth::log,
    };


    void setStorage(
        evmc_context* _context,
        evmc_address const* _addr,
        evmc_uint256be const* _key,
        evmc_uint256be const* _value
    ) noexcept
    {
        (void) _addr;
        auto& env = static_cast<ExtVMFace&>(*_context);
        assert(fromEvmC(*_addr) == env.myAddress);
        u256 index = fromEvmC(*_key);
        u256 value = fromEvmC(*_value);
        if (value == 0 && env.store(index) != 0)                   // If delete
            env.sub.refunds += env.evmSchedule().sstoreRefundGas;  // Increase refund counter

        env.setStore(index, value);    // Interface uses native endianness
    }

    |-
        void ExtVM::setStore(u256 _n, u256 _v)
        {
            m_s.setStorage(myAddress, _n, _v);
        }

        |-

            void State::setStorage(Address const& _contract, u256 const& _key, u256 const& _value)
            {
                m_changeLog.emplace_back(_contract, _key, storage(_contract, _key));
                m_cache[_contract].setStorage(_key, _value);
            }

            |-

                class Account{
                    // ...
                    std::unordered_map<u256, u256> m_storageOverlay;
                    // ...
                    void setStorage(u256 _p, u256 _v) { m_storageOverlay[_p] = _v; changed(); }
                    // ...
                }

memory

Still starting with MSTORE, see the processing of memory in EVM

CASE(MSTORE)
{
    ON_OP();
    updateMem(toInt63(m_SP[0]) + 32);
    updateIOGas();

    *(h256*)&m_mem[(unsigned)m_SP[0]] = (h256)m_SP[1];
}
NEXT

It can be seen that memory is only valid in the current running environment, and is not stored in any position related to state, so memory is only valid in the current running environment, that is, Memory only takes effect in one transaction.

code

Code is similar to storage and is also related to Account, so code is also stored in the corresponding structure of Account, the first level cache is account->m_codeCache; the second level cache is stored in state->m_db[codehash]

void State::setCode(Address const& _address, bytes&& _code)
{
    m_changeLog.emplace_back(_address, code(_address));
    m_cache[_address].setCode(std::move(_code));
}

summary

Although the problem of hash collisions appears in a CTF-like “Stealing coins” competition, we should also pay attention to variable coverage and hash collisions due to EVM storage design issues. I hope that developers of smart contracts will In the development, we pay attention to the data storage in the code to avoid the loss caused by such problems.

Timeline

June 28th - found variable coverage and hash collision problems

November 6th - Contract found to have a hash collision problem

Reference

[1] https://github.com/ethereum/solidity/issues/1550

[2] https://lilymoana.github.io/ethereum_theory.html

[3] https://github.com/FISCO-BCOS/Wiki/tree/master/%E6%B5%85%E8%B0%88Ethereum%E7%9A%84%E5%AD%98%E5%82%A8#StateDB%E6%A8%A1%E5%9D%97

[4] https://github.com/ethereum/cpp-ethereum