Towards a Toolchain for Exploiting Smart Contracts on the Ethereum Blockchain
by Sebastian Kindler
M.A., University of Bayreuth, 2011

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Bachelor of Science in the Computer Science Program, Faculty of Computer Science

Supervisor: Prof. Dr. Stefan Traub
Second Assessor: Prof. Dr. Markus Schäffter
External Assessor: Dr. Henning Kopp

Ulm University of Applied Sciences
March 22, 2019
Abstract

The present work introduces the reader to the Ethereum blockchain: first on a conceptual level, by explaining general blockchain concepts and viewing the Ethereum blockchain in particular from different perspectives; second on a practical level, by explaining in detail the main components that make up the Ethereum blockchain. In preparation for the objective of the present work, which is the analysis of EVM bytecode from an attacker's perspective, smart contracts are introduced on the level of both EVM bytecode and Solidity source code. In addition, critical assembly instructions relevant to the exploitation of smart contracts are explained in detail. Equipped with a definition of what constitutes a vulnerable contract, further practical and theoretical aspects are discussed: The present work introduces requirements for a possible smart contract analysis toolchain. The requirements are viewed individually, and the theoretical focus is put on automated bytecode analysis and symbolic execution, as this is the underlying technique of automated smart contract analysis tools. The importance of semantics is highlighted with respect to designing automated tools for smart contract exploitation. At the end, a minimal toolchain is presented, which allows beginners to efficiently analyze smart contracts and develop exploits.
Contents

Introduction
1 Preliminaries
  1.1 Blockchain
  1.2 Ethereum
    1.2.1 Ethereum from Different Perspectives
    1.2.2 Ethereum World State σ
    1.2.3 Ethereum Account Types
    1.2.4 Ethereum Transactions
    1.2.5 Ethereum Virtual Machine (EVM)
    1.2.6 Ethereum Peer-to-Peer Network
  1.3 Smart Contracts
    1.3.1 Smart Contracts at EVM Bytecode Level
    1.3.2 Solidity and the Structure of Smart Contracts
  1.4 Vulnerability of Smart Contracts
    1.4.1 Critical Bytecode Instructions in Smart Contracts
    1.4.2 Exploitation of Critical Instructions
    1.4.3 Defining Vulnerable Smart Contracts and Exploits
2 Towards a Smart Contract Exploit Development Toolchain
  2.1 Requirements Analysis
  2.2 Requirement 1: EVM Bytecode Deployment
  2.3 Requirement 2: Manual EVM Bytecode Analysis (Tools)
  2.4 Requirement 3: Automated Analysis (Theory)
    2.4.1 Symbolic Execution
  2.5 Requirement 4: Automated Exploit Development
  2.6 Toolchain for Automated EVM Bytecode Analysis and Exploit Development
Conclusions
Introduction

The concept of smart contracts was introduced in 1994 by cryptographer Nick Szabo [46], who offers the following definition:

  A smart contract is a computerized transaction protocol that executes the terms of a contract. The general objectives of smart contract design are to satisfy common contractual conditions (such as payment terms, liens, confidentiality, and even enforcement), minimize exceptions both malicious and accidental, and minimize the need for trusted intermediaries. Related economic goals include lowering fraud loss, arbitration and enforcement costs, and other transaction costs.¹

¹ The term transaction costs goes back to the article "The Problem of Social Cost" by Ronald Coase [10], the theses of which were later summarized as the Coase theorem [12]. Transaction costs have a negative connotation and refer to the time and effort as well as the resources that are required to negotiate the exchange of legal entitlements. According to the interpretation [12] of Coase's proposal, from the perspective of efficiency, the original allocation of resources is of no concern as long as transactions of legal entitlements are costless. Reducing transaction costs facilitates the efficient exchange of legal entitlements and increases cooperation between competing parties.

A smart contract is as binding as a legal contract. The bytecode of a smart contract constitutes the contractual conditions to which users subject themselves when they execute the respective smart contract. However, unlike a legal contract, a smart contract can neither be circumvented nor fought in court. As programs, and from the perspective of the end user, smart contracts execute precisely the way they are designed. In this sense, Ethereum blockchain technology is an implementation of a decentralized crypto-law system [51]. In contrast to national legal systems, Ethereum non-contract account owners do not decide by which law they want to abide, but rather by which law they want to be bound. Once called via a message call transaction, smart contract execution cannot be stopped, and the contractual conditions are binding in the absolute sense. However, if program code is what constitutes the law, then programming errors are part of the law as well. Hence, by definition, the abuse of programming errors in a
decentralized system such as Ethereum cannot constitute a violation of the system's crypto-law. Any condemnation of attacks against error-prone smart contracts builds on the remnants of thinking in centralized legal systems. A decentralized system thus eradicates such thinking.

Public blockchain implementations such as Ethereum are transparent but trustless environments. The trust people put in Ethereum does not depend on centralized legal institutions. Rather, people put their trust in the algorithms [35] on which the security of blockchain technology is built. Regarding the consensus algorithm, i.e., the proof-of-work, such trust may be justified. However, complete trust in the correct execution of arbitrary programs seems ill-placed, especially when these programs manage 'real' people's money: Ethereum smart contracts own Ether worth millions of US dollars, and heists [33] on the Ethereum blockchain have shown how vulnerable and insecure smart contracts can be. To comprehend the severity of smart contract vulnerabilities as well as the importance of a toolchain for smart contract vulnerability analysis, the subsequent work serves as a thorough introduction to the Ethereum blockchain.

The present work introduces the reader to the Ethereum blockchain: first on a conceptual level, by explaining general blockchain concepts and viewing the Ethereum blockchain in particular from different perspectives; second on a practical level, by explaining in detail the main components that make up the Ethereum blockchain. In preparation for the objective of the present work, which is the analysis of EVM bytecode from an attacker's perspective, smart contracts are introduced on the level of both EVM bytecode and Solidity source code. In addition, critical assembly instructions relevant to the exploitation of smart contracts are explained in detail.
Equipped with a definition of what constitutes a vulnerable contract, further practical and theoretical aspects are discussed: The present work introduces requirements for a possible smart contract analysis toolchain. The requirements are viewed individually, and the theoretical focus is put on automated bytecode analysis and symbolic execution, as this is the underlying technique of automated smart contract analysis tools. The importance of semantics is highlighted with respect to designing automated tools for smart contract exploitation. At the end, a minimal toolchain is presented, which allows beginners to efficiently analyze smart contracts and develop exploits.
1 Preliminaries

1.1 Blockchain

Blockchain as an append-only linked list

The term blockchain refers to a data structure that can be loosely described as an append-only linked list whose data content and sequence of data elements are immutable. In comparison, a standard singly linked list is a linear sequence of individual data elements, each of which contains some data and a reference to the next element. Thus, the elements themselves implement the list by referencing the address of the respective next element, as shown in Figure 1. Each of the elements in the singly linked list can be modified, moved within the sequence, or deleted. Moreover, new elements can be inserted at any point in the sequence: at the beginning, in the middle, or at the end. Thus, a singly linked list is neither append-only nor immutable with regard to data content and sequence of data elements.

[Figure 1 content: first element (address 0x0004, data 12, next 0x000A); second element (address 0x000A, data 35, next 0x0032); last element (address 0x0032, data 17, next Null).]

Figure 1: A singly linked list consisting of three elements, each of which stores an integer value and references the next element by pointing to its address.

In contrast, a blockchain is designed as an append-only linked list: the linear sequence of previously added data elements is immutable, and so is the data stored in the data elements. The data elements on a blockchain are referred to as blocks. Each block comprises two sections: (1) a block header that contains various pieces of information particular to a block on the blockchain, and (2) a data section,
which contains the data, i.e., records of information. Just like the elements added to a linked list, the blocks on a blockchain contain a reference to a neighboring block on the blockchain. This reference is stored in the block header. However, instead of referencing the next data element, each block references the previous block in the sequence, i.e., the parent block. Instead of using a pointer to the address of the next element in the sequence, each block contains the hash value of its parent block, as depicted in Figure 2.

[Figure 2 content: Block #0 (block header: [...]; data section: dataA); Block #1 (block header: [...], f_Hash(Block #0); data section: dataB); Block #2 (block header: [...], f_Hash(Block #1); data section: dataC).]

Figure 2: In a blockchain, the elements of the linear sequence are linked through the hash value of the respective precursor. The hash function f_Hash() takes the entire block, i.e., block header and data section, as input and produces a hash value as output. The placeholder [...] in the block header represents additional pieces of information that are particular to a block.

Blockchain cryptographically protected by hash values

The properties of hash values allow for data validation and provide a means for checking the integrity of the data stored on the blockchain: Hash functions take input data of arbitrary size and map it to a fixed-size output, e.g., a 256-bit hash value. In addition, good hash functions map data in a way that exhibits the so-called avalanche effect: If only one bit of the input data is inverted, each of the bits of the output hash value will change with a probability of fifty percent [50, p. 524]. This makes the output of hash functions practically unpredictable. For all practical purposes, only identical blocks of data produce the same hash value.
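The hash-based linking and the avalanche effect described above can be illustrated with a minimal sketch. It assumes SHA-256 from Python's standard library as a stand-in for f_Hash and a JSON serialization of the block; neither is Ethereum's actual hash function or block format, and the names block_hash, make_block, and differing_bits are hypothetical.

```python
import hashlib
import json
from typing import Optional


def block_hash(block: dict) -> str:
    """Hash the entire block (header and data section), like f_Hash in Figure 2."""
    # Assumption: JSON with sorted keys serves as a deterministic serialization.
    serialized = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()


def make_block(parent: Optional[dict], data: str) -> dict:
    """Create a block whose header stores the hash value of its parent block."""
    parent_hash = block_hash(parent) if parent is not None else None
    return {"header": {"parent_hash": parent_hash}, "data": data}


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Three linked blocks, mirroring Block #0, Block #1 and Block #2 in Figure 2.
block0 = make_block(None, "dataA")
block1 = make_block(block0, "dataB")
block2 = make_block(block1, "dataC")

# Avalanche effect: invert a single input bit and compare the two digests.
original = b"dataA"
inverted = bytes([original[0] ^ 0x01]) + original[1:]
changed_bits = differing_bits(
    hashlib.sha256(original).hexdigest(),
    hashlib.sha256(inverted).hexdigest(),
)
print(f"{changed_bits} of 256 output bits changed")
```

With a well-behaved hash function, the printed count should lie in the vicinity of 128, i.e., about half of the output bits.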
Thus, hash functions create a fingerprint of the respective data, which can be used to check the data's integrity. In this way, each block references its precursor and cryptographically contains the identity of the precursor's data in the form of a fingerprint, i.e., the hash value. When a block is added to the end of the sequence, the hash value of the last block in the blockchain becomes part of the new block's header information and subsequently determines the new block's hash value. Modifying, appending, or deleting even one character in a block's data would therefore change that block's hash value significantly, as shown in Figure 3. Hence, linking blocks on the blockchain through their hash values provides a means for detecting manipulation of the data stored on the blockchain. Hypothetically, if someone were to manipulate the records of information stored in a block, this person would have to re-calculate the hash values of all subsequent blocks to escape detection. In any other case, the integrity of the data would be violated, and the blockchain would be broken.

[Figure 3 content: three blocks of data, "dataA 0xe35f47d 1", "dataA 0xe35f47d 2", and "dataA 0xe35f47d 3", are passed to f_MD5(data); the resulting hash values are 0x60cbc1a87c2c7bad994784ded812af98, 0xb3ae55f566e756efa3af8ebda65d6332, and 0x0cd26af0131478ae2be6caeead727502.]

Figure 3: Modifying only one character of a piece of data, e.g., a string, changes that data's resulting hash value, e.g., a 128-bit MD5 hash value, significantly.

Hashes as proof-of-work

Calculating a hash value with the respective hash function does not require a lot of computational resources. Anybody with access to the data of the blockchain described so far could manipulate the records of information in one or several
blocks and re-calculate the hash values of all subsequent blocks in the sequence. To prevent such data manipulation, blockchain technology employs a mechanism that demands that the hash value of a block be below a specified target value, which is determined by a difficulty level. By introducing such a difficulty level, the mechanism imposes a significant computational effort on whoever wants to add a block to the blockchain. The desired hash value can only be found with brute force by continuously changing the block's data: For this purpose, the block header contains a nonce, which is changed every time the hash value is calculated. This process is repeated until it produces a hash value that satisfies the difficulty level. The nonce that produced the acceptable hash value then remains part of the block header, as seen in Figure 4.

[Figure 4 content: Block #0, Block #1, and Block #2 as in Figure 2 (data sections dataA, dataB, dataC); in addition to [...] and the parent hash values f_Hash(Block #0) and f_Hash(Block #1), the block headers contain the nonces nonce1 and nonce2.]

Figure 4: The nonce is part of the block header. As such, it determines the resulting hash value of the block. Hence, finding a nonce that yields a hash value below the difficulty level poses a severe computational problem, which can only be solved with brute force.

Anybody who wants to find the acceptable hash value to add a new block to the blockchain has to commit extensive computational resources to this work. The result of such computational efforts is called proof-of-work [51, p. 6]: The nonce and the resulting hash value below the specified difficulty level are proof of someone's computational work. The process of solving the proof-of-work is called mining. Blockchain users who commit computational resources to solve
the proof-of-work for a block are called miners. To put this work in perspective: From February 2018 until November 2018, the average daily hash rate of the Ethereum network was around 250,000 gigahashes per second (250k GH/s).² A powerful GPU, e.g., the GeForce GTX 1080 Ti, is capable of calculating 31.3 · 10^6 hashes per second (31.3 MH/s or 0.0313 GH/s).³ A miner who wants to solve the proof-of-work in 10 seconds while the block difficulty is approximately 3.13 · 10^15 has to commit computing equipment capable of a hash rate of 313,000,000 MH/s (313k GH/s). That is 10 million times the hash power of the GTX 1080 Ti. It follows that the mining mechanism disproportionately impedes the insertion of a new block in the middle of the blockchain or the manipulation of the records of information in existing blocks, as the proof-of-work would have to be solved again for each subsequent block. Moreover, computing power translates into electrical energy consumption, which puts a real cost burden on miners and therefore discourages them from committing their computational resources to data manipulation.

Proof-of-work as a consensus mechanism in a decentralized network

The proof-of-work also serves as a consensus mechanism: A blockchain is not stored on a centralized server like a database within a client-server architecture as shown in Figure 5.

² Cf. etherscan.io/chart/hashrate. (Last visited February 2, 2019.)
³ Cf. www.techspot.com/article/1438-ethereum-mining-gpu-benchmark/. (Last visited February 2, 2019.)
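The mining procedure and the hash-rate estimate given earlier in this section can be sketched as follows. This is a toy illustration under stated assumptions: SHA-256 stands in for Ethereum's actual proof-of-work function, the header bytes and the toy target are made up, and the difficulty is interpreted as the expected number of hash attempts, which is the reading that reproduces the figures in the text. The names mine and required_hash_rate are hypothetical.

```python
import hashlib
from typing import Tuple


def mine(header: bytes, target: int) -> Tuple[int, int]:
    """Brute-force search: try nonces 0, 1, 2, ... until the hash is below the target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        value = int.from_bytes(digest, "big")
        if value < target:
            return nonce, value
        nonce += 1


def required_hash_rate(difficulty: float, seconds: float) -> float:
    """Hashes per second needed to perform `difficulty` attempts within `seconds`."""
    return difficulty / seconds


# Toy target (assumption): the 12 leading bits of the 256-bit hash must be zero,
# so roughly 2**12 = 4096 attempts are expected.
HEADER = b"parent-hash|dataB|"
TOY_TARGET = 1 << (256 - 12)
nonce, value = mine(HEADER, TOY_TARGET)
print(f"nonce {nonce} yields a hash below the target")

# Arithmetic from the text: difficulty 3.13e15, block solved in 10 seconds.
rate = required_hash_rate(3.13e15, 10)   # hashes per second
gpus = rate / 31.3e6                     # GPUs at 31.3 MH/s each
print(f"{rate / 1e9:,.0f} GH/s, i.e. about {gpus:,.0f} GPUs")
```

Lowering the target by one more leading zero bit doubles the expected number of attempts, which is how a difficulty level translates into computational effort.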
Figure 5: Centralized network based on the client-server architecture as it is used for databases and web services.

Figure 6: Decentralized peer-to-peer network as it is used for blockchain technology.

Blockchain technology requires a decentralized organization as shown in Figure 6, i.e., a peer-to-peer system consisting of devices that support the same blockchain protocol and the same processes. These processes transform configured devices into network nodes. Interaction between the nodes is symmetric: Each node acts simultaneously as a client and a server [47]. The blockchain network distinguishes between full nodes and light nodes. Full nodes store a complete copy of the blockchain, while light nodes only download the block headers. The mining process requires that each miner host a copy of the entire blockchain on their node, i.e., miners have to run a full node. There is no trust between the nodes. Hence, the consensus mechanism, i.e., the proof-of-work, allows nodes to verify the blockchain data they receive from other nodes. If a node has validated a newly created record of information or a change to the blockchain, i.e., the proof-of-work for a newly added block, the node will propagate the data further to other nodes. Each node can decide for itself whether the received data is valid or not. The final decision on which block is appended to the blockchain is the result of comparing data with other nodes. The information about verified blocks has to be spread over the network.

Broadcasting information in a peer-to-peer network

To this end, two types of data have to be broadcast on the blockchain's peer-to-peer network: (1) newly created records of information, and (2) the resulting changes to the blockchain. Full nodes redistribute information about changes to the blockchain as well as newly created data, i.e., records of information, on the network, while light nodes may serve as endpoints that only broadcast new records of information created by their owners. Miners decide which records of information they want to include in the block they are about to mine: They collect the new records of information broadcast to the network into new blocks, which they then try to mine so that they may be added to the blockchain. Thus, there is some incentive for a rogue miner not to propagate, i.e., broadcast, newly created records of information and instead to collect them in a new block to be mined in secrecy. However, as even endpoint nodes broadcast to more than one node, such behavior would only work with records of information created by the rogue miner. In the end, such behavior would be futile if the rogue miner did not also have the resources to finish the proof-of-work before anybody else.

Incentive for sustaining the blockchain network

With the respective blockchain software and adequate hardware, the blockchain network is accessible to anyone. While light nodes may run on less powerful devices, miners require powerful computing equipment to solve the proof-of-work and add new blocks to the blockchain. Hence, miners continuously compete with each other over the proof-of-work for the next block in the sequence. The reason why miners commit their computing power in the first place is that blockchain technology is intrinsically linked to cryptocurrencies, which can be exchanged for real-world fiat currencies. A blockchain serves as a public ledger, which stores the transactions between accounts. The transactions are the records of information, which miners group in blocks. Accounts are identified by hexadecimal numbers, i.e., the account's address.
Transactions describe monetary value transfers from one address to another, i.e., from one account to another account. The real-world value of such transfers depends on the exchange rate between the blockchain's cryptocurrency and other real-world fiat currencies, e.g.,
the US Dollar or the Euro. The miner who first solves the proof-of-work for a new block receives compensation in the blockchain's respective currency, which has real-world value. In addition, senders pay a small fee for each transaction they send to another account. Miners thus use their hash power to compete against all the other miners on the blockchain network.

1.2 Ethereum

1.2.1 Ethereum from Different Perspectives

Ethereum can be viewed from three different perspectives: (1) from a theoretical point of view as a whole, (2) according to its implementation, and (3) according to its practical application and meaning.

Blockchain as a transaction-based state machine

The Yellow Paper [51] states that Ethereum as a whole is a transaction-based state machine. From this perspective, Ethereum is defined through a so-called world state, formally denoted as σ. As shown in Figure 7, only valid transactions can cause the world state to transition from one state to another, e.g., σ_t → σ_{t+1}.
[Figure 7 content: legend T: transaction, σ: world state; transactions T1, T2 cause the transition σ_t → σ_{t+1}, and transaction T3 causes the transition σ_{t+1} → σ_{t+2}.]

Figure 7: Ethereum can be considered as a transaction-based state machine [51], which changes from one state to another. The transitions of the world state σ are caused by valid transactions.

Blockchain as a record of state-changing causes

From an implementation perspective, however, transactions as such do not change the world state. Transactions have to be grouped into a block first. Only valid transactions of a valid block⁴ cause a change in the world state. From this perspective, Ethereum is a sequence of blocks chained together through the backward hash reference, as shown in Figure 8. The blocks contain the records of the causes for change in state.

⁴ The blockchain is a sequence of valid blocks. However, there are different types of blocks that are part of the Ethereum blockchain [2]: The first block in a blockchain is called the (1) genesis block. Valid blocks are appended to valid blocks, starting with the genesis block. However, not all blocks become canonical blocks. Miners must cease working on a block as soon as the network has validated another block. An unfinished block is called a (2) stale block, i.e., a block that had to be discarded by the miner because a competing miner found the proof-of-work for the next block in the blockchain first. In the case that two competing miners finish almost at the same time, the finished but rejected block becomes an (3) uncle block, also referred to as ommer block [51]. On the Ethereum blockchain, miners receive rewards for both valid blocks and uncle blocks. A parallel chain of uncle blocks can grow substantially if parts of the peer-to-peer network believe it to be the canonical chain. For the same reason, these parallel chains can be split up as well.
[Figure 8 content: legend B: block, T: transaction, σ: world state; block B_b (header contains f_Hash(B_{b−1})) groups transactions T1 and T2; block B_{b+1} (header contains f_Hash(B_b)) contains transaction T3; below the blocks, the world state sequence σ_t, σ_{t+1}, σ_{t+2}.]

Figure 8: Transactions are grouped in blocks. On the block level, transitions of the world state are caused through the addition of finalized blocks to the blockchain [51]. Each block can contain a series of transactions. Child block B_{b+1} is linked to its parent block B_b because the child block contains the hash of the parent block, e.g., f_Hash(B_b), in its block header. This linkage is depicted by the arrow connecting the two blocks.

Blockchain as a ledger

In practical terms, Ethereum constitutes a ledger composed of all valid transactions between accounts grouped in blocks, as shown in Figure 9. The blocks can be viewed as the pages of a ledger on which the transactions contained in a block are recorded.
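The view of Ethereum as a transaction-based state machine, in which transactions take effect block by block, can be sketched as follows. This is a deliberately reduced model and not the Yellow Paper's state transition function: the world state is assumed to map addresses directly to balances, validity is reduced to a sufficient balance, and fees, nonces, and signatures are ignored. All names and addresses are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Transaction(NamedTuple):
    sender: str
    recipient: str
    value: int  # amount in Wei


State = Dict[str, int]  # simplified world state: address -> balance


def apply_transaction(state: State, tx: Transaction) -> State:
    """Return the successor state; an invalid transaction leaves the state unchanged."""
    if tx.value < 0 or state.get(tx.sender, 0) < tx.value:
        return state
    new_state = dict(state)
    new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.value
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.value
    return new_state


def apply_block(state: State, block: List[Transaction]) -> State:
    """Apply all transactions grouped in one block, in order."""
    for tx in block:
        state = apply_transaction(state, tx)
    return state


# Mirrors Figure 8: block B_b groups T1 and T2, block B_{b+1} contains T3.
sigma_t = {"0xA": 100, "0xB": 0}
block_b = [Transaction("0xA", "0xB", 30), Transaction("0xB", "0xA", 10)]
block_b1 = [Transaction("0xA", "0xB", 50)]

sigma_t1 = apply_block(sigma_t, block_b)
sigma_t2 = apply_block(sigma_t1, block_b1)
print(sigma_t1, sigma_t2)
```

Because each state is derived only from the previous state and the transactions of one block, replaying all blocks from the first one reproduces the current world state.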
[Figure 9 content: legend B: block; blocks B_b, B_{b+1}, and B_{b+2} stacked like the pages of a ledger.]

Figure 9: The Ethereum blockchain viewed as a ledger. With each validated block, the ledger's 'volume' increases, containing the record of every valid transaction.

1.2.2 Ethereum World State σ

Mapping between addresses and account states

The world state σ [51] contains all Ethereum accounts as objects. An account is the mapping of an address a to the state of the account, i.e., the account state. Thus, each state σ_t of the world state maps all Ethereum addresses to their respective account states. The account state is denoted as the tuple σ[a] and contains four fields, as shown in Figure 10: (1) nonce (σ[a]_n), (2) balance (σ[a]_b), (3) storageRoot (σ[a]_s), and (4) codeHash (σ[a]_c). The nonce is a scalar value that counts the transactions sent from this account's address. The balance is a scalar value that represents the accrued amount of money owned by the address. While Ether is Ethereum's native currency, the balance is calculated in Wei, the smallest unit of Ethereum's currency. One Ether is equivalent to 10^18 Wei. The storageRoot is a 256-bit hash⁵ value of the data stored in the account storage in the Ethereum state database. The codeHash is the 256-bit hash value of the bytecode that belongs to the account and that is stored in the Ethereum state database. Bytecode on the Ethereum blockchain is immutable. Hence, the value of the codeHash field will never change.

⁵ Hash here refers to the Keccak 256-bit hash used on Ethereum.
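The mapping from addresses to account states and the conversion between Ether and Wei can be sketched as follows. The class AccountState, the helper ether_to_wei, and the example address are hypothetical names chosen for illustration; the two hash fields are left unset because computing them would require a Keccak-256 implementation and the state database, both of which are outside the scope of this sketch.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Optional

WEI_PER_ETHER = 10**18  # one Ether is equivalent to 10^18 Wei


@dataclass
class AccountState:
    """The four fields of the account state tuple σ[a]."""
    nonce: int = 0                        # counts transactions sent from the address
    balance: int = 0                      # always held in Wei
    storage_root: Optional[bytes] = None  # 256-bit hash of the account storage
    code_hash: Optional[bytes] = None     # 256-bit hash of the account's bytecode


def ether_to_wei(ether: str) -> int:
    """Convert a decimal Ether amount (given as a string) to an integer amount of Wei."""
    return int(Decimal(ether) * WEI_PER_ETHER)


# Simplified world state: a 160-bit address (40 hex digits) maps to an account state.
address = "0x" + "00" * 19 + "01"  # made-up address
world_state: Dict[str, AccountState] = {
    address: AccountState(nonce=3, balance=ether_to_wei("1.5")),
}
print(world_state[address].balance)
```

Keeping balances as integers in Wei avoids the rounding errors that floating-point Ether amounts would introduce.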
[Figure 10 content: the world state σ maps an address a (160-bit identifier) to an account state σ[a] with the fields nonce, balance, storageRoot, and codeHash; storageRoot and codeHash refer to the Storage and the Bytecode (6060...) held in the state database.]

Figure 10: The world state σ maps an address to an account state. While the transactions that lead to some world state σ_t are stored in the respective blocks on the blockchain, the data structure for the mapping of addresses to account states, as well as the account's storage and bytecode, is stored in the Ethereum state database [51].

Data structure of the Ethereum state database

The data structure of the Ethereum state database constitutes a Merkle Patricia tree [13] [29], a combination of a (1) Merkle tree and a (2) radix tree. A Merkle tree builds up from the leaf nodes at the bottom via intermediate nodes to the root node at the top: The leaf nodes at the bottom contain the data. The intermediate nodes contain the hash of their child nodes. Child nodes are either intermediate nodes or leaf nodes. The root node at the top also contains the hash of its child nodes, which are intermediate nodes. The hash stored in the root node of the Merkle tree is called the root hash. The radix tree is built on top of many Merkle trees. The leaf nodes of the radix tree represent the root nodes of the Merkle trees. The root node of the radix tree and its intermediate nodes each contain a single character from these Merkle tree root hashes. Following down the path of a radix
tree, while searching for a specific Merkle tree root hash, leads to the root hash of the corresponding Merkle tree. The root hash of a Merkle tree for the state database is contained in each block [51]. In this way, the world state and the blockchain are cryptographically linked to each other. The data at the bottom of the data structure is thus verifiable for each individual state because the respective block, which caused the state, also contains that state's root hash. The hash algorithm that is used in Ethereum is a 256-bit Keccak hash, also referred to as SHA3 hash [51]. Furthermore, all data in Ethereum is serialized in the data format RLP (Recursive Length Prefix) [22]. Data can be either a string (byte array) or an array of byte arrays (strings).

1.2.3 Ethereum Account Types

From this look at the Ethereum account state, it becomes evident that Ethereum, viewed from its practical aspects, is more than a ledger that records transactions between accounts. Ethereum supports a decentralized computer: The Ethereum virtual machine (EVM) allows for the execution of bytecode. Accounts can own EVM bytecode, which is stored in the Ethereum state database. A transaction addressed to an account that owns EVM bytecode triggers the execution of that EVM bytecode. To this end, Ethereum supports two types of accounts [17]: (1) non-contract accounts, and (2) contract accounts. A non-contract account is an externally owned account (EOA). Access to a non-contract account is controlled by a private key. Non-contract accounts do not contain EVM bytecode. Therefore, the fields storageRoot and codeHash remain empty, as shown in Figure 11. By contrast, as shown in Figure 12, a contract account is publicly available and contains EVM bytecode, which can be executed on the Ethereum virtual machine (EVM). Hence, a contract account is controlled by its EVM bytecode, while bytecode execution is triggered by transactions sent to the contract account.
Figure 11: Externally owned account (EOA). The account is identified by a 160-bit address, and its account state consists of nonce, balance, storageRoot, and codeHash. An EOA does not own EVM bytecode.

Figure 12: Contract account (CA) with EVM bytecode (6060...) and storage for contract data. Like an EOA, the account is identified by a 160-bit address, and its account state consists of nonce, balance, storageRoot, and codeHash.

1.2.4 Ethereum Transactions

Transactions

Only owners of non-contract accounts (EOA) can send transactions from their account’s address to the address of other accounts, including contract accounts (CA). Sending a transaction to another account’s address allows for the transfer of value, i.e., Ether, to the respective account. A transaction sent from a non-contract account’s address to the address of a contract account causes the execution of the EVM bytecode that the contract account owns.

Fees (Gas)

The processing of transactions, i.e., the mining of blocks, requires computing power, which translates into real-world costs for electrical power. Therefore, senders are charged a fee for their transactions. The fee is credited to the account of the miner who first delivers the proof-of-work [2]. On Ethereum, fees for computing power are charged in gas⁶. The minimum fee required for a transaction to be processed at all is 21,000 gas. The price of gas in Wei for processing a transaction depends on how much senders are willing to pay⁷, as well as on the miners’ price preferences. Miners with more hashing power also have higher ask prices. Hence, senders can incentivize miners to process their transactions more rapidly by simply increasing their price offer for gas [44].

The amount of gas purchased for transactions to addresses of contract accounts is a more subtle calculation, as the sender of such a transaction has to pay a fee for code execution as well, and different EVM bytecode instructions consume varying amounts of gas [17]. Senders who want EVM bytecode to be executed on the Ethereum blockchain must, therefore, purchase a sufficient amount of gas with their transaction. Any Ether for unused units of gas is refundable, and it is possible to set a limit for how much gas can be purchased in one transaction. However, should the amount of purchased gas not be sufficient, the transaction is reverted. Used gas is non-refundable. In any case, there is an upper limit as to how much gas can be purchased in one transaction⁸.

The total fee (in Wei) for a transaction that results in EVM bytecode execution depends on the amount of gas that each executed EVM bytecode instruction consumes. Transaction senders pay the miners in Ether (Wei) for the gas their transactions consume during EVM bytecode execution. While the amount of gas to be paid depends on the type of EVM bytecode instruction that is being executed, and, therefore, is fixed, the price for gas is determined by a free market [17]: Miners decide which transactions they want to mine, and at what price they sell their gas. They can refuse to mine transactions from senders that offer gas prices below their minimum acceptance level. In Ethereum, the price for gas determines how fast a transaction is processed [44]. Senders, on the other hand, decide how much they are willing to pay for gas, i.e., for faster processing of their transactions.

⁶ If Ethereum were an engine, then transaction fees would be the fuel, i.e., the gas, that powers it [17].

⁷ At the time of writing, the average minimum price for one unit of gas varied between 2 · 10⁹ Wei (2 Gwei) and 3 · 10⁹ Wei (3 Gwei), which, in fiat currency, amounts to a transaction fee of $0.006 and $0.009, respectively, at slower processing times [44].

⁸ The upper limit of gas for one transaction ensures that any EVM bytecode that is executed on the Ethereum virtual machine will terminate, either on its own or because the transaction did not purchase enough gas and EVM bytecode execution ran out of gas.
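The fee arithmetic described above can be illustrated with a minimal sketch. The function name settle_fee and the returned dictionary keys are hypothetical and chosen for illustration only; the sketch assumes that a transaction which runs out of gas consumes its entire gas limit, and it ignores any value transferred with the transaction.

```python
MIN_TX_GAS = 21_000  # minimum gas for any transaction, as stated in the text


def settle_fee(gas_limit: int, gas_price_wei: int, gas_used: int) -> dict:
    """Return the up-front cost, the miner's fee, and the refund, all in Wei.

    Assumption of this sketch: if execution needs more gas than was
    purchased, the whole gas limit counts as used and nothing is refunded.
    """
    if gas_limit < MIN_TX_GAS:
        raise ValueError("gasLimit below the 21,000 gas minimum")
    upfront = gas_limit * gas_price_wei        # paid before execution starts
    out_of_gas = gas_used > gas_limit          # purchased gas was insufficient
    charged_gas = gas_limit if out_of_gas else gas_used
    fee = charged_gas * gas_price_wei          # credited to the miner
    return {
        "upfront": upfront,
        "fee": fee,
        "refund": upfront - fee,               # Ether for unused gas
        "reverted": out_of_gas,
    }
```

With a gas price of 2 Gwei, a plain value transfer that uses exactly 21,000 gas costs 21,000 · 2 · 10⁹ Wei, i.e., 0.000042 Ether, regardless of how much higher the gas limit was set.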
Types of transactions and transaction fields

According to the Yellow Paper [51], there are two types of transactions: (1) transactions that result in a message call from a non-contract account (EOA) to another account, which may be a contract account (CA) or a non-contract account (EOA), and (2) transactions that result in the creation of a new contract account, i.e., contract creation. As transactions originate outside the blockchain and the world state (state database), they must be signed with the private key belonging to the address of the EOA from which they are sent. Figure 13 shows how different transactions affect the world state:

(1) The owner of EOA1 sends a transaction from the address of EOA1 (address1) to the address of EOA2 (address2). The transaction is signed with the private key belonging to address1. As a result, when the transaction is executed, a message call is sent from EOA1 to EOA2. Transactions between non-contract accounts are used to transfer value, i.e., Ether.

(2) Another signed transaction is sent from the address of EOA1 (address1), this time to the address of the contract account CA3 (address3). This transaction results in a message call from EOA1 to CA3. Subsequently, the EVM bytecode owned by CA3 is executed. In addition, it is possible to transfer value to a contract this way.

(3) A signed transaction that is sent from the address of EOA1 (address1) without specifying the recipient’s address (null address) results in the subsequent contract creation of CA4.
Figure 13: A signed transaction T sent from address1 transforms the world state σ_t into σ_{t+1}. (1) Signed transaction resulting in a message call from the sender’s account (EOA1) to EOA2, and a value transfer from EOA1 to EOA2. (2) Signed transaction resulting in a message call from the sender’s account (EOA1) to CA3, and subsequent EVM bytecode execution, as well as possible value transfer from EOA1 to CA3. (3) Signed transaction resulting in the subsequent contract creation of CA4. Note: The world state σ_{t+1} here does not refer to the final state, but to a substate, in which message calls and contract creation are executed. The final state only contains the result of these message calls as well as the newly created contracts.

Transaction fields

Sending a valid transaction therefore requires several fields of information, as shown in Figure 14. Both types of transactions, those resulting in message calls and those resulting in contract creation, require six common fields, which are defined as follows [51]:

(1) The nonce is the number of transactions sent from the address of a non-contract account (EOA).

(2) The gasPrice refers to the monetary value in Wei that the sender of the transaction is willing to pay per unit of gas.

(3) The gasLimit refers to the maximum number of units of gas that the sender wants to purchase for the transaction. The price for gas is paid up-front, and only unused gas is refundable.
(4) The to field requires the account address (160-bit identifier) of the recipient if the transaction is to result in a message call to either another EOA or a CA. By contrast, for contract creation, the to field must remain empty. As a result, a new contract account is created, and its address is returned.

(5) The value field contains the monetary value, defined in Wei, which is to be credited to the recipient’s account.

(6) The values v, r, and s are the result of signing the transaction with ECDSA [6], using the private key of the sender’s account address⁹. Ethereum transactions do not contain a from field because the sender’s account address is recoverable from the outputs v, r, and s¹⁰.

Transactions for the purpose of contract creation further use (7) the init field, which is a byte array of unlimited size containing (a) the contract’s loader code (constructor) as well as (b) the contract’s EVM runtime bytecode (body). The loader code is executed only once at contract creation and returns the contract’s EVM bytecode¹¹, which is stored in the Ethereum state database [49]. Each message call sent to this contract afterwards triggers the execution of the contract’s EVM bytecode. Transactions that cause such message calls use (8) the data field, a byte array of unlimited size. The payload of the data field consists of function-identifying data and the respective parameter values, and is passed as input data to the contract’s EVM bytecode.

⁹ Cf. github.com/ethereum/EIPs/blob/master/EIPS/eip-155.md. (Last visited March 10, 2019.)

¹⁰ The recovery of the sender’s account address is described formally in Appendix F of the Yellow Paper [51].

¹¹ This code is often referred to as EVM runtime bytecode.
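The eight fields listed above can be summarized in a small data structure. The following is a minimal sketch only: the class name Transaction, the method kind, and the signature values in the usage example are hypothetical placeholders, and no RLP encoding or signing is performed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    nonce: int               # (1) number of transactions sent by the EOA
    gas_price: int           # (2) Wei per unit of gas
    gas_limit: int           # (3) maximum units of gas to purchase
    to: Optional[bytes]      # (4) 20-byte address, None for contract creation
    value: int               # (5) Wei credited to the recipient
    v: int                   # (6) ECDSA signature outputs
    r: int
    s: int
    init: bytes = b""        # (7) contract creation only
    data: bytes = b""        # (8) message call only

    def kind(self) -> str:
        """Resolve the transaction type from the contents of its fields."""
        if self.to is None:
            if self.data:
                raise ValueError("contract creation uses init, not data")
            return "contract creation"
        if len(self.to) != 20:
            raise ValueError("recipient address must be 160 bits (20 bytes)")
        if self.init:
            raise ValueError("message call uses data, not init")
        return "message call"
```

For example, Transaction(0, 2 * 10**9, 21_000, bytes(20), 1, 27, 1, 1).kind() yields "message call", whereas leaving the to field empty (None in this sketch) and supplying init yields "contract creation".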
Figure 14: An Ethereum transaction T consists of six fields common to both contract creation and message call transactions, namely (1) nonce, (2) gasPrice, (3) gasLimit, (4) to (empty in contract creation transactions), (5) value, and (6) v, r, s, and of two fields (gray) that are specific to one type of transaction: (7) init, used in contract creation transactions, and (8) data, used in message call transactions.

Transaction execution

Transactions sent from non-contract accounts change the world state σ. As this change cannot be reverted, transactions need to be validated first [51]:

(1) A transaction must be well-formed RLP, i.e., a valid recursive length prefix encoding.

(2) Transactions must have a valid signature, i.e., the address recovered from the ECDSA signature must be a valid address.

(3) The nonce of the transaction must be the same as that of the sender’s account state.

(4) The gasLimit must be at least equal to the minimum amount of gas required for a transaction to a non-contract account, i.e., 21,000 gas.

(5) The balance of the sender’s account state must be sufficient to cover at least the up-front payment for purchasing the 21,000 units of gas necessary for any transaction sent from a non-contract account.

After successful validation, a transaction is resolved, according to the contents of its fields, either to a message call or to contract creation. Successfully executed transactions are stored on the blockchain, while the changes they caused are stored in the state database¹².

¹² Each block on the Ethereum blockchain contains the hash of the root of the state database, i.e., the world state σ_{t+1} resulting from the execution of all the transactions grouped in the respective block.
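Validation rules (3) to (5) above are simple comparisons against the sender’s account state and can be sketched as follows. The names AccountState and validation_errors are hypothetical; rules (1) and (2), i.e., RLP well-formedness and signature recovery, are deliberately left out of this sketch, and the balance check follows the wording of rule (5) rather than any particular client implementation.

```python
from dataclasses import dataclass

MIN_TX_GAS = 21_000  # minimum gas for any transaction, as stated in the text


@dataclass
class AccountState:
    nonce: int
    balance: int  # Wei


def validation_errors(sender: AccountState, tx_nonce: int,
                      gas_limit: int, gas_price: int) -> list:
    """Return the violated rules; an empty list means the checks passed."""
    errors = []
    if tx_nonce != sender.nonce:                      # rule (3)
        errors.append("nonce mismatch")
    if gas_limit < MIN_TX_GAS:                        # rule (4)
        errors.append("gasLimit below minimum")
    if sender.balance < MIN_TX_GAS * gas_price:       # rule (5)
        errors.append("insufficient balance for up-front gas payment")
    return errors
```

Returning all violated rules instead of stopping at the first one is a design choice of this sketch that makes it easier to see why a given transaction would be rejected.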
Message call and contract creation

Valid transactions result either in message calls between account states or in contract creation, as seen in Figure 13, both of which require a set of parameters for execution. For a message call, execution needs the following parameters [51]: (1) the sender, i.e., the address of the account from which the message call originates¹³; (2) the transaction originator, i.e., the address of the non-contract account (EOA) from which the transaction originated, and which is retrieved from the transaction’s ECDSA signature; (3) the recipient, i.e., the address of the account to which the message call is being sent; (4) the address of the contract account¹⁴ whose EVM bytecode is to be executed; (5) the available gas; (6) the gas price; (7) the value that is being transferred; (8) the input data, which is a byte array of arbitrary length; (9) the current depth of the stack of message calls and contract creations; (10) the permissions to modify the world state.

A message call results in value transfer, EVM bytecode execution, or both. Additionally, in the case of EVM bytecode execution, a message call can result in further message calls or additional contract creation. Message calls that directly result from a transaction are called top-level message calls, while message calls resulting from a top-level message call are called inner message calls [23]. Message calls are not stored in the state database because they are the deterministic result of executing transactions. The resulting final world state, however, is stored in the state database.

For contract creation, the following parameters are needed [51]: (1) the sender, (2) the transaction originator, (3) the available gas, (4) the gas price, (5) the endowment, i.e., value, (6) a byte array of arbitrary length with the new contract’s initialization bytecode (loader code), (7) the current depth of the stack of message calls and contract creations, and (8) the permissions to modify the world state.
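The difference between the sender and the transaction originator of a message call becomes visible as soon as a top-level message call triggers an inner message call. The following minimal sketch models only four of the parameters listed above; the names MessageCall, top_level_call, and inner_call, the account label CA5, and the choice of starting the depth at zero are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageCall:
    sender: str       # account from which this message call originates
    originator: str   # EOA that signed the underlying transaction
    recipient: str    # account to which the message call is sent
    depth: int        # depth of the stack of message calls

    def inner_call(self, recipient: str) -> "MessageCall":
        """Message call issued by the code of the current recipient."""
        return MessageCall(
            sender=self.recipient,        # the called contract becomes the sender
            originator=self.originator,   # the originator never changes
            recipient=recipient,
            depth=self.depth + 1,
        )


def top_level_call(eoa: str, recipient: str) -> MessageCall:
    """Message call that results directly from a transaction."""
    return MessageCall(sender=eoa, originator=eoa, recipient=recipient, depth=0)
```

If EOA1 sends a transaction to CA3 and the bytecode of CA3 in turn calls a hypothetical contract CA5, the inner message call has CA3 as its sender, while the transaction originator remains EOA1.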
¹³ The sender of a message call is not always the transaction originator, e.g., when a message call is sent from a CA to another CA or an EOA.

¹⁴ In most cases, the address of this contract account is the same as that of the recipient.

Contract creation determines the address of the new contract account, sets the nonce of the contract account to one, transfers any value (endowment) to the account’s balance, sets its storage to empty, and initializes the account by executing the loader code (constructor), which returns the contract’s EVM bytecode (body) that is stored
in the state database. The hashes corresponding to the contract’s storage data and EVM bytecode, storageRoot and codeHash respectively, are stored in the contract’s account state as well. Code execution can result in further message calls or additional contract creation. Again, as contract creation is the result of executing a transaction, only the new contract, as part of the resulting final world state, is stored in the state database.

1.2.5 Ethereum Virtual Machine (EVM)

The Ethereum virtual machine (EVM), as shown in Figure 15, executes the bytecode owned by a contract account (CA). The EVM is a simple stack machine, which consists of (1) a stack, (2) memory (RAM), and (3) a program counter for the RAM. (4) The account storage, which is located in the state database, contains generally accessible persistent contract data. The location where the EVM bytecode resides in the state database serves as (5) a virtual ROM for the EVM, from which the bytecode is loaded into RAM. Bytecode executed in the EVM can only manipulate the stack, the memory, and the account storage. The word size of stack and memory is 256 bits (32 bytes). While the stack is limited to 1024 words, memory is an expandable word-addressed byte array¹⁵ whose size is bounded only by the gas available for expansion. The account storage is a word-addressed word array, which contains key-value pairs. Key and value each have a word size of 256 bits.

¹⁵ While in storage a word-sized (256-bit) key addresses a word-sized value, in memory, a word-sized address points to a single byte. Reading from memory is word-sized, while it is possible to write either 256-bit or 8-bit values to memory. However, memory expansion can only be achieved in word-sized steps. Writing to a higher address in memory will first expand the memory in word-sized steps until the called address is included. Then, memory is expanded further until sufficient word-sized space is allocated to write the data to memory. Each step of word-sized memory allocation, as well as writing to memory, costs gas.

Machine state µ

Similar to the world state σ, which maps addresses to account states, there is a state that keeps track of the volatile aspects of the EVM: the machine state. The machine state is defined by the tuple µ, which includes (1) the available gas (g), (2) the program counter (pc), (3) the contents of RAM (m), (4) the active number of words in RAM (i), and (5) the contents of the stack (s). As the execution of EVM bytecode instructions consumes gas, the machine state µ tracks the change in available gas: g → g′. Code execution stops if the program counter reaches the end of the EVM bytecode in RAM. Exceptional halting occurs if, for instance, there is insufficient gas¹⁶ available, or the program raises an exception due to invalid stack behavior. When execution stops, the machine state µ has transitioned to µ′.

Ethereum execution environment

The EVM’s execution environment computes the transition σ_t → σ_{t+1}¹⁷, which depends on the machine state’s transition µ → µ′, as well as on the value of the remaining gas g′. Hence, in order to compute σ_{t+1}, the execution environment has to know the world state σ, the machine state µ, and the available gas g provided by the transaction. The execution environment also needs to be provided with additional information, which is defined by the tuple I¹⁸.

¹⁶ Ethereum’s gas limit ensures that code execution always terminates, i.e., infinite loops are impossible by design [51].

¹⁷ In the Yellow Paper [51], the resulting world state σ_{t+1} is denoted as σ′.
¹⁸ The tuple I contains the following elements:
I_a: the address of the account that owns the code to be executed;
I_o: the sender address of the transaction that triggered the code execution;
I_p: the price of gas, which may vary and determines the gas available for code execution;
I_d: the byte array that contains the input data for the code execution;
I_s: the account address that caused the code execution, which may not be the same as I_o;
I_v: the monetary value, in Wei, sent with the transaction;
I_b: the byte array with the bytecode that is to be executed;
I_H: the block header of the current block;
I_e: a number that states how many contract accounts are being called for execution or how many contract account creations are to be executed;
I_w: the necessary permission to make modifications to the state, e.g., to the account storage.
Figure 15: Ethereum virtual machine (EVM) [51]. The contract’s persistent storage and read-only EVM bytecode are physically located in the Ethereum state database. However, during contract execution, both the storage and the EVM bytecode constitute logical components of the EVM. Before execution, the EVM bytecode is loaded into memory (RAM). Input data for the contract’s EVM bytecode is loaded either onto the stack or into memory. The world state σ, the machine state µ (gas, program counter, RAM with its active word count i, and stack), the available gas, and the necessary additional information I constitute the execution environment, in which the bytecode execution is performed.
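How the machine state µ evolves during execution can be illustrated with a deliberately tiny interpreter. The sketch below supports only three instructions, STOP (0x00), ADD (0x01), and PUSH1 (0x60), with the gas costs 0, 3, and 3 taken from the fee schedule of the Yellow Paper [51]; memory, storage, jumps, and the execution environment I are omitted. The names run and ExceptionalHalt are hypothetical.

```python
STACK_LIMIT = 1024          # maximum number of words on the stack
WORD_MASK = 2**256 - 1      # stack words are 256 bits wide

STOP, ADD, PUSH1 = 0x00, 0x01, 0x60
GAS_COST = {STOP: 0, ADD: 3, PUSH1: 3}


class ExceptionalHalt(Exception):
    """Raised on out-of-gas, stack errors, or unsupported opcodes."""


def run(code: bytes, gas: int):
    """Execute code and return the final (gas, pc, stack)."""
    pc, stack = 0, []
    while pc < len(code):                 # stop at the end of the bytecode
        op = code[pc]
        if op not in GAS_COST:
            raise ExceptionalHalt(f"unsupported opcode 0x{op:02x}")
        if gas < GAS_COST[op]:
            raise ExceptionalHalt("out of gas")
        gas -= GAS_COST[op]               # g -> g'
        if op == STOP:
            break
        if op == PUSH1:
            if len(stack) >= STACK_LIMIT:
                raise ExceptionalHalt("stack overflow")
            # A missing immediate byte is treated as zero in this sketch.
            operand = code[pc + 1] if pc + 1 < len(code) else 0
            stack.append(operand)
            pc += 2                       # skip opcode and its immediate
        else:  # ADD
            if len(stack) < 2:
                raise ExceptionalHalt("stack underflow")
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) & WORD_MASK)   # arithmetic modulo 2^256
            pc += 1
    return gas, pc, stack
```

Running the five bytes 6002600301 (PUSH1 2, PUSH1 3, ADD) with 100 units of gas leaves 91 units of gas, a program counter of 5, and the single value 5 on the stack; with only 5 units of gas, the second PUSH1 causes an exceptional halt.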
1.2.6 Ethereum Peer-to-Peer Network

The Ethereum peer-to-peer network is built on a set of network protocols referred to as devp2p [15]. Users run Ethereum implementations¹⁹ on their physical machines. The software implements devp2p and turns user machines into peer-to-peer network nodes. In addition, the software is an implementation of the formal specifications for the Ethereum protocol²⁰ as defined in the Yellow Paper [51]. Thus, machines running an Ethereum implementation turn into Ethereum peer-to-peer nodes as shown in Figure 16:

(1) Ethereum full nodes host a copy of the entire blockchain and the Ethereum state database, which represents the current world state σ. Full nodes verify blocks and transactions, and relay them to the network [40]. Miners are required to run full nodes.

(2) Ethereum light nodes [21], on the other hand, synchronize with full nodes and download the blockchain’s latest block headers²¹. As light nodes do not host a copy of the state database, i.e., the world state, they retrieve data from full nodes on demand. Additionally, light nodes require full nodes to relay the transactions their users send to the Ethereum network [40].

¹⁹ An Ethereum implementation refers to an official implementation of the Ethereum protocol [18]. Such software allows users to turn their machines into Ethereum nodes, as well as to connect to their node with an Ethereum client instance. Ethereum implementations are usually referred to as Ethereum clients.

²⁰ By contrast, the present work has explained these specifications in a descriptive manner.

²¹ Current Ethereum implementations have a list of trusted full node peers built into their code [40]. Hence, users have to trust the developers who created the software.
Figure 16: Ethereum peer-to-peer network comprised of decentralized nodes: (1) full nodes, which host complete copies of both the Ethereum state database (world state σ) and the Ethereum blockchain, and (2) light nodes, which depend on full nodes for downloading blockchain information, e.g., block headers and world state data. In addition, light nodes depend on full nodes to relay transactions sent from light nodes to other full nodes in the Ethereum peer-to-peer network. Ethereum nodes communicate with each other over a P2P connection, provided by Ethereum implementations running on physical machines.

Ethereum implementations

Users can choose from various Ethereum implementations. Some implementations, e.g., Geth or Parity, allow users to turn their machines into Ethereum nodes and also provide Ethereum clients, which allow users to connect to their local Ethereum nodes. Other implementations, such as Mist, MetaMask, the Remix IDE, or Truffle, only provide Ethereum clients. Most clients use the JavaScript Web3 API [19] to exchange data with their local nodes via the JSON RPC protocol²², i.e., JSON remote procedure calls [20]. Ethereum nodes are part of the Ethereum peer-to-peer network, where they exchange data via P2P connections as seen in Figure 17.

²² The JavaScript Web3 API (JavaScript API [19]) is a wrapper around the JSON RPC API [20].
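The exchange between a client and its local node can be illustrated by constructing a JSON RPC request by hand. The sketch below only builds the request body and decodes a hex-encoded result; it does not open a network connection. The helper names rpc_request and parse_quantity are hypothetical, whereas eth_getBalance and eth_blockNumber are methods of the JSON RPC API [20]. The all-zero address in the usage example is a placeholder.

```python
import json


def rpc_request(method: str, params: list, request_id: int = 1) -> str:
    """Serialize a JSON RPC 2.0 request body for an Ethereum node."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    })


def parse_quantity(hex_quantity: str) -> int:
    """Decode a hex-encoded quantity, e.g., a balance in Wei or a block number."""
    return int(hex_quantity, 16)


# Request the balance of a placeholder address at the latest block.
body = rpc_request("eth_getBalance", ["0x" + "00" * 20, "latest"])
```

Sending such a body requires a running node that exposes an HTTP endpoint for JSON RPC; which endpoint that is depends on the configuration of the respective Ethereum implementation.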
Figure 17: Most Ethereum clients connect to a local Ethereum node on a user machine via the JSON RPC protocol, using either the JSON RPC API [20] or the JavaScript Web3 API [19]. Ethereum nodes, whether full nodes, hosting complete copies of both the state database and the Ethereum blockchain, or light nodes, which only download such data on demand, connect with each other via a P2P connection (devp2p). All Ethereum nodes as a whole constitute the decentralized Ethereum peer-to-peer network.

1.3 Smart Contracts

Contract accounts that own EVM bytecode are also called smart contracts. This part of the present work highlights the link between source code, EVM bytecode, and assembly code (opcode).
1.3.1 Smart Contracts at EVM Bytecode Level

Solidity source code

Ethereum smart contracts²³ are programs written in the high-level programming language Solidity, which must be compiled to EVM bytecode for deployment and execution. Listing 1 shows example source code of a contract, which was created with the browser version of the Remix IDE. Solidity’s syntax is similar to ECMAScript, i.e., JavaScript. However, the language has been purposefully adapted to the 256-bit architecture of the stack-based EVM.

    pragma solidity ^0.5.5;

    contract AddContract {
        uint256 public result;

        function addNumbers(uint256 a, uint b) public {
            result = a + b;
        }
    }

Listing 1: Example of Solidity code.

EVM bytecode

Solidity source code must be compiled to EVM bytecode instructions consisting of byte-sized hexadecimal values [39] in big-endian order, i.e., network byte order [51]. EVM bytecode is the machine language that the EVM processes during execution. The Solidity compiler is called solc, and its installed version corresponds to the version of Solidity the compiler can compile. The source code from Listing 1 compiles to the following EVM bytecode instructions shown in Listing 2:

²³ Ethereum is not the only blockchain environment that offers smart contract capabilities. Bitcoin has offered smart contract capabilities from its very beginnings in 2009 [43]. Unlike Ethereum, Bitcoin offers a scripting system called Script, which is not Turing-complete [1].
    6080604052 34801561001057600080
    fd5b5060c78061001f6000396000f3
    fe 6080604052 348015600f57600080
    fd5b506004361060325760003560e0
    1c806365372147146037578063ef9f
    c50b146053575b600080fd5b603d60
    88565b604051808281526020019150
    5060405180910390f35b6086600480
    36036040811015606757600080fd5b
    810190808035906020019092919080
    359060200190929190505050608e56
    5b005b60005481565b808201600081
    905550505056fea165 627a7a72 3058
    20ab0685d5291ce6b9b1a1ea3ca99b
    6b44a588bd1655f704921a9f64f080
    18d2dd0029

Listing 2: The source code from Listing 1 compiled to EVM bytecode.

The EVM bytecode from Listing 2, which is a continuous string of hexadecimal values, consists of three parts [49], separated for illustration only:

(1) The loader code (constructor), which is used at contract initialization. The loader code returns

(2) the body, i.e., the contract’s EVM runtime bytecode, which is executed every time the contract receives a message call. The EVM runtime bytecode is stored in the Ethereum state database and is immutable.

(3) The Swarm hash, or bzzhash [24], of the contract’s metadata file, which the Solidity compiler appends to the runtime bytecode. It can be recognized by its magic number, i.e., 0x627a7a72, which is ASCII for bzzr.

Function signatures and semantics

In Ethereum, a function signature refers to the human-readable text representation of a Solidity source code function, e.g., addNumbers(uint256,uint256)²⁴ [16]. However, at bytecode level, only the first four bytes of the Keccak-256 hash (SHA3) of the function signature are used to identify functions. These four-byte values are called function selectors, e.g., 0xEF9FC50B at (2E) in Listing 3. As the input of a hash function cannot be retrieved from the hash function’s output, there is no function to map function selectors back to human-readable function signatures²⁵. However, the Ethereum Function Signature Database [16] maps function selectors to human-readable function signatures that have been provided by users.

Function signatures provide valuable semantic insight for EVM bytecode analysis: (1) Most programming best practices demand that a function’s name should carry the function’s semantics. (2) The types of a function’s arguments provide some information about what a contract considers valid input. Although the database is an essential tool for identifying function signatures, it is far from complete²⁶. In addition, names of functions and variables may not be in English: The function signatures ergebnis() and result() carry the same semantic content, albeit in two different human languages, German and English respectively. However, the corresponding function selectors, 0x529B8E60 and 0x65372147, differ significantly.

²⁴ The human-readable canonical representation of the function signature only uses argument types [16].

²⁵ In any case, a function selector is not a complete hash; it lacks the remaining 224 bits of the 256-bit hash.

²⁶ Cf. gist.github.com/holiman/563da876c4ce15629f57ffdc4046383b. (Last visited March 20, 2019.)

EVM bytecode execution

During execution, the EVM bytecode is loaded from the state database into the execution environment, where the EVM bytecode constitutes the EVM’s ROM. The EVM sets the program counter to the beginning of the EVM bytecode and begins to execute each instruction at the current program counter location. As the EVM does not possess registers, input arguments and parameters for instructions are pushed onto the stack. The program counter is incremented after an instruction has been executed. Jumps can change the program counter to any location within the contract’s EVM bytecode, moving it to the respective jump destination. The instruction JUMP [7] moves the program counter to a location in the EVM bytecode. This location is given by the last element pushed onto the stack, i.e., the top of the stack. The instruction JUMPI [7], i.e., conditional jump, only moves the program counter to the jump destination if some condition is met. The condition is the second to last element