Shades of Finality and Layer 2 Scaling
Bennet Yee†, Dawn Song†, Patrick McCorry‡, Chris Buckland‡
† Oasis Labs, ‡ Infura
January 21, 2022

arXiv:2201.07920v1 [cs.CR] 19 Jan 2022

Abstract

Blockchains combine a distributed append-only log with a virtual machine that defines how log entries are interpreted. By viewing transactions as state transformation functions for the virtual machine, we separate the naming of a state from the computation of its value and reaching consensus on that value. This distinction allows us to separate the notion of transaction order finality from state value finality. Further consideration of how blockchain governance handles catastrophic common-mode failures such as zero-day exploits leads us to the notion of checkpoint finality. Consensus on the transaction order determines the ground truth. Everything else (computing the value of a state, or handling catastrophic failures such as bugs or zero-day based attacks) is just an optimization.

1 Introduction

The core ideas behind smart contracts and blockchain systems are straightforward, based on a distributed, decentralized append-only log and how we interpret the meaning of those log entries using an abstract virtual machine (VM).

While the basic idea is simple, there are many design choices that need to be made before the full blockchain system is specified, implemented, and becomes usable, e.g., the format of messages that can be logged and their meaning, the VM state machine semantics, etc. These design choices can have important implications on how the resultant system behaves.

While practitioners understand the nuances, explicating these notions can make it easier for their impact to be discussed and for alternative design choices to be investigated. This paper attempts to use some basic ideas and notation from programming languages, semantics, and group theory to help model smart contract systems as a way to guide how we look at these design choices, the security issues that arise, and their scaling implications.

We introduce the following distinct flavors of finality:

• log finality, when an entry has been irrevocably appended to the blockchain log.

• transaction order finality, when a transaction’s effect on the VM state is irrevocably determined, without necessarily first computing it.

• state value finality, when the computed state value is determined as an efficiently accessed data structure. This is irrevocable except for recovery from critical infrastructure failures (determined by governance), e.g., an adversary exploits a bug in code used by all participants, or an adversary was able to violate a security assumption, such as by bribing an above-threshold number of committee participants.

• checkpoint finality, when the computed state value can no longer be changed by hard forks and thus becomes truly irrevocable.

We argue that transaction order finality, built using log finality, is the key to reasoning about a blockchain system. State value finality and checkpoint finality are nonetheless important optimizations: they respectively enable applications for which independent private state determination is too expensive and permit the storage layer to reclaim storage.

In the next section, we present the ideas, notations, and terminologies that we use to analyze blockchain properties and use them to discuss some well-known blockchain systems and their properties. Then, in Section 3, we discuss what we think are “ideal” rollup properties and attempt to sketch what such a system would be like. Finally, we present concluding remarks in Section 4.

The git hash for this paper is 09ce1681ec7d3117bbe446cd6030b29202e99c3a.
2 Key Concepts

In this section, we introduce ideas, notation, and terminology used to describe and delineate the design space of smart contract systems, using some existing well-known systems as reference points. Applying these ideas will help us create a simpler, easier-to-understand mathematical / mental model of what all blockchain systems do, and to discuss / analyze trade-offs in current and future designs. These notions apply both to standard blockchains and to scaling solutions such as rollups.

2.1 Append-Only Log

The append-only logs used in Bitcoin and Ethereum v1 are Proof-of-Work (PoW) based designs, whereas those used in Ethereum v2, Cosmos, Polkadot, Oasis, etc. are Proof-of-Stake (PoS) based designs.

At the level of abstraction needed here, the only thing that we care about is that the append operation for a PoW log is probabilistic, and an entry is not considered successful until about 6 additional entries have been made (the distance to the end of the chain is a security parameter here). Typically the idea of needing additional entries in PoW is called “probabilistic finality.”

PoS logs use committee elections and digital signatures instead of solving cryptographic puzzles, and once the consensus protocol completes and a new log entry is made, it is considered final. While it takes time for the consensus protocol to run, no additional entries are required, and this is often referred to as “instant finality.”

We will refer to the general idea as “log finality,” whether probabilistic (with appropriate security parameter) or instant. Log finality is the mechanism upon which other notions of finality are built: log finality only says that some entry is successfully appended to the log, but says nothing about what that entry means.

2.2 Virtual Machine State

Since “state” can be encoded and represented in many ways, we need to start with mathematical necessities to describe what we mean by “state” in a way that is divorced from any particular representation. By “state”, we mean a function from a key set, its domain, to a value set, its range, i.e., State : Key → Value.

In Bitcoin, Key would essentially be public keys, and Value would be numbers representing the quantity of Bitcoin tokens under the control of the corresponding private keys. In Ethereum-like systems, Key would be a union of several disjoint sets, e.g., public key hashes associated with Externally Owned Accounts (EOAs), contract addresses, contract address and 256-bit address tuples for naming persistent store locations, etc. Correspondingly, Value would be public keys, EOA balances, contracts’ code and balance, and 256-bit values from contracts’ persistent store, etc.^1

^1 This is just one way to model Ethereum state; other models, e.g., as a set of mappings, one for each key type, would be more precise.

2.3 Transactions As Curried Functions

Users of blockchain-based systems propose transactions that are recorded by being appended to the blockchain. We normally think of transactions as altering state. In a mathematical description, they are functions that transform state, i.e., State → State. Smart contract entry points take arguments (in Ethereum, the callData) and so do not fit this type signature; however, by currying all the user-supplied arguments, including authorization information (msg.sender, etc.), all transactions can be modeled in this manner.^2

^2 Not all state transformations are valid transactions, e.g., a state transformation that changes a contract’s state in violation of its code invariants.

Depending on the smart contract system and the type of analysis we need to do, we may need a model that exposes more structure of transactions. For example, it is useful to model an Ethereum(-like) transaction t as two component functions: first, t_c : State → State_⊥, representing the call into a contract entry point, returning an updated state modeling the effects of the contract code execution on contract persistent storage (no change if the transaction aborts, with ⊥ representing non-termination); and second, t_g : State → ℕ_⊥, representing the gas consumed from execution (until either commit or abort), returning ⊥ if the gas limit would be exceeded (including non-termination). The function t is then the following composition of t_c and t_g, with a_b representing the key for the balance of account a (which is msg.sender for t) and store update notation from semantics:

\[
t(s) =
\begin{cases}
[a_b \mapsto s(a_b) - \mathit{gaslim} \cdot \mathit{gas}_{\$}]\, s & \text{if } t_g(s) = \bot, \\
[a_b \mapsto t_c(s)(a_b) - t_g(s) \cdot \mathit{gas}_{\$}]\, t_c(s) & \text{otherwise.}
\end{cases}
\]

This is a useful separation since most contract virtual machines implement standard ACID transaction semantics, and an explicit transaction abort reverting the contract state (failure atomicity) is just t_c returning its input state, though gas fees for computation performed until the abort must still be paid.

2.4 Natural Names of State

Because users think of their transactions as being executed in a strict serial order and submit transactions based on their model of what the current system state is (e.g., their EOA balance), a natural way to refer to any reachable state is the sequence of transactions that yields that state.

2.4.1 Names And Composition of Transformations

Transactions are state transformations, and transforms form a group under function composition. We use the “;” function composition notation from programming language semantics, so we write the state after the j-th transaction as s_j = (t_1 ; t_2 ; ⋯ ; t_j)(s_0). For a state s that is reachable via a sequence of transactions from genesis, we call “t_1 ; t_2 ; ⋯ ; t_j” (one of) its “natural name(s).”

Note that for Bitcoin, Ethereum, and Arbitrum, the natural name is exactly what is recorded on the blockchain. Other smart contract systems may only implicitly record the execution order, e.g., as part of a zkSNARK proving correct execution.

2.4.2 Names Are Not Unique

Note that names are not unique. Many states have multiple names. If two transactions f and g commute at a given state s, i.e., the resultant state is (f ; g)(s) = (g ; f)(s), then there are (at least) two names for the resultant state. Of course, if both f and g were proposed, the system will eventually decide on some order. This is analogous to how 0 + 1 + 2 and 0 + 2 + 1 are both ways to name the value 3.

Note also that some states have no names in this nomenclature. For example, a state where an EOA controls more tokens than the total token supply is unreachable, since (absent a catastrophic bug) there are no sequences of transactions from the genesis state that would result in such a state. This is fine, since such unreachable states are of no interest.

2.5 Ground Truth

Ground truth for smart contract / blockchain systems is knowing what the current state is supposed to be. This does not mean, however, that we must have an efficient representation of that state, merely that it is computable, efficiently and unambiguously. We argue that the most natural definition for blockchain system state is based on consensus agreement on the order of all transactions since the genesis state (or since a checkpoint state, see Section 2.7).

Once a transaction’s parameters are logged (i.e., the transaction has achieved log finality), its execution order relative to earlier logged transactions is determined. Via a simple inductive argument, since the input state is computable by the execution order finality of earlier transactions, the resultant state from this transaction can, in principle, also be computed. We call this “transaction order finality.”

Consensus about the current state is based on states’ natural names, and computing an efficient representation is not necessarily coupled with this decision. Everything else is an optimization.

2.6 Efficient Representations of State

All blockchain-based systems use an append-only log to record ground truth. From those log entries the system determines the consensus reality on the state of one (or more) virtual machines. However, different blockchains handle state differently.

Bitcoin only records transactions in the blockchain blocks. The VM is relatively simple and the state is the number of tokens controlled by public keys, the so-called Unspent Transaction Outputs (UTXOs). The logged messages specify how to transfer tokens using the VM rules, but state is implicit. In contrast, Ethereum also records a representation of the new state that results after all the transactions named within the block are applied to the state from the previous block. This state is recorded via a cryptographic commitment, the “stateRoot”, which is the root hash summary of the Merkle Patricia trie representation of the Ethereum Virtual Machine (EVM) state. Hyperledger [1] takes a Bitcoin-like approach and leaves the interpretation of results (the effect on the VM) to clients.

Note that Bitcoin miners must also have an efficient representation of state, since double-spending checks etc. must be performed in order to verify new blocks; it is just that this state does not show up on-chain. Availability is separate from permanence: in either case, miners have to either “catch up” by replaying transactions since a checkpoint (more on that later) or obtain a copy from somebody else. Apart from replaying transactions, there is no way for Bitcoin miners to verify state obtained from an untrusted party; Ethereum miners, on the other hand, can readily verify the Merklized data structure (stateRoot).

Rollup systems use an existing underlying blockchain as a security anchor to allow “off-chain” execution, where smart contract code executes in a separate VM (“rollup”), so that many transactions can execute there while requiring fewer (or cheaper) transactions at the underlying blockchain. This is an attractive scaling solution, since (presumably) computation on the rollup is cheaper, and a single underlying blockchain can support multiple rollups. Optimistic rollup designs like Arbitrum [8] commit the transaction order to a bridge contract in the underlying blockchain, compute the resultant state in the rollup, and send that result to the bridge contract in the underlying blockchain to validate. We will discuss various validation schemes later (see Section 2.10); the key observation here is that transaction order can be determined in a separate step from state computation.

Another way to think of this is that the resultant state from the execution of a transaction t is named once we have the order of all transactions starting from the genesis state up to and including t. We may not yet have computed that state as a data structure that allows the key-value lookup to be performed efficiently, but there is no doubt what that state would be: everyone given its name will arrive at the same abstract mapping, even if its concrete representation (the actual choice of data structures) may differ. Bitcoin only names the current state.^3 Ethereum-like systems both name and compute one particular representation of state.

^3 Though states are realized periodically, when checkpoints are created. See Section 2.7 for more details.

Some rollups separate naming from state computation, where the transaction order is committed to the layer 1 blockchain first, and the correct resultant state is computed off-chain, in the layer 2 rollup, and committed to layer 1 at a later time. Other rollups use a mempool design for the rollup, so that the transaction order is determined by rollup nodes rather than in the underlying blockchain, and that order is committed along with the computed state hash. Such a design reduces the number of transactions in the underlying blockchain, trading off reduced transaction processing cost there for tight coupling between transaction order finality and state determination.

The separation of concerns should be explicit. The consensus layer is responsible for making an immutable, append-only log. Its primary purpose is to record the transactions (the callData) in the order in which they are to be executed. This names the resultant state, and everything else is secondary. Computing and agreeing on this state can come later, as fulfilling a promised value. The result, a correspondence between a named state and the state representation, can be similarly recorded to make state representation sharing easier. That is to say: Bitcoin logs only execution order; Ethereum logs both execution order and an efficient representation of the resultant state in a tightly coupled manner; decoupled layer 2 rollup designs, on the other hand, log both but separately, so that problems with the validity of the computed state do not necessarily invalidate the committed transaction order.

Note that it is important to calculate state and to reach consensus on it, since blockchains are not closed systems: transfers between a rollup and the underlying blockchain are an example of external actions that depend on state; in general, any contracts that can cause off-chain effects, such as the shipment of goods, require consensus and state value finality.

In our view, obtaining an equivalent but more efficient representation of a given state is an important optimization. Indeed, committing the transaction execution order yields one sense of finality, transaction order finality, allowing us to name a committed state. Computing the stateRoot and reaching agreement on the result yields a second sense of finality, “state value finality”, allowing us to confidently use the state’s value or representation.

2.7 Checkpoints

Replaying all transactions from genesis is expensive, and as a blockchain system is used it quickly becomes prohibitively so for new participants wishing to join the system. The storage cost of the blockchain blocks grows linearly over time, and for Ethereum maintaining any sort of availability for old trie state can become quite expensive. In the past, Bitcoin performed a sort of uber-commit by releasing new clients with “checkpoints”, where the state at some block height is built into the client. Blocks with lower block numbers can be safely discarded, since everyone has a state, the checkpoint state, from which to replay newer transactions. Bitcoin has since removed checkpoints because they were viewed as creating confusion/misconceptions around the security model.

In addition to reducing the entry cost for joining the system, checkpointing reduces the cost of record keeping for existing participants. This is a “meta” level of finality: the effect of transactions that landed prior to
the checkpoint cannot be disputed, since the records associated with them are likely to be unavailable.

Beyond cost of entry or on-going operational costs, checkpoints are also used in conjunction with governance mechanisms for catastrophic error recovery. Many blockchain systems that have experienced problems, e.g., massive token losses due to bugs in critical blockchain code or smart contracts, have resorted to hard forks to recover from such errors despite transactions reaching state finality. Checkpointing too soon, if records older than checkpoints are not kept around, would prevent recovery by reverting to the checkpoint state and (optionally) re-executing transactions in their original logged order (using fixed versions of the code, etc.).

2.7.1 Checkpoints vs Long-Range Attacks

Note that checkpoints address a different problem than long-range attacks. In long-range attacks, we are worried about exposure of old cryptographic signing keys used by past consensus committee members in a PoS design [2]. Such members may have exited the ecosystem and their old keys are no longer handled carefully; worse, such past members may rationally decide to auction their signing keys on the dark web, since they no longer hold any tokens and have nothing to lose. Such keys are useless when used to present bogus information to blockchain participants that know the current committee composition and have been tracking transactions and committee elections. However, consider a threat model where a Rip van Winkle victim wakes up after a period of inactivity and is somehow placed in an Inception-style virtual world / faux information bubble. That bubble filters out legitimate information about a PoS blockchain’s current state and instead only makes available information constructed to make a forked chain, made feasible by a super-threshold number of exfiltrated/compromised keys of members from an old consensus committee [2].

The Inception information bubble is an interesting threat model. If applied to PoW blockchains, a victim cannot know if the chain that they see is indeed the longest chain: the “longest” predicate amounts to a universal quantifier, and global information is needed. Estimates for how long the chain should be might be feasible if there is trusted time, but that is probabilistic in nature: block production rate depends on both protocol parameter changes and the number of active miners, which has more to do with the economic attractiveness of participating (relative to all other investments) than with computation power limitations. Furthermore, such a length estimate only applies to the whole chain segment since the victim fell asleep and does not help much with the “longest” predicate, since the fork can be quite recent.

Without Inception-like powers to mount eclipse attacks [6], an adversary should be unable to confuse potential victims. A potential victim can verify that they have a current view of the blockchain: they just securely query N sources for the hashes of recent blocks on the chain. If a majority M of these hashes are on the same chain and these sources are honest, not eclipsed, and have continued to be blockchain observers, the potential victim will be able to distinguish the global consensus chain from a forked chain created with long-range leaked keys. Here M ≤ N is a security parameter, which can be chosen so that the Inception-esque adversary will have to additionally compromise prohibitively many more keys (or their holders) than just those of old consensus committee members, since any blockchain observer can witness recent blocks and thus N can be much larger than the size of a consensus committee.

2.7.2 Zero-day Attacks / Common-mode Failures

Checkpointing is intended for addressing the handling of relatively recent zero-day attacks where a super-threshold number of the current consensus committee members have been compromised, or a newly discovered vulnerability in the rollup software is being exploited. We do not envision it being useful for handling attacks that had not been noticed for a long time, since any actual means of addressing them will be complicated. The cascading causal relationship of newer transactions depending on the output states of older transactions is likely to lead to an explosion of transactions that will be aborted in a new interpretation of their effects when they had successfully committed before.

The design decision is whether to perform checkpointing at all, and if so, how old (in real time, block numbers, etc.) a finalized transaction must be before it might be included in a checkpoint. This decision is essentially the blockchain version of a statute of limitations in many legal systems. Unlike normal statutes of limitations, which specify a time limit that is specific to the type of crime (and for some crimes there are no limits), here the checkpoint is global in scope: all transactions, regardless of which contracts they might be associated with, have to be treated the same way.
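The majority check described in Section 2.7.1 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the paper: each of the N sources is assumed to report a single block hash for the same agreed-upon recent height, and the queries are assumed to be made over authenticated channels. The names `consensus_hash` and `view_is_current`, and the parameter `m` (standing in for the threshold M), are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def consensus_hash(reported: List[str], m: int) -> Optional[str]:
    """Return the block hash reported by at least m sources, else None.

    Assumption: every entry in `reported` is one source's hash for the
    same recent block height. For the result to be unambiguous, m should
    exceed half the number of sources queried.
    """
    if not reported:
        return None
    block_hash, votes = Counter(reported).most_common(1)[0]
    return block_hash if votes >= m else None


def view_is_current(local_hash: str, reported: List[str], m: int) -> bool:
    """Check whether the local view matches what at least m sources report."""
    agreed = consensus_hash(reported, m)
    return agreed is not None and agreed == local_hash
```

A participant waking from a long period of inactivity would compare the hash in their own view against the answers of the sources; a mismatch, or the absence of any hash reaching the threshold, signals that the local view should not be trusted. The sketch says nothing about how the sources are chosen, which is where the honesty and non-eclipse assumptions of the text come in.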
In our mathematical model, deciding / making a checkpoint is essentially choosing a state and “hard-coding” the association between its natural name (or block number) and a specific value in a common, efficient representation. The treatment of checkpoints is analogous to how the genesis state s_0’s value is hard-coded in the system. Periodically, a state s_{c_j} that has been long finalized is chosen to become a checkpoint, where c_0 = 0, c_j ≪ c_{j+1}, and s_{c_{j+1}} = (t_{c_j + 1} ; ⋯ ; t_{c_{j+1}})(s_{c_j}). For all intents and purposes, the checkpoint state behaves like the genesis state; after the checkpoint creation, state names can be specified relative to the checkpoint.

The checkpoint’s state / value association is a temporal barrier: transactions earlier than the checkpoint have “checkpoint finality”, since records that would enable their replay in the alternate bug-fixed environment are unavailable. Note that unlike transaction order finality and state finality, which typically occur at the same time as when their log entries are made, i.e., at log finality, checkpoint finality could occur long after the identification of a state as a checkpoint candidate. For example, a system could log that a state will become the next checkpoint once the blockchain’s block height reaches a certain value (which is expected to occur in about six months, say) or when a quorum of time oracles attest that a certain date has been reached. The log entry makes the decision irrevocable, but checkpoint finality does not necessarily occur until other gating conditions are met.

Separating state finality from checkpoint finality to handle the possibility of catastrophic failures introduces risk for those who need to take actions external to the rollup based on state finality. External actions cannot always be rolled back. We believe that this can be handled using an insurance model: for example, an insurance policy based on the type of external action and risk profile could be offered to make participants whole, should a checkpoint/replay invoked due to a catastrophic failure cause the proper external action to change.

2.8 Execution Parameters: callData

When we name states via transaction order, it is a “natural” name as a sequence of state-transformation functions as applied to the genesis or a checkpoint state. The state transformations must be fully specified, i.e., the transaction data (callData) are function parameters to the transaction call that, when curried, allow us to view the transactions as state transforms.

In order for the commitment of transactions to make sense, it must include all callData needed for the state to be computed. This is a data availability requirement. Transactions (their callData) do not have to be directly in the blocks, though that is the simplest design choice. If a highly available data repository, i.e., a data availability layer (e.g., IPFS [3]), can be used, then lists of transactions can be stored there and the on-chain data can simply be the name by which the transaction list can be obtained.

Here, feasibility must include some notion of verifying data availability, e.g., signatures from a quorum of availability providers. It is not enough to use a simple content-addressable storage (CAS) without availability guarantees, since otherwise we face the following dilemma when an adversary computes the CAS name without actually making the data available. In such a scenario, we either sacrifice liveness, having to wait indefinitely for the callData to become available, or try to use timeouts and treat unavailable callData as implicit aborted transactions and move on. This latter choice sacrifices finality: a user can nullify their transactions after the transaction’s execution order has been decided, by making their callData unavailable, if the user decides that the execution order is not to their advantage. So, until state finality has been reached, transaction order finality does not uniquely specify a resultant state.

Note that Ethereum’s stateRoot has a symmetric CAS availability problem. The state output from a transaction is needed as input to the next transaction, and a lack of availability here means that the new state’s representation cannot be easily computed, destroying liveness.^4

^4 Obviously the input state can be reconstructed by transaction replay, but that is not efficient.

2.9 Execution Ordering

Execution ordering can be quite important, since any given two state-transforming functions might not commute with each other. The notion of “front running” is not new with blockchains but has been unethically (and illegally) practiced in stock and commodities markets [12]. Front-running uses the ability to influence execution ordering results, where additional orders are injected in front of (and often also behind) orders that may move the market. For example, if Trey submits a transaction g to buy a large amount of some commodity, it is likely to cause a price increase (“move the market”). If Fanny knows about his order, she can inject a pair of transactions to sandwich his: the net
is a pseudo-conjugate transaction g′ = f ; g ; f*, where f denotes a transaction to buy the same commodity at market price, and f* denotes a “pseudo-inverse” transaction to sell the same amount, at what will be a higher market price, to exit the market and reap profits.^5 In Bitcoin, Ethereum, and most layer 2 rollup designs, the (layer 1) miners choose transactions from a mempool of proposed transactions and can choose and order transactions within the new block in any order that they want.

^5 The notion of conjugacy from group theory captures the idea of front-running. f* is not the inverse because transactions are not invertible: gas fees must be paid. Furthermore, the transactions occur at the market price and the portion of state representing the token balance for Fanny would not return to its original value; the point of front-running is to make money! Because of this, we say that f* is a “pseudo-inverse.” Additionally, if Fanny wants to hide her activities, the trailing transaction could, for example, sell a smaller amount to reap most of her profits without completely exiting the market, making identifying and matching the leading and trailing transactions difficult.

Ideally, transactions that offer approximately the same gas fees should perhaps be handled on a first-come, first-served basis, with the order decided based on approximate timestamps. Standard Bitcoin, Ethereum, and rollup designs are all vulnerable to order conjugation, though there is research on ways to maintain fair ordering for some definition of fairness [9, 13, 4].

How to decide on an execution ordering is a largely orthogonal design decision. The scheme used in Bitcoin and Ethereum (v1) is leaderless and probabilistic, since miners can independently choose the transactions to include in the blocks they mine. For L2 rollups with decoupled ordering and state value determination, the transaction order can be determined in many ways: (1) as a “null hypothesis”, in L1 submission order (susceptible to L1 front running); (2) via trusted off-chain schedulers that (fairly) order batches of transactions; (3) via trustless decentralized applications on other blockchains that perform batch scheduling, using commit-reveal of transaction details to prevent biased ordering; etc. The potential design space is large, and exploring it is on-going research.

2.10 Deciding on the Correct State

The mechanism by which the correct state is decided is security critical. There are many different design choices that have been explored, with different security assumptions and scaling efficiency. By security assumptions, we mean not only common cryptographic assumptions (e.g., one-way functions), but also compute resources such as hashing power and monetary resources such as control of some fraction of the total stake. By scaling efficiency, we mean both normal resource usage and the resources needed to achieve a desired level of security. Table 1 shows several different rollups and their state transition security mechanisms.

Rollup     Base chain                           Security                  VM types
Arbitrum   Ethereum                             Multi-round fraud proof   EVM/AVM
Oasis      Oasis consensus layer (Tendermint)   Bare-metal fraud proof    Rust (Wasm), EVM, (extensible)
Optimism   Ethereum                             One-round fraud proof     EVM
StarkEx    Ethereum                             Validity proof            Swaps
ZKSync     Ethereum                             Validity proof            Transfers

Table 1: Comparison of Different Rollup Designs

2.10.1 Proof of Work

While Bitcoin uses a PoW-based log, there is no explicit decision to be made about the correct state since no state is recorded.

In Ethereum v1, PoW is used and hashing is the resource bottleneck. All miners are validators, and the security assumption is that malicious actors cannot control more than half of the hashing power or mount a “51% attack.” Here, validation means re-executing the transactions to verify that the new stateRoot matches. Invalid blocks are ignored and not considered as extending the chain.

One issue is that even though it appears that all miners independently validate transactions, the incentives for rational miners are to behave otherwise. Because the execution order and stateRoot must be placed in the same block and thus are tightly coupled, there is a performance advantage for small miners to join together and form a mining pool/cooperative. Rather than individual miners re-executing transactions from the block to compute the stateRoot, a mining pool could compute the stateRoot once on the pool’s fastest machine, and then devote all of its compute power to solving the hash puzzle. This obviously also centralizes what would have been replicated transaction executions, robbing the system of the degree of independent verification represented by the
mining pool members.

This is not (yet) an acute problem, since hashing costs dominate the cost of VM execution to process the transactions, and the savings from sharing the stateRoot are not significant. The incentive to do so increases if/when the gas limit becomes high enough that VM execution cost becomes noticeable.

2.10.2 Proof of Stake

Ethereum v2, Cosmos, etc. are PoS-based blockchains. All consensus committee members are also validators, and the security assumption is that malicious actors cannot amass a supermajority (usually 2/3) of the total stake. Validators earn block rewards. Stake-based voting and reward distribution remove most of the incentives to mount Sybil attacks, but do not address other attacks such as exploiting zero days, which are common-mode faults.

2.10.3 ZK Rollups

Zero-knowledge (ZK) rollups use zkSNARKs to prove the correctness of the rollup virtual machine execution. Because security proof verification is involved, a single honest verifier is enough to keep the executor/prover honest.

The key idea is that proof verification is very cheap; thus, the security parameter for the number of validators (here, proof verifiers) can be easily increased. Unfortunately, proof generation is relatively expensive, so while payment transactions are feasible (ZCash [7]), the case for general smart contract computation is more difficult. Proof generation for Ethereum-style computation is being developed/researched (zkEVM [11]). It remains to be seen how good a scaling solution this represents, since the rollup where the proof generation occurs may be slow/resource intensive. Even though the underlying blockchain might be able to run bridge contracts for many ZK rollups, the aggregate throughput and efficiency may still be insufficient for general, complex applications.

The proof verification algorithm takes time to run. This should be done within the bridge contract, and the typical description of ZK rollups is that an invalid proof is treated as if no proof were submitted, i.e., the bridge contract aborts the faux proof submission transaction. As long as there is enough replication in the underlying blockchain so that the consensus there reflects the correct execution of proof verification, the security of the rollup is guaranteed.

2.10.4 Optimistic Rollups

In optimistic rollups, a validator is any entity that can re-execute the transactions and compare the resultant state. An executor commits its result as a Disputable Assertion (DA) to the underlying blockchain, and any validator that finds a discrepancy can issue a challenge (and the validator becomes a challenger). After dispute resolution, if the DA is found to be incorrect, the executor is slashed and the successful challenger earns a reward. This means a single honest validator is enough to keep the executor honest.

Validators must be given time to re-execute the transactions and to generate challenges. Because re-execution is more expensive, this can create a significant delay for state finality: a DA is only considered accepted when the disputation period has passed. The appropriate length of the disputation period depends on the maximum gas allowed in a rolled up batch of transactions and other factors; current designs allow as much as a week.

In Arbitrum, users are expected to not have to wait for the disputation period, based on the notion of “trustless finality”. The scenario is that a user has submitted a transaction and it is in-flight: a DA includes it, but the disputation period has not yet passed. The user wishes to propose a new transaction that depends on its result. If there are doubts about the DA, the user might be hesitant to propose the new transaction. The argument is that since the user could act as a validator and re-execute all the transactions leading up to the DA and thus independently validate their mental model of system state, they should be able to freely submit their new transaction proposal. This is essentially using transaction order finality: users trust the underlying blockchain’s recording of the rollup transaction order and that the VM is defined and implemented correctly to be deterministic.

Another argument is that having multiple executors/validators is feasible. Here, they can post additional DAs (for new transactions) that depend on earlier DAs that have not yet reached state finality. These additional DAs provide additional assurance that the earlier DAs are correct, since presumably the new dependent DAs have validated the earlier DAs. Currently, Arbitrum has a single centralized executor, though anyone can be a validator/challenger.

Verifiers suffer from the verifier’s dilemma [10, 5]. Even though proof verification might be cheap, it is cheaper still to assume that somebody else has done the verification, when the odds that an executor will try to cheat appear to be low.
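The challenge flow for optimistic rollups described in Section 2.10.4 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any real rollup: the toy VM (an integer counter), the names DisputableAssertion, validate, challenge, and is_state_final, and the DISPUTE_PERIOD value are all hypothetical. The sketch only shows the two properties the text relies on: a single validator that re-executes the batch can block an incorrect DA, and a DA reaches state finality only once the disputation period has passed unchallenged.

```python
import hashlib
from dataclasses import dataclass

DISPUTE_PERIOD = 100  # assumed window length in blocks; purely illustrative


def execute(state: int, txs: list[int]) -> int:
    """Hypothetical toy VM: the state is a counter and each tx adds to it."""
    for tx in txs:
        state += tx
    return state


def state_root(state: int) -> str:
    """Stand-in for a cryptographic commitment to the VM state."""
    return hashlib.sha256(str(state).encode()).hexdigest()


@dataclass
class DisputableAssertion:
    prev_state: int      # state the batch starts from
    txs: list[int]       # ordered batch (transaction order is already final)
    claimed_root: str    # executor's claimed resulting state commitment
    posted_at: int       # block height at which the DA was posted
    challenged: bool = False


def validate(da: DisputableAssertion) -> bool:
    """Re-execute the batch and compare with the executor's claim."""
    return state_root(execute(da.prev_state, da.txs)) == da.claimed_root


def challenge(da: DisputableAssertion, now: int) -> bool:
    """A validator challenges only if the window is open and re-execution disagrees."""
    if now - da.posted_at < DISPUTE_PERIOD and not validate(da):
        da.challenged = True
    return da.challenged


def is_state_final(da: DisputableAssertion, now: int) -> bool:
    """The DA is accepted only after an unchallenged disputation period."""
    return not da.challenged and now - da.posted_at >= DISPUTE_PERIOD


honest = DisputableAssertion(0, [1, 2, 3], state_root(6), posted_at=0)
fraud = DisputableAssertion(0, [1, 2, 3], state_root(7), posted_at=0)
```

In this toy model the dispute is resolved by plain re-execution; multi-round or one-round fraud proofs as listed in Table 1 would replace the body of challenge, but the timing relationship between posting, challenging, and finality stays the same.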
2.10.5 Bare-Metal Fraud Proofs

The bare-metal fraud proof approach is a traditional fast-path / slow-path approach seen in systems design, where the fast path efficiently handles the common case, with a slower path that acts as a fallback for the uncommon case(s). Here, a single honest node in the fast path suffices to trigger slow-path verification. It is agnostic with respect to the rollup VM, since there is no need to single-step VM execution to find exactly where an error occurred. To understand the bare-metal fraud proof approach, it is useful to understand the system architecture.

The bare-metal fraud proof approach can be viewed as a form of optimistic rollup, but with a committee of nodes executing in parallel and verifying each other rather than taking the executor vs. challenger view. The VM execution of smart contracts is separated from consensus. There are two types of committees: a consensus committee executing a consensus protocol, and one or more compute committees executing the VM (similar to rollup executors running smart contracts off-chain), which do the heavier lifting of general smart contract execution. The consensus layer records the resultant VM state from the compute committee.

Unlike the consensus committee, which runs a single consensus protocol, we view the compute layer as running two protocols: the fast-path protocol of discrepancy detection (DD), where we can use a smaller primary committee; and the slow-path protocol of discrepancy resolution (DR), where a (much) larger backup committee would be used. The goal, which is justified by the security parameter calculations (see Appendix A), is to use small primary committees without compromising security. This is the source of the efficiency improvement of the bare-metal fraud proof approach: the amount of replicated computation is significantly reduced in the fast path, with an incentive design such that the slow path should never be used.

Note that transaction ordering is an orthogonal design decision from the use of DD/DR. It can be decoupled and committed to the consensus layer as separate log entries prior to smart contract execution, or a leader from the compute committee could be selected (possibly rotating) to choose an execution order from the available transaction proposals.

However execution schedules are determined, all members of the compute committee run in parallel and determine the state that would result from the execution of a batch of transactions. Compute committee members sign their computed output states. Equivocation is punished by slashing stake, as in many consensus designs, and we will henceforth assume that each committee member will sign at most one state as the result from the execution of a batch of transactions.

The key idea behind DD is that when all compute committee members agree on the resultant state, we can commit it to the log, and this is secure as long as the primary compute committee has enough members that the probability that none of the reporting members are honest is negligibly small. If there is any disagreement among the primary committee members, we do not know which of the reported output states is correct. This is the key difference from other state validation schemes: DD only performs error detection and not error correction. Instead, we switch to the DR protocol with the larger backup committee, and that output state is deemed to be the correct one. In essence, the DR protocol is a parameter to the bare-metal fraud proof approach: the DR protocol can use a much larger committee (even all available nodes) and use many more resources. Because those DD nodes that reported an output state differing from that determined by DR will have their stake slashed, there is no incentive to deviate from the protocol unless the adversary can either control the entire DD committee or enough of the backup committee for DR to fail.

Note that the choice of DR does not have to only involve larger committees: we could use a bisection algorithm to find the single VM instruction at which the computation of the two (or more) resultant states first diverged [14, 8]. While this approach is great from an algorithmic standpoint, it is challenging to implement, since the entire VM instruction set must be re-implemented in the bridge contract to see which instruction executed incorrectly and thus resolve the discrepancy. Maintaining VM agnosticism allows the system greater flexibility: a VM can be designed to allow the VM programs to be compiled to (sandboxed) native code, allowing applications such as data analysis that are not currently feasible on blockchains or rollups.

The consensus layer accepting and committing a state from DD/DR yields state finality. Just as an exploit of a zero-day vulnerability or implementation bug, which would be a common-mode failure in all blockchains, is handled by replaying transactions from checkpoints, external challengers can provide evidence of malfeasance even when DD/DR fails. In such a scenario, the transactions since the last checkpoint
(or the disputed point) are replayed in the same order, using software with bug fixes applied, to compute the correct state.

The Oasis network is a blockchain system that utilizes the bare-metal fraud proof approach. Multiple rollup VMs or “ParaTimes” are supported. Transaction ordering is handled via a mempool, and a rotating leader in the compute committee chooses the transaction order in its transaction batch. This reduces the number of consensus transactions, since transaction order finality is not critical when the rollup execution is fast, though decoupling can be introduced in later iterations as needed. The Oasis ParaTime architecture, where the compute layer results are subject to DD/DR before being committed to the consensus layer, is a minimal rollup design: the bridge contract that validates the rollup VMs is baked-in and only supports DD/DR validation, keeping the consensus layer simple.

3 Ideal Layer 2 Design

The ideal rollup design allows efficient (off-chain) verifiable computation with fast finality. Here, efficiency means:

• Reduced replicated computation. We do not want to use resources, e.g., an enormous amount of electricity, when it is unnecessary for the required level of security. Reduced replication should also reduce the cost per transaction, making the system more scalable from an economic perspective.

• High confidence in results. This is the converse to replication: users should be able to trust the results from the verified computation, even though the replication factor might be lower.

• High throughput. The overall system throughput should be high, so that the system will perform well with a high workload.

• Low latency. Proposed transactions should not have to wait too long in queues. Users should have reasonable confidence that once a transaction is accepted, it will have the desired/expected effect.

What an “ideal” design might look like depends on other factors, such as the expected workload. For example, as systems become more capable, will applications need to run larger transactions that have a high gas limit? What is the distribution of execution times? With a design that separates transaction order finality and state finality, as long as the average transaction processing rate is sufficient, the system should be able to tolerate occasional high execution time transactions, since users will primarily care about transaction order finality, assuming that the expensive transactions do not interfere or interact with most users’ contracts’ state.

This is still an area of research and some design constraints are unknown. We present a sketch of an “ideal” design below.

3.1 VM Job Queue and Transaction Order Finality

The ideal rollup design uses the “instant” finality of a PoS blockchain (e.g., Tendermint, Eth2) to obtain transaction order finality. That is, transaction proposals are essentially jobs entered into the virtual machine job queue; once entered into the queue, their order is final. This implies that the transaction callData cannot be validated with full virtual machine semantics, since that requires close coupling between the log and the virtual machine. One extreme design choice is to allow arbitrary messages to be logged to target the VM’s job queue, and rely on the cost of logging to deter denial-of-service attacks. This means, however, that the cost of VM execution to verify and discard ill-formed messages (e.g., unsigned, unparseable, non-existent contract entry points, etc.) must be included as part of the cost of message handling.

Message posting has to cost something in order to prevent denial-of-service attacks. This means some checks must already be done at the underlying blockchain: the message itself must be signed, and it must authorize payment from the EOA to minimally pay for the message posting itself, as well as a (limited) payment authorization for the VM execution (gas limit).

Since the signature check is effectively a sunk cost, an alternative is to perform some very simple verification of proposed transactions:

• Transaction authorization: valid signature on the proposal.

• EOA account has enough tokens to pay for message posting.

• (Optional) EOA account has enough tokens to pay for the maximum gas payment, using a
lower-bound estimate from the queue of transactions for which state finality has not been reached.

The contract call parameters (callData) are just opaque bytes at this point.

All signed transaction proposals that pass these basic checks are considered “well-formed transactions” (WFTs). Note that we do not perform contract entry point-specific type checks or other per-contract pre-condition checks at the bridge. Well-formedness does not imply that the message makes sense. WFTs that do not pass message format checks will simply be aborted with the sender charged a small gas fee: in our formalism, this means that the interpretation is that a tc returns with its input state. The same thing applies for other pre-conditions: those would be checked by the contract code (e.g., the equivalent of require checks in Solidity) and similarly cause the transaction to be aborted. This conforms with the decoupled nature of the system design: the bridge contract does not know what (possibly new) contracts have been instantiated on the rollup, and doing typechecking would be impractical.

Contract invocations in Ethereum-like blockchains are internally structured as essentially remote procedure calls, with a message that must be (partially) parsed to identify the intended entry point and discover the type signature, and then further parsed and typechecked. While it might be reasonable to say that the message encoding is architecturally fixed and performed by the rollup VM, it is feasible, as is done by Solidity for the EVM, to perform this in the contract code itself (in Solidity’s case, in the language runtime which performs the RPC dispatch). Since different contract language runtimes could, in principle, require completely different data serialization formats, performing message format verification in the bridge is infeasible without knowing details about how each contract’s RPC message receiver/demultiplexor is implemented.

The simplest design is to view the bridge contract as accepting WFTs and to require runtime checking for rollup-specific, contract-language-specific, or contract-specific pre-conditions.

The arrival order at the job queue maintained by the bridge contract does not necessarily specify the execution order. Clearly, as long as a rollup allows transaction proposers to offer variable gas fees (as opposed to a fixed market rate as a gating threshold), jobs will need to be sorted by gas price. And while using gas price as the primary key and arrival order as the secondary key might approximate a “fair” order, it allows a frontrunner to trivially create pseudo-conjugate transactions by proposing a pair of transactions at a slightly higher and slightly lower gas fee. More research is needed here, especially since implementing an approximate fair scheduler will introduce additional latency to the transaction processing.

3.2 Data Availability and Garbage Collection

The availability of callData and state representation are both important. From the transactions and the ordering, we can reconstruct the state value. From a finalized state, earlier callData is not needed except for catastrophic failures; from a checkpoint finalized state, earlier callData is no longer accepted.

While the callData is typically stored on-chain, the state representation is too large and cryptographic commitments are used instead. That doesn’t mean that transaction callData has to be on-chain.

One scenario is that an external trusted fair scheduling service could be authorized to determine transaction order, with the job schedule stored in an external CAS store with availability guarantees, so that the only on-chain data is a cryptographic hash commitment. The external highly available storage would have to have service-level agreements to keep the data accessible at least until the data ages beyond checkpoint finality.

The scheduler could run as a decentralized application on a separate blockchain. In this case, the scheduling algorithm / code is open for inspection and audit and does not have to be trusted; its execution would derive integrity from the blockchain on which it runs, and data availability derives from the availability of this blockchain as well.

Rather than using a separate blockchain, the scheduler could run on the rollup VM, with the traditional mempool used only for submitting callData to the scheduler via job-submission transactions. In such a design, the callData and transaction order would simply be stored as scheduler state in the existing decentralized data availability layer. The rollup executors would have access to the storage services that hold this, and no external highly available data storage service would be needed.

With this kind of integration, the normal state garbage collection mechanism handles reclaiming pre-checkpoint callData: the scheduler can keep the callData in its persistent storage and only remove these
entries after a checkpoint has occurred.

3.3 State Finality

State finality determination is where replicated computation occurs, and where efficiency and computation integrity appear to be diametrically opposing goals.

Using reasonable security parameter estimates (see calculations from Appendix A), we see that if zkSNARK proof generation costs more than about 25× the cost of normal execution, then it will have essentially no replication efficiency advantage, ignoring the costs of the replicated proof verification. Even if it could be faster, it increases the time to state finality as compared to DD/DR, where the rollup committee members execute in parallel. If we are willing to move away from EVM compatibility for new smart contracts, depending on the contract language and runtime environment design, it should be feasible to allow smart contracts to execute at (near) native code speeds with comparatively less engineering effort than proof generation/verification, making the comparison even more one-sided.

More importantly, the DD/DR approach is straightforward to analyze and its implementation is much simpler. Both are critical traits for practical security: users need to understand why they should trust a system; more importantly, complexity is antithetical to correct, auditable implementations.

Absent a breakthrough in proof generation performance, DD/DR implemented between the commitchain committee and the bridge contract provides a good design choice for security, efficiency, and generality.

Note that just as there is the verifier’s dilemma for ZK and optimistic rollups, there is a parallel problem in replicated execution / voting schemes. Execution piggybacking is a potential problem, where an executor doesn’t bother to compute their own resultant state but just re-uses the results from another (hopefully honest) executor. To address this, we could try to do something like commit/reveal for results, or impose a confidential compute requirement for the rollup. Except for trusted computing style mechanisms, such approaches are insufficient: a group of rational executors can collude to save costs. Fortunately, as long as there is at least one altruistic executor to cross verify, the DD/DR approach works.

3.4 Checkpoint Finality

DD/DR handles independent failures, e.g., through bribery of key personnel, bugs in custom deployment infrastructures, etc. The existence of zero-day vulnerabilities / bugs means that there are common-mode failures. The “traditional” way to address such common-mode failures, especially when their exploitation was large-scale, is to hard fork and change a checkpoint’s hard-wired mapping from a natural name to a state value/representation.

We don’t have a better solution, just (hopefully) better terminology / clearer analysis: checkpoints must lag the current head of the blockchain significantly to permit detection and recovery from common-mode failures, but not so far as to make the long-term high-availability data retention too costly. This is a balancing act, and the appropriate value depends on the estimated likelihood of common-mode failures, business requirements of dependent contracts, etc.

When to perform a checkpoint should be formalized and made part of the system design and be subject to change by governance. Storage reclamation / garbage collection and catastrophic fault handling are intimately connected. The checkpoint state representation and all newer transaction callData must be available to allow recomputation of transaction results in case of common-mode catastrophic faults, but storage for older state representations or callData can be safely reclaimed. In order for this to work, the servers in the decentralized data availability layer must be aware of rollup state, e.g., act as a light client and observe checkpoint determination events themselves or rely on witnesses that relay the information.

4 Conclusion

The main contribution of this paper is that transaction order finality should be viewed as the key for determining system state. Computing an efficient representation is just an optimization, since state can be reconstituted from a checkpoint state and those transactions that follow it. We identify the following shades of finality: log finality, transaction order finality, state finality, and checkpoint finality. These notions are useful for reasoning about blockchain design and the design space for error/fault handling, from independent faults due to Byzantine actors to common-mode faults due to zero-day software defects.

Based on considering how and when these finality properties should be achieved, we developed a prelim-