The Availability-Accountability Dilemma and its Resolution via Accountability Gadgets
Joachim Neu (jneu@stanford.edu), Ertem Nusret Tas (nusret@stanford.edu), David Tse (dntse@stanford.edu)
The authors contributed equally and are listed alphabetically.

ABSTRACT
Byzantine fault tolerant (BFT) consensus protocols are traditionally developed to support reliable distributed computing. For applications where the protocol participants are economic agents, recent works highlighted the importance of accountability: the ability to identify participants who provably violate the protocol.

We propose to evaluate the security of an accountable protocol in terms of two quantities. Its liveness resilience is the minimum number of Byzantine nodes when liveness is violated. Its accountable safety resilience is the minimum number of accountable Byzantine nodes when safety is violated. We characterize the optimal tradeoffs between these two resiliences in different network environments, and identify an availability-accountability dilemma: in an environment with dynamic participation, no protocol can simultaneously be accountably-safe and live.

We provide a resolution to this dilemma by constructing an optimally-resilient accountability gadget to checkpoint a longest chain protocol, such that the full ledger is live under dynamic participation and the checkpointed prefix ledger is accountable. Our accountability gadget construction is black-box and can use any BFT protocol which is accountable under static participation. Using HotStuff as the black box, we implemented our construction as a protocol for the Ethereum 2.0 beacon chain. Our Internet-scale experiments with more than 4000 nodes show that the protocol achieves the required scalability and has better latency than the current solution, Gasper, while having the advantage of being provably secure. In contrast, we demonstrate a new attack on Gasper.

1 INTRODUCTION

1.1 Accountability

Safety and liveness are the two fundamental security properties of consensus protocols. A protocol run by a distributed set of nodes is safe if the ledgers generated by the protocol are consistent across nodes and across time. It is live if all honest transactions eventually enter the ledger.

Traditionally, consensus protocols are developed for fault-tolerant distributed computing, where a set of distributed computing devices aims to emulate a reliable centralized computer. In such a context, the security of consensus protocols is naturally measured by their resilience: the minimum number of Byzantine (protocol-violating) nodes needed to cause a loss of safety or liveness.

In modern applications such as cryptocurrencies and decentralized application platforms, consensus nodes are no longer just disinterested computing devices. They are agents acting on economic and other incentives. To give nodes the proper incentives to follow the protocol, it is important that they can be held accountable for protocol-violating behavior in a provable way.

This point of view is advocated by Buterin and Griffith [6] in the context of their effort to add accountability (among other things) to Ethereum's Proof-of-Work (PoW) longest chain protocol. It is also central to the design of Gasper [7], the protocol running Ethereum 2.0's Proof-of-Stake (PoS) beacon chain. In these protocols, accountability is used to incentivize proper behavior by slashing the stake of protocol-violating agents. Other protocols designed to provide accountability include Polygraph [13] and GRANDPA [40]. A recent comprehensive work [25] shows that accountability can also be added on top of some, but not all, existing BFT protocols.

Given the importance of accountability, a key question is how the traditional definition of consensus protocol security should be modified to reflect it.

1.2 Accountable Security

Defining an appropriate notion of accountable security of a protocol is the first contribution of this work.

As a starting point, we observe that there is an inherent asymmetry between safety and liveness as far as accountability is concerned. Safety violations, such as those due to double-voting, can be caught in a provable way. Liveness violations, such as transaction censoring, cannot. Indeed, the goal in [6] is to provide accountable safety.

Hence, to set the stage for incorporating accountability, it is helpful to split the single metric of resilience into two individual metrics: safety resilience and liveness resilience. Safety resilience is the minimum number of Byzantine faults needed to cause a safety violation. Liveness resilience is the minimum number of Byzantine faults needed to cause a liveness violation. Classic resilience is simply the minimum of the two.

To capture accountable security, the notion of liveness resilience remains the same, but safety resilience should be strengthened to a notion of accountable safety resilience: the minimum number of faults that can be found accountable when there is a safety violation. More precisely, a protocol has an accountable safety resilience of $f_a$ if, in the case of a safety violation, at least $f_a$ nodes can be held accountable in a provable manner. By definition, the accountable safety resilience of a protocol cannot be larger than its safety resilience.

Splitting the traditional notion of resilience into safety and liveness resiliences is implicit in many previous works in the BFT literature, although it is usually not done explicitly. Indeed, one can think of the design objective of a consensus protocol as achieving a good tradeoff between safety and liveness resiliences. For example, raising the quorum threshold, a central concept in many BFT protocols, increases the protocol's safety resilience while decreasing its liveness resilience. This separate treatment of safety and liveness resiliences was recently formalized in [29] through the notion of alive-but-corrupt faults. Indeed, results in the literature on the optimal resilience achievable in a given network environment (synchronous, partially synchronous, etc.) can be refined into results on the optimal tradeoffs achievable between safety and liveness resiliences.

In applications with an economic context, treating safety and liveness resiliences separately makes sense. The mechanisms for attacking safety differ from those for attacking liveness, and therefore the costs to the attacker also differ. Trading off the two resiliences allows the protocol designer to maximize the cost to the attacker. In this light, shifting attention from safety resilience to accountable safety resilience makes sense under the assumption that the dominant cost to the attacker is the cost of being punished, for example through slashing of the stake.

A natural question then is: given a network environment, what is the optimal tradeoff between accountable safety resilience and liveness resilience achievable by any protocol?

1.3 Optimal Accountable-Safety vs Liveness Tradeoffs

Our second contribution is the characterization of the optimal tradeoffs between accountable safety resilience and liveness resilience in three network environments:
a) synchronous, where all nodes are online and all messages between honest nodes are delivered within a known delay bound;
b) partially synchronous, where all nodes are online but messages suffer arbitrary delays before a Global Stabilization Time (to model network partition), after which the network becomes synchronous;
c) dynamic participation, where the number of online nodes varies but messages between online honest nodes are delivered within a known delay bound.

The results are shown in the lower part of Figure 1 and compared to the optimal tradeoff between (traditional) safety and liveness resiliences in the upper part. In each optimal tradeoff region, the point that maximizes the traditional single resilience metric is the point on the boundary where the safety and liveness resiliences are equal. In general, one may want to operate at a different point on the boundary, reflecting the different costs of attacking safety versus liveness.

[Figure 1: plots not reproduced. Columns: synchronous, partially synchronous, dynamic participation. Top row, panels (i) to (iii): safety resilience vs. liveness resilience. Bottom row, panels (iv) to (vi): accountable safety resilience vs. liveness resilience.]
Figure 1: Above: optimal tradeoffs between (traditional) safety and liveness resilience in three environments. Below: optimal tradeoffs between accountable safety resilience and liveness resilience. In each environment, the optimal tradeoff is described by a region of feasible resilience points, each point achievable by some protocol; points outside the region cannot be achieved by any protocol. The classic single resilience metric is maximized at the point on the region boundary where the two resiliences are equal. In the synchronous and partially synchronous environments, the resiliences are expressed in terms of the number of Byzantine nodes. In the dynamic participation environment, they are expressed as the number of Byzantine nodes as a fraction of the total online nodes.

Several interesting conclusions can be drawn from Figure 1:
• As is well known, stronger security guarantees can be provided in the synchronous environment than in the partially synchronous environment. This is reflected in the optimal tradeoff between (traditional) safety and liveness resiliences, which is better in the synchronous environment than in the partially synchronous environment. In contrast, the optimal tradeoff between accountable safety resilience and liveness resilience is the same in the two environments. Thus the synchrony assumption does not improve accountable security. This is related to an impossibility result in [25], which we discuss in Section 1.6.
• Many protocols achieve the optimal tradeoffs in synchronous and partially synchronous environments. For example, Polygraph [13], HotStuff [41] and Streamlet [10] achieve the optimal tradeoff in the partially synchronous environment. Sync HotStuff [2] and Sync Streamlet [10] achieve the optimal tradeoff in the synchronous environment (see Appendix E). In fact, in all these cases, the same protocol simultaneously maximizes the safety resilience and the accountable safety resilience for a given liveness resilience. That means one can achieve optimal accountable security without sacrificing traditional security.
• In contrast to the synchronous and partially synchronous environments, no accountability can be supported under dynamic participation: the accountable safety resilience is zero for any positive liveness resilience (Figure 1 (vi)). Protocols like Ouroboros [16] and SnowWhite [15] can tolerate dynamic participation and are safe and live under a resilience of 50%, thus achieving the optimal point in Figure 1 (iii). But they cannot be made accountable.

1.4 Availability-Accountability Dilemma

In public permissionless blockchains like Bitcoin and Ethereum, dynamic participation is a central feature. In Bitcoin, for example, the total hash rate has varied by many orders of magnitude over the years. Yet, the blockchains remain continuously available, i.e., live. Our results say that it is impossible to support accountability for such dynamically available protocols, i.e., protocols that are live under dynamic participation. We call this the availability-accountability dilemma.
The works Casper [6] and Gasper [7] provide some hints on how to get around this dilemma. One way to interpret these works is that they aim to design an accountability gadget that provides a checkpointing mechanism on top of a dynamically available chain. The available chain can then continue to grow while the ledger up to the latest checkpoint is accountable. The availability-accountability dilemma says that a single ledger cannot be made dynamically available and accountable at all times. These works can instead be interpreted as aiming to generate two ledgers:
1) a full ledger $\mathsf{LOG}_{\mathrm{da}}$, which is dynamically available;
2) an accountable ledger $\mathsf{LOG}_{\mathrm{acc}}$, which is the checkpointed prefix of the full ledger.

Unfortunately, these works have several significant limitations:
• They lack a formulation specifying what security properties they want to achieve for these two ledgers. In particular, it is not clear how closely the checkpointed ledger can track the available ledger.
• Liveness attacks have been discovered for these protocols [31, 33, 35]. (To reinforce this point, in Appendix I we present a new practical attack which dispenses with the adversarial network delay employed by earlier attacks.)

1.5 Resolution via Accountability Gadgets

The third contribution of this work is the design and implementation of an accountability gadget. When applied to a longest chain protocol, it generates a dynamically available ledger $\mathsf{LOG}_{\mathrm{da}}$ and a checkpointed prefix ledger $\mathsf{LOG}_{\mathrm{acc}}$ with provably optimal security properties. Consider a network with a total of $n$ permissioned nodes, and an environment where the network may partition and the nodes may go online and offline.
(1) (P1: Accountability) For any $f < n/2$, the accountable ledger $\mathsf{LOG}_{\mathrm{acc}}$ can provide an accountable safety resilience of $n - 2f + 2$ at all times. It is live after the partition heals and more than $n - f$ honest nodes come online.
(2) (P2: Dynamic Availability) The available ledger $\mathsf{LOG}_{\mathrm{da}}$ is guaranteed to be safe after the network partition heals and live at all times, provided that fewer than 1/2 of the online nodes are adversarial.

By definition, the checkpointed ledger is always a prefix of the full available ledger. The above result additionally says that the checkpointed ledger catches up with the available ledger when the network heals and a sufficient number of honest nodes come online.

The achieved resiliences are optimal, as a comparison of this result with Figure 1 (iii) and (v) shows. The checkpointed ledger $\mathsf{LOG}_{\mathrm{acc}}$ cannot achieve a better tradeoff between accountable safety resilience and liveness resilience than the tradeoff in (v), and it in fact achieves exactly that tradeoff. The dynamically available ledger $\mathsf{LOG}_{\mathrm{da}}$ cannot achieve a better resilience than the (1/2, 1/2) point in (iii), and it in fact achieves it. Moreover, in light of Figure 1 (iv), no protocol could generate an accountable ledger with better resilience even if the network were synchronous at all times. So we get partition tolerance for free, even though accountability is the goal.

The accountability gadget construction is shown in Figure 2. It is built on top of any existing longest chain protocol modified to respect the checkpoints:
• New blocks are proposed, and the ledger of confirmed transactions is determined, based on the longest chain among all chains containing the latest checkpointed block. This gives the available full ledger $\mathsf{LOG}_{\mathrm{da}}$.
• Periodically, nodes vote on the next checkpoint, following a randomly selected leader's proposal.
• When votes are tallied, all nodes must base their decision for the next checkpoint on the same set of votes. To ensure this, any accountable BFT protocol designed for a fixed level of participation can be used, entirely as a black box, to reach consensus on the votes.
• The chain up to the latest checkpoint constitutes the accountable prefix ledger $\mathsf{LOG}_{\mathrm{acc}}$.

Throughout, blocks confirmed by $\Pi_{\mathrm{lc}}$ and future checkpoint proposals are kept consistent with established checkpoints.

Since there are many accountable BFT protocols [25], we have many implementation choices. We implemented a prototype of our accountability gadget using the HotStuff protocol [41]. We chose HotStuff for its maturity and for the availability of a high-quality open-source implementation that we could employ practically as a black box.

We took Ethereum 2.0's beacon chain as a target application and matched its key performance characteristics, such as latency and block size. Our Internet-scale experiments demonstrate that the solution can meet the target specification with over 4000 participants (see Figure 3). In particular, for the chosen parameterization and before taking any bandwidth reduction measures, the peak bandwidth required for a node to participate does not exceed 1.5 MB/s, with a long-term average of 78 KB/s. This is feasible even for many consumer-grade Internet connections. At the same time, our prototype provides 5× better average latency of $\mathsf{LOG}_{\mathrm{acc}}$ than the instantiation of Gasper currently used for Ethereum 2.0's beacon chain.

[Figure 2: diagram not reproduced. Components: checkpoint-respecting longest chain $\Pi_{\mathrm{lc}}$, vote generator, accountable consensus $\Pi_{\mathrm{bft}}$, and interpreter, together forming the accountability gadget $\Pi_{\mathrm{acc}}$. Flows: txs, confirmed blocks, votes, checkpoint decisions. Outputs: $\mathsf{LOG}_{\mathrm{da}}$, $\mathsf{LOG}_{\mathrm{bft}}$, $\mathsf{LOG}_{\mathrm{acc}}$.]
Figure 2: We construct an accountability gadget $\Pi_{\mathrm{acc}}$ from any accountable BFT protocol $\Pi_{\mathrm{bft}}$ and apply it to a longest-chain-type protocol $\Pi_{\mathrm{lc}}$ as follows. The fork choice rule of $\Pi_{\mathrm{lc}}$ is modified to respect the latest checkpoint decision. Blocks confirmed by $\Pi_{\mathrm{lc}}$ are output as the available ledger $\mathsf{LOG}_{\mathrm{da}}$. They are also the basis on which nodes generate a proposal and vote for the next checkpoint. To ensure that all nodes reach the same checkpoint decision, consensus on which votes to count is reached using $\Pi_{\mathrm{bft}}$. Checkpoint decisions are output as the accountable ledger $\mathsf{LOG}_{\mathrm{acc}}$. They are also fed back into the protocol, so that future block production in $\Pi_{\mathrm{lc}}$ and future checkpoints stay consistent with previous checkpoints.

[Figure 3: plot not reproduced. Ledger length (blocks, 0 to 200) vs. time (s, 0 to 2,500) for the accountable prefix ledger and the available full ledger.]
Figure 3: Ledger dynamics of a longest chain protocol outfitted with our accountability gadget based on HotStuff, measured with 4,100 nodes distributed around the world. The available full ledger grows steadily. The accountable prefix periodically catches up whenever a new block is checkpointed. (Here, no attack; for an attack, cf. Figure 6.)

1.6 Related Works

1.6.1 Accountability. Accountability in distributed protocols has been studied in earlier works [23, 24]. [23] designed a system, PeerReview, which detects faults. [24] classifies faults into different types and studies their detectability. Casper [6] focuses on accountability and fault detection when safety is violated, and led to the notion of accountable safety resilience we use in this work. Polygraph [13] is a partially synchronous BFT protocol which is secure when there are fewer than $n/3$ adversarial nodes. When there is a safety violation, at least $n/3$ nodes can be held accountable. In our formulation, this corresponds to achieving the point $(n/3, n/3)$ in Figure 1 (v). [38] builds upon [13] to create a blockchain which can exclude Byzantine nodes that were found to have provably violated the protocol.

Many of these previous works focus on studying the accountability of specific protocols and treat accountability as an add-on feature on top of the basic security properties of the protocol. [25] follows this spirit but broadens the investigation, formulating a framework to study the accountability of many existing BFT protocols. More specifically, their framework augments the traditional resilience metric with accountable safety resilience, which they call forensic support.

The present work is more in the spirit of [6], where accountability is a central design goal, not just an add-on feature. To formalize this spirit, we:
• split traditional resilience into safety and liveness resiliences;
• upgrade safety resilience to accountable safety resilience;
• formulate accountable security as a tradeoff between liveness resilience and accountable safety resilience.

Further, we broaden the study to the important dynamic participation environment, where we discovered the availability-accountability dilemma. Despite these differences in formulation and in scope, we are able to adopt the proof of the impossibility result Theorem B.1 in [25]. At its heart, that theorem is really about the tradeoff between liveness and accountable safety resiliences, although it is not stated as such.

1.6.2 Availability-Finality Dilemma and Finality Gadgets. The availability-finality dilemma [22, 27, 35] states that no protocol can provide both finality, i.e., safety under network partitions, and availability, i.e., liveness under dynamic participation. The availability-accountability dilemma states that no protocol can provide both accountable safety and liveness under dynamic participation. Although the two dilemmas are different, it turns out that some, but not all, protocols that resolve the availability-finality dilemma can also be used to resolve the availability-accountability dilemma.

The first resolution of the availability-finality dilemma is the class of snap-and-chat protocols [35]. These combine a longest chain protocol with a partially synchronous BFT protocol in a black-box manner to provide finality. If the partially synchronous BFT protocol is accountable, it is not too difficult to show [34] that the resulting snap-and-chat protocol also resolves the availability-accountability dilemma. On the other hand, the checkpointed longest chain [39], another resolution of the availability-finality dilemma, is not accountable, as shown in Appendix F.

The accountability gadget we designed combines elements from snap-and-chat protocols and from the checkpointed longest chain. A strength of snap-and-chat protocols is their black-box nature, which gives them the flexibility to provide additional features. A drawback is that the protocol may reorder the blocks in the longest chain protocol to form the final ledger [34]. As a result, a proposer who proposes a block on the longest chain cannot predict the ledger state or check the validity of the transactions just by looking at the earlier blocks in the longest chain. This lack of predictive validity opens the protocol to spamming and limits the use of standard techniques to support light clients and sharding.

The checkpointed longest chain builds upon a line of work called finality gadgets [6, 7, 18, 40]. It overcomes this limitation of snap-and-chat protocols because the longest chain protocol is modified to respect the checkpoints. However, the checkpointed longest chain's finality gadget is not black-box; it specifically uses Algorand BA [11], which is not accountable [25]. Our accountability gadget builds on the checkpointed longest chain but, like snap-and-chat protocols, allows the use of any BFT protocol as a black box. When an accountable BFT protocol like HotStuff is used, the checkpointed ledger is guaranteed to be accountable.

1.7 Outline

The remainder of this paper is structured as follows:
• Section 2 introduces the notation and model for a formal treatment of the tradeoffs among safety, liveness and accountable safety resiliences.
• Section 3 presents the proof of the availability-accountability dilemma and formalizes the tradeoffs visualized in Figure 1.
• Section 4 elaborates on the accountability gadgets introduced in Section 1.5 and argues for their security.
• Section 5 discusses details of a prototype implementation and experimental performance results.
• Section 6 concludes with a generalization of accountability gadgets to Proof-of-Work and Proof-of-Space longest chain protocols.

2 MODEL

We first give an overview of the client-server model for state machine replication (SMR) protocols and introduce the notation used in subsequent proofs. In the classical SMR formulation, nodes take inputs called transactions and enable clients to agree on a single sequence of transactions that produced the state evolution. This sequence is called the ledger and denoted by $\mathsf{LOG}$. For this purpose, nodes exchange messages, e.g., blocks or votes. Each node $i$ records its view of the protocol up to time $t$ in an execution transcript $\mathcal{T}_i^t$.
To obtain the ledger at time $t$, clients query the nodes running the protocol. When a node $i$ is queried at time $t$, it produces evidence $w_i^t$ by applying an evidence generation function $W$ to its current transcript: $w_i^t \triangleq W(\mathcal{T}_i^t)$. Upon collecting evidences from some subset $S$ of the nodes, each client applies the confirmation rule $\mathcal{C}$ to this set of evidences to obtain the ledger: $\mathsf{LOG}_S^t \triangleq \mathcal{C}(\{w_i^t\}_{i \in S})$. Protocols typically require the queried subset $S$ to contain at least one honest node.

Environment: We assume that transactions are input to the nodes by the environment $\mathcal{Z}$. There are $n$ nodes in total, numbered $1$ through $n$. There exists a public-key infrastructure, and each node is equipped with a unique cryptographic identity. A random oracle serves as a common source of randomness for the protocols. Time is slotted and the nodes have synchronized clocks.

Corruption: The adversary $\mathcal{A}$ is a probabilistic poly-time algorithm. Before the protocol execution starts, $\mathcal{A}$ gets to corrupt (up to) $f$ nodes, which are then called adversarial nodes. Adversarial nodes surrender their internal state to the adversary and can deviate from the protocol arbitrarily (Byzantine faults) under the adversary's control. The remaining $n - f$ nodes are called honest and follow the protocol as specified.

Sleeping: To model dynamic participation, we adopt the concept of sleepiness from [37]. In the setting with dynamic participation, $\mathcal{A}$ chooses, for every time slot and node, whether an honest node is awake (i.e., online) or asleep (i.e., offline) in that slot. $\mathcal{Z}$ then wakes nodes up or puts them to sleep following the schedule determined by $\mathcal{A}$. An honest node that is awake in a slot executes the protocol faithfully in that slot. An honest node that is asleep in a slot does not execute the protocol in that slot. Messages that would have arrived in that slot are queued and delivered in the first slot in which the node is awake again. Adversarial nodes are always awake. We define $\beta$ as the maximum, over the execution of the protocol, of the fraction of adversarial nodes among all awake nodes.

Networking: Nodes can send each other messages, which arrive with a delay controlled by the adversary, subject to the constraints elaborated below.

Network Environments: Given the definitions above, we provide three sets of assumptions on the environment $\mathcal{Z}$ and the adversary $\mathcal{A}$. They model a synchronous network, a partially synchronous network, and a synchronous network with dynamic participation, respectively. These assumptions are expressed as $(\mathcal{A}, \mathcal{Z})$ tuples:
• $(\mathcal{A}_{\mathrm{s}}, \mathcal{Z}_{\mathrm{s}})$ formalizes the model of a synchronous network. The adversary corrupts $f$ nodes, and all nodes are awake throughout the execution. At all times, the adversary is required to deliver all messages sent between honest nodes within at most $\Delta$ slots.
• $(\mathcal{A}_{\mathrm{p}}, \mathcal{Z}_{\mathrm{p}})$ formalizes a partially synchronous network. The adversary corrupts $f$ nodes, and all nodes are awake throughout the execution. Before a global stabilization time GST, $\mathcal{A}$ can delay network messages arbitrarily. After GST, $\mathcal{A}$ is required to deliver all messages sent between honest nodes within at most $\Delta$ slots. GST is chosen by $\mathcal{A}$, is unknown to the honest nodes, and can be a causal function of the randomness in the protocol.
• $(\mathcal{A}_{\mathrm{da}}, \mathcal{Z}_{\mathrm{da}})$ formalizes the model of a synchronous network with dynamic participation. For any given time slot, $\mathcal{A}$ determines which honest nodes are awake or asleep in that slot. This is subject to the constraint that, in every time slot, at most a $\beta$ fraction of awake nodes are adversarial and at least one honest node is awake. At all times, the adversary is required to deliver messages sent between honest nodes within at most $\Delta$ slots. Adversarial nodes are assumed to be always awake.

Examples: To illustrate the model above, consider a client that queries nodes running a Nakamoto-style longest chain protocol under $(\mathcal{A}_{\mathrm{da}}, \mathcal{Z}_{\mathrm{da}})$ at some time $t$. Suppose $\beta < 1/2$. The transcript $\mathcal{T}_i^t$ held by node $i$ at time $t$ consists of the blocks received by node $i$ by time $t$. Given the transcript $\mathcal{T}_i^t$, $W$ outputs as evidence the longest chain implied by $\mathcal{T}_i^t$. The client collects evidences from a subset $S$ of awake nodes containing at least one honest node. It then calls $\mathcal{C}$, which selects the longest chain in the set $\{w_i^t\}_{i \in S}$ and outputs the $k$-deep prefix of that longest chain as the ledger.

We can also consider propose-and-vote-style BFT protocols such as HotStuff, LibraBFT, Streamlet and PBFT [9, 10, 28, 41] with $n = 3f + 1$ nodes under $(\mathcal{A}_{\mathrm{p}}, \mathcal{Z}_{\mathrm{p}})$. In this case, the transcript $\mathcal{T}_i^t$ held by a node $i$ at time $t$ consists of all received messages, such as proposals and votes. Given $\mathcal{T}_i^t$, $W$ outputs as evidence a sequence of proposals with votes attesting to them. The client collects evidences from a subset $S$ of nodes containing at least one honest node. It then calls $\mathcal{C}$, which outputs the longest sequence of proposals that can be confirmed given the votes attesting to them.

To guarantee safety, the confirmation rule typically requires votes from a quorum of nodes on consecutive proposals ($n - f = 2f + 1$ for $n = 3f + 1$); safety then follows from a quorum intersection argument. Liveness follows from the fact that the honest evidence within $S$ includes all of the confirmed proposals submitted by honest nodes. The presence of an honest evidence in $S$ is typically enforced by collecting evidences from at least $f + 1$ nodes.

Safety and Liveness Resiliences: Safety and liveness are defined as the traditional security properties of SMR protocols:

Definition 1. Let $T_{\mathrm{confirm}}$ be a polynomial function of the security parameter of an SMR protocol $\Pi$. We say that $\Pi$ with a confirmation rule $\mathcal{C}$ is secure and has transaction confirmation time $T_{\mathrm{confirm}}$ if the ledgers output by $\mathcal{C}$ satisfy:
• Safety: For any time slots $t, t'$ and sets of nodes $S, S'$ satisfying the requirements stipulated by the protocol, either $\mathsf{LOG}_S^t \triangleq \mathcal{C}(\{w_i^t\}_{i \in S})$ is a prefix of $\mathsf{LOG}_{S'}^{t'} \triangleq \mathcal{C}(\{w_i^{t'}\}_{i \in S'})$ or vice versa.
• Liveness: Suppose $\mathcal{Z}$ inputs a transaction to an awake honest node at some time $t$. Then, for any time slot $t' \geq t + T_{\mathrm{confirm}}$ and any set of nodes $S'$ satisfying the requirements stipulated by the protocol, the transaction is included in $\mathsf{LOG}_{S'}^{t'} \triangleq \mathcal{C}(\{w_i^{t'}\}_{i \in S'})$.

Definition 2. For static (dynamic) participation, the safety resilience of a protocol is the minimum number of adversarial nodes (minimum fraction of adversarial nodes among awake nodes) needed to cause a safety violation. Such a protocol provides $f_s$-safety ($\rho_s$-safety).

Definition 3. For static (dynamic) participation, the liveness resilience of a protocol is the minimum number of adversarial nodes (minimum fraction of adversarial nodes among awake nodes) needed to cause a liveness violation. Such a protocol provides $f_l$-liveness ($\rho_l$-liveness).

Accountable Safety Resilience: To formalize the concept of accountable safety resilience, we define an adjudication function $J$, similar to the forensic protocol defined in [25], as follows:
Definition 4. An adjudication function J takes as input two sets of evidences $E$ and $E'$ with conflicting ledgers $\mathrm{LOG} \triangleq C(E)$ and $\mathrm{LOG}' \triangleq C(E')$, and outputs a set of nodes that have provably violated the protocol rules. So, J never outputs an honest node.

When the clients observe a safety violation, i.e., at least two sets of evidences $E$ and $E'$ such that $\mathrm{LOG} \triangleq C(E)$ and $\mathrm{LOG}' \triangleq C(E')$ conflict with each other, they call J on these evidences to identify nodes that have violated the protocol.

Accountable safety resilience builds on the concept of accountable safety first introduced in [6]:

Definition 5. For static (dynamic) participation, accountable safety resilience of a protocol is the minimum number of nodes (minimum fraction of nodes among awake nodes) output by J in the event of a safety violation. Such a protocol provides $t_a$-accountable-safety ($\beta_a$-accountable-safety).

Note that $t_a$-accountable-safety implies $t_a$-safety of the protocol (and likewise for $\beta_a$), since J outputs only adversarial nodes.

3 THE AVAILABILITY-ACCOUNTABILITY DILEMMA

In this section, we investigate the fundamental tradeoffs between liveness, safety and accountable safety resiliences shown in Figure 1 under three different network environments: synchrony (As, Zs), partial synchrony (Ap, Zp) and dynamic participation (Ada, Zda).

3.1 Accountability and Liveness Incompatible Under Dynamic Participation

We observe that the strictest tradeoff between the liveness and accountable safety resiliences occurs for dynamically available protocols under (Ada, Zda) (Figure 1 (vi)), a result which was named the availability-accountability dilemma in Section 1.4:

Theorem 1. No SMR protocol provides both $\beta_a$-accountable-safety and $\beta_l$-liveness for any $\beta_a, \beta_l > 0$ under (Ada, Zda).

Theorem 1 states that under dynamic participation it is impossible for an SMR protocol to provide both positive accountable safety resilience and positive liveness resilience. In light of this result, protocol designers are compelled to choose between protocols that maintain liveness under fluctuating participation, and protocols that can enforce the desired incentive mechanisms highlighted in Section 1.1 via accountability. Since both of these features are desirable properties for Internet-scale consensus protocols, the availability-accountability dilemma presents a serious obstacle in the effort to obtain an incentive-compatible and robustly live protocol for applications such as cryptocurrencies.

To build some intuition for the proof of Theorem 1, let us consider a permissioned longest chain protocol under (Ada, Zda) where half of the nodes are adversarial. Adversarial nodes avoid all communication with honest nodes and build a private chain that conflicts with the chain built collectively by the honest nodes. Such diverging chains open up the possibility of an (ostensible) safety violation. Think of an honest client towards whom adversarial nodes pretend to be asleep, and who confirms a ledger based solely on the longest chain provided by the honest evidences; and of a co-conspirator of the adversary who pretends not to have received any evidences from honest nodes and to have confirmed a ledger based solely on the longest chain provided by the adversarial evidences. Indeed, both would obtain non-empty ledgers, because the longest chain is dynamically available, but these two ledgers would conflict. Yet, based on the two sets of evidences, the judge J can neither distinguish who is the honest client and who is the co-conspirator, nor tell which nodes are honest or adversarial. So none of the adversarial nodes can be held accountable without risking the false conviction of an honest node.

A formal proof building on this observation is as follows:

Proof. For the sake of contradiction, suppose there exists an SMR protocol Π that provides $\beta_l$-liveness and $\beta_a$-accountable-safety for some $\beta_l, \beta_a > 0$ under (Ada, Zda). Then, there exists an adjudication function J which, given two sets of evidences attesting to conflicting ledgers, outputs a non-empty set of adversarial nodes.

Suppose there are $n$ nodes in Z. Without loss of generality, we may assume that $n$ is even; otherwise, Z puts one node to sleep throughout the execution. Let $P$ and $Q$ partition the nodes into two disjoint equal groups with $|P| = |Q| = n/2$. We denote by $[\mathrm{tx}]$ a ledger consisting of a single transaction $\mathrm{tx}$ at its first index. Next, consider the following worlds:

World 1: Nodes in $P$ are honest and awake throughout the execution. Z inputs $\mathrm{tx}_1$ to them. Nodes in $Q$ are asleep. Since Π satisfies $\beta_l$-liveness for some $\beta_l > 0$ under (Ada, Zda), nodes in $P$ eventually generate a set of evidences $E_1$ such that $C(E_1) = [\mathrm{tx}_1]$.

World 2: Nodes in $Q$ are honest and awake throughout the execution. Z inputs $\mathrm{tx}_2$ to them. Nodes in $P$ are asleep. Since Π satisfies $\beta_l$-liveness for some $\beta_l > 0$ under (Ada, Zda), nodes in $Q$ eventually generate a set of evidences $E_2$ such that $C(E_2) = [\mathrm{tx}_2]$.

World 3: Z wakes up all nodes, and inputs $\mathrm{tx}_1$ to the nodes in $P$ and $\mathrm{tx}_2$ to the nodes in $Q$. Nodes in $P$ are honest. Nodes in $Q$ are adversarial and do not communicate with the nodes in $P$. All nodes stay awake throughout the execution. Since worlds 1 and 3 are indistinguishable for the nodes in $P$, they eventually generate a set of evidences $E_1$ such that $C(E_1) = [\mathrm{tx}_1]$. Nodes in $Q$ simulate the execution in world 2 without any communication with the nodes in $P$. Hence, they eventually generate a set of evidences $E_2$ such that $C(E_2) = [\mathrm{tx}_2]$. Thus, there is a safety violation. So, J takes $E_1$ and $E_2$, and outputs a non-empty set $S_3 \subseteq Q$ of adversarial nodes.

World 4: Z wakes up all nodes, and inputs $\mathrm{tx}_1$ to the nodes in $P$ and $\mathrm{tx}_2$ to the nodes in $Q$. Nodes in $Q$ are honest. Nodes in $P$ are adversarial and do not communicate with the nodes in $Q$. All nodes stay awake throughout the execution. Since worlds 2 and 4 are indistinguishable for the nodes in $Q$, they eventually generate a set of evidences $E_2$ such that $C(E_2) = [\mathrm{tx}_2]$. Nodes in $P$ simulate the execution in world 1 without any communication with the nodes in $Q$. Hence, they eventually generate a set of evidences $E_1$ such that $C(E_1) = [\mathrm{tx}_1]$. Thus, there is a safety violation. So, J takes $E_1$ and $E_2$, and outputs a non-empty set $S_4 \subseteq P$ of adversarial nodes.

Note, however, that worlds 3 and 4 are indistinguishable from the perspective of the adjudication function J. Thus, it is not possible that J reliably outputs a non-empty set which in the case of world 3 contains only elements of $Q$ and in the case of world 4 contains only elements of $P$, as would be required by Definition 4. □
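The last step of the proof of Theorem 1 is a small counting argument that can be checked mechanically. The Python sketch below is a toy abstraction of our own, not the paper's formalism: the two halves of the nodes are labelled P and Q here, the helper names (`judge_view`, `admissible_verdicts`) are hypothetical, and evidence sets are reduced to labels plus their signers. It confirms that J receives identical inputs in worlds 3 and 4, then enumerates every non-empty candidate verdict and keeps only those that convict adversarial nodes exclusively in both worlds.

```python
from itertools import combinations


def nonempty_subsets(nodes):
    """Every non-empty subset of `nodes`, as frozensets."""
    items = sorted(nodes)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def judge_view(world, P, Q):
    """Everything J receives: two evidence sets and their signers.

    Toy abstraction: in worlds 3 and 4 alike, the nodes in P hold evidence
    for [tx1] and the nodes in Q hold evidence for [tx2]. Which group is
    adversarial is not part of the view, so it cannot depend on `world`.
    """
    assert world in (3, 4)
    return (("E1", "[tx1]", P), ("E2", "[tx2]", Q))


def admissible_verdicts(n):
    """Verdicts J could return that are correct in both worlds 3 and 4."""
    P = frozenset(range(n // 2))
    Q = frozenset(range(n // 2, n))
    adversarial = {3: Q, 4: P}  # world -> set of adversarial nodes
    # Identical inputs force J, being a function, to return the same verdict S.
    assert judge_view(3, P, Q) == judge_view(4, P, Q)
    # Definition 4: S must be non-empty and contain only adversarial nodes.
    return [S for S in nonempty_subsets(P | Q)
            if S <= adversarial[3] and S <= adversarial[4]]


print(admissible_verdicts(6))  # []
```

The empty result mirrors the contradiction in the proof: any verdict that is correct in world 3 (a subset of Q) convicts only honest nodes in world 4, and vice versa.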
3.2 Tradeoff Between Accountable Safety and Liveness Resiliences

The proof of Theorem 1 relies on the fact that in a dynamically available protocol, adversarial nodes, by private execution, can always create a set of evidences that yields a conflicting ledger through the confirmation rule C. This is because dynamically available protocols cannot set a lower bound on the number of evidences eligible to generate a non-empty ledger through C, and thus are forced to output ledgers for evidences from any number of nodes. However, in the case of a BFT protocol with a fixed level of participating nodes, such an attack on accountability is not possible, as the protocol can require a valid input to C to contain evidences from at least a certain fraction of the nodes. In this context, instead of an availability-accountability dilemma, we can talk about a softer tradeoff between the accountable safety and liveness resiliences (Figure 1 (iv) and (v)), which is formalized below:

Theorem 2. For any SMR protocol that satisfies $t_a$-accountable-safety and $t_l$-liveness, $t_a \le \max(0, n - 2t_l + 2)$.

The proof of Theorem 2 follows from the proof of [25, Theorem B.1] and is given in Appendix A. Both proofs rely on the observation that for an SMR protocol (or a BA protocol in the case of [25, Theorem B.1]) that satisfies $t_l$-liveness ($t_l$-validity for [25, Theorem B.1]), no adjudication function is able to output more than $\max(0, n - 2t_l + 2)$ nodes in the event of a safety (agreement for [25, Theorem B.1]) violation without incorrectly accusing an honest node.

Finally, note that no SMR protocol provides $t_l$-liveness for $t_l > \lceil n/2 \rceil$ under any setting, as will be explained in Section 3.3. This, together with Theorems 1 and 2, completes the characterization of the curves of Figure 1 (iv)–(vi). We proceed to show that these curves are tight. Tightness of (vi) under (Ada, Zda) follows directly from the nature of the dilemma: any dynamically available protocol 'achieves' it. On the other hand, Sync Streamlet [10] and Sync HotStuff [2] achieve all liveness and accountable safety resilience points $(t_l, t_a)$ shaded in blue in Figure 1 (iv) for (As, Zs), and Streamlet and HotStuff [41] achieve all $(t_l, t_a)$ shaded in blue in Figure 1 (v) for (Ap, Zp). A more detailed discussion of these protocols, in particular of how the synchronous protocols can be made to provide accountability, is given in Appendix E.

3.3 Tradeoffs Between Safety and Liveness Resiliences

In this section, we formalize the tradeoffs between safety and liveness resiliences shown in Figure 1 (i)–(iii) under the three different network environments (As, Zs), (Ap, Zp) and (Ada, Zda):

Theorem 3. For any SMR protocol that satisfies $t_s$-safety and $t_l$-liveness, $t_l \le \lceil n/2 \rceil$ and $t_s \le n - t_l + 1$.

The proof of Theorem 3 is given in Appendix B. The theorem applies to all SMR protocols under any network environment. It formalizes the common intuition that in the presence of clients, no consensus protocol can maintain security when the adversary controls half or more of the nodes. In particular, it shows that no safe SMR protocol can provide liveness if the adversary controls the majority of the nodes, as the adversary can always use its majority to either commit safety violations or censor transactions.

Theorem 4. For any SMR protocol that satisfies $t_s$-safety and $t_l$-liveness under (Ap, Zp), $t_l \le \lceil n/2 \rceil$ and $t_s \le n - 2t_l + 2$.

The proof of Theorem 4 is given in Appendix C. It states the fundamental safety-liveness resilience tradeoff for partially synchronous protocols, and generalizes the celebrated $n/3$ resilience bound [19] for the security of partially synchronous protocols.

Theorem 5. For any SMR protocol that satisfies $\beta_s$-safety and $\beta_l$-liveness under (Ada, Zda), $\beta_l, \beta_s \le 1/2$.

The proof of Theorem 5 is given in Appendix D. It states the fundamental safety-liveness resilience tradeoff for dynamically available protocols, and generalizes the intuition behind a similar result for Proof-of-Work protocols given by [36, Theorem 3]. Dynamically available protocols are designed to output ledgers for sets of evidences containing responses from any number of nodes, as they do not know a priori the number of awake nodes. In this case, given two sets of evidences for conflicting ledgers, their relative sizes become the only selection criterion for such protocols, unlike in a static environment where the protocol can require evidences from a fixed number of nodes. Thus, the adversary is able to violate safety and liveness whenever it controls the majority among awake nodes.

Finally, we show that the curves of Figure 1 (i)–(iii) implied by Theorems 3, 4, and 5 are tight. Sync Streamlet [10] and Sync HotStuff [2] achieve all liveness and safety resilience points $(t_l, t_s)$ shaded in blue in Figure 1 (i) for (As, Zs). Streamlet and HotStuff [41] achieve all $(t_l, t_s)$ shaded in blue in Figure 1 (ii) for (Ap, Zp). Sleepy and Ouroboros [3, 15, 16, 26, 37] achieve all $(\beta_l, \beta_s)$ shaded in blue in Figure 1 (iii) for (Ada, Zda). A more detailed discussion of these protocols is given in Appendix E.

4 ACCOUNTABILITY GADGETS

In this section, we give a detailed description of the accountability gadget introduced in Section 1.5. For ease of exposition, we construct an accountability gadget from an accountable BFT protocol Πbft with accountable safety and liveness resilience of $\lfloor n/3 \rfloor$.

4.1 Protocol Description

Accountability gadgets, denoted by Πacc, can be used in conjunction with any dynamically available longest chain (LC) protocol Πlc such as Nakamoto's PoW LC protocol [30], Sleepy [37], Ouroboros [3, 16, 26] and Chia [14] (Figure 2). The protocol Πlc then follows a modified chain selection rule where honest nodes build on the tip of the LC that contains all of the checkpoints they have observed. We call such chains checkpoint-respecting LCs. At each time slot $t$, each honest node $i$ outputs the $k$-deep prefix of the checkpoint-respecting LC (or the prefix of the latest checkpoint, whichever is longer) in its view as $\mathrm{LOG}_{\mathrm{da},i}^{t}$.

The accountability gadget Πacc has three main components, as shown in Figure 2: a checkpoint vote generator (Algorithm 1) that issues checkpoint proposals and votes, an accountable SMR protocol Πbft that is used to reach consensus on which votes to count for the checkpoint decision, and a checkpoint vote interpreter (Algorithm 2) that outputs checkpoint decisions computed deterministically from the ordered sequence of checkpoint votes output by Πbft. The protocol Πbft can be instantiated with any accountable BFT protocol such as Streamlet [10], LibraBFT [28] or HotStuff
[41]. It is used as a black box ordering service within Πacc and is assumed to have confirmation time $T_{\mathrm{confirm}}$. We denote the ledger output by Πbft as LOGbft, and emphasize that it remains internal to the protocol Πacc. The checkpoint vote generator and interpreter are run locally by each node and interact with Πbft and LOGbft. Hence, whenever we refer to LOGbft in the following paragraphs, we mean the ledger in the view of a specific node.

Algorithm 1 Pseudocode for Checkpoint Vote Generator
1: lastCp, props ← ⊥, {c : ⊥ | c = 0, 1, ...}   ⊲ Last checkpoint, proposals
2: for currIter ← 0, 1, ...
3:   if lastCp ≠ ⊥
4:     while waiting T_cp time   ⊲ Wait T_cp time after new checkpoint
5:       on Checkpoint(c, b) ← GetNextCp() with c = currIter
6:         goto 23   ⊲ Jump to conclusion of current iteration
7:       on Proposal(c, b) ← RecvVerifiedProp() with props[c] = ⊥
8:         props[c] ← b   ⊲ Keep track of proposals from authorized leader
9:   if CpLeaderOfIter(currIter) = myself
10:    Broadcast(⟨propose, currIter, GetCurrProposalTip()⟩_myself)
11:  while waiting T_to time
12:    on props[currIter] ≠ ⊥, but at most once   ⊲ Act on the first proposal received from authorized leader before end of T_cp-wait and T_to-timeout
13:      if IsValidProposal(props[currIter])   ⊲ Valid proposal extends latest checkpoint and is consistent with current checkpoint-respecting LC
14:        SubmitVote(⟨accept, currIter, props[currIter]⟩_myself)
15:      else
16:        SubmitVote(⟨reject, currIter⟩_myself)   ⊲ Reject invalid proposal
17:    on Checkpoint(c, b) ← GetNextCp() with c = currIter
18:      goto 23   ⊲ Jump to conclusion of current iteration
19:    on Proposal(c, b) ← RecvVerifiedProp() with props[c] = ⊥
20:      props[c] ← b   ⊲ Keep track of proposals from authorized leader
21:  SubmitVote(⟨reject, currIter⟩_myself)   ⊲ Reject due to timeout
22:  wait on Checkpoint(c, b) ← GetNextCp() with c = currIter
23:  lastCp ← b   ⊲ Keep track of checkpoint decision

Algorithm 2 Pseudocode for Checkpoint Vote Interpreter
1: for currIter ← 0, 1, ...
2:   currVotes ← {(pk, ⊥) | pk ∈ committee}   ⊲ Latest vote of each node
3:   while true   ⊲ Go through votes as ordered by Πbft
4:     vote ← GetNextVerifiedVoteFromBft()   ⊲ Verify signature
5:     if vote = ⟨accept, c, b⟩_pk with c = currIter
6:       currVotes[pk] ← Accept(b)   ⊲ Count accept vote for block b
7:     else if vote = ⟨reject, c⟩_pk with c = currIter
8:       currVotes[pk] ← Reject   ⊲ Count reject vote
9:     if ∃b : |{pk | currVotes[pk] = Accept(b)}| > 2n/3
10:      OutputCp(Checkpoint(currIter, b))   ⊲ New checkpoint decision
11:      break
12:    else if |{pk | currVotes[pk] = Reject}| ≥ n/3
13:      OutputCp(Checkpoint(currIter, ⊥))   ⊲ Abort current iteration
14:      break

The accountability gadget Πacc proceeds in checkpoint iterations denoted by $c$, each of which attempts to checkpoint a new block in Πlc. The checkpoint vote generator produces requests which can be of three forms: a proposal $\langle \text{propose}, c, b \rangle_i$ proposing block $b$ for checkpointing in iteration $c$, an accept vote $\langle \text{accept}, c, b \rangle_i$ in favor of checkpointing block $b$ in iteration $c$, or a reject vote $\langle \text{reject}, c \rangle_i$ for iteration $c$. Here, $\langle \ldots \rangle_i$ denotes a message signed by node $i$. Each iteration $c$ has a publicly verifiable and unique leader $L(c)$ sampled using a random oracle. The leader obtains the $k_{\mathrm{cp}}$-deep block on its checkpoint-respecting LC via the procedure GetCurrProposalTip() and broadcasts it to all other nodes as the checkpoint proposal for iteration $c$ (line 10 of Algorithm 1). Nodes receive checkpoint proposals from the network via the procedure RecvVerifiedProp(), and sort them by their checkpoint iterations (lines 7 and 19 of Algorithm 1). For any given iteration $c$, RecvVerifiedProp() returns only proposals that were signed by the legitimate leader $L(c)$ of that iteration. A proposal is said to be valid in the view of a node $i$ if the proposed block is within $i$'s checkpoint-respecting LC and extends all previous checkpoints observed by $i$. During an iteration $c$, each node $i$ checks whether the proposal received for that iteration is valid using the procedure IsValidProposal() (line 13 of Algorithm 1). If it has indeed received a valid proposal with the proposed block $b$, it votes $\langle \text{accept}, c, b \rangle_i$ (line 14 of Algorithm 1). Otherwise, if the proposal received for iteration $c$ is invalid, or if $i$ does not receive any proposal within a timeout period $T_{\mathrm{to}}$, it votes $\langle \text{reject}, c \rangle_i$ (lines 16 and 21 of Algorithm 1). Votes are input as payload to Πbft, which outputs an ordered sequence of votes in the form of the ledger LOGbft. Thus, Πbft enables nodes to reach consensus on which votes to count for an upcoming checkpoint decision.

The checkpoint vote interpreter (Algorithm 2) processes the sequence of votes in LOGbft to produce checkpoint decisions. Each node receives verified votes (i.e., votes with a valid signature) in the order they appear on LOGbft via the procedure GetNextVerifiedVoteFromBft() (line 4 of Algorithm 2). Upon observing unique accept votes $\langle \text{accept}, c, b \rangle$ from more than $2n/3$ nodes for a block $b$ and the current iteration $c$, each node outputs $b$ as the checkpoint for iteration $c$ via the procedure OutputCp() (line 10 of Algorithm 2). The checkpointed blocks output by OutputCp() over time, together with their respective prefixes, constitute the ledger $\mathrm{LOG}_{\mathrm{acc},i}^{t}$. Furthermore, the checkpoint decisions are fed back to Πlc and the checkpoint vote generator so that they can ensure consistency of future block production in Πlc and of checkpoint proposals with prior checkpoints.

On the other hand, upon observing reject votes $\langle \text{reject}, c \rangle$ from at least $n/3$ nodes, each node outputs ⊥ as the checkpoint decision for the current iteration (line 13 of Algorithm 2). Here, ⊥ signals that an iteration is aborted with no new checkpointed block, which happens if honest nodes suspect a faulty checkpoint leader and vote 'reject' because they have not seen progress for too long. Note that once a node outputs a checkpoint decision for its current iteration $c$, the checkpoint vote interpreter jumps to iteration $c + 1$; thus, only a single decision is output per iteration.

Upon receiving a new checkpoint for the current iteration $c$ via the procedure GetNextCp(), nodes terminate iteration $c$ of the checkpoint vote generator and enter iteration $c + 1$ (lines 6 and 18 of Algorithm 1). If the checkpoint decision was for a non-empty block $b$, nodes wait for $T_{\mathrm{cp}}$ time, denoted as the checkpoint interval, before they consider checkpoint proposals for iteration $c + 1$. Similarly, an honest leader for iteration $c + 1$ waits for $T_{\mathrm{cp}}$ time before it broadcasts the new checkpoint proposal. As will become clear in the analysis, the checkpoint interval is crucial to ensure that Πlc's chain dynamics are 'not disturbed too much' by accommodating and respecting checkpoints.

4.2 Security Properties

In this section, we formalize and prove the security properties P1 and P2 from Section 1.5 for accountability gadgets based on permissioned LC protocols [3, 16, 26, 37]. (For an extension of the security analysis to Proof-of-Work and Proof-of-Space LC protocols, see Section 6.) For this purpose, we first fix $f \le \lceil n/2 \rceil$ and consider an accountability gadget Πacc instantiated with a partially synchronous BFT protocol Πbft that provides $(n - 2f + 2)$-accountable-safety at all times, and $f$-liveness under partial synchrony after the network partitions heal and sufficiently many honest nodes come online. (An example of such a protocol is HotStuff [41] with a quorum size of $n - f + 1$. See Appendix E for further discussion.) To provide the same accountable safety resilience as Πbft for LOGacc, we select the thresholds for the number of accept and reject votes required to output a new checkpoint as $n - f$ and $f$ in lines 9 and 12 of Algorithm 2, respectively.

Let (Apda, Zpda) denote a partially synchronous network with dynamic participation. It extends the partially synchronous network defined in Section 2 (with a global stabilization time GST) to include asleep/awake nodes and a global awake time GAT. Before GAT, the adversary determines which honest nodes are awake or asleep at each slot. After GAT, all honest nodes are awake. Here, GAT, just like GST, is chosen by the adversary, is unknown to the honest nodes, and can be a causal function of the randomness in the protocol. But while GST needs to happen eventually (GST < ∞), GAT may be infinite. So, whereas GST represents the time when the partition heals, GAT represents the time when sufficiently many honest nodes wake up. With GST and GAT, (Apda, Zpda) enables us to express security properties for accountability gadgets under environments with both network partitions and dynamic participation.

Let $\lambda$ denote the security parameter associated with the employed cryptographic primitives. Similarly, let $\kappa$ denote the security parameter associated with the LC protocol Πlc. Then, we can state the following theorem for the security properties of the ledgers LOGacc and LOGda output by the accountability gadget Πacc and the permissioned LC protocol Πlc (modified to be checkpoint-respecting):

Theorem 6. For any $f$, $\lambda$, and $T_{\mathrm{confirm}}$, $k$, $k_{\mathrm{cp}}$ linear in $\kappa$:
(1) (P1: Accountability) Under (Apda, Zpda), the accountable ledger LOGacc provides $(n - 2f + 2)$-accountable-safety at all times (except with probability $\mathrm{negl}(\lambda)$), and there exists a constant C such that LOGacc provides $f$-liveness with confirmation time $T_{\mathrm{confirm}}$ after C max(GST, GAT) (except with probability $\mathrm{negl}(\lambda)$).
(2) (P2: Dynamic Availability) Under (Ada, Zda), the available ledger LOGda provides 1/2-safety and 1/2-liveness at all times (except with probability $\mathrm{negl}(\lambda) + \mathrm{negl}(\kappa)$).
(3) (Prefix) LOGacc is always a prefix of LOGda.

Here, $\mathrm{negl}(\cdot)$ denotes a negligible function, i.e., a function that decays faster than the inverse of any polynomial.

The resiliences achieved by LOGacc and LOGda are optimal, as can be seen from Theorem 2, which states that for any protocol satisfying $t_a$-accountable-safety and $t_l$-liveness, it must be the case that $t_a \le n - 2t_l + 2$, and from Theorem 5, which states that for any protocol satisfying $\beta_s$-safety and $\beta_l$-liveness under (Ada, Zda), it must be the case that $\beta_l, \beta_s \le 1/2$.

[Figure 4: Dependency of the security properties of LOGacc and LOGda on the properties of Πacc, Πlc and Πbft. Boxes: (1) Consistency of checkpointed blocks in Πlc; (2) Accountability of LOGbft; (3) Accountability of LOGacc (Appendix G.2); (4) Gap and recency properties of Πacc (Appendix G.4); (5) Recurrence of checkpoint-strong pivots (Appendix H.3); (6) Security of Πlc after max(GST, GAT) (Appendix H); (7) Liveness of Πbft after max(GST, GAT); (8) Liveness of LOGacc after max(GST, GAT) (Appendix G.3); (9) Checkpointing $k_{\mathrm{cp}}$-deep blocks; (10) Security of Πlc under synchrony; (11) Security of LOGda.]

To prove Theorem 6, we first focus on the security of LOGda under (Ada, Zda) (box 11 of Figure 4). We know from [3, 16, 26, 37] that Πlc is safe and live with some security parameter $\kappa$ under the original LC rule when the adversary controls less than half of the awake nodes (box 10 of Figure 4). Hence, if $k_{\mathrm{cp}}$ is selected as an appropriate linear function of $\kappa$, once a block becomes $k_{\mathrm{cp}}$-deep at some time in the LC held by a node, it stays on the LCs held by all of the honest nodes at all later times. Since any block checkpointed by an honest node has to be at least $k_{\mathrm{cp}}$-deep in the LC held by the proposer of that block (box 9 of Figure 4), checkpoints are, and will remain, part of the original LCs held by every other honest node at all later times. Then, the evolution of the checkpoint-respecting LCs will be the same as the evolution of the LCs in the absence of checkpoints. This implies that LOGda inherits the security properties of the original LC protocol Πlc.

We next focus on the accountability and liveness of LOGacc under (Apda, Zpda) with global stabilization and awake times GST and GAT (boxes 3 and 8 of Figure 4). The pseudocode of Πacc stipulates that honest nodes vote accept only for checkpoint proposals that are consistent with previous checkpoints they have observed (box 1 of Figure 4). Moreover, each new checkpoint requires at least $n - f + 1$ accept votes (line 9 of Algorithm 2). Thus, in the event of a safety violation, either there must be two inconsistent ledgers LOGbft held by honest nodes, or at least $2(n - f + 1) - n = n - 2f + 2$ nodes (which cannot be honest) must have voted for inconsistent checkpoints with respect to a single ledger LOGbft. In both cases, at least $n - 2f + 2$ adversarial nodes can be identified by invoking either the $(n - 2f + 2)$-accountable-safety of LOGbft (box 2 of Figure 4) or the consistency requirement for checkpoints (box 1 of Figure 4). This implies $(n - 2f + 2)$-accountable-safety of LOGacc, a detailed proof of which can be found in Appendix G.2.

Liveness of LOGacc (box 8 of Figure 4) is the most involved part of the proof and requires the existence of iterations after max(GST, GAT) in which all honest nodes vote accept for proposals by honest leaders. This, in turn, depends on whether the proposals by honest leaders are consistent with the checkpoint-respecting LCs held by the honest nodes after max(GST, GAT). To show that this is indeed the case, we prove that Πlc recovers its security after max(GST, GAT) (box 6 of Figure 4). For this purpose, we first observe that in the presence of checkpoints, honest nodes abandon their LC if a new checkpoint is revealed on another (possibly shorter) chain. Then, there can be honest blocks that do not contribute to chain growth due to checkpoints arising on competing chains. This feature of checkpoint-respecting LCs violates a core assumption of the standard proof techniques [21, 26, 37] for LC protocols. Hence, to bound the number of such abandoned honest blocks and demonstrate the self-healing property of checkpoint-respecting LCs, we follow a different approach introduced in [39].

In this context, we first observe the gap and recency properties of Πacc (Appendix G.4), which were highlighted in [39] as necessary conditions for any checkpointing mechanism to ensure the self-healing of Πlc (box 4 of Figure 4). The gap property states that the checkpoint interval $T_{\mathrm{cp}}$ has to be sufficiently longer than the time it takes for a checkpoint proposal to be checkpointed by Πacc. The recency property requires that newly checkpointed blocks were held in the checkpoint-respecting LC of at least one honest node within a short time interval preceding the checkpoint decision.

Using the gap and recency properties, we next extend the analysis of [39] to Proof-of-Stake protocols by introducing the concept of checkpoint-strong pivots, a generalization of strong pivots from [37]. Whereas strong pivots count honest and adversarial blocks to establish the convergence of the LC in the views of different honest nodes, checkpoint-strong pivots consider only the honest blocks that are guaranteed to extend the checkpoint-respecting LC, thus resolving the issue of non-monotonicity for these chains. Recurrence of checkpoint-strong pivots after max(GST, GAT) (box 5 of Figure 4), along with the gap and recency properties of Πacc, enables us to assert the security of Πlc after max(GST, GAT). Details of the security analysis for Πlc can be found in Appendix H. Given the self-healing property of Πlc, liveness of LOGacc follows from the liveness of Πbft after max(GST, GAT) (box 7 of Figure 4), the full proof of which is given in Appendix G.3.

Finally, the prefix property follows from the fact that both LOGda and LOGacc are derived from the checkpoint-respecting LC. In particular, while LOGacc corresponds to the prefix of the latest checkpoint, LOGda corresponds to either the prefix of the latest checkpoint or that of the $k$-deep block on the checkpoint-respecting LC, whichever is longer. Hence, by construction, LOGacc is always a prefix of LOGda.

5 EXPERIMENTAL EVALUATION

To evaluate whether the protocol of Section 4.1 can serve as a drop-in replacement for Gasper as the Ethereum 2 beacon chain protocol, we have implemented a prototype¹. Having proved security and accountability in Section 4.2, we are interested in its real-world behavior and performance characteristics at operating points chosen to match those of Ethereum 2. We then compare Gasper and our protocol in terms of the average required bandwidth and the latency of the accountable ledger, and conclude that our protocol can exceed the performance of Gasper. In particular, our protocol incurs comparable bandwidth at reduced latency. Finally, we observe that Gasper's resilience decreases as the number of nodes increases, for fixed latency of the accountable ledger, due to a new attack which we have discovered (detailed in Appendix I).

[Figure 5: Components and their interactions in our implementation of Figure 2 (LC block production lottery, LC block tree manager, checkpoint vote generator, checkpoint vote interpreter, HotStuff, libp2p Gossipsub network). Gray: off-the-shelf components used as black boxes. Blue: taken from Πlc without modification. Green: taken from Πlc, modified to respect checkpoints.]

Implementation: Our prototype is implemented in the programming language Rust. A diagram of the different components and their interactions is provided in Figure 5. We use a longest chain protocol modified to respect the latest checkpoints as Πlc, with a permissioned block production lottery with a fixed winning probability per node and per time slot; and HotStuff² as Πbft. Honest nodes pause HotStuff (including its timeouts) while waiting for the next checkpoint proposal. All communication (including HotStuff's) takes place in a broadcast fashion via libp2p's Gossipsub protocol³, mimicking Ethereum 2's network layer [1], so as to scale to thousands of nodes. Thus, we assume that under normal conditions every message received by one honest node will be received by all honest nodes within some bounded delay. Since responsiveness is not critical for our checkpointing application, and to avoid broadcasting quorum certificates, we use a variant of HotStuff in which, to ensure liveness, the leader waits for the network delay bound before proposing a block. Our prototype does not implement the application logic of the beacon chain (such as validators joining and leaving, integration with shard chains and Ethereum 1, etc.), which can be realized on top of consensus in the same way as currently done in Ethereum 2. Nor does our prototype use any orthogonal techniques to reduce bandwidth by constant factors (such as signature aggregation, short signature schemes, compression of network communication, etc.), which are not fundamental to the consensus problem.

Choice of parameters: We chose the parameters of our protocol in the experiments to match those of Ethereum 2's beacon chain. The beacon chain has 32 slots per epoch and 128 validators per slot, for a total of $n = 4096$ validators (per epoch), which is the approximate number of nodes that we run our experiments with. To match the block inter-arrival time (i.e., the duration of one slot) of 12 s in the beacon chain, we set the per-node winning probability to $1/n$, account for the probability of no node winning the block production lottery, and choose a slot duration of 7.5 s. We also match the block payload size of

¹ Source code: https://github.com/tse-group/accountability-gadget-prototype
² We used this Rust implementation: https://github.com/asonnino/hotstuff
³ We used this Rust implementation: https://github.com/libp2p/rust-libp2p