Strategic Mining in Proof-of-Stake with Practical Random Election

Zhuo Cai

doi:10.33140/AMLAI.06.04.01

Advances in Machine Learning & Artificial Intelligence(AMLAI)

ISSN: 2769-545X | DOI: 10.33140/AMLAI

Impact Factor: 1.755


Research Article - (2025) Volume 6, Issue 4

Zhuo Cai ^*

Hong Kong University of Science and Technology Hong Kong SAR, China

^*Corresponding Author: Zhuo Cai, Hong Kong University of Science and Technology Hong Kong SAR, China

Received Date: Sep 26, 2025 / Accepted Date: Oct 21, 2025 / Published Date: Nov 07, 2025

Copyright: © 2025 Zhuo Cai. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Citation: Cai, Z. (2025). Strategic Mining in Proof-of-Stake with Practical Random Election. Adv Mach Lear Art Inte, 6(4), 01-10.

Abstract

The security of blockchain systems relies on the honest majority assumption. However, strategic mining threatens this assumption, because selfish miners can gain more block rewards than honest miners through attacks such as withholding blocks. Because of their significant implications, blockchain mining games have been studied in PoW and PoS under various settings using different methods. Nonetheless, this paper argues that the practical limitation of random beacons has not been exploited in strategic mining in PoS blockchains.

Current PoS blockchains use random beacons to randomly select validators for each slot. However, the randomness is usually fixed for multiple slots, due to the latency of distributed random beacon protocols. This means that validators actually know some information about future election results, which contrasts with the Markov process models used in previous analyses. Using this information, this paper presents a close-to-optimal mining strategy based on an optimal interval scheduling algorithm for each epoch. For proof-of-stake protocols with no propagation delay, we show that a validator with an arbitrary proportion of stake can strictly benefit from strategic mining and obtain significantly higher block rewards than under previous strategies.

Keywords

Game Theory, Blockchain Mining Games, Miner Revenues

Introduction

The security of blockchain consensus protocols relies on a supermajority of honest miners or validators. For example, the Bitcoin consensus protocol, based on Proof-of-Work (PoW) and the longest-chain fork selection rule, requires that more than 50% of the hash power be controlled by honest miners. Proof-of-Stake (PoS) protocols, such as Ouroboros, also assume that honest miners own more than 50% of the total stake [1]. If the honest majority assumption does not hold, disastrous attacks might happen, including double spending, censoring transactions and reverting the history. Even though it is reasonable to believe that malicious miners can never afford to become the majority, blockchain security is further undermined by selfish mining attacks. Prior work [2-4] showed that malicious miners with α < 1/2 of the hash power can publish more than an α fraction of the blocks in the finalized chain, by withholding block proposals in an attempt to exclude honest blocks from the blockchain, so that they earn more block rewards than they should. Gradually, malicious miners become relatively richer and will eventually control more than 50% of the hash power, enabling disastrous attacks.

Proof-of-Stake protocols have gained popularity in recent years due to their energy efficiency compared to Proof-of-Work, as evidenced by Ethereum’s switch from PoW to PoS. Selfish mining on PoS blockchains has also been studied in the literature. For example, [5] shows that a malicious miner with 32.8% of the total stake can strictly outperform an honest miner with the same stake, in proof-of-stake protocols with perfect randomness. Perfect randomness means there is a perfect decentralized random beacon that emits a random number in each slot to select the miner for the slot. However, for security and efficiency reasons, in practice, decentralized random beacons in PoS emit fresh random numbers only once every epoch, where one epoch consists of multiple slots. For example, an epoch consists of 32 slots in Ethereum 2.0 and 432,000 slots in Cardano [6,7]. Therefore, miners know which of the future slots in the current epoch, or even in the next epoch, belong to themselves. In contrast to PoW or PoS with perfect randomness, where miners do not know whether they can publish blocks in the future, malicious miners can utilize this information to launch selfish mining attacks with arbitrarily small stake and earn more block rewards.

Our Contribution Our main contribution is to characterize the effect of the Knowledge about Future miner Election results (KFE) in proof-of-stake selfish mining. As far as we know, this is the first work to address the knowledge of future election results in PoS selfish mining. We extensively study selfish mining with KFE in different settings. In more detail, we have the following results:

• In the basic setting of Proof-of-Stake protocol with longest chain rule, perfect communication and the majority (1 − α) of stake controlled by honest miners, we show that selfish mining strategy strictly outperforms honest mining strategy for a malicious miner with α of stake for any α > 0.

• We present a deterministic optimal algorithm for the malicious miner to maximize his relative mining reward for one epoch. Using the algorithm for one epoch, we present a mining strategy that achieves close to optimal block ratios.

• We run simulation experiments to show that relative mining reward is vastly increased by our selfish mining algorithm, compared with previous strategies.

In practice, mining rewards include block rewards and transaction fees. For simplicity, we only consider a fixed block reward.

Related work

Selfish Mining

Selfish mining is one class of attacks on blockchain consensus protocols. A miner gets a chance to produce a block for a specific position in the chain with a probability that is proportional to her hash power in PoW and proportional to her deposited stake in PoS. Ideally, all valid blocks form a chain without forks, and miners receive rewards for contributing their blocks. Therefore, miners should receive block rewards in proportion to their hash power or deposited stake. However, if there are forks, only one fork, for example the longest fork, can be selected to form the consensus chain. Blocks in discarded forks do not yield block rewards for their owners.

The direct goal of the selfish mining attack is to deliberately exclude blocks proposed by honest miners from the finalized blockchain so that an attacker with less than 50% of hash power or stake can receive more block rewards than they should. In PoW blockchains like Bitcoin, assume that honest miners always publish their blocks immediately when they successfully mine new blocks, that all messages are immediately delivered to all nodes in the peer-to-peer network without delay, and that honest miners always extend the longest observed chain [2,3]. When the current block head is B₀, if a malicious miner finds a block B₁, he might deviate from the prescribed protocol by withholding the block B₁, hoping that he can mine the next block B₂ before the honest miners. In one of the lucky situations, the malicious miner owns but withholds B₁ and B₂, while honest miners own B₃ pointing to B₀. Then the malicious miner can publish blocks B₁ and B₂ as a fork (B₀ ← B₁ ← B₂) longer than (B₀ ← B₃). Honest miners will extend B₂ instead of B₃ according to the longest chain fork choice rule. In other cases, where honest miners obtain blocks B₂ and B₃ before the malicious miner, the malicious miner might give up on B₁ because the fork (B₀ ← B₁) is unlikely to grow longer than (B₀ ← B₂ ← B₃). This mining game is modeled as a Markov Decision Process (MDP) for the malicious miner in [3,4]. They show that a miner with 33% of hash power can benefit from selfish mining, i.e., he owns more than 33% of blocks in the finalized chain and consequently receives more than 33% of the total block rewards. Assuming PoS blockchains use a perfect random beacon to select a miner randomly at each slot and, more specifically, that miners do not know any additional information about the selection result prior to the slot, [5] extends the MDP analysis of selfish mining to PoS and shows that malicious miners only need to deposit 30.8% ∼ 32.5% of the total stake to benefit from selfish mining. In this work, based on the practical usage of random beacons that update the randomness only once per epoch, we show that selfish mining is much easier and more profitable than in PoS blockchains with perfect random beacons that update per slot [5].

Subsequent works on selfish mining address more realistic issues. [8] discusses the case where block rewards diminish and transaction fees become the dominant part of mining rewards. Besides making the mining rewards non-uniform among different blocks, [8] shows that selfish miners might fork an existing block that contains many profitable transactions instead of extending it. Moreover, miners are incentivized not to include all remaining transactions, so that subsequent miners prefer to extend their blocks rather than fork them, even if the block capacity is not the bottleneck. Deciding how many transactions to include enlarges the action space and state space from discrete to continuous.

Besides, miners with arbitrary stake are incentivized to do some form of strategic mining, so that it is more meaningful to study the equilibrium of all miners, rather than the optimal strategy of a single malicious miner. As a result, the analysis becomes so complicated that [8] uses simulation rather than rigorous mathematical MDP analysis. A recent work even adopts deep reinforcement learning to study the strategic mining problem [9]. The complicated settings of these works are out of the scope of this work.

Random Election in PoS

Proof-of-Stake consensus protocols avoid the tremendous electricity that PoW spends on computing useless puzzles by mimicking the distributed random election of block proposers based on random beacons, or distributed random number generation (RNG). Distributed random number generation should output random numbers that are agreed upon by all nodes, uniformly random, bias-resistant and unpredictable even against a collusion of a subset of nodes. Random beacons are non-trivial and have attracted wide research interest in the community, especially due to the adoption of Proof-of-Stake consensus protocols in blockchains, such as [10-17].

Most common distributed random beacons are instantiated by distributed protocols among a set of participants where each participant contributes some local randomness independently. In the simplest form, a set of n participants (P₁, P₂,...,P_n) jointly generate a random bit v ∈ {0,1}. Each participant P_i chooses her own local random bit x_i without knowing the choices of the other participants. Define the random output to be v = x₁ ⊕ x₂ ⊕ ⋯ ⊕ x_n. It is easy to see that v is a uniformly random bit as long as at least one of the participants chooses her local bit uniformly at random. Moreover, any subset of ≤ n − 1 participants cannot collaborate to bias the output away from the uniform distribution or guess the output before seeing the other participants’ choices. The technical challenge is to implement the protocol by communication in distributed systems and prevent anyone from knowing others’ choices before publishing her own choice. In the literature, various cryptographic techniques are adopted to achieve simultaneous publishing, including commitment schemes, publicly verifiable secret sharing (PVSS) and verifiable delay functions (VDF) [10,11,18]. Commitment schemes and PVSS consist of at least 2 rounds and require significant time for each round to make sure the communication among participants is synchronized, while VDFs must be evaluated for longer than the synchronization time. In summary, current distributed random beacons require considerable time to generate a fresh random number, much longer than the duration of a slot in proof-of-stake blockchains. Therefore, existing PoS blockchains update the random seed only once per epoch, instead of once per slot.
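The XOR construction above can be illustrated with a toy commit-then-reveal round. This is only a minimal sketch: the hash-based commitment, the nonce length and the function names `commit` and `run_beacon` are assumptions made for illustration and do not correspond to any specific beacon protocol cited here.

```python
import hashlib
import secrets
from functools import reduce
from typing import List


def commit(bit: int, nonce: bytes) -> str:
    """Hash-based commitment to a single bit (illustrative, not a vetted scheme)."""
    return hashlib.sha256(nonce + bytes([bit])).hexdigest()


def run_beacon(bits: List[int]) -> int:
    """Simulate one commit-then-reveal round and return v = x_1 XOR ... XOR x_n."""
    # Round 1: every participant publishes a commitment to her local bit.
    nonces = [secrets.token_bytes(16) for _ in bits]
    commitments = [commit(b, n) for b, n in zip(bits, nonces)]
    # Round 2: every participant reveals (bit, nonce); openings are checked.
    for b, n, c in zip(bits, nonces, commitments):
        if commit(b, n) != c:
            raise ValueError("opening does not match commitment")
    return reduce(lambda acc, b: acc ^ b, bits, 0)
```

The two explicit rounds in the sketch are the reason such a beacon cannot emit a fresh value within a single short slot.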

In each slot t, the miner election result is determined by the slot number t, the random seed r of the epoch and, optionally, metadata of miners. There are typically two cases for the metadata of miners. In the first case, one slot leader is elected uniformly from a known fixed committee of m miners. The solution is to use a pseudo-random number generator prng and select the (prng(r, t) mod m)-th miner in the committee. In the second case, there is no fixed committee and every stake holder with address addr can propose a block if prng(r,t,addr) < ρ, where ρ is a difficulty parameter that controls the expected number of leaders per slot. Since r, t and addr are known to any miner at the beginning of the epoch or earlier, the miner knows the election results of future slots of the epoch. Even if more advanced cryptographic primitives are adopted, such as verifiable random functions (VRF) or single secret leader election (SSLE) [4], to keep the election results secret from other miners, each miner should at least know whether the leader of a slot is herself or not [19,20].
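Both election rules can be sketched with a hash standing in for prng. The use of SHA-256, the string encoding of the inputs and the helper names below are assumptions for illustration only; real protocols define their own encodings and thresholds.

```python
import hashlib
from typing import Iterable, List


def prng(*parts: object) -> int:
    """Deterministic 256-bit pseudo-random integer derived from the inputs (assumed stand-in)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def committee_leader(r: str, t: int, m: int) -> int:
    """Case 1: index of the slot leader in a fixed committee of m miners."""
    return prng(r, t) % m


def is_eligible(r: str, t: int, addr: str, rho: int) -> bool:
    """Case 2: addr may propose in slot t if prng(r, t, addr) < rho."""
    return prng(r, t, addr) < rho


def my_future_slots(r: str, slots: Iterable[int], addr: str, rho: int) -> List[int]:
    """Slots of the epoch that addr already knows she leads once r is fixed."""
    return [t for t in slots if is_eligible(r, t, addr, rho)]
```

Because every input is fixed at the start of the epoch, `my_future_slots` can be evaluated for the whole epoch in advance, which is exactly the knowledge of future election results discussed in this work.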

Existing blockchains suffer from low throughput, which results in high transaction fees and limits the widespread application of decentralized technologies. Since PoS blockchains aim at increasing the throughput, they typically use a shorter timeslot to generate a new block. On the other hand, distributed random beacons must have large enough committees to jointly generate the random numbers and use complicated communication protocols to be secure. Concerning the current situations such as the usage of RANDAO in Ethereum 2.0 and the future challenges, it is necessary for security researchers and PoS protocol designers to keep in mind that miners know (partial) election results in the future.

Mitigations of Selfish Mining

Due to the possibility of selfish mining attacks, especially attacks that can be successfully launched by arbitrarily small miners in the real world, the blockchain mining scenario might be significantly different from what blockchain designers expect and the security of the blockchain is severely undermined. Therefore, the community has come up with solutions to mitigate selfish mining attacks. For example, [21] proposes a novel proof-of-work based solution that decouples the amount of mining rewards from the number of blocks in the chain, by associating mining rewards with fruit blocks that are referenced by the consensus blocks forming the blockchain. Ethereum 2.0 claims to be immune to selfish mining, which it refers to as avalanche attacks, by requiring honest miners (validators) to ignore a block of slot t₁ when they have already agreed on a block of slot t₂ > t₁, according to Latest Message Driven (LMD) GHOST [6]. However, the security relies on stronger requirements on communication synchrony and user availability. Besides, since Ethereum 2.0 punishes late attestations, honest validators are more vulnerable to attacks against the peer-to-peer network.

In summary, proof-of-stake protocols are still developing and evolving rapidly. Selfish mining attacks may never be completely resolved because other desired properties might be sacrificed. Therefore, the findings of this work, notably that random beacons which are not updated frequently enough leak information about future slot leader election results, should be taken into account by PoS protocol designers.

PoS with KFE Model

In this section, we present a complete specification of a simplified model of PoS blockchain protocols with random beacons updated once per epoch. We call it PoS with KFE (knowledge of future election results).

Miners We assume there are two miners: miner M₁ is malicious, while miner M₂ is honest. M₁ deposited α (0 < α < 1/2) of the total stake and M₂ deposited the remaining 1 − α. While the blockchain grows, M₁ and M₂ might receive different mining rewards. However, we assume that neither M₁ nor M₂ changes the deposited stake. This means α is constant throughout the lifespan of the blockchain. We will define the behavior of these two miners later.

Transactions We ignore transactions in our model, because we do not consider the effect of transaction fee rewards or the attack of censoring particular transactions.

Timing Blockchain is a dynamic system. Our model uses discrete time, with the slot as the basic time unit. The blockchain starts from slot 0 and extends infinitely. We also define an epoch as T slots, so that slots (k − 1)T + 1 to kT form the k-th epoch for k ∈ {1,2,...}. In each slot, we select one of the miners to be the leader of the slot.

The leader of slot t can propose a block for the slot. Every epoch k uses a different random seed r_k for leader selection that is unpredictable by either miner in previous epochs and known to both miners from the beginning of epoch k.

Blocks In each slot t, the slot leader can create and own a block B_t. A valid block should specify its predecessor, which is a previous valid block B_t′, t′ < t. Once a block becomes part of the finalized blockchain, its owner receives a fixed amount of reward R. The honest miner M₂ always creates only one block for one slot when she is the leader and immediately publishes the block. The malicious miner M₁ might create multiple blocks for one slot, by pointing to different predecessors, and might withhold these blocks and publish one of them later. M₁ cannot publish ≥ 2 different blocks for one slot, because M₂ can detect this dishonesty and punish M₁ severely. There is a genesis block B₀ at slot 0 that does not belong to M₁ or M₂ but is agreed upon by both miners as the first block.

Communication Since we exclude transactions, it suffices to consider that M₁ and M₂ have a communication channel between each other. We assume M₁ and M₂ can send arbitrarily many messages to each other through the channel and that the messages are delivered to the recipient immediately. This assumption is rather an oversimplification, especially because selfish mining such as maliciously withholding blocks can be detected by honest miners when the communication is perfect. We remark that if we consider the setting in which communication might be delayed by more than a slot, the total stake ratio of multiple honest miners is effectively discounted because they create forks.

Forks, Views and Blockchains Ideally, blocks form a chain (B₀ ← B₁ ← B₂ ← ⋯ ← B_t) after slot t. However, since M₁ might withhold his blocks and point to arbitrary predecessors, blocks might form forks, and M₁ and M₂ might have different views of the forks. For example, if M₁ owns and withholds blocks B₁(→ B₀) and B₂(→ B₁), and M₂ owns and publishes block B₃(→ B₀), then the view of M₁ is two forks (B₀ ← B₁ ← B₂) and (B₀ ← B₃), while the view of M₂ is one fork (chain) (B₀ ← B₃). The consensus blockchain after slot t is defined as the longest fork, which might be different for M₁ and M₂. We can simplify the views. M₂ always extends the longest fork, so she only keeps the longest fork in her view and ignores other forks. M₁’s view can be compactly represented by M₂’s view together with the slots of M₁ for which M₁ has not published a block. In our example, M₁’s view is (B₀ ← B₃,{1,2}). We define the depth of a block as the length of the fork ending at the block.
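The running example with B₁, B₂ and B₃ can be replayed in a few lines. This is a sketch under stated assumptions: blocks are identified by their slot numbers, depth counts the links back to the genesis block (so B₀ has depth 0), and the helper names `depth` and `honest_head` are introduced here for illustration.

```python
from typing import Dict, Set

GENESIS = 0


def depth(slot: int, pred: Dict[int, int]) -> int:
    """Number of links from the block of `slot` back to the genesis block."""
    d = 0
    while slot != GENESIS:
        slot = pred[slot]
        d += 1
    return d


def honest_head(current: int, newly_published: Set[int], pred: Dict[int, int]) -> int:
    """M2 switches heads only when a newly published block is strictly deeper."""
    best = current
    for b in sorted(newly_published):
        if depth(b, pred) > depth(best, pred):
            best = b
    return best


# Predecessor pointers of the example: B1 -> B0, B2 -> B1, B3 -> B0.
pred = {1: 0, 2: 1, 3: 0}
head = honest_head(GENESIS, {3}, pred)        # M2 publishes B3 while B1, B2 are withheld
head_after = honest_head(head, {1, 2}, pred)  # M1 later releases B1 and B2
```

After the release, `head_after` points at B₂, so B₃ drops out of the honest view.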

Reward and Payoff The longest fork in M₂’s view is considered the consensus chain, denoted chain_t. After slot t, the reward of M_i, REW_i,t(chain_t), is defined as the number of blocks of M_i in the consensus chain chain_t. Since the goal of selfish mining is to maximize the ratio of REW₁ to REW₂, we define the payoff of M₁ as ρ_t(chain_t, λ) = REW₁,t(chain_t)•(1 − λ) − REW₂,t(chain_t)•λ. The parameter λ is introduced, inspired by [19,10], to facilitate aggregating the payoffs of different slot intervals. λ is closely related to the proportion of block rewards received by M₁: if M₁ owns a λ fraction of the blocks in chain_t (excluding the genesis block), then ρ_t = 0.
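As a quick numerical check of the payoff, the sketch below assumes the second term counts M₂’s blocks, which is what makes the payoff vanish when M₁ owns a λ fraction of the chain. The function name and argument layout are illustrative only.

```python
def payoff(rew1: int, rew2: int, lam: float) -> float:
    """Payoff of M1: REW_1 * (1 - lambda) - REW_2 * lambda."""
    return rew1 * (1 - lam) - rew2 * lam


# M1 owns 1 of 4 blocks: the payoff is zero exactly at lambda = 1/4.
balanced = payoff(1, 3, 0.25)
# The same chain valued at a smaller lambda yields a positive payoff.
better = payoff(1, 3, 0.125)
```

A positive payoff for a given λ therefore means that M₁’s share of the blocks exceeds λ.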

Strategies M₂ always uses the simple honest strategy, so it suffices to consider only the strategy of M₁. Suppose M₁ uses strategy π, which at the beginning of slot t, given his view at t − 1, (chain′_t, slots_t−1 = {s₁, s₂,...}), and depending on whether he is the leader of slot t and of future slots in the current epoch, chooses his action. When M₂ is the slot leader, M₂ publishes her block B_t before M₁ chooses his action. chain′_t refers to the longest chain after M₂ publishes her latest block B_t. If M₁ is the slot leader, he adds slot t to his state, updating slots_t−1 to slots′_t = slots_t−1 ∪ {t}; otherwise slots′_t = slots_t−1.

We call the set slots′_t the available slots. A valid action of M₁ is to choose a subset pub_t of the available slots for which to publish blocks. For each chosen slot, M₁ publishes one block and specifies its predecessor block. At the end of slot t, the set of available slots becomes slots_t = slots′_t \ pub_t.

Runs and Randomness We define a run to be a particular execution path, consisting of the view of M₁ at every slot, determined by the random seeds {r_k} of every epoch and the strategy π of M₁. We define a variable e_t for every slot t ≥ 1 to represent the selected leader for slot t. If e_t = 0, M₁ is the leader of slot t. Otherwise e_t = 1 and M₂ is the leader. The leader election result of epoch k is represented by the bit sequence e_k = e_(k−1)T+1 e_(k−1)T+2 ⋯ e_kT, determined by the random seed r_k. Assuming r_k is a uniformly drawn integer from a large range, e_k is uniformly distributed in {0,1}^T. r_k is independent of r_k′ for any k′ ≠ k. With a slight abuse of notation, we refer to the payoff ρ_t(chain_t, λ) as ρ_t(e₁e₂ ⋯ e_t, π, λ), because chain_t is determined by the election results and the strategy of M₁. Previous works use MDP analysis and measure the expected payoff of a strategy π over all randomness used in slot leader election [4,5]. Our work also measures the expected payoff over leader election. If we consider the selfish mining game within an epoch, for simplicity the first epoch, and want to maximize the payoff at the end of the epoch, then the problem becomes an offline problem, for which we can find a deterministic optimal algorithm.

Optimal Strategies

This section first presents an optimal strategy for M₁ to maximize his payoff in the first epoch. We recall the problem in subsection 4.1, present the strategy in subsection 4.2 and prove its optimality in subsection 4.3. If slots at the epoch boundary serve as checkpoints, so that honest miners ignore blocks of epoch k′ < k received after epoch k starts and the malicious miner cannot withhold blocks across epochs, the optimal strategy for the first epoch can be repeated for every epoch and remains optimal.

Problem Formulation for the First Epoch

We use the model in section 3 and only consider the first epoch. For a run of the mining game, suppose the slot election result is the bit sequence e₁e₂ ⋯ e_T.

Payoff of Honesty If M₁ uses the honest strategy π_h, then chain_T is a chain of T + 1 blocks (B₀ ← B₁ ← B₂ ⋯ ← B_T), where B₀ is the genesis block and B_i (1 ≤ i ≤ T) is owned by M₁ if e_i = 0, otherwise owned by M₂. Writing η for the fraction of the T slots whose leader is M₁, the payoff is

ρ_T(e₁e₂ ⋯ e_T, π_h, λ) = Tη(1 − λ) − T(1 − η)λ = T(η − λ).

Using our optimal strategy π_s, M₁ always receives payoff no less than T(η − λ).

Optimal Strategy for the First Epoch

Extremely Lucky Case (η > 1/2) In an extremely lucky case, M₁ owns more slots than M₂ in the first epoch. In this case, M₁ can simply withhold all of his blocks until near the end of slot T. M₂ does not know blocks of M₁ so she grows a chain consisting of her own slots. Before the slot T ends, M₁ publishes his fork using all of his slots. Since the fork of M₁ is longer than the view of M₂, M₂ gives up her old view and agrees on the new fork that only includes blocks of M₁. This simple strategy is obviously optimal and achieves payoff Tη(1−λ) for M₁. This lucky case happens with only a low probability, because α < 1/2, and admits a simple optimal strategy. In the following discussion, we can focus on the more complicated case when η ≤ 1/2.

Fork Attack First we discuss the condition for the selfish miner M₁ to win the fork competition and exclude one fork produced by M₂. An important observation is that when M₁ publishes a fork fork = (⋯ B_c ←) B_s1 ← B_s2 ← ⋯ ← B_sg that diverges from the view of M₂, view_old = (⋯ B_c ←) B_h1 ← B_h2 ← ⋯ ← B_hf, after the common block B_c and is longer than the view of M₂ by at least 1, M₂ will give up her old view. This is also the necessary condition for successfully excluding an honest fork: if M₁ publishes a fork that is the same length as or shorter than M₂’s view, M₂ will ignore the fork and continue extending her own view. The two sets of slots S = {s₁,s₂,...,s_g} and H = {h₁,h₂,...,h_f} are disjoint because B_c is defined as the last common block. Under strategy π_s, slots in S are all controlled by M₁ and slots in H are all owned by M₂.

Interval Attack π_s only considers attacks such that S ∪ H forms an interval of slots I = [t₁ = min{s₁, h₁}, t₂ = max{s_g, h_f}]. Therefore, we call such a fork attack an interval attack. When an interval attack is successful in I, we call I a valid interval. Note that a valid interval consists of at least 3 slots, because the trivial case of a single slot of M₁ does not exclude any honest block. We informally justify why this restriction does not lose optimality. Suppose there exists a slot t that is in the interval I but not in S ∪ H.

• If the leader of t is the honest miner M₂, then M₂ should have mined a block B_t and included it in her old view before M₁ publishes his fork.

• If the leader of t is M₁, he can use another set of slots S′ = (S ∪{t})\{s_g} to form another fork of the same length while saving the slot s_g for future attacks. On the contrary, if M₁ does not use slot t in the fork, he can never use the slot t in future attacks without the cost of excluding his own slot s_g.

• If t > s_g, this means M₁ has additional slots so that he can wait until M₂ catches up and exclude more blocks of M₂.

Multiple Interval Attacks Under strategy π_s, M₁ might perform multiple interval attacks that do not intersect each other. If M₁ chooses intervals I₁,I₂,...,I_d (the intervals are sorted so that the largest slot of I_i is smaller than the smallest slot of I_i+1), M₁’s full mining strategy is the following:

• M₁ acts honestly when the current slot t is not in any of these intervals, i.e., he does not withhold any block or form forks.

• During slots [t_i1, t_i2 − 1] in a slot interval I_i = [t_i1,t_i2], M₁ withholds the blocks of his slots and does not publish any block.

• At the last slot t_i2 of a slot interval I_i = [t_i1,t_i2], M₁ waits until M₂ publishes her block if she is the leader of t_i2, then M₁ publishes a fork consisting of all of his slots in the interval I_i and connects the fork to the last common block right before I_i.

Under π_s, which specifies intervals I₁, I₂, ..., I_d for M₁, the final chain chain_T consists of the slots

{t ∈ [1,T] : t ∉ I₁ ∪ ⋯ ∪ I_d} ∪ {t ∈ I₁ ∪ ⋯ ∪ I_d : e_t = 0}.

All slots of M₁ are included in chain_T. In each interval, all slots of M₁ are included in chain_T while slots of M₂ are all excluded. The goal of M₁ is to exclude as many of M₂’s slots as possible.
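The effect of a set of chosen intervals on the final chain can be checked with a short sketch. It assumes the election result is given as a list `e` with `e[t-1] == 0` when M₁ leads slot t, and that the intervals passed in are already valid and non-intersecting; the function names are illustrative.

```python
from typing import List, Tuple


def final_chain_slots(e: List[int], intervals: List[Tuple[int, int]]) -> List[int]:
    """Slots whose blocks remain in the final chain after the interval attacks."""
    attacked = set()
    for t1, t2 in intervals:
        attacked.update(range(t1, t2 + 1))
    # Outside the intervals every block survives; inside, only M1's blocks do.
    return [t for t in range(1, len(e) + 1) if t not in attacked or e[t - 1] == 0]


def epoch_payoff(e: List[int], intervals: List[Tuple[int, int]], lam: float) -> float:
    """Payoff of M1 at the end of the epoch for the given intervals."""
    chain = final_chain_slots(e, intervals)
    rew1 = sum(1 for t in chain if e[t - 1] == 0)
    rew2 = len(chain) - rew1
    return rew1 * (1 - lam) - rew2 * lam


e = [0, 1, 0, 1, 0, 1, 1]               # M1 leads slots 1, 3 and 5
honest = epoch_payoff(e, [], 0.5)        # no attack: 3 blocks against 4
attack = epoch_payoff(e, [(1, 5)], 0.5)  # one interval attack on slots 1..5
```

In this example the attack on slots 1 to 5 removes the honest blocks of slots 2 and 4, so the payoff at λ = 1/2 rises from −0.5 to 0.5.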

Interval Scheduling π_s chooses one subset of non-intersecting intervals from the set of all valid intervals that maximizes the number of honest blocks excluded in these intervals. This is a weighted interval scheduling problem and admits an efficient algorithm of Θ(T + |I′|) time complexity, where T is the range of time units and I′ is the set of all valid intervals. Note that valid intervals are slightly modified from the closed form I = [t₁,t₂] to the half-open form I′ = (t₁ − 1,t₂] before running an interval scheduling algorithm.

An interval scheduling algorithm finds the optimal selection of a subset of non-intersecting intervals that maximizes the total weight. The algorithm uses bottom-up dynamic programming. The optimal total weight for the range (0,T], W[T], is the maximum over the following subcases:

– The slot T is not covered by any interval in the optimal solution. In this case, W[T] = W[T −1], the optimal total weight for the subrange (0,T −1].

– The slot T is covered by an interval I′ = (t₁,t₂] with weight w in the optimal solution. Firstly, t₂ must be equal to T. If t₂ < T, T is not covered by I′. If t₂ > T, the interval is not covered by the range (0,T]. In this case, the optimal total weight is W [t₁] + w.
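The recurrence above can be sketched as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 1: an interval is taken to be valid when it contains at least one honest slot and M₁’s slots outnumber M₂’s by at least one, its weight is the number of honest slots it contains, and valid intervals are enumerated by a simple quadratic scan. The names `valid_intervals`, `schedule` and `temp` are introduced here.

```python
from typing import List, Optional, Tuple


def valid_intervals(e: List[int]) -> List[Tuple[int, int, int]]:
    """Enumerate (t1, t2, weight) for closed slot intervals [t1, t2] over slots 1..T."""
    T = len(e)
    found = []
    for t1 in range(1, T + 1):
        mine = theirs = 0
        for t2 in range(t1, T + 1):
            if e[t2 - 1] == 0:
                mine += 1
            else:
                theirs += 1
            # Assumed validity rule: M1's fork is longer by at least one block.
            if theirs >= 1 and mine >= theirs + 1:
                found.append((t1, t2, theirs))
    return found


def schedule(e: List[int]) -> Tuple[int, List[Tuple[int, int]]]:
    """Return the maximum number of excluded honest blocks and the chosen intervals."""
    T = len(e)
    ending_at: List[List[Tuple[int, int]]] = [[] for _ in range(T + 1)]
    for t1, t2, w in valid_intervals(e):
        ending_at[t2].append((t1, w))
    W = [0] * (T + 1)                       # W[t]: best total weight on (0, t]
    temp: List[Optional[int]] = [None] * (T + 1)  # start of the interval ending at t, if used
    for t in range(1, T + 1):
        W[t] = W[t - 1]                     # case 1: slot t is not covered
        for t1, w in ending_at[t]:          # case 2: an interval (t1 - 1, t] covers t
            if W[t1 - 1] + w > W[t]:
                W[t] = W[t1 - 1] + w
                temp[t] = t1
    sol = []
    t = T
    while t > 0:                            # walk back through the recorded choices
        if temp[t] is None:
            t -= 1
        else:
            sol.append((temp[t], t))
            t = temp[t] - 1
    sol.reverse()
    return W[T], sol
```

For the election result `[0, 1, 0, 1, 0, 1, 1]` the sketch selects the single interval (1, 5) and excludes two honest blocks. The enumeration step is quadratic in T; only the dynamic program itself runs in time linear in T plus the number of intervals.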

Optimal Strategy The optimal solution Sol of interval scheduling can be reconstructed from W[0...T] and Temp[0...T] as illustrated in the pseudocode of Algorithm 1. π_s uses the intervals in Sol to instruct M₁ to perform the multiple interval attacks.

Algorithm 1: Interval Scheduling Algorithm

Optimality Proof

At each slot, M₁ can choose to release a subset of his unpublished slots. If the published blocks form a chain that extends the longest chain in the honest view, both M₁ and M₂ acknowledge the new chain. Otherwise the published blocks form forks w.r.t. the honest view. These blocks might form multiple forks. If no fork is longer than the honest view, M₂ ignores the new forks and continues with her old view. If some forks are longer than M₂’s old view, M₂ changes to extend the longest fork (breaking ties arbitrarily when there are multiple longest forks) and ignores other forks. Therefore, in the view of M₂, M₁ effectively publishes exactly one fork in each fork attack. M₁ might try to extend shorter forks, as long as he does not publish two different blocks for one slot.

We use the following propositions on fork attacks to show that an optimal strategy exists after we prune the strategy space and reduce the selfish mining problem to an interval scheduling problem.

Proposition 1. M₂ never mines two different blocks (of different slots) with the same depth. Moreover, M₂ mines blocks with strictly increasing depth. Immediately after a successful fork attack, the next honest block is deeper than the previous honest block by at least 2 because the head of the new fork is deeper than the previous honest block by at least 1. In other cases, M₂ extends after the longest chain which is at least as deep as her previous block.

Next we want to state a proposition that an optimal strategy will not jump back and forth between two conflicting forks. In other words, after M₁ successfully excludes some blocks in a fork attack, he will never extend those discarded blocks. To formally analyze this proposition, we first define conflicting forks precisely. Recall that a fork is a chain from the genesis block to the most recent block and a slot has at most one block. A fork can be uniquely represented as fork(t) for the slot t. Two forks fork(t₁) and fork(t₂) are conflicting forks if neither fork is a prefix of the other.

Proposition 2. In an optimal strategy, there cannot be three different views of M₂, fork(t₁), fork(t₂) and fork(t₃), (t₁ < t₂ < t₃) such that fork(t₂) conflicts with fork(t₁) and fork(t₃) conflicts with fork(t₂), but fork(t₁) is a prefix of fork(t₃).

be included in the final chain and use fewer blocks to form fork(t₃).

Fork Attacks and Intervals Propositions 2 and 3 show that the slots affected (published by M₁ or excluded from the old forks) in two fork attacks have no intersection. It is also easy to see that, in each fork attack, the set of excluded honest slots forms a contiguous interval (ignoring M₁’s slots in between). We can also show that M₁’s slots affected by a fork attack form a contiguous interval (ignoring M₂’s slots in between), by requiring M₁ to always use the earliest available slots first. Now we want to show that M₁’s slots and M₂’s slots affected by one fork attack form a contiguous slot interval, i.e., H ∪ S is a slot interval.

one more honest block in future attacks.

With Proposition 4, a slot attack can be represented by an interval in an optimal strategy. Moreover, M₁ cannot perform two attacks on two intervals with non-empty intersection, because he will not exclude an honest slot twice according to Proposition 3 and cannot use his own slot twice. So far we have proved that an optimal strategy exists in which M₁ performs attacks on a few non-intersecting intervals. Outside these intervals, M₁ publishes blocks honestly.
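Choosing non-intersecting attack intervals can be illustrated with a small dynamic program in the style of weighted interval scheduling. This is a sketch under stated assumptions, not the paper's algorithm: it assumes that a candidate attack interval is any contiguous slot range in which M₁ holds exactly one more slot than M₂, and that the value of an interval is the number of M₂ slots it excludes. The function name and the encoding of leaders as 1 (M₁) and 2 (M₂) are hypothetical.

```python
from typing import List


def max_excluded_honest_slots(leaders: List[int]) -> int:
    """Maximum number of M2 slots excluded by disjoint attack intervals.

    leaders[t] is 1 if slot t belongs to M1 and 2 if it belongs to M2.
    Assumption (not from the paper): an interval is a valid attack exactly
    when M1 owns one more slot in it than M2.
    """
    n = len(leaders)
    # best[j] = best value achievable using only slots 0 .. j-1
    best = [0] * (n + 1)
    for j in range(1, n + 1):
        best[j] = best[j - 1]  # option: no attack interval ends at slot j-1
        own = honest = 0
        # Scan every interval [i, j-1] that ends at slot j-1.
        for i in range(j - 1, -1, -1):
            if leaders[i] == 1:
                own += 1
            else:
                honest += 1
            if own == honest + 1:
                # Attack on [i, j-1] excludes `honest` blocks of M2.
                best[j] = max(best[j], best[i] + honest)
    return best[n]


# M1 owns slots 0, 2 and 4; one attack over the whole range excludes both M2 slots.
print(max_excluded_honest_slots([1, 2, 1, 2, 1]))
```

The quadratic scan keeps the sketch short; for an epoch of a few hundred slots this is already cheap, and a faster variant would only matter for much longer epochs.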

Global Strategy for Multiple Epochs

Since M₁ does not know the miner election result of the next epoch, he cannot directly extend the above optimal strategy for the first epoch to future epochs. We now describe how miner M₁ should act at the boundary between two epochs, according to three different cases for the first epoch:

• In the first epoch, M₁ owns more slots than M₂, so M₁ can exclude all blocks of M₂ by presenting a fork whose length is exactly one greater than the number of M₂'s slots. If M₁ has s remaining slots, he can withhold these slots until the next epoch starts. At the beginning of the next epoch, the state of the game is equivalent to the case where the next epoch has T + s slots and the first s slots are all controlled by miner M₁. M₁ can then use the optimal strategy for a single epoch.

• In the first epoch, M₁ cannot exclude all blocks of M₂. If the first epoch ends with a winning interval for M₁, he should not have any remaining hidden slots, and he wins the interval with exactly one more block. This means he wastes at most one block compared to the case where he knows all information about both epochs at the beginning of the first epoch.

• M₁ cannot exclude all blocks of M₂ and the first epoch does not end with a winning interval. In this case M₁ should have already published all his blocks as an honest player by the end of the first epoch. If M₁ is extremely lucky and owns more slots in the second epoch, then he could have used the extra slots from the second epoch to exclude M₂'s blocks in the first epoch, had he known all miner election results at the beginning of the first epoch. In this unlikely case, M₁ may waste up to 1 slot relative to the global optimal strategy. Otherwise, M₁ does not waste any slot compared to the global optimal strategy.
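The first case above can be sketched as a small helper that builds the slot sequence M₁ plans over in the next epoch. This is an illustration under assumptions: leaders are encoded as 1 (M₁) and 2 (M₂), the number of withheld slots is taken to be s = (M₁'s slots) − (M₂'s slots + 1) as our reading of the first case, and the function name `next_epoch_view` is hypothetical.

```python
from typing import List


def next_epoch_view(first_epoch: List[int], next_epoch: List[int]) -> List[int]:
    """Slot sequence M1 plans over once the next epoch's leaders are known.

    Leaders are encoded as 1 (M1) or 2 (M2). If M1 owns more slots than M2 in
    the first epoch, a fork of length (M2 slots + 1) excludes every M2 block,
    and the s leftover M1 slots are carried over as s leading M1 slots, so the
    next epoch behaves like one with T + s slots. In the other two cases
    nothing is withheld across the boundary.
    """
    own = first_epoch.count(1)
    honest = len(first_epoch) - own
    if own <= honest:
        return list(next_epoch)
    s = own - (honest + 1)  # assumed number of withheld slots
    return [1] * s + list(next_epoch)


# M1 owns 3 of 4 slots: 2 blocks exclude M2's single block, 1 slot is carried over.
print(next_epoch_view([1, 1, 1, 2], [2, 1]))
```

The returned sequence can then be fed to whatever single-epoch planner is in use, which is how the text reduces the first case to the single-epoch problem.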

Overall, in the worst case, over a period of n epochs (nT slots in total), the above global strategy excludes at most n fewer slots of M₂ than the global optimal strategy. In most cases, the gap is much smaller. This global strategy ensures that every block of M₁ is included in the chain, while the strategies proposed by previous works risk losing some blocks of M₁.

Experiment

In this section, we use random samples to estimate how much payoff gain is achieved by the optimal strategy in Section 4. We also compare it with the MDP strategy in [5], which does not use the information of future leader elections.

MDP Strategy The MDP strategy is the following:

– The original state is (0,0), in which M₁ has no withheld block and agrees to extend after the same fork as M₂. From state (0,0), if M₁ gets the next slot, M₁ withholds it and the state becomes (1,0). Otherwise M₂ publishes a block and M₁ should accept it, so the state remains (0,0).
– At state (1,0), if M₁ mines the next block, the state transits to (2,0). Otherwise it transits to (1,1). M₁ waits in both cases.
– At state (1,1), if M₁ mines the next block, he publishes two blocks to exclude M₂'s block and the state returns to (0,0). Otherwise M₂ publishes a block and the state goes to (1,2).
– At state (1,2), if M₁ mines the next block, the state goes to (2,2). Otherwise M₂ mines one more block, M₁ gives up his block and accepts the three blocks of M₂, and the state returns to (0,0).
– At state (2,2), if M₁ mines the next block, he publishes all 3 blocks and successfully excludes 2 blocks of M₂. The state returns to (0,0). Otherwise M₂ gets the next block. Now the latest 5 slots belong to (M₁,M₂,M₂,M₁,M₂) respectively. M₁ gives up the first slot but still holds the fourth slot. The state transits to (1,1).
– At state (2,0), M₁ waits until M₂ catches up. When M₁ has exactly 1 more block than M₂, M₁ publishes all of his blocks and successfully excludes all blocks of M₂. The state returns to (0,0).
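The transitions above can be sketched as a small state machine that replays a given leader sequence. This is an illustrative reading of the list, not the authors' simulation code: leaders are encoded as 1 (M₁) and 2 (M₂), the function name `run_mdp` is hypothetical, and the way an unfinished race is settled at the end of the sequence (M₁ publishes only if he is strictly ahead) is our own assumption, since the text does not specify it.

```python
from typing import List, Tuple


def run_mdp(leaders: List[int]) -> Tuple[int, int]:
    """Replay the MDP strategy; return (blocks of M1, blocks of M2) on the chain."""
    m1 = m2 = 0          # blocks already settled on the chain
    a = h = 0            # M1's withheld blocks / M2's competing blocks
    lead_mode = False    # True while in the (2,0) branch
    for leader in leaders:
        mine = leader == 1
        if lead_mode:
            if mine:
                a += 1
            else:
                h += 1
                if a - h == 1:           # lead shrank to 1: publish everything
                    m1 += a
                    a = h = 0
                    lead_mode = False
        elif (a, h) == (0, 0):
            if mine:
                a = 1                    # withhold the block
            else:
                m2 += 1                  # accept M2's block
        elif (a, h) == (1, 0):
            if mine:
                a, lead_mode = 2, True   # state (2,0)
            else:
                h = 1
        elif (a, h) == (1, 1):
            if mine:
                m1 += 2                  # two blocks exclude M2's one block
                a = h = 0
            else:
                h = 2
        elif (a, h) == (1, 2):
            if mine:
                a = 2
            else:
                m2 += 3                  # M1 gives up and accepts three blocks
                a = h = 0
        elif (a, h) == (2, 2):
            if mine:
                m1 += 3                  # three blocks exclude M2's two
                a = h = 0
            else:
                m2 += 2                  # M1 drops his first block, keeps the fourth slot
                a, h = 1, 1
    # Assumed end-of-sequence rule: M1 wins the open race only if strictly ahead.
    if a > h:
        m1 += a
    else:
        m2 += h
    return m1, m2


print(run_mdp([1, 2, 2, 1, 1]))  # M1 wins the (2,2) race with a third block
```

To estimate a ratio like those in the experiment, one could draw each slot's leader independently with probability η for M₁, for example with `random.Random(seed).random() < eta`, and divide M₁'s count by the total; that sampling scheme is likewise an assumption here.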

Experiment Parameters We consider a game with 10 epochs, where each epoch consists of 100 slots. We test the performance of our optimal strategy and the MDP strategy for M₁ with different stake ratios. The column η represents the stake ratio of M₁. The second, third and fourth columns represent the ratios of blocks of M₁ included in the blockchain if M₁ uses our optimal strategy for each epoch, the global optimal strategy and the MDP strategy introduced above, respectively.

Result The experiments show that our strategy significantly outperforms the MDP strategy. Our strategy is also very close to the global optimal strategy, especially in the usual case where a miner controls much less than 50% of the stake.

η	Ours	Optimal	MDP
0.05	0.0503	0.0503	0.0105
0.10	0.1019	0.1020	0.0360
0.15	0.1566	0.1566	0.0606
0.20	0.2174	0.2179	0.1311
0.25	0.2838	0.2864	0.1929
0.30	0.3736	0.3736	0.2783
0.32	0.4066	0.4092	0.3140
0.34	0.4450	0.4480	0.3329
0.36	0.4952	0.4986	0.4029
0.38	0.5413	0.5452	0.4385
0.40	0.5988	0.6061	0.4868
0.43	0.6891	0.6969	0.5508
0.46	0.7850	0.8214	0.6959
0.48	0.8759	0.9006	0.7613

Note that the MDP strategy only allows a miner with more than 32% of the total stake to benefit from strategic mining. To increase the relative revenue by 10%, the miner must hold approximately 36% of the stake. In practice almost no miner or mining pool has such an amount of stake. Using our strategy, a miner with 22% of the stake can already increase his revenue by 10% via strategic mining.
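The relative revenue gains discussed here can be recomputed from the table as ratio/η − 1. The sketch below uses only the tabulated values; the names `RESULTS`, `relative_gain` and `smallest_eta_with_gain` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, Optional, Tuple

# eta -> (Ours, Optimal, MDP), copied from the table above.
RESULTS: Dict[float, Tuple[float, float, float]] = {
    0.05: (0.0503, 0.0503, 0.0105), 0.10: (0.1019, 0.1020, 0.0360),
    0.15: (0.1566, 0.1566, 0.0606), 0.20: (0.2174, 0.2179, 0.1311),
    0.25: (0.2838, 0.2864, 0.1929), 0.30: (0.3736, 0.3736, 0.2783),
    0.32: (0.4066, 0.4092, 0.3140), 0.34: (0.4450, 0.4480, 0.3329),
    0.36: (0.4952, 0.4986, 0.4029), 0.38: (0.5413, 0.5452, 0.4385),
    0.40: (0.5988, 0.6061, 0.4868), 0.43: (0.6891, 0.6969, 0.5508),
    0.46: (0.7850, 0.8214, 0.6959), 0.48: (0.8759, 0.9006, 0.7613),
}


def relative_gain(eta: float, ratio: float) -> float:
    """Relative revenue gain over honest mining, whose expected ratio is eta."""
    return ratio / eta - 1.0


def smallest_eta_with_gain(column: int, target: float) -> Optional[float]:
    """Smallest tabulated eta whose gain in the given column reaches the target.

    column: 0 = Ours, 1 = Optimal, 2 = MDP.
    """
    for eta in sorted(RESULTS):
        if relative_gain(eta, RESULTS[eta][column]) >= target:
            return eta
    return None


for eta in sorted(RESULTS):
    ours, _, mdp = RESULTS[eta]
    print(f"{eta:.2f}  ours {relative_gain(eta, ours):+.3f}  mdp {relative_gain(eta, mdp):+.3f}")
```

Because the table samples η on a coarse grid, `smallest_eta_with_gain` only returns the first tabulated stake ratio that reaches the target, which is an upper bound on the true threshold rather than the threshold itself.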

Conclusion

This paper is the first to discuss the effect of real-world distributed random beacons on blockchain mining games. Previous works usually assume that the random seed used to select validators is refreshed in every slot, but this paper points out that this is not the case in real-world PoS blockchains such as Ethereum and Cardano.

Fixing the randomness for an epoch of multiple slots allows blockchain nodes to reach consensus on the randomness, but also allows selfish miners to take advantage of their knowledge of future validator election results. Specifically, this paper presents a close-to-optimal block mining strategy that allows any miner to gain a more than proportional share of block rewards, through an efficient interval scheduling algorithm. We mathematically show the optimality of our strategy for a single epoch and also evaluate its concrete performance using simulation experiments.

The findings of this work urge the community to improve distributed random beacon protocols and especially reduce the latency and round complexity. As for future directions, researchers should take into consideration the current random beacons in blockchain protocol design and security analysis.
