Over the past five years of experience with Bitcoin and alternative cryptocurrencies, one important property for proof of work functions that has been discovered is that of "memory-hardness" - computing a valid proof of work should require not only a large number of computations, but also a large amount of memory. Currently, two major categories of memory-hard functions, scrypt and Primecoin mining, exist, but both are imperfect; neither requires nearly as much memory as an ideal memory-hard function could require, and both suffer from time-memory tradeoff attacks, where the function can be computed with significantly less memory than intended at the cost of sacrificing some computational efficiency. This paper presents Dagger, a memory-hard proof of work based on moderately connected directed acyclic graphs (DAGs, hence the name), which, while far from optimal, has much stronger memory-hardness properties than anything else in use today.
The main reason why memory hardness is important is to make the proof of work function resistant to specialized hardware. With Bitcoin, whose mining algorithm requires only a simple SHA256 computation, companies have already existed for over a year that create specialized "application-specific integrated circuits" (ASICs) designed and configured in silicon for the sole purpose of computing billions of SHA256 hashes in an attempt to "mine" a valid Bitcoin block. These chips have no legitimate applications outside of Bitcoin mining and password cracking, and the presence of these chips, which are thousands of times more efficient per dollar and kilowatt hour at computing hashes than generic CPUs, makes it impossible for ordinary users with generic CPU and GPU hardware to compete.
This dominance of specialized hardware has several detrimental effects.
Specialized hardware is so powerful because it includes many thousands of circuits specifically designed for computing the proof of work function, allowing the hardware to compute the function thousands of times in parallel. Memory hardness alleviates this problem by making the main limiting factor not CPU power, but memory. One can also make a modest improvement by targeting CPU clock speed, but the extent to which such optimizations can be made is fundamentally limited to a very low value by technological considerations. Thus, improvement through parallelization hits a roadblock: running ten memory-hard computations in parallel requires ten times as much memory. Specialized hardware manufacturers can certainly pack terabytes of memory into their devices, but the effect of this is mitigated by two factors. First, hobbyists can achieve the same effect by simply buying many off-the-shelf memory cards. Second, memory is much more expensive to produce (if measured in laptop equivalents) than SHA256 hashing chips; the RAM used in ordinary computers is already essentially optimal.
Essentially, the Dagger algorithm works by creating a directed acyclic graph (the technical term for a tree where each node is allowed to have multiple parents) with ten levels including the root and a total of 2^25 - 1 values. In levels 1 through 8, the value of each node depends on three nodes in the level above it, and the number of nodes in each level is eight times larger than in the previous. In level 9, the value of each node depends on 16 of its parents, and the level is only twice as large as the previous; the purpose of this is to make the natural time-memory tradeoff attack be artificially costly to implement at the first level, so that it would not be a viable strategy to implement any time-memory tradeoff optimizations at all. Finally, the algorithm uses the underlying data, combined with a nonce, to pseudorandomly select four bottom-level nodes in the graph, and computes the hash of all of these nodes put together. If the miner finds a nonce such that this resulting hash is below 2^256 divided by the difficulty parameter, the result is a valid proof of work.
Let D be the underlying data (eg. in Bitcoin's case the block header), N be the nonce and || be the string concatenation operator (ie. 'foo' || 'bar' == 'foobar'). The entire code for the algorithm is as follows:
spread(L) = 16 if L == 9 else 3

node(D,xn,0,0) = D
node(D,xn,L,i) =
    with m = spread(L)
         p[k] = sha256(D || xn || L || i || k) mod 8^(L-1) for k in [0...m-1]
    sha256(node(D,xn,L-1,p[0]) || node(D,xn,L-1,p[1]) || ... || node(D,xn,L-1,p[m-1]))

eval(D,N) =
    with xn = floor(N / 2^26)
         p[k] = sha256(D || xn || N || k) mod (8^8 * 2) for k in [0...3]
    sha256(node(D,xn,9,p[0]) || node(D,xn,9,p[1]) || ... || node(D,xn,9,p[3]))

Objective: find N such that eval(D,N) < 2^256 / diff
This algorithm provides a proof of work mining function with memory hardness properties that are not ideal, but that are nevertheless a massive improvement over anything available previously. It takes 512 MB of memory to evaluate, but only 112 KB of memory and 4078 hashes to verify, and even the tiniest time-memory tradeoff is not worthwhile to implement because of the bottom-level branching adjustment. These parameters allow Dagger to be much more daring in its memory requirements than Primecoin or scrypt, asking for 512 MB of RAM for a single thread. Because the primary determinant of hardness is memory, and not computation, specialized hardware has only a tiny advantage; even an optimal Dagger mining ASIC would have little to offer over a hobbyist purchasing hundreds of gigabytes of memory cards off the shelf and plugging them into a medium-power GPU. And even in such an equilibrium, mining with ordinary CPUs will likely continue to be practical.
To compute the required four values in level 9, one must look at 64 randomly selected values in level 8. With negligibly less than 100% probability, these values will all be different. From there, suppose that you have computed V[n] different values for level n. You will need to make an expected 3 * V[n] queries to level n-1, which holds 8^(n-1) cells. Because the queries are pseudorandom, after these queries the probability that any individual cell will remain empty is (1 - 1/8^(n-1)) ^ (3 * V[n]). Hence, we can estimate that the expected value of V[n-1], the number of cells that will not be empty (ie. will need to be computed), is close to 8^(n-1) - 8^(n-1) * (1 - 1/8^(n-1)) ^ (3 * V[n]). We thus have a recursive computation:
V[8] = 64
V[7] = 8^7 - 8^7 * (1 - 1/8^7) ^ (64 * 3) = 191.99
V[6] = 8^6 - 8^6 * (1 - 1/8^6) ^ (191.99 * 3) = 575.34
V[5] = 8^5 - 8^5 * (1 - 1/8^5) ^ (575.34 * 3) = 1681.38
V[4] = 8^4 - 8^4 * (1 - 1/8^4) ^ (1681.38 * 3) = 2900.72
V[3] = 8^3 - 8^3 * (1 - 1/8^3) ^ (2900.72 * 3) = 512.00
V[2] = 8^2 - 8^2 * (1 - 1/8^2) ^ (512.00 * 3) = 64.00
V[1] = 8^1 - 8^1 * (1 - 1/8^1) ^ (64.00 * 3) = 8.00
V[0] = 8^0 - 8^0 * (1 - 1/8^0) ^ (8.00 * 3) = 1.00
V[0] + V[1] + ... + V[8] = 5998.43
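The recursion above can be reproduced mechanically; a short Python sketch of the same computation:

```python
# Expected number of distinct cells needed per level when verifying:
# the 4 level-9 values require 64 pseudorandom queries into level 8,
# and each computed cell at level n makes 3 queries into level n-1.
V = {8: 64.0}
for n in range(8, 0, -1):
    cells = 8 ** (n - 1)               # level n-1 holds 8^(n-1) cells
    V[n - 1] = cells - cells * (1 - 1 / cells) ** (3 * V[n])

print(round(V[7], 2))                  # matches the 191.99 figure above
print(round(sum(V.values()), 2))       # total close to 5998.43
```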
This computation is not quite accurate, since EV(f(x)) is not the same as f(EV(x)) (EV = expected value) if f is not linear, as is the case here; but because we are not dealing with especially fat-tailed or skewed distributions, the standard deviations are sufficiently low for this inaccuracy to have no substantial effect.
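This claim can be spot-checked with a small Monte Carlo simulation of one step of the recursion. The parameters below (8^4 cells and roughly 3 * V[5] ≈ 5044 queries, ie. the level-5 to level-4 step) are illustrative choices:

```python
import random

random.seed(0)
cells, queries = 8 ** 4, 5044          # one level-5 -> level-4 step, approx.
trials = 200

# count how many distinct cells are hit by `queries` uniform random queries
counts = []
for _ in range(trials):
    hit = set(random.randrange(cells) for _ in range(queries))
    counts.append(len(hit))

simulated = sum(counts) / trials
expected = cells * (1 - (1 - 1 / cells) ** queries)
print(round(expected, 1), round(simulated, 1))  # the two agree to within ~1%
```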