# Open Access

Digital finance in the new millennium was undoubtedly a game of centralized management. Given the strength of governmental regulation within financial markets, centralization proved a necessity for companies looking to develop internet-based currencies. Without central control, these companies would almost certainly be unable to comply with the various restrictions on what was essentially digital banking. Centralized systems were also often significantly simpler than any decentralized alternatives. The client-server model that came to dominate internet applications shaped the structure of the internet itself, further enshrining the concept of a central service provider.

The internet was clearly evolving from its strange origins, a world once populated by millions of pages on GeoCities, into a more rigid environment with a largely top-down structure. Any remaining glimmers of hope for a primarily peer-to-peer internet were quickly beginning to fade. Some individuals, many of whom had witnessed this rapid shift, turned their attention away from the internet's base layer and toward decentralization at the growing application layer. Once again, digital money systems became a prime target for ideologically driven, and typically computer-savvy, researchers.

An ideal digital currency, through the lens of decentralization theory, was one in which control over the system was shared by a vast group of diverse individuals. Diversity, both cultural and geographic, would hopefully limit the ability of any one entity or interest group to exert undue influence over the financial activities of others. Such a system would have to be open and accessible in both the currency itself and the social process by which the currency was governed. People and computers all around the world would need to be given the chance to participate in this shared financial model.

Conveniently, the field of computer science had already developed protocols that allowed individuals on a network to enjoy joint control over a shared system. Of course, these are the fault-tolerant protocols described thoroughly in the previous chapter. Leslie Lamport and countless others demonstrated that these protocols could be used to represent a wide class of applications, even an entire state machine, so it was no stretch of the imagination that the same principle could be applied to a digital currency. Byzantine fault tolerance could even provide robustness against actively malicious actors that might desire to manipulate or damage the system.

Unfortunately, existing protocols did not, in the eyes of the decentralization camp, provide the level of openness necessary for a global digital financial system. Most of this research was focused on environments in which participants were known and identified at launch. Organizations making use of these protocols typically either ran all of the nodes themselves or coordinated with a small group of predetermined external parties. Simply put, there was little demand for systems in which nodes could join or leave the consensus process at any time and for any reason.

As fault-tolerant protocols relied on a known set of participants, they could give explicit levels of weight to the opinion of each node in the network. Most networks simply assigned equal weight to every node, but a network could theoretically assign additional weight to nodes deemed to be more "trustworthy." Opinions and their weights could then be used to inform the decisions of other nodes during the voting processes that were critical to achieving fault tolerance. Without a known set of participants, nodes essentially had no way to determine how to act on the messages they received.
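
To make the idea concrete, here is a toy sketch in Python (not modeled on any particular protocol) of how weighted votes might be tallied against a two-thirds threshold:

```python
from collections import defaultdict

def tally(votes, weights):
    """Return the value backed by more than 2/3 of total voting weight, or None if no quorum."""
    total_weight = sum(weights.values())
    support = defaultdict(float)
    for node, value in votes.items():
        support[value] += weights[node]
    for value, weight in support.items():
        if weight > 2 * total_weight / 3:
            return value
    return None

# Equal weights: every node's vote counts the same.
weights = {"a": 1, "b": 1, "c": 1, "d": 1}
votes = {"a": "commit", "b": "commit", "c": "commit", "d": "abort"}
print(tally(votes, weights))  # "commit" -- backed by 3/4 of the weight, above the 2/3 threshold
```

With equal weights this reduces to a simple count of nodes; assigning a larger weight to a node simply makes its vote count for more.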

If one were to develop an open system in which all nodes were given equal voting power, they would quickly find their network in shambles. Any would-be attacker could launch countless nodes in an effort to manipulate the system. Since each of these nodes would be given equal voting power, an attacker could quickly overwhelm the system with a massive number of votes that were, in fact, generated by a single person or entity. This "sybil" attack, as it came to be called, made existing Byzantine fault-tolerant protocols unusable in an open setting: the requirement that fewer than one-third of nodes be faulty was meaningless when an attacker could introduce an unlimited number of nodes into the system.

With explicitly identified nodes out of the question, decentralized financial systems needed some alternative form of "identification." Ideally, anyone could join the system while remaining somehow limited in their influence over it. Essentially, this model required the existence of some "proxy" that could link individuals to specific nodes on the network. A potential solution to this dilemma was eventually discovered in the seemingly unrelated field of email spam prevention.

Although this may seem surprising, email spam prevention actually faced many of the same problems related to individual identity. Since emails are easily sent, malicious actors could swamp user inboxes with countless messages at little cost. Furthermore, since email addresses are relatively anonymous and easily generated, spam prevention software couldn't simply block addresses as new spam emails came into an inbox. Text analysis of the contents of every email might catch nefarious messages, but it couldn't identify the person or entity behind the anonymous addresses.

Researchers looking into this problem found an interesting solution. Spam couldn't be entirely prevented, but it could be throttled. If users were required to pay for each email they sent, then spam would become increasingly costly. One 2014 report places the average number of spam emails sent per day at 54 billion messages. A charge of even a single US cent per email would cost the global spam industry nearly 200 billion USD annually. The volume of spam would undoubtedly plummet under such a model.
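
Taking the report's figure at face value, the arithmetic behind that estimate is simple to check:

```python
spam_per_day = 54e9      # reported average number of spam emails per day (2014 figure)
cost_per_email = 0.01    # hypothetical charge of one US cent per email

annual_cost = spam_per_day * 365 * cost_per_email
print(f"${annual_cost:,.0f} per year")  # $197,100,000,000 per year
```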

Although this system sounds ideal in theory, it still requires the existence of some gateway through which users can make the necessary payment for each email. Any form of online payment associated with traditional finance would force email users to register accounts with old-world institutions, a concept that flew in the face of openly accessible email. Instead, researchers found that users could make "payment" in the form of an expenditure of electricity. Users could generate an email "stamp" by running a program that took, on average, a small fixed amount of time to execute. Stamp generation was designed so that its execution time was essentially impossible to decrease, while checking a finished stamp remained cheap. Even a short computation time on the order of a single second per email would massively throttle spam while leaving "real" users mostly unaffected.

Adam Back's HashCash is likely the most widely recognized of these constructions today. Crucially, HashCash popularized the concept of using hashing algorithms as a mechanism for proving that a certain amount of computational effort had been expended. This mechanism, which we often now call "Proof-of-Work," became a keystone of the peer-to-peer digital currencies to come. Proof-of-Work is so central to what follows that it's worth fully understanding its basic operation before we continue with our analysis of its eventual use.

A cryptographic hash function is a deterministic algorithm that takes an input of any length and produces a fixed-length output, commonly called a "hash" or "digest." Well-designed hash functions behave unpredictably: changing even a single bit of the input produces a completely unrelated output, and there is no practical way to work backwards from an output to an input that hashes to it.
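
As a quick illustration using Python's standard hashlib module, note how two nearly identical inputs yield fixed-length digests with no visible relationship to one another:

```python
import hashlib

for message in [b"hello world", b"hello world!"]:
    digest = hashlib.sha256(message).hexdigest()
    print(message, "->", digest)

# Both digests are 32 bytes (64 hex characters) long, yet they look completely
# unrelated even though the inputs differ by only a single character.
```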

Cryptographic hash functions are designed to be highly resistant to collisions, cases in which two different inputs produce the same output value. SHA256, a commonly used hash function, effectively renders hash collisions infeasible under the current state of computer processing power. Barring any discoveries of flaws in the algorithm, it would take an incomprehensible amount of time to find such a collision even with the processing power of the entire world at your fingertips. However, it is possible to find "partial" hash collisions in which the hashes of two inputs share some small number of bytes.

We can demonstrate a simple partial collision under SHA256 quite easily by searching for two different inputs whose hashes share the same first few bytes.
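
A minimal Python sketch makes the point; the message strings and the two-byte prefix length are arbitrary choices for illustration:

```python
import hashlib
from itertools import count

def sha256(data):
    """Return the raw SHA256 digest of the given bytes."""
    return hashlib.sha256(data).digest()

base = b"hello world"
target_prefix = sha256(base)[:2]  # the first two bytes of the base message's hash

# Brute-force candidate inputs until one shares that two-byte prefix.
for nonce in count():
    candidate = b"hello world " + str(nonce).encode()
    if sha256(candidate)[:2] == target_prefix:
        break

print(base, "->", sha256(base).hex())
print(candidate, "->", sha256(candidate).hex())
# The two digests agree in their first two bytes (first four hex characters).
```

On average the loop needs around 65,536 attempts before it finds a match, which takes only a fraction of a second on an ordinary machine.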

Partial hash collisions have no significant impact on the security of a hash function. We find them useful, however, because the amount of computational effort required to find a collision depends on the number of bytes we're aiming to share between the two resulting hashes. Cryptographic hash functions are constructed to be almost entirely random, such that it's infeasible to "guess" the hash of an input without actually running it through the algorithm. From this property, we can determine the probability of finding a particular collision with some light mathematics.

Each byte of the hash output has 256 possible values, represented in hexadecimal as 0x00 to 0xFF. The probability that a given input's hash begins with a specific byte, such as 0x00, is therefore one in 256, meaning it takes 256 hash attempts on average to find such an input. Similarly, the probability of finding an input whose hash begins with a specific two-byte prefix, such as 0x0000, is 1/256 * 1/256 = 1/65536, a total of 65,536 attempts on average. For each additional byte we want to match, the number of hash attempts required is multiplied by 256. Clearly, the number of required attempts grows extremely quickly as the number of bytes to match increases.
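
The growth is easy to tabulate:

```python
# Average number of hash attempts needed to match the first n bytes of the output.
for n_bytes in range(1, 5):
    print(f"{n_bytes} byte(s): {256 ** n_bytes:,} attempts on average")

# 1 byte(s): 256 attempts on average
# 2 byte(s): 65,536 attempts on average
# 3 byte(s): 16,777,216 attempts on average
# 4 byte(s): 4,294,967,296 attempts on average
```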

Proof-of-Work asks a user to find this sort of partial collision with a given number of matching bytes. By adjusting the number of bytes to match, we can also adjust the required attempts and, as a result, the total computational power and electricity expenditure necessary to find a valid partial collision. Returning to our email use-case, one could pick the number of required matching bytes such that the associated electricity cost of computing such a collision is roughly equivalent to one US cent.
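
A rough sketch of the mechanism follows; the stamp format, the use of leading zero bytes as the target, and the function names are illustrative choices rather than HashCash's exact specification. The sender searches for a nonce whose hash meets the target, while the recipient can verify the result with a single hash:

```python
import hashlib
from itertools import count

def mint(payload, difficulty):
    """Search for a nonce such that SHA256(payload + nonce) begins with `difficulty` zero bytes."""
    for nonce in count():
        digest = hashlib.sha256(payload + str(nonce).encode()).digest()
        if digest[:difficulty] == b"\x00" * difficulty:
            return nonce

def verify(payload, nonce, difficulty):
    """Check a claimed nonce with a single hash computation."""
    digest = hashlib.sha256(payload + str(nonce).encode()).digest()
    return digest[:difficulty] == b"\x00" * difficulty

stamp = b"recipient@example.com|2014-01-01|"  # hypothetical stamp contents
nonce = mint(stamp, difficulty=2)             # roughly 65,536 attempts on average
print(verify(stamp, nonce, difficulty=2))     # True
```

Each additional byte of difficulty multiplies the expected work by 256, which is precisely the knob described above for tuning the cost of a stamp toward some target, such as one US cent of electricity.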

We can further fine-tune the number of required hashes by expanding the set of acceptable matches. For instance, one could specify that the sender should find a hash whose first byte is equal to 0x00 and whose second byte is anything less than 0x80. Since this permits 128 possible values for the second byte (0x00 through 0x7F), or half of the total possible values for that byte, the odds of finding such a hash become 1/256 * 128/256 = 1/512, an average of 512 attempts. This sits between the 256 attempts needed for a single-byte match and the 65,536 needed for an exact two-byte match, giving us much finer control over the expected cost.
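
One natural way to express this finer-grained tuning in code (a sketch, not any particular protocol's rule) is to interpret the first bytes of the digest as an integer and require it to fall below a threshold; adjusting the threshold then adjusts the expected work smoothly:

```python
import hashlib

def meets_target(digest, target):
    """Interpret the first two digest bytes as a big-endian integer and compare to the target."""
    return int.from_bytes(digest[:2], "big") < target

digest = hashlib.sha256(b"some stamp|42").digest()

# A target of 0x0080 accepts digests whose first byte is 0x00 and whose second
# byte is below 0x80: 128 acceptable values out of 65,536, or 1 chance in 512.
print(meets_target(digest, 0x0080))
```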

Proof-of-Work is valuable in the field of email spam prevention, but it also provides us with an interesting primitive. Each collision gives us information about the resources expended in order to generate the value. Next, we'll explore how this primitive was applied to digital finance systems.