WOMAN: The following contentis provided under a Creative Commons license. Your support will helpMIT Open Courseware continue to offer high qualityeducational resources for free. To make a donation or toview additional materials from hundreds ofMIT courses, visit email@example.com. NEHA NARULA: OK, solet’s get started. OK? So great. We’re here to talkabout cryptocurrency engineering and design. I think the first question thatcomes up that’s very obvious is what is a cryptocurrency? So this word was kind ofinvented 10 years ago when– I don’t know how many of youknow the origin story of where bitcoin came from, but basicallya pseudonym on the internet dropped a paper andsome open source code in a forum on an emaillist, and said, hey, I have this idea for thisthing called bitcoin.It’s kind of likeelectronic cash. Here’s how I thinkit could work, and here is some code if youwant to run it and become part of this peer-to-peer network. We don’t know whothis person is. This person has basicallyvirtually disappeared from the internetand from the world. But it’s created somethingthat has captured so many people’s imaginationsand has sort of, depending on how you measureit, created billions and billions of dollarsof economic value and inspired a lot ofpeople to think about how to use this technology to solvea myriad of different problems, not just electronic payments. So cryptocurrencies andthe technology behind them are inspiring people to thinkabout how to bank the unbanked, add more auditability andtraceability to our world, get rid of trustedintermediaries and institutions in certain situations,and basically solve every problem, if you readabout what blockchains can do on the internet.Now that’s not exactlywhat this class is about. This class is not goingto be about applications. This class is going to be abouttechnology and infrastructure. You’re going to learn how tocreate a cryptocurrency, what goes inside a cryptocurrency,what’s important, what are the techniques. And what application you chooseto apply that to down the line, that’s kind of up to you.But we’re not going to be doingdigital identity or health care records or something like that. We’re going to be talkingabout the technology. So a big question ishow are cryptocurrencies different fromregular currencies? And another thing that Iwant to make really clear is that the terms in thisspace are still being defined. So you will hearpeople throw around all sorts of terms–cryptocurrency, blockchain, consensus. And these words kind of havefloating, evolving meanings right now. Part of that is because bitcoin,the first cryptocurrency, didn’t come from academia,as far as we know.It came from a community ofenthusiasts on the internet. And so it doesn’t necessarilyhave the same basis and rigor that we might expect frommost of our academic fields of study. It’s totally OK. We’re figuring itout as we go along. And academia is reallyembracing this topic. So if any of you aregraduate students who are looking for an areain which to do research, I think basically,the number of papers published on cryptocurrenciesand blockchain technology in respected academic venuesis doubling every year. So there’s hugeopportunity here. So cryptocurrencies arenot regular currencies. They’re not $1.00 ora pound or a euro, what we normallythink of as currency. They’re something different. Bitcoin was sort ofcreated out of nowhere. And what does it mean tocreate a cryptocurrency? Who says you can createa cryptocurrency? What backs a cryptocurrency? Why is it valuable? Well, first, before weanswer that question, I just want to makeit really clear what this course is not about, OK? We are not goingto help you ICO.If you are interestedin ICO’ing, just go. That’s not what this classis going to be about. We are not going tooffer any trading advice. We have zero opinions onwhether you should buy bitcoin now or sell orwhatever, or zen cash, or whatever allthese things are. So none of that. Don’t even ask us. We’re not interested. And this class isnot really going to be about permissionedblockchains either. Now you might not knowwhat this term means yet, and that’s totallyOK, but I just want to make it clear thatwhat we’re talking about here are cryptocurrencies. They’re open permission withsystems in which there is a token which has some value. So that’s what we’re notgoing to do in the class. So going back to– and let me just pausethere for a moment. Let me pause and ask you ifthere are any questions so far about what I’ve said. Yeah. AUDIENCE: Do they alwayshave to have value? NEHA NARULA: No, not at all. And let’s startto get into that.So the question was do tokensalways have to have value? So I think, really,to understand what are cryptocurrencies, whatare tokens, what do they mean, we have to talk about money. And we have to talk about whatmoney is and what it means. So this is going tobe very hand-wavy and I’m sure not very satisfyingto a real monetary economist. But money developed– thereare a few different theories about how money developed. There is this thing calledthe coincidence of wants. So maybe I have a sheepand Tadge has some wheat. I am hungry and wouldlike to make bread. Tadge would reallylike to make a sweater. And so we canbarter, we can trade. I have one set of goodsthat is useful to Tadge. Tadge has another set ofgoods that are useful to me. We can get togetherand make an exchange. So that’s fantastic. Barter is incredibly important.Barter has existedfor a long time. But what if Tadge doesn’t havewheat, Tadge has vegetables, and I don’t want vegetables. I want wheat. But Tadge still wantsthe wool from the sheep. How do we execute this trade? We don’t have acoincidence of wants. We don’t actually want the exactsame thing from each other. So some theories are that moneyevolved out of this problem. And money can be representedin so many different ways. Money, I think, was firstcreated around 5000 BC, so it’s really,really, really old. The things thatrepresented money usually had certain properties. They were rare. They were noteasily reproducible. People, at times, used thingslike shells or beads for money.The first coins– this islike a really interesting coin that was developed. Precious metals wereoften used for money. And then eventuallywe sort of evolved into what we thinkof as money now, which is paper bills, currency. Another theory ofhow money came about is this idea of receipts,debt and credit. So maybe I have a sheep,and I shear all of my sheep and collect a lot of wool. What I can do is I canstore that wool somewhere. And I can get areceipt from someone from having stored that wool,and that receipt is of value. It entitles the person who holdsthe receipt to the good that is being stored. And so anothertheory of money is that money evolvedout of these receipts, trading these receiptsback and forth. Instead of taking allthat wool with you, you leave it in oneplace in a depository, and the receipt actsas a bearer instrument.Whoever owns it has access tothe wool in the depository. And so you can kind ofsee two different ideas about what money isdevelop from this. One is, well, it’sa bead or a coin, or something thatI hold, something physical that we’vedecided to assign value to in and of itself. And another idea is I’m goingto use a trusted institution. I’m going to deposit somethingwith that institution, and they are going to ensurethe validity of that deposit and manage who hasaccess to that deposit. So this doesn’t reallyget at the question that was originally asked, whichis why do tokens have value. But one thing I wantto point out is– well, a question I wantto ask you guys, actually, is why do thesethings have value? Does anyone have any ideas? Yes. AUDIENCE: Because everyoneagrees that they do. NEHA NARULA: Becauseeveryone agrees that they do. Any other thoughts on whythose things have value? Yeah. AUDIENCE: They’re alsobacked by institutions like the government. NEHA NARULA: They’rebacked by institutions.Say a little bitmore about that. What does that mean? AUDIENCE: So thegovernment’s kind of promising you torespect the value of that. NEHA NARULA: OK. The government’s promisingto respect the value of that. Does anyone want to add tothat or have another reason? Yes. And say your name. I’m sorry, yeah. AUDIENCE: Jared Thompson. In the example of thedollar, the government is willing to accept itas payment for taxes. NEHA NARULA: Payment for taxes. OK. So that kind of connectsthe government thing. AUDIENCE: Even if it hadno value of any other sort, it has value in that sense. It’s the last thingthat holds up its value.NEHA NARULA: OK, great. Anybody else? Yes. AUDIENCE: I’m Paul. I think those the three onthe front of the dollar, those have inherentvalue because they might be more rare. NEHA NARULA: They havevalue because they’re rare. OK. Interesting. All right. So those are all reallyinteresting ideas. I think that those areall sort of properties of what makes things valuable. There are definitely things thatare rare that are not valuable, right? I can think of some things thatmight be extraordinarily rare. There’s only one or twoof them in the universe, and you would have no interestin owning them whatsoever. You wouldn’t assignvalue to them. Certainly it’s really importantthat you can pay taxes with this stuff because taxesis pretty much a requirement of living in any country.There are things that havevalue that you don’t necessarily use for taxes. So that’s a little confusing. And then there’s thisidea that it’s backed, that it’s backed by something. And the dollar used tobe backed by something. And actually, ifyou look at $1.00, I think it stillsays this, right? It’s backed by the full faithand credit of the United States government. TADGE DRYJA: Theydon’t say that anymore. NEHA NARULA: Theydon’t say that anymore? They used to say that.But that’s what a lot ofpeople say about money. It’s backed by the full faithand credit of the United States government. What does that really mean? I think what itall goes back to is these things arevaluable because we think they’re valuable. We’ve all decidedthey’re valuable. And you know that if you have a$1.00 bill and you want to buy something from someone,they’re going to take it, that you can make that exchange. And the reason thatthey’re going to take it is because they knowthat someone else is going to take it.These things hold value becausewe think that they hold value. It’s a collectivestory that we all tell. So I think once youlook at money that way, then when you start to look attokens, which are essentially digital representationsof these things, things that are rare and alittle bit special, then when you ask, well, whydoes this token have value, because we think it has value. So what makes a tokeninherently valuable? The fact that wethink it’s valuable. And a lot of differentthings can go into that. Maybe we think it’s valuablebecause it’s very rare. Or maybe we think it’svaluable because someone’s promised that youcan use it to pay for storage, like with Dropbox. Or maybe we think it’s valuablefor a completely different reason, becausewe like the name, or we like the people whoare running the network.But ultimatelytokens are valuable. These digitalrepresentations are valuable because wethink they’re valuable. Yes. AUDIENCE: And also becausethey’re a limited amount. NEHA NARULA: Name. AUDIENCE: [INAUDIBLE]. Because they’rea limited amount. NEHA NARULA: Well,so my argument is that the factthat they’re limited is something that goesinto our perception that makes it valuable. Great. OK. So now that we’ve learneda little bit about money, talked a littlebit about money, I want to go into how paymentswork because ultimately, we’re going to get tocryptocurrencies. And cryptocurrenciesare electronic cash. So here’s the way that digitalpayments kind of work right now. You have an institutioncalled a bank. You have Alice and youhave Bob, and Alice and Bob have accounts at this bank. And so the bank is keepingtrack of who owns what. And these are these are records. These might be digital records. They might be paperrecords, whatever the bank is usingto keep track of who has what in their account. And so the way that I’ve setup this example right now, Alice and Bob bothhave bank accounts.Alice has $10.00 with the bankand Bob does not have any money with the bank. So let’s say thatAlice wants to pay Bob. Let’s say that Alice andBob have gotten together. Maybe they’re in thesame coffee shop. And Alice wants to buya sandwich from Bob. And Bob says, OK, youneed to pay me $1.00. If you give me $1.00, thenI’ll give you the sandwich. So how can Alice do this? How can she transfer$1.00 to Bob? Well, if she had a paperdollar, she could just do that. But let’s say that shedoesn’t have a paper dollar. So Alice can ask the bank tomake this transfer for her–or $5.00. So Alice sends amessage to the bank and authenticates withthe bank to show the bank that she is, in fact,Alice, but I’m not going to go into thedetails on how that works. And then the bank confirmsthat, makes the transfer in its ledger, says Alice now has$5.00 and Bob now has $5.00.Alice tells Bob,hey, I did this. I talked to the bank. Go check. You can verify it for yourself. Bob checks with the bankand sees, yes, in fact, the bank is sayingthat he has $5.00 now, whereas before he had zero. And then Bob gives Alice thesandwich because he believes that he now has $5.00. And the bank sort ofpreserved the property that money was not created outof nowhere, that the balance was ultimately maintained. So the bank is veryimportant in this scenario. The bank is critical. This is how digitalpayments work. Credit cards, Venmo,banks, kind of all sort of based on the same idea,that there’s some trusted institution that ishandling that payment for us and that is keepingtrack of everything.Now what are the pros andcons of this scenario? Anyone want tothrow a couple out? Yeah. AUDIENCE: The bankcan get hacked and people could move moneyaround between the accounts. NEHA NARULA: Right. So we’re putting a lotof trust in this bank. And maybe shouldwe trust the bank? Banks fail sometimes. Banks are hacked. Banks have humanswho are running them who occasionallymight want to change those balances in their favor.This has all happened. Anything else? Yeah. And say your name. AUDIENCE: Brittany. If it’s urgent, sometimesyou might run into a delay or it might take timewith the process. NEHA NARULA: Yeah. Alice has to talk to the banks,and that’s kind of annoying. So there’s that. Anything else? Yeah. AUDIENCE: And ifeveryone can actually withdraw at the same time,then the bank can actually get money into the system. NEHA NARULA: So OK. So this is getting a littlebit more advanced here. What if everyone takes theirbalances out at the same time? Well, we need to make surethat the bank actually has that money, so to speak. We’re not going to be talkingabout that problem right now. But very good problem. So to kind of talkthrough some of the pros and cons of this situation,one of the big pros, I think, is, that even ifAlice and Bob are not in the same physical location,Alice can still pay Bob if they can talk to the bank.So it’s pretty cool,and that’s something you can’t do with dollar billsor with coins or with bars of gold. So having thistrusted institution that you can communicatewith electronically means that Alice andBob could be halfway around the world from eachother and they can still pay each other. So that’s pretty awesome, andthat is definitely a property that we want to have.In terms of cons, I think wecovered quite a few of them, which is we’re reallyputting this bank kind of in the middleof everything here. And there are a few differentways that can cause us trouble. So the bank needs to be onlineduring every transaction. If the bank is offline,then how does Bob know whether he got paid or not? The bank could fail atsome point in time, which is kind of related to that. The bank could simplydecide that they don’t want to do this anymoreand can block transactions. And then privacy. The bank has kind ofinsight into everyone and their payments. And this is incrediblysensitive information. Payments are quite important. And we’re going to betalking about privacy a lot in this class, duringthe second half of this class.So just an example, a coupleof visual examples of that. The bank could justtotally go away, and then what happensto that ledger? Who knows, right? I mean, literally, itcould just disappear. Maybe it’s paperand it gets burnt, or maybe it’s bits on a computerand it wasn’t replicated. The bank could decidethat they don’t like Alice for somereason, and that they don’t feel like processingAlice’s transactions.This happens all thetime in the real world. So there have been designsfor electronic cash that work a littlebit differently. And we’re goingto kind of step up to the design that cameright before bitcoin, and we’re going todo that iteratively. So let’s talk about e-cashand how e-cash works. So the way thate-cash works is Alice tells the bank–instead of saying, hey, bank, do thistransfer for me, Alice says, hey, I wouldlike a digital representation of a coin.Can you give mesomething that is digital so I don’t have to be in thesame physical place as you, and that I can use in such away that I can prove to someone else that I have this thing andthat I haven’t double spent it, because that’s the problemwith digital representations of coins. A fundamental problem isthat bits can be copied. So whatever system you use todesign your electronic cash, you need to make surethat people can’t just copy coins and give what is thesame coin to multiple people. In the previousexample, the bank was making sure this happened. The bank wasmaintaining balances and debiting Alice’s accountand crediting Bob’s account. But if we want to think aboutsomething that doesn’t involve the bank, and we’restarting to get there, then we need to think about howto ensure that a coin can’t be what is known as double spent.So Alice asks the bank for coin. And maybe she has an accountwith a bank like before. Or maybe she gives the bankteller actual physical money in order to getone of these coins. So the bank generatesa unique number– SN stands for serial number– and decides that this isthe digital representation of the coin. The bank then gives thatcoin to Alice in a way that it’s clear thatthe bank did this. Usually this is doneusing a technique called digital signatures. We’re going to get tothat as class progresses, but not right now. Once Alice has this coin,then she can give it to Bob. And Bob can take alook at this coin, and hopefully there’s enoughgoing on with this coin that Bob can be convincedthat this is a real coin.Alice didn’t make itup out of nowhere. She actually had the funds,so to speak, to give to Bob, and that it hasn’tbeen double spent. And once Bob isconvinced of that, he can give Alice the sandwich. Now in traditional e-cash,the way that this is done is Bob actually goesback to the bank and says, here’s this coin. Alice just gave me this coin. Is this an OK coin? But the fact of the matter isthat the bank, in this case, has a serial numberand knows that it gave that uniqueserial number to Alice, and then Bob is showingup with a coin that is that serial number. And what the bank is doinghere, in this example, is the way that thebank checks to make sure that this coin is correct isit looks at the serial number, and it makes sure that ithasn’t been spent before. So the bank can link thecoin between Alice and Bob, which is unfortunate.The also still sortof has to be online, not to do the actual paymentbetween Alice and Bob, but in order for Bobto have confidence that this coin is real. And later on Bob can say, Iwould like $1.00 for this coin that I’ve just given you,or something like that. Or Bob can have anaccount with the bank and can maintaina balance there. So just to go through someof the pros and cons here, OK, we’ve kind of donesomething where the bank’s not in the middle, except the bankis still really in the middle. We’re getting a stepcloser, but we’re not there.Alice can technically giveBob this electronic thing that represents value,but Bob still needs to talk to the bankto make sure it’s real and it hasn’t been double spent. And we still have this problemwhere the bank is the one who’s minting these things. The bank can decidenot to give Alice a coin if it feels like it. And we still havethis privacy problem because the secret number,the serial number that we invent for the coin, can belinked across these payments. So there’s thisnotion of something called Chaumian e-cash. So David Chaumianis a cryptographer, and he developedthis system which has slightly nicer propertiesthan previous forms of e-cash.So the idea here,which is really key, is instead of the bankchoosing the secret number, Alice chooses the secret number. And we have ways ofgenerating random numbers that we can be fairlysure are unique. So we can let everybody generatetheir own random numbers. So in Chaumian e-cash, Alicechooses the secret number that represents a coin. And then Aliceblinds her message. So Alice adds some randomnessto the secret number such that the bank doesn’t knowwhat that number actually is. And we’ll get into more detailabout exactly what that means. It’s all in the paperthat was assigned reading for thisclass, so make sure that you take a look at it. So when the bank verifies thatthe secret number is a real secret number andit’s really a coin, and Alice gave the bank$1.00 or something like that, the bank does so on theblinded secret number. And Alice actuallyhas the ability to remove that randomness,or that blinding, later and end up with a validsignature on a secret number.So Alice does the samething that she did before. She gives Bob a representationof that electronic coin. And when Bob redeems it,note that the bank never sees what the number is,so when Bob redeems it, the bank has no way of linkingthe payment between Alice and Bob. So just to get into howthis works visually, Alice will talk tothe bank, and Alice will use a blinding factoron the secret number. And so when Alicetalked to the bank, the bank doesn’t actually seewhat the secret number is. They can’t decode it. Again, Alice gives $1.00 orsomething like that to get this coin from the bank.And the bank signs this. Alice can remove theblinding factor later. And this is what the coin is. The coin is a validbank signature on the secret number, andalso the number itself, which Alice canthen send to Bob. Bob can check and make surethat this is a valid signature from the bank. And if that’s correct, thenBob can give Alice a sandwich. In order to redeem this, Bobgives this coin to the bank. The bank says, OK, I’ve neverseen the secret number before, and you have my signature on it. So I’m going to assume thatI went through this process with somebody andsigned something.And now I’m going torecord that secret number. Once that happens,Bob can be sure that this coin hasn’tbeen spent before. The bank keeps a running listof all the secret numbers it’s seen, and it makes surethat if it ever sees one again, it can say no, thisis not correct. I should never see a secretnumber more than once. Now, OK. But know what about Alicecould give one version of that to Bob. Alice could also give aversion of that to Charlie. And how are Charlieand Bob supposed to know whose coin is correct? Because remember, we wantedto try to get the bank out of the way when doing this. And so in Chaumiane-cash, the way that this works isthe bank actually keeps a bit more information. And the informationthat the bank is keeping won’t let the bank linkthese transactions together unless Alice happens togive this to two people. And so if Alice gives the samecoin to two different people, the bank will beable to detect it and the bank will be ableto know it was Alice.And so this is kind ofa motivator for Alice not to do that. So the idea being here is thatthe way that we get around the fact that we don’t know ifa coin has been double spent or not is we add punishmentif the coin is double spent. So Bob doesn’t know for surethat this coin he receives hasn’t been double spent, buthe does know that if it was, someone’s going toknow it was Alice, and they’re going to punish her.So this is a prettyclever scheme. And this actually gets usaround a lot of problems. We have digital payments. We can make the actual transferwithout the bank in the middle. We have some privacy nowbecause the bank can’t link transactions together. And we have this way ofdoing double spend detection. We have a way ofmotivating people not to double spend their coins,which means that you probably don’t have to checkat the time you receive a coin whether ornot it’s been double spent. Of course, this still suffersfrom a really big problem, which is that a bank canstill decide that they just don’t want to do this with you. They can just decidethat they don’t want to play this game with you. They don’t want to issue coins.Maybe they don’t likeyou specifically. Maybe they don’t want to takeyour coins and exchange them. So this scheme, Chaumian e-cash,solves quite a bit of problems when it comes to how dowe have electronic money with some nice features,but it doesn’t quite get to all of them. And so the realquestion in this class is how do we doelectronic money, really, in a peer-to-peerway, where there’s no institution in the way. There’s no sort ofentity that can say no. TADGE DRYJA: So e-cash, themath is really interesting. It kept relying on thesebanks and so it never quite got off the ground. So I’m willing to talk aboutsomewhat more abstract and low level primitives.I’m not going to quite get intocash or tokens or transfers or anything this lecture. But I’m going to talk aboutthe really basic primitives that you need that wealready sort of mentioned, hash functions and signatures. Signatures, obviously, wetalked about a little bit, what you need to be ableto sign messages in order to send these tokens around. But first I’ll talk about hashfunctions, which are basically the most fundamental basicthing we use in these systems. And I think if you’veused computers, or if you’veprogrammed a little, you probably havesome familiarity with hash functions. They’re simple, but they’reactually extremely powerful. The hash functionis basically you have some data,a bunch of bytes, a bunch of ones and zeros. You run it througha hash function and you get an output that’salso a bunch of ones and zeros. Generally, the inputdata can be of any size. You can hash something– put in a megabyte,put in a gigabyte, or put in a single byte,and generally the output is of a fixed size.So in the case ofbitcoin, we use Sha-256. The output size is 32 byteslong, or 256 bits long. And this is used for lotsof things in computers. I guess the reason they callit a hash is because it’s like when you take thepotatoes and chop them up into little squares andgrill them for breakfast, it’s sort of that idea,that we’re taking this data. And the data going in getschopped up and smushed around and then comes outinto an output. So this is not a sufficientdifferent definition. But I will say that youcan sort of do everything with hash functions. There’s some fun thingsthat you can’t do, but you could make acryptocurrency only using a single hash function. And I think people have, sortof for experimental reasons. You limit the fun stuff you cando, but you can do signatures. You can do encryption. You can do all sortsof things like that. OK. So this is not asufficient definition, that there’s any sizeinput, a fixed size output, and the output israndom-looking.That’s sort of wishy-washy. But what doesrandom-looking mean? It’s not actually random. If you put in thesame input, everyone will get the same output. So if you say, OK, well,what’s the hash of one, you’ll get some output. And if someone else says,OK, what’s the hash of one, you’ll get the same thing. However, the output,while it is deterministic, it’s sort of highentropy in that the output should have about asmany as one bits as zero bits. If you take thehash of one, it’s just going to look likea big random number. And the hash of two will looklike a completely unrelated random number. The outputs look like noise. So if you’ve everseen hash functions, you can run it on your computer. You say echo. Hello, pipe Sha-256sum, and you’ll just get some kind ofcrazy, random thing.There doesn’t seem to beany order to the outputs. A little bit more well-defined. We usually talk aboutthe avalanche effect, in that changing asingle bit in the input should change about halfthe bits of the output. So even though you haveextremely similar inputs, they should be completelydissimilar outputs– well, completelydissimilar, as in about half the output changed. If every bitchanges, then it just is the inverse of what you had,and so it’s easily correlated. But the avalancheeffect is sort of how hash functions are constructed,where generally they’re iterative rounds. And so you say, OK, I’mgoing to swap these things and multiply these things andshift these bits around such that if any changein the beginning will sort of propagatean avalanche, too, so that all the output bitshave been affected by it. OK. And a little bitmore well-defined. Generally, the hashfunctions are defined by what they should not do. So the three main thingsthey should have– preimage resistance,second preimage resistance, which I’ll sort of skip over,and collision resistance.And we can definewhat these things are. So a preimage is the thingthat came before the output. So it’s sort of a math-y term. But the idea is OK, if you knowy, you can’t find any x such that the hash ofx is equal to y. So if I give you a hash output,and that’s all I give you, you should not be ableto find an input that leads to that output.So if I just say, hey,here’s a hash output. It’s 35021FF– whatever,some long string, you won’t be ableto figure out what I used to put in to get that. Of course, you canfind it eventually. For any given y,there’s probably some x. In fact, there’s probablya lot of x’s that will lead to that y. Since y is a fixedlength and there’s two to the 256 possibley’s, but there’s an infinite number ofx’s because x is not bounded in length. You can have a megabyte or agigabyte or a terabyte size x. So since there are sort ofinfinite numbers of x’s, and a fixed, though verylarge number of y’s, as long as it is a randommapping, there will be lots of different x’sthat can lead to this y. And so you shouldbe able to find it.It’s just impractical. It’s like, yeah, youmay be able to find it, but it’s going to takeyou two to the 256 tries to find any specific y value. And that’s about10 to the 78, which is a number that’s big enoughthat you can sort of round it up to infinity. Well, I mean, notquite, but big enough that you’re not goingto be able to compute that, the sun’ll burnoutand the universe’ll die and stuff like that. So that’s preimage resistance. You can’t go backwards. Given the hash, you can’tfind what led to that. OK. Any questions aboutpreimage resistance? Seems reasonable? It’s a little interestingin that given y is a little tricky, andthat it’s like, OK, well, someone might know x in orderfor them to have computed y.Or maybe it’s justcompletely random, and no one actuallyknows what the x is. So there’s a sort ofloss of information in the idea of a preimage stack. OK. Second preimage resistance. This one’s a littletrickier and can get messy. So I’ll define it, but wewon’t go into it too much. The idea is given x andy such that the hash of x is equal to y, you can’t findx prime where x prime is not equal to x. And the hash of xprime is equal to y. So we’re sort ofgiving you a preimage. We’re saying, hey, here’sthis number x and here’s this result y. I bet you can’t findanother x that leads to it. This one is actually poorlydefined in the literature. And so it’s a littlelike, well, who made x, and who gets tochoose, and is it any x prime and things like that.So it’s not actuallythat useful. So we can just sort ofgloss over that one, just sort of mentioning it. And then the other one that’svery important is collision resistance, where the idea isthat nobody can find any x,z pair such that xis not equal to z, but the hash of x isequal to the hash of z. And this one’s alot cleaner in that there’s no lack of information. There’s no secrets or anything. It’s just like, look,no one can find this. And so it’s reallyeasy to disprove. You can just say, hey, look,here’s an x and here’s a z. Try hashing them. Oh, shoot, the hashes are equal. And it doesn’t reallymatter how you got these or who’s doing it.So that’s a really nice,easy, clear property. And again, you canfind this eventually. So if your outputsize is 256 bits long, you’ll be able tofind two inputs that map to the same output. In fact, you do notneed to try 256 times. I’m not going to gointo the details, but you actually onlyhave to try 128 times. Sorry, two to the 128. So you need to take the squareroot of the number of attempts in order to find this collisionbecause the intuitive reason is, well, you just starttrying things and keeping track of all their hashes.And there’s what’s calledthe birthday attack, which, as you keep trying them,there’s more possibilities. The next thing youtry, you can collide with any of these thingsyou’ve tried before. And so you actually onlyhave to do the square root. And it’s calledthe birthday attack because there’s the birthdayparadox, which is not really a paradox, but the ideais so in this room, there’s people thathave the same birthday. It’s almost certain,which seems kind of weird because the intuitive thing is,like, well, there’s 365 days a year. Maybe once you get 160,170 people in a room, you’re going to have twopeople with the same birthday.But actually, it’slike 22 or something– anyway, that it becomeslikely that people have the same birthday. So it’s kind ofcounterintuitive, and it applies inthis case as well. So to find a collision,you need the square root of the output space. But a hash function shouldnot have collisions. If you can find acollision, if any collision exists for thishash function, you can consider thehash function broken. It’s a little bit differentthan preimage resistance because it’s hard todefinitively prove that you’ve broken preimages. That’s something of aninteractive process where you say, hey, here’s a y,and then someone comes back with an x, andyou’re like, oh, OK, you prove to me thatyou can find preimages. But that’s hard to tellto the rest of the world because it was sortof interactive, whereas collisions are veryclear and non-interactive. You can just say, hey,here’s an x and here’s a z. Anyone can verify these. Didn’t really matterhow you got it. OK. So some practical, howdo these functions work. Practically speaking,the collision resistance is a harder property.So there are many functionswhere the collision resistance has been broken where thepreimage resistance has not been broken. So examples are Sha-1 and MD5. MD5’s a fairly old one writtenby Ron Rivest over at– well, I guess it wasn’tat the Stata Center because it was in the ’80s. But this was message digest 5. I guess there wereseveral before that. And that is quite broken. You shouldn’t use it.Its collision resistanceis trivially broken. You can find collisions in undera second on a modern computer. Sha-1 happened later, inthe late ’90s, I think, and NSA made it. And there have beencollisions found. I think there’s reallyonly one collision that’s been found, basically,by a team at Google and some Italianuniversity last year. And they spent a lot of computertime to find this collision.But they did find it. And then once you find one,it’s sort of like, oh, yeah, we really shouldn’tuse this anymore. But in both of thesecases, sha-1 and MD5, there’s no feasiblepreimage attack. So given a hash outputfor either of these, you can’t find whatthe input was, or you can’t find a different input. So generally, it’s a loteasier to make a function strong against preimages. Collisions is sort ofharder to deal with. Also, practically speaking, howdo these hash functions work? It’s a little bitof black magic. There’s no proofs that ahash function can even exist. So if you could prove thatthere is a one-way function, you get the Fields Medal, right? It’s like a milliondollar prize. So if you can proveit there is such a thing as a hashfunction, you will be a super famous mathematician.We have no idea that this iseven mathematically possible. Or maybe the universedoesn’t work this way. It seems to, though. It seems like thereare these things that work like hash functions, thatwork like one-way functions, but we have no proof of that. So even the most fundamentalpart that everything hinges on, we don’t even know if it exists. And then this is sortof closely related, if you’re in thecomputer science-y stuff, like p and mp– anyway, so we don’t knowthat these actually work. And also, in practice,hash functions are not nice math, coolthings like elliptic curves and RSA, prime numbersand stuff like that. They’re really, if you look atthe code, it’s sort of like, well, I’m going totake these bytes and I’m going to swap them. And then I’m going toadd these two numbers, and then I’m going torotate the bits over here, and then I’m going tox over these things.And then I’m goingto do that 50 times. And why 50? Well, it seems like50 is a good number. It’s not too slow. No, really. It’s sort of black magic,Sha-256 uses 64 rounds. Nice even number. Different functions likeBlake 2B uses 20 rounds. But then there’s alsoa version that uses 12 rounds, which is faster. And people think, well, it’sstill seems quite secure. But if you want tobe really secure, use the 20-round variant. If you want to beprobably secure enough, use the 12-round variant. So there’s no proofs. There’s heuristics and thingslike that, and best practices. But this kind of cryptographyis a little bit of black magic. And it’s not based on any coolmathematical number theory stuff, either, the waythat elliptic curve cryptography or RSA stuff is.So if you breakRSA, you can say, hey, I can now factorthese composite numbers very quickly, that’s,in and of itself, a cool mathematical discovery. The breaking of Sha-1,there’s not really any cool math insight. It was just like, yeah, wefound this fairly specific, weird path that we wereable to break Sha-1 after a couple ofyears of computer. So it’s cool, and somepeople are super into it. But it’s something of a niche toactually build hash functions. I would recommend not buildingyour own hash function. Yes. AUDIENCE: I’m Wayne,and my question is, is breaking a hash functionliterally just guess and check, or is there moreof a method to it? TADGE DRYJA: So no. If you say, hey, Ifound a collision by doing two tothe 128 attempts. One, nobody’s done twoto the 128 attempts. That’s still seen as likebeyond technology today. But if that’s how you breakthe function, that’s not really considered a break becausethat’s sort of the definition, is yeah, well, we knowthis is 256 bits long. So to find a preimage, if youdo two to the 256 attempts, you’ll find it.So that’s notconsidered a break. A break is considered, hey,I found a preimage in two to the 240 attempts. Or I have a proofthat you will be able to find a preimage intwo to the 240 attempts, and here’s how to do it. And that’s considered a break. It’s still impractical. Two to the 240’s stillimpossible in today’s technology. But if you had a paperand people looked at it, like, oh, yeah, that would work,you wouldn’t be able to do it. But that’s stillconsidered broken. And so something likeMD5, MD5 output size was 16 bytes or 128 bits. So collisions, evenif it were strong, it would still betoo short today that collisions would be ableto be found in two to the 64 iterations, which is doableon today’s computers. If you run a bunch of stuff onAWS, you can do two to the 64 in a couple of days.But that’s thedifferent definitions of breaking the function. Sort of fun. Ethan Hellman,who’s at BU and we work with, he– and we all brokethe IOTA wrote their own hash function, which is likesome cryptocurrency. And we found collisions in it. And it was kind of fun. But yeah, it was weird. It wasn’t like number theory. It was just like, oh, well,I wrote this Python script and we have this go script,and we tried this thing and we got a collision. So it was kind of fun. So usages.What do you usethese hashes for? There’s lots of cool thingsyou can use them for. use them sort of asnames or references, where instead of naminga file, you can just take the hash of a file. And that is a good, compactrepresentation so you can point to what you’re talking about . So the hash of a file isa unique representation. And if you changeany bit in that file, the hash will change. And so you know that, OK, here’sthis way to point to a file. You can also use it as sortof a reference or pointer in different algorithms. So you can say, anythingyou’re using pointers for, linked lists or maps andstuff like that, you can say, well, I’m going to usea hash as a pointer and then be able to sortthrough it that way.So anytime you think of pointersand graph theory and stuff like that in computerscience, think, well, could I use a hash functionhere instead of just like regular memory pointer? And in many cases, you can. In some cases, you can’t. So you can’t have cycles. So the idea is youcan’t find preimages, you won’t be able to find a– whereas you could make a cycleof pointers in a computer, where A points toB, B points to C, C points back toA.You shouldn’t be able to produce thatwith hash functions because having that cyclemeans, OK, well, somehow you found this preimage. But in many cases,you can do this. And another way to look at itis the hash is a commitment. You can say, well, I’m notgoing to tell you what x is, but I’ll tell you what y is,and I can reveal x later. And then, sinceeveryone remembers y, they can be sure that yeah,he’s revealing the right thing.There are no collisionsin this function, so we can be sure,if we’re presented with x, that this was the xthat was committed to yesterday. So I’ll give a little exampleof that, of commit and reveal. So you can commit to somekind of secret or something you want to reveal laterand reveal the preimage. So here’s my commitment. This is an actual hash, Sha-256. I just made it on my computer. And there is a string. There’s an Ascii stringthat maps into this, and it is a predictionabout the weather, but that’s all I’ll say.And given that informationand given this hash, you probably can’tfind my prediction. You can try to try all thesedifferent Ascii strings about the weather today,but I’ll reveal it. So I think it won’tsnow Wednesday. But I think itactually– anyway, and then I put this number in. And so if you put this inyour computer in Linux– I think in Mac it’s aslightly different command. It’s like Sha-2 or something. But in Linux, this willwork, and you can say, I think it won’t snow Wednesday. And then I put somerandom numbers here because if I had committedto just the phrase, I think it won’tsnow Wednesday, you might have beenable to guess that. You could say, well, he saidit was about the weather. I’m going to takeall sorts of millions of different stringsrelated to days and weather andcommon English words, and I’m going totry hashing them and see if I find a collision.And you might be able to. But I added this fourbytes of randomness at the end to makethat difficult. It doesn’t reallycontribute to my commitment. And you know this doesn’treally mean anything. But it makes it harder toguess what my input was because I’ve alreadyrevealed that it’s not a fully random input. So you might be ableto guess things. So I could say, hey,I’m going to make a prediction about theweather, commit to it, and then reveal myprediction tomorrow. And we’ll see if I was right. This can be usefulin the case where– not the weather, butin other things– if knowing my prediction couldinfluence the actual events, this would be a nice way tocommit to what my prediction is without everyone knowingwhat the prediction is and then revealingit the next day.Yes. AUDIENCE: What are the usecases for double hashing, like where you wouldhash that hash? TADGE DRYJA: Hashing this again? Well, so in bitcoin theyhash everything twice. Generally, you don’t need to. There’s no explanation forwhy they do that in bitcoin. You could. But there are thingsyou can construct where you can, say,append some extra data and then hash this again. So you can say, here’s myprediction for next week. And this is the hash,and then hash it again. So you can makechains of commitments and then revealiterations of it.Actually, I hadsome slides where you can sort of hashsomething again and again, and start revealingit incrementally. That might be useful. I actually have stufflike that in software. I’ve written where youwant to reveal secrets. But let’s say I wantto reveal secrets, but I don’t want everyone tohave to store all of them. So I can make a chain ofhashes, commit to the last one, and then as I revealsuccessive preimages, you don’t have tostore all of them.You can just storethe latest preimage, and you can reconstructall the hashes from that. Yes. AUDIENCE: But isit computationally difficult to run double hashes? TADGE DRYJA: So to evaluate– if you want to try this,it’s imperceptible. To perform oneSha-256 hash is, I don’t know, a billionthof a second or something. You can generally do,like, 100 megabytes to a gigabyte of hashoutput on a regular CPU. NEHA NARULA: Ithink she’s asking does it make it harder to finda preimage if you hash twice, and the answer’s no. TADGE DRYJA: Theanswer’s sort of no. It might. So I don’t know, chained MD5,can you still find collisions? I’m not sure. But generally the thinking is,if the hash function is broken, and you can either findcollisions or preimages, yeah, maybe it gets a littleharder by iterating it. But you shouldjust stop using it and use something that’s secure.But yeah, it seemsthat finding preimages would be harder since it’sessentially adding more rounds by hashing it twice. And then there are some attacks,so it’s fairly out there. But it’s called lengthextension attacks due to how hash functionsare constructed, where if you do say, OK,I’m going to take the hash and then take thehash of that, you do prevent certain types ofattacks that are fairly niche.But a lengthextinction attack in a Merkle-Damgard constructionwill be prevented by this. So generally, no. Generally, you don’tneed to do this. But there are differentconstructions where you’re going to hash a bunch of times. I don’t have the slideshere but, like a Merkle tree is a binary tree ofhashes where you’re taking the hashesof these things and then hashingit again and again, and that’s a reallyuseful data structure. And a blockchain isessentially a chain of hashes. And that’s what we’lltalk about next week.But yeah. OK. So I’m going to goa little faster. So that’s an interestinguse case where you can commit and reveal. And yeah, adding randomness soyou can’t guess the preimage. This is called a hash-basedmessage authentication code where part of it issecret and part of it is not. And this is getting towards asignature, where I’ve committed to something, andthen I reveal it, and everyone knows,yeah, that must be what he committed to the day before.It’s not quite a signature, butit’s getting to that direction. And so next I’m going totalk about signatures. What is a signature? It’s useful, and it’s amessage signed by someone. And so I’ll definewhat a signature is through thefunctions that it uses. There’s threefunctions will allow you to create asignature scheme, generate keys, sign, and verify. And these different things. Generate keys, you make asecret key and a public key. And so the idea is there’ssome public key which is your identity , and there’ssome secret key which you only control. And you use that toprove your identity and prove that thesemessages are signed by you. So yeah, yougenerate a key pair. The holder of the secretkey can sign a message. And then anyonepossessing a public key can verify a messagesignature pair. So I’ll go into detailon these three functions. And this applies generally.So I’m going to talk about ahash-based signature in detail, but there are manydifferent signature schemes. DSA, ElGamal, RSA signatures,elliptic curve signatures. There’s tons of differentcool math systems that allow these kinds of functions. And I’ll talkabout in some ways, this is one ofthe simplest ones. So yeah, there’sthese three functions. The first one is generate keys. And it returns a privatekey public key pair. And it generally doesn’ttake any arguments, but it takes in randomness. You need to flip coins. You need to find randomone and zero bits. And it has to be longenough that no one else can guess what your private key is. So you have a privatekey, public key. Public key is public. You tell everyone.Private key is more secret key. Actually, I think in thecode, I always say secret key. It’s usually better to saysecret key because at least it starts with aletter that’s not p. OK. And then the signingfunction, where you take your secretkey and your message, and it signs a messageand returns a signature. All these things are juststrings of ones and zeros. It’s just a bunch of bytes. Public key, a private key,a signature, a message. These are all just bytes. And then the verify function,which is the most complex. A verify function takes apublic key that you’ve seen, a message, and a signature. And it returns a Booleanwhether this was valid or not. So it returns a single bit. If it’s zero, it says,yeah, these two things don’t match up. Maybe the message justchanged, or maybe the signature has changed, or maybe it’sfrom a different public key or something. But if all three ofthese are correct, and the signing functionwas the private key– the secret key associatedwith this public key– was signed to this messageand produce this signature, then it will return true.And so you get intothe math properties of what does it meanto forge a signature, and can they be forgeablecomputationally? Eventually a lot of thesethings, since it’s bits, you could eventuallyguess the forgery. But maybe that takes two tothe 256 attempts or something. OK. So any questions about thebasic structure of what constitutes a signature scheme? Mostly make sense? And you can seehow this is useful. You can publish a publickey and say, hey, I’m Tadge. This is my public key. And in fact, on my businesscard, I have a RSA public key. And so if people get my businesscard and then I sign a message and email it to them, theycould be sure that, oh, this is probably the same guy. Nobody ever cares. But it’s useful for thestuff we were talking about before with Chaumian cash, whereAlice needs to authenticate to the bank, and one way todo it is to sign a message and say, hey, I’mAlice, give me a coin.And then Alice can sign amessage to Bob and so on. So this is really useful as abasic building block for all these kinds of messages. So I’ll talk inthe last 14 minutes about signatures from hashes. This is doable. Using just hash functions, youcan construct a signatures key. And in fact, that’sthe first problem set.And you implement a signaturesystem using only hashes. And the hash function isalready defined for you. It’s in the standard library. It’s just Sha-256, thesame thing bitcoin uses. And this is calledLamport signatures. Leslie Lamport wroteabout this late ’70s. I forget exactly whenthe paper came out. But this was one of theearliest cryptographic signature schemes. And it’s kind of cool. And another fun thing isit’s quantum resistant.So if you know aboutquantum computers, quantum computers kindof ruin all the fun in terms of cryptography. All the cool things we can dowith cryptography– not all, but most of them getruined by quantum. Computers but hashfunctions are quite resistant to quantumcomputers because they’re not based on any fun math. They’re based onthis black magic of just XORing andshifting numbers around. That’s a hugeoversimplification. But yeah, so thosehash functions are generally seen tobe quantum-resistance. So if you have asignature scheme that only uses hashfunctions, well, it still works, even if someoneinvents a quantum computer and can break allthese other things, like RSA and elliptic curves. So there’s actuallyrenewed interest in these kinds ofsystems recently.OK. So how do you make a signaturescene with just hash functions? So how do you generatea key, in this case? So a public key and a privatekey you want to generate. So first we generateour private key. Now these squaresare 32 bytes each, and you generate 256of them on this row, 256 of them on that row. So you’re generating 256 timestwo, or 512 32-byte blocks. And these blocks are each256 bits or 32 bytes. So in total, that’s what, 8K? Eight kilobytes, I think. Pretty big. But anyway, you’re saying,OK, here’s my private key. It’s all completely random. I just take slash devslash urandom or whatever, just flip coins 8,000 times,or however many this is total, and generate allthese different blocks and store them on my harddrive and keep it secret.Then I want to generatethe public key. So for each of these32-byte blocks, I take the hash of it,which will also be 32 bytes. So there’s now 512 hashes, 256on this row, 256 on this row. The green will be my public key. And the gray oneis my secret key. So they all look the same. They all look like just abunch of random ones and zeros.The gray ones actually are abunch of random ones and zeros. The green ones areactually hashes, though, of all the gray ones. And I publish the green ones. Just to serialize it,I just put in a row. I say, OK, here’s this first32-bit, second, third, fourth, and then go to this row orwhatever scheme you want.So how is this useful? Now everyone knowsa bunch of hashes, and I know a bunchof the preimages. So now it’s sort of thiscommit reveal thing, where if I reveal to you this,you can verify that, oh, yeah, that mapped tothis one later on. Any questions so farabout this process? Seems sort of useless butfairly straightforward. OK. Then I want to sign. So first, to sign amessage, I’m going to take the hash ofthe message to sign. And this is often done.It’s done in bitcoin. It’s done in most signatureschemes, where I want a fixed length number to sign. It’s annoying tohave to say, well, what if I want to signa megabyte long file, or what if I want to signof 10-byte long string? You want to standardize it. So whatever I’m signing,it’s always 256 bits long. So if I want to just signthe message hi, first I take the hash of the messagehi, which in Sha-256, this is the hash of hi. And so I look at thisas 256 bits, and I say, OK, I’m going to pick theprivate key blocks to reveal based on the bits here. So the first bit here isa one, because it’s an 8. And so I’ll reveal. And I indicatedbefore that there’s this zero row and this one row. And now what that meansis, well, the first bit of my message to sign is a one.So I’m going to revealthis gray square. And the next bit, thenext four bits, actually, since it’s an eight,are going to be zero. So I’ll revealthis and then I’ll reveal this, this, and this. And I just made it up. But yeah. So for example, if I’m signing,and it starts with 01101110, I reveal this preimage, thispreimage, this preimage, this preimage, thesethree, this one. And so I reveal preimagesbased on the bit representation of the message I’mtrying to sign, and then give everyone these. So my signature willjust be this sequence. I can go in row order here. Yeah, it’s probablya lot easier. So I go in sequence. I say, OK, here’s the first32 bytes of my signature. Here’s the next, here’sthe next, here’s the next. And so my signature endsup being 256 blocks long, each of which are 256 bits. So it’s like 8K. The keys are 16K andthis is 8K or something.Fairly big but totallydoable on a computer today. Eight kilobytes is no big deal. OK. Now to verify,take the signature, hash each blockof the signature, and see that it maps intothat part of the public key. So the people who areverifying the signature, they have your public key. They have all the green squares. And now they have beengiven a signature, which is these gray squares,and they say, OK, well, let me hash this one. Oh, it maps to that,so it maps to a zero. Oh, this maps to a one,this maps to a one, this maps to a zero. And they can gothrough and say yeah, this is a signatureon that message.In the case ofLamport signatures, you can actually determinewhat the message is just from the signaturein the public key. If you’re giventhis and you’re not told whether it’s a one ora zero, well, just compare. Hash it and compare tothese two green ones. You’ll be able to see. And that’s a usefulsignature because no one can forge thatbecause no one knows these preimages exceptfor the person who holds the secret key. So given your public key,I can’t forge a signature from you.Once the signature is issued,I also can’t forge a signature. The only bit sequence I knowis the one that you revealed. And so I know partof your private key. I know half of it. But that half only lets me signthe message you just signed. So I can’t really doanything extra with this. So this is a usablesignature scheme. I think I just showed it. But any downsides that youcan think of with this? AUDIENCE: You can only sign one. TADGE DRYJA: Yeah,you can only sign one. Is that what you were– AUDIENCE: You could alsosend the same message on to someone else withdifferent signatures. TADGE DRYJA: Yeah, butsignatures are sort of public. So yes, you’resaying that you can sign a message once and giveit to a bunch of people. And that’s sort of afeature, not a bug, I guess. There are different signatureschemes where you want, I only want this signatureto be valid to this person. There’s differentways to do that with Diffie-Hellmankey exchange and stuff. But the signature schemewe’ve talked about here with these three functions, thepublic key is really public, and anyone can verify.And that’s something we want. If you don’t want that,there’s other ways to do it. But yeah, the big one is,wait, you can only sign once. Once you generate a keypair, your private key, your public key, and you telleveryone these green squares, if you’re try to sign again,you will reveal more pieces of your private key. So if I sign twodifferent messages, sometimes it’s the same bit. Sometimes it’s different bits. And now I start revealingmore pieces of my private key. And now people can startto forge signatures because I can say, OK,well, the first bit, I can sign anythingon the first bit. I’m still constrainedhere and here and here. But in several locations, Ican sign whichever bit I want. And so the basic thing is,if there’s one signature, I can’t forge anything. If you give me two signatures,since it’s generally random, on average, half of thebits of the signature will be constrained.So in this case, if it’s 256bits long and you sign twice, I probably still can’t forgeanything because 128 bits, I have the freedomto pick either. And the other 128 bits, I’mstuck with the one or the zero and I don’t get to choose. So that means mostmessages I want to sign, I won’t be able to because if Itried two to the 128 attempts, I’ll be able to finda forged signature. But that’s a lot. And so maybe you can sign twice. But again, it’s probabilistic. You might get unluckyand reveal quite a bit more than 128 bits,where you get both. But on average– and then onceyou have three signatures, OK, now I’ve probably revealed3/4 of the locations you’re going to have boththe one and zero row.And you can start–and this starts to be practicalbecause in this case, you’d need a 2 two the 64attempts to forge a signature. And that’s doable ontoday’s computers.