3. Blockchain Basics & Cryptography

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

I just want to say how touched I am that you are all still here. I really, you know, there are a lot of shopping opportunities in the MIT courses, and you have come back and not been shaken loose after reading Satoshi Nakamoto's peer-to-peer Bitcoin paper. Or maybe you just came back to see whether I was going to crash and burn describing it.

What we're going to try to do in the next three classes, just to frame it, is give you some of the technical underpinnings of blockchain technology through the lens of Bitcoin. Bitcoin is just the first use case of blockchain technology, so if I often say "Bitcoin this" or "Bitcoin that," it's largely, not entirely, applicable to blockchain technology.

My feeling is I'm only about eight or nine months ahead of all of you. I may have spent my whole professional life around finance and public service, and I can talk a lot about markets and about public policy, but MIT has given me the gift of thinking about blockchain technology, and I'm trying to return that gift a little bit to you all. And I have a few computer scientists in the room who are going to bail me out if I don't get this right: Sabrina, and then, oh, I see Alin is putting up his hand. Do you all know Alin? He's actually a PhD student in MIT computer science. So somebody gets to that part of their lives. What was that? Terrible life choice? Yeah. But he's going to bail us all out.

The reason I think it's relevant, not to belabor it, is that I really believe the only way any of us can get to ground truth is to know a little bit about the inner workings of this technology. You're not going to have to write an algorithm or actually compute a hash function, but you should know what's underneath it, and then you can
step away and say, "I no longer need to know how the carburetor on the car works, but I know what a carburetor is," or whatever analogy you want.

So with that, as opposed to all that Socratic cold calling I did last class: money, fiat currency, and ledgers are at the core of a Sloan student's education or background. There's a little less of that core in today's and the next couple of lectures. So if you can work with me, I want you to interrupt me any time you've got a question. I'm not going to do much cold calling. I don't want you to relax too much, I still want you to do the readings for the next three classes, but just raise your hand, stop me, and say, "But what is that all about?" We can work a little bit differently in these next classes.

As I'm always going to be doing, for consistency: what are the study questions? Really, what are the key design features of this new technology, blockchain? I put a few on the syllabus, and we're going to go through all of this today and next week: cryptography, append-only timestamped blocks, distributed consensus algorithms, and networking. Later in this lecture you'll see eight or ten, I guess it's ten, that we're really going to dig into.

Can I just get a sense of the class? And this is not for Talita or Sabrina to write down notes about participation. Is it a decent assumption that most or all of you at least read Nakamoto's paper? All right, good. Great. Just a sense: how many of you felt you got at least half of it? Maybe less than two-thirds, but at least half of it? All right, pretty good. When I first read it I was about with you, so it's all right. Alin, you got more than half of it, right? You read it five years ago? Yeah. Life choices, talk about it. All right, and you're taking this class. Good.

So we'll go through each of those, and then more specifically we're going to
peel back the cryptography: the two main cryptographic algorithms, or, in words you'll sometimes hear, cryptographic primitives. What is a cryptographic primitive? What do the two words mean together? Cryptography basically protects communication in the presence of adversaries. Communications and computation that need to be protected or verified use some form of cryptographic algorithm, which happens to be called a cryptographic primitive.

There's a third one we'll talk about later in the semester, but the two main ones are these. First, hash functions, which are worth knowing just as working knowledge of blockchain. We're all going to get to where you have some sense of what a hash function is. Second, this whole concept of digital signatures, which relates to asymmetric cryptography. Those two are very fundamental to blockchain technology. Later in the semester we'll talk a little bit about zero-knowledge proofs, but they're not as fundamental to the first application.

And how do they help make things verifiable and immutable? That's the business side, the market side: why does it matter? Otherwise, who cares what's in the carburetor if it doesn't matter? And then, how does this all relate to the double-spending problem? Can I call on... Isabella, do you remember what the double-spending problem was?

So in essence, a double spend is when you have a piece of information and you use it twice. Here we happen to call this piece of information money. You can send an email to two people, and that's OK. I mean, it's a little embarrassing if you're telling one friend you're available for dinner and the other friend that you aren't, but you can still send it to two places. In a system of money, though, it's critical that you don't use it twice.

The readings: was the demo helpful? I mean,
we're going to do a lot more on this. You know, I watched that demo last November or December. It was one of the first things I watched. He's at MIT, I don't know if you knew this, Brownworth. And I found it very helpful. I see that demo is actually used in a Stanford blockchain course as well, so on the West Coast one of our competitors is using an MIT product.

So we're going to do a slight review of what we did in class two, and then we're going to talk about the key design features: hash functions, as I mentioned; what an append-only log is; block headers and Merkle trees; and asymmetric cryptography and digital signatures. Crazy. We're going to cover all five of those today, and then you're going to tell me how we did. Oh, and Bitcoin addresses, which is just a small thing. Six, actually.

Last time, for those of you who were with us, we talked about money. Again, money is just a social construct, or an economic consensus mechanism. We're going to talk a lot about consensus next Tuesday when we talk about the consensus protocol in Bitcoin, but remember, money itself is just a consensus. There was a question on Tuesday, I think Alin actually asked it, about what it means to be a liability of the central bank. What does that actually mean? I said it just means that somebody else will accept it. It's a social consensus, because it's not that they're going to give you anything else. It's just that you can get a bank deposit, you can pay your taxes, you can use it at Starbucks, if in fact you've already gotten a cup of coffee. If you remember, it's only legal tender for a debt, and so forth.

Fiat money is just in that long line, but it's had its challenges and instabilities. It doesn't mean it's going to go away. I'm not a Bitcoin maximalist who thinks that fiat currencies are going to go away. But fiat currencies have their instabilities, particularly around weak monetary policy, in essence when you debase a currency and allow a lot of it to be issued, or usually
around unstable fiscal policy: either the government is spending a lot, or the king is off to foreign wars. The Bank of England was actually set up in the late 17th century, in essence to control the currency when the king of England was, I think, at war with France, if I recall. A lot of central banks were set up right about when a sovereign was off debasing a currency and spending too much at war.

Ledgers. We talked about how critical ledgers are. In essence, ledgers are a way to keep records, and those records can be either transaction records or balance records. We'll see that Bitcoin is set up as a transaction ledger system. Later we're going to talk about other blockchain technologies that are set up as balance ledgers, so one should not think there's only one way to do this. But transactions and ledgers are the core of Bitcoin. And central banking is, of course, built on ledgers: the master ledger of the central bank, then the commercial banks have the sub-ledgers, and then you can think of your digital wallet, maybe at Starbucks, as yet a third-tier ledger.

We obviously already live in an electronic age, we know this. There have been many efforts, and they all died until Bitcoin, to crack that riddle we talked about: peer-to-peer money without a central authority. Later in the semester, when we talk about use cases, that's going to be the core thing. And it's why I'm not a maximalist: I'm not sure a central intermediary is necessarily so bad in every circumstance. This is not a value judgment, this is just about money and markets and so forth. But in some circumstances decentralization really will compete with and beat a centralized intermediary.

So let's talk about his little paper, which of course he was modest about, or she was modest about. Please remember, we don't know who Nakamoto is or was, or whether it's a group of people. "I've been working on a new electronic cash system that's fully peer-to-peer, with no trusted third party."
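To make the earlier distinction between a transaction ledger and a balance ledger a little more concrete, here is a minimal sketch in Python. It is an illustration only, not Bitcoin's actual data model: the names `TRANSACTIONS`, `balances_from_transactions`, and the special sender `"mint"` are hypothetical, and the toy data is made up. The point it shows is that a balance ledger can always be derived by replaying a transaction ledger in order, while the reverse is not true.

```python
from collections import defaultdict

# Hypothetical toy data (not from the lecture): (sender, receiver, amount).
# "mint" stands in for newly issued money and has no balance of its own.
TRANSACTIONS = [
    ("mint", "alice", 50),
    ("alice", "bob", 20),
    ("bob", "carol", 5),
]


def balances_from_transactions(transactions):
    """Replay a transaction ledger in order and return the implied balances."""
    balances = defaultdict(int)
    for sender, receiver, amount in transactions:
        if sender != "mint":
            # Refuse any transfer the sender cannot cover.
            if balances[sender] < amount:
                raise ValueError(f"{sender} cannot spend {amount}")
            balances[sender] -= amount
        balances[receiver] += amount
    return dict(balances)


balance_ledger = balances_from_transactions(TRANSACTIONS)
print(balance_ledger)
```

A system that stores only `balance_ledger` knows who holds what right now. A system that stores `TRANSACTIONS` also keeps the history of how each balance came about, which is the style the lecture attributes to Bitcoin.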
So you've seen this slide before: a timestamped append-only log. Just think blocks of data, to oversimplify, but it's got a name, blockchain. And Satoshi's paper, you all read it in the last few days, and I of course read it again yesterday just to make sure I remembered it: I don't remember that he ever used the word blockchain. Am I right about that? Right. So the word blockchain has really been layered over his innovation.

So information blocks go on, and that leads basically to a database, but it's blocks of data. Bitcoin right now is about five hundred and fifty thousand blocks, and blocks are added on average every 10 minutes. We'll talk about why it's every 10 minutes, and not only why Satoshi Nakamoto made it every 10 minutes but how they maintain that. Other blockchains, like Ethereum, add a block about every 7 seconds, so don't get too caught up thinking it's all the same. And there are some technologists here: Silvio Micali is working on Algorand, and that's even tighter, less than 7 seconds. So there's not one way. There are multiple designs for how often blocks are added. But let's start with Bitcoin.

Secured by, yes, guess what, those two cryptographic primitives: hash functions and digital signatures. Did I lose anybody yet? Yeah, maybe.

And then there's a consensus for agreement. The whole debate about databases is usually who gets to change the data, and this is true of all databases. It's usually centralized, but in blockchain it's all of a sudden, well, maybe it's not centralized. Who gets to add that next bit of information, the next block? The consensus agreement, which we'll discuss next Tuesday, is about that very issue. I think there was a little pretty picture that I'd shown in slides before, but I'm going to delay that discussion until next Tuesday, and hopefully you all come back.

So what are the key features? And I might do a little cold calling. Do you remember any key feature from the
papers? Any other key features? Let's see how many, I'm going to have 10 on this page. Private-public key? Yes, so asymmetric cryptography, or private and public keys. Hash functions, yes. Any other key design features, or words you didn't understand, maybe that's another way to put it? Leandra? Addresses, Bitcoin addresses. Three. That's four of the things, this is going well. Yes? Double payment. That's something it's trying to address. It's not really a design feature, but they have a solution for double payments, so I'll give you credit for it. All right, so Hugo says miners, which is really the consensus, so I'll say that the design feature is the consensus, or proof of work. Kelly? Right, so very interesting, this concept of nodes. Satoshi actually talks about full nodes and lightweight nodes, in essence how much information has to be stored. I'm going to reserve that. Kelly, please remind me when we talk about block headers to come back to it. But nodes and the network are a very important design feature.

Over here? Merkle tree structure. A Merkle tree structure is a way to compress a lot of data and also to sort through that data. Oh no, Sabrina's not going to clean me up here? Merkle tree structure is there, we're going to talk about that.

Two more. All right, what's that? The nonce. OK, so a nonce. Does anybody know what the word nonce means? A year ago I didn't, so we're all getting there. Luke, do you know what a nonce is? In the actual protocol, it's essentially a guess for the miners. So the word nonce means a number that is used once: "n" for number, and "once." It's a number that's random and it's used once. That's how I've learned it.

One more. This is great. Actually, remind me of your first name? Priya. All right, so this is what I have. Cryptographic hash functions, we're going to go through these in more detail. Timestamped append-only logs. Block headers and Merkle trees. So Merkle trees
were discussed, but we need to actually say what information is kept at the head of the block as opposed to in the body, and some of that's just to make it more manageable. Asymmetric cryptography, which is this public key and private key, and signatures. The Bitcoin addresses themselves, which interestingly are a little bit different from public keys.

And then I put a break here, because next Tuesday we're going to talk about the rest: the proof of work, the miners, the nodes, and the nonces are all in that topic. There's actually a really important protocol in Bitcoin for how information gets propagated on the internet, just the network communication. It's not written about a lot. You won't read a lot about it in Nathaniel Popper's Digital Gold or the other popular books. But it is important to remind ourselves that information has to propagate around the internet, and all these transactions have to be communicated. There are currently about 10,000 nodes on the Bitcoin network. We don't know where all of them are, but they're probably in a hundred and eighty different countries. So the networking and communication matter, and they matter to the economics a lot.

There's a native currency, and this is interesting: that was the one thing that no one said. It's an actual technological design feature. It's not only that he created a currency, but the native currency is part of the economic incentive system, and we'll have some fun with that. In essence, he said that when you mined and did the proof of work, you created and got some native currency called Bitcoin. So he created an economic incentive system. Whoever Satoshi Nakamoto was or is knew a lot about economics as well as technology. Yes?

Can I just quickly add to what you said? It's not only that you create this native currency, but once the finite supply is reached, the currency can be distributed as a transaction, which I think is...

So what Daniel just said is
really interesting. Not to make light of the individual or individuals who did this, but this world of Bitcoin and other cryptocurrencies creates a unit of account that can be valued, and once it's valued you have a native currency. But as Daniel said, Nakamoto also said there would be a finite limit. It happens to be 21 million: that's the most Bitcoin there can be, and we'll get there around the year 2140. Does anyone know how many Bitcoin there are right now? Half of you were invested in it, so it's kind of curious. About seventeen million Bitcoin right now, and all 17 million have come from this process of proof of work and mining. Initially it was 50 Bitcoin every 10 minutes, roughly every 10 minutes. Then it went down to 25, and we're now at 12 and a half Bitcoin.

Does anyone know today's purported value? I should always say purported value of Bitcoin, because I don't know if we can trust some of those websites that give a value. So what is it? About $6,500 a Bitcoin. At 12 and a half Bitcoin to mine a block, you see that about 80,000 U.S. dollars is the reward to mine a block. So he created an incentive system. Initially, if you got 50 Bitcoin and they weren't worth a penny, you would not commit that much. You had to be a hobbyist, basically, in 2009, or a cypherpunk, or just kind of curious, because you weren't getting much incentive. If in fact it's worth $6,500 today, you're getting about $80,000 if you successfully mine a block.

And then there are the transaction inputs and outputs. Think about a check: who signs it, where you move money. There's something called the unspent transaction ledger. This is the ledger part. So when I think of the technology, I think, first, of cryptography, which is all that stuff at the top that we're going to discuss today. Secondly, the consensus mechanism, in essence that key question of any database: who gets to amend the database, who gets to decide to change the state of what we all agree to. And thirdly, the ledger, or transaction ledger. We're not going to deep dive into the scripting language, but next Thursday we are going to talk a little bit about the underlying scripting. So does that give you a feel for it? Cryptography, consensus, and then the transactions.

In terms of the CPU power and the electricity that will be consumed to mine that block, how much does that translate to in equivalent US dollar terms?

So the question is how much electricity is being consumed for that miner to get that $80,000 reward. I'm going to try to answer in one minute, but we'll come back to this later in the semester when we get to blockchain economics and mining economics. What has happened over these ten years is that more and more computers are being used to try to mine Bitcoin. Today, from the most recent research I've seen, the probability of winning a block... there's so much, it's measured in terahashes. I can't remember the numbers, but it's how many terahashes, which is, is it 15
zeros? Is it tera? Tera is at 12. Well, in any event, there are so many hashes being done a second, X number of terahashes, that your probability of winning is quite low. So what's happened is most nodes and miners have entered into agreements called mining pools, where they smooth out the risk and everybody shares in the rewards. But we'll talk about those economics later. It's thought that you need an electricity cost of around three cents a kilowatt-hour to be successful, and in most parts of the world you can't get electricity for three cents a kilowatt-hour. So you put your mining rigs where you can get low-cost electricity, and you could get it legally low-cost or illegally low-cost. There are a lot of mining rigs in jurisdictions where there may be local officials allowing those rigs, and instead of three cents a kilowatt-hour to the electric company, it's one to two cents a kilowatt-hour to the local government officials. The two largest mining pools are in China and the third is in Russia. But we'll get into the economics, and at least some theories about why some are where they are.

So, cryptography. Alin's probably going to clean me up: it's not just communication in the presence of adversaries, it's also computation in the presence of adversaries. Would that be good? And we're not going to deep dive, but if you remember, even in ancient times, if you were going to war, there was this wonderful little way you could do cryptography. And has anybody seen The Imitation Game? Yeah, about the British, you know, breaking into the German codes, even though they probably should have given more credit to the Polish government, which had probably broken into it in the 1930s. But Turing did great work. And then we're going to talk about asymmetric cryptography later today.

OK, what is a hash function? A hash function, and these are just words for how I think of it: I think of
it as a fingerprint for data. But it has certain properties. The one you'll see throughout is that it takes an input x of any size and maps it to a fixed size. One hash function we all use here in the US, in a way, is ZIP codes. It's five digits, a fixed size. I know I'm doing this loosely, but with ZIP codes you might have 50,000 people or 5,000 people all living in one postal district, and you can map them to ZIP codes of a fixed length. I don't know about my friends in the computer science department, but it's an early sense of a hash function. I just wanted to say there are tangible things in our lives that act like hash functions. The problem with ZIP codes is that they would not in any way be a secure hash function, and you'll see that in a minute. But you can be a 300-pound person or a thirty-pound kid and you still map into the same ZIP code.

It's deterministic, it's always the same: if you take a certain set of data, it will always give you the same hash. And you can compute it efficiently. You don't want to take a year to do this, you've got to do it in short periods of time, and in Bitcoin's case it's done in nanoseconds or less, because one computer can do, I can't remember, probably how many millions a second, a couple of terahashes a second. So it's a remarkably efficient algorithm. Hashing started in the 1950s and 60s, but the ones we're talking about here are much more recent. It's really terrifically talented scientists, mathematicians, and computer scientists, and sometimes the National Institute of Standards and Technology here in the U.S., working on hash functions.

So it takes an input of any size and maps it to a fixed size, think ZIP codes for a minute. It's deterministic, you only live in one ZIP code, in a sense. And it's very efficient. But now what are the cryptographic properties? Because a ZIP code wouldn't make it. Well, computer scientists use the term preimage resistant. I would just say it's one-way. You can only go one way, meaning it's infeasible to determine the input from the output, infeasible to determine x from the hash of x.

Does anybody know why I use the word infeasible rather than impossible? Brute force. What do you mean by brute force, just so everybody... As I understand it, a tenet of cryptography for centuries is not to make it mathematically impossible. The point is making it so infeasible that your adversary can't get the communication, and so forth.

I say this about hash functions because you can't assume that Bitcoin can't be broken. We all call it immutable. It is immutable until the hash functions inside Bitcoin are broken. Even Satoshi wrote about this in 2010, when he got emails. There's this wonderful book, if any of you want it, that I mention in the bookshelf at the end of the syllabus. Someone asked, what if SHA-256, which is the hash function, gets broken? And his answer, by the way, was: well, there'll be a better hash function at that time, whatever that is, and we'll hash the entire system with it. Because remember, you can take something of any size, hash it with the new function, and move forward. So he or she felt, in this wonderful email, that Bitcoin actually could transition to a new hash function, as long as you had a little bit of time before it was all corrupted.

How hard is it for me to create a fork? Satoshi does an analysis at the... Oh, you're talking about in his paper. Yes. In his paper he's talking about how hard it is computationally to do what some people call a 51% attack, to basically take
over all the nodes. That part of his paper we're going to talk about next Tuesday, but it's basically: can you take over the nodes? I was talking about a separate thing: can you break the cryptography? He doesn't write about that in his paper. He writes about it in an email about ten months later or so.

Second key cryptographic property. We said one is that it's one-way. The other is this concept of collision resistance. I presume if everybody in this room told me their birthdays, there would be multiple people in this room with the same birthday. In fact, once you get to 23 people in a room, there's over a 50% chance that two of you have the same birthday. We don't need to get to a hundred and eighty-three people in the room, which is half the days of the year. We only need about twenty-three. Similarly, the key thing is that it's infeasible that two sets of data x and y would hash to the same thing. It's not impossible, it's infeasible. And if you look at the history of hash functions, this is usually the thing: at some point in time these hash functions will not be collision resistant. Some quantum computing will come along, or something will come along. But for now you can put something of any size in, and they're independent.

They also look terribly random. It's called the avalanche effect, meaning you change one little thing and the whole output looks different. You noticed in that little video that if you changed one thing, it all looked so different. That's important because it makes it more secure. And then there's something called puzzle friendliness: even if you know a little bit of the input, it doesn't mean you're going to get the output.

I put these up here not for you to memorize. You're not going to get tested on them. If you go into business, you probably won't be thinking, well, collision resistant this or that. But I just wanted you to know there's a bunch of cryptography underneath this, and the key is it is
not a hundred percent immutable. It's probably, you know, I don't know, one in a quadrillion, but these things could still be broken, by quantum computing or something else. It's much more than one in a quadrillion? So it's 1 over 10 to about the 40th. How'd I do my math? All right. And anybody who's interested can come to office hours. So it's highly unlikely to be broken, but I think it's always worthwhile to say, well, no, it's not as bounded as you think.

So what is it used for? In many places it's used for names, references, pointers, and something called commitments. In Bitcoin it's used for pointers, where one block points to another block, but it's also used in commitments. You'll hear these words. We're not going to delve into them. The headers and the Merkle trees use something called SHA-256, which is a standard whose output is literally 256 bits long. That's zeros and ones in 256 places. But for Bitcoin addresses, Satoshi Nakamoto threw in a loop. I'm glad to debate why, but he uses two hash functions for Bitcoin addresses. The one thing I saw that he actually wrote about it is that if one of them is broken, at least the other one is less likely to be broken. So as I've read it, I think in his own voice, you have to hash something twice, and he was just making it that much more secure, even though, Alin, it was a 1 out of 10 to the 40th chance.

So remember, where's Caroline? There we are. You asked me about this. I thought I'd set it up for today, but you were good to remind me on Tuesday. What's the longest-running hash timestamp? I'm not sure, but he came out of Bell Labs, the Haber paper, and Surety. There he is. Yeah, so Haber and his colleague. Yes, you got it. Did you remember it? Terrific. So I'm just trying to say it wasn't Bitcoin that had it first. He did this in 1991, and by 1995 they started a company called Surety. I don't think it took off that
much. It's not competing with Apple for the largest market cap or anything like that, or Facebook. But every week in the notices section of the New York Times you can see a hash. It's timestamped, because it's in the New York Times, and it's a hash, all those funky digits, of all the information that came before it. They're basically hashing any document that you want timestamped in that week. You put it in, one follows another, and that's a blockchain. It's not about money, there's no native currency, and so forth. I believe that Haber and Stornetta are three of the eight or nine footnotes in the Satoshi paper, maybe it's four of them, so he gets his credit. And if you go to Stuart Haber's personal website, I think he says blockchain's co-founder.

So here we get... this was in the National Institute, the NIST paper. Timestamped append-only logs. In Bitcoin or blockchain, what is put together is the header, the top information. If I go past the visual and just say what's there, there are five pieces of key information. The version: it doesn't change that often, but there is a version number. The previous block's hash, so it's some information about all the blocks that came before it. The Merkle root hash. Does anybody want to tell me what that does?

If I go back to the nice little picture, the yellow box at the bottom of each of these blocks is all the transactions. There could be upwards of 2,000 transactions in a block. So there's this blockchain concept: a thousand, two thousand transactions. There were means and methods, well before Nakamoto's paper, for how to compress that, how to keep that information a little bit tidier, and that uses this thing called Merkle roots. The five items right at the top, what's called the block header, don't include the thousand transactions. Earlier, Kelly, you asked me about full nodes and light nodes. A light node, or a wallet that anyone here could download on your cell phone, probably does
not download the millions of transactions that have happened in the history of Bitcoin. You are unlikely to download what's called a full node, but you might download all the headers. All of the information in Bitcoin is still not that large, it's less than 200 gigs, but all of the headers, I think, is single-digit gigs. I can't remember if it's 4 or 6 gigabytes right now. What is the number? Megabytes? Fifty megabytes? 60 megabytes. So it's much smaller, as opposed to something like a hundred and eighty gigs.

So Satoshi was thinking in advance, and it's likely that every blockchain you're going to work on, I mean, there might be some exceptions, has this concept of keeping the security in a little bit of information, in something called a header, and then pushing all the meat of the transactions and data down. This is really important when you get to something like Ethereum, where there's a lot of data and a lot of computation down in each of these blocks. It's sort of like if Stuart Haber had a lot of documents and pictures and everything: you don't have to keep all the picture quality and a whole movie. You can actually hash a whole movie and you still get these 256 bits.

So the header has the previous hash. This Merkle root, which is just a way to get all the transactions: just think of a Merkle root as a way to grab two thousand transactions, in a way. A timestamp, that one's easy, we can get that. A difficulty target. Does anybody know what Bitcoin tried to do to make it more or less difficult over time? It gets harder to mine a block the more miners there are. So every block header needs to have what's called a difficulty target: how difficult is the mining going to be? Since we're talking about mining next Tuesday, please all bring me back to the difficulty target. And then, what's a nonce? It's a random number that's used once. "Number once," nonce.

And that's hash functions. How did we do? Are we a
little off the skids. We are MIT. Yes? The output, not the input. So could you help me pronounce your first name? Justa? Yeah. Moe has asked the right question. He's saying, well, how do you know, especially as you have more and more time, you might get the same output of a hash from different inputs. And if you recall, wait, somebody does recall. [INAUDIBLE] It's possible, but if the miners are working not at the same time, if the same information is not treated at the same exact time, it won't be a problem. You're correct as it relates to mining, but there's another piece of it as well, which is that the hash function, if it's a good cryptographically secure hash function, is what's called collision resistant, where what you're saying is so infeasible, in fact 1 in 10 to the 40th, you know, that's a 1 with 40 zeros, it's so infeasible to happen. It's probably possible, but infeasible to happen. What you're referencing is what if two parties solve the cryptographic puzzle, as opposed to a collision, and because of the difficulty they just got it at the same time. Please. It may seem like a dumb question, but [INAUDIBLE]. From the whole system, timestamps are not a particularly important part of Bitcoin. They are timestamped, but sometimes if somebody puts something off and it's off by a few minutes, or even up to two hours, there's a check in the technology, in the scripting function, if the timestamp's off by more than a couple of hours. So literally it's not that precise. Having said that, the real way that timestamping happens is if a block is mined, and it's the 540,000th block, and it's sort of accepted, and all the nodes, these 10,000 nodes, start mining the 540,001st block. In essence, just think of it as almost like a stack. And so what's in essence more relevant than the actual time that's in the header, and they all have a timestamp in the header, what's more relevant is the order of the blocks and, most importantly, the
previous block hash. I'm gonna partially agree with you, because the difficulty adjustment happens every two weeks. So even if any one individual or five or six timestamps are a little goofed up in the two weeks, the algorithm is basically looking over the course of about 2,000 blocks. You need the timestamps, but it's more important, it's the order of the blocks. I want to hold that question for Tuesday, but it has to do with, rather than the collision issue, what the paper's talking about is if two miners solve the puzzle. And that doesn't mean that they got identical hashes, because the puzzle is not geared to getting an exact hash. The Bitcoin puzzle is having a certain number of leading zeros. So literally, when it started, I think it was nine or ten leading zeros, I'm talking about ten years ago, and now you have to hash to something with, I think it's about twenty or twenty-six leading zeros, meaning it's gotten more and more difficult. And the result of the hash has to have a bunch of leading zeros, what you saw in that video. I'm sorry? So if it's only hashing the transactions, how does it change when the hash of the previous block changes? It reminds me of that old television show, Johnny Carson, and you just did a great setup for the comedian, so thank you. So I'm gonna go to Merkle roots. Merkle roots, which are a binary data tree, look something like this. If one had a thousand transactions I wouldn't have a pretty slide, so this only goes to four levels. But think of four transactions at the bottom. They're each hashed, and then you concatenate, you put the two hashes together, you hash that, and you keep going up the tree. If you had a thousand transactions, because that's two to the tenth, roughly, then you'd have ten levels of this tree. And so that's what happens. And literally the mining pool operators are doing this a lot for the nodes, but in the Bitcoin Core application, in the software that anybody in this room could download, the
software if you wish, there's software that takes transactions, puts them basically into this binary tree called a Merkle tree, uses hash functions, and basically skinnies it all the way up to the top. Given that the structure exists, how does the root hash change with the previous block? So basically all the forward blocks will get invalidated because the hash changes, but it doesn't seem to use the previous address. So I'm going to repeat the question: does a Merkle root, which is basically a summary of all the 10,000 transactions that are in a block, change if the rest of the header changes or the previous block changes? And the answer is no. It only changes if some of the data in the ten thousand transactions changes. And so a Merkle root will change if you put different transactions in the mix or, and this is really important, one of the incentives: you get your 12 and a half bitcoins today in what's called a coinbase transaction, and so one of these thousand transactions is the payment to the miner. So the Merkle root would be different depending upon who wins. But that wasn't your question, I'm just saying. But Merkle roots are a very efficient way to take thousands of transactions, store it up, and have one spot. Please. So the order of the different transactions has to be exactly the same for everyone that is hashing, right? No, actually not. So if you're hashing and you're running a mining rig, and Alin's running a mining rig, if Alin solves the puzzle and propagates it out on the network, and people start mining on top of Alin's block because they say, well, he's finished, you're not, you're just gonna probably start mining on top of his block and look in something called the mempool. The memory pool is, in essence, a network of all the free-floating transactions. You'll scoop up the next set of transactions. And so all the transactions [INAUDIBLE]. All right, so validation, which is more next Thursday, but I'll give it a shot. No, no, it's a good question. Every transaction, or actually
you're setting me up: digital signatures. There you go, thank you. Did you have a question? So the second cryptographic thing, and we're going to keep going back and forth: hash functions are basically a way to compress a lot of data, have a fingerprint, make sure that it's basically a commitment. Digital signatures: well, remember that little graph that we had, Alice and Bob? Alice wants to send a note to Bob and just say, hello Bob. She wants to encrypt it. She encrypts it with Bob's public key, sends it to him, and he decrypts it with his private key. You might say, oh my god, Gensler, what's a private key? What's a public key? In cryptography it's a way to kind of scramble information. I know I'm really making this, like... But if we went back to that little mechanism the Romans used, or what the Germans used in the Enigma machine, they were symmetric cryptography. Both people had the key. The key was the Enigma machine with five rotors. In the 1970s, some wonderful technologists here and elsewhere basically said, well, what if the key isn't the same, because the adversary can steal the key? What if it's not symmetric but asymmetric? There's a private key and a public key. In essence, there's two keys that have some mathematical relationship, and the math between these two keys doesn't matter for a class like this, but know that the public key and the private key link together. They're bonded together. But the critical thing about digital signatures is there's three functions. You have to generate a key pair, and when a key pair is generated, a public key and a private key are generated at the same time, and they need a random number to go into it. And one of the things that makes a lot of Bitcoin and other wallets insecure, and it's probably why some have been hacked, the wallets, not Bitcoin, is because they don't have good random number generation. Yes, British. I was at a conference last week where a technologist from the University of Pennsylvania had done a survey of 150 hedge funds, mining companies, and
Bitcoin wallet companies and the like. So they actually let a cybersecurity individual get inside and do a survey of 150 of what you would consider really committed high-end users of Bitcoin, miners and hedge funds and crypto exchanges, and it was horrifying, their cybersecurity, as to what they're doing with their private keys. Before he even got to the private keys, many of them didn't really have a secure way to create the random numbers to create their private keys. So just a piece: when somebody says they have really good private key, public key, in the back of your mind just know there's got to be some way to do random number generation. That's the only math that I'm gonna ask you to remember about. There's a signature function, and the key thing is you can create a digital signature from a message and a private key. So if Kelly has a private key and wants to send a secret message to somebody across the room, Isabella, you want a message from Kelly? Kelly's gonna take the message, you got this, Kelly, you're gonna take the message and you're gonna sign it with a private key. You send it over to Isabella. How does Isabella know that it's from you? She's got to verify it. So there's a function called a verification function, and it comes back just yes or no. I mean, it might say it differently, but it's just a yes or no. It's a verification function. Isabella, you want to do this with me? She's gonna verify your signature is valid for this message, because you have the public key. So you're right, Isabella has your public key, and using your public key she can verify the signature. It's magical math. Well, it's not magical math, it's real math, but it's not math we need to study in this class. You can think of it, in Bitcoin it uses elliptic curve cryptography, and you can think of it as that the private key is based on the random number. To be more technical, the random number is what gets you to the public key, but I
think of it as the private key is the random number, and then the public key is generated along with it. Yes, so you pick a random number, actually, between 0 and 2 to the 256. That's a private key. To pick a public key, you derive it directly from the private key. In fact, all you do is you exponentiate another number by the private key. So think of the public key as a one-way function of the private key, so given a public key you cannot recover the private key. If you could, then you could sign, which would be disastrous. And instead of exponentiation, Bitcoin uses a function called the elliptic curve. So what properties? And these are the key economic properties as well as cryptographic properties. Basically it's infeasible, and again I use the word infeasible, I didn't say impossible, even though, Alin, you might want to tell me that it's 1 over 10 to the 40th or something, but it's infeasible to find a private key from a public key, to reverse engineer. Impersonating, you need to do a signature. Please, just run your eye up there. To do a digital signature you need a private key and a message, and it's a function of the message and the private key. Let's call it complex math. That digital signature was created from the private key, and the public key was created from the private key. And to oversimplify, the reason that the verify function works is because of both the digital signature and the public key that Isabella has. Isabella has this digital signature, and she has the public key, and she has the message. The math is such that basically the private key, if you wish, almost like factors out. But think of two functions. Isabella has Kelly's public key, the message, the digital signature. It either verifies or it doesn't. But she never has to see the private key, and in fact Kelly does not want her to ever see the private key. To simplify that: the way the validation of a digital signature works is the message is run through a hash function, which
generates a hash, and it is encrypted with her private key. Then the message and the digital signature go to Isabella. What she does is use the same hash function, run it on the document to generate the hash, and use the public key of Kelly to unencrypt the signature and compare. If those two hashes correspond, that means that the message belongs to Kelly and it hasn't been tampered with. So that's the more or less simplification of the digital signature. I mean, the key is basically that there's a scheme, unrelated to Bitcoin, that exists for many other reasons on the internet, many other reasons in commerce and in war, this public key, private key cryptography. And it's not simply, just going back, Alice sending something. It's also digital signatures. You generate the key pair. Everything in Bitcoin, everything in Ethereum has key pairs: public key and private key, a digital signature. But Kelly, never lose your private key. You got that? Do not. And by the way, you have to create it with a good random number generator, because most sophisticated hedge funds around the world aren't. So you're gonna be better than those. That's what I learned at a conference recently. And then there's a verification function. So is there any third party generating the generator, or is the generator like a function already existing and already there? So the question is, if random number generation is so important, are there outside parties that have good software, in essence, to produce the random number generation? And the answer is yes, and there's some that are not so good. And yes, some good laptops have it at the heart. I want to skip ahead. Elliptic Curve Digital Signature Algorithm: that's the actual algorithm that Bitcoin uses to take the private key and so forth. But many of the wallets, if you download a wallet application to hold your Bitcoin, to hold your Litecoin, to hold some other coin, that wallet application has a random number
generation software. I can't attest to all the random number generation software, I'm not a cybersecurity expert, but there's probably a range, some that are a little bit stronger. The key to random number generation is, if you're generating any length, that it truly is not clumpy, that it's what they call maximum entropy, you know, and that you really don't have any clumps. If it all clumps in one area, then that's not great randomness. So I just want to finish, because there's one other thing we're gonna chat about to lay the groundwork: Bitcoin addresses. I put that up. You can look at the slides later. The details don't matter much, but the key thing is that when you hear somebody talk about public keys and Bitcoin addresses, colloquially we all reference them the same. They're actually not. What Nakamoto did was he uses the public key, he literally hashes it twice, once with this hash function called SHA-256, then another hash function, then concatenates and puts a little checksum at the end, and then uses something called Base58 to make it even shorter. I've gone back and read some of Nakamoto's emails for the two years after he published all this, and I've read other things. My understanding is the reason there's two hash functions, and actually two different ones, was just to make everything a bit more secure. Also, a public key is very long, it's about 512 bits, and so you can shrink the data and make the data more compressed by hashing it, which took it to 256 bits. He hashes it twice, and then he does this Base58 that makes it even a little tighter. So for all purposes you can go ahead and just use public key and Bitcoin address to mean the same thing, but remember in the back of your mind, oh, actually they're a little different, and Bitcoin addresses are a little bit more secure, supposedly, unless of course somebody's hacked into your wallet and figured out all these little details. A Bitcoin address is a little bit like the signatures
on these notes we talked about, right? Remember, half of you don't use checking accounts, but these are early forms of checks, and there's a signature on the bottom. That's really kind of a Bitcoin address. I'm sorry, the signature is the digital signature. The address, the Bitcoin address, is who it's paid to. And I promise, last slide. We're gonna be talking about this next week: transactions, all that stuff that rolls up into the Merkle trees, all that little itty-bitty important information. They basically have an input and an output, and a lock time. The input is a previous transaction, and this uniquely identifies basically money. And you're gonna send value in satoshis. He named the unit of account for himself. There's a lot of satoshis in every one bitcoin, that's why we don't hear much about satoshis, but there's 10 to the 8th satoshis in every one bitcoin. So when you actually enter it in the computer code in a transaction, you're doing it in satoshis, and it's sent to a public key. That's a coin. That is what the incentive system is all about. Any other questions? I know there's a lot. I wonder how many of you are gonna come back on Thursday. No, let me say this. It's not just that we're at MIT, but we are at MIT, come on. Everybody in this room can get these key concepts. The key questions that we talked about were: timestamped append-only logs. Does anybody want to tell me what [INAUDIBLE]? If this class here in the next seven minutes can get these two concepts, that's all we talked about for the last hour. So, I don't know your name, in the orange shirt. What's that? Andrew? Andrew, what's a timestamped append-only log? Essentially a block, a blockchain, uses it with a time, and that can't be changed in the future, so it's kind of immutable because of all this cryptography. Stuart Haber was making a timestamped append-only log, and he was placing it where? Carol, are you still with me? Where was he putting it? New York Times, there you go. In
the classified section. So it's a bunch of blocks of data compressed up. So we talked about something called Merkle trees and Merkle roots. Just think about it as a way to take a lot of information and compress it, but also make it searchable later, because with a thousand transactions, when we talk next week, you'd have to be able to verify. Somebody asked me about how to verify, right? When you go back to verify, you need an index number to find it in that Merkle tree situation. And it's secured through hash functions. Anybody want to tell me the easiest lay definition of a hash function? Jennifer? [INAUDIBLE] Right, you could take a picture of this classroom and everybody, exactly, and it could map into something. I don't know, would a QR code be a form of a hash? Not cryptographically secure, but is it a hash? A cryptographic hash function is a way to take not only a lot of information and put it into a fixed form, but the key thing here is the hash functions are what tie the blocks together, because hash functions can point to previous information, and as the video showed, if you change any of the underlying information the hash changes. So what does that give you? It basically secures the data. You know if somebody's tampered. So the only reason to really learn about hash functions is to say, oh, I get it, this is one of the ways to make this data tamper-proof. How would any relevant change be adopted? In Bitcoin it's always a challenge, because it's a decentralized network, and all decentralized networks have a little bit of a governance challenge. The governance challenge is how do you do software updates. We all know that on our laptops, our iPhones, there's probably software updates going on here now, unbeknownst to me, right? They're probably just, Apple's dropping, you know, I mean who knows what they're doing in here, right? And Uber, really one of my favorites, who knows what's happening inside this phone. But the commercial enterprise, the central authority, has a
way to update the software. We probably signed some Terms of Use that allow them to do that. In a decentralized network like this, there has to be consensus, and so the only way really to update the software, for a new hash function or for most everything else, is in essence that the nodes, the operators of the software, collectively in a consensus form adopt it. So it's another way that not only is the data immutable because of these hash functions, but the software is. And that comes with both benefits and costs. Some people would say that's a bug of blockchain, some people say it's a feature. You can come to your own judgment over the course of the semester. But the software is harder to update than software under centralized authorities, because centralized authorities just push the update. Now, sometimes you have to click and say update, but don't be naive, not every software update do you click. I mean, there's some that's just happening. But here you've got to have consensus. I know it didn't answer your question about the hash function, but if it were a hash function that had to be updated, and everybody said they had to quickly update it, there's interesting debates about this, but you wouldn't need to go back over all 540,000 previous blocks. You could just hash all five hundred and forty thousand blocks, 180 gigabytes, to one 256, or maybe it's a bit different, and then you'd have that, and it would be tamper-proof. So those are the key things. That's what we covered. Really, what we're gonna cover next Tuesday is consensus protocols. We've talked a lot about proof of work here, because everybody thinks of Bitcoin as proof of work, but we're going to talk about proof of work, the nodes, and the native currency. And then next Thursday we're going to talk about transactions. Again, I try to break down this technology. If you want to forget about this lecture, and you're gonna go, oh my god, it was like going to the dentist, you could tell your friends that you actually know something
about cryptography. It is called cryptocurrencies, so how could we not know something about cryptography? But it's basically those three things: it's cryptography, it's a consensus mechanism, and the transactions. So, right: cryptography, consensus mechanism, transactions. And we will get through, and you'll see whether this matters to finance and whether it's got any use cases. So thank you.
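
The Merkle root described in the lecture (hash each transaction, concatenate the hashes in pairs, hash again, keep going up the tree) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bitcoin Core's code: the names `sha256d` and `merkle_root` are made up for this sketch, transactions are stand-in byte strings rather than serialized Bitcoin transactions, and byte-order details are ignored. It does follow two things Bitcoin actually does: it hashes with double SHA-256, and it duplicates the last hash when a level has an odd number of entries.

```python
import hashlib
from typing import Sequence


def sha256d(data: bytes) -> bytes:
    """Double SHA-256: SHA-256 applied to the SHA-256 digest of the data."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: Sequence[bytes]) -> bytes:
    """Fold a list of transactions into one 32-byte root hash."""
    if not transactions:
        raise ValueError("need at least one transaction")
    # Bottom level of the tree: one hash per transaction.
    level = [sha256d(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of hashes: pair the last one with itself.
            level.append(level[-1])
        # Concatenate each pair of hashes and hash the result.
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    txs = [b"tx-a", b"tx-b", b"tx-c", b"tx-d"]  # placeholder transactions
    print(merkle_root(txs).hex())
```

Note that `merkle_root` takes only the transactions as input. That matches the answer given in class: the root changes if any transaction changes (including the coinbase payment to the miner), but not when the previous block hash or another header field changes.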
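
The "leading zeros" puzzle and the role of the nonce can be illustrated with a toy search. Everything about the layout here is an assumption for illustration: `toy_header` is a hypothetical header (previous hash, Merkle root, timestamp, nonce) and not Bitcoin's real 80-byte serialization, and difficulty is counted in leading zero hex digits as in the lecture's simplification, whereas real Bitcoin compares the hash against a numeric target.

```python
import hashlib
import struct
from typing import Tuple


def sha256d(data: bytes) -> bytes:
    """Double SHA-256 of the input bytes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def toy_header(prev_hash: bytes, merkle_root: bytes,
               timestamp: int, nonce: int) -> bytes:
    # Hypothetical layout: two 32-byte hashes, then the timestamp and
    # nonce packed as unsigned 32-bit little-endian integers.
    return prev_hash + merkle_root + struct.pack("<II", timestamp, nonce)


def mine(prev_hash: bytes, merkle_root: bytes, timestamp: int,
         zero_hex_digits: int, max_nonce: int = 2**32) -> Tuple[int, bytes]:
    """Try nonces in order until the header hash starts with enough zeros."""
    prefix = "0" * zero_hex_digits
    for nonce in range(max_nonce):
        digest = sha256d(toy_header(prev_hash, merkle_root, timestamp, nonce))
        if digest.hex().startswith(prefix):
            return nonce, digest
    raise RuntimeError("no nonce found in range")


if __name__ == "__main__":
    prev = sha256d(b"previous block header")   # placeholder values
    root = sha256d(b"this block's transactions")
    nonce, digest = mine(prev, root, 1537000000, zero_hex_digits=3)
    print(nonce, digest.hex())
```

Each extra zero hex digit multiplies the expected number of attempts by 16, which is the sense in which more leading zeros means more difficult mining. Because the previous block hash is part of the header, changing an earlier block changes this hash too, so the nonce would have to be found again.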
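
The last steps of the address construction mentioned above (checksum at the end, then Base58) can also be sketched. In Bitcoin the public key is hashed with SHA-256 and then with RIPEMD-160, which leaves 20 bytes; the sketch below starts from a placeholder 20-byte value instead of computing that hash, because RIPEMD-160 is not available in every Python build of `hashlib`. The function names are illustrative. The version byte `0x00` and the checksum rule (first four bytes of a double SHA-256) are the ones used for classic Bitcoin addresses.

```python
import hashlib

# Base58 alphabet: digits and letters without 0, O, I and l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def sha256d(data: bytes) -> bytes:
    """Double SHA-256 of the input bytes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def base58check_encode(version: bytes, payload: bytes) -> str:
    """Append a 4-byte checksum to version+payload and write it in Base58."""
    data = version + payload
    raw = data + sha256d(data)[:4]
    n = int.from_bytes(raw, "big")
    encoded = ""
    while n > 0:
        n, rem = divmod(n, 58)
        encoded = ALPHABET[rem] + encoded
    # Each leading zero byte is written as the character '1'.
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + encoded


def base58check_decode(text: str) -> bytes:
    """Reverse the encoding and verify the checksum; returns version+payload."""
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    raw = b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")
    data, checksum = raw[:-4], raw[-4:]
    if sha256d(data)[:4] != checksum:
        raise ValueError("bad checksum")
    return data


if __name__ == "__main__":
    pubkey_hash = bytes(range(20))  # placeholder for the 20-byte key hash
    print(base58check_encode(b"\x00", pubkey_hash))
```

The checksum is what lets a wallet reject a mistyped address: changing a character almost always makes `base58check_decode` raise an error instead of returning a different destination.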
