<h1>Scraping Dynamic Pages with Scrapy + Selenium</h1>
<p>Matthew Roseman · 2018-05-14 · <a href="https://mroseman95.github.io/scraping-dynamic-pages">https://mroseman95.github.io/scraping-dynamic-pages</a></p>
<p>If you already know how to set up Scrapy and Selenium, skip to the <a href="#integration">Integration</a> section to see how to
integrate the two.</p>
<p>The final code can be viewed <a href="https://github.com/mroseman95/twitch-featured-scraper">here</a>.</p>
<h3 id="contents">Contents</h3>
<ul>
<li><a href="#overview">Overview</a></li>
<li><a href="#scrapy">Scrapy</a></li>
<li><a href="#selenium">Selenium</a></li>
<li><a href="#integration">Integration</a></li>
<li><a href="#alternatives">Alternatives</a></li>
</ul>
<h2 id="overview">Overview</h2>
<p><a href="https://scrapy.org/">Scrapy</a> is a Python framework for scraping websites, but a common problem is getting data off of a site
that loads its content dynamically. Many websites execute JavaScript in the client’s browser, and that JavaScript fetches
the data for the page. Scrapy cannot execute this JavaScript.</p>
<p><a href="https://www.seleniumhq.org/">Selenium</a> is a tool that automates web browsers for testing purposes, but it can be used along with Scrapy to load all of
a site’s data whenever Scrapy sends a request.</p>
<p>Let’s say we want to scrape <a href="https://www.twitch.tv/">Twitch</a> for the currently featured stream. There is probably a way to
do it through the API, but let’s pretend there isn’t.</p>
<p>To start, go to Twitch and inspect the page in your browser to see what the HTML looks like.</p>
<p>You’ll probably see something like this…</p>
<p><img src="https://mroseman95.github.io/assets/images/twitch_featured_html.png" alt="Twitch Featured" /></p>
<p>Pretty much all the data you see on Twitch is loaded through JavaScript. Without it you would just get a blank page with a
loading icon like this…</p>
<p><img src="https://mroseman95.github.io/assets/images/twitch_no_js.png" alt="Twitch No JS" /></p>
<p>So we are going to need to use Scrapy and Selenium to get the data we want.</p>
<h2 id="scrapy">Scrapy</h2>
<p>To set up your dev environment install scrapy.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">pip install scrapy</code></pre></figure>
<p>Make sure to check the installation documentation <a href="https://docs.scrapy.org/en/latest/intro/install.html">here</a>.</p>
<p>Then create Scrapy’s project files.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">scrapy startproject twitch_featured</code></pre></figure>
<p>Now we are going to create a spider to crawl twitch.</p>
<p>Go to your spiders directory.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">cd </span>twitch_featured/twitch_featured/spiders</code></pre></figure>
<p>And create a new spider, <strong>twitch.py</strong>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">scrapy</span>


<span class="k">class</span> <span class="nc">TwitchSpider</span><span class="p">(</span><span class="n">scrapy</span><span class="o">.</span><span class="n">Spider</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="s">'twitch-spider'</span>
    <span class="n">start_urls</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s">'https://twitch.tv'</span><span class="p">,</span>
    <span class="p">]</span>

    <span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">response</span><span class="p">):</span>
        <span class="k">pass</span></code></pre></figure>
<p>Scrapy will send a request to each URL in <strong>start_urls</strong> and pass the response to the <strong>parse</strong> method.</p>
<p>Right now we aren’t doing anything with Twitch’s response, so let’s use Scrapy selectors to get data off of the page.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">response</span><span class="p">):</span>
    <span class="n">streamer</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span><span class="s">'//p[@data-a-target="carousel-broadcaster-displayname"]/text()'</span><span class="p">)</span><span class="o">.</span><span class="n">extract</span><span class="p">()</span>
    <span class="n">playing</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">xpath</span><span class="p">(</span><span class="s">'//p[@data-a-target="carousel-user-playing-message"]/span/a/text()'</span><span class="p">)</span><span class="o">.</span><span class="n">extract</span><span class="p">()</span>
    <span class="k">yield</span> <span class="p">{</span>
        <span class="s">'streamer'</span><span class="p">:</span> <span class="n">streamer</span><span class="p">,</span>
        <span class="s">'playing'</span><span class="p">:</span> <span class="n">playing</span>
    <span class="p">}</span></code></pre></figure>
<p>We are giving Scrapy an XPath expression, and it uses that to grab the text containing the broadcaster’s display name and
current game.</p>
<p>For more information about how xpaths work, look at this <a href="https://www.w3schools.com/xml/xpath_intro.asp">tutorial</a>.</p>
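<p>To see what those two expressions select without hitting Twitch, here is a minimal sketch using only Python’s standard library. The markup in <strong>SAMPLE_HTML</strong> is a hypothetical, heavily simplified stand-in for what the rendered page contains, and <strong>extract_featured</strong> is a helper name made up for this example. Note that <strong>xml.etree.ElementTree</strong> supports only a subset of XPath, so this is an approximation of what Scrapy’s selectors do, not the same engine.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the markup Twitch renders after its
# JavaScript has run; the real page is far larger and changes over time.
SAMPLE_HTML = """
<div>
  <p data-a-target="carousel-broadcaster-displayname">ForzaRC</p>
  <p data-a-target="carousel-user-playing-message">
    <span>Playing <a href="/directory/game/forza">Forza Motorsport 7</a></span>
  </p>
</div>
"""


def extract_featured(markup):
    """Return the streamer name and game found in the given markup."""
    root = ET.fromstring(markup)
    # ElementTree has no text() step, so we select the element itself
    # and read its .text attribute instead.
    streamer = root.find(".//p[@data-a-target='carousel-broadcaster-displayname']")
    game = root.find(".//p[@data-a-target='carousel-user-playing-message']/span/a")
    return {"streamer": streamer.text, "playing": game.text}


print(extract_featured(SAMPLE_HTML))
```

<p>The attribute predicate in square brackets picks the one paragraph we care about, and the trailing <strong>/span/a</strong> walks down to the link that holds the game title.</p>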
<p>To test our code we can run…</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">scrapy crawl twitch-spider <span class="nt">-o</span> output.json</code></pre></figure>
<p>We are telling the twitch-spider to crawl its URLs and write the data it scrapes to the output.json file.</p>
<p>But the output.json file is empty because we haven’t added Selenium yet. The data we want isn’t in the response.</p>
<h2 id="selenium">Selenium</h2>
<p>Selenium requires a browser to be installed, along with the driver for that browser. For this tutorial
I’ll be using this <a href="https://sites.google.com/a/chromium.org/chromedriver/">Chrome driver</a>. The documentation for what else Selenium supports can be found
<a href="https://www.seleniumhq.org/about/platforms.jsp">here</a>.</p>
<p>Install Selenium with pip.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">pip install selenium</code></pre></figure>
<h2 id="integration">Integration</h2>
<p>Now that we have Scrapy set up and Selenium installed, we need to integrate the two.</p>
<p>I will be putting the Selenium code in the DownloaderMiddleware. The methods in this class are called whenever Scrapy
makes a request. They can modify or replace the request or response, and then pass it back to Scrapy.</p>
<p>This diagram explains the steps Scrapy takes.</p>
<p><img src="https://mroseman95.github.io/assets/images/scrapy_architecture.png" alt="Scrapy Architecture" /></p>
<p>We are going to be putting code right after step 4 that makes the request through Selenium, and then we’ll pass back
what Selenium loads as step 5.</p>
<p>First we need to activate the downloader middleware class. Search <strong>settings.py</strong> for this code, and uncomment it.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># DOWNLOADER_MIDDLEWARES = {</span>
<span class="c"># 'twitch_featured.middlewares.TwitchFeaturedDownloaderMiddleware': 543,</span>
<span class="c"># }</span></code></pre></figure>
<p>Open up the middlewares file located at <strong>twitch_featured/twitch_featured/middlewares.py</strong></p>
<p>Outside of the middleware classes, initialize the Selenium driver</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">...</span>
<span class="kn">from</span> <span class="nn">scrapy</span> <span class="kn">import</span> <span class="n">signals</span>
<span class="kn">from</span> <span class="nn">scrapy.http</span> <span class="kn">import</span> <span class="n">HtmlResponse</span>
<span class="kn">from</span> <span class="nn">selenium</span> <span class="kn">import</span> <span class="n">webdriver</span>
<span class="n">options</span> <span class="o">=</span> <span class="n">webdriver</span><span class="o">.</span><span class="n">ChromeOptions</span><span class="p">()</span>
<span class="n">options</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s">'headless'</span><span class="p">)</span>
<span class="n">options</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s">'window-size=1200,600'</span><span class="p">)</span>
<span class="n">driver</span> <span class="o">=</span> <span class="n">webdriver</span><span class="o">.</span><span class="n">Chrome</span><span class="p">(</span><span class="n">options</span><span class="o">=</span><span class="n">options</span><span class="p">)</span>
<span class="o">...</span></code></pre></figure>
<p>Then look for the <strong>TwitchFeaturedDownloaderMiddleware</strong> class, and put the following in the body of its <strong>process_request</strong> method.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">if</span> <span class="n">request</span><span class="o">.</span><span class="n">url</span> <span class="o">!=</span> <span class="s">'https://www.twitch.tv/'</span><span class="p">:</span>
    <span class="k">return</span> <span class="bp">None</span>
<span class="n">driver</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">url</span><span class="p">)</span>
<span class="n">body</span> <span class="o">=</span> <span class="n">driver</span><span class="o">.</span><span class="n">page_source</span>
<span class="k">return</span> <span class="n">HtmlResponse</span><span class="p">(</span><span class="n">driver</span><span class="o">.</span><span class="n">current_url</span><span class="p">,</span> <span class="n">body</span><span class="o">=</span><span class="n">body</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">'utf-8'</span><span class="p">,</span> <span class="n">request</span><span class="o">=</span><span class="n">request</span><span class="p">)</span></code></pre></figure>
<p><strong>process_request</strong> is called every time Scrapy makes a request. The code we added tells Selenium to load
<strong>https://www.twitch.tv/</strong> through the Chrome driver, reads the page source once the browser has rendered it, and returns that as a
Scrapy HtmlResponse. Because <strong>process_request</strong> returns a response, Scrapy skips its own download for that request.</p>
<p>The spider code that makes the request is unchanged. We are just routing the request through Selenium so that the
dynamic content gets executed.</p>
<p>Running Scrapy now will most likely work. It will output some JSON that contains the featured streamer’s name and game.
The reason it may not work is that Twitch has a lot of JavaScript to execute. In fact it is continuously executing
JavaScript. <strong>driver.get</strong> returns once the browser reports the page as loaded, and the data we want might not have been inserted into the page by then.</p>
<p>One way to fix this is to tell Selenium to wait until the element we want is loaded.</p>
<p>Add these imports in <strong>middlewares.py</strong></p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">selenium.webdriver.common.by</span> <span class="kn">import</span> <span class="n">By</span>
<span class="kn">from</span> <span class="nn">selenium.webdriver.support.ui</span> <span class="kn">import</span> <span class="n">WebDriverWait</span>
<span class="kn">from</span> <span class="nn">selenium.webdriver.support</span> <span class="kn">import</span> <span class="n">expected_conditions</span> <span class="k">as</span> <span class="n">EC</span></code></pre></figure>
<p>And update <strong>process_request</strong> so that it looks like this.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">process_request</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">spider</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">request</span><span class="o">.</span><span class="n">url</span> <span class="o">!=</span> <span class="s">'https://www.twitch.tv/'</span><span class="p">:</span>
        <span class="k">return</span> <span class="bp">None</span>
    <span class="n">driver</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">url</span><span class="p">)</span>
    <span class="n">WebDriverWait</span><span class="p">(</span><span class="n">driver</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">until</span><span class="p">(</span>
        <span class="n">EC</span><span class="o">.</span><span class="n">presence_of_element_located</span><span class="p">((</span><span class="n">By</span><span class="o">.</span><span class="n">XPATH</span><span class="p">,</span> <span class="s">"//p[@data-a-target='carousel-broadcaster-displayname']"</span><span class="p">))</span>
    <span class="p">)</span>
    <span class="n">body</span> <span class="o">=</span> <span class="n">driver</span><span class="o">.</span><span class="n">page_source</span>
    <span class="k">return</span> <span class="n">HtmlResponse</span><span class="p">(</span><span class="n">driver</span><span class="o">.</span><span class="n">current_url</span><span class="p">,</span> <span class="n">body</span><span class="o">=</span><span class="n">body</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">'utf-8'</span><span class="p">,</span> <span class="n">request</span><span class="o">=</span><span class="n">request</span><span class="p">)</span></code></pre></figure>
<p>We used the same XPath as before and told Selenium to wait until the element we are looking for is present in the page. If it
has not appeared after 10 seconds, Selenium throws a <strong>TimeoutException</strong>.</p>
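<p>The idea behind an explicit wait is just polling with a deadline. The sketch below is a simplified illustration in plain Python, not Selenium’s actual implementation: <strong>wait_until</strong> and <strong>element_present</strong> are names invented for this example, and the built-in <strong>TimeoutError</strong> stands in for Selenium’s own exception.</p>

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


# Hypothetical stand-in for "is the element in the page yet?":
# it only starts returning a value on the third call.
calls = {"count": 0}


def element_present():
    calls["count"] += 1
    return "element" if calls["count"] >= 3 else None


print(wait_until(element_present, timeout=2.0, poll_interval=0.01))
```

<p>The condition is checked at least once, so a page that is already ready costs no extra delay, while a slow page gets up to the full timeout.</p>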
<p>It’s important to note that Scrapy will make additional requests to various endpoints (for example robots.txt, when <strong>ROBOTSTXT_OBEY</strong> is enabled), so make sure you are only
using Selenium on the actual request to Twitch. This is done in the first lines of <strong>process_request</strong> where we check
the request URL.</p>
<p>Adding additional code to tell Selenium to wait is usually not necessary. It depends on how long the page you are
scraping takes to load.</p>
<p>Now if we run our code</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">scrapy crawl twitch-spider <span class="nt">-o</span> output.json</code></pre></figure>
<p>and look in output.json, you should see something like this…</p>
<figure class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="s2">"streamer"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"ForzaRC"</span><span class="p">],</span><span class="w"> </span><span class="s2">"playing"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"Forza Motorsport 7"</span><span class="p">]}</span><span class="w">
</span><span class="p">]</span></code></pre></figure>
<h2 id="alternatives">Alternatives</h2>
<p>If you would rather scrape your page using the Selenium library directly, you could move the code from the downloader middleware to
your spider, and manually make your requests there. This may require some manipulation of how Scrapy handles requests so
that you don’t make two requests, one through Scrapy and one through Selenium.</p>
<p>By putting it in your downloader middleware it lets you keep using
Scrapy normally, and not have to worry about setting up Selenium for each spider.</p>
<p>Reading up on Scrapy/Selenium documentation will give you a better idea of how the two can work together.</p>
<h1>Bitcoin Scripts</h1>
<p>Matthew Roseman · 2018-01-10 · <a href="https://mroseman95.github.io/bitcoin-scripts">https://mroseman95.github.io/bitcoin-scripts</a></p>
<h3 id="contents">Contents</h3>
<ul>
<li><a href="#overview">Overview</a></li>
<li><a href="#opcodes">Opcodes</a></li>
<li><a href="#standard-transaction">Standard Transaction</a></li>
<li><a href="#burning-coins">Burning Coins</a></li>
<li><a href="#puzzles">Puzzles</a></li>
</ul>
<hr />
<h2 id="overview">Overview</h2>
<p>In this post, I want to describe a feature of bitcoin I wasn’t aware of until I started reading about
Ethereum. Bitcoin lets you attach scripts that determine whether some amount of coins is spendable.</p>
<p>Let’s say Alice gives Bob 1 btc. In the transaction there is a script that must be executed and return True before Bob
can spend that bitcoin. Normally the script will simply ask that Bob sign the previous transaction (the one from Alice
to Bob), and provide his public key. This ensures that the person Alice gives the bitcoin to is the one spending it. But
there is a lot more you can do with these spending requirements.</p>
<p>Alice could perform a transaction to Bob and require Bob’s signature and a third party’s signature before Bob spends the
bitcoin. Or there could be no requirement, and anyone could spend the bitcoin.</p>
<p>In a previous post I talked about proof of burn, and how that involves sending some cryptocurrency to an <strong>unspendable</strong>
account. These scripts would be used to “burn” some bitcoin. You could have a script just automatically return false, or
maybe it adds 2 + 2 and only allows the coins to be spent if the result is 8. Once a transaction is executed with an
impossible script attached the coins are considered burned or stuck forever in the receiving account.</p>
<p>Ethereum has built on this idea and created a more fully featured language than the primitive scripting operators bitcoin
provides. This allows for more complicated tasks to be performed for Ether to be spent, and also allows the scripts to
access some outside data, leading to more uses.</p>
<h2 id="opcodes">Opcodes</h2>
<p>The building blocks of these scripts are the opcodes bitcoin provides. Under the hood these exist as bytes assigned
specific meanings, and perform simple tasks like loading data onto the stack, comparing values, and returning True or
False.</p>
<p>A full list of the opcodes can be found on the <a href="https://en.bitcoin.it/wiki/Script">bitcoin wiki</a> (as well as a more
in-depth description of how scripting works).</p>
<p>A simple example is <strong>OP_0</strong> or <strong>OP_FALSE</strong>. These opcodes do the same thing and simply push an empty array of bytes to
the stack. The stack is where data is manipulated in this environment. A lot of opcodes will look at the top of the
stack and perform some function based on what is there.</p>
<p>Constants can be added to the stack through <strong>OP_PUSHDATA1</strong>, <strong>OP_PUSHDATA2</strong>, and <strong>OP_PUSHDATA4</strong>. These read the
next 1, 2, or 4 bytes to get the length of the constant in bytes, and then push that many of the following bytes onto the
stack. So if you had a script that said</p>
<p><code class="highlighter-rouge">OP_PUSHDATA1 &lt;0x01&gt; &lt;0x2A&gt;</code></p>
<p>it would see <strong>OP_PUSHDATA1</strong>, read the next byte to see how much data it’s going to push to the stack, and see
0x01, or one byte. Then it would push 0x2A to the stack, which is <strong>00101010</strong> in binary.</p>
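<p>As a rough sketch of that decoding step, the function below reads one push instruction from raw script bytes. <strong>read_pushdata</strong> is a name made up for this example; the opcode byte values and the little-endian length encoding are taken from the bitcoin wiki’s Script page.</p>

```python
# Opcode byte values as listed on the bitcoin wiki's Script page.
OP_PUSHDATA1 = 0x4C
OP_PUSHDATA2 = 0x4D
OP_PUSHDATA4 = 0x4E

# Number of bytes each opcode uses to encode the length of the data.
LENGTH_FIELD_SIZE = {OP_PUSHDATA1: 1, OP_PUSHDATA2: 2, OP_PUSHDATA4: 4}


def read_pushdata(script, offset=0):
    """Decode one OP_PUSHDATA* instruction starting at `offset`.

    Returns (data, next_offset).
    """
    opcode = script[offset]
    if opcode not in LENGTH_FIELD_SIZE:
        raise ValueError("not an OP_PUSHDATA opcode: 0x%02x" % opcode)
    size = LENGTH_FIELD_SIZE[opcode]
    length_start = offset + 1
    data_start = length_start + size
    # The length field is stored in little-endian byte order.
    length = int.from_bytes(script[length_start:data_start], "little")
    data = script[data_start:data_start + length]
    if len(data) != length:
        raise ValueError("script ends before the pushed data does")
    return data, data_start + length


# The example script from the text: OP_PUSHDATA1 <0x01> <0x2A>
stack = []
data, _ = read_pushdata(bytes([OP_PUSHDATA1, 0x01, 0x2A]))
stack.append(data)
print(stack)
```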
<h2 id="standard-transaction">Standard Transaction</h2>
<p>If you want to transfer some amount of bitcoin from your account to another and only want to allow the receiver to
spend the bitcoin, then you would use this standard script.</p>
<p>First some opcodes that need explaining.</p>
<ul>
<li><strong>OP_DUP</strong> - duplicates the top stack item, so the stack now holds two copies of it.</li>
<li><strong>OP_HASH160</strong> - the top stack value is removed and hashed twice, first with <em>SHA-256</em> then with <em>RIPEMD-160</em>, and the result is pushed back onto the
stack</li>
<li><strong>OP_EQUAL</strong> - removes the two top stack items and pushes true if they are equal, false if they aren’t</li>
<li><strong>OP_VERIFY</strong> - if top stack item is not true, then the script fails. Also removes the top stack item</li>
<li><strong>OP_EQUALVERIFY</strong> - performs <strong>OP_EQUAL</strong> and then <strong>OP_VERIFY</strong> in succession. So if the two top stack values are
equal, it continues the script. Otherwise the script fails.</li>
<li><strong>OP_CHECKSIG</strong> - takes a signature over the transaction data and a public key, and
confirms that the signature is valid for that public key, which means it was made with the matching private key.</li>
</ul>
<p>So the standard transaction looks like the following.</p>
<p>(On the bitcoin wiki, and here, some opcodes that push constants to
the stack are omitted. So any point that has &lt;data&gt; implies there is an appropriate <strong>OP_PUSHDATA1</strong> before it.)</p>
<p><code class="highlighter-rouge">scriptPubKey: OP_DUP OP_HASH160 &lt;pubKeyHash&gt; OP_EQUALVERIFY OP_CHECKSIG</code><br />
<code class="highlighter-rouge">scriptSig: &lt;sig&gt; &lt;pubkey&gt;</code></p>
<p><strong>scriptPubKey</strong> is the part of the script the sender adds, and <strong>scriptSig</strong> is what the receiver adds.
The script always executes the <strong>scriptSig</strong> part first, and then <strong>scriptPubKey</strong> second.</p>
<p>So if Alice sends Bob some bitcoin, the transaction will include the <strong>scriptPubKey</strong> data, and Bob will add the
<strong>scriptSig</strong> parts, and they will be combined and executed.</p>
<p>For &lt;sig&gt;, Bob would take all the transaction data, hash it, and then sign it using his private key.</p>
<p>For &lt;pubkey&gt;, Bob would just use his public key.</p>
<p>For &lt;pubKeyHash&gt;, Alice would use the hash of Bob’s public key.</p>
<p>The combined script looks like this.</p>
<p><code class="highlighter-rouge">&lt;sig&gt; &lt;pubkey&gt; OP_DUP OP_HASH160 &lt;pubKeyHash&gt; OP_EQUALVERIFY OP_CHECKSIG</code></p>
<p>Here is a breakdown of what the stack looks like while this script executes</p>
<ul>
<li>First two constants are pushed to the stack.</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;pubkey&gt;
&lt;sig&gt;
</code></pre></div></div>
<ul>
<li><strong>OP_DUP</strong> duplicates the top stack item</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;pubkey&gt;
&lt;pubkey&gt;
&lt;sig&gt;
</code></pre></div></div>
<ul>
<li><strong>OP_HASH160</strong> hashes the top stack item.</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;pubHashA&gt;
&lt;pubkey&gt;
&lt;sig&gt;
</code></pre></div></div>
<ul>
<li>&lt;pubKeyHash&gt; is pushed to the stack.</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;pubKeyHash&gt;
&lt;pubHashA&gt;
&lt;pubkey&gt;
&lt;sig&gt;
</code></pre></div></div>
<ul>
<li><strong>OP_EQUALVERIFY</strong> checks to see if the hash of the public key given matches the destination of the transaction. Fails if they
don’t match. (At this point, we know that the public key supplied alongside &lt;sig&gt; is the same public key the bitcoins
were sent to)</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;pubkey&gt;
&lt;sig&gt;
</code></pre></div></div>
<ul>
<li><strong>OP_CHECKSIG</strong> checks to see if the given signature was made using the correct transaction data, and with the given
public key. (Now we know that the person spending these bitcoins has the corresponding private key to the public key
of this address)</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>true
</code></pre></div></div>
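<p>The walkthrough above can be replayed with a toy stack machine. This is only a sketch under loud assumptions: <strong>toy_hash160</strong> uses plain SHA-256 instead of SHA-256 followed by RIPEMD-160, and <strong>toy_sign</strong> is not a real signature scheme at all (anyone who knows the public key could compute it). All function names are invented for this example. The point is the order in which items move on and off the stack, not the cryptography.</p>

```python
import hashlib


def toy_hash160(data):
    # Stand-in for OP_HASH160 (SHA-256 then RIPEMD-160): RIPEMD-160 is not
    # available in every Python build, so this sketch uses SHA-256 only.
    return hashlib.sha256(data).digest()


def toy_sign(pubkey, tx_data):
    # NOT a real signature: anyone who knows the public key could compute it.
    # It only gives the toy OP_CHECKSIG something deterministic to compare.
    return hashlib.sha256(pubkey + tx_data).digest()


def run_p2pkh(sig, pubkey, pubkey_hash, tx_data):
    """Evaluate <sig> <pubkey> OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG."""
    stack = [sig, pubkey]                    # scriptSig pushes its two constants
    stack.append(stack[-1])                  # OP_DUP
    stack.append(toy_hash160(stack.pop()))   # OP_HASH160
    stack.append(pubkey_hash)                # <pubKeyHash> from scriptPubKey
    if stack.pop() != stack.pop():           # OP_EQUALVERIFY
        return False
    key, signature = stack.pop(), stack.pop()
    stack.append(signature == toy_sign(key, tx_data))  # OP_CHECKSIG
    return stack.pop()


bob_pubkey = b"bob-public-key"
tx = b"alice pays bob 1 btc"
locked_to = toy_hash160(bob_pubkey)

# Bob presents the matching key and signature.
print(run_p2pkh(toy_sign(bob_pubkey, tx), bob_pubkey, locked_to, tx))
# Someone else presents their own key; the hash comparison fails.
print(run_p2pkh(toy_sign(b"mallory-key", tx), b"mallory-key", locked_to, tx))
```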
<h2 id="burning-coins">Burning Coins</h2>
<p>So what if you wanted to send some coins to an address and make them unspendable? This is useful if you want to burn the
coins, maybe for a proof-of-burn protocol to convert the coins into another cryptocurrency.</p>
<p>You would just make the <strong>scriptPubKey</strong> this.</p>
<p><code class="highlighter-rouge">OP_RETURN</code></p>
<p>The bitcoin wiki points to this
<a href="https://blockexplorer.com/tx/eb31ca1a4cbd97c2770983164d7560d2d03276ae1aee26f12d7c2c6424252f29">transaction</a> as an
example of unspendable coins.</p>
<p>When someone mines this transaction, the coins are not added to the UTXO set. This is a set of all unspent transaction
outputs. These coins don’t have the potential to be spent, so they aren’t considered “unspent”.</p>
<p>Because the <strong>scriptPubKey</strong> is executed last, the script always ends with
<strong>OP_RETURN</strong> no matter what the <strong>scriptSig</strong> contains, which ensures the script fails.</p>
<h2 id="puzzles">Puzzles</h2>
<p>One potential use of these scripts is to hold some coins in an account, and only allow them to be spent if some puzzle
is solved. The <strong>scriptPubKey</strong> would specify the puzzle, and the <strong>scriptSig</strong> would have to put some data on the
stack, that causes the <strong>scriptPubKey</strong> to succeed by leaving true on the stack.</p>
<p>There are some limitations to what you can do with the given opcodes, but a simple example would be to find the input
that produces a given hash. Again the bitcoin wiki points to an example
<a href="https://blockexplorer.com/tx/a4bfa8ab6435ae5f25dae9d89e4eb67dfa94283ca751f393c1ddc5a837bbc31b">transaction</a> that does this.</p>
<p>The <strong>scriptPubKey</strong> looks like this</p>
<p><code class="highlighter-rouge">OP_HASH256 <6fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d6190000000000> OP_EQUAL</code></p>
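<p>The check this <strong>scriptPubKey</strong> performs can be sketched in a few lines of Python. <strong>OP_HASH256</strong> applies SHA-256 twice. The function names and the puzzle below are made up for illustration; the example target is generated from a known answer and is not the constant from the transaction above.</p>

```python
import hashlib


def hash256(data):
    """Double SHA-256, which is what OP_HASH256 computes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def solves_puzzle(candidate, target):
    """Mimic `OP_HASH256 <target> OP_EQUAL` for data pushed by a scriptSig."""
    return hash256(candidate) == target


# Hypothetical puzzle built for this example; it is not the transaction above.
target = hash256(b"open sesame")
print(solves_puzzle(b"open sesame", target))
print(solves_puzzle(b"wrong guess", target))
```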
<p>So to spend these coins, you must create a <strong>scriptSig</strong> that puts some data on the stack that, when hashed twice with
SHA-256 (which is what <strong>OP_HASH256</strong> does), results in the given constant. Anyone is able to try this puzzle, and if they come up with a <strong>scriptSig</strong> that
causes the <strong>scriptPubKey</strong> to succeed, then they get the bitcoins at this address.</p>
<h1>Post Quantum Cryptography</h1>
<p>Matthew Roseman · 2017-11-21 · <a href="https://mroseman95.github.io/post-quantum-cryptography">https://mroseman95.github.io/post-quantum-cryptography</a></p>
<h2 id="intro">Intro</h2>
<h3 id="contents">Contents</h3>
<ul>
<li><a href="#overview">Overview</a></li>
<li><a href="#shors-algorithm">Shor’s Algorithm</a></li>
<li><a href="#hash-based-signature-algorithms">Hash Based Signature Algorithms</a></li>
<li><a href="#code-based-algorithms">Code Based Algorithms</a></li>
<li><a href="#lattice-based-algorithms">Lattice Based Algorithms</a></li>
<li><a href="#further-reading">Further Reading</a></li>
</ul>
<hr />
<h2 id="overview">Overview</h2>
<p>First of all, not all cryptographic algorithms are broken by quantum computing.</p>
<p><strong>AES</strong>, and most symmetric key algorithms, are considered safe against quantum computing attacks.</p>
<p>Among hash based schemes, <strong>Merkle’s Hash Tree</strong> signatures aren’t broken.</p>
<p>And for public key cryptography, <strong>Lattice based algorithms</strong> are considered safe.</p>
<p>But there are a lot of important algorithms that don’t hold up. <strong>RSA</strong>, <strong>DSA</strong>, and
<strong>ECDSA</strong> are all broken by quantum computing.</p>
<h2 id="shors-algorithm">Shor’s Algorithm</h2>
<p>So what does quantum computing do that breaks algorithms like <strong>RSA</strong>?</p>
<p>There is an algorithm called <strong>Shor’s Algorithm</strong>, created by
Peter Shor. It finds the prime factors of integers:
given some integer N, it figures out which prime numbers multiply to equal N. A variant of the same algorithm also solves the discrete logarithm problem.</p>
<p>Cryptographic algorithms usually boil down to some difficult mathematical problem.
<strong>RSA</strong> relies on the difficulty of factorizing large integers, while <strong>DSA</strong> and <strong>ECDSA</strong> rely on the difficulty of computing
discrete logarithms.</p>
<p>So it is clear that if there is an algorithm that can solve these problems
fast, the cryptographic algorithms built on them being difficult break down.</p>
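<p>To make the RSA case concrete, here is a toy example with a deliberately tiny key, chosen only for illustration. The <strong>factor</strong> function uses slow classical trial division; it stands in for the step that Shor’s algorithm would make fast, and is not Shor’s algorithm itself. Once the modulus is factored, the private exponent falls out with ordinary arithmetic.</p>

```python
def factor(n):
    """Find a non-trivial factorisation of n by trial division.

    This is the slow classical approach; Shor's algorithm is what would make
    this step fast on a quantum computer.
    """
    candidate = 2
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate, n // candidate
        candidate += 1
    raise ValueError("n is prime")


# Toy public key (far too small to be secure).
n, e = 3233, 17

ciphertext = pow(65, e, n)  # someone encrypts the message 65

# An attacker who can factor n can rebuild the private exponent d.
p, q = factor(n)
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)  # modular inverse, needs Python 3.8+

print(pow(ciphertext, d, n))
```

<p>With a real 2048-bit modulus the trial division loop would never finish, which is exactly the difficulty RSA depends on.</p>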
<p>One other quantum algorithm, which can act on cryptographic algorithms not affected
by Shor’s, is Grover’s algorithm. But it only gives a quadratic speedup for brute-force search, far less than Shor’s, and can be
defended against by increasing key size.</p>
<h2 id="hash-based-signature-algorithms">Hash Based Signature Algorithms</h2>
<p>These types of algorithms are signature algorithms built upon hash functions.</p>
<p>A hash function takes some data and maps it to some data of a fixed size. A widely used
example is SHA-256.</p>
<p>These hash functions are generally regarded as safe against quantum attack, and will likely
not undergo large changes to make them secure. The most common defence is to increase the output
size (the analogue of increasing key size), which basically means you use the same algorithm, but increase the computation time
required to work backwards from a hash.</p>
<p>If you map data to a string of bits of length <strong>n</strong>, doubling the size would mean you
map to a string of bits of length <strong>2n</strong>, meaning working backwards is much
harder, because there are many more possible outputs.</p>
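<p>A quick way to see the fixed output size, and what doubling it looks like, is to hash inputs of very different lengths with Python’s <strong>hashlib</strong>. The helper name <strong>digest_bits</strong> is made up for this example, and SHA-512 is used simply as a readily available function with twice the output length of SHA-256.</p>

```python
import hashlib


def digest_bits(algorithm, message):
    """Return the length in bits of the digest of `message`."""
    return len(hashlib.new(algorithm, message).digest()) * 8


# The output length does not depend on the input length.
for message in (b"", b"a", b"a" * 10_000):
    print(len(message), "bytes in ->", digest_bits("sha256", message), "bits out")

# Doubling the output length from n to 2n bits squares the number of
# possible outputs: 2**512 == (2**256) ** 2.
print(digest_bits("sha512", b"a"))
```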
<h2 id="code-based-algorithms">Code Based Algorithms</h2>
<p>A common code based algorithm is the <strong>McEliece cryptosystem</strong>. This algorithm is based on the
difficulty of decoding a general linear code. A linear code is an error-correcting code, and
is used for forward error correction when transmitting data.</p>
<p>So the same algorithms that are used to correct errors that are introduced into data, can be
used to “scramble”/encrypt that data, and “decode”/decrypt that message.</p>
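<p>To give a feel for what a linear error-correcting code does, here is a sketch of the small Hamming(7,4) code. This is an assumption made purely for illustration: it is a much simpler code than the ones used in the McEliece cryptosystem, and it shows only the error-correction side, not encryption. The function names are invented for this example.</p>

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    # Positions 1..7: parity bits sit at positions 1, 2 and 4.
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


message = [1, 0, 1, 1]
sent = hamming74_encode(message)
received = list(sent)
received[4] ^= 1  # one bit gets flipped in transit
print(hamming74_decode(received) == message)
```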
<p>These algorithms aren’t used that often, but they actually have faster encryption/decryption
times than RSA, and are considered secure against quantum attack, so they may become quite popular
in the near future.</p>
<h2 id="lattice-based-algorithms">Lattice Based Algorithms</h2>
<p>Lattice based cryptography is currently the leading candidate for post quantum cryptography.
These algorithms are based on hard lattice problems.</p>
<p>A lattice is a set of points in n-dimensional space with periodic structure. These are
constructed with some basis vectors, and every point is an integer linear combination of
these basis vectors.</p>
<p>The <strong>shortest vector problem (SVP)</strong> involves finding the shortest non-zero vector
in the lattice.</p>
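<p>In two dimensions the problem is easy enough to brute-force, which makes it a useful illustration. The sketch below enumerates integer combinations of a small basis and keeps the shortest non-zero result. The basis, the coefficient bound, and the function names are all assumptions chosen for this example; real lattice cryptography works in hundreds of dimensions, where this kind of enumeration is hopeless.</p>

```python
import itertools
import math


def lattice_points(basis, coefficient_range):
    """Yield (coefficients, point) for every integer combination of the basis vectors."""
    dimension = len(basis[0])
    for coefficients in itertools.product(coefficient_range, repeat=len(basis)):
        point = tuple(
            sum(c * vector[i] for c, vector in zip(coefficients, basis))
            for i in range(dimension)
        )
        yield coefficients, point


def shortest_vector(basis, bound=5):
    """Brute-force the shortest non-zero lattice vector with coefficients in [-bound, bound]."""
    best = None
    for coefficients, point in lattice_points(basis, range(-bound, bound + 1)):
        if not any(coefficients):
            continue  # skip the zero vector
        if best is None or math.hypot(*point) < math.hypot(*best):
            best = point
    return best


# A deliberately "bad" (long, nearly parallel) basis for a 2-D lattice.
basis = [(5, 3), (8, 5)]
print(shortest_vector(basis))
```

<p>The basis vectors here are long, yet the lattice they generate contains vectors of length 1. Finding such short vectors from a bad basis is what becomes hard as the dimension grows.</p>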
<p>Currently there are no known quantum algorithms that can solve the SVP in polynomial time,
so lattice based cryptography is considered safe against quantum attack.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://pqcrypto.org/www.springer.com/cda/content/document/cda_downloaddocument/9783540887010-c1.pdf">Beginning Paper on Post Quantum Cryptography</a></li>
<li><a href="http://nvlpubs.nist.gov/nistpubs/ir/2016/NIST.IR.8105.pdf">NIST report on Post Quantum Cryptography</a></li>
<li><a href="https://cims.nyu.edu/~regev/papers/pqc.pdf">Lattice Based Cryptography</a></li>
</ul>
<h1>Reed-Solomon Codes</h1>
<p>Matthew Roseman · 2017-08-07 · <a href="https://mroseman95.github.io/reed-solomon">https://mroseman95.github.io/reed-solomon</a></p>
<h2 id="intro">Intro</h2>
<p>Reed-Solomon codes are a form of error-correcting code created by Irving Reed and Gustave Solomon at MIT Lincoln Lab.</p>
<p>Reed-Solomon codes belong to the family of BCH codes, which use finite fields and polynomial structures to process message data and
detect errors.</p>
<h3 id="contents">Contents</h3>
<ul>
<li><a href="#overview">Overview</a></li>
<li><a href="#galois-fields">Galois Fields</a></li>
<li><a href="#encoding">Encoding</a></li>
<li><a href="#decoding">Decoding</a></li>
</ul>
<hr />
<h2 id="overview">Overview</h2>
<p>The basic structure of these codes involves taking a message and splitting it up into <strong>Code Words</strong>.</p>
<p><img src="https://mroseman95.github.io/assets/images/reed_solomon_code_word.gif" alt="Code Word" /></p>
<p>A code word includes the original message you want to send and the parity symbols added on at the end of it.</p>
<p>The data in the code word is broken up further into what are called <strong>symbols</strong> or <strong>characters</strong>. These are <em>s</em> bits
long, usually 8 bits.</p>
<p>The entire code word is <em>n</em> symbols long, while the original data is <em>k</em> symbols, and the parity is <em>2t</em> symbols, so <em>n</em> = <em>k</em> + <em>2t</em>.</p>
<p>The Reed-Solomon algorithm can correct up to <em>2t</em> erasures in the data, or up to <em>t</em> errors. <strong>Erasures</strong> are like
errors, but where the location is known. Think of a QR code where part of the code is covered by something; you know
that the data isn’t correct there before you decode it. <strong>Errors</strong> are just mistakes in the data of some unknown
magnitude and at an unknown location.</p>
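<p>As a quick check of how <em>n</em>, <em>k</em> and <em>2t</em> relate, the sketch below computes the correction limits for a
given code. The function name rs_parameters is made up for this example, and the (255, 223) code is just one commonly
used choice of parameters with 8 bit symbols.</p>

```python
def rs_parameters(n, k):
    """Correction limits for a code word of n symbols carrying k data symbols."""
    parity = n - k  # this is 2t, the number of parity symbols
    return {
        "parity_symbols": parity,
        "max_erasures": parity,      # up to 2t erasures (location known)
        "max_errors": parity // 2,   # up to t errors (location unknown)
    }


print(rs_parameters(255, 223))
```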
<h2 id="galois-fields">Galois Fields</h2>
<p>Galois Field arithmetic is very important to the Reed-Solomon algorithms. All operations are done in a Galois
field.</p>
<p>The reason for this is that we can do addition, subtraction, multiplication, and division on binary numbers of length <em>s</em> and
always get back binary numbers of length <em>s</em>. In other words the numbers won’t ever be larger than <em>s</em> bits in length.</p>
<p>To create a Galois Field with integers we can do normal addition and multiplication and then mod by some prime
number to wrap the results around.</p>
<p>Say x and y are integers and p is some prime number…</p>
<ul>
<li>x + y mod p is an integer</li>
<li>x - y mod p is an integer</li>
<li>x * y mod p is an integer</li>
<li>x / y mod p is an integer</li>
</ul>
<p>The division point is the interesting one. By using a prime number as the modulus, we ensure that for every pair of
integers x and y, where y is not a multiple of p, there is some integer z such that y * z = x mod p, so x / y = z will always have an answer.</p>
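<p>Here is a small sketch of that division in a prime field. The helper name mod_div is an assumption for this example,
and it relies on Python 3.8 or newer, where the built in pow accepts an exponent of -1 together with a modulus to
compute a modular inverse.</p>

```python
def mod_div(x, y, p):
    """Return z such that (y * z) % p == x % p, for a prime p."""
    if y % p == 0:
        raise ZeroDivisionError("y has no inverse modulo p")
    y_inverse = pow(y, -1, p)  # modular inverse, Python 3.8+
    return (x * y_inverse) % p


z = mod_div(3, 4, 7)
print(z, (4 * z) % 7)
```

<p>Every nonzero y has exactly one inverse when p is prime, which is why the answer always exists.</p>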
<p>So what we want to do is take this Galois Field and apply it to binary numbers. Let’s say our goal is to have 8 bit
binary numbers, and to create a Galois Field so any operation will give us 8 bit binary numbers. This type of Galois
Field is represented as GF(2^8), where 2 is the <strong>characteristic</strong> of the field and 8 is the <strong>exponent</strong>.</p>
<p>For all the operations it may be beneficial to represent the binary number as a polynomial. This is done by treating
each bit as a coefficient in a polynomial.</p>
<p><img src="https://mroseman95.github.io/assets/images/galois_field_polynomial_01.png" alt="Galois Field Polynomial" /></p>
<h3 id="addition">Addition</h3>
<p>Say we want to add 5 + 6, or 0101 + 0110.</p>
<p><img src="https://mroseman95.github.io/assets/images/galois_field_addition_01.png" alt="Galois Field Addition" /></p>
<p>Since we are dealing with binary numbers, the coefficients of the polynomials are always taken modulo 2, so the 2 becomes a 0.</p>
<p>An efficient way to do this in binary is to just XOR the two numbers.</p>
<figure class="highlight"><pre><code class="language-raw" data-lang="raw"> 0101
XOR 0110
--------
0011</code></pre></figure>
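<p>In code this addition is a single XOR. The function name gf_add below is an assumption made for this sketch.</p>

```python
def gf_add(a, b):
    """Add two GF(2^s) elements: coefficient-wise addition modulo 2 is XOR."""
    return a ^ b


print(format(gf_add(0b0101, 0b0110), "04b"))
```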
<h3 id="subtraction">Subtraction</h3>
<p>Because we mod the coefficients by 2, -1 = 1, and subtraction is the same as addition.</p>
<p><img src="https://mroseman95.github.io/assets/images/galois_field_subtraction_01.png" alt="Galois Field Subtraction" /></p>
<h3 id="multiplication">Multiplication</h3>
<p>For multiplication, we can continue converting to polynomials, do the operation, then convert back.</p>
<p><img src="https://mroseman95.github.io/assets/images/galois_field_multiplication_01.png" alt="Galois Field Multiplication" /></p>
<p>Or we can use binary multiplication, replacing the addition with XORs.</p>
<figure class="highlight"><pre><code class="language-raw" data-lang="raw">         10001001
       x 00101010
       ----------
        10001001
      10001001
XOR 10001001
-----------------
    1010001111010
1010001111010 = x^12 + x^10 + x^6 + x^5 + x^4 + x^3 + x</code></pre></figure>
<p>Now we have a problem because this binary number is larger than 8 bits.</p>
<p>With the integers we modded the numbers by some prime, so we will have to find the equivalent for polynomials.
We need an <strong>irreducible polynomial</strong> or <strong>primitive polynomial</strong> to serve as our mod number.</p>
<p>We will use 100011101, or x^8 + x^4 + x^3 + x^2 + 1, as this number. So we need to divide the polynomial by this number and find the remainder.</p>
<p>An easy way to do this division is to line up the primitive polynomial with the number being reduced so the most significant bits line
up, and XOR. Then repeat until the number is less than 9 bits long.</p>
<figure class="highlight"><pre><code class="language-raw" data-lang="raw">    1010001111010
XOR 100011101
-----------------
    0010110101010
XOR   100011101
-----------------
      00111011110
XOR     100011101
-----------------
        011000011
1010001111010 mod 100011101 = 011000011</code></pre></figure>
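<p>The two steps above, shift and XOR multiplication followed by reduction, can be sketched in a few lines. The function
names are assumptions made for this example; the constant is the 100011101 polynomial used in the text.</p>

```python
PRIM_POLY = 0b100011101  # x^8 + x^4 + x^3 + x^2 + 1


def carryless_multiply(a, b):
    """Binary long multiplication where the partial products are XORed."""
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            result ^= a << shift
        shift += 1
    return result


def reduce_mod(value, poly=PRIM_POLY):
    """Line the polynomial up with the top bit and XOR until the value fits."""
    poly_bits = poly.bit_length()
    while value.bit_length() >= poly_bits:
        value ^= poly << (value.bit_length() - poly_bits)
    return value


def gf_mul(a, b):
    return reduce_mod(carryless_multiply(a, b))


print(format(carryless_multiply(0b10001001, 0b00101010), "b"))
print(format(gf_mul(0b10001001, 0b00101010), "09b"))
```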
<p>There is actually a more efficient way to do multiplication in Galois Fields.</p>
<p>The simplest operation is to multiply by 2. Since the numbers are in binary, you just shift the bits left by 1 (reducing by the primitive polynomial if the result overflows 8 bits), and
for any power of 2 you shift the bits left by that power.</p>
<p>We can create some <strong>generator number</strong> (α = 00000010) that is equal to 2 in the Galois Field, and find all the powers
of it still in the field.</p>
<p><img src="https://mroseman95.github.io/assets/images/galois_field_multiplication_02.png" alt="Galois Field Alpha Powers" /></p>
<p>If we find all the powers of this generator number from 0 to 254, or whatever the maximum power for the field is, we
will get all the nonzero numbers in the field. After that the powers repeat, since α^255 = 1.</p>
<p>In other words, every nonzero number in the Galois Field can be represented as some power of the generator number.</p>
<p>Then, if we convert the numbers before multiplying, it becomes much simpler.</p>
<p><img src="https://mroseman95.github.io/assets/images/galois_field_multiplication_03.png" alt="Galois Field Alpha Power Substitution" /></p>
<p>So if we just keep a table of these α powers, multiplication becomes a quick lookup of the logarithms of the two
numbers with α as a base, a simple addition of those logarithms (mod 255), and a lookup back to the result.</p>
<h3 id="division">Division</h3>
<p>Using the α trick explained in multiplication, division is as simple as subtracting the logarithms base α instead of
adding.</p>
<p><img src="https://mroseman95.github.io/assets/images/galois_field_division_01.png" alt="Galois Field Division" /></p>
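<p>A minimal sketch of the α lookup tables, covering both multiplication and division. The names EXP, LOG, gf_mul and
gf_div are assumptions for this example, and the tables are built with the 100011101 polynomial from the
multiplication section.</p>

```python
PRIM_POLY = 0x11D  # 100011101

EXP = [0] * 510  # EXP[i] = α^i, stored twice over so exponent sums need no wrap
LOG = [0] * 256  # LOG[α^i] = i (LOG[0] is unused, zero has no logarithm)
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1              # multiply by α = 2
    if value & 0x100:        # overflowed 8 bits, reduce
        value ^= PRIM_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


product = gf_mul(0b10001001, 0b00101010)
print(format(product, "08b"), gf_div(product, 0b00101010) == 0b10001001)
```

<p>Zero has to be handled separately because it is not a power of α.</p>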
<h2 id="encoding">Encoding</h2>
<p>Reed-Solomon encoding is very simple.</p>
<p>We take some message M, and break it into symbols, where if we use the Galois Field GF(2^8), the symbols are 8 bits
long. So M becomes M = (M1, M2, M3, …, Mk) and Mi is a symbol. Then we can create a polynomial from these symbols.</p>
<p><img src="https://mroseman95.github.io/assets/images/rs_encoding_01.png" alt="Message to Polynomial Form" /></p>
<p>It is important to realize that this polynomial is not the same as the previous ones. Each coefficient, Mi, of this
polynomial is some element of the Galois Field. From this point on we will be working with these polynomials where the
coefficients are elements of the Galois Field, so don’t confuse them with the ones used in Galois Field arithmetic.</p>
<p>Now we need a <strong>Generator Polynomial</strong>. Again this is different from the generator number mentioned previously, but it is
built from that number. The generator polynomial is a product of linear factors, with roots that are consecutive powers of α.</p>
<p><img src="https://mroseman95.github.io/assets/images/rs_encoding_02.png" alt="Generator Polynomial" /></p>
<p>2t is the number of parity symbols you want.</p>
<p>Now we take our message represented as a polynomial, multiply it by x^2t to make room for the parity symbols, and mod it by the generator polynomial. The remainder gives us our parity
symbols. We simply tack them on at the end of the message.</p>
<p>Usually you may want to add some buffer bits to the message so that it is a certain length. You can just add on
0’s, and either leave them when you send the message, or remove them and have the receiver tack them on before working
with the message.</p>
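<p>The encoding steps can be sketched as follows. This is an illustrative sketch, not a production encoder: all function
names are assumptions, polynomials are lists of symbols with the highest power first, the field uses the 100011101
polynomial from earlier, the generator roots are assumed to be α^1 through α^2t, and the message is shifted by x^2t
before taking the remainder.</p>

```python
PRIM_POLY = 0x11D
EXP, LOG = [0] * 510, [0] * 256
value = 1
for power in range(255):
    EXP[power], LOG[value] = value, power
    value <<= 1
    if value & 0x100:
        value ^= PRIM_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] ^= gf_mul(pc, qc)
    return out


def generator_poly(num_parity):
    """g(x) = (x - α^1)(x - α^2)...(x - α^2t); minus equals plus in this field."""
    g = [1]
    for i in range(1, num_parity + 1):
        g = poly_mul(g, [1, EXP[i]])
    return g


def rs_encode(message, num_parity):
    g = generator_poly(num_parity)
    work = list(message) + [0] * num_parity  # M(x) * x^2t
    for i in range(len(message)):            # long division by the monic g(x)
        coef = work[i]
        if coef:
            for j in range(1, len(g)):
                work[i + j] ^= gf_mul(g[j], coef)
    parity = work[len(message):]             # the remainder
    return list(message) + parity


def poly_eval(p, x):
    y = 0
    for c in p:  # Horner's rule, highest power first
        y = gf_mul(y, x) ^ c
    return y


codeword = rs_encode([0x40, 0xD2, 0x75, 0x47], 4)  # arbitrary example symbols
print(codeword)
```

<p>A quick sanity check is that the resulting code word evaluates to 0 at every root of g(x).</p>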
<h2 id="decoding">Decoding</h2>
<p>There are some terms I need to define before going into the math of decoding the message.</p>
<ul>
<li>R(x) the received message polynomial (including parity bits)</li>
<li>T(x) the uncorrupted sent message polynomial (including parity bits)</li>
<li>E(x) the errors in the received message polynomial</li>
</ul>
<p>These polynomials are just like the ones described in encoding, where the data is split up and each symbol is a
coefficient of the polynomial.</p>
<p>These polynomials relate such that…</p>
<p><img src="https://mroseman95.github.io/assets/images/rs_decoding_01.png" alt="Decoding Polynomials" /></p>
<p>Ei is an <em>s</em> bit error value, and the power of the x determines the position this error happened at.</p>
<h3 id="syndromes">Syndromes</h3>
<p>We know that T(x) is divisible by the generator polynomial g(x), since we added the remainder of M(x) / g(x) to it.</p>
<p>This makes checking for errors very simple: just evaluate R(x) at each zero of g(x) and see if it is also a zero of
R(x). If there are no errors, R(x) = T(x) is divisible by g(x), so the zeros of g(x) must also be zeros of R(x).</p>
<p>The zeros of g(x) are also very easy to find since it is constructed so that the zeros are α^i for 1 ≤ i ≤ 2t.</p>
<p>When we evaluate R(x) at each power of α, we get what is called a <strong>Syndrome</strong>.</p>
<p><img src="https://mroseman95.github.io/assets/images/syndromes_01.png" alt="Syndrome Value" /></p>
<p>We also know that T(x) will evaluate to 0 for every zero of g(x) so we can simplify the equation…</p>
<p><img src="https://mroseman95.github.io/assets/images/syndromes_02.png" alt="Syndrome Value Simplified" /></p>
<p>So syndrome values are only dependent on the Error polynomial E(x).</p>
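<p>A short sketch of the syndrome calculation, under the same assumptions as before (hypothetical function names,
symbols listed highest power first, field built from 100011101). To show that the syndromes depend only on E(x), the
example uses the all zero code word, which is trivially divisible by g(x), and adds a single error to it.</p>

```python
PRIM_POLY = 0x11D
EXP, LOG = [0] * 510, [0] * 256
value = 1
for power in range(255):
    EXP[power], LOG[value] = value, power
    value <<= 1
    if value & 0x100:
        value ^= PRIM_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_eval(p, x):
    y = 0
    for c in p:  # Horner's rule, highest power first
        y = gf_mul(y, x) ^ c
    return y


def syndromes(received, num_parity):
    """S_i = R(α^i) for i = 1..2t."""
    return [poly_eval(received, EXP[i]) for i in range(1, num_parity + 1)]


clean = [0] * 8                 # T(x) = 0 is a valid code word
print(syndromes(clean, 4))      # no errors, so every syndrome is 0

corrupted = list(clean)
corrupted[5] ^= 0x2A            # error value 0x2A at the x^2 position
print(syndromes(corrupted, 4))  # each S_i equals 0x2A * α^(2i)
```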
<p>Let’s say that errors occur only at locations (e1, e2,…,ev) where ei corresponds to the power of x where the error is.
Then we can rewrite E(x) to only include these locations.</p>
<p><img src="https://mroseman95.github.io/assets/images/syndromes_03.png" alt="Error Polynomial" /></p>
<p>Where Y1, Y2,…, Yv represent the error values at these locations.</p>
<p>If we evaluate this new E(x) at α^i we get</p>
<p><img src="https://mroseman95.github.io/assets/images/syndromes_04.png" alt="Error Polynomial evaluating alpha" /></p>
<p>And to simplify we can define</p>
<p><img src="https://mroseman95.github.io/assets/images/syndromes_05.png" alt="Error Polynomial evaluating alpha simplified" /></p>
<p>We can take these functions and create a matrix defining each Syndrome based on the error locations and error values</p>
<p><img src="https://mroseman95.github.io/assets/images/syndrome_matrix_01.png" alt="Syndrome Matrix" /></p>
<p>So we already know the values of the syndromes (S1, S2,…,S2t), and if we find the values of the error locations
(X1, X2,…,Xv), we can solve for the error values, which are (Y1, Y2,…,Yv).</p>
<h3 id="error-locator-polynomial">Error Locator Polynomial</h3>
<p>The <strong>Error Locator Polynomial</strong> is a polynomial whose roots are the inverses of the error locations X1, X2,…, Xv in R(x).</p>
<p><img src="https://mroseman95.github.io/assets/images/error_locator_polynomial_04.png" alt="Error Locator Polynomial" /></p>
<p>The goal then is to find the coefficients Λ1, Λ2,…, Λv.</p>
<p>We know that Xj^-1 is a zero of Λ(x). So we plug in Xj^-1 and get this function.</p>
<p><img src="https://mroseman95.github.io/assets/images/error_locator_polynomial_01.png" alt="Error Locator Polynomial zero" /></p>
<p>Then we multiply both sides by Yj Xj^(i+v). We can also sum this function over all j’s, and the result will still
be zero.</p>
<p>The point of doing this is, we can substitute the syndromes into the equation, which we already know.</p>
<p><img src="https://mroseman95.github.io/assets/images/error_locator_polynomial_02.png" alt="Error Locator Polynomial syndrome substitution" /></p>
<p>Now we can convert this whole thing into matrix representation.</p>
<p><img src="https://mroseman95.github.io/assets/images/error_locator_polynomial_03.png" alt="Error Locator Polynomial matrix equation" /></p>
<p>There is one problem with this, which is we may not know the number of errors that are in the message. In other words,
we don’t know v.</p>
<p>We can find it with <strong>Berlekamp’s Algorithm</strong>.</p>
<p>This algorithm starts with Λ(x) = 1, and at each stage an error value is calculated by substituting the approximate
coefficients for that value of v.</p>
<p>To start we have two functions, the syndromes, and two parameters.</p>
<ul>
<li>Λ(x) is the error locator polynomial</li>
<li>C(x) is the correction polynomial</li>
<li>S1, S2, … S2t are the syndromes</li>
<li>K is the step parameter</li>
<li>L is the parameter used to track the order of equations</li>
</ul>
<p>We initialize with
K = 1, L = 0, Λ(x) = 1, C(x) = x</p>
<p>Then we calculate the error value e</p>
<p><img src="https://mroseman95.github.io/assets/images/berlekamp_algorithm_01.png" alt="Berlekamp's Algorithm" /></p>
<p>What this algorithm does is it starts with a potential Λ(x) which is C(x) and L as the order of C(x), and over time
increases L so that</p>
<p><img src="https://mroseman95.github.io/assets/images/berlekamp_algorithm_02.png" alt="Berlekamp's Algorithm Goal" /></p>
<p>And it tries to find the smallest L possible.</p>
<p>So if we have errors instead of erasures (meaning we don’t know how many or where they are) we can use Berlekamp’s
Algorithm to find v, and then construct the matrix equation shown above in order to solve for Λ1, Λ2,…, Λv.</p>
<p>Then once we have the error locator polynomial coefficients, we can solve for its roots, X1^-1, X2^-1,…, Xv^-1.</p>
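<p>Below is a sketch of the iteration with K, L, Λ(x) and C(x), followed by a brute force search for the roots. It is
an illustration under assumptions rather than a reference implementation: the function names are made up, the
syndromes are passed as the list [S1, …, S2t], Λ(x) is stored lowest power first, and the update rule used here
(Λ(x) + e·C(x), with C(x) replaced by Λ(x)/e whenever 2L &lt; K) is one common way of writing the algorithm.</p>

```python
PRIM_POLY = 0x11D
EXP, LOG = [0] * 510, [0] * 256
value = 1
for power in range(255):
    EXP[power], LOG[value] = value, power
    value <<= 1
    if value & 0x100:
        value ^= PRIM_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]  # a must be nonzero


def poly_eval_low(p, x):
    y = 0
    for c in reversed(p):  # Horner's rule, lowest power first storage
        y = gf_mul(y, x) ^ c
    return y


def berlekamp(synd):
    lam, corr, L = [1], [0, 1], 0  # Λ(x) = 1, C(x) = x, L = 0
    for K in range(1, len(synd) + 1):
        e = synd[K - 1]  # e = S_K + sum of Λ_i * S_(K-i)
        for i in range(1, len(lam)):
            if K - 1 - i >= 0:
                e ^= gf_mul(lam[i], synd[K - 1 - i])
        if e != 0:
            size = max(len(lam), len(corr))
            padded = lam + [0] * (size - len(lam))
            scaled = [gf_mul(c, e) for c in corr] + [0] * (size - len(corr))
            new_lam = [a ^ b for a, b in zip(padded, scaled)]
            if 2 * L < K:
                L = K - L
                corr = [gf_mul(c, gf_inv(e)) for c in lam]
            lam = new_lam
        corr = [0] + corr  # C(x) = C(x) * x
    return lam, L


def error_positions(lam, length):
    """Powers of x whose locator X = α^pos satisfies Λ(X^-1) = 0."""
    return [pos for pos in range(length)
            if poly_eval_low(lam, EXP[(255 - pos) % 255]) == 0]


# Hypothetical example: errors 0x2A at x^2 and 0x11 at x^5, with 2t = 4.
synd = [gf_mul(0x2A, EXP[2 * i]) ^ gf_mul(0x11, EXP[5 * i]) for i in range(1, 5)]
lam, L = berlekamp(synd)
print(L, error_positions(lam, 8))
```

<p>Trying every position like this is the simplest way to find the roots; it works because there are at most 255
candidate locations in GF(2^8).</p>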
<p>Then we go all the way back to the first matrix equation, which looked like this.</p>
<p><img src="https://mroseman95.github.io/assets/images/syndrome_matrix_01.png" alt="Syndrome Matrix" /></p>
<p>We have everything but Y1, Y2,…, Yv. We can simply use matrix inverse to solve for these though.</p>
<p>Plugging everything together, we can solve for E(x).</p>
<p><img src="https://mroseman95.github.io/assets/images/error_polynomial_01.png" alt="E(x) polynomial" /></p>
<p>Then we simply subtract E(x) from R(x) and get T(x), the original message.</p>
<h3 id="farneys-algorithm">Forney’s Algorithm</h3>
<p>The <strong>Forney Algorithm</strong> is a more efficient way to solve for Y1, Y2,…, Yv. It does not require calculating the matrix
inverse.</p>
<p>To do it we need an additional polynomial called the <strong>Error Magnitude Polynomial</strong> Ω(x).</p>
<p><img src="https://mroseman95.github.io/assets/images/farney_algorithm_01.png" alt="Error Magnitude Polynomial" /></p>
<p>We mod by x^2t in order to truncate the magnitude polynomial, keeping only the terms of degree less than 2t. S(x) is the <strong>Syndrome
Polynomial</strong> where the coefficients are the Syndromes for R(x).</p>
<p>The derivative of the Error Locator Polynomial evaluated at Xj^-1 is different from the normal derivative, because we are working with
Galois Field polynomials.</p>
<p><img src="https://mroseman95.github.io/assets/images/farney_algorithm_02.png" alt="Error Locator Polynomial Derivative" /></p>
<p>We can simply set even powered terms to 0 and divide through by Xj^-1.</p>
<p>With this derivative and the error magnitude polynomial, we can solve for the error values Y1, Y2,…, Yv, with this
equation.</p>
<p><img src="https://mroseman95.github.io/assets/images/farney_algorithm_03.png" alt="Farneys Algorithm" /></p>
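<p>A sketch of the calculation follows. It rests on assumptions that may differ from the equations pictured above: the
syndrome polynomial is taken as S(x) = S1 + S2·x + … + S2t·x^(2t-1), the generator roots are α^1 through α^2t, and
polynomials are stored lowest power first. Under those conventions the error value is Yj = Ω(Xj^-1) / Λ'(Xj^-1); other
conventions carry an extra power of Xj. All function names are made up for the example.</p>

```python
PRIM_POLY = 0x11D
EXP, LOG = [0] * 510, [0] * 256
value = 1
for power in range(255):
    EXP[power], LOG[value] = value, power
    value <<= 1
    if value & 0x100:
        value ^= PRIM_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def poly_eval_low(p, x):
    y = 0
    for c in reversed(p):  # Horner's rule, lowest power first storage
        y = gf_mul(y, x) ^ c
    return y


def forney(synd, lam, positions):
    two_t = len(synd)
    omega = [0] * two_t  # Ω(x) = S(x) * Λ(x) mod x^2t
    for i, s in enumerate(synd):
        for j, l in enumerate(lam):
            if i + j < two_t:
                omega[i + j] ^= gf_mul(s, l)
    # Formal derivative: only odd powers of Λ survive, each shifted down by one.
    deriv = [lam[i] if i % 2 == 1 else 0 for i in range(1, len(lam))]
    values = []
    for pos in positions:
        x_inv = EXP[(255 - pos) % 255]  # Xj^-1 for Xj = α^pos
        values.append(gf_div(poly_eval_low(omega, x_inv),
                             poly_eval_low(deriv, x_inv)))
    return values


# Hypothetical example: errors 0x2A at x^2 and 0x11 at x^5, with 2t = 4.
X1, X2 = EXP[2], EXP[5]
lam = [1, X1 ^ X2, gf_mul(X1, X2)]  # Λ(x) = (1 + X1 x)(1 + X2 x)
synd = [gf_mul(0x2A, EXP[2 * i]) ^ gf_mul(0x11, EXP[5 * i]) for i in range(1, 5)]
print([hex(v) for v in forney(synd, lam, [2, 5])])
```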
<p>This equation won’t work for positions that don’t have errors, so we need to still find these positions first, but this
equation is faster than the matrix inverse we would have to do otherwise.</p>Matthew RosemanIntroAlternatives to Proof of Work: Part 22017-06-12T14:30:00+00:002017-06-12T14:30:00+00:00https://mroseman95.github.io/blockchain-consensus-02<hr />
<h2 id="proof-of-stake">Proof of Stake</h2>
<h3 id="overview">Overview</h3>
<p>Unlike proof of work, where users must compete against each other for the chance to approve a block, proof of stake pseudo
randomly chooses the validators. Each member of the network has some measure of stake in the network, and then ideally
that person is chosen to validate an amount proportionate to their stake. So if you own 10 percent of a network’s
currency, you should validate 10 percent of its transactions.</p>
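<p>Stake proportional selection can be sketched as a weighted random draw. The names and balances below are made up for
illustration, and a real network would derive its randomness from the chain itself rather than from a local random
number generator.</p>

```python
import random


def pick_validator(stakes, rng):
    """Choose one validator with probability proportional to its stake."""
    names = list(stakes)
    weights = [stakes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


stakes = {"alice": 10, "bob": 30, "carol": 60}  # hypothetical coin balances
rng = random.Random(0)                           # seeded so runs are repeatable
picks = [pick_validator(stakes, rng) for _ in range(10000)]
print(picks.count("alice") / len(picks))         # close to 0.10
```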
<p>Usually in proof of stake systems there is a set number of coins in existence, and the validators that forge blocks (blocks are
forged or minted instead of mined in proof of stake systems) only get the transaction fees. There is a much lower
cost to validating blocks, so the lower payoff is not a detriment to the system.</p>
<p>The measure of a validator’s stake in the network isn’t always as simple as how much value they hold. In order to prevent
the richest member from having all the power some other factors are considered. One factor is age of the coins held, so
the longer a validator holds on to coins the more stake they have in the system. Usually coin age would be reset after
the validator is chosen to validate a block.</p>
<p>One downside to proof of stake is that validators don’t really lose anything by confirming blocks. There is no
computational energy consumed, so there is no reason not to try and validate every current fork. This is called the
<strong>nothing at stake</strong> problem.</p>
<p><img src="https://mroseman95.github.io/assets/images/nothing-at-stake.png" alt="nothing at stake" /></p>
<figcaption class="caption">Left fork has probability of 0.9 of becoming the main fork, and right has 0.1</figcaption>
<p>This can prevent a blockchain from ever theoretically reaching consensus. In proof of work, a validator has incentive to
put their energy towards the fork that has the most blocks and is most likely to become the true fork and give them a
reward. If they split their energy they have a much lower chance of finding the correct hash and earning the
bitcoins.</p>
<p>One solution is to penalize validators for validating conflicting blocks by taking away some money. This then makes it
economically sensible to only work on one chain, and the chain with the highest probability makes the most sense.</p>
<p>Another solution is to penalize validators for validating on the wrong chain. So like proof of work, it only makes sense
to validate blocks on the chain with highest probability of succeeding.</p>
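<p>The incentive argument can be checked with a little arithmetic, using the fork probabilities from the figure. The
reward and penalty amounts and the function name are assumptions made for this sketch, which models only the first
solution (a penalty for validating conflicting blocks).</p>

```python
def expected_payoff(fork_probs, validated, reward=1.0, penalty=0.0):
    """Expected reward for validating the given forks.

    The penalty applies only when more than one (conflicting) fork is validated.
    """
    payoff = sum(fork_probs[fork] * reward for fork in validated)
    if len(validated) > 1:
        payoff -= penalty
    return payoff


forks = {"left": 0.9, "right": 0.1}
print(expected_payoff(forks, ["left"]))                        # one fork only
print(expected_payoff(forks, ["left", "right"]))               # both, no penalty
print(expected_payoff(forks, ["left", "right"], penalty=0.5))  # both, penalized
```

<p>Without a penalty, validating both forks always pays at least as much as picking one, which is the nothing at stake
problem. With a large enough penalty, backing only the most likely fork becomes the better choice.</p>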
<h3 id="pros">Pros</h3>
<ul>
<li>Does not consume large amounts of electricity to operate.</li>
<li>Because of lower energy cost to validate, reward can be much lower.</li>
<li>Reduced centralization risk. If you have 20x the amount of currency then you should get 20x the amount of revenue from
validating blocks. In proof of work you could exponentially scale up by buying new ASIC machines and growing your mining
farm.</li>
<li>Those validating transactions are also those who own coins, so they may have more incentive to be honest.</li>
</ul>
<h3 id="cons">Cons</h3>
<ul>
<li>While nothing at stake has some proposed solutions, there are still arguments that attacks are possible, and that
security is lower in proof of stake systems.</li>
<li>There are greater barriers to entry for forging in proof of stake systems, since you must buy some amount of coins in
order to participate.</li>
</ul>
<h2 id="proof-of-burn">Proof of Burn</h2>
<h3 id="overview-1">Overview</h3>
<p>This system’s cost to verify is to <em>“burn”</em> coins. A validator must send some amount of coins to an account confirmed to be
unspendable, and then they are able to verify a block. Like proof of work some asset is being consumed, but in this
system there is no cost of electricity.</p>
<p>Proof of Burn is designed to burn a cryptocurrency of another system, usually a proof of work system. This can be used
as a way to incentivize a transition from one cryptosystem to another, although this means that the ultimate source of
the burned assets is still electricity.</p>
<p>It makes sense to require the burning of some cryptocurrency to occur some time before the transaction validation time.
Since blockchains can fork and aren’t always set in stone right away, it makes sense to require a validator to have
their burning transaction set in stone before they get the right to validate transactions.</p>
<p>The idea of verifiably unspendable addresses is interesting though. A bitcoin address is a representation of a public
key derived through a hashing algorithm. There are valid and invalid addresses, and it is likely that an unspendable
address is still valid, as it may be difficult to send coins to an invalid address. So the main idea is to take an
invalid public key and convert it into a valid address. Now money can be sent to this address no problem, since no
client can determine that it was created with an invalid public key. However, no coins can be transferred out of this
address because doing so requires a signature with a private key. This is impossible because the public and private keys
are invalid.</p>
<p>A validator must also publish the public key number they used, in order for others to find the corresponding bitcoin
address and confirm that it is indeed invalid.</p>
<p>Usually people will use a <a href="https://en.wikipedia.org/wiki/Nothing_up_my_sleeve_number"><em>nothing up my sleeve number</em></a> as
their invalid public key. These numbers are something like all 0’s or the first n digits of pi. They are special in that
they are publicly recognizable, and their purpose is to assure the public that you aren’t trying to game the system with
some special number with some property that allows you to cheat.</p>
<h3 id="pros-1">Pros</h3>
<ul>
<li>This system can be used to jumpstart a new cryptocurrency off of another.</li>
<li>There is no immediate consumption of energy to validate blocks, and theoretically it could burn coins from a non
proof of work system, although this has not been done.</li>
</ul>
<h3 id="cons-1">Cons</h3>
<ul>
<li>One could argue that this system is just proof of work with extra steps. It burns a proof of work cryptocurrency, which
means that the work going into mining this currency still occurs.</li>
</ul>matthewrosemanAlternatives to Proof of Work: Part 12017-06-11T16:15:00+00:002017-06-11T16:15:00+00:00https://mroseman95.github.io/blockchain-consensus-01<hr />
<h2 id="what-is-blockchain-consensus">What is Blockchain Consensus</h2>
<p>A blockchain can be thought of as a community driven ledger for a cryptocurrency. It logs every transaction and keeps
track of who has how many “coins”. These ledgers are broken into blocks, each block pointing back to the previous one.
As of March 29, 2017 <a href="https://blockchain.info/charts/n-transactions-per-block">the average number of transactions per block is about
2000</a>.</p>
<p><img src="https://mroseman95.github.io/assets/images/bitcoin-block-chain-small.png" alt="blockchain" /></p>
<p>Now these blocks, before being committed to the chain, must be approved. Since there is no central authority, this must
be done through a group consensus. Obviously there must be some way to hold people accountable for checking transactions
honestly, and not trying to approve bad or perhaps malicious transactions.</p>
<p>As an example scenario without some checks and balances:</p>
<ul>
<li>Alice has 1 btc</li>
<li>Alice buys something from Bob and gives him that 1 btc</li>
<li>Quickly Alice also buys something from Eve and gives her that 1 btc</li>
<li>Alice then approves the block that these transactions are both a part of, and she successfully spends her 1btc 2
times</li>
</ul>
<p>The above is an example of what’s called <a href="https://en.wikipedia.org/wiki/Double-spending">Double Spending</a>.</p>
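<p>An honest approver would catch the scenario above by replaying the block against the current balances. The sketch
below is a toy model with made up names; real transactions reference earlier outputs and carry signatures rather than
plain account balances.</p>

```python
def validate_block(balances, transactions):
    """Apply transactions in order, rejecting any the sender cannot afford."""
    working = dict(balances)
    accepted, rejected = [], []
    for sender, receiver, amount in transactions:
        if working.get(sender, 0) >= amount:
            working[sender] -= amount
            working[receiver] = working.get(receiver, 0) + amount
            accepted.append((sender, receiver, amount))
        else:
            rejected.append((sender, receiver, amount))
    return working, accepted, rejected


balances = {"Alice": 1}
block = [("Alice", "Bob", 1), ("Alice", "Eve", 1)]  # the double spend attempt
print(validate_block(balances, block))
```

<p>The check only helps if the person running it is honest, which is why the choice of approver matters.</p>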
<p>A naive solution to this problem is to require a certain number of people to approve blocks. However, an easy attack
would be to create many false identities, all of whom approve the block your transaction is in. This is called the
<a href="https://en.wikipedia.org/wiki/Sybil_attack">Sybil Attack</a>.</p>
<p>Many of the following consensus protocols are based off of this naive solution, but they add on a cost to approving
blocks, so that a single person can’t freely create new identities. With this cost there must be some kind of reward, in
order to incentivize people to approve blocks. This reward is usually some of the network’s cryptocurrency, either
taken from transaction fees, or created from scratch.</p>
<hr />
<h2 id="proof-of-work">Proof of Work</h2>
<h3 id="overview">Overview</h3>
<p>One method to prevent users from using multiple fake identities on the network in order to approve their bad
transactions is to include some sort of cost to approving. The basic idea of proof of work is to force users to do
some amount of computational work before they can sign off on a block. A malicious user can easily fake multiple
identities, but they cannot fake computational work.</p>
<p>The basic proof of work algorithm bitcoin uses is based off of a similar algorithm called
<a href="ftp://sunsite.icm.edu.pl/site/replay.old/programs/hashcash/hashcash.pdf">hashcash</a>. Some key elements of both bitcoin’s
algorithm and the hashcash algorithm are that they are:</p>
<ol>
<li><strong>publicly auditable</strong> - anyone can check the result of the proof of work to see that it is correct</li>
<li><strong>non-interactive</strong> - the server doesn’t need to issue some challenge to the user. The user picks the challenge
themselves</li>
<li><strong>trapdoor free</strong> - there is no known solution to the challenge beforehand</li>
<li><strong>unbounded probabilistic cost</strong> - It could theoretically take forever to solve the challenge</li>
</ol>
<p>Bitcoin’s proof of work algorithm is based on the <a href="https://en.wikipedia.org/wiki/SHA-2">SHA-256</a> hashing algorithm. You
are given a block of transactions, and after making sure that every transaction is possible given the previous blocks,
you create a header for this block. This header consists of…</p>
<ol>
<li><strong>Version</strong></li>
<li><strong>Hash of previous block’s header</strong></li>
<li><strong>Hash of all transactions in current block</strong></li>
<li><strong>Current timestamp</strong></li>
<li><strong>Current target</strong> (explained later)</li>
<li><strong>Nonce</strong> (explained later)</li>
</ol>
<p>Now your goal is to find a hash of this header that is less than a certain target number. The only value that you can
change is the 32 bit nonce at the end. Normally you would start with a nonce of 0 and increment every try, but it
doesn’t really matter.</p>
<p>Since SHA-256 is a one way function, meaning you can’t work backwards from a hash to a particular starting number, and
there is no way to predict what you are going to get until you calculate it, the only way to try and find a nonce that
produces a hash below the target is by brute force. Once you do find a correct nonce that produces a hash below the
target number, you can broadcast the block with the correct header, and receive your payment.</p>
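<p>The brute force search can be sketched with Python’s hashlib. This is a toy: the header here is an arbitrary byte
string rather than a real serialized block header, the target is far easier than any real one, and the function name
mine is made up. Bitcoin applies SHA-256 twice, which the sketch mirrors.</p>

```python
import hashlib


def mine(header_prefix, target, max_nonce=2**32):
    """Try nonces in order until the double SHA-256 hash falls below target."""
    for nonce in range(max_nonce):
        data = header_prefix + nonce.to_bytes(4, "little")  # 32 bit nonce
        digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None  # no nonce in range worked


easy_target = 1 << 244  # roughly 1 in 4096 hashes qualifies
result = mine(b"toy block header", easy_target)
print(result)
```

<p>Halving the target doubles the expected number of attempts, which is how the difficulty is tuned.</p>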
<p>There must be some way to incentivize people to confirm that transactions are valid, and this is done by paying them in
bitcoin every time they successfully confirm a block. These bitcoins come partly from transaction fees, and partly from new coins created out of
thin air. This is how the network of bitcoin grows.</p>
<p>The target value is the result of an algorithm based on how long previous blocks took to mine, and the goal is to mine a block
every 10 minutes. So as computers get more powerful the target number can get smaller and smaller, making a successful
hash harder and harder to find.</p>
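<p>A simplified sketch of the adjustment: scale the target by how long the last batch of blocks actually took compared
to how long it should have taken. The batch size of 2016 blocks and the limit of a factor of 4 per adjustment follow
Bitcoin’s rule, but the function name is made up and the sketch ignores details such as the compact target encoding
and the maximum allowed target.</p>

```python
TARGET_SPACING = 10 * 60  # goal of one block every 10 minutes, in seconds


def retarget(old_target, actual_seconds, num_blocks=2016, max_factor=4):
    """Return a new target so the next batch should take the expected time."""
    expected = num_blocks * TARGET_SPACING
    # Clamp the measured time so one adjustment changes the target by at most 4x.
    actual = min(max(actual_seconds, expected // max_factor), expected * max_factor)
    return old_target * actual // expected


old = 1_000_000
print(retarget(old, 2016 * 300))  # blocks came twice as fast, so the target halves
```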
<h3 id="pros">Pros</h3>
<ul>
<li>So far proof of work has been the best method for preventing attacks on the blockchain, and many other algorithms
build off of proof of work, keeping the core details.</li>
<li>Proof of work is easily parallelized, meaning many people can work together at solving the hashing problems. This means
someone with not as many resources is able to participate.</li>
</ul>
<h3 id="cons">Cons</h3>
<ul>
<li>With ASICs (Application Specific Integrated Circuits), mining bitcoin is only viable with special
machines specifically designed to perform SHA256 hash calculations and nothing else. This increases the barrier to
entry for mining bitcoins.</li>
<li>Bitcoin mining has a very large energy footprint. There is a debate on whether the amount of
computational power going into mining bitcoin is a waste. Some say there is nothing useful created from these hash
computations, while others say that the defence against Sybil attacks is something useful.</li>
</ul>matthewroseman