Update: hacked together this, more details to follow later…
In theory, Postgres should be able to verify transactions and blocks, as well as do a lot of other things that are currently only done by full nodes. For this to be performant, it will most likely require an extension written in C, but I’m curious how far we can get with bare bones Postgres.
More importantly, would that actually be useful? A node is really just a database, a very efficient one for a very specific purpose, but would leveraging the full power of Postgres be somehow more beneficial than just running Bitcoin-Qt or btcd, for example?
To get to the bottom of this would be a lot of work, and potentially a lot of fun. It would also be a great blockchain learning exercise. (If you’re working on a PG extension for Bitcoin or more generally blockchain, please do let me know!)
Random Thoughts
The structure of the Bitcoin blockchain is relatively simple. We have transactions, which in turn have inputs and outputs and belong to blocks. Four tables, that’s it.
I’ve been able to import the whole blockchain with some fairly basic Go code into my old Thinkpad running Linux overnight. The Go code needs some more polishing and is probably worthy of a separate write up, so I won’t get into it for now. Below is the schema I used. I intentionally left out referential integrity and indexes to keep it simple and avoid premature optimization.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
|
There are a couple projects out there that keep the blockchain in a database, most notably Abe. I haven’t studied the code very carefully, but my initial impression was that Abe tries to use standard SQL that would work across most big databases, which is philosophically different from my objective of going 100% Postgres and leveraging all that it can do for us.
Bitcoin uses a lot of uint32’s. A Postgres INT is the correct size, but it is signed, which means we have to use the next larger type, BIGINT. It seems like it might be a waste to use 64 bits for a 32-bit value, but I couldn’t think of anything better than a BIGINT. For the binary stuff it seems like BYTEA is the best match.
So what can we do with this? There is no easy way to create or verify an Elliptic Curve signature in Postgres, but with the help of the pgcrypto extension, we should be able to at least generate the correct SHA256 digest which is used in the signature. As a side note, EC signature math is actually remarkably simple and could probably be implemented as a PG function, but I’m too lazy. Here it is in a few lines of Python.
The rules on how Bitcoin generates the hash (which is then signed) are slightly complicated, and that’s an understatement.
For the purposes of this exercise, I’d just be happy with a value that matches, even if the code does not fully comply with the Bitcoin rules.
One problem I ran into was that, because Bitcoin blockchain is little-endian except for where it isn’t, you often need a way to reverse bytes in a BYTEA. Strangely, Postgres does not provide a way to do that, unless I’m missing something. But thanks to stackoverflow, here is one way to do this:
1 2 3 4 5 6 |
|
We also have no way to render a Bitcoin varint, but we can fake it with some substringing for the time being.
Equipped with this, we can construct the following statement, sorry it’s a little long and I do not have the patience to explain it in writing.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
|
The particular transaction we are looking at is this. It happens to have id of 37898 in my database. In case you’re wondering, for this example I used a subset of the blockchain which only has the first 182,000 blocks. On the full blockchain and without indexes, this statement would have taken an eternity to execute.
What makes this particular transaction interesting is that it has two
inputs, which is slightly trickier, because to spend them, there need to
be two different signatures of the same transaction. This is because
before signing, the input scriptSig needs to be replaced with the
output’s scriptPubKey (the oversimplified version). This is reflected in the SQL
in the use of LATERAL
and CASE
.
You do not have to take my word that the two hashes are correct, we can verify them fairly easily with a bit of help from the Python ecdsa library. Here is the code to verify the second hash. The key and the signature are in the transaction itself.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
I hope this was fun! Now I wonder how hard it would be to make an extension to provide all the functionality required by Bitcoin….