Flink Datasource
Jörn Franke edited this page Nov 12, 2017 · 3 revisions
The Flink datasource lets you read cryptoledgers, such as the Bitcoin blockchain, with Apache Flink. It is built on the HadoopCryptoLedger library. The following modules are currently supported:
- Bitcoin
  - BitcoinBlockFlinkInputFormat.java: Deserializes blocks containing transactions into Java objects. Each record is an object of the class BitcoinBlock containing transactions (class BitcoinTransaction). Best suited to flexible analytics.
  - BitcoinRawBlockFlinkInputFormat.java: Each record is a byte array containing the raw Bitcoin block data. The key (i.e. the unique identifier) of the block is currently a byte array containing hashMerkleRoot and prevHashBlock (64 bytes). Best suited if you are only interested in a small part of the data and do not want to spend time on deserialization.
  - BitcoinTransactionFlinkInputFormat.java: Deserializes Bitcoin transactions into Java objects. Each record is an object of the class BitcoinTransaction. You can generate the hash of a transaction with the BitcoinUtil.getTransactionHash method to link it with previous transactions.
- Ethereum
  - EthereumBlockFlinkInputFormat.java: Deserializes blocks containing transactions into Java objects. Each record is an object of the class EthereumBlock containing transactions (class EthereumTransaction). Best suited to flexible analytics.
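To illustrate what a transaction hash is, the following is a minimal sketch of the general Bitcoin convention: the hash is a double SHA-256 over the serialized transaction bytes. It is not the implementation of BitcoinUtil.getTransactionHash. The class name `DoubleSha256Sketch` and its methods are hypothetical, the input is a placeholder byte array rather than a real serialized transaction, and details such as byte order and the handling of witness data are left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only; not part of HadoopCryptoLedger.
public class DoubleSha256Sketch {

    // Applies SHA-256 once to the given bytes.
    static byte[] sha256(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Applies SHA-256 twice, as Bitcoin does for transaction and block hashes.
    static byte[] doubleSha256(byte[] serialized) {
        return sha256(sha256(serialized));
    }

    // Renders bytes as lowercase hex for printing.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder input; a real caller would pass the serialized transaction.
        byte[] placeholder = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(doubleSha256(placeholder)));
    }
}
```

The resulting 32-byte value is what an input of a later transaction uses to refer back to the transaction it spends, which is why the hash is useful for linking transactions.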
Note: the Flink datasource is available on Maven Central, so you no longer need to build it and publish it to a local Maven repository to use it.
If you want to build it from source, first clone the repository:
git clone https://github.com/ZuInnoTe/hadoopcryptoledger.git hadoopcryptoledger
Then change to the directory hadoopcryptoledger/flinkdatasource and build the library with the following command:
../gradlew clean build publishToMavenLocal
- Analyzing the Bitcoin Blockchain with Apache Flink
- Analyzing the Ethereum Blockchain with Apache Flink
See also Useful Utility functions
The following configuration options exist when creating any of the input formats above:
- Bitcoin
  - First parameter of the constructor: Maximum size of a Bitcoin block. Defaults to 8M (since version 1.0.7). If you see exceptions related to this in the log (e.g. due to changes in the Bitcoin blockchain), increase this value.
  - Second parameter of the constructor: A comma-separated list of valid magics that identify Bitcoin blocks in the blockchain data. Defaults to "F9BEB4D9" (Bitcoin main network). Other possibilities (see https://en.bitcoin.it/wiki/Protocol_documentation) are FABFB5DA (testnet), 0B110907 (testnet3), F9BEB4FE (Namecoin), FBC0B6DB (Litecoin), and FCC1B7DC (Litecoin testnet).
  - Third parameter of the constructor: If true, a DirectByteBuffer is used instead of a HeapByteBuffer. This option is experimental and defaults to "false".
- Ethereum
  - First parameter of the constructor: Maximum size of an Ethereum block. Defaults to 1M. If you see exceptions related to this in the log (e.g. due to changes in the Ethereum blockchain), increase this value.
  - Second parameter of the constructor: If true, a DirectByteBuffer is used instead of a HeapByteBuffer. This option is experimental and defaults to "false".
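To make the role of the magics and the maximum block size more concrete, here is a minimal sketch of how such settings could be applied to raw Bitcoin block data. It relies on the on-disk framing of Bitcoin block files, where each block is preceded by a 4-byte magic and a 4-byte little-endian block size. The class `MagicSketch`, its method names, and the reading of "8M" as 8 * 1024 * 1024 bytes are assumptions made for illustration; the library's own parsing and error handling may differ.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of HadoopCryptoLedger.
public class MagicSketch {

    // Assumption: "8M" is interpreted here as 8 * 1024 * 1024 bytes.
    static final int ASSUMED_DEFAULT_MAX_BLOCK_SIZE = 8 * 1024 * 1024;

    // Turns a comma-separated list such as "F9BEB4D9,FBC0B6DB" into 4-byte magics.
    static List<byte[]> parseMagics(String commaSeparated) {
        List<byte[]> magics = new ArrayList<>();
        for (String token : commaSeparated.split(",")) {
            String hex = token.trim();
            if (hex.length() != 8) {
                throw new IllegalArgumentException("Magic must be 8 hex digits: " + hex);
            }
            byte[] magic = new byte[4];
            for (int i = 0; i < 4; i++) {
                magic[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
            }
            magics.add(magic);
        }
        return magics;
    }

    // Checks the magic at the start of the data, then reads and validates the block size.
    static long readBlockSize(byte[] data, List<byte[]> magics, int maxBlockSize) {
        if (data.length < 8) {
            throw new IllegalArgumentException("Need at least 8 bytes (magic + size)");
        }
        byte[] head = Arrays.copyOfRange(data, 0, 4);
        boolean known = false;
        for (byte[] magic : magics) {
            if (Arrays.equals(magic, head)) {
                known = true;
            }
        }
        if (!known) {
            throw new IllegalStateException("Data does not start with a configured magic");
        }
        // The size field is an unsigned 32-bit little-endian integer.
        long size = ByteBuffer.wrap(data, 4, 4).order(ByteOrder.LITTLE_ENDIAN).getInt() & 0xFFFFFFFFL;
        if (size > maxBlockSize) {
            throw new IllegalStateException("Block size " + size + " exceeds maximum " + maxBlockSize);
        }
        return size;
    }

    public static void main(String[] args) {
        List<byte[]> magics = parseMagics("F9BEB4D9,FBC0B6DB");
        // Main network magic followed by the little-endian size 0x0000011D.
        byte[] header = {(byte) 0xF9, (byte) 0xBE, (byte) 0xB4, (byte) 0xD9, 0x1D, 0x01, 0x00, 0x00};
        System.out.println(readBlockSize(header, magics, ASSUMED_DEFAULT_MAX_BLOCK_SIZE));
    }
}
```

In this sketch, data with a magic that is not in the configured list, or with a block larger than the configured maximum, is rejected with an exception. That mirrors the advice above: if such exceptions show up in the log, the maximum block size or the list of magics is the first thing to check.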