
Flink Datasource

Jörn Franke edited this page Nov 12, 2017 · 3 revisions

The Flink datasource reads cryptoledgers, such as the Bitcoin blockchain, with Apache Flink. It is built on the HadoopCryptoLedger library. The following modules are currently supported:

  • Bitcoin
    • BitcoinBlockFlinkInputFormat.java: Deserializes blocks containing transactions into Java objects. Each record is an object of the class BitcoinBlock containing transactions (class BitcoinTransaction). Best suited if you want flexible analytics.
    • BitcoinRawBlockFlinkInputFormat.java: Each record is a byte array containing the raw Bitcoin block data. The key (i.e. the unique identifier) of the block is currently a byte array containing hashMerkleRoot and prevHashBlock (64 bytes). This is most suitable if you are only interested in a small part of the data and do not want to spend time on deserialization.
    • BitcoinTransactionFlinkInputFormat.java: Deserializes Bitcoin transactions into Java objects. Each record is an object of the class BitcoinTransaction. You can generate the hash of a transaction, to link it with previous transactions, using the BitcoinUtil.getTransactionHash method.
  • Ethereum
    • EthereumBlockFlinkInputFormat.java: Deserializes blocks containing transactions into Java objects. Each record is an object of the class EthereumBlock containing transactions (class EthereumTransaction). Best suited if you want flexible analytics.
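
To illustrate what "generating the hash of a transaction" involves, here is a minimal sketch using only the Java standard library. It is not the library's implementation of BitcoinUtil.getTransactionHash; it only shows the double SHA-256 scheme that Bitcoin uses for hashes. The class name DoubleSha256Sketch and its methods are hypothetical, and the input is assumed to be the already serialized bytes of whatever is being hashed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of HadoopCryptoLedger.
class DoubleSha256Sketch {

    /** Applies SHA-256 twice to the given bytes, as Bitcoin does for its hashes. */
    static byte[] doubleSha256(byte[] data) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] firstPass = sha256.digest(data); // digest() also resets the instance
            return sha256.digest(firstPass);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Renders bytes as a lowercase hex string, in the order they are stored. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: placeholder input; in practice this would be serialized transaction bytes
        byte[] input = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(doubleSha256(input)));
    }
}
```

Note that block explorers usually display such hashes in reversed byte order, so a hash computed this way may need to be reversed before comparing it with values shown elsewhere.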

Build

Note: the Flink datasource is available on Maven Central, so you no longer need to build it and publish it to a local Maven repository in order to use it.

If you still want to build it yourself, clone the repository:

git clone https://github.com/ZuInnoTe/hadoopcryptoledger.git hadoopcryptoledger

You can build the datasource by changing to the directory hadoopcryptoledger/flinkdatasource and running the following command:

../gradlew clean build publishToMavenLocal

Use

See also Useful Utility functions

Configure

The following configuration options exist when creating any of the input formats above:

  • Bitcoin
    • First parameter of the constructor: Maximum size of a Bitcoin block. Defaults to 8M (since version 1.0.7). If you see exceptions related to the block size in the log (e.g. due to changes in the Bitcoin blockchain), increase this value.
    • Second parameter of the constructor: A comma-separated list of valid magics that identify Bitcoin blocks in the blockchain data. Defaults to "F9BEB4D9" (Bitcoin main network). Possible values (see https://en.bitcoin.it/wiki/Protocol_documentation) include F9BEB4D9 (Bitcoin main network), FABFB5DA (testnet), 0B110907 (testnet3), F9BEB4FE (Namecoin), FBC0B6DB (Litecoin), FCC1B7DC (Litecoin testnet).
    • Third parameter of the constructor: If true, a DirectByteBuffer is used instead of a HeapByteBuffer. This option is experimental and defaults to "false".
  • Ethereum
    • First parameter of the constructor: Maximum size of an Ethereum block. Defaults to 1M. If you see exceptions related to the block size in the log (e.g. due to changes in the Ethereum blockchain), increase this value.
    • Second parameter of the constructor: If true, a DirectByteBuffer is used instead of a HeapByteBuffer. This option is experimental and defaults to "false".
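
The options above can be pictured with a small standard-library sketch. It does not call the input formats themselves. It only shows how a comma-separated magic list such as "F9BEB4D9,FABFB5DA" maps to byte sequences, and what the choice between a direct and a heap buffer means in java.nio terms. The class InputFormatConfigSketch and its methods are hypothetical, and reading "8M" as 8 * 1024 * 1024 bytes is an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of HadoopCryptoLedger.
class InputFormatConfigSketch {

    /** Parses a comma-separated list of hex magics into one byte array per magic. */
    static byte[][] parseMagics(String spec) {
        String[] parts = spec.split(",");
        byte[][] magics = new byte[parts.length][];
        for (int i = 0; i < parts.length; i++) {
            String hex = parts[i].trim();
            if (hex.isEmpty() || hex.length() % 2 != 0) {
                throw new IllegalArgumentException("Invalid magic: '" + hex + "'");
            }
            byte[] magic = new byte[hex.length() / 2];
            for (int j = 0; j < magic.length; j++) {
                // Two hex digits per byte; parseInt rejects non-hex characters
                magic[j] = (byte) Integer.parseInt(hex.substring(2 * j, 2 * j + 2), 16);
            }
            magics[i] = magic;
        }
        return magics;
    }

    /** Allocates an off-heap (direct) or on-heap buffer of the given size. */
    static ByteBuffer allocate(int maxBlockSize, boolean useDirectBuffer) {
        return useDirectBuffer
                ? ByteBuffer.allocateDirect(maxBlockSize)
                : ByteBuffer.allocate(maxBlockSize);
    }

    public static void main(String[] args) {
        int maxBlockSize = 8 * 1024 * 1024; // assumption: "8M" read as 8 MiB
        byte[][] magics = parseMagics("F9BEB4D9,FABFB5DA");
        ByteBuffer buffer = allocate(maxBlockSize, false);
        System.out.println(magics.length + " magics, direct buffer: " + buffer.isDirect());
    }
}
```

A direct buffer lives outside the Java heap, which can avoid a copy during I/O but is more expensive to allocate. That trade-off may be one reason the option is marked experimental.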