This repository contains a Proof of Concept (PoC) for setting up and using DataHub, an open-source metadata platform for data discovery, management, and governance. The PoC demonstrates how to install and configure DataHub using Docker, load sample data, and optionally integrate with local Kafka and Airflow instances for data ingestion and processing.
Before you start, ensure that the Docker daemon is running. You can check that Docker is installed with:
docker --version
and that the daemon is reachable with:
docker info
If Docker is not installed, follow the instructions on the Docker website to install it for your operating system.
Once Docker is installed and running, you can proceed with the steps below to set up and use DataHub.
Prerequisites:
- Docker
- Python 3
To install the required packages, run:
python3 -m pip install --upgrade -r requirements.txt
To deploy DataHub locally with Docker, run:
datahub docker quickstart [--version TEXT (e.g. "v0.9.2")]
To load the sample metadata into DataHub, run:
datahub docker ingest-sample-data
If you want to run the local Airflow instance, go to the local_airflow directory and run:
docker-compose -f docker-compose.yml up -d
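To confirm the Airflow instance is healthy, you could drop a trivial DAG into the folder the compose file mounts for DAGs. The folder path, DAG id, and everything in the sketch below are illustrative assumptions, not files from this repository:

```python
# hypothetical dags/datahub_poc_hello.py -- a minimal DAG to confirm the local Airflow instance runs
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    print("Hello from the DataHub PoC Airflow instance")


with DAG(
    dag_id="datahub_poc_hello",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # trigger manually from the Airflow UI
    catchup=False,
) as dag:
    PythonOperator(task_id="say_hello", python_callable=say_hello)
```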
If you want to test Kafka ingestion into DataHub, first go to the local_kafka directory and start the local Kafka broker:
docker-compose -f docker-compose.yml up -d
Next, go to the local_datahub/recipes directory and run the Kafka ingestion recipe:
datahub ingest -c kafka_test_recipe.dhub.yaml
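The recipe file defines the Kafka source and the DataHub sink. If you prefer to drive the same ingestion from Python rather than the CLI, DataHub's ingestion framework exposes a Pipeline API; the sketch below is an assumed rough equivalent of the recipe, and the real connection values live in kafka_test_recipe.dhub.yaml:

```python
# hypothetical programmatic equivalent of kafka_test_recipe.dhub.yaml
# (the bootstrap address and DataHub server URL below are assumptions)
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "kafka",
            "config": {"connection": {"bootstrap": "localhost:49816"}},
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()  # fail loudly if ingestion reported errors
```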
Finally, go to the scripts directory and run:
python3 eth_tx.py
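eth_tx.py in this repository is the source of truth for how the test data is produced. As a rough idea of what such a producer looks like, here is a minimal sketch that publishes JSON messages to the transaction topic; the broker address, message fields, and use of kafka-python are assumptions:

```python
# minimal producer sketch, NOT the actual eth_tx.py from this repository
# assumes: pip install kafka-python, broker reachable on localhost:49816, topic "transaction"
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:49816",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for i in range(10):
    message = {"tx_id": i, "value_eth": round(0.1 * i, 3), "ts": time.time()}
    producer.send("transaction", message)  # topic name matches the consumer command below

producer.flush()
producer.close()
```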
To list the Kafka topics on the broker:
docker exec kafka_test_broker \
  kafka-topics --bootstrap-server kafka_test_broker:49816 \
  --list
To read the messages on the transaction topic from the beginning, start a console consumer:
docker exec --interactive --tty kafka_test_broker \
  kafka-console-consumer --bootstrap-server kafka_test_broker:49816 \
  --topic transaction \
  --from-beginning
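If you would rather inspect the topic from Python than from the console consumer, a minimal equivalent (same assumptions about the broker address and the kafka-python package) would be:

```python
# minimal Python alternative to the console consumer above
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "transaction",
    bootstrap_servers="localhost:49816",  # assumed host-mapped address of kafka_test_broker
    auto_offset_reset="earliest",         # read from the beginning, like --from-beginning
    consumer_timeout_ms=10000,            # stop after 10s without new messages
)

for record in consumer:
    print(record.value.decode("utf-8"))
```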