More than 2000 questions to help you prepare for a Data Engineer interview.

Full list of questions

Interview questions for Data Engineer

Databases and Data Warehouses
GitHub Repo	Official page	Questions	Description	Useful links
		Apache Cassandra	Cassandra is a distributed, wide-column store, NoSQL database management system.	Awesome Cassandra
		Greenplum	Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology.	Awesome Greenplum
		MongoDB	MongoDB is a document-oriented database.	Awesome MongoDB
		Apache HBase	HBase is an open-source non-relational distributed database.	Awesome HBase
		Apache Hive	Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.	Awesome Hive
		Amazon DynamoDB	Amazon DynamoDB is a fully managed proprietary NoSQL database service.	Awesome DynamoDB Awesome AWS
		Amazon Redshift	Amazon Redshift is a data warehouse product.	Amazon Redshift Utilities Awesome AWS
		BigQuery GCP	BigQuery is a fully-managed, serverless data warehouse.	Awesome BigQuery
		Bigtable GCP	Bigtable is a fully managed wide-column and key-value NoSQL database service.	Awesome Bigtable

Data Formats
		Apache Avro	Avro is a row-oriented remote procedure call and data serialization framework.	Awesome Avro
		Apache Parquet	Apache Parquet is a column-oriented data file format designed for efficient data storage and retrieval.	TODO
		Delta	Delta Lake is a storage framework that enables building a Lakehouse architecture with compute engines such as Apache Spark.	Delta examples

Big Data Frameworks
		Apache Airflow	Apache Airflow is a workflow management platform for data engineering pipelines.	Awesome Airflow
		Apache Flume	Apache Flume is a distributed, reliable, and available software for efficiently collecting, aggregating, and moving large amounts of log data.	TODO
		Apache Hadoop	Apache Hadoop is a collection of software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation.	Awesome Hadoop
		Apache Impala	Apache Impala is a parallel processing SQL query engine for data stored in a computer cluster running Apache Hadoop.	TODO
		Apache Kafka	Apache Kafka is a distributed event store and stream-processing platform.	Awesome Kafka
		Apache NiFi	Apache NiFi is a software project designed to automate the flow of data between software systems.	Awesome NiFi
		Apache Spark	Apache Spark is a unified analytics engine for large-scale data processing.	Awesome Spark
		Apache Flink	Apache Flink is a unified stream-processing and batch-processing framework.	Awesome Flink
		Kubernetes	Kubernetes is a system for managing containerized applications across multiple hosts.	Awesome Kubernetes

Cloud providers
		Amazon Web Services	Amazon Web Services is an online platform that provides scalable and cost-effective cloud computing solutions.	Awesome AWS
		Microsoft Azure	Microsoft Azure is Microsoft's public cloud computing platform.	Awesome Azure
		Google Cloud Platform	Google Cloud Platform is a suite of cloud computing services.	Awesome GCP

Theory
		DWH Architectures	A data warehouse architecture defines how data communication, processing, and presentation are organized for end-client computing within the enterprise.	Awesome databases
		Data Structures	A data structure is a specialized format for organizing, processing, retrieving and storing data.	TODO
		SQL	SQL is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS).	Awesome SQL

Data visualization tools/BI
		Tableau	Tableau is a powerful data visualization tool used in business intelligence.	TODO
		Looker	Looker is an enterprise platform for BI, data applications, and embedded analytics that helps you explore and share insights in real time.	TODO
		Apache Superset	Superset is a modern data exploration and data visualization platform.	TODO

Contribution

Please contribute to this repository to help make it better. Any change, such as a new question, a code improvement, or a documentation improvement, is very welcome.

OBenner/data-engineering-interview-questions