Data Warehouse Architect Position at Sephora SEA

You can write your code in the language of your choice. But we have preferences for the language of our stack:

Go (Highly recommended)
Python
NodeJS

Please organise your code, document it and write relevant unit tests. Please also inform on any specific setup which is required to make your code run.

Context

At Sephora SEA, we are building business schemas on top of the e-commerce databases. For instance, we have a products table which is a consolidation of 30 other tables. It contains information for each of our products (ids, descriptions, categories...).

Our tables are organised in datasets (folders), eg. final, raw, tmp.

The products table is created from a chain of dependent SQL scripts. The process is as follows:

Dump the e-commerce databases into a raw dataset (raw folder)
Clean and consolidate the data on top of these raw tables using SQL scripts stored in the tmp folder. For instance, the result of the tmp/inventory_items.sql script will be stored in the tmp.inventory_items table.
Create the final.products table (from the final/products.sql script)

Your task

Your task is to build parts of the tool that will orchestrate the aforementioned process, in order to create the final.products table:

Write a function that shows the dependencies between all the sql scripts (from scratch, no specialized library!) eg. showing that tmp/item_purchase_prices.sql depends on raw.purchase_line_items and raw.purchase_items. We are expecting some kind of simple visualization (graph, tree, ...)
Write a function that, using the previous question, runs the sql scripts in the correct order. Please provide documentation as of how you are proceeding.

Going further, we would like to parallelize the execution of few of these scripts. If you think of the dependencies as a tree: scripts from different nodes can work simultaneously, but, still, must not be executed before its children's tasks are done.

Write a function that paralellizes the execution of the SQL scripts, ensuring they respect their dependencies. Please provide documentation as of how you are proceeding.

Notes

*The files in the raw folder represents the available raw data tables in the raw dataset.

*For point 1, parsing the query will be necessary. We can assume the shape of the tables in the scripts will always be dataset_name.table_name

*You can use the following dummy function as a placeholder for the function actually running the sql scripts (in go):

func() {
 // This is execution time of the tmp.inventory_items script
 time.Sleep(time.Second * 3)
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
final		final
raw		raw
tmp		tmp
readME.md		readME.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Warehouse Architect Position at Sephora SEA

Context

Your task

Notes

About

Releases

Packages

Contributors 3

anaisghelfi/sephora-test

Folders and files

Latest commit

History

Repository files navigation

Data Warehouse Architect Position at Sephora SEA

Context

Your task

Notes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Packages