by Joy Qi
This repo contains a submission to the Insight Data Science Challenge 2019.
Python 3.7
Input Datasets
order_products.csv
products.csv
Output Dataset
report.csv: For each department, the number of times a product was requested, the number of times a product was requested for the first time, and the ratio of those two numbers.
- Load the csv files from the input folder
- Clean each DataFrame by keeping only selected columns
- Join the two cleaned DataFrames
- Group by department_id and aggregate twice to count the number of orders and the number of first orders
- Apply column functions
  - Filter for rows with number_of_records greater than 0
  - Calculate the percentage column and round it to two decimal places
  - Sort the result DataFrame by department_id in ascending order
- Save the result DataFrame as a .csv file in the output folder
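
The steps above could be sketched roughly as follows. This is a minimal illustration, not the repo's actual code. It assumes pandas as the DataFrame library (the README does not name one), and it assumes the input files carry `product_id`, `reordered`, and `department_id` columns, with `reordered == 0` marking a first-time request. The names `build_report`, `main`, `is_first`, and `number_of_first_orders` are hypothetical, and the ratio is assumed to be first orders divided by total orders.

```python
import os

import pandas as pd


def build_report(order_products: pd.DataFrame, products: pd.DataFrame) -> pd.DataFrame:
    """Build the per-department report from the two input DataFrames."""
    # Clean: keep only the columns the report needs (column names are assumed).
    orders = order_products[["product_id", "reordered"]]
    prods = products[["product_id", "department_id"]]

    # Join each order line to the department of its product.
    joined = orders.merge(prods, on="product_id", how="inner")

    # Assumption: reordered == 0 means the product was requested for the first time.
    joined = joined.assign(is_first=(joined["reordered"] == 0).astype(int))

    # Group by department and compute both counts.
    report = (
        joined.groupby("department_id")
        .agg(
            number_of_records=("product_id", "count"),
            number_of_first_orders=("is_first", "sum"),
        )
        .reset_index()
    )

    # Keep departments with at least one record, so the division below is safe.
    report = report[report["number_of_records"] > 0].copy()

    # Assumed ratio direction: first orders / all orders, rounded to 2 decimals.
    report["percentage"] = (
        report["number_of_first_orders"] / report["number_of_records"]
    ).round(2)

    # Sort by department_id in ascending order.
    return report.sort_values("department_id").reset_index(drop=True)


def main(input_dir: str = "input", output_dir: str = "output") -> None:
    """Load the input csv files, build the report, and write report.csv."""
    order_products = pd.read_csv(os.path.join(input_dir, "order_products.csv"))
    products = pd.read_csv(os.path.join(input_dir, "products.csv"))
    report = build_report(order_products, products)
    os.makedirs(output_dir, exist_ok=True)
    report.to_csv(os.path.join(output_dir, "report.csv"), index=False)
```

Calling `main()` from the repo root would read from the input folder and write report.csv to the output folder. Keeping the transformation in `build_report` separate from file I/O makes it easy to check against small in-memory DataFrames.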