About • How to run • Populating the database • Jupyter Notebooks • Observed Correlations •
This repository contains my submission for the Gamers Club Data Analyst test. Here I show how I set up the database, how I analysed the data using Python, present the results (more charts), and discuss them. In this repo I try to understand why many users leave the Game course after watching only a few classes, and look for a way to solve this.
The database credentials have been omitted.
To run this repository, follow these steps:
1. Create a virtualenv (this isn't necessary, but I believe it is good practice):
- python3 -m venv .venv
to create the virtualenv
- source .venv/bin/activate
to activate the virtualenv
2. Clone this repository:
3. Install the dependencies:
- pip install ipython[notebook]
to install jupyter notebook
- pip install python-dotenv
to install dotenv
- pip install pymysql
to install this pure-Python MySQL client library
3.1 Graph dependencies:
- pip install matplotlib
- pip install seaborn
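Once python-dotenv and pymysql are installed, the remote connection can be opened from the notebooks. The sketch below is illustrative: the .env variable names (DB_HOST, DB_USER, DB_PASSWORD, DB_NAME) are assumptions, so match them to your own .env file. The tiny parser does the same job as python-dotenv's load_dotenv for simple files.

```python
def load_env(path=".env"):
    # Tiny .env parser (python-dotenv's load_dotenv does the same job):
    # ignores blank lines and comments, splits on the first '='.
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

def get_connection(env):
    # Imported lazily so load_env works even without pymysql installed.
    import pymysql
    return pymysql.connect(
        host=env["DB_HOST"],          # e.g. remotemysql.com (assumed key name)
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
        database=env["DB_NAME"],
    )

# Usage: creds = load_env(); conn = get_connection(creds)
```

Keeping the credentials in .env is what lets them be omitted from the repository.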
4. Open the Jupyter notebook:
- jupyter notebook
5. To see a database manager:
- Adminer
sudo docker run --name adminer -p 8080:8080 -d adminer
Now you can see your database by accessing: http://localhost:8080/
To populate the database so you can analyse the values, relations, etc., use Adminer:
1. With Adminer running, go to http://localhost:8080/ and follow the steps below:
1.1
Enter your credentials (in this case, I am using a remote database from remotemysql.com).
1.2
Look at the database structure.
1.3
Import the SQL dump. This is the easiest way to populate the database; the other is to run it from Python.
2. With Jupyter notebook:
2.1
This way takes much more time than the previous one, because there is a large amount of data.
or
2.2
This way also takes a lot of time, like 2.1, but in this case you execute a single line*.
However, if you are using a free account, it might be necessary to use way 2.1, because your connection to remotemysql.com will drop before the Python execution finishes.
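The "run it from Python" route above can be sketched as follows. This is a minimal sketch, not the repository's exact code: the dump file name is an assumption, and the naive split on ';' only works for plain CREATE/INSERT dumps without semicolons inside string literals. pymysql executes one statement per cursor.execute() call, which is why the dump is split first.

```python
def split_statements(sql_text):
    # Naive split on ';' -- fine for simple dumps, not for dumps
    # containing ';' inside quoted string values.
    return [s.strip() for s in sql_text.split(";") if s.strip()]

def populate(connection, dump_path="dump.sql"):
    # dump_path is a hypothetical file name; point it at your SQL dump.
    with open(dump_path) as fh:
        statements = split_statements(fh.read())
    with connection.cursor() as cursor:
        for statement in statements:
            cursor.execute(statement)
    connection.commit()
```

Executing statement by statement is what makes this route slow over a remote connection: each INSERT is a network round trip to remotemysql.com.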
The purpose of each Jupyter notebook in this repo:
- DDL.ipynb:
This notebook creates the database structure by executing SQL statements from Python.
- pre-processing.ipynb:
This notebook pre-processes the SQL file and exports a CSV on which I can use the pandas correlation feature.
- Analysis.ipynb:
This notebook does the data analysis, both on the remote MySQL tables and on the CSV files, mainly to answer the questions in the test. In this notebook I discuss which correlation method to choose, and why.
- Analysis2.ipynb:
This notebook is where I analyse the players' skills and the growth in the number of accounts created.
Here are some images of the correlations:
Here are the images from which the trends are concluded: