About • How to run • Populating the database • Jupyter Notebooks • Observed Correlations •
This repository contains my submission for the Gamers Club Data Analyst test. Here I show how I set up the database, how I analysed the data using Python, present the results (more charts), and discuss them. In this repo I try to understand why many users leave the Game course after watching only a few classes, and look for a way to solve this.
The database credentials have been omitted.
To run this repository, follow these steps:
1. Create a virtualenv (this isn't necessary, but I believe it is good practice):
- python3 -m venv .venv
to create the virtualenv
- source .venv/bin/activate
to activate the virtualenv
2. Clone this repository:
3. Install the dependencies:
- pip install ipython[notebook]
to install jupyter notebook
- pip install python-dotenv
to install dotenv
- pip install pymysql
to install this pure-Python MySQL client library
3.1 Graph dependencies:
- pip install matplotlib
- pip install seaborn
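Once python-dotenv and pymysql are installed, the remote connection can be opened from the notebooks. The sketch below is illustrative: the .env variable names (DB_HOST, DB_USER, DB_PASSWORD, DB_NAME) are assumptions, so match them to your own .env file. The tiny parser does the same job as python-dotenv's load_dotenv for simple files.

```python
def load_env(path=".env"):
    # Tiny .env parser (python-dotenv's load_dotenv does the same job):
    # ignores blank lines and comments, splits on the first '='.
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

def get_connection(env):
    # Imported lazily so load_env works even without pymysql installed.
    import pymysql
    return pymysql.connect(
        host=env["DB_HOST"],          # e.g. remotemysql.com (assumed key name)
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
        database=env["DB_NAME"],
    )

# Usage: creds = load_env(); conn = get_connection(creds)
```

Keeping the credentials in .env is what lets them be omitted from the repository.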
4. Open the Jupyter notebook:
- jupyter notebook
5. To see a database manager:
- Adminer
sudo docker run --name adminer -p 8080:8080 -d adminer
Now you can see your database by accessing: http://localhost:8080/
To populate the database so you can analyse the values, relations, etc., use Adminer:
1. With Adminer running, go to http://localhost:8080/ and follow the steps below:
1.1
Enter your credentials (in this case, I am using a remote database from remotemysql.com).
1.2
Look at the database structure.
1.3
Import the SQL dump. This is the easiest way to populate the database; the other is to run it from Python.
2. With Jupyter notebook:
2.1
This way takes much more time than the previous one, because there is a large amount of data.
or
2.2
This way also takes a lot of time, like 2.1, but in this case you execute a single line*.
However, if you are using a free account, it might be necessary to use way 2.1, because your connection to remotemysql.com will drop before the Python execution finishes.
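The "run it from Python" route above can be sketched as follows. This is a minimal sketch, not the repository's exact code: the dump file name is an assumption, and the naive split on ';' only works for plain CREATE/INSERT dumps without semicolons inside string literals. pymysql executes one statement per cursor.execute() call, which is why the dump is split first.

```python
def split_statements(sql_text):
    # Naive split on ';' -- fine for simple dumps, not for dumps
    # containing ';' inside quoted string values.
    return [s.strip() for s in sql_text.split(";") if s.strip()]

def populate(connection, dump_path="dump.sql"):
    # dump_path is a hypothetical file name; point it at your SQL dump.
    with open(dump_path) as fh:
        statements = split_statements(fh.read())
    with connection.cursor() as cursor:
        for statement in statements:
            cursor.execute(statement)
    connection.commit()
```

Executing statement by statement is what makes this route slow over a remote connection: each INSERT is a network round trip to remotemysql.com.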
The purpose of each Jupyter notebook in this repo:
- DDL.ipynb:
This notebook creates the database structure by executing SQL statements from Python.
- pre-processing.ipynb:
This notebook pre-processes the SQL file and exports a CSV on which I can use the pandas correlation feature.
- Analysis.ipynb:
This notebook does the data analysis, both on the remote MySQL tables and on the CSV files, mainly to answer the questions in the test. In this notebook I discuss which correlation method to choose, and why.
- Analysis2.ipynb:
This notebook is where I analyse the players' skills and the growth in the number of accounts created.
Here are some images of the correlations:
Here are the images from which the trends are concluded: