This project trains a model with the Multinomial Naive Bayes algorithm to predict a person's gender from their first name. The dataset was downloaded from data.gov as a zip file containing 142 `txt` files, one for each year from 1880 to 2021.
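To make the approach concrete, here is a minimal pure-Python sketch of Multinomial Naive Bayes applied to names. It is not the code in `train.py`: the character-bigram features, the `TinyMultinomialNB` class, the `char_ngrams` helper, and the toy training names are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=2):
    """Split a name into overlapping character n-grams, padded with '^' and '$'."""
    padded = "^" + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class TinyMultinomialNB:
    """Illustrative Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, names, labels):
        # Count how often each class and each (class, n-gram) pair occurs.
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for name, label in zip(names, labels):
            self.feature_counts[label].update(char_ngrams(name))
        self.vocab = {g for counts in self.feature_counts.values() for g in counts}
        return self

    def predict_proba(self, name):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n_label in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_label / total)  # log prior
            for gram in char_ngrams(name):
                if gram in self.vocab:  # skip n-grams never seen in training
                    score += math.log((counts[gram] + self.alpha) / denom)
            log_scores[label] = score
        # Turn log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp_scores.values())
        return {k: v / z for k, v in exp_scores.items()}

    def predict(self, name):
        proba = self.predict_proba(name)
        return max(proba, key=proba.get)


# Toy training data, made up for this example (not taken from the dataset).
train_names = ["Anna", "Maria", "Julia", "Sophia", "John", "Mark", "Peter", "Jack"]
train_labels = ["F", "F", "F", "F", "M", "M", "M", "M"]

model = TinyMultinomialNB().fit(train_names, train_labels)
print(model.predict("Olivia"), model.predict_proba("Olivia"))
```

Character n-grams are one plausible feature choice because name endings tend to carry a lot of the signal; the actual project may vectorize names differently.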
### Instructions
- Clone this repository: `git clone https://github.com/taeefnajib/predict-gender-from-first-name`
- Download the zip file from data.gov and unzip the `names` folder. Place it in the working directory.
- Install all the dependencies: `pip install -r requirements.txt`
- `data.py` prepares a `csv` file from all the `txt` files and pre-processes the dataset. You don't need to run it from the command line.
- `train.py` builds a model and trains it on the dataset. The repository already contains the files `data.csv` and `model.pkl`. If you remove them and run `train.py`, it will recreate both files.
- `test.py` uses `argparse` to let users predict genders from first names on the command line. Use `--name` or `-n` followed by the name you want to predict the gender for. Example: `python test.py --name Josh`
- If you want to use `FastAPI` instead, run: `uvicorn main:app --reload`. This serves the app at 127.0.0.1 on port 8000, uvicorn's default (if it is available), with the Swagger UI interface at `/docs`. If you submit a first name as a string, it will return a dictionary with `Gender` and `Probability`.
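
The data-preparation step can be sketched roughly as follows. This is not the contents of `data.py`. It assumes the yearly files are named like `yob2021.txt` and hold `name,sex,count` lines, and it assumes each name is labeled with whichever gender has the higher total count across all years. The function `build_dataset`, the output column names, and the counts in the demo files are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def build_dataset(names_dir, out_csv):
    """Merge all yearly txt files into one CSV with a single label per name."""
    totals = defaultdict(lambda: {"F": 0, "M": 0})
    for txt_file in sorted(Path(names_dir).glob("*.txt")):
        with open(txt_file, newline="", encoding="utf-8") as fh:
            for row in csv.reader(fh):
                if len(row) != 3:
                    continue  # skip blank or malformed lines
                name, sex, count = row
                if sex not in ("F", "M"):
                    continue
                totals[name.strip().lower()][sex] += int(count)

    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["name", "gender"])
        for name in sorted(totals):
            counts = totals[name]
            # Assumed rule: majority vote over all years; ties go to "F".
            writer.writerow([name, "F" if counts["F"] >= counts["M"] else "M"])
    return len(totals)


# Demo on two tiny made-up files so the sketch runs without the real download.
with tempfile.TemporaryDirectory() as tmp:
    names_dir = Path(tmp) / "names"
    names_dir.mkdir()
    (names_dir / "yob2020.txt").write_text(
        "Olivia,F,10\nLiam,M,12\nRiley,F,30\n", encoding="utf-8"
    )
    (names_dir / "yob2021.txt").write_text(
        "Olivia,F,11\nLiam,M,13\nRiley,M,15\n", encoding="utf-8"
    )
    out_csv = Path(tmp) / "data.csv"
    n_names = build_dataset(names_dir, out_csv)
    rows = out_csv.read_text(encoding="utf-8").splitlines()

print(n_names, rows)
```

Collapsing each name to one label keeps the training set small, at the cost of discarding how ambiguous a name is; keeping the per-gender counts as sample weights would be an alternative.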