Datasets

A collection of dummy / public datasets for apps to use.

atherosclerosis.db: A 20-year longitudinal primary prevention study of atherosclerosis in middle-aged men. CTU
card_transactions.csv: 10,000 rows of fake card transaction data. Generated by Straive.
craftbeer.db: Craft beers labeled by styles and composition. A separate dataset lists breweries by state. Kaggle
data_scientist_jobstreet_scraped_v2.csv: from Data Scientist Job Dataset
ehr.csv: 28 rows of electronic health records of patient demographics, co-morbidities, adherence, and clinical notes. Useful to explore preference for injectables vs pills. Generated by Straive.
employee_data.csv: 2,000 rows of fake employee data generated by employee_data.py
got_book1.csv: from game_of_thrones_dataset
nba.db: A database of National Basketball Association matches. Lists players, teams, and matches, with action counts for each player. CTU
supply_chain.csv: 10,000 rows of fake supply chain data. Generated by Straive.
tourists.csv: 100 rows of supply chain data from a Fashion and Beauty startup chain of makeup products. Kaggle
world.db: A database of 239 states and their cities. CTU
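A minimal sketch of how an app might inspect these files, using only the Python standard library. It assumes the .db files are SQLite databases, which this README does not state, and the helper names list_tables and preview_csv are hypothetical, not part of this repo.

```python
import csv
import sqlite3
from contextlib import closing
from itertools import islice
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a database file.

    Assumption: the file is a SQLite database. It is opened read-only
    so that a mistyped path raises an error instead of creating a file.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def preview_csv(csv_path, n=5):
    """Return the first n data rows of a CSV file as dicts keyed by header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(islice(csv.DictReader(f), n))
```

For example, list_tables("world.db") would return the table names in that file, and preview_csv("card_transactions.csv", 3) would return its first three rows. All values come back as strings, so numeric columns need converting before use.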

This repo hosts:

Fake data. (Real data is usually in its own sensible repo.)
Medium-sized data. (GitHub isn't ideal for large data. Small data is too trivial to host.)
Data used by multiple apps. (Single-app datasets should be in the app's repo.)
