UC Irvine Datasets

class UC_Irvine_datasets()

The UC_Irvine_datasets() object contains a pandas dataframe of all the datasets available on the UC-Irvine Machine Learning Repository. Many methods make it easy to peruse, export, and even import the datasets inside the object.

See below for examples of ways to use methods of UC_Irvine_datasets():

from ucidata import UC_Irvine_datasets, df_first_row_to_header

# Create an instance of the class, which loads the dataframe of UC Irvine datasets
ucid = UC_Irvine_datasets()

string representation

The string property allows users to understand the current state of the class object.

print(ucid)

object.list_all_datasets()

Look at what datasets area available with list_all_datasets()

ucid.list_all_datasets()

object.limit(fieldname, value_to_match)

If you want to select only a single kind of dataset, limit to a single value with limit().

ucid.limit("Area", "Business")
print(ucid)
ucid.list_all_datasets()

object.show_me_dataset(ID)

Wow that's too many datasets all at once.

Let's just look at one with show_me_dataset(ID)

ds = UC_Irvine_datasets()
ds = ds.show_me_dataset("wine-quality")
print(ds)

object.load_small_dataset_df(ID)

There's a flag set on this data set called small = 1.

In this case that means that our team decided the dataset was sufficiently small to safely import directly as a dataframe.

You can try to import any small dataset as load_small_dataset_df(ID)

Note that if there are multiple datasets available, only the first dataset is loaded.

test_load_df = ucid.load_small_dataset_df("wine-quality")
print(f"There are {len(test_load_df.index)} rows")
print(test_load_df.head())

df_first_row_to_header(df)

Sometimes the datasets come with headers, and sometimes they don't.

df_first_row_to_header(df) will resolve this issue.

test_load_df = df_first_row_to_header(test_load_df)
print("\n\n### WITH HEADERS CORRECTED ###\n")
print(test_load_df.head())

object.small_datasets_only()

object.small_datasets_only() will return a new object of only "small" datasets.

Let's see what returns from using it.

small_ucid = UC_Irvine_datasets().small_datasets_only()
small_uci = small_ucid.list_all_datasets()
print(small_ucid)

The object can also produce simple plots from the dataframe:

ucid.print_distribution("NumberofInstances")

ucid.print_barplot("year_donated","NumberofWebHits", colorcol="header")

Name		Name	Last commit message	Last commit date
Latest commit History 127 Commits
archive_data_prep		archive_data_prep
project_submission		project_submission
readme_images		readme_images
.gitignore		.gitignore
README.md		README.md
_config.yml		_config.yml
conda_env_irvinemetadata.yml		conda_env_irvinemetadata.yml
project_submission.zip		project_submission.zip
viz_worldmap_sourced_pct_datasets.html		viz_worldmap_sourced_pct_datasets.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

archive_data_prep

archive_data_prep

project_submission

project_submission

readme_images

readme_images

.gitignore

.gitignore

README.md

README.md

_config.yml

_config.yml

conda_env_irvinemetadata.yml

conda_env_irvinemetadata.yml

project_submission.zip

project_submission.zip

viz_worldmap_sourced_pct_datasets.html

viz_worldmap_sourced_pct_datasets.html

Repository files navigation

UC Irvine Datasets

class UC_Irvine_datasets()

string representation

object.list_all_datasets()

object.limit(fieldname, value_to_match)

object.show_me_dataset(ID)

object.load_small_dataset_df(ID)

df_first_row_to_header(df)

object.small_datasets_only()

About

Releases

Packages

Contributors 4

Languages

kipmccharen/UC_Irvine_Dataset_MetaAnalysis

Folders and files

Latest commit

History

Repository files navigation

UC Irvine Datasets

class UC_Irvine_datasets()

string representation

object.list_all_datasets()

object.limit(fieldname, value_to_match)

object.show_me_dataset(ID)

object.load_small_dataset_df(ID)

df_first_row_to_header(df)

object.small_datasets_only()

About

Topics

Resources

Stars

Watchers

Forks

Languages