Commit dddcdb0

Add datasets
1 parent 79207e6 commit dddcdb0

36 files changed, with 3626 additions and 1 deletion

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # Project-specific settings:
-data/
+
 
 
 # Created by https://www.toptal.com/developers/gitignore/api/python,jupyternotebooks,data,linux,macos,windows

data/ADL/Leotta_2021/Leotta_2021_get_X_y_sub.ipynb

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

data/ADL/Leotta_2021/leotta_2021_get_x_y_sub.py

Lines changed: 391 additions & 0 deletions
Large diffs are not rendered by default.

data/ADL/Leotta_2021/leotta_2021_load_dataset.ipynb

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

data/ADL/Leotta_2021/leotta_2021_load_dataset.py

Lines changed: 422 additions & 0 deletions
Large diffs are not rendered by default.
10.2 KB
Binary file not shown.

data/Gesture Phase Segmentation/Gesture-Phase-Segmentation_load_dataset.ipynb

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

data/Gesture Phase Segmentation/Get_IR1_files_from_zip.ipynb

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

data/Gesture Phase Segmentation/gesture_phase_segmentation_load_dataset.py

Lines changed: 375 additions & 0 deletions
Large diffs are not rendered by default.

data/HAR/HAR_model_eval_defined_subj.ipynb

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

data/HAR/HAR_model_eval_stratification.ipynb

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

data/HAR/TWristAR/README.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
11/24/2019 e4 data collection, processing updated June 2021. Lee B. Hinkle, Texas State University

Data were collected using an e4 wristband https://www.empatica.com/research/e4/ for the 6 activities in the WISDM data set https://www.cis.fordham.edu/wisdm/dataset.php

Data were collected in 3 separate structured runs of approximately 6 minutes each and an unstructured walk. The e4 was worn on the right wrist and was removed after each run/walk to check the data.

Sessions were video recorded using an action camera strapped to the upper arm, facing downward. Session numbers are from the Empatica.com/connect site and are different from the name of the matching zip file.

The plots directory has screen grabs from the e4 connect site for each of the four files.

Video describing the code and processing of the data: https://mediaflo.txstate.edu/Watch/e4_data_processing

Session 794445, zip file 1574621345_A01F11.zip, video link https://mediaflo.txstate.edu/Watch/e4_794445
Upstairs/downstairs: each up & down cycle took slightly less than a minute (~27 seconds up, slightly less down). There are two short landings in the staircase (by the facilities building).

Session 794446, zip file 1574622389_A01F11.zip, video link https://mediaflo.txstate.edu/Watch/e4_794446
Run/walk sequence: activities alternated for 1 minute each. The middle run session included a brief turnaround because a group was taking graduation photos on the sidewalk. The route was the loop around the theatre building.

Session 794456, zip file 1574624998_A01F11.zip, video link https://mediaflo.txstate.edu/Watch/e4_794456
Sit/stand: performed outside, 1-minute intervals.

Session 794457, zip file 1574625540_A01F11.zip, video link https://mediaflo.txstate.edu/Watch/e4_794457
Unstructured walk on the quad from Old Main to Alkek and back. A variety of short stairs and a sitting segment before reversing direction are included. This activity has not been labeled; no \_label.csv has been made.
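
The numeric prefix of each zip file name appears to be the recording start time in Unix seconds: the notebook output recorded in this commit reports a file start time of Sun, 24 Nov 2019 18:49:05 for 1574621345_A01F11.zip, which is what 1574621345 decodes to in UTC. The following is a minimal sketch under that assumption. The helper name `start_time_from_zip_name` is hypothetical and not part of the repository, and the meaning of the `A01F11` suffix is not stated in the README, so it is ignored here.

```python
from datetime import datetime, timezone


def start_time_from_zip_name(zip_name: str) -> datetime:
    """Decode the leading Unix-seconds field of an e4 zip file name.

    Assumption (not confirmed by the README): names follow the pattern
    '<unix_seconds>_<suffix>.zip' and the timestamp is in UTC.
    """
    unix_seconds = int(zip_name.split("_", 1)[0])
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# The four zip files named in the README.
zip_names = [
    "1574621345_A01F11.zip",
    "1574622389_A01F11.zip",
    "1574624998_A01F11.zip",
    "1574625540_A01F11.zip",
]

for name in zip_names:
    print(name, start_time_from_zip_name(name).isoformat())
```

For the first file this prints `2019-11-24T18:49:05+00:00`, which agrees with the start time in the notebook output.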

data/HAR/TWristAR/TWristAR_load_dataset.ipynb

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

data/HAR/TWristAR/UE4W_load_dataset.ipynb

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

data/HAR/TWristAR/e4_end_to_end.ipynb

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

data/HAR/TWristAR/e4_get_X_y_sub.ipynb

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.

data/HAR/TWristAR/e4_get_x_y_sub.py

Lines changed: 357 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"e4_load_dataset.ipynb","provenance":[],"collapsed_sections":[],"mount_file_id":"1GlkU4dtwyqOoQXpbHp5Vi6sTmCAzugyY","authorship_tag":"ABX9TyOP5YcuMxrf42EBKoEn6eb7"},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"Khc4g511HMYk"},"source":["#e4_load_dataset.ipynb\n","This data set loader uses the e4_get_X_y_sub.py file generated by downloading the python version of the same name Jupyter notebook.\n","\n","Important note: The current data set is single subject, however there are\n","three subject numbers included {11,12,13} in order to perform the subject\n","based train/validate/test split.\n","\n","Example usage:\n","\n"," x_train, y_train, x_test, y_test = e4_load_dataset()\n"," \n","\n","Developed and tested using colab.research.google.com \n","To save as .py version use File > Download .py\n","\n","Author: [Lee B. Hinkle](https://userweb.cs.txstate.edu/~lbh31/), [IMICS Lab](https://imics.wp.txstate.edu/), Texas State University, 2021\n","\n","<a rel=\"license\" href=\"http://creativecommons.org/licenses/by-sa/4.0/\"><img alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https://i.creativecommons.org/l/by-sa/4.0/88x31.png\" /></a><br />This work is licensed under a <a rel=\"license\" href=\"http://creativecommons.org/licenses/by-sa/4.0/\">Creative Commons Attribution-ShareAlike 4.0 International License</a>.\n","\n","TODOs:\n","* \n"]},{"cell_type":"code","metadata":{"id":"NmKBvlsatEdF","executionInfo":{"status":"ok","timestamp":1624653030543,"user_tz":300,"elapsed":84,"user":{"displayName":"Lee Hinkle","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgewSVTK-UUEEP0ihHQARBRqOb4YrK-IiepxHiI=s64","userId":"00071704663307985880"}}},"source":["#mount google drive in colab session\n","#enter path to where the git repo was cloned\n","my_path = '/content/drive/My Drive/Colab 
Notebooks/imics_lab_repositories/load_data_time_series_dev'"],"execution_count":1,"outputs":[]},{"cell_type":"code","metadata":{"id":"q6H67o-YARCx","executionInfo":{"status":"ok","timestamp":1624653034996,"user_tz":300,"elapsed":2724,"user":{"displayName":"Lee Hinkle","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgewSVTK-UUEEP0ihHQARBRqOb4YrK-IiepxHiI=s64","userId":"00071704663307985880"}}},"source":["import os\n","import shutil #https://docs.python.org/3/library/shutil.html\n","from shutil import unpack_archive # to unzip\n","import requests #for downloading zip file\n","import numpy as np\n","from tabulate import tabulate # for verbose tables, showing data\n","import matplotlib.pyplot as plt\n","from tensorflow.keras.utils import to_categorical # for one-hot encoding\n","from sklearn.model_selection import train_test_split\n","from sklearn.preprocessing import LabelEncoder\n","from sklearn.preprocessing import OneHotEncoder"],"execution_count":2,"outputs":[]},{"cell_type":"code","metadata":{"id":"IZYH9-5wuINO","executionInfo":{"status":"ok","timestamp":1624653038706,"user_tz":300,"elapsed":1120,"user":{"displayName":"Lee Hinkle","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgewSVTK-UUEEP0ihHQARBRqOb4YrK-IiepxHiI=s64","userId":"00071704663307985880"}}},"source":["# use get_x_y_sub to get partially processed numpy arrays\n","full_filename = my_path+os.path.join('/HAR/e4_wristband_Nov2019/'+'e4_get_x_y_sub.py')\n","shutil.copy(full_filename,'e4_get_x_y_sub.py')\n","from e4_get_x_y_sub import get_X_y_sub"],"execution_count":3,"outputs":[]},{"cell_type":"code","metadata":{"id":"trfLorthy59i","executionInfo":{"status":"ok","timestamp":1624655099511,"user_tz":300,"elapsed":89,"user":{"displayName":"Lee Hinkle","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgewSVTK-UUEEP0ihHQARBRqOb4YrK-IiepxHiI=s64","userId":"00071704663307985880"}}},"source":["def e4_load_dataset(\n"," verbose = True,\n"," incl_xyz_accel = False, # include component 
accel_x/y/z in ____X data\n"," incl_rms_accel = True, # add rms value (total accel) of accel_x/y/z in ____X data\n"," incl_val_group = False, # split train into train and validate\n"," split_subj = dict\n"," (train_subj = [11],\n"," validation_subj = [12],\n"," test_subj = [13]),\n"," one_hot_encode = True # make y into multi-column one-hot, one for each activity\n"," ):\n"," \"\"\"calls e4_get_X_y_sub and processes the returned arrays by separating\n"," into _train, _validate, and _test arrays for X and y based on split_sub\n"," dictionary. Note current dataset is single subject labeled as 11, 12, 13\n"," in order to exercise the code\"\"\"\n"," e4_flist = ['1574621345_A01F11.zip','1574622389_A01F11.zip', '1574624998_A01F11.zip']\n"," X, y, sub, xys_info = get_X_y_sub(zip_flist = e4_flist)\n"," log_info = 'Processing e4 files'+str(e4_flist)\n"," #remove component accel if needed\n"," if (not incl_xyz_accel):\n"," print(\"Removing component accel\")\n"," X = np.delete(X, [0,1,2], 2)\n"," if (not incl_rms_accel):\n"," print(\"Removing total accel\")\n"," X = np.delete(X, [3], 2) \n"," #One-Hot-Encode y...there must be a better way when starting with strings\n"," #https://machinelearningmastery.com/how-to-one-hot-encode-sequence-data-in-python/\n","\n"," if (one_hot_encode):\n"," # integer encode\n"," y_vector = np.ravel(y) #encoder won't take column vector\n"," le = LabelEncoder()\n"," integer_encoded = le.fit_transform(y_vector) #convert from string to int\n"," name_mapping = dict(zip(le.classes_, le.transform(le.classes_)))\n"," print(\"One-hot-encoding: category names -> int -> one-hot\")\n"," print(name_mapping) # seems risky as interim step before one-hot\n"," log_info += \"One Hot:\" + str(name_mapping) +\"\\n\\n\"\n"," onehot_encoder = OneHotEncoder(sparse=False)\n"," integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)\n"," onehot_encoded = onehot_encoder.fit_transform(integer_encoded)\n"," 
print(\"One-hot-encoding\",onehot_encoder.categories_)\n"," y=onehot_encoded\n"," #return X,y\n"," # split by subject number pass in dictionary\n"," sub_num = np.ravel(sub[ : , 0] ) # convert shape to (1047,)\n"," if (not incl_val_group):\n"," train_index = np.nonzero(np.isin(sub_num, split_subj['train_subj'] + \n"," split_subj['validation_subj']))\n"," x_train = X[train_index]\n"," y_train = y[train_index]\n"," else:\n"," train_index = np.nonzero(np.isin(sub_num, split_subj['train_subj']))\n"," x_train = X[train_index]\n"," y_train = y[train_index]\n","\n"," validation_index = np.nonzero(np.isin(sub_num, split_subj['validation_subj']))\n"," x_validation = X[validation_index]\n"," y_validation = y[validation_index]\n","\n"," test_index = np.nonzero(np.isin(sub_num, split_subj['test_subj']))\n"," x_test = X[test_index]\n"," y_test = y[test_index]\n"," if (incl_val_group):\n"," return x_train, y_train, x_validation, y_validation, x_test, y_test\n"," else:\n"," return x_train, y_train, x_test, y_test\n","\n","\n"," if(verbose):\n"," headers = (\"Reshaped data\",\"shape\", \"object type\", \"data type\")\n"," mydata = [(\"x_train:\", x_train.shape, type(x_train), x_train.dtype),\n"," (\"y_train:\", y_train.shape ,type(y_train), y_train.dtype),\n"," (\"x_test:\", x_test.shape, type(x_test), x_test.dtype),\n"," (\"y_test:\", y_test.shape ,type(y_test), y_test.dtype)]\n"," print(tabulate(mydata, headers=headers))\n","\n"," return x_train, y_train, x_test, y_test"],"execution_count":25,"outputs":[]},{"cell_type":"code","metadata":{"id":"MaT1dfqavvtk","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1624655112594,"user_tz":300,"elapsed":8508,"user":{"displayName":"Lee Hinkle","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GgewSVTK-UUEEP0ihHQARBRqOb4YrK-IiepxHiI=s64","userId":"00071704663307985880"}},"outputId":"f4bcf463-1ce5-4548-d959-b7837c02bcf2"},"source":["if __name__ == \"__main__\":\n"," print(\"Downloading and 
processing e4 dataset\")\n"," x_train, y_train, x_test, y_test = e4_load_dataset()\n"," print(\"\\nreturned arrays without validation group:\")\n"," print(\"x_train shape \",x_train.shape,\" y_train shape \", y_train.shape)\n"," print(\"x_test shape \",x_test.shape,\" y_test shape \",y_test.shape)\n","\n"," x_train, y_train, x_validation, y_validation, x_test, y_test = e4_load_dataset(incl_val_group=True)\n"," print(\"\\nreturned arrays with validation group:\")\n"," print(\"x_train shape \",x_train.shape,\" y_train shape \", y_train.shape)\n"," print(\"x_validation shape \",x_validation.shape,\" y_validation shape \", y_validation.shape)\n"," print(\"x_test shape \",x_test.shape,\" y_test shape \",y_test.shape)"],"execution_count":26,"outputs":[{"output_type":"stream","text":["Downloading and processing e4 dataset\n","Processing 1574621345_A01F11.zip\n","Unzipping e4 file in local directory /content/temp\n","/content/temp/ACC.csv Sample frequency = 32.0 Hz\n","File start time = Sun, 24 Nov 2019 18:49:05\n","File end time = Sun, 24 Nov 2019 18:58:11\n","Tag info (button presses) from tags.csv\n"," UTC_time Local Time\n","0 1574621375.17 Sun, 24 Nov 2019 18:49:35\n","1 1574621774.22 Sun, 24 Nov 2019 18:56:14\n","Warning: Multiple subjects detected in csv, unusual for e4 data.\n","Label Counts\n"," Upstairs 6208\n","Downstairs 5889\n","Undefined 5405\n","Name: label, dtype: int64\n","No NaN entries found\n","shapes call broke when making the function - not sure why\n","Processing 1574622389_A01F11.zip\n","Unzipping e4 file in local directory /content/temp\n","/content/temp/ACC.csv Sample frequency = 32.0 Hz\n","File start time = Sun, 24 Nov 2019 19:06:29\n","File end time = Sun, 24 Nov 2019 19:15:03\n","Tag info (button presses) from tags.csv\n"," UTC_time Local Time\n","0 1574622432.21 Sun, 24 Nov 2019 19:07:12\n","1 1574622822.72 Sun, 24 Nov 2019 19:13:42\n","Warning: Multiple subjects detected in csv, unusual for e4 data.\n","Label Counts\n"," Walking 
5793\n","Jogging 5792\n","Undefined 4885\n","Name: label, dtype: int64\n","No NaN entries found\n","shapes call broke when making the function - not sure why\n","Processing 1574624998_A01F11.zip\n","Unzipping e4 file in local directory /content/temp\n","/content/temp/ACC.csv Sample frequency = 32.0 Hz\n","File start time = Sun, 24 Nov 2019 19:49:58\n","File end time = Sun, 24 Nov 2019 19:57:15\n","Tag info (button presses) from tags.csv\n"," UTC_time Local Time\n","0 1574625042.71 Sun, 24 Nov 2019 19:50:42\n","1 1574625419.43 Sun, 24 Nov 2019 19:56:59\n","Warning: Multiple subjects detected in csv, unusual for e4 data.\n","Label Counts\n"," Sitting 5857\n","Standing 5632\n","Undefined 2503\n","Name: label, dtype: int64\n","No NaN entries found\n","shapes call broke when making the function - not sure why\n","shapes call broke when making the function - not sure why\n","Final Label Counts\n","[['Downstairs' '170']\n"," ['Jogging' '175']\n"," ['Sitting' '177']\n"," ['Standing' '170']\n"," ['Upstairs' '180']\n"," ['Walking' '175']]\n","e4 November 2019 zip files\n","1574621345_A01F11.zip 1574622389_A01F11.zip 1574624998_A01F11.zip\n","Time steps =96, Step =32, no resample\n","Final Shapes\n","shapes call broke when making the function - not sure why\n","Removing component accel\n","One-hot-encoding: category names -> int -> one-hot\n","{'Downstairs': 0, 'Jogging': 1, 'Sitting': 2, 'Standing': 3, 'Upstairs': 4, 'Walking': 5}\n","One-hot-encoding [array([0, 1, 2, 3, 4, 5])]\n","\n","returned arrays without validation group:\n","x_train shape (709, 96, 1) y_train shape (709, 6)\n","x_test shape (338, 96, 1) y_test shape (338, 6)\n","Processing 1574621345_A01F11.zip\n","Unzipping e4 file in local directory /content/temp\n","/content/temp/ACC.csv Sample frequency = 32.0 Hz\n","File start time = Sun, 24 Nov 2019 18:49:05\n","File end time = Sun, 24 Nov 2019 18:58:11\n","Tag info (button presses) from tags.csv\n"," UTC_time Local Time\n","0 1574621375.17 Sun, 24 Nov 2019 
18:49:35\n","1 1574621774.22 Sun, 24 Nov 2019 18:56:14\n","Warning: Multiple subjects detected in csv, unusual for e4 data.\n","Label Counts\n"," Upstairs 6208\n","Downstairs 5889\n","Undefined 5405\n","Name: label, dtype: int64\n","No NaN entries found\n","shapes call broke when making the function - not sure why\n","Processing 1574622389_A01F11.zip\n","Unzipping e4 file in local directory /content/temp\n","/content/temp/ACC.csv Sample frequency = 32.0 Hz\n","File start time = Sun, 24 Nov 2019 19:06:29\n","File end time = Sun, 24 Nov 2019 19:15:03\n","Tag info (button presses) from tags.csv\n"," UTC_time Local Time\n","0 1574622432.21 Sun, 24 Nov 2019 19:07:12\n","1 1574622822.72 Sun, 24 Nov 2019 19:13:42\n","Warning: Multiple subjects detected in csv, unusual for e4 data.\n","Label Counts\n"," Walking 5793\n","Jogging 5792\n","Undefined 4885\n","Name: label, dtype: int64\n","No NaN entries found\n","shapes call broke when making the function - not sure why\n","Processing 1574624998_A01F11.zip\n","Unzipping e4 file in local directory /content/temp\n","/content/temp/ACC.csv Sample frequency = 32.0 Hz\n","File start time = Sun, 24 Nov 2019 19:49:58\n","File end time = Sun, 24 Nov 2019 19:57:15\n","Tag info (button presses) from tags.csv\n"," UTC_time Local Time\n","0 1574625042.71 Sun, 24 Nov 2019 19:50:42\n","1 1574625419.43 Sun, 24 Nov 2019 19:56:59\n","Warning: Multiple subjects detected in csv, unusual for e4 data.\n","Label Counts\n"," Sitting 5857\n","Standing 5632\n","Undefined 2503\n","Name: label, dtype: int64\n","No NaN entries found\n","shapes call broke when making the function - not sure why\n","shapes call broke when making the function - not sure why\n","Final Label Counts\n","[['Downstairs' '170']\n"," ['Jogging' '175']\n"," ['Sitting' '177']\n"," ['Standing' '170']\n"," ['Upstairs' '180']\n"," ['Walking' '175']]\n","e4 November 2019 zip files\n","1574621345_A01F11.zip 1574622389_A01F11.zip 1574624998_A01F11.zip\n","Time steps =96, Step =32, no 
resample\n","Final Shapes\n","shapes call broke when making the function - not sure why\n","Removing component accel\n","One-hot-encoding: category names -> int -> one-hot\n","{'Downstairs': 0, 'Jogging': 1, 'Sitting': 2, 'Standing': 3, 'Upstairs': 4, 'Walking': 5}\n","One-hot-encoding [array([0, 1, 2, 3, 4, 5])]\n","\n","returned arrays with validation group:\n","x_train shape (380, 96, 1) y_train shape (380, 6)\n","x_validation shape (329, 96, 1) y_validation shape (329, 6)\n","x_test shape (338, 96, 1) y_test shape (338, 6)\n"],"name":"stdout"}]}]}
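
The notebook above one-hot encodes string labels and then splits the windows by subject number using `np.isin`. The sketch below illustrates those two steps on synthetic stand-in data, not the e4 recordings. It swaps in a NumPy-only encoding (`np.unique` with `return_inverse=True`, then indexing into `np.eye`) in place of the notebook's `LabelEncoder` plus `OneHotEncoder` pair; the function names `one_hot_from_strings` and `split_by_subject` are hypothetical and do not exist in the repository.

```python
import numpy as np


def one_hot_from_strings(y):
    """One-hot encode string labels with NumPy only.

    Returns the one-hot matrix and the sorted class names, so column i
    corresponds to classes[i].
    """
    classes, integer_encoded = np.unique(np.ravel(y), return_inverse=True)
    one_hot = np.eye(len(classes))[integer_encoded]
    return one_hot, classes


def split_by_subject(X, y, sub, train_subj, test_subj):
    """Split windows into train/test sets by subject number (hypothetical helper)."""
    sub_num = np.ravel(sub[:, 0])  # column vector -> 1-D array
    train_mask = np.isin(sub_num, train_subj)
    test_mask = np.isin(sub_num, test_subj)
    return X[train_mask], y[train_mask], X[test_mask], y[test_mask]


# Synthetic stand-in data: 6 windows, 96 time steps, 1 channel.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 96, 1))
y = np.array([["Walking"], ["Jogging"], ["Sitting"],
              ["Walking"], ["Sitting"], ["Jogging"]])
sub = np.array([[11], [11], [12], [12], [13], [13]])

y_one_hot, classes = one_hot_from_strings(y)
x_train, y_train, x_test, y_test = split_by_subject(
    X, y_one_hot, sub, train_subj=[11, 12], test_subj=[13])

print(classes)
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

Keeping the class names alongside the one-hot matrix preserves the label-to-column mapping that the notebook prints as `name_mapping`. Separately, in the cell as committed, both `return` statements appear before the `if(verbose)` block, and the recorded output contains no summary table even though `verbose` defaults to `True`, so that block appears to be unreachable.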
