Added Unsupervised Learning Methods (Clustering + Dimensionality Reduction) #240
Summary
This pull request adds several unsupervised learning methods to the repository: a clustering method (Gaussian Mixture Models) and two dimensionality reduction methods (Principal Component Analysis and an autoencoder). The new modules are designed to work alongside existing features such as DataMaster.DataProcessor, with the aim of improving on the existing CNN structure in data exploration and scalability.
New Features
UnsupervisedLearning.py
PCAModel class, with methods to build model, encode, and decode input data.
Autoencoder class, with methods to build & train model, encode, and decode input data. Furthermore, users can download & upload model weights files.
GMM class, with methods to build & train model, calculate expected Y values, and make estimations with new input. Furthermore, users can download & upload model files.
demo.py
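For readers unfamiliar with the encode and decode steps mentioned for PCAModel, here is a minimal sketch of what they mean, assuming a single principal component and plain Python lists. It is not the PR's implementation: the function names fit_first_component, encode, and decode are hypothetical, and power iteration is used here only to keep the example free of dependencies.

```python
import math

def fit_first_component(rows, iterations=200):
    """Return (mean, unit direction) of the first principal component.

    Hypothetical helper for illustration; not part of the PR.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def encode(rows, mean, v):
    # Project each centered row onto the principal direction (one number per row).
    return [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]

def decode(codes, mean, v):
    # Map each one-dimensional code back into the original space.
    return [[mean[j] + c * v[j] for j in range(len(v))] for c in codes]

# Toy data lying exactly on the line y = 2x, so one component captures it fully.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
mean, direction = fit_first_component(data)
codes = encode(data, mean, direction)
restored = decode(codes, mean, direction)
```

Because the toy data is perfectly one-dimensional, decoding the codes recovers the input up to floating-point error. On real data the reconstruction is lossy, and the size of that loss is what the number of retained components controls.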
Example Usage
PCA
Autoencoder
GMM
Initialize & Fit Model
Every model's constructor takes the same six parameters. After constructing a model, call its build_model method with the required config dict, then call its train_model method. The process is very similar for the other two classes.
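The construct, build_model, train_model sequence can be sketched with a small stand-in class. Everything here is an assumption made for illustration: SketchGMM is not the PR's GMM class, the six constructor parameters are not listed in this description so the stand-in takes only the data, and the config keys n_components and iterations and the predict method are hypothetical. Only the method names build_model and train_model come from the description above.

```python
import math

class SketchGMM:
    """Hypothetical stand-in, NOT the PR's GMM class: a 1-D mixture fitted with EM."""

    def __init__(self, data):
        # The real constructors take six parameters; this sketch takes only the data.
        self.data = list(data)

    def build_model(self, config):
        # The config keys used here are assumptions made for this sketch.
        k = config["n_components"]
        self.iterations = config.get("iterations", 50)
        lo, hi = min(self.data), max(self.data)
        # Spread the initial means evenly across the data range.
        self.means = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
        self.variances = [1.0] * k
        self.weights = [1.0 / k] * k

    def _density(self, x, i):
        # Weighted Gaussian density of component i at x.
        var = self.variances[i]
        gauss = math.exp(-(x - self.means[i]) ** 2 / (2 * var))
        return self.weights[i] * gauss / math.sqrt(2 * math.pi * var)

    def responsibilities(self, x):
        dens = [self._density(x, i) for i in range(len(self.means))]
        total = sum(dens)
        return [d / total for d in dens]

    def train_model(self):
        for _ in range(self.iterations):
            # E-step: soft assignment of every point to every component.
            resp = [self.responsibilities(x) for x in self.data]
            # M-step: re-estimate weight, mean, and variance per component.
            for i in range(len(self.means)):
                r_i = [r[i] for r in resp]
                n_i = sum(r_i)
                self.weights[i] = n_i / len(self.data)
                self.means[i] = sum(r * x for r, x in zip(r_i, self.data)) / n_i
                spread = sum(r * (x - self.means[i]) ** 2
                             for r, x in zip(r_i, self.data)) / n_i
                self.variances[i] = max(spread, 1e-6)  # floor avoids collapse

    def predict(self, x):
        # Index of the most responsible component for a new input.
        resp = self.responsibilities(x)
        return resp.index(max(resp))

model = SketchGMM([0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1])
model.build_model({"n_components": 2})
model.train_model()
labels = [model.predict(0.3), model.predict(4.8)]
```

Keeping hyperparameters in the config dict passed to build_model, rather than in the constructor, is what allows all three classes to share one constructor signature.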
Saving Models
The autoencoder is used as the example here, but saving works the same way with GMM.
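The method names and file formats the PR uses for saving are not shown in this description, so the following is only a generic sketch of a save and load round trip. It assumes the fitted parameters fit in a plain dict and uses Python's pickle module; save_model, load_model, and the file name gmm_state.pkl are hypothetical.

```python
import os
import pickle
import tempfile

def save_model(state, path):
    # Hypothetical helper: serialize the fitted parameters to disk.
    with open(path, "wb") as handle:
        pickle.dump(state, handle)

def load_model(path):
    # Hypothetical helper: only unpickle files from a trusted source.
    with open(path, "rb") as handle:
        return pickle.load(handle)

# Illustrative parameter values, not results from the PR.
state = {"means": [0.05, 5.05], "variances": [0.0125, 0.0125], "weights": [0.5, 0.5]}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "gmm_state.pkl")
    save_model(state, path)
    restored = load_model(path)
```

A neural network such as the autoencoder would more likely store its weights through its framework's own checkpoint format than through pickle, so treat this as the shape of the workflow and not as its implementation.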
Notes
I would appreciate any feedback on the new clustering & dimensionality reduction methods. In the future, I plan to expand on these features, both by exploring different methods and by fine-tuning the existing ones.