- Logistic Regression
- Linear regression
- multiple linear regression
- LSTM
- Gradient descent
- KNN
- Naive Bayes
- Support Vector machine
- Decision tree
- random forest
- homogeneity
- gini index
- information gain
- detect heart disease
- advantages
- disadvantages:
- overfitting
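The split criteria listed above (homogeneity, Gini index, information gain) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch: the function names and the toy labels (1 = disease, 0 = no disease) are assumptions made for the example, not taken from any project code.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted


# Hypothetical toy labels: 1 = disease, 0 = no disease.
parent = [1, 1, 1, 0, 0, 0]
perfect_split = [[1, 1, 1], [0, 0, 0]]     # each child is homogeneous
useless_split = [[1, 0], [1, 0], [1, 0]]   # children are as mixed as the parent

print(gini(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A tree learner picks, at each node, the split with the lowest impurity (or highest gain). Growing the tree until every leaf is pure is what tends to cause the overfitting noted above.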
- Ensembling
- boosting
- stacking
- random subspaces
- Extremely randomized trees
- Bayesian network
- Association rule learning
- NLP
- stop words
- named entity recognition
- parts of speech
- relationships
- tf-idf
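As a rough illustration of two of the NLP items above, stop-word removal and tf-idf weighting, here is a standard-library sketch. The stop-word list, the sample sentences, and the unsmoothed `log(N / df)` form of idf are assumptions chosen for brevity. Libraries such as scikit-learn use a smoothed variant, so their numbers will differ.

```python
from collections import Counter
from math import log

# Hypothetical mini stop-word list; real projects use a much larger one.
STOP_WORDS = {"the", "a", "is", "on", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Count each term once per document it appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
scores = tf_idf(docs)
print(scores[0])
```

In the first document, "mat" gets a higher weight than "cat" or "sat" because it appears in no other document.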
- Anomaly detection
- association rule learning
- Semi-supervised learning algorithms
- Deep belief network
- K-means clustering
- XGBoost classifier
- K-fold cross-validation
- Classify malware family using K-means clustering
- Prediction analysis with different algorithms
- Anomaly detection with Isolation Forest algorithm
- Analyze time series algorithms
- Recommendation algorithms
- Correlation-based (Pearson's r)
- KNN
- Cosine similarity
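The three recommendation building blocks above fit together naturally: a similarity measure (Pearson's r or cosine) ranks users, and a nearest-neighbour lookup picks whose ratings to borrow. The sketch below assumes a tiny, fully rated, hypothetical ratings table; the user names and helper names are invented for illustration.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson_r(u, v):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])


def most_similar(target, others, similarity):
    """1-nearest-neighbour lookup: name of the user most similar to target."""
    return max(others, key=lambda name: similarity(target, others[name]))


# Hypothetical ratings of the same four items by three users.
alice = [5, 4, 1, 1]
bob = [4, 5, 2, 1]
carol = [1, 2, 5, 4]

neighbour = most_similar(alice, {"bob": bob, "carol": carol}, pearson_r)
print(neighbour)
```

Pearson's r removes each user's average rating first, so it is less affected by users who rate everything high or low. Plain cosine similarity works directly on the raw ratings.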
- Ensembling used in the project
- Data acquisition
- crawl
- sources : text-based documentation, multimedia, video, audio
- ML model
- select model
- train
- fine-tune model
- Online learning
- Save trained model (joblib)
- Monitor system
- Update dataset and retrain model regularly
- Automate
- collect data regularly
- script to train the model and tune hyperparameters automatically, run every week
- Script to deploy model to prod
- script to evaluate model input data quality
- Backup model
- Google Cloud AI platform
-
Java
-
Python
-
Hardware: 64-bit 2+ GHz CPU, 1 GB
-
TensorFlow, Keras, scikit-learn
-
Colaboratory, Deepnote, Binder
-
https://nbviewer.jupyter.org/github/ageron/handson-ml2/blob/master/index.ipynb
-
Docker
-
GPU driver
-
If using a GPU:
conda env create -f environment.yml
conda activate tf2
python -m ipykernel install --user --name=python3
jupyter notebook
-
Weka
- Core ML
- TensorFlow Lite
- ML Kit
- Huawei AI mobile computing
- real time object classification
- Core ML
- CNN
- 1,000 ImageNet classes
- dynamic model deployment
- on device training
- benchmark DL model on iOS devices
- Hand gesture recognition using Kinect depth sensor
-
Parallelization and Amdahl's Law
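Amdahl's Law bounds the speedup from parallelization: if a fraction p of a program's runtime can be parallelized across N workers, the speedup is S(N) = 1 / ((1 - p) + p / N), which approaches 1 / (1 - p) as N grows. A minimal sketch follows; the function name and the 90% figure are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Theoretical speedup when parallel_fraction of the runtime uses n_workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Hypothetical program in which 90% of the runtime is parallelizable.
for n in (1, 2, 4, 8, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

Even with thousands of workers, the 10% serial part caps the speedup below 10x. This is why profiling the serial portion matters before moving work to a GPU.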
-
Stack :
- Profiler: cProfile
python -m cProfile -s cumtime mandelbrot1.py > mandelbrot_profile.txt
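The same profile can also be collected from inside a script with the standard `cProfile` and `pstats` modules. The sketch below is an assumption-laden stand-in: `mandelbrot1.py` is not shown in these notes, so `mandelbrot_iterations` and `render` are hypothetical placeholders for whatever it computes.

```python
import cProfile
import io
import pstats


def mandelbrot_iterations(c, max_iter=50):
    """Count iterations of z = z*z + c before |z| exceeds 2 (hypothetical workload)."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def render(width=40, height=20):
    """Evaluate the iteration count on a small grid of complex points."""
    return [
        [mandelbrot_iterations(complex(-2 + 3 * x / width, -1 + 2 * y / height))
         for x in range(width)]
        for y in range(height)
    ]


# Profile only the call of interest.
profiler = cProfile.Profile()
profiler.enable()
grid = render()
profiler.disable()

# Sort by cumulative time, as "-s cumtime" does on the command line.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumtime").print_stats(10)
print(report.getvalue())
```

Sorting by cumulative time puts the functions that dominate total runtime at the top, which helps identify candidates for parallelization.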
-
Set up: 64-bit Intel/AMD-based PC, Ubuntu LTS, 4 GB RAM, NVIDIA GeForce GTX 1050 GPU or better
- Intel's Math Kernel Library (MKL)
- Users of other systems (macOS, Red Hat/Fedora, openSUSE, and CentOS) should consult the official NVIDIA CUDA documentation (https://docs.nvidia.com/cuda/)
- Cloud: Azure, AWS (read up on driver, compiler, CUDA toolkit)
- Install : NVIDIA GPU driver
- Set up C/C++ env
- Install NVIDIA CUDA toolkit
- Set up Python for GPU
-
PyCUDA: memory capacity, cores, data transfer between host and device
-
Scan CUDA kernel
-
Streaming Multiprocessors (SMs)
-
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities
- https://www.datasetlist.com/
- https://mybinder.org/
- https://paperswithcode.com/
- https://modeldepot.io/
- https://www.visualdata.io/
- https://github.com/huggingface/knockknock
Archived repo: https://github.com/hiejulia/Machine-Learning---Deep-Learning---AI
- http://archive.ics.uci.edu/ml/index.php
- https://www.kaggle.com/datasets
- http://dataportals.org/
- https://opendatamonitor.eu/frontend/web/index.php?r=dashboard%2Findex
- https://www.quandl.com/
- https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
- https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
- https://www.reddit.com/r/datasets/
- https://registry.opendata.aws/
- Hotstar
- Netflix