Recommendations for handling large datasets #83
Comments
Hi, I am also using this tool with large datasets (~150k sequences). The KNN classification returns an empty knn_seq.pkl and an error like the one below. Have you ever encountered this error? I suspect it may be an out-of-memory issue in the KNN step.

ValueError Traceback (most recent call last)
~/deeptcr/lib/python3.7/site-packages/DeepTCR/DeepTCR.py in KNN_Sequence_Classifier(self, folds, k_values, rep, plot_metrics, by_class, plot_type, metrics, n_jobs, Load_Prev_Data)
~/deeptcr/lib/python3.7/site-packages/seaborn/_decorators.py in inner_f(*args, **kwargs)
~/deeptcr/lib/python3.7/site-packages/seaborn/categorical.py in catplot(x, y, hue, data, row, col, col_wrap, estimator, ci, n_boot, units, seed, order, hue_order, row_order, col_order, kind, height, aspect, orient, color, palette, legend, legend_out, sharex, sharey, margin_titles, facet_kws, **kwargs)
~/deeptcr/lib/python3.7/site-packages/seaborn/categorical.py in establish_colors(self, color, palette, saturation)
ValueError: min() arg is an empty sequence
Hi, thank you for creating this great tool!
I was wondering if you could offer some guidance on handling large datasets in the unsupervised workflow? In particular, the clustering and KNN classification steps seem to be prohibitively memory-expensive.
I think that downsampling is interfering with the classification accuracy, so I would like to use all the data if possible.
Thanks so much for your help!
Leeana
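
For context on the memory concern above, here is a minimal sketch of one generic way to run a nearest-neighbour search over all rows without holding the full pairwise distance matrix in memory. This is not DeepTCR's implementation or API: the function name `chunked_knn_indices`, the `chunk_size` parameter, and the choice of Euclidean distance are all assumptions for illustration, and `features` stands in for any numeric feature matrix with one row per sequence.

```python
import numpy as np


def chunked_knn_indices(features, k, chunk_size=1024):
    """Hypothetical helper: indices of the k nearest rows (Euclidean) for every row.

    Each row is excluded from its own neighbour list. Only a
    (chunk_size x n) block of distances exists in memory at any time,
    instead of the full (n x n) matrix.
    """
    features = np.asarray(features, dtype=np.float32)
    n = features.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of rows")

    # Squared norm of every row, reused for every chunk.
    sq_norms = np.einsum("ij,ij->i", features, features)
    neighbours = np.empty((n, k), dtype=np.int64)

    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        block = features[start:stop]

        # Squared distances via |a|^2 - 2 a.b + |b|^2 for this chunk only.
        d2 = sq_norms[start:stop, None] - 2.0 * block @ features.T + sq_norms[None, :]

        # Mask each row's distance to itself so it is never selected.
        d2[np.arange(stop - start), np.arange(start, stop)] = np.inf

        # Pick the k smallest per row (unordered), then sort just those k.
        part = np.argpartition(d2, k - 1, axis=1)[:, :k]
        order = np.argsort(np.take_along_axis(d2, part, axis=1), axis=1)
        neighbours[start:stop] = np.take_along_axis(part, order, axis=1)

    return neighbours
```

With 150k rows, a full float32 distance matrix would need roughly 90 GB, whereas a chunk of 1024 rows needs about 0.6 GB, so peak memory is controlled by `chunk_size` rather than by the dataset size. The run time is still quadratic in the number of rows.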