Text clustering & visualisation library using word embeddings, text hashing & TF-IDF

Psimkin/Text-clustering-end2end

Document Clustering

Made by Timothy Avni (tavni96) & Peter Simkin (DolphinDance)

We present a way to cluster text documents by stacking features from TF-IDF, pretrained word embeddings and text hashing.

We then reduce the dimensionality of these features with UMAP and group the documents with HDBSCAN to produce a 2-D D3.js visualisation.
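To make the feature stacking concrete, here is a minimal standard-library sketch. It is not the code in `TextProcessor.features`: the helper names (`tfidf_features`, `hashing_features`, `embedding_features`, `stack_features`), the whitespace tokenizer, the bucket count and the tiny `toy_vectors` table standing in for pretrained embeddings are all assumptions made for illustration.

```python
import math
import zlib
from collections import Counter


def tokenize(doc):
    # Naive lowercase/whitespace tokenizer (an assumption, not the library's).
    return doc.lower().split()


def tfidf_features(corpus):
    """One column per vocabulary term: term frequency times smoothed IDF."""
    docs = [tokenize(d) for d in corpus]
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    doc_freq = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = max(len(doc), 1)  # avoid dividing by zero on empty documents
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return rows


def hashing_features(corpus, n_buckets=8):
    """Hashing trick: each token increments the bucket picked by a stable hash."""
    rows = []
    for doc in corpus:
        row = [0.0] * n_buckets
        for tok in tokenize(doc):
            row[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1.0
        rows.append(row)
    return rows


def embedding_features(corpus, vectors, dim):
    """Average the vectors of known tokens; all zeros if no token is known."""
    rows = []
    for doc in corpus:
        known = [vectors[t] for t in tokenize(doc) if t in vectors]
        if known:
            rows.append([sum(col) / len(known) for col in zip(*known)])
        else:
            rows.append([0.0] * dim)
    return rows


def stack_features(*blocks):
    """Concatenate the feature blocks column-wise, one row per document."""
    return [sum(parts, []) for parts in zip(*blocks)]


# Hypothetical 2-d "embeddings"; real pretrained vectors are far larger.
toy_vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "stock": [0.0, 1.0]}
corpus = ["the cat sat", "the dog sat", "stock prices fell"]

stacked = stack_features(
    tfidf_features(corpus),
    hashing_features(corpus, n_buckets=8),
    embedding_features(corpus, toy_vectors, dim=2),
)
```

Each row of `stacked` is one document's TF-IDF columns, followed by its hash buckets, followed by its averaged embedding. A matrix of this shape is what a dimensionality-reduction step would take as input.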

from TextProcessor.features import text_features
from TextProcessor.reduction import Mapper
from TextProcessor.labeller import automatic_labelling
from TextProcessor.viz import Visualiser

corpus = [...]  # list of documents (strings); replace with your own
tf = text_features(corpus)            # build the stacked text features
data = Mapper(tf.values, corpus)      # dimensionality reduction step
mapping = automatic_labelling(data)   # label the resulting groups
Visualiser(mapping, folder='test')    # produce the visualisation
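The data format that `Visualiser` hands to D3.js is not documented here. As an illustrative sketch only, 2-D coordinates and cluster labels could be serialised to JSON records that a D3.js scatter plot can load. The function `to_d3_records`, the field names (`x`, `y`, `cluster`, `text`) and the sample values below are hypothetical.

```python
import json


def to_d3_records(points, labels, texts):
    """Build one record per document: 2-D position, cluster id and text."""
    if not (len(points) == len(labels) == len(texts)):
        raise ValueError("points, labels and texts must have the same length")
    return [
        {"x": float(x), "y": float(y), "cluster": int(label), "text": text}
        for (x, y), label, text in zip(points, labels, texts)
    ]


# Made-up coordinates and labels, standing in for real pipeline output.
points = [(0.1, 0.2), (0.15, 0.22), (3.0, 2.9)]
labels = [0, 0, -1]  # HDBSCAN marks noise points with the label -1
texts = ["the cat sat", "the dog sat", "stock prices fell"]

payload = json.dumps(to_d3_records(points, labels, texts), indent=2)
```

Writing `payload` to a file gives a flat array of objects. On the D3.js side, `d3.json` can fetch such a file, and the `cluster` field can drive the point colour.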
	

Viz
