Is my idea a good candidate for ML or DL? #157

elewis33 · 2022-10-26T16:09:16Z


I've got a significant repository of email messages/threads that I'd like to analyze and essentially create a sort of prediction engine, where I can say, if a specific topic comes up in the future, I should probably reach out to these specific people to either get them involved or answer some questions. So aside from a trained model of some sort, my input would be a current email message and the output would be, based on historical analysis, you should reach out to these specific people based on the topic of the email message.

I know I can do a certain amount of analysis just using a database, but I'm thinking that the nuances of written language in email messages might make this a better candidate for ML/DL.

Does anyone have any thoughts about this particular problem/model?

A couple details that might help understand what I'm dealing with.

I'm using my personal email archive, so these are only emails that are either addressed directly to me in the To field or that I'm CC'd on.
My email archive is organized into folders that correspond to specific projects, so there is already a sort of labeling/tagging for the messages in a single folder.

I'd love to hear what people think about this idea because it's been rattling around in my head for a long time and would like to get some of the pros and cons from others that know the ML/DL field better than me. Thank you!

Earl

Answered by mrdbourke

Nov 9, 2022


fivefishstudios · 2022-11-02T12:17:09Z


I'm still new/learning ML, so this may not be the best answer. There is a website called Hugging Face (https://huggingface.co/) where you can download pre-trained models for your specific problem and maybe just further train them for your specific problem.

For example, here is a model that has been trained to recognize four types of entities/tokens: location (LOC), organization (ORG), person (PER) and miscellaneous (MISC).

I fed it a sample sentence: "The XYZ project at New York is delayed due to problems in acquiring building materials by Joe, purchaser at ABC Hardware Company."

The model recognized which spans of text are persons, organizations, locations and miscellaneous items.
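A minimal sketch of what that could look like in code, assuming the `transformers` library and the publicly available `dslim/bert-base-NER` checkpoint (an assumption; it uses the same four entity types but may not be the exact model meant above). The helper names `group_entities` and `extract_entities` are made up for illustration.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group NER pipeline output by entity type (PER, ORG, LOC, MISC).

    `entities` is the list of dicts returned by a token-classification
    pipeline created with aggregation_strategy="simple".
    """
    grouped = defaultdict(list)
    for ent in entities:
        # Skip low-confidence spans; the 0.5 cut-off is an arbitrary choice.
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


def extract_entities(text, model_name="dslim/bert-base-NER"):
    """Run NER on `text` and return {entity_type: [span, ...]}."""
    # Imported lazily so group_entities works without transformers installed.
    from transformers import pipeline

    ner = pipeline("ner", model=model_name, aggregation_strategy="simple")
    return group_entities(ner(text))
```

Calling `extract_entities("...purchaser at ABC Hardware Company.")` downloads the model on first use and returns the recognized spans grouped by type; the exact spans and scores depend on the model.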

I guess this model or something similar to it can be used as a starting point. You could then link each recognized org/person to your existing database, maybe using a SQL LIKE match for company names and SOUNDEX for people's names.
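As a rough illustration of the database lookup step, here is a sketch using Python's built-in `sqlite3` module. The `contacts` table, its columns, and the sample rows are all hypothetical. It only shows the LIKE part: as far as I know, SQLite's `soundex()` function is only available when SQLite is compiled with a specific option, so phonetic matching on names would need a database that ships it (or a Python-side implementation).

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical contacts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (name TEXT, email TEXT, company TEXT)")
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?, ?)",
        [
            ("Joe Smith", "joe@abc.example", "ABC Hardware Company"),
            ("Ann Lee", "ann@xyz.example", "XYZ Builders"),
        ],
    )
    return conn


def find_contacts_by_company(conn, org_text):
    """Return (name, email) rows whose company contains `org_text`.

    SQLite's LIKE is case-insensitive for ASCII by default. Wildcard
    characters inside `org_text` are not escaped in this sketch.
    """
    pattern = f"%{org_text}%"
    return conn.execute(
        "SELECT name, email FROM contacts WHERE company LIKE ? ORDER BY name",
        (pattern,),
    ).fetchall()
```

The organization text recognized by the NER model would be passed in as `org_text`, for example `find_contacts_by_company(make_demo_db(), "abc hardware")`.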

2 replies

mrdbourke Nov 9, 2022
Maintainer

Fantastic use case!

elewis33 Nov 10, 2022
Author

Thanks so much @fivefishstudios! I'm pretty determined to make this into something useful so your suggestions are greatly appreciated. I'll report back if I make any sort of reasonable progress.

mrdbourke · 2022-11-09T23:48:36Z

Maintainer

Hey @elewis33,

Like @fivefishstudios said, there's a great resource at https://huggingface.co with plenty of pretrained natural language processing (NLP) models.

What you might want to start looking for is a "zero-shot classification model", where you provide some text plus a list of candidate topics and the model scores the text against each topic without having been trained on those specific labels.

See this example model here: https://huggingface.co/facebook/bart-large-mnli?candidateLabels=mobile%2C+website%2C+billing%2C+account+access&multiClass=false&text=Last+week+I+upgraded+my+iOS+version+and+ever+since+then+my+phone+has+been+overheating+whenever+I+use+your+app.
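For reference, a small sketch of running that same model locally with the `transformers` zero-shot pipeline. The helper names `labels_above` and `classify_email`, and the 0.5 threshold, are assumptions for illustration rather than anything prescribed by the library.

```python
def labels_above(result, threshold=0.5):
    """Keep the candidate labels whose score is at least `threshold`.

    `result` is the dict returned by a zero-shot-classification pipeline,
    with parallel "labels" and "scores" lists.
    """
    return [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


def classify_email(text, candidate_labels, multi_label=True):
    """Score `text` against each candidate label with facebook/bart-large-mnli."""
    # Imported lazily so labels_above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "zero-shot-classification", model="facebook/bart-large-mnli"
    )
    # multi_label=True scores each label independently, which suits emails
    # that touch more than one topic.
    return classifier(text, candidate_labels=candidate_labels, multi_label=multi_label)
```

For an email archive, the candidate labels could simply be the project folder names, e.g. `labels_above(classify_email(email_body, folder_names))`. The model is large, so the first call takes a while to download it.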

After you've tried that, you can go into more of a fine-tuning mode and use your existing folders/archive as the labels for new emails coming through.

So an existing model can be tailored to your own labels.

I'd research something like "how to fine-tune an NLP model huggingface".

But on the whole your use case is ideal for ML as NLP models have drastically improved over the past couple of years.
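To make the "folders as labels" idea a bit more concrete, here is a plain-Python sketch of the data preparation side. It assumes each email has already been parsed into a dict with hypothetical `folder`, `sender` and `body` keys. It builds (text, label id) pairs that a fine-tuning script could consume, plus a simple per-folder count of senders that could serve as the "who to reach out to" lookup once a model has predicted the folder.

```python
from collections import Counter, defaultdict


def build_training_pairs(messages):
    """Return ([(body, label_id), ...], {folder_name: label_id})."""
    folders = sorted({m["folder"] for m in messages})
    label2id = {name: i for i, name in enumerate(folders)}
    pairs = [(m["body"], label2id[m["folder"]]) for m in messages]
    return pairs, label2id


def contacts_by_folder(messages):
    """Count how often each sender appears in each folder."""
    counts = defaultdict(Counter)
    for m in messages:
        counts[m["folder"]][m["sender"]] += 1
    return counts


def suggest_contacts(counts, folder, top_n=3):
    """Return the most frequent senders for a (predicted) folder."""
    return [sender for sender, _ in counts.get(folder, Counter()).most_common(top_n)]
```

Counting senders is only a baseline; it ignores who was CC'd and how recent each message is, both of which might matter for a real recommendation.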

2 replies

elewis33 Nov 10, 2022
Author

@mrdbourke thanks so much for the response and suggestions. After I got into huggingface.co it got a little overwhelming with how things are organized so I backed off initially. But will definitely jump in there again to look at your pointer and the suggestions you made. I'm really anxious to see if this is anything. /s (snarky Jerry Seinfeld reference). Thanks again!

mrdbourke Nov 15, 2022
Maintainer

There's a lot there! And it will take a while to get around everything. But with practice you will likely discover a fair few things that can help you on your journey!
