Question: which model to use for Fraud Prediction #274

Open
fhferreira opened this issue Jan 30, 2023 · 2 comments
Labels
question Further information is requested

Comments

fhferreira commented Jan 30, 2023

I am looking for a way to prevent fraudsters from creating stores or e-commerce sites whose only purpose is to sell products fraudulently.

Example:
Product: Stove, brand Consul
Price: 100
Real price at regular stores: 500

Product: Washing machine, brand Electrolux
Price: 119
Real price at regular stores: 900

I am new to machine learning, so I would appreciate a suggestion.

Member

andrewdalpino commented Feb 3, 2023

A lot of the time, fraud detection can be framed as anomaly detection, which is an unsupervised approach. The problem with a supervised approach is that it is often not practical to accumulate enough labeled samples that represent fraud. The prior probability is just too low, i.e. people are generally honest. Fortunately, most anomaly detectors acknowledge this skew and let you account for it through the contamination hyper-parameter, which sets the proportion of samples you expect to be anomalous.

https://docs.rubixml.com/2.0/what-is-machine-learning.html#anomaly-detection

If you take this approach, you can start with a simple anomaly detector such as Gaussian MLE, and if you need more flexibility, Loda and Isolation Forest work pretty well.
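To make the Gaussian MLE idea concrete, here is a minimal sketch in plain Python. It is not the Rubix ML API. It assumes a single hypothetical feature, the listed price divided by a market reference price (as in the examples in the question), and the training values are made up for illustration. It fits a mean and variance by maximum likelihood, scores each sample by its negative log-likelihood, and uses the contamination rate to pick the cutoff.

```python
import math


def fit_gaussian(values):
    """Maximum likelihood estimates of the mean and variance of a 1-D sample."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, max(variance, 1e-9)  # small floor avoids division by zero


def anomaly_score(value, mean, variance):
    """Negative log-likelihood under the fitted Gaussian; higher means more unusual."""
    return 0.5 * math.log(2 * math.pi * variance) + (value - mean) ** 2 / (2 * variance)


def fit_threshold(scores, contamination):
    """Cutoff chosen so that about `contamination` of the training samples reach it."""
    ranked = sorted(scores)
    k = max(1, int(round(contamination * len(ranked))))  # number of expected anomalies
    return ranked[len(ranked) - k]


# Hypothetical, unlabeled training data: listed price / market reference price.
# Most listings sit near 1.0; one suspicious listing (0.20) is mixed in.
training_ratios = [0.95, 1.02, 0.88, 1.10, 0.97, 1.05, 0.92, 1.00, 0.99, 1.03,
                   0.90, 1.08, 0.96, 1.01, 0.94, 1.04, 0.98, 1.06, 0.93, 0.20]

mean, variance = fit_gaussian(training_ratios)
scores = [anomaly_score(r, mean, variance) for r in training_ratios]
threshold = fit_threshold(scores, contamination=0.05)


def is_anomaly(ratio):
    """Flag a listing whose price ratio is at least as unlikely as the cutoff."""
    return anomaly_score(ratio, mean, variance) >= threshold


print(is_anomaly(119 / 900))  # washing machine example from the question
print(is_anomaly(1.0))        # listing priced at the market reference
```

A real detector would use several features and a full covariance or per-feature variances, but the role of the contamination setting is the same: it decides how much of the training data is treated as anomalous when the cutoff is chosen.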

If you go with a supervised approach, you can train a classifier to label samples as "fraud" or "not fraud", but be careful if your dataset is highly imbalanced (mostly "not fraud" samples). Some classifiers, such as Random Forest, will compensate for imbalanced datasets, but that is no substitute for actually having more data to represent the fraud case.

https://docs.rubixml.com/2.0/what-is-machine-learning.html#classification
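As an illustration of what compensating for imbalance can look like, here is a small sketch of inverse-frequency class weighting in plain Python. This is one common heuristic, not necessarily what any particular Random Forest implementation does internally, and the label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical dataset: 98 honest listings and only 2 known fraud cases.
labels = ["not fraud"] * 98 + ["fraud"] * 2
weights = balanced_class_weights(labels)

print(weights["fraud"])      # 25.0
print(weights["not fraud"])  # roughly 0.51
```

Each fraud sample counts about 49 times as much as an honest one during training here, yet there are still only two fraud examples to learn from, which is why reweighting cannot replace collecting more fraud data.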

Hope this helps!

Author

@andrewdalpino thanks, that helped a lot.

@andrewdalpino andrewdalpino added the question Further information is requested label May 27, 2023