Hello👋, I am Andrejs

I am a passionate data scientist and ML engineer on a continuous journey of learning by doing, and this is my MLOps portfolio project for the MLOps Zoomcamp course.

Fake news detection

Problem description

The expansion of information outlets in the digital era is a double-edged sword. On one hand, it has democratized access to knowledge and news; on the other, it has facilitated the spread of misinformation and fake news. Such misleading information can distort public discourse, sway personal convictions, and even influence the outcomes of elections and public health policies.

Importance

Given these stakes, robust and scalable methods for distinguishing authentic news from misinformation are urgently needed. This is an area where machine learning can play a crucial role.

Project goals

This project has two goals. The apparent goal is to build a machine learning model that efficiently classifies news articles as genuine or fake based on their content and headline, using natural language processing methods and advanced machine learning algorithms. The model analyzes the textual characteristics of news articles to classify them.
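As a rough illustration of how a headline and body can be turned into model input, the sketch below concatenates the two fields, tokenizes them, builds a vocabulary, and converts each article into a fixed-length sequence of integer IDs, the kind of input a sequence model such as an LSTM consumes. The function names, the regex tokenizer, the reserved IDs (0 for padding, 1 for unknown words), `max_size` and `max_len` are all assumptions for illustration, not the project's actual preprocessing pipeline.

```python
from __future__ import annotations

import re
from collections import Counter

PAD_ID = 0  # hypothetical: ID used to right-pad short sequences
UNK_ID = 1  # hypothetical: ID for words missing from the vocabulary


def tokenize(title: str, text: str) -> list[str]:
    """Lowercase the headline plus body and split it into word tokens."""
    combined = f"{title} {text}".lower()
    return re.findall(r"[a-z0-9']+", combined)


def build_vocab(token_lists: list[list[str]], max_size: int = 20000) -> dict[str, int]:
    """Map the most frequent tokens to IDs starting at 2 (0 and 1 are reserved)."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_size))}


def encode(tokens: list[str], vocab: dict[str, int], max_len: int = 300) -> list[int]:
    """Convert tokens to IDs, truncating or right-padding to exactly max_len."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))
```

Fixing the sequence length keeps batches rectangular; truncation trades away the tail of long articles, so `max_len` is worth tuning against the article length distribution.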

The actual learning goal of the MLOps Zoomcamp, however, is to create an example or template repository that employs up-to-date MLOps and data management practices. The first goal therefore serves as the backbone of a comprehensive ML engineering workflow that spans every stage of the MLOps process. See Solution architecture for more information.

Fast-track run

In short, to replicate the project you need to:

1. Fulfil the Prerequisites
2. Set up the Infrastructure
3. Set up Orchestration and Experiment tracking
4. Deploy the Prefect flows
5. Run model training at least once
6. Deploy the best model as a web service, either locally or using Cloud Run
7. Set up Monitoring
8. Apply the Best practices
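The deployment step exposes the trained model over HTTP. The sketch below shows a dependency-free prediction endpoint built only on the Python standard library; the `/predict` route, the JSON fields `title` and `text`, and the `predict_proba` placeholder (a stand-in for the real model) are hypothetical, and the project itself may use a different web framework. Cloud Run passes the listening port in the `PORT` environment variable, which is why `serve` reads it with an 8080 fallback for local runs.

```python
from __future__ import annotations

import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_proba(title: str, text: str) -> float:
    """Hypothetical placeholder: replace with the trained model's inference call."""
    return 0.5


def handle_predict(body: bytes) -> tuple[int, dict]:
    """Validate a JSON request body and return (HTTP status, response payload)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return 400, {"error": "request body must be valid JSON"}
    if not isinstance(payload, dict) or not isinstance(payload.get("text"), str):
        return 400, {"error": "field 'text' (string) is required"}
    title = payload.get("title", "")
    if not isinstance(title, str):
        return 400, {"error": "field 'title' must be a string"}
    prob_fake = predict_proba(title, payload["text"])
    return 200, {"prob_fake": prob_fake, "label": "fake" if prob_fake >= 0.5 else "real"}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/predict":
            status, result = 404, {"error": "not found"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, result = handle_predict(self.rfile.read(length))
        data = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve() -> None:
    """Start the server; Cloud Run sets PORT, local runs fall back to 8080."""
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the service, which can then be exercised with `curl -X POST localhost:8080/predict -d '{"title": "...", "text": "..."}'`. Keeping validation in the pure `handle_predict` function makes it unit-testable without starting a server.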

For a full understanding, please refer to Solution architecture, Project organization, the full list of project components, and the overall Project progress.

Project progress

For a self-evaluation of project completion against the Zoomcamp criteria, see #Zoomcamp-criteria-self-evaluation. For an overview of implemented features and TODOs, see PROJECT_PROGRESS.md.

Datasets

This project is primarily based on the Fake and real news dataset on Kaggle.
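The Kaggle dataset is commonly distributed as two CSV files, `Fake.csv` and `True.csv`, with `title`, `text`, `subject` and `date` columns; treat these file and column names as assumptions to check against your own download. A minimal sketch that merges both sources into one labeled list (1 = fake, 0 = real) using only the standard library, with hypothetical function names:

```python
from __future__ import annotations

import csv
from typing import Iterable


def read_articles(lines: Iterable[str], label: int) -> list[dict]:
    """Parse CSV lines (with a header row) and keep title, text and the given label."""
    reader = csv.DictReader(lines)
    return [{"title": row["title"], "text": row["text"], "label": label} for row in reader]


def load_dataset(fake_lines: Iterable[str], true_lines: Iterable[str]) -> list[dict]:
    """Combine fake (label 1) and real (label 0) articles into one list."""
    return read_articles(fake_lines, 1) + read_articles(true_lines, 0)
```

With real files, pass open file objects, for example `open("Fake.csv", newline="", encoding="utf-8")`; `newline=""` lets the `csv` module handle article bodies that contain embedded line breaks.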

As noted by the community, the data collection methodology of this dataset is questionable, which is probably why accuracy scores close to 100% are achievable.
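One quick way to probe such suspiciously high scores is a single-token baseline: predict "real" whenever an article contains a given marker string and "fake" otherwise. If a trivial rule like this already scores near 100%, the labels are likely leaking through source-specific artifacts rather than through the content itself. The function name and the `(Wire)` marker in the usage below are hypothetical; which markers, if any, leak in this dataset is something to check empirically.

```python
from __future__ import annotations


def single_token_baseline_accuracy(texts: list[str], labels: list[int], marker: str) -> float:
    """Accuracy of the rule: predict 0 (real) if marker is in the text, else 1 (fake)."""
    if not texts or len(texts) != len(labels):
        raise ValueError("texts and labels must be non-empty and of equal length")
    predictions = [0 if marker in text else 1 for text in texts]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

For example, on a toy set where every real article carries `(Wire)` and no fake one does, the rule reaches an accuracy of 1.0 without reading a single content word, which is exactly the failure mode worth ruling out before trusting a near-perfect model.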

However, this does not affect the main goals of this project, which focuses on MLOps and software engineering best practices for ML tasks. This particular NLP application is just an example. Adding more datasets for training and validation is on the project TODO list.

Solution architecture

Project organization

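mlops_pipeline_fake_news/
├── assets/
├── best practices/
├── credentials/
├── data/
├── deployment/
├── infrastructure/
├── monitoring/
├── orchestration/
├── tests/
├── tracking/
├── training/
├── utilities/
├── .env_sample
├── .gitignore
├── .pre-commit-config.yaml
├── .prefectignore
├── Makefile
├── PREREQUISITES.md
├── PROJECT_PROGRESS.md
├── Pipfile
├── Pipfile.lock
├── README.md
└── pytest.ini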

Project components and reproducibility

Area	Description
Problem description	Explains the project goals, motivation, and general outline
Infrastructure	Shows how to set up the GCP project, cloud resources, venv, VM tooling, etc.
Orchestration	Controlled and scheduled execution of flows and tasks using Prefect
Training	Trains an LSTM model for text classification
Deployment	Deploys the model as a service for inference
Monitoring
Best practices	Unit tests, integration tests, auto-formatting, etc.
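The Training component uses an LSTM. To make the core computation concrete, here is a dependency-free sketch of a single-layer LSTM forward pass followed by a sigmoid output that yields a fake-news probability. It illustrates the standard LSTM gate equations only; actual training would use a deep learning framework, and the function names and parameter layout (one stacked weight matrix for the input, forget, candidate and output gates) are assumptions chosen for brevity.

```python
from __future__ import annotations

import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def lstm_step(x: list[float], h: list[float], c: list[float],
              W: list[list[float]], b: list[float]) -> tuple[list[float], list[float]]:
    """One LSTM time step.

    W has 4*H rows of length len(x) + H, stacked as
    [input gate, forget gate, candidate, output gate]; b has 4*H entries.
    """
    H = len(h)
    xh = x + h  # concatenate current input and previous hidden state
    z = [sum(w * v for w, v in zip(row, xh)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(v) for v in z[0:H]]            # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
    g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate cell values
    o = [sigmoid(v) for v in z[3 * H:4 * H]]    # output gate
    c_new = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c, i, g)]
    h_new = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_new)]
    return h_new, c_new


def classify(sequence: list[list[float]], W: list[list[float]], b: list[float],
             w_out: list[float], b_out: float) -> float:
    """Run the LSTM over input vectors (e.g. word embeddings) and return P(fake)."""
    H = len(w_out)
    h, c = [0.0] * H, [0.0] * H
    for x in sequence:
        h, c = lstm_step(x, h, c, W, b)
    # Only the final hidden state feeds the output layer.
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)
```

With all weights at zero, every gate outputs 0.5 and the candidate is 0, so the hidden state stays at zero and `classify` returns exactly 0.5, a handy sanity check for any implementation.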

fluentnumbers/mlops_pipeline_fake_news

Folders and files

Latest commit

History

Repository files navigation

Hello👋, I am Andrejs

Fake news detection

Problem description

Importance

Project goals

Fast-track run

Project progress

Datasets

Solution architecture

Project organization

Project components and reproducibility

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages