Fake News Detection

Introduction

How many of you verify each and every piece of news you read before believing it?
Research shows that 1 in 2 Indians receives fake news via WhatsApp and Facebook.
Fake news is news designed to deliberately spread disinformation.
The spread of fake news has far-reaching consequences.
Spammers use appealing news headlines to generate revenue through advertisements.
We aim to perform binary classification of news articles available online, using concepts from Artificial Intelligence, Natural Language Processing and Machine Learning.

Objective

To characterize and analyze the fake news threat and its detection.
To check the truthfulness of major claims in social media news articles in order to decide the veracity of the news.

NLP Classifier

The aim of this project is to accurately determine the authenticity of the contents of a particular news article. To achieve this, a model is built that classifies text to determine whether a news article is fake or real. The dataset initially used for this project was downloaded from Kaggle and is named fake_or_real_news.csv. It consists of attributes such as ID, Title, Text and Label. The Label attribute specifies the authenticity of the news article: each article is classified in binary form as REAL or FAKE. The dataset contains about 4000 news articles with these labels, split evenly between 50% real and 50% fake news articles.
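As a minimal sketch of how such a dataset could be loaded and its class balance checked, the snippet below uses only the Python standard library. The lowercase column names (id, title, text, label) and the two sample rows are assumptions made for illustration; they are not taken from the actual Kaggle file.

```python
import csv
import io
from collections import Counter


def load_articles(csv_file):
    """Read every row of an open CSV file object into a list of dicts."""
    return list(csv.DictReader(csv_file))


def label_distribution(rows, label_column="label"):
    """Return the share of each label, e.g. {"REAL": 0.5, "FAKE": 0.5}."""
    counts = Counter(row[label_column] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Tiny in-memory stand-in for fake_or_real_news.csv (hypothetical rows).
SAMPLE_CSV = (
    "id,title,text,label\n"
    "1,Budget passed,Parliament approved the annual budget on Friday.,REAL\n"
    "2,Aliens elected,Aliens secretly won the local election.,FAKE\n"
)

rows = load_articles(io.StringIO(SAMPLE_CSV))
dist = label_distribution(rows)
```

For the real file, the same functions could be called on `open("fake_or_real_news.csv", newline="", encoding="utf-8")` instead of the in-memory sample.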

Text Preprocessing

Several preprocessing techniques are used to prepare the dataset for modeling. These include lowercasing the text, removing words that have only one or two letters or that contain numbers, tokenization, removal of stop words and lemmatization.
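The preprocessing steps listed above can be sketched roughly as follows. This is an illustration, not the project's actual code: the stop-word list is a small hand-picked assumption, and lemmatization is left as an optional hook (the `lemmatize` parameter is hypothetical) because it normally relies on an external NLP library.

```python
import re

# Small illustrative stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = frozenset({
    "the", "and", "was", "were", "that", "this",
    "with", "for", "are", "has", "have", "from",
})


def preprocess(text, stop_words=STOP_WORDS, lemmatize=None):
    """Lowercase, tokenize and filter a piece of text; return the kept tokens."""
    text = text.lower()
    # Tokenization: split into runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text)
    kept = []
    for token in tokens:
        if len(token) <= 2:
            continue  # drop words with one or two letters
        if any(ch.isdigit() for ch in token):
            continue  # drop words that contain numbers
        if token in stop_words:
            continue  # drop stop words
        # Optional lemmatization hook; the identity is used when none is given.
        kept.append(lemmatize(token) if lemmatize else token)
    return kept


sample_tokens = preprocess(
    "The 3 ministers were at G20 in 2023, praising the new policy!"
)
```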

Modeling and training the dataset

During the modeling step, an ML model is selected, trained, validated and tested. In this project we tried modeling the dataset with three algorithms: SVM (Support Vector Machine), the Naïve Bayes classifier and the MaxEnt classifier. Based on the performance and accuracy of each algorithm, one of them is selected to predict the results for the test dataset.
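Of the three algorithms, Naïve Bayes is simple enough to sketch without any library. The class below is a minimal multinomial Naïve Bayes with add-one smoothing, plus an accuracy helper of the kind that could be used to compare models. The class name, the toy training data and the smoothing value are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Minimal multinomial Naive Bayes over lists of tokens."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, token_lists, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_one(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                if token in self.vocabulary:  # ignore unseen words
                    score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, token_lists, labels):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(model.predict_one(t) == y for t, y in zip(token_lists, labels))
    return hits / len(labels)


# Toy training data (hypothetical), already tokenized.
train_tokens = [
    ["shocking", "miracle", "cure", "exposed"],
    ["secret", "miracle", "trick", "shocking"],
    ["parliament", "passed", "budget", "bill"],
    ["court", "issued", "budget", "ruling"],
]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]
nb_model = NaiveBayesTextClassifier().fit(train_tokens, train_labels)
```

Running the same accuracy helper on a held-out split for each candidate model is one straightforward way to pick the best of the three.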

Feature Engineering

Various feature engineering techniques are applied to the dataset, and it is rerun through all three models to observe the change in accuracy and to improve the performance of the models.

Some of them are:

TF-IDF

TF-IDF stands for Term Frequency – Inverse Document Frequency. It weights a word by its importance, depending on how often it occurs in the documents.
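To make the idea concrete, here is a small sketch of the textbook formulation, where the weight of a term in a document is its term frequency multiplied by log(N / document frequency). This is one common variant chosen for illustration; libraries such as scikit-learn use smoothed versions, and the source does not say which variant the project used.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute textbook TF-IDF weights for a list of tokenized documents.

    Returns one {term: weight} dict per document.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["fake", "news", "spreads"], ["real", "news", "matters"]]
doc_weights = tf_idf(docs)
```

In this toy example "news" appears in every document, so its weight is zero, while "fake" and "real" each appear in only one document and receive a positive weight.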

Trigram Vectorizer

This vectorization process vectorizes triplets of consecutive words instead of single words.
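A minimal sketch of how word trigrams can be extracted from a token list is shown below; the function name `word_ngrams` is an assumption for illustration. A vectorizer would then count or weight these triplets in the same way it handles single words.

```python
def word_ngrams(tokens, n=3):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


trigrams = word_ngrams(["fake", "news", "spreads", "very", "fast"])
```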

On analysis, the MaxEnt model on the Trigram Vectorizer + TF-IDF Vectorizer features is chosen as the best model, with the highest score of 0.94393. Hence it is used to predict and classify the test set.
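One way such a model could be assembled is sketched below with scikit-learn, which is an assumption: the source does not name the library used. Here MaxEnt is represented by logistic regression, and "Trigram + TF-IDF" is interpreted as TF-IDF weights over word n-grams of length one to three; `ngram_range=(3, 3)` would restrict the features to trigrams only. The toy texts are hypothetical and far too small to reproduce the reported score.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_maxent_pipeline():
    """TF-IDF over word 1- to 3-grams, followed by a MaxEnt classifier."""
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 3), lowercase=True),
        LogisticRegression(max_iter=1000),
    )


# Hypothetical toy data, only to show the fit/predict flow.
train_texts = [
    "shocking miracle cure exposed by secret insiders",
    "secret trick doctors do not want you to know",
    "parliament passed the annual budget bill on friday",
    "the court issued its ruling on the budget dispute",
]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]

model = build_maxent_pipeline().fit(train_texts, train_labels)
prediction = model.predict(["a shocking secret miracle trick"])[0]
```

Because the vectorizer and the classifier live in one pipeline object, the whole thing can be pickled and later applied directly to raw text.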

Figure 1: Dataflow diagram

Web Application Architecture:

A web interface is created in which the user can enter some news text and click a button. The application then preprocesses the input using an additional "Preprocess Module", feeds it to the trained model and shows the classification result on screen.

Frontend/UI:

(HTML5 / CSS3 / JS - React)

This is a simple single page with one text area that accepts the input text or news article (minimum of 25 words) and a button to classify the text.
React is a good fit for such a straightforward function.
The result of the classification is shown on the page with a "REAL" or "FAKE" tag, along with the complete text preprocessing pipeline consisting of "INPUT-PREPROCESSED-POS TAGGED TEXT".

Backend/Server:

Since we need to “unpickle” our model (the pipeline) to use it, the best choice is a Python web server that can receive the input over HTTP and return the prediction result.
One of the easiest and most straightforward frameworks for this is Flask.
Another addition to the server is a “random picker” that fetches one random news article at a time from the test dataset.
This will be used to populate the UI input field.
The goal is to make it easier for the user to test the application, without actually writing a news article.
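The server described above could look roughly like the following Flask sketch. The route names (/predict, /random), the JSON field names and the `create_app` factory are assumptions for illustration, and `StubModel` merely stands in for the unpickled pipeline. The 25-word minimum mirrors the limit stated for the input text area.

```python
import random

from flask import Flask, jsonify, request

MIN_WORDS = 25  # minimum input length stated for the text area


def create_app(model, test_articles):
    """Build the app around any object with predict(list_of_texts).

    In a deployment the model would be unpickled first, e.g. with
    pickle.load(open("model.pkl", "rb")) -- "model.pkl" is a hypothetical name.
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict):
            payload = {}
        text = str(payload.get("text", ""))
        if len(text.split()) < MIN_WORDS:
            return jsonify(error=f"Please enter at least {MIN_WORDS} words."), 400
        label = model.predict([text])[0]
        return jsonify(label=str(label))

    @app.route("/random", methods=["GET"])
    def random_article():
        # "Random picker": return one article from the test set to fill the UI.
        return jsonify(text=random.choice(test_articles))

    return app


class StubModel:
    """Stand-in for the trained pipeline (assumption, for illustration only)."""

    def predict(self, texts):
        return ["FAKE" if "miracle" in t.lower() else "REAL" for t in texts]
```

The React page would then POST the text area contents to /predict and call /random to pre-fill the input field.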

Figure 2: Working

Technology Stack

Client-Side

HTML
CSS
JavaScript
React

Server-Side

Flask
Python

Features

Use of various classifier algorithms to determine whether a news article is REAL or FAKE.
Also shows a text processing pipeline consisting of the original text, the preprocessed text and the POS-tagged text.

Conclusion

Our work focuses on understanding and detecting fake news stories that are disseminated on social media. A large body of recent work shares this goal and explores several types of features extracted from news stories, including sources and posts from social media.
Our results reveal interesting findings on the usefulness and importance of features for detecting false news.
Finally, we discuss how fake news detection approaches can be used in practice, highlighting challenges and opportunities.

Acknowledgment

It gives me immense pleasure to submit the mega-project blog on Fake News Detection to my guide Prof. Chandrajeet Borkar and Head of Department Dr. Latesh Bhagat, Computer Science and Engineering department, who were a constant source of guidance and inspiration throughout the seminar work.

I am also very thankful to all the staff members of the Computer Science and Engineering department, whose encouragement and suggestions helped me to complete the Mega-Project work.

At last, I am thankful to my friends whose encouragement and constant inspiration helped me to complete this seminar work verbally and theoretically.

Project Details

College: Government College of Engineering Nagpur
Department: Computer Science and Engineering
Project Guide: Prof. Chandrajeet Borkar
Group Members:

Indrakshi Basu
Devyani Kadu
Pinak Acharya
Shrikant Nimkar
Shubham Rajput
