Fake News Detection
Introduction
- How many of you verify each and every piece of news you read before believing it?
- Research shows that 1 in 2 Indians receive fake news via WhatsApp and Facebook.
- Fake news is news designed to deliberately spread disinformation.
- The spread of fake news has far-reaching consequences.
- Spammers use eye-catching news headlines to generate advertising revenue.
- We aim to perform binary classification of news articles available online, using concepts from artificial intelligence, natural language processing and machine learning.
Objective
- To characterize and analyze the threat posed by fake news and how it can be detected.
- To check the truthfulness of the major claims in social media news articles in order to determine the veracity of the news.
NLP Classifier
The aim of this project is to determine accurately whether the contents of a particular news article are authentic. To achieve this, we build a model that classifies text to decide whether a given news article is fake or real. The dataset initially used for this project was downloaded from Kaggle and is named fake_or_real_news.csv. It consists of the attributes ID, Title, Text and Label. The Label attribute specifies the authenticity of the news article: each article is labelled in binary form as REAL or FAKE. The dataset contains about 4000 news articles, split 50% real and 50% fake.
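The sketch below loads the dataset and checks its class balance using only Python's standard `csv` module. It assumes the column headers are `Text` and `Label`, as listed above. The actual Kaggle file may capitalize them differently (for example lowercase `text` and `label`), so the column names are parameters.

```python
import csv
from collections import Counter


def load_articles(lines, text_col="Text", label_col="Label"):
    """Read (text, label) pairs from CSV lines, keeping only REAL/FAKE rows.

    `lines` can be an open file object or any iterable of CSV lines.
    The default column names are assumptions based on the description above.
    """
    articles = []
    for row in csv.DictReader(lines):
        label = row[label_col].strip().upper()
        if label in ("REAL", "FAKE"):
            articles.append((row[text_col], label))
    return articles


def label_distribution(articles):
    """Return the fraction of articles per label, e.g. {'REAL': 0.5, 'FAKE': 0.5}."""
    counts = Counter(label for _, label in articles)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}
```

To use it, open the file with `open("fake_or_real_news.csv", newline="", encoding="utf-8")` inside a `with` block and pass the file object to `load_articles`. On a balanced dataset, `label_distribution` should then report roughly 0.5 for each label.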
Text Preprocessing
Several preprocessing techniques are used to prepare the dataset for modeling. These include lowercasing the text, tokenization, removing words that have only one or two letters or that contain numbers, stop-word removal and lemmatization.
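The steps above can be sketched as a single function in plain Python. This is an illustration under stated assumptions, not the project's actual module. The stop-word set is a small illustrative subset, and `LEMMAS` is a hypothetical lookup table standing in for a real lemmatizer such as NLTK's `WordNetLemmatizer`.

```python
import re

# Illustrative subset only; a real pipeline would use a full English stop-word list.
STOP_WORDS = {"the", "and", "for", "are", "was", "were", "this", "that", "with",
              "from", "has", "have", "its", "not", "but", "they", "his", "her",
              "their", "will"}

# Hypothetical lemma table standing in for a real lemmatizer; covers only a few words.
LEMMAS = {"claims": "claim", "reported": "report", "stories": "story", "says": "say"}


def preprocess(text):
    """Lowercase, tokenize, drop short/numeric tokens and stop words, then lemmatize."""
    tokens = re.findall(r"\w+", text.lower())  # simple word tokenization
    kept = []
    for tok in tokens:
        if len(tok) <= 2:  # words with one or two letters
            continue
        if any(ch.isdigit() for ch in tok):  # words containing numbers
            continue
        if tok in STOP_WORDS:
            continue
        kept.append(LEMMAS.get(tok, tok))  # map to lemma when known
    return kept
```

For example, `preprocess("The senator's claims were reported in 2016 by 3 outlets")` keeps only `senator`, `claim`, `report` and `outlets`.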
Modeling and Training
During the modeling step, an ML model is selected, trained, validated and tested. In this project we modeled the dataset using three algorithms: SVM (Support Vector Machine), the Naïve Bayes classifier and the MaxEnt (Maximum Entropy) classifier. Based on the accuracy of each algorithm, one of them is selected to predict the labels of the test dataset.
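The selection step can be sketched as comparing validation accuracy across already-trained models. Training is not shown. Each model is assumed to be wrapped as a `predict(text)` callable. In scikit-learn, the three classifiers would commonly be `LinearSVC`, `MultinomialNB` and `LogisticRegression`, but the post does not say which library was used.

```python
def accuracy(predict, texts, labels):
    """Fraction of examples where predict(text) equals the true label."""
    correct = sum(1 for text, label in zip(texts, labels) if predict(text) == label)
    return correct / len(labels)


def select_best_model(models, val_texts, val_labels):
    """Return (name, score) for the model with the highest validation accuracy.

    `models` maps a model name (e.g. "SVM", "NaiveBayes", "MaxEnt") to a
    trained predict(text) callable.
    """
    scores = {name: accuracy(predict, val_texts, val_labels)
              for name, predict in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Evaluating every model on the same held-out validation split keeps the comparison fair. The test set is then used only once, with the chosen model.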
Feature Engineering
Various feature engineering techniques are applied to the dataset, and the resulting features are run through all three models again to observe the change in accuracy and improve the models' performance.
Some of them are:
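- TF-IDF
TF-IDF stands for Term Frequency – Inverse Document Frequency. It weights each word by how often it occurs in a document, scaled down by how many documents in the collection contain it. Words common to every article therefore count for little, while distinctive words count for more.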
- Trigram Vectorizer
This vectorizer builds features from triplets of consecutive words (trigrams) instead of single words.
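The two techniques in this list can be combined as in the sketch below, which builds trigram features and weights them with the textbook TF-IDF formula. This is an illustration, not the project's code. scikit-learn's `TfidfVectorizer` differs by default: it uses a smoothed idf, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each vector.

```python
import math
from collections import Counter


def word_trigrams(tokens):
    """Return the consecutive word triplets of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def tfidf_vectors(documents):
    """Compute TF-IDF weights for the trigram features of tokenized documents.

    tf(t, d) = count of t in d / number of trigrams in d
    idf(t)   = log(N / df(t)), N = number of documents,
               df(t) = number of documents containing t
    Returns one {trigram: weight} dict per document.
    """
    doc_counts = [Counter(word_trigrams(doc)) for doc in documents]
    n_docs = len(documents)
    df = Counter()
    for counts in doc_counts:
        df.update(counts.keys())  # each document counts once per trigram
    vectors = []
    for counts in doc_counts:
        total = sum(counts.values())
        vectors.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors
```

A trigram that appears in every document gets an idf of log(1) = 0, so it contributes nothing. A trigram unique to one article gets the largest weight.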
On analysis, the MaxEnt model on Trigram Vectorizer + TF-IDF Vectorizer features was chosen as the best model, with the highest score of 0.94393. It is therefore used to predict the labels of the test set.
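For two classes, a MaxEnt classifier is equivalent to logistic regression. The sketch below trains one with batch gradient descent on sparse `{feature: value}` dicts, such as the TF-IDF vectors described above. It is an illustration only. The learning rate, epoch count and L2 strength are arbitrary assumptions. In practice, a library implementation such as scikit-learn's `LogisticRegression` or NLTK's `MaxentClassifier` would normally be used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MaxEntClassifier:
    """Binary maximum-entropy (logistic regression) classifier over sparse features."""

    def __init__(self, learning_rate=0.5, epochs=200, l2=0.0):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.l2 = l2
        self.weights = {}
        self.bias = 0.0

    def _score(self, features):
        return self.bias + sum(self.weights.get(f, 0.0) * v for f, v in features.items())

    def fit(self, X, y):
        """X: list of {feature: value} dicts; y: list of 'FAKE'/'REAL' labels."""
        targets = [1.0 if label == "FAKE" else 0.0 for label in y]
        n = len(X)
        for _ in range(self.epochs):
            grad_w, grad_b = {}, 0.0
            for features, t in zip(X, targets):
                error = sigmoid(self._score(features)) - t  # d(log-loss)/d(score)
                grad_b += error
                for f, v in features.items():
                    grad_w[f] = grad_w.get(f, 0.0) + error * v
            self.bias -= self.learning_rate * grad_b / n
            for f, g in grad_w.items():
                w = self.weights.get(f, 0.0)
                self.weights[f] = w - self.learning_rate * (g / n + self.l2 * w)
        return self

    def predict_proba(self, features):
        """Estimated probability that the article is FAKE."""
        return sigmoid(self._score(features))

    def predict(self, features):
        return "FAKE" if self.predict_proba(features) >= 0.5 else "REAL"
```

Because the model outputs a probability, the web application could also show a confidence value alongside the REAL/FAKE tag.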
Figure 1: Dataflow diagram
Web Application Architecture:
We created a web interface in which the user enters some news text and clicks a button. The application then preprocesses the input using a separate "Preprocess Module", feeds it to the trained model and shows the classification result on screen.
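The request flow described here can be sketched as a single function, shown below. `preprocess` stands for the Preprocess Module and `predict` for the trained pipeline (vectorizer plus classifier). Both are passed in, and their names are hypothetical. The 25-word minimum mirrors the UI rule for the text area. The POS-tagging step shown on the page is omitted.

```python
MIN_WORDS = 25  # minimum article length, matching the UI's text-area rule


def classify_article(text, preprocess, predict):
    """Run one article through the preprocess module and the trained model.

    Returns a dict the page can render: the original input, the
    preprocessed text and the predicted REAL/FAKE label.
    """
    if len(text.split()) < MIN_WORDS:
        raise ValueError(f"Please enter at least {MIN_WORDS} words.")
    tokens = preprocess(text)
    label = predict(tokens)
    return {"input": text, "preprocessed": " ".join(tokens), "label": label}
```

Injecting `preprocess` and `predict` keeps this function independent of the web framework. The same code can therefore be tested directly or called from a request handler.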
Frontend/UI (HTML5/CSS3/JS-React):
- This is a simple single page with one input text area, which accepts a news article (minimum 25 words), and a button to classify the text.
- React is a good fit for such a straightforward single-page function.
- The classification result is shown on the page as a "REAL" or "FAKE" tag, along with the complete text preprocessing pipeline: input text, preprocessed text and POS-tagged text.
Backend/Server:
- Since we need to “unpickle” our model (the pipeline) to use it, the natural choice is a Python web server that receives the input over HTTP and returns the prediction result.
- One of the easiest and most straightforward frameworks for this is Flask.
- The server also includes a “random picker” that fetches one random news article at a time from the test dataset.
- This will be used to populate the UI input field.
- The goal is to make it easier for the user to test the application, without actually writing a news article.
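The sketch below shows a possible Flask backend with a prediction endpoint and the random picker. The route paths `/predict` and `/random`, the JSON field names and the pickle file name are hypothetical. It assumes the unpickled object is a scikit-learn-style pipeline whose `predict` accepts a list of raw texts. Flask is imported inside the app factory, so the helper functions can be used without it.

```python
import pickle
import random


def load_pipeline(path):
    """Unpickle the trained pipeline (only unpickle files you created yourself)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def pick_random_article(articles, rng=random):
    """Return one random article text from the test set."""
    return rng.choice(articles)


def create_app(pipeline, test_articles):
    """Build the Flask app around a pipeline exposing predict([text])."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        text = payload.get("text", "")
        if not text.strip():
            return jsonify({"error": "No text provided."}), 400
        label = pipeline.predict([text])[0]
        return jsonify({"label": str(label)})

    @app.route("/random", methods=["GET"])
    def random_article():
        return jsonify({"text": pick_random_article(test_articles)})

    return app
```

A possible entry point is `create_app(load_pipeline("model.pkl"), articles).run()`, where `model.pkl` is a placeholder name. The React page would then POST the text area's contents to `/predict` and call `/random` to fill the input field.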
Figure 2: Working
Technology Stack
- Client-Side
  - HTML
  - CSS
  - JavaScript
  - React
- Server-Side
  - Flask
  - Python
Features
- Uses several classifier algorithms to determine whether a news article is REAL or FAKE.
- Shows the text processing pipeline: original text, preprocessed text and POS-tagged text.
Conclusion
- Our work focuses on understanding and detecting fake news stories that are disseminated on social media. A large body of recent work pursues the same goal by exploring several types of features extracted from news stories, including their sources and related social media posts.
- Our results show that the choice of features matters for detecting fake news: the MaxEnt model on Trigram + TF-IDF features achieved the highest score, 0.94393.
- Finally, we discuss how fake news detection approaches can be used in practice, highlighting challenges and opportunities.
Acknowledgment
It gives me immense pleasure to submit the mega-project blog on Fake News Detection to my guide, Prof. Chandrajeet Borkar, and to the Head of Department, Dr. Latesh Bhagat, Computer Science and Engineering department, who were a constant source of guidance and inspiration throughout the seminar work.
I am also very thankful to all the staff members of the Computer Science and Engineering department, whose encouragement and suggestions helped me complete the mega-project work.
Project Details
- College: Government College of Engineering Nagpur
- Department: Computer Science and Engineering
- Project Guide: Prof. Chandrajeet Borkar
- Group Members:
  - Indrakshi Basu
  - Devyani Kadu
  - Pinak Acharya
  - Shrikant Nimkar
  - Shubham Rajput