Project

Twitter Sentiment Analysis

Python, Scikit-learn, NLTK, Streamlit, ML

About This Project

An end-to-end machine learning pipeline for Twitter sentiment analysis. The project began with exploratory data analysis of the Sentiment140 dataset (1.6 million tweets) in a Jupyter notebook, covering data patterns, text characteristics, and preprocessing techniques. Based on that analysis, a production-ready sentiment analyzer was built that classifies text as positive or negative using a Logistic Regression model with TF-IDF vectorization.

Key Features

End-to-end ML pipeline: From raw tweet data exploration to trained model deployment
Real-time sentiment prediction: Interactive web interface for instant text classification
Binary sentiment classification: Positive/Negative detection with confidence scores
Comprehensive text preprocessing: @mention removal, URL cleaning, stopword filtering, and Porter stemming
Model transparency: See how text is processed before prediction
Sample text testing: Pre-loaded examples for quick demonstrations
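
The preprocessing steps listed above (@mention removal, URL cleaning, stopword filtering, Porter stemming) could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual code: the function name preprocess, the regular expressions, the lowercasing and punctuation stripping, and the tiny inline stopword set are all hypothetical. The project presumably uses NLTK's full English stopword list, which requires a separate corpus download.

```python
import re

from nltk.stem import PorterStemmer  # the stemmer itself needs no corpus download

# Tiny stand-in stopword set (assumption); swap in NLTK's English list in practice.
STOPWORDS = {"a", "an", "the", "i", "is", "it", "this", "to", "and", "of", "so", "my"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

stemmer = PorterStemmer()


def preprocess(text: str) -> str:
    """Clean a tweet: strip URLs and @mentions, drop stopwords, stem what is left."""
    text = text.lower()                  # assumption: case-insensitive matching
    text = URL_RE.sub(" ", text)         # URL cleaning (before mentions, URLs may contain '@')
    text = MENTION_RE.sub(" ", text)     # @mention removal
    text = NON_LETTER_RE.sub(" ", text)  # assumption: keep letters and whitespace only
    tokens = [t for t in text.split() if t not in STOPWORDS]  # stopword filtering
    return " ".join(stemmer.stem(t) for t in tokens)          # Porter stemming


if __name__ == "__main__":
    print(preprocess("@bob I love this http://t.co/abc so much!"))
```

Returning the cleaned string, rather than only the prediction, is one way the app could show how text is processed before classification.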

Technology Stack

ML/Data: Pandas, NumPy, Scikit-learn
NLP: NLTK (Tokenization, Stemming, Stopwords)
Vectorization: TF-IDF (5,000 features, unigrams + bigrams)
Model: Logistic Regression with L2 regularization
Web App: Streamlit
Visualization: Matplotlib, Seaborn
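
As a rough sketch of how the vectorizer and model settings above fit together in Scikit-learn: the six training sentences, the helper name predict_sentiment, and max_iter=1000 are invented for illustration, and the real model is trained on Sentiment140 rather than on toy data. Only max_features=5000, ngram_range=(1, 2), and the L2 penalty come from the stack description.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration (1 = positive, 0 = negative).
texts = [
    "love this great phone",
    "what a great happy day",
    "i love my happy dog",
    "hate this terrible phone",
    "what a terrible awful day",
    "i hate my awful boss",
]
labels = [1, 1, 1, 0, 0, 0]

model = make_pipeline(
    # Up to 5,000 TF-IDF features over unigrams and bigrams.
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    # L2 regularization is scikit-learn's default penalty for LogisticRegression.
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)


def predict_sentiment(text: str) -> tuple[str, float]:
    """Return (label, confidence) for a single piece of text."""
    proba = model.predict_proba([text])[0]
    idx = proba.argmax()
    label = "Positive" if model.classes_[idx] == 1 else "Negative"
    return label, float(proba[idx])


if __name__ == "__main__":
    print(predict_sentiment("love great happy"))
    print(predict_sentiment("hate terrible awful"))
```

The confidence here is simply the predicted probability of the winning class, which is one plausible reading of the "confidence scores" the app displays.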
