Twitter Sentiment Analysis

Python, Scikit-learn, NLTK, Streamlit, ML

About This Project

An end-to-end machine learning pipeline for Twitter sentiment analysis. The project began with exploratory data analysis of the Sentiment140 dataset (1.6 million tweets) in a Jupyter notebook, covering data patterns, text characteristics, and preprocessing techniques. Based on that analysis, a production-ready sentiment analyzer was developed that classifies text as positive or negative using a Logistic Regression model with TF-IDF vectorization.

Key Features

  • End-to-end ML pipeline: From raw tweet data exploration to trained model deployment
  • Real-time sentiment prediction: Interactive web interface for instant text classification
  • Binary sentiment classification: Positive/Negative detection with confidence scores
  • Comprehensive text preprocessing: @mention removal, URL cleaning, stopword filtering, and Porter stemming
  • Model transparency: See how text is processed before prediction
  • Sample text testing: Pre-loaded examples for quick demonstrations
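
The preprocessing steps listed above (@mention removal, URL cleaning, stopword filtering, Porter stemming) could look roughly like the sketch below. This is an illustration under assumptions, not the project's actual code: the function name preprocess, the regular expressions, the order of the steps, and the small inline stopword set are all hypothetical. The inline set stands in for NLTK's English stopword list, which needs a one-time nltk.download("stopwords") before use.

```python
import re

from nltk.stem import PorterStemmer

# Assumption: a tiny stand-in stopword set so the sketch runs without downloads.
# nltk.corpus.stopwords.words("english") is the fuller alternative.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "i", "my", "to", "of",
    "and", "in", "it", "this", "that", "at", "on", "for",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @username mentions
NON_LETTER_RE = re.compile(r"[^a-z\s]")         # digits, punctuation, symbols

stemmer = PorterStemmer()


def preprocess(text: str) -> str:
    """Clean a raw tweet and return a space-joined string of stemmed tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop URLs first
    text = MENTION_RE.sub(" ", text)     # then @mentions
    text = NON_LETTER_RE.sub(" ", text)  # keep only letters and whitespace
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(stemmer.stem(t) for t in tokens)


if __name__ == "__main__":
    print(preprocess("@bob I love this movie! http://t.co/abc"))
```

Showing the output of a function like this next to the raw input is one simple way to provide the "model transparency" feature, since the user sees exactly which tokens reach the classifier.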

Technology Stack

  • ML/Data: Pandas, NumPy, Scikit-learn
  • NLP: NLTK (Tokenization, Stemming, Stopwords)
  • Vectorization: TF-IDF (5,000 features, unigrams + bigrams)
  • Model: Logistic Regression with L2 regularization
  • Web App: Streamlit
  • Visualization: Matplotlib, Seaborn
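
As a rough illustration of how the vectorizer and model settings above fit together, the sketch below chains a TF-IDF vectorizer (5,000 features, unigrams and bigrams) with Logistic Regression in a scikit-learn pipeline. The toy training texts, the 1/0 label encoding, the max_iter value, and the helper name predict_sentiment are assumptions made for the example; the project itself trains on Sentiment140. The penalty is left at scikit-learn's default, which is L2.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Assumption: toy data standing in for preprocessed Sentiment140 tweets.
texts = [
    "i love this phone", "what a great day",
    "love the new update", "great service very happy",
    "i hate this phone", "what a terrible day",
    "hate the new update", "terrible service very sad",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative (assumed encoding)

model = make_pipeline(
    # Up to 5,000 features drawn from unigrams and bigrams
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    # L2 regularization is the default for LogisticRegression
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)


def predict_sentiment(text: str) -> tuple[str, float]:
    """Return the predicted label and the probability of that label."""
    proba = model.predict_proba([text])[0]
    idx = int(proba.argmax())
    label = "Positive" if model.classes_[idx] == 1 else "Negative"
    return label, float(proba[idx])


if __name__ == "__main__":
    label, confidence = predict_sentiment("love great happy")
    print(f"{label} ({confidence:.2f})")
```

The probability of the predicted class is one straightforward way to produce the confidence score shown next to each prediction; a fitted pipeline like this can also be saved and loaded by the Streamlit app so that vectorization and classification stay in sync.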

Video Demo