YouTube Data Analytics

Python, PySpark, Streamlit, Big Data, Plotly

About This Project

This project is a big data analytics dashboard, built with Streamlit, that analyzes YouTube trending-video patterns across 113 countries. Because the dataset is large (over 5.5 GB and more than 10 million records), Apache Spark (PySpark) handles the distributed processing, sampling, and large-scale aggregations. The application presents video performance, engagement metrics, and regional trending patterns through interactive charts and graphs.

Key Features

  • Interactive filtering by country and language
  • Real-time data visualization with interactive charts
  • Analysis of video engagement metrics (views, likes, comments, engagement rate)
  • Geographic distribution analysis across 113 countries
  • Language-wise content pattern identification
  • Top performing channels analysis
  • Auto-generated key insights (top region, dominant language, highest engagement)
  • Trending activity over time visualization
  • Noise reduction with minimum threshold filtering
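As an illustration of the engagement-rate and minimum-threshold features listed above, here is a minimal pandas sketch. The formula used for engagement rate, (likes + comments) / views, is an assumption, since the project description does not define it; the column names, the function names, and the `min_videos` threshold are likewise hypothetical.

```python
import pandas as pd


def add_engagement_rate(df):
    """Add an engagement_rate column, assumed here to be (likes + comments) / views."""
    out = df.copy()
    # Rows with zero views become NaN instead of raising or producing inf.
    views = out["views"].where(out["views"] > 0)
    out["engagement_rate"] = (out["likes"] + out["comments"]) / views
    return out


def top_channels(df, min_videos=2, top_n=10):
    """Rank channels by total views, dropping channels below a minimum video count."""
    grouped = df.groupby("channel").agg(
        videos=("views", "size"),
        total_views=("views", "sum"),
        mean_engagement=("engagement_rate", "mean"),
    )
    # Threshold filter: channels with very few videos are treated as noise.
    filtered = grouped[grouped["videos"] >= min_videos]
    return filtered.sort_values("total_views", ascending=False).head(top_n)


# Toy data standing in for a pre-aggregated sample of the real dataset.
videos = pd.DataFrame({
    "channel": ["A", "A", "B", "C", "C"],
    "views": [1000, 3000, 500, 2000, 0],
    "likes": [100, 150, 50, 100, 0],
    "comments": [0, 150, 0, 100, 0],
})

ranked = top_channels(add_engagement_rate(videos), min_videos=2)
print(ranked)
```

Channel B has only one video, so it falls below the threshold and is excluded from the ranking. Filtering on a count before ranking keeps a single viral upload from dominating a "top channels" chart.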

Technology Stack

  • Apache Spark (PySpark) for distributed Big Data processing
  • Streamlit as the web application framework
  • Pandas & NumPy for data manipulation and analysis
  • Plotly for interactive visualizations
  • Python for backend processing
  • Deployed on Streamlit Cloud

Video Demo