About This Project
A comprehensive Big Data Analytics dashboard built with Streamlit that analyzes YouTube trending video patterns across 113 countries. Because the dataset is large (over 5.5 GB and more than 10 million records), Apache Spark (PySpark) is used for distributed data processing, efficient sampling, and large-scale aggregations. The application presents key insights about video performance, engagement metrics, and regional trending patterns through interactive charts and graphs.
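In the project, PySpark performs the sampling and aggregation. The plain-Python sketch below only illustrates the idea and is not the project's actual code. It assumes a hypothetical record shape of `(country, views)`. `bernoulli_sample` keeps each row independently with a fixed probability, similar in spirit to PySpark's `DataFrame.sample(fraction=..., seed=...)`. The per-country totals are computed inside each partition first and then merged. This is the partial-then-final aggregation pattern Spark applies to grouped aggregations, so only small per-group summaries have to move between workers.

```python
import random
from collections import defaultdict


def bernoulli_sample(rows, fraction, seed=42):
    """Keep each row independently with probability `fraction`.

    The sample size is approximate, not exact, as with Spark's
    sampling without replacement.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def partial_aggregate(partition):
    """Map side: per-country [video_count, total_views] within one partition."""
    acc = defaultdict(lambda: [0, 0])
    for country, views in partition:  # hypothetical (country, views) rows
        acc[country][0] += 1
        acc[country][1] += views
    return acc


def merge_aggregates(partials):
    """Reduce side: combine the partial results from all partitions."""
    total = defaultdict(lambda: [0, 0])
    for part in partials:
        for country, (count, views) in part.items():
            total[country][0] += count
            total[country][1] += views
    return {country: (n, v) for country, (n, v) in total.items()}


# Illustrative data split across three "partitions".
partitions = [
    [("US", 1000), ("IN", 500)],
    [("US", 3000), ("BR", 200)],
    [("IN", 700)],
]
result = merge_aggregates(partial_aggregate(p) for p in partitions)
print(sorted(result.items()))
# [('BR', (1, 200)), ('IN', (2, 1200)), ('US', (2, 4000))]
```

Each partition can be summarized independently, so the same logic scales out across machines. Only the small dictionaries of partial results need to be combined.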
Key Features
- Interactive filtering by country and language
- Interactive charts that update immediately as filters change
- Analysis of video engagement metrics (views, likes, comments, engagement rate)
- Geographic distribution analysis across 113 countries
- Language-wise content pattern identification
- Top performing channels analysis
- Auto-generated key insights (top region, dominant language, highest engagement)
- Visualization of trending activity over time
- Noise reduction with minimum threshold filtering
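The engagement, threshold, and insight features above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the dashboard's code. The field names, the sample rows, and the helpers `engagement_rate`, `apply_min_threshold`, and `key_insights` are all hypothetical. Engagement rate is assumed to be `(likes + comments) / views` expressed as a percentage, and "highest engagement" is assumed to mean the country with the highest mean per-video rate. The threshold step drops any country with too few videos, which keeps a single outlier from dominating the insights.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; field names and values are illustrative only.
videos = [
    {"country": "US", "language": "en", "views": 10_000, "likes": 800, "comments": 200},
    {"country": "US", "language": "en", "views": 5_000, "likes": 300, "comments": 200},
    {"country": "US", "language": "es", "views": 4_000, "likes": 100, "comments": 100},
    {"country": "IN", "language": "hi", "views": 20_000, "likes": 1_000, "comments": 0},
    {"country": "IN", "language": "en", "views": 8_000, "likes": 600, "comments": 200},
    {"country": "BR", "language": "pt", "views": 1_000, "likes": 300, "comments": 100},
]


def engagement_rate(video):
    """Assumed definition: (likes + comments) / views, as a percentage."""
    if not video["views"]:
        return 0.0
    return 100 * (video["likes"] + video["comments"]) / video["views"]


def apply_min_threshold(rows, key, min_count):
    """Drop every row whose group (e.g. country) has fewer than `min_count` rows."""
    counts = Counter(r[key] for r in rows)
    return [r for r in rows if counts[r[key]] >= min_count]


def key_insights(rows):
    """Top region, dominant language, and country with the highest mean engagement."""
    top_region = Counter(r["country"] for r in rows).most_common(1)[0][0]
    dominant_language = Counter(r["language"] for r in rows).most_common(1)[0][0]
    rates = defaultdict(list)
    for r in rows:
        rates[r["country"]].append(engagement_rate(r))
    best = max(rates, key=lambda c: sum(rates[c]) / len(rates[c]))
    return {
        "top_region": top_region,
        "dominant_language": dominant_language,
        "highest_engagement": best,
    }


print(key_insights(videos)["highest_engagement"])  # BR (one video at 40%)
print(key_insights(apply_min_threshold(videos, "country", 2)))
# {'top_region': 'US', 'dominant_language': 'en', 'highest_engagement': 'US'}
```

Without the threshold, Brazil's single video (40% engagement) outranks countries with several videos. With a minimum of two videos per country, the insight reflects a more stable average.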
Technology Stack
- Apache Spark (PySpark) for distributed Big Data processing
- Streamlit for web application framework
- Pandas & NumPy for data manipulation and analysis
- Plotly for interactive visualizations
- Python for backend processing
- Deployed on Streamlit Cloud
