About This Project
This project was developed as a solution to a problem statement given by AMAGI, a leading media technology company.
It implements a deep-learning audio-visual synchronization detector built on the SyncNet architecture. The system compares audio signals with visual lip movements to detect lip-sync errors and dubbing mismatches in video content. A Fully Convolutional Network (FCN) processes video frames and audio spectrograms and produces synchronization confidence scores that indicate whether the audio and video are properly aligned.
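To make the scoring idea concrete, here is a minimal sketch of how a synchronization confidence can be derived from per-frame audio and visual embeddings. The function name, embedding shapes, and the cosine-similarity scoring are illustrative assumptions, not the project's actual implementation (the real SyncNet compares embeddings of short multi-frame windows):

```python
import numpy as np

def sync_confidence(audio_emb, video_emb, max_offset=5):
    """Find the audio-video offset with the highest mean cosine similarity.

    audio_emb, video_emb: (T, D) arrays of per-frame embeddings
    (hypothetical shapes for illustration).
    Returns (best_offset, confidence), where confidence is the gap between
    the best score and the median score across all tested offsets -- a large
    gap means one alignment clearly dominates, i.e. the streams are in sync
    at that offset.
    """
    def mean_cos(a, v):
        # Mean cosine similarity between corresponding frames.
        a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
        v = v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-8)
        return float(np.mean(np.sum(a * v, axis=1)))

    T = min(len(audio_emb), len(video_emb))
    scores = {}
    for off in range(-max_offset, max_offset + 1):
        # Slide the audio track against the video track by `off` frames.
        if off >= 0:
            a, v = audio_emb[off:T], video_emb[:T - off]
        else:
            a, v = audio_emb[:T + off], video_emb[-off:T]
        scores[off] = mean_cos(a, v)

    best = max(scores, key=scores.get)
    confidence = scores[best] - float(np.median(list(scores.values())))
    return best, confidence
```

A nonzero best offset with high confidence would indicate the audio leads or lags the video by that many frames; a flat score profile (low confidence) suggests the streams are unrelated, as with a dubbing mismatch.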
Key Features
- Detects lip-sync errors in video content with high accuracy
- Uses SyncNet FCN architecture for temporal alignment analysis
- Processes both audio waveforms and video frames simultaneously
- Generates frame-by-frame synchronization confidence scores
- Identifies dubbing mismatches in movies and TV shows
- Supports batch processing of multiple video files
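The frame-by-frame scores above lend themselves to segment-level reporting: runs of consecutive low-confidence frames can be grouped into flagged regions. The helper below is a hypothetical sketch of that post-processing step (function name, score scale, and thresholds are assumptions, not the project's API):

```python
def flag_desync_segments(frame_scores, threshold=0.5, min_frames=3):
    """Group consecutive low-confidence frames into (start, end) segments.

    frame_scores: per-frame sync confidence in [0, 1] (hypothetical scale).
    Returns a list of (start_frame, end_frame_exclusive) runs at least
    min_frames long whose score stays below threshold, so isolated noisy
    frames are not reported as lip-sync errors.
    """
    segments, start = [], None
    for i, score in enumerate(frame_scores):
        if score < threshold and start is None:
            start = i                      # a low-confidence run begins
        elif score >= threshold and start is not None:
            if i - start >= min_frames:    # keep only sustained runs
                segments.append((start, i))
            start = None
    if start is not None and len(frame_scores) - start >= min_frames:
        segments.append((start, len(frame_scores)))
    return segments
```

For batch processing, the same function can simply be mapped over the per-file score arrays produced for each video.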
Technology Stack
- Python for core development
- PyTorch/TensorFlow for deep learning implementation
- SyncNet architecture with FCN modifications
- OpenCV for video frame extraction
- Librosa for audio processing and spectrogram generation
- NumPy for numerical computations
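As a rough illustration of the audio side of the pipeline, the snippet below computes a magnitude spectrogram with NumPy alone, standing in for what Librosa's STFT routines do in the actual stack. The parameter choices (16 kHz audio, 512-point FFT, 10 ms hop) are illustrative assumptions:

```python
import numpy as np

def stft_magnitude(wave, n_fft=512, hop=160):
    """Magnitude spectrogram of a 1-D waveform (numpy-only stand-in for
    an STFT as computed by Librosa; parameters are illustrative: with
    16 kHz audio, hop=160 gives one spectral frame every 10 ms).

    Returns an array of shape (n_fft // 2 + 1, n_frames).
    """
    window = np.hanning(n_fft)                      # taper to reduce leakage
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])   # (n_frames, n_fft)
    return np.abs(np.fft.rfft(frames, axis=1)).T    # freq bins x time
```

Spectrogram columns at this hop rate can then be aligned with extracted video frames before both are fed to the network.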
