Blur Faces in Videos with MTCNN + OpenCV on Podstack

Introduction

When building video datasets that contain real people - such as stock footage, surveillance clips, or user-generated content - protecting the privacy of individuals is critical. Faces must be anonymised before any dataset can be responsibly published or shared.

In this tutorial, you will walk through a Jupyter notebook - faceblur_opencv.ipynb (GITHUB LINK TO REPOSITORY) - that runs entirely on Podstack.ai using the PyTorch CUDA 12 + OpenCV template. The notebook is organised into self-contained cells, each building on the last. By the time you reach the final cell, it will have:

Streamed and filtered the WebVid-10M dataset to find videos containing people
Downloaded those videos using Python's requests library
Verified each file is readable with OpenCV
Detected every face in every frame using MTCNN (Multi-task Cascaded Convolutional Networks) on GPU
Blurred each detected face using OpenCV's Gaussian blur
Written the anonymised frames to new video files
Archived everything into a single zip for download

The notebook produced these results on a single Podstack GPU pod: 550 videos processed, 256,326 frames read, 171,480 faces blurred - in approximately 92 minutes.

Prerequisites

Before you begin, you will need:

A Podstack.ai account - sign up and claim your joining bonus to receive free GPU credits
A pod launched from the PyTorch CUDA 12 + OpenCV template, which comes with torch, torchvision, opencv-python, and CUDA 12 pre-installed
The notebook file faceblur_opencv.ipynb, which you can upload directly to your pod's Jupyter environment

The following additional packages are installed inside the notebook itself in the first cell, so no manual setup is required:

datasets - for streaming WebVid-10M from Hugging Face
requests - for downloading video files
facenet-pytorch - for MTCNN face detection
tqdm - for progress tracking

Step 1 - Launching Your Podstack Pod and Opening the Notebook

Log in to Podstack.ai and click New Pod. From the template gallery, select the PyTorch CUDA 12 + OpenCV template. This template ships with:

Python 3.10
PyTorch with CUDA 12 support
OpenCV pre-built with video codec support
JupyterLab accessible directly from your browser

Once your pod is running, click Open JupyterLab from the pod dashboard. In the JupyterLab file browser, upload faceblur_opencv.ipynb using the upload button, then double-click it to open it.

Note: The Podstack PyTorch CUDA 12 + OpenCV template pre-configures all CUDA environment variables. You do not need to set CUDA_HOME or install GPU drivers manually - the pod handles this for you.

Step 2 - Cell 1: Exploring the Dataset

The first cell loads the WebVid-10M dataset in streaming mode and prints the very first entry to confirm the connection is working.

python

from datasets import load_dataset

ds = load_dataset(
    "TempoFunk/webvid-10M",
    split="train",
    streaming=True
)

sample = next(iter(ds))

print(sample)

Cell output:

text

{'videoid': 21179416, 'contentUrl': 'https://ak.picdn.net/shutterstock/videos/21179416/preview/stock-footage-aerial-shot-winter-forest.mp4', 'duration': 'PT00H00M11S', 'page_dir': '006001_006050', 'name': 'Aerial shot winter forest'}

The dataset is loaded in streaming mode - no data is cached locally. The next(iter(ds)) call fetches only the first entry over the network, confirming the dataset is accessible without downloading all 10 million records.

Note: streaming=True means each next() call fetches one entry from the Hugging Face servers in real time. This is ideal for large datasets where you only need a subset.
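Because a streaming dataset is an ordinary Python iterable, a bounded subset can be taken with itertools.islice. The sketch below is illustrative only: take_first and fake_stream are hypothetical names, and the generator merely mimics the shape of the entries shown above so the example runs without network access.

```python
from itertools import islice


def take_first(stream, n):
    """Return the first n entries of any iterable without consuming the rest."""
    return list(islice(stream, n))


def fake_stream():
    """Hypothetical stand-in for the streaming dataset (endless, like a huge corpus)."""
    videoid = 0
    while True:
        yield {"videoid": videoid, "name": f"clip {videoid}"}
        videoid += 1


subset = take_first(fake_stream(), 3)
print([entry["videoid"] for entry in subset])  # [0, 1, 2]
```

With the real dataset, the same helper would be called as take_first(ds, 3); only the requested entries are pulled over the network.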

Step 3 - Cell 2: Filtering for Videos That Contain People

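The second cell adds a keyword filter on the video caption (name field) to find clips likely to contain human faces. Only videos whose captions contain a keyword such as "woman", "man", "person", or "face" are kept. The test is a plain substring match, so "man" also matches inside "woman" and "human", and inside unrelated words such as "Germany".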

python

human_keywords = [
    "man", "woman", "person",
    "girl", "boy", "face",
    "people", "human"
]

filtered = (
    x for x in ds
    if any(k in x["name"].lower() for k in human_keywords)
)

sample = next(filtered)

print(sample["name"])
print(sample)

Cell output:

text

Young beautiful woman using smartphone in cafe
{'videoid': 21157780, 'contentUrl': 'https://ak.picdn.net/shutterstock/videos/21157780/preview/stock-footage-young-beautiful-woman-using-smartphone-in-cafe.mp4', 'duration': 'PT00H00M09S', 'page_dir': '136051_136100', 'name': 'Young beautiful woman using smartphone in cafe'}

This keyword approach is a fast, cheap heuristic - it will not catch every video containing a face, but it dramatically narrows the candidate pool before any expensive GPU inference runs.
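If the substring test lets in too many false positives ("man" inside "Germany" or "mangrove", for example), one possible refinement is to match whole words with a regular expression. This is a sketch, not part of the original notebook: keyword_pattern and mentions_person are hypothetical names, and the optional trailing "s" is an assumption about which plurals are worth catching.

```python
import re

human_keywords = ["man", "woman", "person", "girl", "boy", "face", "people", "human"]

# \b anchors each keyword to word boundaries; the optional "s" admits simple plurals.
# Irregular plurals such as "men" or "women" would need their own entries.
keyword_pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, human_keywords)) + r")s?\b",
    re.IGNORECASE,
)


def mentions_person(caption):
    """True if the caption contains one of the keywords as a whole word."""
    return keyword_pattern.search(caption) is not None


print(mentions_person("Young beautiful woman using smartphone in cafe"))  # True
print(mentions_person("Aerial view of mangrove forest in Germany"))       # False
```

The function can replace the any(...) expression in the generator: filtered = (x for x in ds if mentions_person(x["name"])).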

Step 4 - Cell 3: Downloading a Sample Video

The third cell downloads the filtered video using requests in streaming mode. Chunked downloading avoids loading the entire file into memory at once, which matters when working with many files.

python

import requests

video_url = "https://ak.picdn.net/shutterstock/videos/21157780/preview/stock-footage-young-beautiful-woman-using-smartphone-in-cafe.mp4"

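# A timeout stops the cell from hanging forever on a stalled connection
response = requests.get(video_url, stream=True, timeout=30)
# Fail on 4xx/5xx responses instead of saving an error page as sample.mp4
response.raise_for_status()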

with open("sample.mp4", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)

print("download complete")

Cell output:

text

download complete
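The face-blur cell later reads every .mp4 file from a videos directory, so the single download above has to be repeated for each filtered entry. The notebook's own bulk-download cell is not shown here; the following is a minimal sketch of one way to do it. download_video, its get parameter, and the .part temporary suffix are all hypothetical choices made for this example.

```python
import os


def download_video(url, dest_path, get=None, chunk_size=8192, timeout=30):
    """Stream one video to dest_path; return True on success, False on failure."""
    if get is None:
        import requests  # imported lazily so the helper can be exercised with a stub
        get = requests.get

    tmp_path = dest_path + ".part"
    try:
        response = get(url, stream=True, timeout=timeout)
        response.raise_for_status()
        with open(tmp_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
        # Only a completed download receives the final file name
        os.replace(tmp_path, dest_path)
        return True
    except Exception as exc:
        print("download failed:", url, exc)
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return False


# Intended use with the filtered generator (not executed here):
#   os.makedirs("videos", exist_ok=True)
#   for entry in filtered:
#       download_video(entry["contentUrl"],
#                      os.path.join("videos", f"{entry['videoid']}.mp4"))
```

Writing to a temporary name first means an interrupted download never leaves a truncated .mp4 in the input directory.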

Step 5 - Cell 4: Verifying the Video with OpenCV

Before committing GPU time to a file, this cell checks that OpenCV can open it and successfully read at least one frame. This guards against corrupt downloads and codec-incompatible files - both of which appear in real-world datasets.

python

import cv2

cap = cv2.VideoCapture("sample.mp4")

print("opened:", cap.isOpened())

ret, frame = cap.read()

print("frame read:", ret)

if ret:
    print("frame shape:", frame.shape)

cap.release()

Cell output:

text

opened: True
frame read: True
frame shape: (316, 600, 3)

The tuple (316, 600, 3) represents height, width, and the three BGR colour channels OpenCV uses by default. If opened returns False, the file is either missing, corrupt, or using an unsupported codec.

Step 6 - Cell 5: Running the Face Detection and Blur Loop

This is the core cell of the notebook. It processes every downloaded video file frame by frame - using MTCNN for face detection and OpenCV's Gaussian blur for anonymisation - then writes each modified frame to a new output file.

python

import cv2
import torch
from facenet_pytorch import MTCNN
import os
from tqdm import tqdm

device = "cuda" if torch.cuda.is_available() else "cpu"

mtcnn = MTCNN(keep_all=True, device=device)

input_dir = "videos"
output_dir = "blurred_videos"

os.makedirs(output_dir, exist_ok=True)

video_files = [
    f for f in os.listdir(input_dir)
    if f.endswith(".mp4")
]

total_frames = 0
total_faces = 0
processed_videos = 0
failed_videos = 0

for vf in tqdm(video_files):

    try:
        input_path = os.path.join(input_dir, vf)
        output_path = os.path.join(output_dir, vf)

        cap = cv2.VideoCapture(input_path)

        # Keep the frame rate as a float so fractional rates such as 29.97 are preserved
        fps = cap.get(cv2.CAP_PROP_FPS)
        w   = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        h   = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

        out = cv2.VideoWriter(
            output_path,
            cv2.VideoWriter_fourcc(*'mp4v'),
            fps,
            (w, h)
        )

        while True:
            ret, frame = cap.read()

            if not ret:
                break

            total_frames += 1

            # MTCNN expects RGB; OpenCV reads BGR
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

            boxes, probs = mtcnn.detect(rgb)

            if boxes is not None:
                for box in boxes:
                    x1, y1, x2, y2 = map(int, box)

                    face = frame[y1:y2, x1:x2]

                    # Guard against out-of-frame bounding boxes
                    if face.size > 0:
                        blurred = cv2.GaussianBlur(face, (51, 51), 30)
                        frame[y1:y2, x1:x2] = blurred
                        total_faces += 1

            out.write(frame)

        cap.release()
        out.release()

        processed_videos += 1

    except Exception as e:
        failed_videos += 1
        print("failed:", vf, e)

print("processed videos:", processed_videos)
print("failed videos:   ", failed_videos)
print("total frames:    ", total_frames)
print("total faces blurred:", total_faces)

Cell output:

text

100%|██████████| 550/550 [1:32:00<00:00, 10.04s/it]
processed videos:  550
failed videos:     0
total frames:      256326
total faces blurred: 171480

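The tqdm progress bar updates live in the notebook output area as each video is processed. Some videos emitted codec warnings to stderr (Unable to read codec parameters and moov atom not found). These are log messages from OpenCV's video backend rather than Python exceptions, so they do not reach the try/except block: a file that cannot be decoded typically yields no frames, the read loop ends immediately, and the file is still counted as processed. The run was not interrupted.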
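The totals in the cell output can be turned into rough throughput figures with a little arithmetic. The snippet below uses only the numbers reported above; the rates cover the whole loop (decoding, detection, blurring, and encoding), not MTCNN alone.

```python
videos = 550
frames = 256_326
faces = 171_480
elapsed_seconds = 92 * 60  # the progress bar reports 1:32:00

print(f"frames per second: {frames / elapsed_seconds:.1f}")  # 46.4
print(f"seconds per video: {elapsed_seconds / videos:.2f}")  # 10.04
print(f"faces per frame:   {faces / frames:.2f}")            # 0.67
print(f"frames per video:  {frames / videos:.0f}")           # 466
```

The 10.04 seconds per video agrees with the per-iteration time shown in the progress bar.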

Understanding the Key Parameters

keep_all=True tells MTCNN to keep every face it finds in the frame rather than a single one (by default the largest, as controlled by the select_largest parameter). This is essential for crowd scenes or any frame with more than one person.

cv2.GaussianBlur(face, (51, 51), 30) applies a Gaussian blur with a 51×51 kernel and standard deviation of 30. A larger kernel produces a heavier blur. The kernel dimensions must always be odd integers. This setting renders faces unrecognisable without leaving a visually jarring black rectangle over the region.
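A fixed 51×51 kernel blurs a small face heavily and a large close-up face comparatively lightly. One option, sketched here as an assumption rather than something the notebook does, is to scale the kernel with the bounding box while keeping it odd. odd_kernel_size, the 0.5 fraction, and the minimum of 3 are hypothetical values chosen for illustration.

```python
def odd_kernel_size(face_width, face_height, fraction=0.5, minimum=3):
    """Return an odd kernel size proportional to the larger side of the face box."""
    size = max(minimum, int(max(face_width, face_height) * fraction))
    # GaussianBlur requires odd kernel dimensions, so bump even values by one
    return size if size % 2 == 1 else size + 1


for w, h in [(40, 50), (102, 90), (4, 4)]:
    k = odd_kernel_size(w, h)
    print((w, h), "->", (k, k))
# (40, 50) -> (25, 25)
# (102, 90) -> (51, 51)
# (4, 4) -> (3, 3)
```

Inside the loop, the result would be passed as the kernel argument, for example cv2.GaussianBlur(face, (k, k), 30).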

cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) is required on every frame because OpenCV reads video as BGR by default, while MTCNN was trained on RGB images. Skipping this conversion leads to noticeably degraded detection accuracy.

cv2.VideoWriter_fourcc(*'mp4v') uses the MPEG-4 codec for output, which is broadly compatible across platforms. If you need H.264 output, replace 'mp4v' with 'avc1', though availability depends on your OpenCV build.

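Warning: If MTCNN returns a bounding box that falls partly outside the frame, slicing frame[y1:y2, x1:x2] can produce an empty array. The face.size > 0 guard prevents a crash in this case, but it also means that box is skipped without being blurred. A negative coordinate is not an error in NumPy; it is counted from the opposite edge, so a face cut off by the left or top border is the typical case. Clamping the coordinates to the frame bounds before slicing avoids the miss.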
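A small helper can clip out-of-frame boxes before slicing, so that faces touching the frame border are still blurred. This is a sketch under the assumption that boxes arrive as (x1, y1, x2, y2) in pixel coordinates, as in the loop above; clamp_box is a hypothetical name, not part of facenet-pytorch or OpenCV.

```python
def clamp_box(box, frame_width, frame_height):
    """Clip (x1, y1, x2, y2) to the frame; return None if nothing remains."""
    x1, y1, x2, y2 = (int(v) for v in box)
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(frame_width, x2), min(frame_height, y2)
    if x2 <= x1 or y2 <= y1:
        return None
    return x1, y1, x2, y2


# Frame size taken from the sample video: 600 wide, 316 high
print(clamp_box((-12.4, 30.0, 80.9, 140.2), 600, 316))  # (0, 30, 80, 140)
print(clamp_box((580, 300, 640, 360), 600, 316))        # (580, 300, 600, 316)
print(clamp_box((610, 10, 650, 50), 600, 316))          # None
```

In the processing loop, the clamped tuple would replace the raw map(int, box) result, and a None return would simply skip the box.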

Step 7 - Cell 6: Archiving the Output

The final cell zips the entire blurred_videos directory into a single archive for easy download.

python

import shutil

shutil.make_archive("blurred_videos", "zip", "blurred_videos")

print("zip created")

Cell output:

text

zip created

This produces blurred_videos.zip in the notebook's working directory. You can download it directly from the JupyterLab file browser by right-clicking the file and selecting Download.
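Before downloading a large archive, it can be worth confirming that it contains every output file. The check below is a sketch: archive_matches_directory is a hypothetical helper, and the demo builds a throwaway directory with placeholder files so it runs anywhere.

```python
import os
import shutil
import tempfile
import zipfile


def archive_matches_directory(zip_path, directory):
    """True if the archive holds exactly the regular files found directly in directory."""
    with zipfile.ZipFile(zip_path) as archive:
        archived = {name for name in archive.namelist() if not name.endswith("/")}
    on_disk = {
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    }
    return archived == on_disk


# Demo with placeholder files standing in for the blurred videos
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "blurred_videos")
    os.makedirs(src)
    for name in ("a.mp4", "b.mp4"):
        with open(os.path.join(src, name), "wb") as f:
            f.write(b"demo")
    zip_path = shutil.make_archive(os.path.join(tmp, "blurred_videos"), "zip", src)
    print(archive_matches_directory(zip_path, src))  # True
```

In the notebook, the equivalent call would be archive_matches_directory("blurred_videos.zip", "blurred_videos").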

Conclusion

In this tutorial, you walked through faceblur_opencv.ipynb cell by cell - a Jupyter notebook running on Podstack's PyTorch CUDA 12 + OpenCV template. Across six cells, the notebook streamed a large video dataset, filtered for human-containing clips, downloaded and verified them, ran GPU-accelerated face detection with MTCNN, applied Gaussian blur to every detected face, and packaged the results for download. No local environment setup, no driver installation, and no infrastructure management was needed.

What to Try Next

To extend the notebook further, consider these improvements directly in new cells:

Skip frames. At 30 fps, consecutive frames are nearly identical. Running MTCNN on every 3rd or 5th frame and reusing the bounding boxes in between cuts inference time significantly.
Use MTCNN's batch mode. Pass a list of frames instead of one at a time to better saturate the GPU.
Try a faster detector. YOLOv8-face and RetinaFace offer higher throughput than MTCNN if processing speed is the priority.
Process multiple videos in parallel. Wrap the video loop in a ThreadPoolExecutor to process several files concurrently.
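The frame-skipping idea from the list above can be sketched as a small generator. It is illustrative only: detect_with_skipping is a hypothetical name, and fake_detect stands in for the real detector so the example runs without a GPU.

```python
def detect_with_skipping(frames, detect, every_n=3):
    """Yield (frame, boxes), calling detect only on every every_n-th frame.

    Frames in between reuse the most recent boxes.
    """
    last_boxes = None
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            last_boxes = detect(frame)
        yield frame, last_boxes


# Stand-in detector that records each call
calls = []


def fake_detect(frame):
    calls.append(frame)
    return [(0, 0, 10, 10)]


results = list(detect_with_skipping(range(7), fake_detect, every_n=3))
print(len(results), "frames,", len(calls), "detector calls")  # 7 frames, 3 detector calls
```

With the real model, detect could be lambda rgb: mtcnn.detect(rgb)[0]. The trade-off is that a fast-moving face can drift out of a reused box, so padding each box by a few pixels is a sensible companion change when anonymisation must not miss.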

Run This Notebook on Podstack

This notebook was built and executed entirely on Podstack.ai using the PyTorch CUDA 12 + OpenCV template - no local CUDA setup, no driver headaches, no dependency conflicts. The pod was live and the notebook was running in under a minute.

To run it yourself:

Go to podstack.ai and create a free account
Claim your joining bonus - new users receive free GPU credits on sign-up
Launch a new pod using the PyTorch CUDA 12 + OpenCV template
Upload faceblur_opencv.ipynb via JupyterLab and run each cell from top to bottom

The Podstack template gallery also includes pre-built example notebooks for common computer vision tasks - object detection, image segmentation, video processing, and more - so you can explore and adapt them without starting from scratch.

Get started on Podstack →

Don't forget to claim your joining bonus when you sign up - it gives you free GPU credits to try out the notebook examples immediately, at no cost.
