Real-time face detection using YuNet and CLI automation

Real-time face detection powers everything from access control to live streaming overlays. YuNet—a lightweight, millisecond-level detector bundled with OpenCV—delivers the needed speed and accuracy without a heavyweight GPU setup. This DevTip shows how to pair YuNet with straightforward CLI automation to process images and videos at scale.
Understand YuNet’s strengths
YuNet is a deep-learning face detector that lives inside OpenCV’s DNN module. It balances speed, accuracy, and size, making it ideal for edge devices and serverless functions.
• Detection Range: Detects faces from roughly 10 × 10 pixels up to 300 × 300 pixels.
• Performance: Scores 0.8844 (AP_easy), 0.8656 (AP_medium), and 0.7503 (AP_hard) on the WIDER Face validation set.
• Efficiency: Runs in milliseconds on modest CPUs.
Citation:
@article{wu2023yunet,
title={Yunet: A tiny millisecond-level face detector},
author={Wu, Wei and Peng, Hanyang and Yu, Shiqi},
journal={Machine Intelligence Research},
volume={20},
number={5},
pages={656--665},
year={2023},
publisher={Springer}
}
Set up YuNet quickly
Create an isolated Python environment and grab OpenCV:
python3 -m venv venv
source venv/bin/activate # On Windows: .\venv\Scripts\activate
# Install only one OpenCV package; use opencv-python-headless instead on servers without a display
pip install --upgrade opencv-python
Download the latest model (face_detection_yunet_2023mar.onnx, as of March 2023):
curl -fsSLo face_detection_yunet.onnx \
  https://github.com/opencv/opencv_zoo/raw/main/models/face_detection_yunet/face_detection_yunet_2023mar.onnx
Automate single-image detection
detect_faces.py accepts a file path, draws rectangles, and writes output.jpg:
import cv2, sys, os

MODEL_PATH = 'face_detection_yunet.onnx'
# Initial detector input size (width, height). setInputSize() below overrides it
# so that the configured size matches each image, as detect() requires.
NETWORK_INPUT_SIZE = (320, 320)

if not os.path.isfile(MODEL_PATH):
    sys.exit(f"❌ Model file not found: {MODEL_PATH}")

img_path = sys.argv[1] if len(sys.argv) > 1 else ''
if not img_path:
    sys.exit("❌ Please provide an image path as an argument.")
if not os.path.isfile(img_path):
    sys.exit(f"❌ Image not found: {img_path}")

img = cv2.imread(img_path)
if img is None:
    sys.exit(f"❌ Could not read image: {img_path}")

fd = cv2.FaceDetectorYN_create(MODEL_PATH, '', NETWORK_INPUT_SIZE)
# Set the detector's input size to the image's actual dimensions, so the
# returned boxes are already in the original image's coordinates.
fd.setInputSize((img.shape[1], img.shape[0]))

_, faces = fd.detect(img)
if faces is not None:
    for face_data in faces:
        # face_data: [x, y, w, h, re_x, re_y, le_x, le_y, nt_x, nt_y, rcm_x, rcm_y, lcm_x, lcm_y, score]
        x, y, w, h = map(int, face_data[:4])
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
else:
    print(f"⚠️ No faces detected in {img_path}")

output_filename = 'output.jpg'
cv2.imwrite(output_filename, img)
print(f"✅ Saved to {output_filename} (processed {img_path})")
Run it:
python detect_faces.py path/to/photo.jpg
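Each row returned by detect() packs the box, five landmarks, and a score into 15 values, in the order listed in the detect_faces.py comment. The following is a minimal sketch of unpacking one row into named fields. The Face class, the parse_face helper, and the spelled-out landmark names are hypothetical, chosen here for illustration, and are not part of OpenCV.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Landmark order follows the face_data comment in detect_faces.py:
# re = right eye, le = left eye, nt = nose tip, rcm/lcm = mouth corners.
LANDMARK_NAMES = (
    "right_eye",
    "left_eye",
    "nose_tip",
    "right_mouth_corner",
    "left_mouth_corner",
)


@dataclass
class Face:
    box: Tuple[int, int, int, int]         # x, y, w, h in pixels
    landmarks: Dict[str, Tuple[int, int]]  # landmark name -> (x, y)
    score: float                           # detection confidence


def parse_face(row: Sequence[float]) -> Face:
    """Unpack one 15-value detection row into named fields."""
    if len(row) != 15:
        raise ValueError(f"expected 15 values, got {len(row)}")
    x, y, w, h = (int(v) for v in row[:4])
    landmarks = {
        name: (int(row[4 + 2 * i]), int(row[5 + 2 * i]))
        for i, name in enumerate(LANDMARK_NAMES)
    }
    return Face((x, y, w, h), landmarks, float(row[14]))
```

In the detection loop, calling parse_face(face_data) could replace the manual slicing, and the score field makes it easy to skip low-confidence detections before drawing.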
Batch-process folders
A short shell loop works, but Python’s multiprocessing can fully saturate CPU cores:
# batch_detect.py
import os
import sys
from glob import glob
from multiprocessing import Pool
from subprocess import run

# Ensure detect_faces.py is in the same directory or adjust path
SCRIPT_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'detect_faces.py')


def process(image_path):
    # Must be a module-level function: Pool cannot pickle a lambda.
    return run([sys.executable, SCRIPT_PATH, image_path]).returncode


if __name__ == '__main__':
    images = glob('images/*.jpg')  # Assumes images are in an 'images' subdirectory
    if not images:
        print("No .jpg images found in the 'images' folder.")
    else:
        print(f"Found {len(images)} images to process.")
        with Pool() as p:
            # Note: every run writes to the same 'output.jpg', so results overwrite each other.
            # For unique outputs, detect_faces.py would need modification to accept output filenames.
            p.map(process, images)
        print("Batch processing complete. Each result was saved as output.jpg.")
# Create an 'images' folder and put your JPGs there first
python batch_detect.py
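Because every run of detect_faces.py writes to the same output.jpg, a batch run keeps only one result. One possible way around this is to derive a distinct output path from each input path. The sketch below is an assumption about how that could look: the output_path_for helper, the "outputs" folder, and the "_faces" suffix are hypothetical names, not something the scripts already support.

```python
from pathlib import Path


def output_path_for(image_path: str, out_dir: str = "outputs") -> Path:
    """Map an input image path to a distinct annotated-output path.

    Example: images/photo.jpg -> outputs/photo_faces.jpg
    Inputs from different folders that share a file name would still collide.
    """
    src = Path(image_path)
    return Path(out_dir) / f"{src.stem}_faces{src.suffix}"
```

To use it, detect_faces.py would need a small change: create the output folder, then pass str(output_path_for(img_path)) to cv2.imwrite instead of the fixed file name.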
Handle video streams
The snippet below streams, annotates, and saves output.mp4 while guarding resources:
import cv2, sys, os

MODEL_PATH = 'face_detection_yunet.onnx'
NETWORK_INPUT_SIZE = (320, 320)

if not os.path.isfile(MODEL_PATH):
    sys.exit(f"❌ Model file not found: {MODEL_PATH}")

video_path = sys.argv[1] if len(sys.argv) > 1 else ''
if not video_path:
    sys.exit("❌ Please provide a video path as an argument.")
if not os.path.isfile(video_path):
    sys.exit(f"❌ Video not found: {video_path}")

cap = None
out = None
try:
    fd = cv2.FaceDetectorYN_create(MODEL_PATH, '', NETWORK_INPUT_SIZE)
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        sys.exit(f"❌ Cannot open video: {video_path}")

    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    if fps == 0:
        fps = 30  # Default to 30 FPS
    fd.setInputSize((frame_width, frame_height))

    output_filename = 'output.mp4'
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_filename, fourcc, fps, (frame_width, frame_height))
    if not out.isOpened():
        sys.exit(f"❌ Could not open VideoWriter for {output_filename}")

    print(f"Processing video: {video_path}...")
    frame_count = 0
    detected_faces_count = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame_count += 1
        _, faces = fd.detect(frame)
        if faces is not None:
            detected_faces_count += len(faces)
            for face_data in faces:
                box = list(map(int, face_data[:4]))
                x, y, w, h = box[0], box[1], box[2], box[3]
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        out.write(frame)

    if frame_count > 0:
        print(f"✅ Video saved as {output_filename} ({frame_count} frames processed, {detected_faces_count} faces detected).")
    else:
        print(f"⚠️ No frames processed from {video_path}")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    if cap is not None:
        cap.release()
    if out is not None:
        out.release()
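If the loop above cannot keep up with the source frame rate, one option is to run detection only on every Nth frame and reuse the most recent boxes in between. The sketch below shows that control flow in isolation. The detect_every_nth generator is a hypothetical helper, and the detect callable stands in for a wrapper around fd.detect() that returns a list of integer boxes.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")
Box = Tuple[int, int, int, int]  # x, y, w, h


def detect_every_nth(
    frames: Iterable[Frame],
    detect: Callable[[Frame], List[Box]],
    n: int = 3,
) -> Iterator[Tuple[Frame, List[Box]]]:
    """Yield (frame, boxes), calling `detect` only on every nth frame.

    Frames in between reuse the most recent boxes, so overlays can lag
    behind fast-moving faces.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    last_boxes: List[Box] = []
    for index, frame in enumerate(frames):
        if index % n == 0:
            last_boxes = detect(frame)
        yield frame, last_boxes
```

Every frame is still yielded, so the output video keeps its original length and frame rate; only the detection work is reduced.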
Tune for real-time speed
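• Network Input Size: Lower the detector's input size to cut inference time. The scripts above call setInputSize() with the full image or frame dimensions, which overrides NETWORK_INPUT_SIZE, so changing that constant alone has no effect. Instead, downscale each frame (for example, so its longer side is 320 pixels), pass the smaller size to setInputSize(), and scale the returned boxes back up. Smaller inputs are faster but may miss small faces or be less accurate.
• GPU Acceleration: Prefer OpenCV’s CUDA backend if a compatible NVIDIA GPU and drivers are available. Initialize the detector like this: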
# Requires an OpenCV build with CUDA support (the prebuilt pip wheels are CPU-only)
fd = cv2.FaceDetectorYN_create(
    MODEL_PATH, '', NETWORK_INPUT_SIZE,
    score_threshold=0.9, nms_threshold=0.3, top_k=5000,
    backend_id=cv2.dnn.DNN_BACKEND_CUDA,
    target_id=cv2.dnn.DNN_TARGET_CUDA
)
• Processing Strategy: Pin one detector instance per CPU core for parallel processing instead of spawning unbounded threads.
• Frame Skipping: For video, process every Nth frame (e.g., every second or third frame) to maintain a higher output FPS in resource-constrained environments, at the cost of temporal smoothness in detection.
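For the input-size tip, keep in mind that FaceDetectorYN expects the image passed to detect() to match the size configured through setInputSize(). One way to run at a lower resolution is therefore to shrink the frame yourself, detect on the smaller copy, and map the boxes back to the original coordinates. The sketch below covers only the arithmetic. The detection_size and rescale_box helpers and the 320-pixel default are assumptions for illustration, not part of OpenCV.

```python
from typing import Sequence, Tuple


def detection_size(width: int, height: int, max_side: int = 320) -> Tuple[int, int, float]:
    """Return (new_width, new_height, scale) with the longer side capped at max_side.

    `scale` maps original coordinates to resized ones; frames that are
    already small enough are left unchanged (scale == 1.0).
    """
    scale = min(1.0, max_side / max(width, height))
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height, scale


def rescale_box(box: Sequence[float], scale: float) -> Tuple[int, ...]:
    """Map an (x, y, w, h) box from the resized frame back to the original frame."""
    return tuple(int(round(v / scale)) for v in box)
```

In the scripts, this would mean resizing the frame with cv2.resize() to the computed size, calling fd.setInputSize() with that same size, running fd.detect() on the resized copy, and passing each row's first four values through rescale_box before drawing on the original frame.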
Compare alternative detectors briefly
| Model | Speed | Accuracy | When to pick it |
| ----------------- | -----: | -------: | ---------------------------------------------- |
| YuNet (this post) | ⚡⚡⚡ | ⚡⚡ | Real-time apps on CPU or edge devices |
| OpenCV DNN SSD | ⚡ | ⚡⚡⚡ | Highest accuracy when latency is less critical |
| Haar Cascades | ⚡⚡⚡⚡ | ⚡ | Legacy projects or ultra-low-power hardware |
Wrap-up
YuNet’s lean architecture, paired with a few CLI scripts, lets you batch- or stream-process faces in seconds, not minutes. Give it a spin on your next project—and if you ever need to run computer-vision pipelines at cloud scale, check out Transloadit’s Artificial Intelligence service.