Developed a real-time bird detection and identification system that detects birds and classifies them among approximately 900 species, with an average latency of ~400 ms per frame. On each detection, the system automatically sends an email alert.
Picamera2 → Capture frame
↓
NCNN INT8 YOLOv11n (~300 ms)
↓
Postprocess (NMS, confidence threshold)
↓
Bird detected?
├─ YES → Lookup species → Save image → Email notification
└─ NO → Discard frame
↓
Sleep / next capture
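The loop in the diagram above can be sketched in plain Python. Here `capture_frame`, `detect_birds`, `species_lookup`, and `notify` are hypothetical stand-ins for the Picamera2 capture, NCNN inference + postprocess, species lookup, and image-save/email steps; the real wiring depends on the deployment.

```python
import time

def run_pipeline(capture_frame, detect_birds, species_lookup, notify,
                 interval_s=0.1, max_frames=None):
    """One pass per frame: capture -> detect -> (notify | discard) -> sleep."""
    n = 0
    while max_frames is None or n < max_frames:
        frame = capture_frame()            # Picamera2: capture frame
        detections = detect_birds(frame)   # NCNN INT8 YOLOv11n + postprocess
        if detections:                     # bird detected?
            species = species_lookup(detections)
            notify(frame, species)         # save image + email notification
        # no detections: discard frame
        n += 1
        time.sleep(interval_s)             # sleep / next capture
```

Keeping the steps behind injected callables like this also makes the loop easy to test without camera hardware.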
Base Model: YOLOv11n (nano variant) configured for single-class bird detection with 320×320 training resolution.
Training Setup:
- Batch size: 32
- Dataset: Google Open Images Dataset V7 (27,000+ annotated bird bounding boxes across diverse backgrounds such as water, grass, rocks, and sky)
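For a single-class setup like this, an Ultralytics-style dataset config might look like the following; the paths and directory layout are assumptions, not taken from the project:

```yaml
# Hypothetical data.yaml for single-class bird detection.
path: datasets/birds      # assumed dataset root
train: images/train       # Open Images V7 bird crops, training split
val: images/val
names:
  0: bird                 # single class
```

Training would then point at this file with imgsz=320 and batch=32, per the setup above.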
After training, the YOLOv11n model was exported from the training framework and converted into NCNN format (.param + .bin), which is designed for efficient inference on ARM CPUs.
NCNN’s quantization tooling was used to convert the model to INT8 weights and activations:
Compression: FP32 → INT8 (roughly 4× smaller model).
Speedup: around 1.3–1.4× faster on the Pi Zero 2W CPU compared to FP32 NCNN.
Accuracy impact: ≈5% relative drop in mAP@0.5 (from ≈0.79 to ≈0.75).
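NCNN's post-training quantization is typically driven by its `ncnn2table` (calibration) and `ncnn2int8` (INT8 conversion) command-line tools. A sketch of how those invocations could be assembled; the file names, calibration image list, and mean/norm values are assumptions, not the project's actual settings:

```python
import subprocess

PARAM, BIN = "yolov11n.param", "yolov11n.bin"   # assumed NCNN export names
CALIB_LIST = "calib_images.txt"                 # assumed list of calibration images

# Step 1: generate a per-layer calibration table (KL-divergence method).
table_cmd = [
    "ncnn2table", PARAM, BIN, CALIB_LIST, "yolov11n.table",
    "mean=[0,0,0]", "norm=[0.00392,0.00392,0.00392]",  # 1/255 scaling, assumed
    "shape=[320,320,3]", "pixel=BGR", "method=kl",
]

# Step 2: emit INT8 .param/.bin using that table (~4x smaller than FP32).
int8_cmd = [
    "ncnn2int8", PARAM, BIN,
    "yolov11n-int8.param", "yolov11n-int8.bin", "yolov11n.table",
]

def quantize(run=subprocess.run):
    run(table_cmd, check=True)
    run(int8_cmd, check=True)
```

The calibration images should be representative of deployment conditions (feeder backgrounds, lighting) so the activation ranges in the table match what the model sees in the field.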
On-device, inference is performed using the NCNN C++/C API (or its Python binding, depending on setup):
Picamera2 captures frames and resizes them to the model's input resolution (e.g., 320×320 to match training, or another tuned size).
The NCNN pipeline performs preprocessing, INT8 inference and postprocessing (decoding + NMS).
Only a single thread is used to match the Pi Zero’s limited CPU resources.
Result: roughly 450 ms end-to-end latency per frame (capture → NCNN inference → postprocess).
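The postprocess step (confidence filter + NMS) mentioned above can be expressed in a few lines of plain Python. The (x1, y1, x2, y2) box layout and the default thresholds here are illustrative, not the project's tuned values:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, conf_thresh=0.5, iou_thresh=0.45):
    """detections: list of (box, score). Filter by confidence, then greedily
    keep the highest-scoring boxes and suppress heavy overlaps."""
    dets = [d for d in detections if d[1] >= conf_thresh]
    dets.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

In practice NCNN's own pipeline handles this step, but a reference implementation like this is useful for validating outputs on-device.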
The model maintains high precision (~0.80) across a wide confidence range, with recall gradually dropping as threshold increases. This suggests the model is well-calibrated and benefits from careful threshold tuning rather than retraining.
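Threshold tuning of the kind described above amounts to a simple sweep over held-out detections. A minimal sketch; the score/label pairs used in testing are illustrative, not real evaluation data:

```python
def precision_recall_at(detections, num_gt, threshold):
    """detections: list of (score, is_true_positive); num_gt: ground-truth birds."""
    tp = sum(1 for s, ok in detections if s >= threshold and ok)
    fp = sum(1 for s, ok in detections if s >= threshold and not ok)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / num_gt if num_gt else 0.0
    return precision, recall

def sweep(detections, num_gt, thresholds):
    """Map each candidate threshold to its (precision, recall) pair."""
    return {t: precision_recall_at(detections, num_gt, t) for t in thresholds}
```

Picking the operating point from such a sweep trades a few missed birds for fewer false alarms, which matters for a notification system.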
True positives: 84% (normalized).
False negatives: 16% (birds missed at low confidence).
False positives: <1% on backgrounds (grass, water, rocks).
The model rarely "hallucinates" birds, which is critical for a bird feeder camera—fewer false alarms mean more reliable notifications.
This project demonstrates that sophisticated computer vision—reliable bird detection with 0.79 mAP—can run continuously on a $15 microcomputer without cloud infrastructure or GPU acceleration. The combination of model selection (YOLOv11n), aggressive quantization, and careful edge optimization bridges the gap between research-grade accuracy and production-grade efficiency on ultra-constrained hardware.