CVPR 2026 Tutorial — Half-Day (3.5 hours)

Computer Vision at Scale: Multi-Camera Tracking, Calibration, and Event Detection for Checkout-Free Retail

Hareesh Kolluru
Head of AI/ML, Motive
Motilal Agarwal
Chief Scientist & Co-founder, Zippin
Tanmay Bangalore
Senior Software Engineer, Meta

About

Checkout-free retail represents one of the most challenging real-world computer vision deployments, requiring reliable performance across hundreds of sites and millions of interactions. This tutorial bridges academic research and production deployment by framing checkout-free retail as a canonical large-scale multi-camera vision systems problem.

The tutorial focuses on three foundational pillars, presented as generalizable computer vision problems rather than application-specific solutions. Each component is discussed in terms of its underlying formulations, scalability constraints, failure modes, and design tradeoffs that transfer directly to autonomous driving, smart spaces, sports analytics, warehouses, and urban sensing.

Automatic Multi-Camera Calibration

Continuous online estimation addressing drift and partial failures using deep learning and conventional CV pipelines.
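
For intuition, a minimal sketch of the conventional-CV side of this pillar: ORB features matched between two overlapping views, with a RANSAC homography whose inlier ratio doubles as a cheap drift alarm. This is illustrative only, not the presenters' production pipeline; the function name and the 0.4 threshold are assumptions.

    import cv2
    import numpy as np

    def check_pair_calibration(img_a, img_b, min_inlier_ratio=0.4):
        """Match ORB features across two overlapping views and fit a
        RANSAC homography; a low inlier ratio is one cheap signal that
        the pair's calibration has drifted (threshold is illustrative)."""
        orb = cv2.ORB_create(nfeatures=2000)
        kp_a, des_a = orb.detectAndCompute(img_a, None)
        kp_b, des_b = orb.detectAndCompute(img_b, None)
        if des_a is None or des_b is None:
            return None, 0.0, True  # too little texture to judge

        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des_a, des_b)
        if len(matches) < 8:
            return None, 0.0, True

        pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
        pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
        H, mask = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 3.0)
        ratio = float(mask.sum()) / len(matches) if mask is not None else 0.0
        return H, ratio, ratio < min_inlier_ratio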

Real-Time Multi-Camera Tracking

Global data association under asynchronous, unreliable observations via integer programming and graph-based formulations.
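
As a toy instance of this formulation, the two-camera case reduces to bipartite min-cost matching, solvable with the Hungarian algorithm; the session's full graph and integer-programming treatment generalizes this across many cameras and time windows. Embeddings are assumed unit-normalized, and the names and thresholds below are hypothetical.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate_two_cameras(feats_a, feats_b, times_a, times_b,
                              max_cost=0.6, max_time_gap=2.0):
        """Bipartite special case of global data association: cosine
        appearance cost with temporal gating, solved as min-cost
        matching. Assumes unit-norm embeddings; thresholds illustrative."""
        cost = 1.0 - feats_a @ feats_b.T                 # cosine distance
        gap = np.abs(times_a[:, None] - times_b[None, :])
        cost = np.where(gap > max_time_gap, 1e6, cost)   # forbid big gaps
        rows, cols = linear_sum_assignment(cost)         # Hungarian solver
        return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]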

Structured Event Detection

Inference from partial visual evidence with edge computing, Kubernetes deployment, and CPU/GPU/TPU optimization.
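
One way to make "inference from partial visual evidence" concrete is naive-Bayes log-odds fusion of per-camera event scores, down-weighting occluded views. The sketch below rests on the (strong) assumption of conditionally independent cameras calibrated against a shared prior, and the visibility weighting is a hypothetical heuristic, not the presenters' method.

    import math

    def fuse_event_scores(cam_probs, cam_visibility, prior=0.05):
        """Fuse per-camera probabilities of an event (e.g., a hand-shelf
        interaction) into one posterior via log-odds, scaling each view's
        contribution by how well it saw the shelf. Illustrative only."""
        prior_logit = math.log(prior / (1.0 - prior))
        logit = prior_logit
        for p, vis in zip(cam_probs, cam_visibility):
            p = min(max(p, 1e-6), 1.0 - 1e-6)            # numerical safety
            logit += vis * (math.log(p / (1.0 - p)) - prior_logit)
        return 1.0 / (1.0 + math.exp(-logit))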

A central theme is how infrastructure constraints, including limited bandwidth, latency requirements, camera reliability, and edge computing budgets, fundamentally shape algorithmic and architectural decisions. Attendees will learn how classical and modern deep learning techniques are adapted for continuous online operation, partial observability, and 99.9%+ system reliability.
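
The bandwidth point can be made with back-of-envelope arithmetic; the numbers below are illustrative, not measurements from any deployed site.

    def uplink_feasible(n_cameras, mbps_per_stream, uplink_mbps):
        """Can raw video from every camera be streamed off-site?"""
        return n_cameras * mbps_per_stream <= uplink_mbps

    # Hypothetical store: 60 cameras at ~4 Mbps each (1080p H.264)
    # against a 100 Mbps uplink -> 240 Mbps required, so streaming to
    # the cloud fails and perception must run at the edge.
    print(uplink_feasible(60, 4.0, 100.0))  # False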

Key Learning Objectives

  • Formulate multi-camera calibration as continuous online estimation that detects and recovers from drift and partial failures
  • Apply integer programming and graph-based formulations to global data association across asynchronous, unreliable cameras
  • Design structured event detection that infers activity from partial visual evidence under edge-compute budgets
  • Understand how bandwidth, latency, and reliability constraints shape algorithmic and architectural decisions

Live Demonstration

Attendees will interact with a live multi-camera perception system demonstrating online calibration, global multi-object tracking, and structured event inference under realistic bandwidth, latency, and hardware constraints. The system operates fully offline using local compute and networking, with pre-recorded fallback visualizations for all scenarios. Each component (calibration, tracking, event inference) can be demonstrated independently.

Broader Impact

While checkout-free retail serves as the motivating application, the tutorial deliberately abstracts away domain-specific details to focus on generalizable formulations applicable to autonomous vehicles, warehouses, smart cities, and sports analytics: any application requiring automatic multi-camera calibration, real-time edge analysis, robust event detection under limited bandwidth, and deployment at scale. The systems discussed have contributed to large-scale commercial deployments recognized through industry innovation awards and government R&D programs.

Schedule

Duration | Topic | Presenter

15 min | Introduction & Multi-Camera Vision at Scale | All
  • Industry scale and infrastructure constraints
  • Live demo setup introduction

60 min | Automatic Multi-Camera Calibration under Scale, Drift, and Partial Observability | Agarwal
  • Deep learning: SuperPoint, LoFTR, bundle adjustment
  • Conventional CV: SIFT/ORB, SfM, homography
  • Production: online refinement, failure detection
  ▶ Live demo: Calibration visualization

15 min | Break (attendees interact with the demo)

45 min | Real-Time Multi-Camera Tracking under Asynchrony and Missing Data | Kolluru
  • Global optimization via integer programming
  • Graph-based formulation, solver strategies
  • Camera failures, async data, occlusions
  ▶ Live demo: Tracking & occlusion handling

45 min | Structured Event Inference and Reliability in Multi-Camera Vision Systems | Bangalore + All
  • Hand-shelf interactions, multi-camera fusion
  • Edge computing, Kubernetes deployment
  • CPU/GPU/TPU optimization, simulation stores
  • Case studies and failure modes
  ▶ Live demo: Product interaction detection

30 min | Interactive Q&A & Hands-on Demo | All
  • Attendees test the live checkout-free system

Organizers

HK

Hareesh Kolluru

Head of AI/ML, Motive
sethuhareesh.kolluru@gmail.com

Previously led the deployment of checkout-free shopping platforms to 250+ stores worldwide at Zippin, led visual search at Slyce, and served as Principal Architect for Self-Driving at Faraday Future. Holds an M.S. from UMass Amherst and six U.S. patents in computer vision, autonomous driving, and edge AI. Recipient of the Sports Business Journal Best Innovation Award (2022) and named an IDC Innovator (2023).

MA

Dr. Motilal Agarwal

Chief Scientist & Co-founder, Zippin
motilal76@yahoo.com

Previously a principal computer scientist at SRI International, where he led large-scale robotics and vision systems for autonomous grasping, surveillance, and biometric capture, collaborating with Stanford, MIT, and Carnegie Mellon. Ph.D. in Computer Vision from the University of Maryland; B.S. in Computer Science from IIT Delhi. Co-author of AdaMAE (CVPR 2023) and holder of multiple patents in automated checkout.

TB

Tanmay Bangalore

Senior Software Engineer, Meta
tanmaybangalore@gmail.com

Previously Staff Software Engineer and Team Lead for Detection & Tracking at Zippin, leading computer vision infrastructure development for checkout-free retail systems deployed across hundreds of stores. Prior experience in autonomous vehicle perception and localization at General Motors and L3Harris Technologies.

Target Audience

Prerequisites: Basic computer vision (CNNs, detection). Camera geometry helpful but not required.

Expected Attendance: 100–300 attendees.

  • CV Researchers: interested in real-world deployment challenges and production failure modes
  • Edge AI Engineers: working on bandwidth-constrained, latency-sensitive vision systems
  • PhD Students: in tracking, calibration, and video understanding
  • Industry Practitioners: in retail tech, autonomous vehicles, and warehouses
  • Research-Production Bridge: anyone interested in the gap between academic research and real-world deployment

Subject Areas

Primary: Multi-object tracking and re-identification.
Secondary: Camera calibration & 3D reconstruction; action/event recognition; real-time CV systems; applications in retail automation; deep learning for correspondence.

Resources

Planned Materials (Public GitHub Repository)

References

  1. Sarlin et al., "SuperGlue: Learning Feature Matching with Graph Neural Networks," CVPR 2020
  2. Sun et al., "LoFTR: Detector-Free Local Feature Matching with Transformers," CVPR 2021
  3. Wojke et al., "Simple Online and Realtime Tracking with a Deep Association Metric" (DeepSORT), ICIP 2017
  4. Zhang et al., "ByteTrack: Multi-Object Tracking by Associating Every Detection Box," ECCV 2022
  5. Schönberger & Frahm, "Structure-from-Motion Revisited," CVPR 2016
  6. Bandara et al., "AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders," CVPR 2023 (co-authored by M. Agarwal)
  7. Agrawal et al., "Training Data Acquisition for Automated Checkout," US Patent US20220327511A1, 2022
  8. Bernardin & Stiefelhagen, "Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics," EURASIP Journal on Image and Video Processing, 2008
  9. Carreira & Zisserman, "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset," CVPR 2017
  10. Lane et al., "DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices," IPSN 2016