Checkout-free retail represents one of the most challenging real-world computer vision deployments, requiring reliable performance across hundreds of sites and millions of interactions. This tutorial bridges academic research and production deployment by framing checkout-free retail as a canonical large-scale multi-camera vision systems problem.
The tutorial focuses on three foundational pillars, presented as generalizable computer vision problems rather than application-specific solutions:
- **Online calibration:** continuous online estimation that addresses drift and partial failures using both deep learning and conventional CV pipelines.
- **Multi-object tracking:** global data association under asynchronous, unreliable observations via integer programming and graph-based formulations (see the sketch below).
- **Event inference:** inference from partial visual evidence on edge hardware, covering Kubernetes deployment and CPU/GPU/TPU optimization.

Each component is discussed in terms of its underlying formulations, scalability constraints, failure modes, and design tradeoffs that transfer directly to autonomous driving, smart spaces, sports analytics, warehouses, and urban sensing.
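To ground the data-association pillar, here is a minimal sketch that poses frame-level association as bipartite matching with a gated appearance cost. The function name `associate`, the cosine-distance cost, and the 0.5 gating threshold are illustrative assumptions, not the deployed formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(track_feats, det_feats, max_cost=0.5):
    """Match existing tracks to new detections via bipartite assignment.

    track_feats: (T, D) appearance embeddings, one per track
    det_feats:   (N, D) appearance embeddings, one per detection
    max_cost:    gating threshold on cosine distance (assumed value)
    """
    # Cosine-distance cost matrix between every track/detection pair.
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    cost = 1.0 - t @ d.T

    # The Hungarian algorithm solves this special case of the
    # integer-programming / graph formulations covered in the tutorial.
    rows, cols = linear_sum_assignment(cost)

    matches = []
    unmatched_tracks, unmatched_dets = set(range(len(t))), set(range(len(d)))
    for r, c in zip(rows, cols):
        if cost[r, c] <= max_cost:  # gate out implausible pairings
            matches.append((r, c))
            unmatched_tracks.discard(r)
            unmatched_dets.discard(c)
    return matches, sorted(unmatched_tracks), sorted(unmatched_dets)


# Example: 4 active tracks, 5 new detections, 128-D embeddings.
matches, lost, new_dets = associate(np.random.rand(4, 128), np.random.rand(5, 128))
```

Unmatched tracks and detections feed back into track management (coasting, birth, and death), which is where asynchrony and missing observations enter the full graph formulation discussed in the tutorial.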
A central theme is how infrastructure constraints, including limited bandwidth, latency requirements, camera reliability, and edge computing budgets, fundamentally shape algorithmic and architectural decisions. Attendees will learn how classical and modern deep learning techniques are adapted for continuous online operation, partial observability, and 99.9%+ system reliability.
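As a toy example of adapting estimation for continuous online operation, the sketch below keeps a single camera's extrinsics up to date with an exponentially weighted update and a residual gate. The class name, update rate, and gate threshold are assumptions for illustration, not parameters of any deployed system.

```python
import numpy as np


class OnlineExtrinsicEstimator:
    """Toy continuous-calibration tracker: blends fresh pose observations
    into a running estimate so slow drift is corrected while one-off bad
    measurements (partial failures) are rejected."""

    def __init__(self, init_rvec, init_tvec, alpha=0.05, gate=0.1):
        self.rvec = np.asarray(init_rvec, dtype=float)  # rotation (Rodrigues vector)
        self.tvec = np.asarray(init_tvec, dtype=float)  # translation
        self.alpha = alpha  # blend rate (assumed)
        self.gate = gate    # residual gate (assumed)

    def update(self, obs_rvec, obs_tvec):
        obs_rvec = np.asarray(obs_rvec, dtype=float)
        obs_tvec = np.asarray(obs_tvec, dtype=float)
        # Reject observations far from the current estimate (likely spurious).
        residual = (np.linalg.norm(obs_rvec - self.rvec)
                    + np.linalg.norm(obs_tvec - self.tvec))
        if residual > self.gate:
            return False
        # Exponentially weighted blend tracks slow drift without large jumps.
        # (Linear blending of Rodrigues vectors is a simplification; a real
        # system would interpolate on SO(3).)
        self.rvec = (1 - self.alpha) * self.rvec + self.alpha * obs_rvec
        self.tvec = (1 - self.alpha) * self.tvec + self.alpha * obs_tvec
        return True
```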
Attendees will interact with a live multi-camera perception system demonstrating online calibration, global multi-object tracking, and structured event inference under realistic bandwidth, latency, and hardware constraints. The system operates fully offline using local compute and networking, with pre-recorded fallback visualizations for all scenarios. Each component (calibration, tracking, event inference) can be demonstrated independently.
While checkout-free retail serves as the motivating application, the tutorial deliberately abstracts away domain-specific details to focus on generalizable formulations applicable to autonomous vehicles, warehouses, smart cities, and sports analytics: any application requiring automatic multi-camera calibration, real-time edge analysis, robust event detection with limited bandwidth, and deployment at scale. The systems discussed have contributed to large-scale commercial deployments recognized through industry innovation awards and government-recognized R&D programs.
| Duration | Topic | Presenter |
|---|---|---|
| 15 min | Introduction & Multi-Camera Vision at Scale | All |
| 60 min | Automatic Multi-Camera Calibration under Scale, Drift, and Partial Observability | Agarwal |
| 15 min | Break — Attendees interact with demo | — |
| 45 min | Real-Time Multi-Camera Tracking under Asynchrony and Missing Data | Kolluru |
| 45 min | Structured Event Inference and Reliability in Multi-Camera Vision Systems | Bangalore + All |
| 30 min | Interactive Q&A & Hands-on Demo | All |
Previously led the deployment of checkout-free shopping platforms to 250+ stores worldwide at Zippin, led visual search at Slyce, and served as Principal Architect for Self-Driving at Faraday Future. Holds an M.S. from UMass Amherst and 6 U.S. patents in computer vision, autonomous driving, and edge AI. Recipient of the Sports Business Journal Best Innovation Award (2022) and recognized as an IDC Innovator (2023).
Previously a principal computer scientist at SRI International, leading large-scale robotics and vision systems for autonomous grasping, surveillance, and biometric capture, with collaborations at Stanford, MIT, and Carnegie Mellon. Ph.D. in Computer Vision from the University of Maryland; B.S. in Computer Science from IIT Delhi. Co-author of AdaMAE (CVPR 2023) and holder of multiple patents in automated checkout.
Previously Staff Software Engineer and Team Lead for Detection & Tracking at Zippin, leading computer vision infrastructure development for checkout-free retail systems deployed across hundreds of stores. Prior experience in autonomous vehicle perception and localization at General Motors and L3Harris Technologies.
Prerequisites: Basic computer vision (CNNs, detection). Camera geometry helpful but not required.
Expected Attendance: 100–300 attendees.
Primary: Multi-object tracking and re-identification.
Secondary: Camera calibration & 3D reconstruction; action/event recognition; real-time CV systems; applications in retail automation; deep learning for correspondence.