548 - FLASH-TV 2.0: Refining and assessing the FLASH-TV methods for TV viewing estimation
Saturday, April 23, 2022
3:30 PM – 6:00 PM US MT
Poster Number: 548 Publication Number: 548.245
Teresia M. O'Connor, Baylor College of Medicine, Houston, TX, United States; Anil Kumar Vadathya, Rice University, Houston, TX, United States; Alicia Beltran, USDA/ARS Children's Nutrition Research Center/ Baylor College of Medicine Department of Pediatrics, Houston, TX, United States; Oriana Perez, Baylor College of Medicine, Houston, TX, United States; Salma Musaad, Baylor College of Medicine, Houston, TX, United States; Sheryl O. Hughes, Baylor College of Medicine, Houston, TX, United States; Jason A. Mendoza, Fred Hutchinson Cancer Research Center, Seattle, WA, United States; Ashok Veeraraghavan, Rice University, Houston, TX, United States; Tom Baranowski, Baylor College of Medicine, Houston, TX, United States
Associate Professor of Pediatrics Baylor College of Medicine Houston, Texas, United States
Background: Excessive TV viewing among children is a public health concern, yet tools to measure children’s TV viewing suffer from biases. We developed FLASH-TV 1.0 to objectively measure children’s TV viewing, using computer vision and machine learning algorithms to analyze video images of children in front of TVs.
Objective: Our goal was to refine the FLASH-TV 1.0 algorithms for processing the video data and to reassess the new version, FLASH-TV 2.0, as an objective measure of children’s TV viewing.
Design/Methods: Four design studies (n=21) were conducted with family triads (a parent and 2 siblings): 3 in an observation lab and 1 in the child’s home. A fifth, confirmation study was conducted in the lab (n=10). Family triads participated in task-based screen-use protocols for about 90 minutes. The FLASH-TV system included a video camera placed near the TV, facing the room in front of the TV, during data collection. The gold standard was video data coded by staff, using duration coding, for whether the target child’s gaze was on the TV (10% double coded, mean Kappa 0.83-0.88). FLASH-TV estimated a child’s TV viewing time by sequentially detecting faces in a video frame, verifying that the face was the target child, and assessing TV-watching (gaze) behavior. Enhancements of the convolutional neural network algorithms for each step included replacing YOLOv2 with RetinaFace for face detection; replacing DeepFace with ArcFace for face verification; and using a combination of Gaze360 and ETH-XGaze for gaze estimation. Additionally, the video data were assessed in 5-second epochs to reduce noise in the system. The target child’s TV viewing duration, estimated by FLASH-TV running the three steps sequentially, was compared to the gold standard, with criterion validity for overall TV viewing calculated using the intra-class correlation (ICC) in a generalized linear mixed model.
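As an illustration of the 5-second epoch step described above, the following is a minimal Python sketch of how per-frame gaze decisions might be aggregated into viewing minutes. The abstract does not state the aggregation rule, so the majority vote used here is an assumption, as are the function name `viewing_minutes`, the `fps` parameter, and the boolean per-frame input (standing in for the combined output of face detection, face verification, and gaze estimation).

```python
from typing import List


def viewing_minutes(frame_flags: List[bool], fps: int = 30,
                    epoch_seconds: int = 5) -> float:
    """Estimate TV viewing time in minutes from per-frame decisions.

    frame_flags[i] is True when frame i was judged as "target child
    watching TV". Hypothetical input; not the FLASH-TV data format.
    """
    frames_per_epoch = fps * epoch_seconds
    watched_seconds = 0.0
    for start in range(0, len(frame_flags), frames_per_epoch):
        epoch = frame_flags[start:start + frames_per_epoch]
        # Assumed rule: an epoch counts as viewing when a strict
        # majority of its frames are flagged; ties count as not viewing.
        if sum(epoch) * 2 > len(epoch):
            # A shorter trailing epoch contributes only its real duration.
            watched_seconds += len(epoch) / fps
    return watched_seconds / 60.0
```

Under this assumed rule, isolated misclassified frames inside an epoch do not change the epoch's label, which is one way epoch-level assessment could reduce frame-level noise.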
Results: The target child’s mean age across studies was 8.6 years (SD 1.5), with 32.3% non-Hispanic White, 22.6% Black, 22.6% Hispanic White, and 22.6% other. The face detector’s overall sensitivity improved from 93.6% to 96.1%. The overall positive predictive value of face verification improved from 90% to 96%, reducing the false positive rate. The ICC improved from 0.725 to 0.961 when comparing the child’s gold-standard TV viewing time (min) to the FLASH-TV 2.0 estimated time analyzed in 5-second epochs. The ICC of the FLASH-TV 2.0 estimate to the gold standard was 0.963 for the confirmation sample.
Conclusion(s): FLASH-TV 2.0 significantly improved on the performance of FLASH-TV 1.0 in identifying when a target child is watching TV and offers a critical new tool to accurately measure children’s TV viewing.