Deep CNN, Body Pose, and Body-Object Interaction Features for Drivers’ Activity Monitoring

Automatic recognition and prediction of in-vehicle human activities have a significant impact on the next generation of driver assistance systems and intelligent autonomous vehicles. In this article, the authors present a novel single-image driver action recognition algorithm inspired by human perception, which often focuses selectively on parts of an image to acquire information at specific locations that are distinctive for a given task. In contrast to existing approaches, the authors argue that human activity is a combination of pose and semantic contextual cues. Specifically, the authors model this by considering the configuration of body joints and their interactions with objects, represented as pairwise relations that capture structural information. The authors' body-pose and body-object interaction representation is built to be semantically rich and meaningful, and it is highly discriminative even when coupled with a basic linear support vector machine (SVM) classifier. The authors also propose a Multi-stream Deep Fusion Network (MDFN) for combining high-level semantics with convolutional neural network (CNN) features. The authors' experimental results demonstrate that the proposed approach significantly improves drivers' action recognition accuracy on two challenging datasets.
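The pairwise-relation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pairwise_features`, the choice of (dx, dy, distance) as the relation, and the normalization by the largest joint-to-joint distance are all assumptions made for illustration. Joints and object centers are assumed to be given as (x, y) image coordinates from some upstream pose estimator and object detector.

```python
import math
from itertools import combinations


def pairwise_features(joints, objects):
    """Flatten joint-joint and joint-object relations into one vector.

    joints, objects: lists of (x, y) points in image coordinates.
    Hypothetical encoding: each pair contributes (dx, dy, distance).
    """
    # Assumed normalization: divide by the largest joint-to-joint
    # distance so the features do not depend on image scale.
    scale = max(
        (math.dist(a, b) for a, b in combinations(joints, 2)),
        default=1.0,
    ) or 1.0

    feats = []
    # Body-pose part: relation between every unordered pair of joints.
    for a, b in combinations(joints, 2):
        feats += [(b[0] - a[0]) / scale,
                  (b[1] - a[1]) / scale,
                  math.dist(a, b) / scale]
    # Body-object part: relation between every joint and every object.
    for j in joints:
        for o in objects:
            feats += [(o[0] - j[0]) / scale,
                      (o[1] - j[1]) / scale,
                      math.dist(j, o) / scale]
    return feats


# Toy input: two joints and one object center (made-up coordinates).
vec = pairwise_features([(0, 0), (3, 4)], [(3, 0)])
print(len(vec))
```

A fixed-length vector of this kind is the sort of input a linear classifier such as an SVM could consume, provided every image yields the same number of joints and object slots.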

Language

  • English

Filing Info

  • Accession Number: 01845150
  • Record Type: Publication
  • Files: TRIS
  • Created Date: May 11 2022 9:57AM