Identification and Validation of Themes from Vehicle Owner Complaints and Fatality Reports Using Text Analysis

The National Highway Traffic Safety Administration (NHTSA) has compiled crash information from vehicle owners through the Vehicle Owner’s Questionnaire (VOQ) since January 1995. Most researchers have not yet used this data source, particularly the complaints recorded as free-response text. Using natural language processing and unsupervised machine learning algorithms, this research identifies the emergent themes that capture the key issues faced by vehicle owners. The Fatality Analysis Reporting System (FARS) dataset was used to validate whether the themes emerging from the reports are areas of concern. Vehicle owners commonly reported themes related to rear-end collisions, airbag deployment issues, parking-related crashes, and speeding. Additionally, the validation techniques identified anomalies in the relationship between fatality and airbag deployment, as well as a higher propensity for front-to-front collisions than for front-to-rear collisions. This research has far-reaching implications for creating a framework that integrates multiple data sources for analysis and could potentially affect national transportation policy.

  • Supplemental Notes:
    • This paper was sponsored by TRB committee ABJ20 Standing Committee on Statewide Transportation Data and Information Systems.
  • Authors:
    • Mehrotra, Shashank Kumar
    • Roberts, Shannon C
  • Conference:
  • Date: 2018

Language

  • English

Media Info

  • Media Type: Digital/other
  • Features: Figures; References; Tables
  • Pagination: 18p

Filing Info

  • Accession Number: 01661423
  • Record Type: Publication
  • Report/Paper Numbers: 18-05457
  • Files: TRIS, TRB, ATRI
  • Created Date: Feb 27 2018 9:46AM