Spatio-Temporal Crash Prediction: Effects of Negative Sampling on Understanding Network-Level Crash Occurrence

In projects centered on rare-event data, comprehending the data is considerably harder because too little data is available for deriving insight and analysis. This is particularly the case with traffic crash occurrence, where positive events (crashes) are rare and, in most cases, no data set exists for negative events (non-crashes). One method to increase available data is negative sampling, which is the process of creating a negative event based on the absence of a positive event. In this work, four negative sampling techniques are presented with varying ratios of negative to positive data. The techniques are based on spatial data, temporal data, and a mixture of the two, with the data ratios acting as class-balancing tools. The best-performing model used a negative sampling technique that shifted temporal information with an even 50/50 data split, achieving an F-1 score (the harmonic mean of precision and recall) of 93.68. These results are promising for Intelligent Transportation Systems (ITS) applications that identify potential crash locations across an entire area so that proactive measures can be put in place.
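
As an illustration of the idea described in the abstract, the following is a minimal sketch of temporal-shift negative sampling and of the F-1 computation. It is not the authors' implementation: the record layout (a road-segment identifier paired with an hourly timestamp), the function names `temporal_shift_negatives` and `f1_score`, and the parameters `ratio` and `max_shift_hours` are all assumptions made for this example.

```python
import random
from datetime import datetime, timedelta


def temporal_shift_negatives(crashes, ratio=1.0, max_shift_hours=72, seed=0):
    """Create negatives by keeping a crash location and shifting its time.

    crashes: list of (segment_id, datetime) tuples (hypothetical layout).
    ratio:   negatives per positive; 1.0 yields an even 50/50 split.
    """
    rng = random.Random(seed)
    observed = set(crashes)  # known positives, used to reject collisions
    shifts = [h for h in range(-max_shift_hours, max_shift_hours + 1) if h != 0]
    target = int(round(len(crashes) * ratio))
    negatives = set()
    attempts = 0
    # Cap the attempts so the loop ends even if few free slots exist.
    while len(negatives) < target and attempts < 100 * max(target, 1):
        attempts += 1
        segment, when = rng.choice(crashes)
        candidate = (segment, when + timedelta(hours=rng.choice(shifts)))
        # A shifted record counts as a negative only if no crash was
        # recorded at that segment and hour.
        if candidate not in observed:
            negatives.add(candidate)
    return sorted(negatives)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up records, for demonstration only.
    crashes = [("A", datetime(2021, 1, 1, 8)), ("B", datetime(2021, 1, 2, 17))]
    for segment, when in temporal_shift_negatives(crashes, ratio=1.0):
        print(segment, when.isoformat())
```

Raising `ratio` above 1.0 in this sketch produces more negatives than positives, which is one way to explore the class-balance ratios the abstract mentions.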

  • Supplemental Notes:
    • Data used in this project was provided to the authors by the Hamilton County Emergency Communications District. A subset of this data can be accessed on the site. © National Academy of Sciences: Transportation Research Board 2021.
  • Authors:
    • Way, Peter
    • Roland, Jeremiah
    • Sartipi, Mina
    • Osman, Osama
  • Publication Date: 2021


  • Language: English

Filing Info

  • Accession Number: 01764829
  • Record Type: Publication
  • Files: TRIS, TRB, ATRI
  • Created Date: Feb 10 2021 3:08PM