Bayesian networks for imbalance data to investigate the contributing factors to fatal injury crashes on the Ghanaian highways

Crash data are often heavily imbalanced: fatal injury (minority) crashes are significantly underrepresented relative to non-fatal injury (majority) crashes. This imbalance poses a major challenge to most statistical learning methods and needs to be addressed during data preprocessing. To this end, the authors compare three data-balancing methods: the Synthetic Minority Oversampling Technique (SMOTE), Borderline SMOTE (BL-SMOTE), and the Majority Weighted Minority Oversampling Technique (MWMOTE). The authors then examine different Bayesian networks (BNs) to explore the contributing factors of fatal injury crashes. The 2016 highway crash data of Ghana are retrieved for the case study. The results show that the accuracy of injury severity classification improves when the preprocessed data are used, with the highest improvement observed on the data preprocessed by MWMOTE. Statistical verification is performed with the Wilcoxon signed-rank test. Inference with the best BNs identifies the significant factors of fatal crashes: off-peak time, non-intersection areas, pedestrian-involved collisions, rural road environments, good tarred roads, roads without shoulders, and multiple-vehicle crashes.
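
As a rough illustration of the oversampling idea behind SMOTE named in the abstract, the sketch below creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. It is a minimal sketch and not the authors' implementation: the function name `smote`, its parameters, and the toy feature values are assumptions made for illustration, and it treats features as continuous, whereas crash records are largely categorical and would need a suitable encoding or a categorical variant.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; features are assumed numeric.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate with a random weight in [0, 1).
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up numeric features standing in for fatal (minority) crashes.
fatal = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote(fatal, n_new=6, k=2)
```

Each synthetic point lies on the line segment between two existing minority samples, so the minority class grows without exact duplicates. BL-SMOTE and MWMOTE differ mainly in how they choose and weight the base samples, which this sketch does not attempt to reproduce.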

Language

  • English

Filing Info

  • Accession Number: 01762090
  • Record Type: Publication
  • Files: TRIS
  • Created Date: Dec 23 2020 3:09PM