Injury severity analysis of pedestrian and bicyclist trespassing crashes at non-crossings: A hybrid predictive text analytics and heterogeneity-based statistical modeling approach

Non-motorists involved in rail-trespassing crashes are highly vulnerable to major or fatal injuries. Previous research has relied on traditional quantitative crash data to understand the factors contributing to injury outcomes of non-motorists in train-involved collisions. However, crash narratives, which are usually overlooked, can provide unique, crash-specific contextual information about factors associated with injury outcomes. The main objective of this study is to harness recent advances in qualitative analysis procedures to identify thematic concepts in unstructured crash narrative data. A two-stage hybrid approach is proposed: text mining is first applied to extract information from crash narratives, and the new variables derived from text mining are then included in advanced statistical models of injury outcomes. Using ten years (2006−2015) of non-motorist non-crossing trespassing injury data obtained from the Federal Railroad Administration, statistical procedures and advanced machine learning text analytics are applied to extract unique information on factors contributing to trespassers’ injury outcomes. The key concepts are systematically categorized into trespasser-, injury-, train-, medical-, and location-related factors. A total of 13 unique variables that are not present in traditional tabular crash data are extracted from the thematic concepts. The analysis reveals a positive, statistically significant association between the presence of a crash narrative and the trespasser’s injury outcome (coded as minor, major, or fatal injury). Compared to crashes with minor injuries, crashes involving major and fatal injuries are more likely to be reported with crash narratives.
A cross-tabulation of the new variables derived from text mining with injury outcomes reveals that trespassers with confirmed suicide attempts and trespassers wearing headphones or talking on cell phones are more likely to receive fatal injuries. Among the other factors identified, trespassers under the influence of alcohol, trespassers hit by commuter trains, and advance warnings by the engineer are associated with more severe (major and fatal) trespasser injury outcomes. Accounting for unobserved heterogeneity and controlling for other factors, fixed and random parameter discrete outcome models are developed to provide deeper insight into the heterogeneous correlations between trespasser injury outcomes and the new crash-specific explanatory variables derived from text mining. Practical implications and future research directions are discussed.
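
As a rough illustration of the two-stage idea, the sketch below derives indicator variables from narrative text and cross-tabulates one of them against injury severity. It is not the study's method: the abstract describes machine learning text analytics, whereas this example substitutes simple keyword matching. The keyword patterns, the function names `extract_flags` and `crosstab`, and the four narratives are all invented for illustration and are not drawn from the Federal Railroad Administration data.

```python
import re
from collections import Counter

# Hypothetical keyword patterns standing in for text-mined concepts.
# The study derives its concepts through text analytics, not a fixed list.
PATTERNS = {
    "headphones": re.compile(r"\b(headphones?|earbuds?|earphones?)\b", re.I),
    "cell_phone": re.compile(r"\b(cell\s?phone|mobile phone)\b", re.I),
    "alcohol": re.compile(r"\b(alcohol|intoxicated|drunk)\b", re.I),
}


def extract_flags(narrative):
    """Stage 1 (simplified): map a narrative to binary indicator variables."""
    return {name: bool(p.search(narrative)) for name, p in PATTERNS.items()}


def crosstab(records, flag):
    """Count (flag value, injury severity) pairs across all records."""
    counts = Counter()
    for narrative, severity in records:
        counts[(extract_flags(narrative)[flag], severity)] += 1
    return counts


# Invented example records: (narrative, injury severity).
records = [
    ("Trespasser wearing headphones walking on track", "fatal"),
    ("Trespasser appeared intoxicated, lying near rail", "major"),
    ("Trespasser stepped off track before impact", "minor"),
    ("Trespasser talking on cell phone, struck by train", "fatal"),
]

table = crosstab(records, "headphones")
for (flagged, severity), n in sorted(table.items()):
    print(f"headphones={flagged!s:<5} severity={severity:<5} count={n}")
```

In a fuller analysis, indicators of this kind would enter the discrete outcome models as explanatory variables alongside the tabular crash data; that second stage is not shown here.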


