Automatic Quality Control of Transportation Reports Using Statistical Language Processing

The processes of developing, monitoring, and maintaining transportation systems produce large volumes of information. Human fieldworkers are often responsible for gathering this information, and despite their best efforts, they inevitably introduce errors into the collected data. This is a critical problem because 1) the collected data are used to justify key infrastructure maintenance and development decisions, and 2) the volume of unstructured information (e.g., plain text) makes manual quality control prohibitively expensive. The authors introduce a solution to this problem in the example domain of vehicle accident reports. First, a sample of accident reports was analyzed, confirming the existence of many data entry errors. Second, a statistical language processing approach that automatically identifies reports containing data entry errors was developed and evaluated. The authors tested a variety of system configurations on real-world data and compared their performance with multiple baseline methods. The best configuration achieved a performance score of 84%, far outperforming the baseline methods. The results and analyses have quality control implications for any data source that pairs structured text (e.g., coded fields) with unstructured text.
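The abstract does not specify the authors' model, so the following is only a minimal sketch of one common way to flag reports whose coded field disagrees with the narrative. It assumes a multinomial naive Bayes classifier (a swapped-in technique, not necessarily the authors' method) trained to predict a coded field from narrative text. The field codes `REAR_END` and `RUN_OFF_ROAD`, the helper names, and the toy narratives are all hypothetical illustrations, not data from the study.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase whitespace tokenization only.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            # log P(c) + sum of smoothed log P(token | c)
            score = math.log(n_c / self.n_docs)
            denom = self.total_words.get(c, 0) + v
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


def flag_suspect_reports(model, reports):
    """Return ids of reports whose coded value differs from the narrative-based prediction."""
    return [rid for rid, code, narrative in reports if model.predict(narrative) != code]


# Toy training data (hypothetical): narrative text paired with a coded crash type.
train_texts = [
    "vehicle rear ended stopped car at light",
    "driver struck car from behind in traffic",
    "vehicle left road and hit tree",
    "car ran off road into ditch",
]
train_labels = ["REAR_END", "REAR_END", "RUN_OFF_ROAD", "RUN_OFF_ROAD"]
model = NaiveBayes().fit(train_texts, train_labels)

# (report id, coded field, narrative); r2's code contradicts its narrative.
reports = [
    ("r1", "REAR_END", "truck rear ended car stopped at light"),
    ("r2", "REAR_END", "vehicle ran off road and hit tree"),
    ("r3", "RUN_OFF_ROAD", "car left road into ditch"),
]
flagged = flag_suspect_reports(model, reports)
print(flagged)  # ['r2']
```

The design choice here is to treat the narrative as independent evidence for the coded field: a report is flagged for human review only when the two disagree, which concentrates manual quality control effort on a small subset of records.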

Language

  • English

Filing Info

  • Accession Number: 01527808
  • Record Type: Publication
  • Files: TLIB, TRIS
  • Created Date: Jun 5 2014 9:08AM