Efficient Approximated Matching Methods for Online Incident Data Management

One of the major concerns in online traffic information management, between data collection and data dissemination, is how to deal effectively with data errors from various data sources, e.g., spelling mistakes, truncations, inconsistent conventions and missing fields. The “dirty” input data can be validated and cleaned by matching them against accurate reference databases before they are disseminated to the public. This paper proposes two novel approximated matching methods for online incident data management: a token-based method and a q-gram-based method. When input records come in, a relatively small candidate set can be constructed efficiently from the records in the large reference table by using a similarity index pre-built offline. The “matched” reference records, i.e., those with the highest similarity scores to the input records, are then found within the candidate set and can be posted online as clean and accurate information. The experimental results suggest that the proposed approximated matching methods significantly improve data quality and completeness for real-time online incident data management.
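The q-gram based idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the similarity function, the index layout or the value of q, so the Jaccard score, the inverted index of q-grams, the padding character `#`, q = 3 and all function names (`qgrams`, `build_index`, `best_match`) are assumptions made for illustration only, as are the sample street names.

```python
from collections import defaultdict


def qgrams(text, q=3):
    """Return the set of q-grams of a lowercased string padded with '#'."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def build_index(reference, q=3):
    """Offline step: map each q-gram to the ids of the reference records containing it."""
    index = defaultdict(set)
    for rec_id, text in enumerate(reference):
        for gram in qgrams(text, q):
            index[gram].add(rec_id)
    return index


def best_match(dirty, reference, index, q=3):
    """Online step: build a candidate set from the index, then score only the candidates.

    Returns (best reference record, Jaccard score), or (None, 0.0) if no
    reference record shares a q-gram with the input.
    """
    grams = qgrams(dirty, q)
    candidates = set()
    for gram in grams:
        candidates |= index.get(gram, set())

    best = (None, 0.0)
    for rec_id in candidates:
        ref_grams = qgrams(reference[rec_id], q)
        # Jaccard similarity is an assumed stand-in for the paper's score.
        score = len(grams & ref_grams) / len(grams | ref_grams)
        if score > best[1]:
            best = (reference[rec_id], score)
    return best


if __name__ == "__main__":
    # Hypothetical reference table and misspelled input record.
    reference = ["Main Street", "Maple Avenue", "Market Street"]
    index = build_index(reference)
    print(best_match("Main Stret", reference, index))
```

Only records that share at least one q-gram with the input are scored, which is what keeps the candidate set small relative to the full reference table. A token based variant could follow the same pattern by indexing whole words instead of q-grams.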

Language

  • English

Media Info

  • Media Type: CD-ROM
  • Features: Figures; References; Tables
  • Pagination: 15p
  • Monograph Title: TRB 86th Annual Meeting Compendium of Papers CD-ROM

Filing Info

  • Accession Number: 01043520
  • Record Type: Publication
  • Report/Paper Numbers: 07-1337
  • Files: TRIS, TRB
  • Created Date: Feb 8 2007 5:56PM