Correcting Mislabeled Taxi Trajectory Occupancy Status Using Input-Output Hidden Markov Model

Taxi GPS trajectory data have been widely used in transportation research. However, the data are often corrupted by noise, of which the most common and consequential form is mislabeled passenger occupancy status, the field that indicates whether a passenger is in the taxi when a GPS point is recorded. This study develops an input-output hidden Markov model (IO-HMM) to identify and correct mislabeled occupancy status in taxi trajectory data. Given the observed taxi movement behavior and the recorded occupancy status, together with the specific spatiotemporal context, the IO-HMM generates a sequence of predictions of the unknown true occupancy states. Because the true occupancy states are not available, the expectation-maximization (EM) algorithm is used to train the model. The proposed model is tested on a large taxi trajectory data set collected in Shenzhen, China. Results of numerical experiments show that the model generates reliable predictions of taxi occupancy status and consistently corrects wrong labels in the data with relatively high accuracy. Moreover, the model shows reasonable robustness to random errors manually added to ground-truth trajectories.
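To make the decoding idea in the abstract concrete, the following is a minimal sketch and not the authors' model. It assumes two hidden states (0 = vacant, 1 = occupied), a recorded label that is flipped with a fixed probability, and a single hypothetical binary input, `stopped`, that makes a state change more likely when the taxi is stationary. All parameter values and names (`decode_occupancy`, `flip_prob`, `switch_prob_stopped`, `switch_prob_moving`) are illustrative assumptions, and the EM training step described in the abstract is not shown: the parameters are simply fixed by hand and the most likely state sequence is found with Viterbi decoding.

```python
import numpy as np


def decode_occupancy(labels, stopped, flip_prob=0.1,
                     switch_prob_stopped=0.3, switch_prob_moving=0.01):
    """Return the most likely true occupancy sequence (0 = vacant, 1 = occupied).

    labels  : recorded occupancy labels (0/1), possibly mislabeled
    stopped : hypothetical input, True where the taxi is stationary
    All probabilities are hand-picked assumptions, not fitted values.
    """
    labels = np.asarray(labels, dtype=int)
    stopped = np.asarray(stopped, dtype=bool)
    n = len(labels)

    # Emission model: log_emit[state, label]; the label matches the
    # true state with probability 1 - flip_prob.
    log_emit = np.log(np.array([[1 - flip_prob, flip_prob],
                                [flip_prob, 1 - flip_prob]]))

    def log_trans(is_stopped):
        # Input-dependent transitions: switching occupancy is far more
        # plausible while the taxi is stopped (pickup or drop-off).
        p = switch_prob_stopped if is_stopped else switch_prob_moving
        return np.log(np.array([[1 - p, p],
                                [p, 1 - p]]))

    # Viterbi recursion in log space, starting from a uniform prior.
    score = np.log(np.array([0.5, 0.5])) + log_emit[:, labels[0]]
    back = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans(stopped[t])  # cand[i, j]: state i -> j
        back[t] = np.argmax(cand, axis=0)
        score = cand.max(axis=0) + log_emit[:, labels[t]]

    # Backtrack from the best final state.
    path = np.zeros(n, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


# An isolated "vacant" label in the middle of a moving, occupied trip is
# treated as a labeling error and corrected.
corrected = decode_occupancy([1, 1, 1, 0, 1, 1, 1], [False] * 7)

# A label change that coincides with a stop is kept as a real drop-off.
kept = decode_occupancy([1, 1, 1, 0, 0, 0, 0],
                        [False, False, False, True, False, False, False])
```

In this sketch the input only selects between two transition matrices. A fuller IO-HMM would let richer spatiotemporal inputs shape the transition and emission probabilities, and would estimate those parameters with EM rather than fixing them.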

  • Supplemental Notes:
    • This paper was sponsored by TRB committee ABJ70 Standing Committee on Artificial Intelligence and Advanced Computing Applications.
  • Corporate Authors:
    • Transportation Research Board
  • Authors:
    • Zhang, Kenan
    • Zhong, Lin
    • Nie, Yu (Marco)
  • Conference:
  • Date: 2019

Language

  • English

Media Info

  • Media Type: Digital/other
  • Features: Maps; References; Tables
  • Pagination: 7p

Filing Info

  • Accession Number: 01698028
  • Record Type: Publication
  • Report/Paper Numbers: 19-05030
  • Files: TRIS, TRB, ATRI
  • Created Date: Mar 1 2019 3:51PM