Data selection in machine learning for identifying trip purposes and travel modes from longitudinal GPS data collection lasting for seasons

Applying machine learning methods to Global Positioning System (GPS) trajectory data is a popular approach to identifying trip purposes and travel modes. In these methods, the selection of data for the training and test sets is important. However, the feasibility and effects of choosing these data from different periods of the year are still unknown. This question matters because GPS-based data collection reduces the burden on participants so much that a survey can last for several seasons, each of which may have distinct features. To bridge this gap, this paper applies Aslan & Zech's test (AZ-test) and Random Forests (RF) in succession to investigate how selecting training and test data from different seasons affects identification. The empirical analysis uses a dataset collected in Hakodate, Japan, a city with distinct seasons. The AZ-test results suggest that the explanatory variables of two data sets from different seasons follow different distributions. The test further indicates that a data set spanning two seasons and a data set from a single season also follow different distributions. However, the test yields contradictory results in some cases, so RF is used to examine in further detail how identification accuracy varies. The RF results confirm the AZ-test findings in most cases. In addition, they show that including GIS features as explanatory variables has a positive effect on identification accuracy, whereas including weather features has a negative effect.
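The AZ-test named above is a binning-free, multivariate two-sample test based on "statistical energy". As a rough illustration only, the following Python sketch computes an energy statistic with the logarithmic distance weighting R(r) = -ln r and estimates a p-value by randomly permuting the pooled samples. Everything here is an assumption for illustration, not the paper's implementation: the function names, the pooled z-scoring of features, the `eps` floor on distances, the normalization constants (the original formulation may scale the terms differently), and the synthetic "summer" and "winter" trip features in the usage example.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def standardize(xs: List[Point], ys: List[Point]) -> Tuple[List[List[float]], List[List[float]]]:
    """Z-score each feature on the pooled sample (assumption: trip features mix units)."""
    pooled = list(xs) + list(ys)
    dim = len(pooled[0])
    means = [sum(p[k] for p in pooled) / len(pooled) for k in range(dim)]
    stds = []
    for k in range(dim):
        var = sum((p[k] - means[k]) ** 2 for p in pooled) / len(pooled)
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant features

    def scale(p: Point) -> List[float]:
        return [(p[k] - means[k]) / stds[k] for k in range(dim)]

    return [scale(p) for p in xs], [scale(p) for p in ys]


def log_weight(a: Point, b: Point, eps: float = 1e-6) -> float:
    """Logarithmic weighting R(r) = -ln(r); r is floored at eps so tied points stay finite."""
    return -math.log(max(math.dist(a, b), eps))


def energy_statistic(xs: List[Point], ys: List[Point]) -> float:
    """Within-sample terms minus the cross term; its expectation is 0 when both samples share a distribution."""
    n, m = len(xs), len(ys)
    within_x = sum(log_weight(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n))
    within_y = sum(log_weight(ys[i], ys[j]) for i in range(m) for j in range(i + 1, m))
    between = sum(log_weight(x, y) for x in xs for y in ys)
    return within_x / (n * (n - 1)) + within_y / (m * (m - 1)) - between / (n * m)


def az_permutation_test(xs: List[Point], ys: List[Point],
                        n_perm: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Return (observed statistic, permutation p-value); large values indicate differing distributions."""
    xs_s, ys_s = standardize(xs, ys)
    observed = energy_statistic(xs_s, ys_s)
    pooled = xs_s + ys_s
    n = len(xs_s)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel points as if both seasons came from one distribution
        if energy_statistic(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    gen = random.Random(42)
    # Hypothetical 2-D trip features (e.g. mean speed, duration) for two seasons.
    summer = [[gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)] for _ in range(30)]
    winter = [[gen.gauss(1.5, 1.0), gen.gauss(0.0, 1.0)] for _ in range(30)]
    stat, p = az_permutation_test(summer, winter)
    print(f"phi = {stat:.4f}, permutation p-value = {p:.3f}")
```

A small p-value suggests that the two seasonal samples do not share one distribution, which is the kind of evidence that would argue against training on one season and testing on another. The permutation approach is used here because it needs no assumption about the statistic's null distribution; with few permutations, the smallest attainable p-value is 1 / (n_perm + 1).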

Language

  • English

Filing Info

  • Accession Number: 01760526
  • Record Type: Publication
  • Files: TRIS
  • Created Date: Dec 21 2020 10:04AM