Worldwide city transport typology prediction with sentence-BERT based supervised learning via Wikipedia
An overwhelming majority of the world’s human population lives in urban areas and cities. Understanding a city’s transportation typology is immensely valuable for planners and policy makers whose decisions can potentially impact millions of city residents. Despite the value of understanding a city’s typology, labeled data (a city and its typology) is scarce and spans at most a few hundred cities in the current transportation literature. To break this barrier, the authors propose a supervised machine learning approach to predict a city’s typology given the information in its Wikipedia page. The authors' method leverages recent breakthroughs in natural language processing, namely sentence-Bidirectional Encoder Representations from Transformers (BERT), and shows how the text-based information from Wikipedia can be effectively used as a data source for city typology prediction tasks that can be applied to over 2000 cities worldwide. The authors propose a novel method for low-dimensional city representation using a city’s Wikipedia page, which makes supervised learning of city typology labels tractable even with a few hundred labeled samples. These features are used with labeled city samples to train binary classifiers (logistic regression) for four different city typologies: (i) congestion, (ii) auto-heavy, (iii) transit-heavy, and (iv) bike-friendly cities, resulting in area under the receiver operating characteristic (ROC) curve (AUC) scores of 0.87, 0.86, 0.61, and 0.94, respectively. The authors' approach provides sufficient flexibility for incorporating additional variables in the city typology models and can be applied to study other city typologies as well. The authors' findings can assist a diverse group of stakeholders in transportation and urban planning fields, and open up new opportunities for using text-based information from Wikipedia (or similar platforms) as data sources in such fields.
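The classification and evaluation steps named in the abstract (a binary logistic regression per typology, scored by AUC) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional features below are synthetic stand-ins for the low-dimensional city representation derived from sentence-BERT, and the helper names (`fit_logistic`, `predict_proba`, `roc_auc`, `make_cities`) and training settings are assumptions chosen for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit a binary logistic regression by batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)


def make_cities(n):
    """Synthetic data: hypothetical 2-D city features with a binary typology label."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X_train, y_train = make_cities(200)  # "a few hundred" labeled cities
X_test, y_test = make_cities(100)    # held-out cities for evaluation
w, b = fit_logistic(X_train, y_train)
auc = roc_auc(y_test, predict_proba(X_test, w, b))
print(f"held-out AUC: {auc:.2f}")
```

In a real pipeline the synthetic features would be replaced by vectors computed from each city's Wikipedia text, and one such classifier would be trained per typology label.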
- Availability: Find a library where document is available. Order URL: http://worldcat.org/issn/0968090X
- Supplemental Notes: © 2022 Elsevier Ltd. All rights reserved. Abstract reprinted with permission of Elsevier.
- Authors: Rath, Srushti (ORCID 0000-0002-7603-339X); Chow, Joseph Y J (ORCID 0000-0002-6471-3419)
- Publication Date: 2022-6
Language
- English
Media Info
- Media Type: Web
- Features: Figures; References; Tables
- Pagination: 103661
- Serial: Transportation Research Part C: Emerging Technologies
- Volume: 139
- Issue Number: 0
- Publisher: Elsevier
- ISSN: 0968-090X
- Serial URL: http://www.sciencedirect.com/science/journal/0968090X
Subject/Index Terms
- TRT Terms: Artificial intelligence; Automobiles; Bicycling; City planning; Classification; Computer science; Language; Machine learning; Public transit; Traffic congestion; Urban transportation
- Subject Areas: Data and Information Technology; Highways; Pedestrians and Bicyclists; Planning and Forecasting; Public Transportation
Filing Info
- Accession Number: 01845032
- Record Type: Publication
- Files: TRIS
- Created Date: May 10 2022 2:35PM