Read the full article at https://www.tandfonline.com/doi/full/10.1080/10095020.2025.2493073
The rapid advancement of Generative Artificial Intelligence (GenAI) in 2023 has catalyzed transformative shifts across various industries, including urban transportation planning. This study evaluates the applicability of Large Language Models (LLMs) to transportation decision-making, focusing on two hypotheses: (H1) out-of-the-box LLMs exhibit basic transportation knowledge and reasoning capabilities, enabling them to design and execute analytical workflows; and (H2) larger-parameter and fine-tuned models demonstrate superior accuracy and contextual understanding, outperforming smaller and general-purpose models. Using a three-level evaluation framework, we assessed GPT-4 and Phi-3-mini on (1) geospatial skills, (2) domain-specific transportation knowledge, and (3) real-world transport problem-solving in congestion pricing scenarios. Results confirm that LLMs possess baseline geospatial and transportation reasoning abilities, but their effectiveness varies with task complexity. GPT-4 outperformed Phi-3-mini at all evaluation levels, achieving 86% accuracy in GIS tasks, 81% in MATSim comprehension, and 91% in real-world transport decision support, while Phi-3-mini scored 43–72%. These findings highlight the advantages of larger models in structured decision-making tasks and their potential as analytical copilots for transportation planners. The study contributes to the ongoing scientific debate on the role of GenAI in transportation governance, reinforcing the need for fine-tuning and retrieval-augmented generation (RAG) to enhance LLM performance in structured analytics. Future research should explore newer LLMs, transport-specific fine-tuning, and hybrid AI architectures to improve AI-driven transportation planning and decision support.
