Recent advancements in generative artificial intelligence (AI), particularly Large Language Models (LLMs), offer promising capabilities for spatial analysis. Despite this potential, the integration of generative AI with established GIS platforms remains underexplored. In this study, we propose a framework for integrating LLMs directly into existing GIS platforms, using the well-developed GIS software QGIS as an example. Our approach leverages the reasoning and programming capabilities of LLMs to autonomously generate spatial analysis workflows and code through an informed agent that has access to comprehensive documentation of key GIS tools and parameters. The framework also incorporates external tools such as GeoPandas to enhance the system’s geoprocessing capabilities. Implementing this framework resulted in a “GIS Copilot” that allows GIS users to interact with QGIS using natural language for spatial analysis. The GIS Copilot was evaluated with over 100 spatial analysis tasks at three complexity levels: basic tasks, which require one GIS tool and typically involve one data layer to perform simple operations; intermediate tasks, which involve multi-step processes with multiple tools, guided by user instructions; and advanced tasks, which involve multi-step processes that require multiple tools but are not guided by explicit instructions, requiring the agent to independently decide on and execute the necessary steps. The evaluation shows that the GIS Copilot has strong potential for automating foundational GIS operations, with a high success rate in tool selection and code generation for basic and intermediate tasks, while challenges remain in achieving full autonomy for more complex tasks. This study contributes to the emerging vision of autonomous GIS, providing a pathway for non-experts to engage with geospatial analysis with minimal prior expertise.
While full autonomy is yet to be achieved, the GIS Copilot demonstrates significant potential for simplifying GIS workflows and enhancing decision-making processes.
To learn more about this GIS Copilot:
- For more details about the design, implementation, and discussions, please check out our preprint paper.
- The source code for the GIS Copilot is available on GitHub at https://shorturl.at/vRcm6.
- The GIS Copilot can be downloaded from the official QGIS plugin page at https://plugins.qgis.org/plugins/SpatialAnalysisAgent-master and installed by following the instructions provided in the GIS Copilot User Manual.
- The data used for testing, along with the case studies, can be accessed through https://shorturl.at/bI4Ep.
Case studies
Generating contour lines from a Digital Elevation Model (DEM) dataset
In this case, the agent is tasked with generating contour lines from a Digital Elevation Model (DEM) dataset, a raster data layer that represents terrain elevation. This is a foundational spatial analysis task that is frequently used in geographic studies, such as topographical mapping, environmental modeling, and landscape visualization. The agent selected the tool gdal:contour from the available geoprocessing tools within the system. This tool is specifically designed to convert elevation data (raster) into contour lines (vector), based on user-defined intervals.
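As a rough illustration of what a call to gdal:contour can look like, the sketch below assembles a parameter dictionary for the QGIS Processing framework. It is not the code the agent generated: the file names, the 20-unit interval, the ELEV field name, and the helper function are assumptions chosen for the example.

```python
def build_contour_params(dem_path, output_path, interval=10.0):
    """Assemble parameters for the gdal:contour Processing algorithm."""
    if interval <= 0:
        raise ValueError("contour interval must be positive")
    return {
        "INPUT": dem_path,      # raster DEM layer or file path
        "BAND": 1,              # assumes elevation is stored in band 1
        "INTERVAL": interval,   # vertical spacing between contour lines
        "FIELD_NAME": "ELEV",   # attribute that stores each line's elevation
        "OUTPUT": output_path,  # vector file to create
    }


# Hypothetical file names, used only for illustration.
contour_params = build_contour_params("dem.tif", "contours.gpkg", interval=20.0)

# Inside the QGIS Python console the algorithm would then be run with:
#   import processing
#   processing.run("gdal:contour", contour_params)
```

Building the dictionary separately from the call makes it possible to inspect or validate the parameters before anything is executed in QGIS.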
Extracting land cover data for Pennsylvania from a larger dataset covering the contiguous US.
The input data includes land cover raster data from the National Land Cover Database (NLCD) and a shapefile containing the boundary of Pennsylvania. The agent accurately selected the appropriate tool (the GDAL Clip Raster by Mask Layer tool) and generated the executable code, configuring parameters such as the input layer, output path, and clipping settings. The agent was able to produce the clipped raster layer showing the land cover specific to Pennsylvania (Figure 7). The result provides a detailed view of various land cover categories within the state’s boundary.
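A minimal sketch of the kind of parameter configuration described above, assuming the Processing algorithm id gdal:cliprasterbymasklayer for the Clip Raster by Mask Layer tool. The file names and the helper function are hypothetical, and the code actually produced by the agent may differ.

```python
def build_clip_params(raster_path, mask_path, output_path, nodata=None):
    """Assemble parameters for clipping a raster to a polygon mask layer."""
    return {
        "INPUT": raster_path,     # raster to clip (here, the NLCD land cover)
        "MASK": mask_path,        # polygon layer defining the clip boundary
        "CROP_TO_CUTLINE": True,  # shrink the output extent to the mask
        "KEEP_RESOLUTION": True,  # keep the input raster's cell size
        "NODATA": nodata,         # value for cells outside the mask, if any
        "OUTPUT": output_path,    # clipped raster to create
    }


# Hypothetical file names, used only for illustration.
clip_params = build_clip_params("nlcd_conus.tif", "pennsylvania.shp", "nlcd_pa.tif")

# Inside the QGIS Python console the algorithm would then be run with:
#   import processing
#   processing.run("gdal:cliprasterbymasklayer", clip_params)
```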
Calculation of an obesity risk behavior index across all counties in the contiguous US.
The input consists of two data layers: a shapefile containing the boundaries of all counties, and a CSV file containing the rate of visits to different places such as convenience stores, limited-service restaurants, sport centers, fitness centers, and parks. Using the data understanding module, the agent was able to join the CSV attributes to the shapefile and select the fields needed for the analysis without any instruction from the user.
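The attribute join and field selection can be sketched with the Python standard library. The sample records, the GEOID key, and the column names are invented for illustration; the real dataset's fields are not listed here, and the index formula itself is not given in this summary, so it is left out. In GeoPandas the same step would typically be a merge on the shared key column.

```python
import csv
import io

# Hypothetical stand-in for the visit-rate CSV; column names are assumptions.
VISITS_CSV = """GEOID,convenience_store_rate,fitness_center_rate,park_rate
42001,0.31,0.05,0.12
42003,0.22,0.09,0.20
"""

# Hypothetical stand-in for the county attribute table from the shapefile.
counties = [
    {"GEOID": "42001", "NAME": "Adams"},
    {"GEOID": "42003", "NAME": "Allegheny"},
    {"GEOID": "42005", "NAME": "Armstrong"},  # no matching CSV row
]


def join_attributes(features, csv_text, key, fields):
    """Left-join selected CSV fields onto feature records by a shared key."""
    lookup = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    joined = []
    for feature in features:
        match = lookup.get(feature[key])
        record = dict(feature)
        for field in fields:
            # Unmatched counties keep the field, with None as its value.
            record[field] = float(match[field]) if match else None
        joined.append(record)
    return joined


rate_fields = ["convenience_store_rate", "fitness_center_rate", "park_rate"]
joined = join_attributes(counties, VISITS_CSV, "GEOID", rate_fields)
```

A left join keeps every county in the output, so counties missing from the CSV remain visible as gaps instead of silently disappearing from the map.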
Analyzing and visualizing the fast-food accessibility score for each county in Pennsylvania
This case involves analyzing and visualizing the fast-food accessibility score for each county in Pennsylvania and performing a correlation analysis between the fast-food accessibility score and the prevalence of obesity. The agent’s workflow began with a spatial join operation to calculate each county’s fast-food accessibility score, based on the number of fast-food restaurants per capita. This was followed by generating a thematic map that displays counties with higher accessibility scores in darker shades of blue. Next, the agent analyzed the correlation between county-level obesity rates and fast-food accessibility scores. The results of this analysis are shown in Figure 11, where a scatter plot with a regression line highlights the relationship between fast-food accessibility and obesity rates across the state. The agent successfully managed the multistep process: calculating accessibility scores, performing the correlation analysis, and visualizing the results in both a thematic map and a scatter plot. It should be noted that all the specific fields used in the analysis were automatically selected by the agent without explicit user guidance, which was enabled by the data understanding module.
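The two numerical steps, a per-capita score and a correlation coefficient, can be sketched as follows. This is an illustrative sketch only: the county labels, counts, populations, and obesity rates are made-up numbers, the restaurant counts are assumed to have already been produced by the spatial join, and scaling to restaurants per 10,000 residents is an assumption about how "per capita" might be expressed.

```python
import math


def accessibility_scores(restaurant_counts, populations, per=10_000):
    """Fast-food restaurants per `per` residents for each county."""
    return {
        county: restaurant_counts.get(county, 0) * per / populations[county]
        for county in populations
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for three hypothetical counties.
counts = {"A": 10, "B": 30, "C": 60}              # restaurants per county
population = {"A": 10_000, "B": 20_000, "C": 30_000}
obesity_rate = {"A": 28.0, "B": 33.0, "C": 34.0}  # percent of adults

scores = accessibility_scores(counts, population)
order = sorted(scores)  # fixed county order so the two series stay aligned
r = pearson_r([scores[c] for c in order], [obesity_rate[c] for c in order])
```

Keeping the two series aligned by an explicit county order avoids pairing one county's score with another county's obesity rate.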
Generating an interactive map using Leaflet to show the layer in HTML.
Click here to access the generated interactive map.
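One way such an HTML page could be produced is sketched below: a Python function embeds a GeoJSON layer into a page that loads Leaflet from a CDN. This is not the agent's generated output. The Leaflet version, the OpenStreetMap tile URL, the sample polygon, and the function name are assumptions, and the layer is assumed to be in WGS84 longitude/latitude coordinates, which is what Leaflet expects for GeoJSON.

```python
import html
import json

LEAFLET_VERSION = "1.9.4"  # assumed version; other 1.x releases should also work

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>__TITLE__</title>
<link rel="stylesheet" href="https://unpkg.com/leaflet@__VERSION__/dist/leaflet.css">
<script src="https://unpkg.com/leaflet@__VERSION__/dist/leaflet.js"></script>
<style>#map { height: 100vh; }</style>
</head>
<body>
<div id="map"></div>
<script>
var map = L.map("map");
L.tileLayer("https://tile.openstreetmap.org/{z}/{x}/{y}.png", {
  attribution: "&copy; OpenStreetMap contributors"
}).addTo(map);
var layer = L.geoJSON(__GEOJSON__).addTo(map);
map.fitBounds(layer.getBounds());
</script>
</body>
</html>
"""


def build_leaflet_html(geojson, title="Layer preview"):
    """Return a standalone HTML page that shows a GeoJSON layer with Leaflet."""
    # Escape "</" so the embedded JSON cannot close the <script> element early.
    data = json.dumps(geojson).replace("</", "<\\/")
    page = PAGE_TEMPLATE.replace("__VERSION__", LEAFLET_VERSION)
    page = page.replace("__TITLE__", html.escape(title))
    return page.replace("__GEOJSON__", data)  # last, so the data is left untouched


# Hypothetical one-polygon layer in longitude/latitude order.
sample_layer = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"name": "Sample area"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[-78.0, 40.5], [-77.5, 40.5], [-77.5, 41.0],
                             [-78.0, 41.0], [-78.0, 40.5]]],
        },
    }],
}

page_html = build_leaflet_html(sample_layer, title="Sample <layer>")
```

Writing `page_html` to a file such as `map.html` and opening it in a browser would display the layer over a base map, zoomed to the layer's extent.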