
GIS Copilot: Towards an Autonomous GIS Agent for Spatial Analysis

Recent advancements in generative artificial intelligence (AI), particularly Large Language Models (LLMs), offer promising capabilities for spatial analysis. Despite their potential, the integration of generative AI with established GIS platforms remains underexplored. In this study, we propose a framework for integrating LLMs directly into existing GIS platforms, using the well-developed GIS software QGIS as an example. Our approach leverages the reasoning and programming capabilities of LLMs to autonomously generate spatial analysis workflows and code through an informed agent equipped with comprehensive documentation of key GIS tools and their parameters. The framework also incorporates external tools such as GeoPandas to enhance the system’s geoprocessing capabilities. Implementing this framework resulted in a “GIS Copilot” that allows GIS users to interact with QGIS in natural language for spatial analysis. The GIS Copilot was evaluated on over 100 spatial analysis tasks across three complexity levels: basic tasks that require one GIS tool and typically involve one data layer to perform a simple operation; intermediate tasks that involve multi-step processes with multiple tools, guided by user instructions; and advanced tasks that involve multi-step processes requiring multiple tools but are not guided by explicit instructions, so the agent must independently decide on and execute the necessary steps. The evaluation reveals that the GIS Copilot demonstrates strong potential for automating foundational GIS operations, with a high success rate in tool selection and code generation for basic and intermediate tasks, while challenges remain in achieving full autonomy for more complex tasks. This study contributes to the emerging vision of autonomous GIS, providing a pathway for users with minimal GIS expertise to engage in geospatial analysis.
While full autonomy is yet to be achieved, the GIS Copilot demonstrates significant potential for simplifying GIS workflows and enhancing decision-making processes.

To learn more about this GIS Copilot:


Case studies

Generating contour lines from a Digital Elevation Model (DEM) dataset

In this case, the agent is tasked with generating contour lines from a Digital Elevation Model (DEM) dataset, a raster data layer that represents terrain elevation. This foundational spatial analysis task is common in geographic studies such as topographic mapping, environmental modeling, and landscape visualization. The agent selected the gdal:contour tool from the geoprocessing tools available in the system. This tool converts elevation data (raster) into contour lines (vector) at a user-defined interval.

Generating contour lines from DEM (Task: Generate contour lines from the DEM of Puerto Rico with a 50-meter interval.)
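A minimal sketch of what generated code for this step could look like, assuming it is run from the QGIS Python console. The helper names (`contour_params`, `contour_levels`, `run_contour`) and the file names in the usage line are hypothetical, and the parameter keys are our reading of the QGIS 3 `gdal:contour` algorithm; `processing.algorithmHelp("gdal:contour")` lists the exact keys for a given QGIS version. `contour_levels` only illustrates which elevations a fixed interval produces, assuming contours fall at multiples of the interval plus an optional offset.

```python
import math


def contour_params(dem_path, interval, output_path, band=1, field_name="ELEV"):
    """Build a parameter dict for the QGIS Processing algorithm gdal:contour."""
    return {
        "INPUT": dem_path,         # elevation raster (DEM)
        "BAND": band,              # band holding the elevation values
        "INTERVAL": interval,      # vertical spacing between contour lines
        "FIELD_NAME": field_name,  # attribute that stores each line's elevation
        "OUTPUT": output_path,     # vector output, e.g. a GeoPackage
    }


def contour_levels(min_elev, max_elev, interval, offset=0):
    """List the contour elevations a fixed interval yields within [min_elev, max_elev]."""
    k = math.ceil((min_elev - offset) / interval)  # first multiple at or above the minimum
    levels = []
    while k * interval + offset <= max_elev:
        levels.append(k * interval + offset)
        k += 1
    return levels


def run_contour(dem_path, output_path, interval=50):
    """Run gdal:contour; this only works inside a QGIS Python environment."""
    import processing  # provided by QGIS, not installable from PyPI
    return processing.run("gdal:contour", contour_params(dem_path, interval, output_path))
```

For the Puerto Rico task, a call such as `run_contour("puerto_rico_dem.tif", "pr_contours_50m.gpkg", interval=50)` (hypothetical file names) would write contour lines at 50-meter levels, with each line's elevation stored in the `ELEV` field.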

Extracting land cover data for Pennsylvania from a larger dataset covering the contiguous US

The input data include a land cover raster from the National Land Cover Database (NLCD) and a shapefile containing the boundary of Pennsylvania. The agent selected the appropriate tool (GDAL Clip Raster by Mask Layer) and generated executable code, configuring parameters such as the input layer, the mask layer, the output path, and the clipping settings. The agent produced a clipped raster layer showing the land cover within Pennsylvania (Figure 7), giving a detailed view of the land cover categories inside the state’s boundary.

Extracting land cover information for Pennsylvania (Task: Clip the land cover data of the USA to the Pennsylvania boundary.)
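The parameter configuration described above can be sketched in the same style. This assumes the QGIS Processing algorithm `gdal:cliprasterbymasklayer`, which backs Clip Raster by Mask Layer, with the parameter keys shown; the helper names and paths are hypothetical, and the keys should be confirmed with `processing.algorithmHelp` for your QGIS version. Cells outside the boundary polygon receive no-data, and `CROP_TO_CUTLINE` trims the output extent to the boundary.

```python
def clip_params(raster_path, mask_path, output_path):
    """Build a parameter dict for gdal:cliprasterbymasklayer (Clip Raster by Mask Layer)."""
    return {
        "INPUT": raster_path,     # e.g. the national land cover raster
        "MASK": mask_path,        # polygon layer holding the clipping boundary
        "CROP_TO_CUTLINE": True,  # shrink the output extent to the mask's extent
        "KEEP_RESOLUTION": True,  # preserve the input raster's cell size
        "OUTPUT": output_path,
    }


def run_clip(raster_path, mask_path, output_path):
    """Run the clip; this only works inside a QGIS Python environment."""
    import processing  # provided by QGIS, not installable from PyPI
    params = clip_params(raster_path, mask_path, output_path)
    return processing.run("gdal:cliprasterbymasklayer", params)
```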

Performing a detailed terrain analysis using a DEM

In this case, the agent was tasked with performing a detailed terrain analysis for Richland County, SC, by merging four DEM tiles obtained from the US Geological Survey (USGS) and the Shuttle Radar Topography Mission (SRTM) dataset (USGS, 2024). The generated geoprocessing workflow shows the sequence of steps used to complete the task. The workflow began by merging the four individual DEMs into a single elevation model covering the area of interest. The agent then calculated multiple terrain characteristics from the merged DEM: slope, aspect, hillshade, Terrain Ruggedness Index (TRI), and Topographic Position Index (TPI). The output of each step was displayed, with each terrain attribute shown as a separate map.

Richland County raster analysis (Task: Merge the four DEMs into a single raster and perform terrain characteristic analysis for Richland County, including slope, aspect, hillshade, Terrain Ruggedness Index (TRI), and Topographic Position Index (TPI).)
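To make the terrain attributes more concrete, the sketch below computes three of them for a single 3×3 window of elevations in plain Python. It uses common textbook definitions: Horn's finite-difference slope, the Riley et al. form of TRI (square root of the summed squared differences from the center cell), and TPI as the center elevation minus the mean of its eight neighbors. GDAL's terrain tools, which QGIS wraps, offer options that change these formulas (for example, alternative slope and TRI algorithms), and the source does not state which variants were used, so this is a conceptual illustration rather than a reproduction of the case study. The function names are hypothetical.

```python
import math


def _neighbors(w):
    """Return the 8 neighbors of the center cell of a 3x3 window (row-major order)."""
    return [w[r][c] for r in range(3) for c in range(3) if (r, c) != (1, 1)]


def tpi(w):
    """Topographic Position Index: center elevation minus the mean of its 8 neighbors."""
    n = _neighbors(w)
    return w[1][1] - sum(n) / len(n)


def tri_riley(w):
    """Terrain Ruggedness Index (Riley et al. form): root of summed squared differences."""
    z0 = w[1][1]
    return math.sqrt(sum((z - z0) ** 2 for z in _neighbors(w)))


def horn_slope_deg(w, cell_size):
    """Slope in degrees from Horn's weighted finite differences (cell_size in elevation units)."""
    (a, b, c), (d, _, f), (g, h, i) = w
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
```

Applying these functions to every interior cell of the merged DEM yields per-cell rasters of the kind the workflow mapped; a flat window gives zero for all three, and an isolated peak gives a large positive TPI.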

 

Calculation of an obesity risk behavior index across all counties in the contiguous US

The input consists of two data layers: a shapefile containing the boundaries of all counties and a CSV file containing the rates of visits to different types of places, such as convenience stores, limited-service restaurants, sports centers, fitness centers, and parks. Using the data understanding module, the agent joined the CSV attributes to the shapefile and selected the fields needed for the analysis without any instruction from the user.

County-level obesity risk behavior index analysis (Task: Generate an obesity risk behavior index for each county in the contiguous US by analyzing the rate of visits to unhealthy food retailers (such as convenience stores, alcoholic drinking places, and limited-service restaurants) and the visit rate to places that support physical activity (e.g., sports centers, parks, fitness centers). Visualize the results in a thematic map to highlight the obesity risk behavior index across counties.)
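The source does not give the index formula, so the following is only a hypothetical sketch of the two steps described: an attribute join of the CSV rows onto the county features by a shared key, followed by an index that subtracts the summed visit rates to activity-supporting places from the summed visit rates to unhealthy food retailers and min-max scales the result to the range 0 to 1. The key name `GEOID`, the field names, and the formula are all assumptions.

```python
def join_by_key(features, rows, key="GEOID"):
    """Attribute join: copy CSV columns onto county features sharing the same key."""
    lookup = {row[key]: row for row in rows}
    joined = []
    for feat in features:
        match = lookup.get(feat[key])
        if match is not None:  # inner join: drop counties with no visit data
            joined.append({**feat, **match})
    return joined


def risk_index(records, unhealthy_fields, active_fields):
    """Hypothetical index: unhealthy minus active visit rates, min-max scaled to 0-1."""
    raw = [
        sum(r[f] for f in unhealthy_fields) - sum(r[f] for f in active_fields)
        for r in records
    ]
    lo, hi = min(raw), max(raw)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all counties tie
    return [(v - lo) / span for v in raw]
```

In GeoPandas, the join step would typically be written as `counties.merge(visits, on="GEOID")`; plain dictionaries keep this sketch dependency-free. When reading the CSV, keeping the key as text preserves leading zeros in county FIPS codes.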

 

Analyzing and visualizing the fast-food accessibility score for each county in Pennsylvania

This task involves calculating and visualizing a fast-food accessibility score for each county in Pennsylvania and performing a correlation analysis between that score and the prevalence of obesity. The agent began by performing a spatial join and calculating each county’s fast-food accessibility score from the number of fast-food restaurants per capita. It then generated a thematic map in which counties with higher accessibility scores appear in darker shades of blue. Next, the agent analyzed the correlation between county-level obesity rates and fast-food accessibility scores. The results are shown in Figure 11, where a scatter plot with a regression line highlights the relationship between fast-food accessibility and obesity rates across the state. The agent successfully managed this multistep process: calculating the accessibility scores, performing the correlation analysis, and visualizing the results as both a thematic map and a scatter plot. Notably, all the specific fields used in the analysis were selected automatically by the agent through the data understanding module, without explicit user guidance.

Generate an interactive map using Leaflet to show the layer in HTML.
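One way to produce such a page is to write a small HTML file that loads Leaflet and adds the layer as GeoJSON. The sketch below assumes the layer has already been exported to GeoJSON in WGS 84 longitude/latitude coordinates (what Leaflet's `L.geoJSON` expects by default), loads Leaflet 1.9.4 from the unpkg CDN, and uses OpenStreetMap tiles. The function name and these choices are assumptions, not the GIS Copilot's actual output.

```python
import html
import json

LEAFLET_CDN = "https://unpkg.com/leaflet@1.9.4/dist"  # assumed Leaflet version and CDN


def leaflet_html(geojson, title="Result layer"):
    """Return a standalone HTML page that displays a GeoJSON layer with Leaflet."""
    # Caveat: GeoJSON property text containing "</script>" would need extra escaping.
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>{html.escape(title)}</title>
  <link rel="stylesheet" href="{LEAFLET_CDN}/leaflet.css">
  <script src="{LEAFLET_CDN}/leaflet.js"></script>
  <style>html, body, #map {{ height: 100%; margin: 0; }}</style>
</head>
<body>
  <div id="map"></div>
  <script>
    const map = L.map('map');
    L.tileLayer('https://tile.openstreetmap.org/{{z}}/{{x}}/{{y}}.png', {{
      attribution: '&copy; OpenStreetMap contributors'
    }}).addTo(map);
    // Add the exported layer and zoom the map to its extent.
    const layer = L.geoJSON({json.dumps(geojson)}).addTo(map);
    map.fitBounds(layer.getBounds());
  </script>
</body>
</html>"""
```

For example, `pathlib.Path("map.html").write_text(leaflet_html(data), encoding="utf-8")` writes a page that can be opened directly in a browser.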
