We implemented two critical modules of autonomous GIS in LLM-Geo, decision-making and data operating, to achieve three autonomous goals: self-generating, self-organizing, and self-executing. Additional modules are under development. The decision-making module uses an LLM (GPT-4 in this study) as its core, or ‘brain’, to generate a step-by-step solution workflow and develop the code for each step in order to address various spatial questions. The data operating module is a Python environment that executes the generated code for tasks such as spatial data loading, processing, visualization, and saving.
The following figure shows the overall workflow of how LLM-Geo answers questions. The process begins with the user entering the spatial question along with the associated data locations, such as online data URLs, REST (REpresentational State Transfer) services, and API (Application Programming Interface) documentation. The LLM then generates a solution graph similar to a geoprocessing workflow. Based on the solution graph, LLM-Geo sends the requirements of each operation node to the LLM and requests a code implementation. LLM-Geo then gathers all operation code implementations and asks the LLM to generate an assembly program that connects the operations according to the workflow. Finally, LLM-Geo executes the assembly program to produce the final answer.
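As a rough illustration of the solution graph idea described above, the sketch below represents a workflow as a dependency mapping and derives an order in which the operations could run, using Python's standard `graphlib` module. The node names and the `execution_order` helper are hypothetical and chosen only for this example; this is not LLM-Geo's actual implementation, which is available in the repository.

```python
from graphlib import TopologicalSorter

# Hypothetical solution graph: each key is an operation node and each value
# is the set of nodes it depends on. Names are illustrative assumptions,
# not taken from LLM-Geo.
solution_graph = {
    "load_covid_csv": set(),
    "load_county_boundaries": set(),
    "load_census_csv": set(),
    "compute_death_rate": {"load_covid_csv"},
    "compute_senior_rate": {"load_census_csv"},
    "join_tables": {
        "load_county_boundaries",
        "compute_death_rate",
        "compute_senior_rate",
    },
    "draw_map": {"join_tables"},
    "draw_scatter_plot": {"join_tables"},
}


def execution_order(graph):
    """Return the nodes so that every node comes after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(solution_graph)
print(order)
```

An assembly program could call the operations in this order, since each operation's inputs are guaranteed to be available by the time it runs. Several valid orders usually exist, so the printed list may differ between runs or Python versions.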
Access the LLM-Geo source code on GitHub at https://github.com/gladcolor/LLM-Geo


Case studies
These case studies are designed to demonstrate the concepts of autonomous GIS. Please use GPT-4; earlier GPT versions will fail to generate correct code and results. Because GPT-4 generates different output on every run, your results may look different. In our tests, the generated program does not succeed every time, but it runs successfully about 80% of the time. If you paste the generated prompts into the ChatGPT-4 chat box rather than sending them through the API, the success rate is much higher. We plan to improve the overall workflow of LLM-Geo; currently, we do not push the entire historical conversation (i.e., sufficient information) to the GPT-4 API.
COVID-19 death rate analysis and visualization at the US county level.
The spatial problem in this case is to investigate the spatial distribution of the COVID-19 death rate (the ratio of COVID-19 deaths to cases) and the association between the death rate and the proportion of senior residents (age >= 65) at the US county level. The death rate is derived from the accumulated COVID-19 data as of December 31, 2020, available from the New York Times (2023) and based on state and local health agency reports. The population data is extracted from the 2020 ACS five-year estimates (US Census Bureau 2022). The task asks for a map showing the county-level death rate distribution and a scatter plot showing the correlation and trend line of the death rate against the senior resident rate. We input the task (question) to LLM-Geo as:
Task:
1) Draw a map to show the death rate (death/case) of COVID-19 among the contiguous US counties. Use the accumulated COVID-19 data of 2020.12.31 to compute the death rate. Use scheme ='quantiles' when plotting the map. Set map projection to 'Conus Albers'. Set map size to 15*10 inches.
2) Draw a scatter plot to show the correlation and trend line of the death rate with the senior resident rate, including the r-square and p-value. Set data point transparency to 50%, regression line as red. Set figure size to 15*10 inches.
Data locations:
1) COVID-19 data case in 2020 (county-level): https://github.com/nytimes/covid-19-data/raw/master/us-counties-2020.csv. This data is for daily accumulated COVID cases and deaths for each county in the US. There are 5 columns: date (format: 2021-02-01), county, state, fips, cases, deaths.
2) Contiguous US county boundary (ESRI shapefile): https://github.com/gladcolor/spatial_data/raw/master/contiguous_counties.zip. The county FIPS column is 'GEOID'.
3) Census data (ACS2020): https://raw.githubusercontent.com/gladcolor/spatial_data/master/Demography/ACS2020_5year_county.csv. The needed columns are: 'FIPS', 'Total Population', 'Total Population: 65 to 74 Years', 'Total Population: 75 to 84 Years', 'Total Population: 85 Years and Over'.
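The two derived quantities in this task, the death rate (deaths/cases) and the senior resident rate (residents aged 65 and over divided by total population), can be sketched as follows. This is a minimal standard-library sketch, not the program GPT-4 generates. The inline rows are made-up values that only reuse the column names listed above, and the helper names `death_rates` and `senior_rates` are hypothetical. Padding FIPS codes to five digits is an assumption made so that the two tables can be joined.

```python
import csv
import io

# Made-up sample rows using the column names from the task description.
# The values are illustrative only and are NOT real COVID-19 or census data.
COVID_CSV = """date,county,state,fips,cases,deaths
2020-12-30,Alpha,XX,01001,900,18
2020-12-31,Alpha,XX,01001,1000,20
2020-12-31,Beta,XX,01003,500,5
"""

CENSUS_CSV = """FIPS,Total Population,Total Population: 65 to 74 Years,Total Population: 75 to 84 Years,Total Population: 85 Years and Over
01001,10000,900,400,200
01003,20000,1000,600,400
"""

SENIOR_COLUMNS = [
    "Total Population: 65 to 74 Years",
    "Total Population: 75 to 84 Years",
    "Total Population: 85 Years and Over",
]


def death_rates(covid_text, date="2020-12-31"):
    """Map county FIPS -> deaths/cases for the accumulated counts on `date`."""
    rates = {}
    for row in csv.DictReader(io.StringIO(covid_text)):
        # Skip other dates and rows with missing identifiers or counts.
        if row["date"] != date:
            continue
        if not (row["fips"] and row["cases"] and row["deaths"]):
            continue
        cases = float(row["cases"])
        if cases > 0:
            rates[row["fips"].zfill(5)] = float(row["deaths"]) / cases
    return rates


def senior_rates(census_text):
    """Map county FIPS -> share of residents aged 65 and over."""
    rates = {}
    for row in csv.DictReader(io.StringIO(census_text)):
        total = float(row["Total Population"])
        if total > 0:
            seniors = sum(float(row[col]) for col in SENIOR_COLUMNS)
            rates[row["FIPS"].zfill(5)] = seniors / total
    return rates


deaths = death_rates(COVID_CSV)
seniors = senior_rates(CENSUS_CSV)

# Pair the two rates for counties present in both tables.
paired = {f: (deaths[f], seniors[f]) for f in deaths if f in seniors}
print(paired)
```

The paired values are what the scatter plot and regression in the second part of the task would be built from. Joining them to the county boundaries on 'GEOID' would provide the input for the map.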
The results are: (a) solution graph, (b) county-level death rate map of the contiguous US, (c) scatter plot showing the association between the COVID-19 death rate and the senior resident rate at the county level, (d) assembly program.
Video demonstrations