The spatiotemporal index enables fast data retrieval when big climate data are analyzed with MapReduce. However, the index alone does not support flexible query analytics of climate data, and a new MapReduce job must be developed from scratch for each data analysis operation. Leveraging the index, we propose a query analytical framework that conducts complex climate data analysis with SQL-like queries while delivering high performance.
Query analytical framework
In this framework, massive climate datasets are abstracted as a pool of grids. A grid is a two-dimensional image in which each pixel holds the value of a specific climate variable at a specific spatial location and time. This grid-based abstraction provides an integrated spatiotemporal framework for managing, querying and processing big climate data. Building on it, we propose a novel Grid Transformation concept that views climate analysis from a new perspective: complex climate analysis can be conducted by applying a series of atomic grid transformations to a large grid pool abstracted from big climate data. These atomic grid transformations are implemented as user-defined functions (UDFs) that can be embedded in SQL-style queries, and the queries are executed in parallel as MapReduce jobs in a Hadoop high-performance computing environment.
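To make the grid abstraction and its MapReduce execution more concrete, here is a minimal in-memory sketch, assuming the grid pool is a dictionary keyed by (variable, time) and each grid is a small 2-D list of pixel values. It simulates one atomic transformation, a per-pixel (local) mean over time, as explicit map, shuffle and reduce steps. The names `mapper`, `reducer` and `run_local_mean`, the variable names and the data are hypothetical; this is not the framework's Hadoop UDF implementation.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for the grid pool: each grid is keyed by
# (variable, time) and holds a 2-D list of pixel values (rows x cols).
Grid = List[List[float]]
GridPool = Dict[Tuple[str, str], Grid]


def mapper(key: Tuple[str, str], grid: Grid) -> Iterator[Tuple[Tuple[str, int, int], float]]:
    """Map step: emit one record per pixel, keyed by (variable, row, col)."""
    variable, _time = key
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            yield (variable, r, c), value


def reducer(values: List[float]) -> float:
    """Reduce step: per-pixel mean across all time steps of one variable."""
    return sum(values) / len(values)


def run_local_mean(pool: GridPool, variable: str) -> Grid:
    """Simulate map -> shuffle -> reduce for one atomic grid transformation."""
    groups: Dict[Tuple[str, int, int], List[float]] = defaultdict(list)
    for key, grid in pool.items():
        if key[0] != variable:  # selection predicate, similar to a WHERE clause
            continue
        for out_key, value in mapper(key, grid):
            groups[out_key].append(value)  # shuffle: group records by pixel
    if not groups:
        raise ValueError(f"no grids for variable {variable!r}")
    rows = max(r for _, r, _ in groups) + 1
    cols = max(c for _, _, c in groups) + 1
    result = [[0.0] * cols for _ in range(rows)]
    for (_, r, c), values in groups.items():
        result[r][c] = reducer(values)
    return result


pool: GridPool = {
    ("temperature", "2000-01"): [[1.0, 2.0], [3.0, 4.0]],
    ("temperature", "2000-02"): [[3.0, 4.0], [5.0, 6.0]],
    ("precipitation", "2000-01"): [[9.0, 9.0], [9.0, 9.0]],
}
print(run_local_mean(pool, "temperature"))  # [[2.0, 3.0], [4.0, 5.0]]
```

Keying records by pixel is only one way to parallelize a local aggregate; the framework's actual partitioning and execution plan may differ.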
The following example illustrates how to build a query that derives local z-score (anomaly) maps from the MERRA Land data by chaining a series of grid transformations.
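As a rough illustration of the chaining idea, the sketch below computes local z-score maps on a small in-memory time series in plain Python: a per-pixel mean, then a per-pixel standard deviation, then z = (x - mean) / std for every grid. The helper names (`local_op`, `local_mean`, `local_std`, `zscore_maps`), the use of the population standard deviation and the rule that zero-variance pixels get a z-score of 0 are assumptions for this sketch, not the framework's UDFs or its MERRA Land processing.

```python
import math
from typing import Callable, List

# Hypothetical grid type: a 2-D list of pixel values for one time step.
Grid = List[List[float]]


def local_op(fn: Callable[..., float], *grids: Grid) -> Grid:
    """Atomic local transformation: combine co-located pixels of same-shaped grids."""
    return [[fn(*pixels) for pixels in zip(*rows)] for rows in zip(*grids)]


def local_mean(grids: List[Grid]) -> Grid:
    """Per-pixel mean across a time series of grids."""
    n = len(grids)
    return local_op(lambda *v: sum(v) / n, *grids)


def local_std(grids: List[Grid], mean: Grid) -> Grid:
    """Per-pixel population standard deviation across the time series."""
    n = len(grids)
    return local_op(lambda m, *v: math.sqrt(sum((x - m) ** 2 for x in v) / n), mean, *grids)


def zscore_maps(grids: List[Grid]) -> List[Grid]:
    """Chain the transformations: mean -> std -> (x - mean) / std for each grid."""
    mean = local_mean(grids)
    std = local_std(grids, mean)
    # Assumption of this sketch: pixels with zero variance get a z-score of 0.0.
    return [local_op(lambda x, m, s: (x - m) / s if s > 0 else 0.0, g, mean, std) for g in grids]


series = [[[1.0, 5.0]], [[3.0, 5.0]]]  # two 1x2 grids (hypothetical data)
print(zscore_maps(series))  # [[[-1.0, 0.0]], [[1.0, 0.0]]]
```

Each helper returns a new grid, so transformations compose the way nested function calls would inside an SQL-style query.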
Publication:
Li Z., Huang Q., Carbone G., Hu F. (2017). A High Performance Query Analytical Framework for Supporting Data-intensive Climate Studies. Computers, Environment and Urban Systems, 62(3), 210-221.