Spatial big data architecture: From Data Warehouses and Data Lakes to the LakeHouse
Published in: | Journal of parallel and distributed computing 2023-06, Vol.176, p.70-79 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Systems that support spatial data have attracted considerable interest in the past, owing to the richness of this type of data and its semantics, which can inform decision-making in various fields. The problem of integrating spatial data into existing databases and information systems has been addressed by creating spatial extensions to relational tables or by creating spatial data warehouses, while making data structures and query languages more spatially aware. With the advent of Big Data, these conventional storage and spatial representation structures are becoming increasingly outdated and require a new organization of spatial data. Approaches based on distributed storage and data lakes have been proposed to integrate the complexity of spatial data with operational and analytical systems, but these quickly showed their limits. Recently, the lakehouse concept was introduced to bring, among other things, reliability and ACID properties to the volume of data to be managed. This new data architecture combines governed and reliable Data Warehouses with flexible, scalable and cost-effective Data Lakes.
In this paper, we present how traditional approaches to spatial data management have quickly shown their limits in the context of spatial big data. We present a literature overview of these approaches and how they led to the Data LakeHouse. We detail how the LakeHouse paradigm can be used and extended for managing spatial big data, describing the components and best practices for building a spatial data LakeHouse architecture optimized for storage and computation over spatial big data.
• Limitations of Data Warehouses and Data Lakes for Spatial Big Data.
• Characteristics and Architecture of the Data LakeHouse.
• Overview of Three Major Open Source LakeHouse Systems.
• Challenges in Using Data LakeHouse for Spatial Big Data.
• Optimized Data LakeHouse Architecture for Spatial Big Data. |
---|---|
ISSN: | 0743-7315 1096-0848 |
DOI: | 10.1016/j.jpdc.2023.02.007 |