Lake Data Warehouse Architecture for Big Data Solutions
Date
2020
Journal Title
Journal ISSN
Volume Title
Type
Article
Publisher
SAI
Series Info
International Journal of Advanced Computer Science and Applications, Vol. 11, No. 8, 2020
Doi
Scientific Journal Rankings
Abstract
A traditional Data Warehouse is a multidimensional repository of
nonvolatile, subject-oriented, integrated, time-variant, and
non-operational data gathered from multiple heterogeneous data
sources. Traditional Data Warehouse architecture must be adapted to
deal with the new challenges imposed by the abundance of data and the
current big data characteristics: volume, value, variety, validity,
volatility, visualization, variability, and venue. The new
architecture also needs to address existing drawbacks in
availability, scalability, and, consequently, query performance. This
paper introduces a novel Data Warehouse architecture, named Lake Data
Warehouse Architecture, that equips the traditional Data Warehouse
with the capabilities to overcome these challenges. Lake Data
Warehouse Architecture merges the traditional Data Warehouse
architecture with big data technologies, such as the Hadoop framework
and Apache Spark, to provide a hybrid solution in a complementary
way. The main advantage of the proposed architecture is that it
combines the existing features of traditional Data Warehouses with
big data capabilities acquired by integrating the traditional Data
Warehouse with the Hadoop and Spark ecosystems. Furthermore, it is
tailored to handle a tremendous volume of data while maintaining
availability, reliability, and scalability.
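
To illustrate the kind of hybrid integration the abstract describes, the sketch below shows a minimal PySpark job that reads semi-structured JSON data from a Hadoop (HDFS) data lake, applies a light transformation, and writes the result both to a Parquet table in the lake and to a traditional relational Data Warehouse over JDBC. This is an illustrative sketch only, not the paper's implementation; all paths, table names, and connection settings are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark session; in a Hadoop deployment this would run on YARN or another
# cluster manager rather than locally.
spark = (
    SparkSession.builder
    .appName("lake-dw-hybrid-load")  # illustrative application name
    .getOrCreate()
)

# 1) Read semi-structured events from the data lake (hypothetical HDFS path).
events = spark.read.json("hdfs:///lake/raw/events/")

# 2) Light transformation: derive a partition date from an event timestamp
#    (column names are assumptions about the source schema).
curated = events.withColumn("event_date", F.to_date("event_ts"))

# 3) Keep the curated copy in the lake as partitioned Parquet,
#    preserving scalability for large volumes.
(curated.write
    .mode("append")
    .partitionBy("event_date")
    .parquet("hdfs:///lake/curated/events/"))

# 4) Load an aggregate into the traditional Data Warehouse via JDBC
#    (connection details are placeholders).
daily_counts = curated.groupBy("event_date").count()
(daily_counts.write
    .mode("append")
    .format("jdbc")
    .option("url", "jdbc:postgresql://dw-host:5432/dw")
    .option("dbtable", "analytics.daily_event_counts")
    .option("user", "dw_user")
    .option("password", "dw_password")
    .save())

spark.stop()
```

In this sketch the data lake retains the full, semi-structured history while the relational warehouse receives only conformed aggregates, which is one common way the two systems are used in a complementary fashion.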
Description
Scopus
Keywords
Spark, Hadoop, novel data warehouse architecture, unstructured data, semi-structured data, traditional data warehouse, big data