Lake Data Warehouse Architecture for Big Data Solutions

Date

2020

Journal Title

Journal ISSN

Volume Title

Type

Article

Publisher

SAI

Series Info

International Journal of Advanced Computer Science and Applications, Vol. 11, No. 8, 2020

Doi

Abstract

A traditional Data Warehouse is a multidimensional repository of nonvolatile, subject-oriented, integrated, time-variant, non-operational data gathered from multiple heterogeneous data sources. Traditional Data Warehouse architecture must be adapted to meet the new challenges imposed by the abundance of data and by current big data characteristics: volume, value, variety, validity, volatility, visualization, variability, and venue. The new architecture also needs to address existing drawbacks in availability, scalability, and, consequently, query performance. This paper introduces a novel Data Warehouse architecture, named Lake Data Warehouse Architecture, that equips the traditional Data Warehouse with the capabilities to overcome these challenges. Lake Data Warehouse Architecture merges the traditional Data Warehouse architecture with big data technologies, such as the Hadoop framework and Apache Spark, to provide a hybrid, complementary solution. Its main advantage is that it combines the existing features of traditional Data Warehouses with the big data features gained by integrating with the Hadoop and Spark ecosystems. Furthermore, it is tailored to handle a tremendous volume of data while maintaining availability, reliability, and scalability.
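The hybrid idea described in the abstract — structured data served by the traditional warehouse, while semi-structured and unstructured data land in a Hadoop/Spark-style lake — can be sketched as a toy ingestion router. This is an illustrative assumption about the architecture's routing step, not code from the paper; the function names and classification rules are hypothetical.

```python
# Hypothetical sketch of a Lake Data Warehouse ingestion router:
# structured rows go to the warehouse zone, semi-structured and
# unstructured payloads go to the lake zone (Hadoop/Spark side).
import json

def classify(record: str) -> str:
    """Classify a raw record as 'structured' (CSV-like row),
    'semi-structured' (JSON), or 'unstructured' (free text)."""
    try:
        json.loads(record)
        return "semi-structured"
    except (ValueError, TypeError):
        pass
    # Treat comma-delimited rows with several fields as structured.
    if record.count(",") >= 2:
        return "structured"
    return "unstructured"

def route(records):
    """Send each record to the warehouse zone or the lake zone."""
    zones = {"warehouse": [], "lake": []}
    for r in records:
        kind = classify(r)
        target = "warehouse" if kind == "structured" else "lake"
        zones[target].append((kind, r))
    return zones

zones = route([
    "101,Alice,2020-08-01",            # structured row -> warehouse
    '{"user": "bob", "clicks": 7}',    # semi-structured event -> lake
    "free-text server log line",       # unstructured text -> lake
])
print(len(zones["warehouse"]), len(zones["lake"]))  # 1 2
```

In a real deployment the warehouse zone would be a relational store and the lake zone would be HDFS or object storage queried through Spark; here both are plain lists so the routing logic stands alone.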

Description

Scopus

Keywords

October University for spark, Hadoop, novel data warehouse architecture, unstructured data, semi-structured data, traditional data warehouse, big data

Citation