Lake Data Warehouse Architecture for Big Data Solutions

Date

2020

Journal Title

Journal ISSN

Volume Title

Type

Article

Publisher

SAI

Series Info

International Journal of Advanced Computer Science and Applications, Vol. 11, No. 8, 2020

Doi

Abstract

A traditional Data Warehouse is a multidimensional repository of nonvolatile, subject-oriented, integrated, time-variant, non-operational data gathered from multiple heterogeneous data sources. Traditional Data Warehouse architecture must be adapted to meet the new challenges imposed by the abundance of data and by current big data characteristics: volume, value, variety, validity, volatility, visualization, variability, and venue. The new architecture also needs to address existing drawbacks in availability, scalability, and, consequently, query performance. This paper introduces a novel Data Warehouse architecture, named Lake Data Warehouse Architecture, that equips the traditional Data Warehouse with the capabilities to overcome these challenges. Lake Data Warehouse Architecture merges the traditional Data Warehouse architecture with big data technologies, such as the Hadoop framework and Apache Spark, to provide a hybrid, complementary solution. Its main advantage is that it combines the existing features of traditional Data Warehouses with the big data features gained by integrating with the Hadoop and Spark ecosystems. Furthermore, it is tailored to handle a tremendous volume of data while maintaining availability, reliability, and scalability.
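The hybrid idea described in the abstract — structured data served by the traditional warehouse, while semi-structured and unstructured data land in a Hadoop/Spark-style lake — can be sketched as a toy ingestion router. This is an illustrative assumption about the architecture's routing step, not code from the paper; the function names and classification rules are hypothetical.

```python
# Hypothetical sketch of a Lake Data Warehouse ingestion router:
# structured rows go to the warehouse zone, semi-structured and
# unstructured payloads go to the lake zone (Hadoop/Spark side).
import json

def classify(record: str) -> str:
    """Classify a raw record as 'structured' (CSV-like row),
    'semi-structured' (JSON), or 'unstructured' (free text)."""
    try:
        json.loads(record)
        return "semi-structured"
    except (ValueError, TypeError):
        pass
    # Treat comma-delimited rows with several fields as structured.
    if record.count(",") >= 2:
        return "structured"
    return "unstructured"

def route(records):
    """Send each record to the warehouse zone or the lake zone."""
    zones = {"warehouse": [], "lake": []}
    for r in records:
        kind = classify(r)
        target = "warehouse" if kind == "structured" else "lake"
        zones[target].append((kind, r))
    return zones

zones = route([
    "101,Alice,2020-08-01",            # structured row -> warehouse
    '{"user": "bob", "clicks": 7}',    # semi-structured event -> lake
    "free-text server log line",       # unstructured text -> lake
])
print(len(zones["warehouse"]), len(zones["lake"]))  # 1 2
```

In a real deployment the warehouse zone would be a relational store and the lake zone would be HDFS or object storage queried through Spark; here both are plain lists so the routing logic stands alone.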

Description

Scopus

Keywords

October University for spark, Hadoop, novel data warehouse architecture, unstructured data, semi-structured data, traditional data warehouse, big data

Citation