Lake Data Warehouse Architecture for Big Data Solutions

Authors: Saddad, Emad; Mokhtar, Hoda M. O.; El-Bastawissy, Ali; Hazman, Maryam
Dates: 2020-09-25; 2020
ISSN: 2158-107X
URI: http://repository.msa.edu.eg/xmlui/handle/123456789/3786
Source: Scopus
Language: en
Institution: October University for
Type: Article

Abstract

A traditional Data Warehouse is a multidimensional repository of nonvolatile, subject-oriented, integrated, time-variant, non-operational data gathered from multiple heterogeneous data sources. The traditional Data Warehouse architecture needs to be adapted to the new challenges imposed by the abundance of data and the current big data characteristics, including volume, value, variety, validity, volatility, visualization, variability, and venue. The new architecture also needs to address existing drawbacks in availability, scalability, and, consequently, query performance. This paper introduces a novel Data Warehouse architecture, named Lake Data Warehouse Architecture, that gives the traditional Data Warehouse the capabilities to overcome these challenges. Lake Data Warehouse Architecture merges the traditional Data Warehouse architecture with big data technologies, such as the Hadoop framework and Apache Spark, to provide a complementary hybrid solution. The main advantage of the proposed architecture is that it combines the existing features of traditional Data Warehouses with the big data features acquired by integrating the traditional Data Warehouse with the Hadoop and Spark ecosystems. Furthermore, it is tailored to handle a tremendous volume of data while maintaining availability, reliability, and scalability.

Keywords: Spark; Hadoop; novel data warehouses architecture; unstructured data; semi-structured data; traditional data warehouse; big data