Abstract:
Information security issues in public clouds are amplified by Big Data's unique security challenges, which originate from its storage of large volumes of data drawn from a wide variety of sources and structures. The Apache Hadoop (AH) framework is driving the Big Data paradigm because of its effectiveness in processing large datasets. AH is a typical Platform-as-a-Service cloud computing model, centered on the underlying Hadoop Distributed File System (HDFS). AH was originally designed to run in a well-controlled private computing environment. However, when AH operates in large clusters in a public cloud, its built-in security mechanisms are exposed to different types of threats. Motivated by this gap between AH's original design concept and its deployment environment, and because HDFS is a core component of AH, this paper identifies, exposes, and discusses security threats and vulnerabilities in public cloud-based HDFS. © 2019 IEEE.