Authors: MUHAMMED NUMAN İNCE, MELİH GÜNAY, JOSEPH LEDET
Abstract: In recent years, the need to work remotely, and with it the demand for remotely available computer-based systems, has grown substantially; this trend accelerated dramatically with the onset of the 2020 pandemic. To meet the resulting flood of computation and storage needs, locally produced data is often stored and processed in the cloud. Historically, HPC (high-performance computing) and big data frameworks have been used to store and process large volumes of data. Both HPC and Hadoop serve as solutions for analytical work, though the differences between them are not always obvious: both rely on parallel processing techniques, and both allow data to be stored in either a centralized or a distributed manner. Recent studies have therefore focused on hybrid approaches that combine the two technologies, and the gap between HPC and big data can be bridged by a layer of distributed computing machines. This paper is motivated by the need for a distributed computing framework that scales from SOC (system-on-chip) boards to desktop computers and servers. To this end, we propose a distributed computing environment for devices with heterogeneous architectures, in which clusters can be formed from resource-limited nodes and applications can run on top of them. The solution can be viewed as a minimalist hybrid of HPC and big data. Within the scope of this study, we not only detail the design of the proposed system but also implement its critical modules and subsystems as a proof of concept.
Keywords: Distributed and parallel computing, big data, high performance computing, distributed programming, resource management