In the age of the Internet of Things and social media platforms, huge amounts of digital data are generated by and collected from many sources, including sensors, mobile devices, wearable trackers and security cameras. These data, commonly referred to as big data, are challenging current storage, processing and analysis capabilities. New models, languages, systems and algorithms continue to be developed to effectively collect, store, analyze and learn from big data.
Programming Big Data Applications introduces and discusses models, programming frameworks and algorithms to process and analyze large amounts of data. In particular, the book provides an in-depth description of the properties and mechanisms of the main programming paradigms for big data analysis, including MapReduce, workflow, BSP, message passing, and SQL-like models. Through programming examples it also describes the most widely used frameworks for big data analysis, such as Hadoop, Spark, MPI, Hive and Storm. The systems are discussed and compared, highlighting their main features, their adoption (both within their community of developers and among users), and their main advantages and disadvantages in implementing big data analysis applications.
Sample Chapter(s)
Preface
Chapter 1: Introduction
Contents:
- Preface
- About the Authors
- Acknowledgments
- List of Figures
- List of Tables
- Introduction:
  - Motivation and Goals
  - Main Topics
  - Audience and Organization
  - Online Resources
- Big Data Concepts:
  - Big Data Principles and Features
  - Data Science Concepts
  - Big Data Storage
  - Scalable Data Analysis
  - Parallel Computing
  - Cloud Computing
  - Toward Exascale Computing
  - Parallel and Distributed Machine Learning
- Programming Models for Big Data:
  - Parallel Programming for Big Data Applications
  - The MapReduce Model
  - The Workflow Model
  - The Message-Passing Model
  - The BSP Model
  - The SQL-Like Model
  - The PGAS Model
  - Models for Exascale Systems
- Tools for Big Data Applications:
  - Introduction
  - MapReduce-based Programming Tools
  - Workflow-based Programming Tools
  - Message Passing-based Programming Tools
  - BSP-based Programming Tools
  - SQL-like Programming Tools
  - PGAS-based Programming Tools
- Comparing Programming Tools:
  - Introduction
  - Comparative Analysis of the System Features
  - Comparative Analysis through Application Examples
- Choosing the Right Framework to Tame Big Data:
  - The Input Data
  - The Application Class
  - The Infrastructure
  - Other Factors
- Supplementary Material
- Bibliography
- Index
Readership: Undergraduate and graduate students in computer science, computer engineering, data science, and data engineering. PhD students and researchers in computer science and engineering, and data science.
Domenico Talia is a professor of computer engineering at the University of Calabria and an honorary professor at Amity University. He is a Senior Associate Editor of ACM Computing Surveys, an Associate Editor of The Computer Journal, and a member of the editorial board of Future Generation Computer Systems, IEEE Transactions on Parallel and Distributed Systems, the International Journal of Web and Grid Services, the Journal of Cloud Computing, Big Data and Cognitive Computing, and the International Journal of Next-Generation Computing. His research interests include HPC, Big Data, machine learning, parallel and distributed data analysis, cloud computing, social media analysis, distributed knowledge discovery, peer-to-peer systems, and concurrent programming models. He has authored several books and more than 400 scientific papers.
Paolo Trunfio is an associate professor of computer engineering at the University of Calabria. In 2007 he was a visiting researcher at the Swedish Institute of Computer Science (SICS) in Stockholm. He currently serves as Associate Editor of the Journal of Big Data, IEEE Transactions on Cloud Computing, and ACM Computing Surveys, and is a member of the editorial board of several scientific journals including Future Generation Computer Systems, Big Data and Cognitive Computing, the International Journal of Web Information Systems, and the International Journal of Parallel, Emergent and Distributed Systems. His research interests include cloud computing, Big Data, social media analysis, parallel and distributed knowledge discovery, and peer-to-peer systems.
Fabrizio Marozzo is an assistant professor of computer engineering at the University of Calabria. He received a PhD in Systems and Computer Engineering at the same university. In 2011–2012 he visited the Barcelona Supercomputing Center for a research internship with the Grid Computer Research group in the Computer Sciences department. He sits on the editorial board of several journals, including IEEE Access; IEEE Transactions on Big Data; the Journal of Big Data; Big Data and Cognitive Computing; Algorithms; Frontiers in Big Data; Heliyon; and SN Computer Science. His research interests include big data analysis, social media analysis, high performance computing, cloud and edge computing, and machine learning.
Loris Belcastro is a researcher in computer engineering at the University of Calabria, Italy. He received a PhD in Information and Communication Engineering at the University of Calabria. In 2012 he held a scholarship at the Institute of High-Performance Computing and Networking of the Italian National Research Council (ICAR-CNR). He serves as guest editor for numerous journals, including Future Generation Computer Systems; the Journal of Big Data; Sensors; Algorithms; Applied Sciences; and Frontiers in Big Data. His research interests include cloud and edge computing, big data, social media analysis, and parallel and distributed data analysis.
Riccardo Cantini is a computer engineering researcher at the University of Calabria, Italy. He received a PhD in Information and Communication Technologies at the same university. In 2021–2022 he was a visiting researcher at the Barcelona Supercomputing Center, working with the Workflows and Distributed Computing group in the Computer Sciences department. His research interests include social media and big data analysis, machine and deep learning, natural language processing, opinion mining, topic detection, edge computing, and high-performance data analytics.
Alessio Orsino is currently pursuing a PhD in Information and Communication Technologies at the University of Calabria, Italy. In 2023 he was a visiting researcher at the Department of Computer Science and Technology of the University of Cambridge, collaborating with the Mobile Systems Research Lab. His research interests include big data analysis, parallel and distributed computing, high-performance data analytics, cloud and edge computing, and machine learning.