6.2.1 The Challenge of Massivizing IT Infrastructure


In previous modules, we studied many aspects of distributed systems. They can already satisfy diverse functional requirements subject to sophisticated non-functional requirements, employ advanced resource management and scheduling, provide programming models to application developers, and take care of many operational aspects through good architecture. It may seem we have reached the limits of this concept. And yet, a larger abstraction is emerging in the field. What happens when the components of a distributed system are themselves complex distributed systems? What happens when the scale and reach of distributed systems become very large, possibly even global? What happens when the workload is massive?

Introducing Massive Processing

Definition: Massive processing is any compute or data-processing workload that, compared with what a basic computer system can do, raises any or all of the following challenges (a code sketch after the list makes the classification concrete):

  1. Volume: too much data to compute over in a reasonable amount of time;
  2. Velocity: data arrives too fast and/or results are needed too quickly;
  3. Variety: workloads are too dissimilar, and serving one well, with good client-oriented metrics, leaves too many resources unused;
  4. Vicissitude: an arbitrary combination of the Vs from the previous points, occurring at arbitrary moments.
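
To make this definition concrete, below is a minimal Python sketch that classifies a workload by the four Vs. Everything in it is an assumption made for illustration: the Workload fields, the thresholds standing in for what a basic computer system can do, and the simplification of vicissitude to the mere co-occurrence of several Vs (the definition's arbitrary-moments aspect is not modeled).

    from dataclasses import dataclass

    @dataclass
    class Workload:
        """Hypothetical workload profile; fields, units, and values are illustrative."""
        data_volume_tb: float     # total data to process, in terabytes
        arrival_rate_mbps: float  # input arrival rate, in megabits per second
        deadline_s: float         # time budget for producing results, in seconds
        distinct_job_types: int   # heterogeneity of the workload mix

    # Assumed capacity of a "basic computer system"; real thresholds depend
    # entirely on the reference system.
    BASIC_VOLUME_TB = 1.0
    BASIC_RATE_MBPS = 100.0
    BASIC_DEADLINE_S = 3600.0
    BASIC_JOB_TYPES = 3

    def v_challenges(w: Workload) -> set[str]:
        """Return which of the four Vs a workload raises."""
        vs = set()
        if w.data_volume_tb > BASIC_VOLUME_TB:
            vs.add("volume")       # too much data for a timely answer
        if w.arrival_rate_mbps > BASIC_RATE_MBPS or w.deadline_s < BASIC_DEADLINE_S:
            vs.add("velocity")     # data too fast, or results needed too soon
        if w.distinct_job_types > BASIC_JOB_TYPES:
            vs.add("variety")      # too many dissimilar workloads to serve well
        if len(vs) >= 2:
            vs.add("vicissitude")  # simplified: several Vs occurring together
        return vs

    # Example: a hypothetical telescope pipeline with much data and a tight deadline.
    print(v_challenges(Workload(500.0, 10_000.0, 60.0, 2)))
    # {'volume', 'velocity', 'vicissitude'}  (a set; element order may vary)

The thresholds are where such a sketch would be tuned in practice: the same workload can be massive for one reference system and routine for another.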

Massive processing in practice: Although the definition lists many undesirable workload properties, such workloads appear commonly in practice. They appear especially in application domains that include (restated as a queryable mapping after the list):

  1. Big Science and eScience, where massive scientific endeavors, structured as large-scale research projects with large teams, budgets, and apparatus, conduct science so fully integrated with IT that the science could not function without the IT infrastructure. Such projects, which include the Square Kilometre Array we discussed in the introduction of the module, the CERN experiments, and others, tend to raise the vicissitude challenge.
  2. Democratic Science, where projects aim to provide cheap access to science-grade virtual laboratories to many in the population; such projects can raise volume or variety challenges.
  3. Online Gaming, where a massive number of players spend many hours interacting with the system, raises velocity challenges.
  4. Big Data processing, which automates data-driven decision-making, raises challenges of volume and velocity, and possibly also of variety.
  5. Artificial Intelligence and, in particular, Machine Learning workloads, which automate decision-making and learning at scale, raise volume and velocity challenges.
  6. Serverless computing, in particular function-based computing (i.e., FaaS platforms), can raise velocity and variety challenges.
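
The list above can also be read as a lookup table. The following sketch, with abbreviated domain labels chosen here for brevity, restates the mapping from application domain to the V-challenges named in this section and shows a reverse query; it is purely a restructuring of the text, not additional data.

    # Application domains and the V-challenges this section associates with them.
    DOMAIN_CHALLENGES = {
        "Big Science / eScience": {"vicissitude"},
        "Democratic Science":     {"volume", "variety"},
        "Online Gaming":          {"velocity"},
        "Big Data processing":    {"volume", "velocity", "variety"},  # variety: "possibly", per the text
        "AI / ML workloads":      {"volume", "velocity"},
        "Serverless (FaaS)":      {"velocity", "variety"},
    }

    def domains_raising(challenge: str) -> list[str]:
        """All domains whose workloads tend to raise the given challenge."""
        return [d for d, vs in DOMAIN_CHALLENGES.items() if challenge in vs]

    print(domains_raising("velocity"))
    # ['Online Gaming', 'Big Data processing', 'AI / ML workloads', 'Serverless (FaaS)']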


Enhanced or New Challenges Associated with Massive Processing

In a manifesto published in 2022 [1], the computer systems research community in the Netherlands identified four main challenges that emerge when society depends on massive processing. Paraphrasing:

  1. Manageability: New forms (designs) of distributed systems are emerging at a Cambrian (very rapid) pace, and they need to integrate and interoperate seamlessly. How can we tame the ever-growing complexity, and the related human error, in distributed computer systems and networks?
  2. Responsibility: As society becomes increasingly digitalized, our expectations of how distributed systems behave in practice keep increasing and becoming more sophisticated, for example, in terms of performance, availability, durability, ethics, privacy, security, and other perspectives. How can we realize responsible distributed computing infrastructure, whose operation we can rely on?
  3. Sustainability: Distributed IT infrastructure supporting massive processing already contributes significantly to our resource consumption and the related climate impact. How can we reduce, and make the most of, the energy footprint of (ever-increasing levels of) distributed computing?
  4. Usability: Almost every aspect of our society, and especially its key business, science, engineering, and governance processes, is becoming increasingly dependent on distributed IT infrastructure. How can we make distributed systems usable by and accessible to all? How can we enable emerging workloads that depend on distributed systems, such as AI and in particular ML, and bootstrap others, such as quantum computing?

The challenges of generality and responsibility, introduced in Module 1 (Section 1.1.2), become acute for massive processing. At this scale, we do not have enough resources to replicate the infrastructure for each kind of project, so sharing across projects, and among teams working on similar projects, becomes essential.

References:

[1] Alexandru Iosup, Fernando Kuipers, Ana Lucia Varbanescu, Paola Grosso, Animesh Trivedi, Jan S. Rellermeyer, Lin Wang, Alexandru Uta, Francesco Regazzoni, and the CompSysNL Community (2022). Future Computer Systems and Networking Research in the Netherlands: A Manifesto. CoRR abs/2206.03259.

Modern Distributed Systems by TU Delft OpenCourseWare is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://online-learning.tudelft.nl/courses/modern-distributed-systems/