1.2.2 Shared State and Network Effects
In traditional single-machine programming, we can build highly concurrent applications because modern systems have many compute cores. We leverage these resources through threads. Communication between threads in such programs is often implicit: because the threads belong to the same process, they share a single address space, and because they run on a single machine, they can all access the same physical memory. The main challenge is managing concurrent access and avoiding race conditions, which we do through thread synchronization, locking, or hardware support for atomic operations.
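The shared-memory model described above can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function name `run_counter` and its parameters are made up for this example. Several threads update one variable in their common address space, and a lock protects the read-modify-write step against race conditions.

```python
import threading


def run_counter(num_threads: int = 4, increments: int = 10_000) -> int:
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Without the lock, two threads could read the same old value
            # and one of the increments would be lost (a race condition).
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_counter())
```

Note that no thread ever sends anything to another thread: the communication happens implicitly through the shared variable `counter`.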
Distributed systems are fundamentally different from single machines because they lack shared memory. Since in most situations we still want the machines to work in a coordinated way and collaborate on a shared problem, we need explicit communication. In essence, we have to send messages through the network in every place where a concurrent program on a single machine would access shared memory.
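To make the contrast concrete, here is a minimal sketch of the same kind of counter built on explicit messages instead of shared memory. It only simulates two machines inside one process: in-memory queues stand in for the network, and the names `counter_node` and `run_message_passing` as well as the message strings are assumptions made for this example.

```python
import queue
import threading


def counter_node(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Owns the counter privately; it can only be changed or read by message."""
    counter = 0
    while True:
        msg = inbox.get()
        if msg == "increment":
            counter += 1
        elif msg == "read":
            outbox.put(counter)  # the reply is itself a message
        elif msg == "stop":
            break


def run_message_passing(increments: int = 5) -> int:
    """Drive the counter node purely through request and response messages."""
    inbox: queue.Queue = queue.Queue()   # stands in for the network link to the node
    outbox: queue.Queue = queue.Queue()  # stands in for the link back to the client
    node = threading.Thread(target=counter_node, args=(inbox, outbox))
    node.start()
    for _ in range(increments):
        inbox.put("increment")
    inbox.put("read")
    result = outbox.get(timeout=1.0)
    inbox.put("stop")
    node.join()
    return result


if __name__ == "__main__":
    print(run_message_passing())
```

Unlike a real network, these queues never lose or delay messages, so the sketch shows only the programming model and none of the failure modes.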
Traditional networks have distinct disadvantages compared to the implicit communication on a single machine. Let us consider the following abstract example of network communication:
In this picture, we see a sender sending a request to a receiver. The receiver processes the request and sends back a response. If the network works flawlessly, this is a successful communication. Under certain circumstances, however, the request can get lost or experience long delays. Because of this, the receiver might want to send an acknowledgement immediately so that the sender knows that its request was received. The acknowledgement, however, is again a message, which can itself be lost or delayed. The same applies to the response message and to a potential acknowledgement that lets the receiver know that the response was successfully received.
No matter how we design the protocol, we have to deal with the asynchronicity of the physical network and the potential for message loss, e.g., because router queues fill up, at which point messages get dropped. Furthermore, since both machines are independent of each other, they can also fail independently, or the network in between could have an outage. In essence, the sender of a message in an asynchronous network faces an unbounded period of uncertainty because it cannot distinguish between a lost message and a long-delayed message.
The practical solution to this problem is to implement a timeout, after which we declare a delayed message to be lost. After a timeout, the sender typically resends the message, even though the original may have arrived after all. This creates issues when the operation triggered by the message is not idempotent because, in this case, we have to prevent it from being triggered more than once. In real protocols, we might do that by using message IDs to filter out duplicates.
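The combination of timeouts, retries, and duplicate filtering can be sketched as follows. This is a simplified simulation under stated assumptions, not a real protocol: `Receiver`, `LossyNetwork`, and `send_with_retry` are hypothetical names, the "network" is a plain method call that drops a configurable number of responses, and a return value of `None` stands in for a timeout on the sender side. The operation (adding to a balance) is deliberately not idempotent.

```python
import uuid
from typing import Dict, Optional


class Receiver:
    """Applies each request at most once by remembering the message IDs it has seen."""

    def __init__(self) -> None:
        self.balance = 0
        self.seen: Dict[str, int] = {}  # message ID -> response already sent

    def handle(self, msg_id: str, amount: int) -> int:
        if msg_id in self.seen:
            # Duplicate request: do not apply it again, just repeat the old response.
            return self.seen[msg_id]
        self.balance += amount  # non-idempotent operation
        self.seen[msg_id] = self.balance
        return self.balance


class LossyNetwork:
    """Hypothetical network that delivers requests but drops the first few responses."""

    def __init__(self, receiver: Receiver, drop_responses: int) -> None:
        self.receiver = receiver
        self.drop_responses = drop_responses

    def call(self, msg_id: str, amount: int) -> Optional[int]:
        response = self.receiver.handle(msg_id, amount)  # the request arrives
        if self.drop_responses > 0:
            self.drop_responses -= 1
            return None  # the response is lost; the sender only sees a timeout
        return response


def send_with_retry(network: LossyNetwork, amount: int, max_attempts: int = 3) -> int:
    """Resend the same request (same message ID) until a response arrives."""
    msg_id = str(uuid.uuid4())  # chosen once and reused for every retry
    for _ in range(max_attempts):
        response = network.call(msg_id, amount)
        if response is not None:
            return response
    raise TimeoutError(f"no response after {max_attempts} attempts")


if __name__ == "__main__":
    receiver = Receiver()
    network = LossyNetwork(receiver, drop_responses=2)
    print(send_with_retry(network, 10), receiver.balance)
```

In this run the request reaches the receiver three times, but the amount is added only once because the second and third deliveries carry an ID the receiver has already seen. A real implementation would also have to bound the size of the `seen` table, which this sketch ignores.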
Even with these measures, a distributed architecture brings additional challenges. In contrast to shared memory, distributed memory can consist of multiple replicas of the same data. This potentially creates a consistency problem, because updates are propagated through messages and messages can get lost.
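A small simulation shows how a single lost update message makes replicas diverge. The names `Replica`, `replicate`, and `is_consistent`, and the `lost_for` parameter that selects which replicas miss an update, are assumptions introduced purely for this illustration.

```python
from typing import Dict, FrozenSet, List


class Replica:
    """Holds one copy of the data and applies the updates it receives."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}

    def apply(self, key: str, value: int) -> None:
        self.data[key] = value


def replicate(replicas: List[Replica], key: str, value: int,
              lost_for: FrozenSet[int] = frozenset()) -> None:
    """Send an update to every replica; indices in `lost_for` never receive it."""
    for index, replica in enumerate(replicas):
        if index in lost_for:
            continue  # simulated message loss
        replica.apply(key, value)


def is_consistent(replicas: List[Replica]) -> bool:
    """True if all replicas currently hold exactly the same data."""
    return all(r.data == replicas[0].data for r in replicas)


if __name__ == "__main__":
    replicas = [Replica(), Replica()]
    replicate(replicas, "x", 1)
    print(is_consistent(replicas))
    replicate(replicas, "x", 2, lost_for=frozenset({1}))
    print(is_consistent(replicas), [r.data for r in replicas])
```

After the second update, a client reading from the second replica still sees the old value, which is exactly the consistency problem described above.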
From these considerations, you can see that a distributed system is never a goal in itself. It is a deliberate choice to achieve properties that are difficult or impossible to achieve with a single machine. Examples are scalability for very large problems, or resilience through redundancy, for which we need multiple instances. By nature, single centralized systems have simpler failure models because they tend to fail atomically (all or nothing), they have lower latency because no network is involved, and they are easier to program because we have shared memory. So you never want a distributed system, but you often need one. This is why learning about the different design considerations and their implications in this course is so valuable.
Modern Distributed Systems by TU Delft OpenCourseWare is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://online-learning.tudelft.nl/courses/modern-distributed-systems/