Design Issues in Distributed Computing

 

We will briefly look at some of the key design issues that designers of distributed systems consider when building a distributed operating system.

Transparency

Probably the single most important issue is how to achieve a single-system image. Transparency can be achieved at two different levels: the distribution can be hidden from the users, or, at a lower level, the system can be made to look transparent to programs.

• The easiest level to achieve is hiding the distribution from the users. For example, when a UNIX user types make to recompile a large number of files in a directory, he need not be told that all the compilations are proceeding in parallel on different machines and are using a variety of file servers to do it. In terms of commands issued from the terminal and results displayed on the terminal, the distributed system can be made to look just like a single-processor system.

The concept of transparency can be applied to several aspects of a distributed system:

  • Location transparency refers to the fact that in a true distributed system, users cannot tell where hardware and software resources such as CPUs, printers, files, and databases are located. The name of a resource must not secretly encode its location.

  • Migration transparency means that resources must be free to move from one location to another without having their names change.

  • Replication transparency means the operating system is free to make additional copies of files and other resources on its own without the users noticing. Consider a collection of n servers logically connected to form a ring. Each server maintains the entire directory tree structure but holds only a subset of the files themselves. To read a file, a client sends a message containing the full path name to any of the servers. That server checks to see if it has the file. If so, it returns the data requested. If not, it forwards the request to the next server in the ring, which then repeats the algorithm. In this system, the servers can decide by themselves to replicate any file on any or all servers, without the users having to know about it. Such a scheme is replication transparent because it allows the system to make copies of heavily used files without the users even being aware that this is happening. (A sketch of this forwarding algorithm appears after this list.)

  • Concurrency transparency addresses the fact that distributed systems usually have multiple, independent users. What should the system do when two or more users try to access the same resource at the same time? For example, what happens if two users try to update the same file at once? If the system is concurrency transparent, the users will not notice the existence of other users. One mechanism for achieving this form of transparency would be for the system to lock a resource automatically once someone had started to use it, unlocking it only when the access was finished (see the second sketch after this list).

  • Parallelism transparency concerns the fact that a distributed system is supposed to appear to the users as a traditional, uniprocessor timesharing system. But what happens if a programmer knows that his distributed system has 1000 CPUs and wants to use a substantial fraction of them for a parallel application program? Ideally the system would exploit the parallelism by itself, but in current practice, programmers who actually want to use multiple CPUs for a single problem have to program this explicitly.
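To make the ring algorithm above concrete, here is a minimal Python sketch. The names (Server, read_file) and the in-memory setup are illustrative, not from the text; a real system would forward the request over the network rather than through a method call.

    # Sketch of the ring-of-servers lookup: each server holds a subset of
    # the files; unknown paths are forwarded to the next server in the ring.
    class Server:
        def __init__(self, files):
            self.files = files          # {path: data} held by this server
            self.next = None            # next server in the logical ring

        def read_file(self, path, origin=None):
            if origin is self:          # request went all the way around
                raise FileNotFoundError(path)
            if path in self.files:      # we hold a copy: serve it directly
                return self.files[path]
            # Not here: forward the request around the ring.
            return self.next.read_file(path, origin if origin else self)

    # Three servers; /etc/motd has been replicated onto two of them.
    s1 = Server({"/etc/motd": b"hello"})
    s2 = Server({"/home/a/notes": b"todo"})
    s3 = Server({"/etc/motd": b"hello"})
    s1.next, s2.next, s3.next = s2, s3, s1

    # A client may contact any server; where the copies live is invisible.
    assert s2.read_file("/etc/motd") == b"hello"

The client's view is identical whether a file has one copy or many, which is exactly the replication transparency described above.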
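The automatic locking mechanism just mentioned can also be sketched briefly. This uses thread locks in place of a distributed lock manager purely for illustration; the names are made up.

    # Concurrency transparency sketch: a resource is locked on first use
    # and unlocked when the access finishes, so users never interfere.
    import threading
    from collections import defaultdict
    from contextlib import contextmanager

    _locks = defaultdict(threading.Lock)    # one lock per resource name

    @contextmanager
    def access(resource):
        lock = _locks[resource]
        lock.acquire()                      # a second user blocks here
        try:
            yield                           # the access happens here
        finally:
            lock.release()                  # unlock when access is finished

    def update(user):
        with access("/shared/report.txt"):
            pass   # both users' read-modify-write cycles are serialized

    t1 = threading.Thread(target=update, args=("alice",))
    t2 = threading.Thread(target=update, args=("bob",))
    t1.start(); t2.start(); t1.join(); t2.join()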

Flexibility

The second key design issue is flexibility: it is important that the system be easy to change. Two types of kernel structure are used in distributed systems:

  • Monolithic kernel: a traditional kernel that provides most services itself.

  • Microkernel: the kernel provides as few services as possible, with most operating system services available from user-level servers.

Monolithic Kernel

The monolithic kernel is basically today's centralized operating system augmented with networking facilities and the integration of remote services. Most system calls are made by trapping to the kernel, having the work performed there, and having the kernel return the desired result to the user process. With this approach, most machines have disks and manage their own local file systems. Many distributed systems that are extensions or imitations of UNIX use this approach because UNIX itself has a large, monolithic kernel.

The only potential advantage of the monolithic kernel is performance. Trapping to the kernel and doing everything there may well be faster than sending messages to remote servers.

Microkernel

The microkernel is more flexible because it does almost nothing. It basically provides just four minimal services:

  • An interprocess communication mechanism.

  • Some memory management.

  • A small amount of low-level process management and scheduling.

  • Low-level input/output.

 

Unlike the monolithic kernel, the microkernel does not provide the file system, directory system, full process management, or much system call handling. The services that the microkernel does provide are included because they are difficult or expensive to provide anywhere else. The goal is to keep the kernel small.

All the other operating system services are generally implemented as user-level servers. To look up a name, read a file, or obtain some other service, the user sends a message to the appropriate server, which then does the work and returns the result, as in the sketch below.
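As a rough illustration of this division of labor, the toy sketch below models the kernel as little more than a message router, with the file system living in an ordinary user-level server. All names here are hypothetical.

    # Microkernel sketch: the kernel only delivers messages (IPC); the
    # file system is a user-level server the kernel knows nothing about.
    class Microkernel:
        def __init__(self):
            self.servers = {}               # registered user-level servers

        def register(self, name, server):
            self.servers[name] = server

        def send(self, server_name, message):
            # IPC primitive: deliver a request, return the server's reply.
            return self.servers[server_name].handle(message)

    class FileServer:
        def __init__(self):
            self.files = {"/etc/motd": b"hello"}

        def handle(self, message):
            op, path = message
            if op == "read":
                return self.files.get(path)

    kernel = Microkernel()
    kernel.register("fs", FileServer())

    # A client reads a file by messaging the file server via the kernel.
    assert kernel.send("fs", ("read", "/etc/motd")) == b"hello"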

Advantages of Microkernel

  1. It is highly modular: there is a well-defined interface to each service, and every service is equally accessible to every client, independent of location.
  2. It is easy to implement, install, and debug new services, since adding or changing a service does not require stopping the system and booting a new kernel, as is the case with a monolithic kernel.
  3. It is precisely this ability to add, delete, and modify services that gives the microkernel its flexibility.

 

Reliability

One of the original goals of building distributed systems was to make them more reliable than single-processor systems. The idea is that if a machine goes down, some other machine takes over the job. It is important to distinguish various aspects of reliability.

 

Availability:

Availability refers to the fraction of time that the system is usable. It can be enhanced by a design that does not require the simultaneous functioning of a substantial number of critical components. Another tool for improving availability is redundancy: key pieces of hardware and software should be replicated, so that if one of them fails the others will be able to take up the load.

A highly reliable system must be highly available, but that alone is not enough: if files are stored redundantly on multiple servers, all the copies must be kept consistent. In general, the more copies that are kept, the better the availability, but the greater the chance that they will be inconsistent, especially if updates are frequent.
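As a back-of-the-envelope illustration (the numbers are made up, and independent failures are assumed), if each server is up a fraction a of the time, a file replicated on n servers is readable unless all n are down at once:

    # Availability of a file replicated on n independent servers.
    def replicated_availability(a, n):
        return 1 - (1 - a) ** n

    for n in (1, 2, 3):
        print(n, replicated_availability(0.99, n))
    # 1 copy   -> ~0.99      (roughly 3.7 days of downtime per year)
    # 2 copies -> ~0.9999
    # 3 copies -> ~0.999999

Each extra copy improves availability, but, as noted above, each extra copy also raises the cost of keeping the replicas consistent.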

Security:

Files and other resources must be protected from unauthorized usage. Although the same issue occurs in single-processor systems, in distributed systems it is more severe.

In a single-processor system, the user logs in and is authenticated. From then on, the system knows who the user is and can check whether each attempted access is legal.

In a distributed system, when a message comes into a server asking for something, the server has no simple way of determining who it is from. No name or identification field in the message can be trusted, since the sender may be lying. At the very least, considerable care is required here.
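One common remedy, given here only as an illustrative sketch and not as a method prescribed by the text, is for the sender to tag each message with a keyed hash computed from a shared secret, so the server can check that the identity field was written by someone who actually holds that user's key:

    # Message authentication sketch using HMAC over a shared secret.
    import hmac, hashlib

    KEYS = {"alice": b"alice-secret"}      # secrets shared with the server

    def sign(user, payload):
        tag = hmac.new(KEYS[user], payload, hashlib.sha256).hexdigest()
        return (user, payload, tag)

    def verify(message):
        user, payload, tag = message
        good = hmac.new(KEYS[user], payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(tag, good)

    assert verify(sign("alice", b"read /home/alice/mail"))
    # A forged message that merely claims to be from alice is rejected:
    assert not verify(("alice", b"read /home/alice/mail", "0" * 64))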

Fault Tolerance:

Computer systems sometimes fail. When faults occur in hardware or software, programs may produce incorrect results or may stop before they have completed the intended computation. Failures in a distributed system are partial – that is, some components fail while others continue to function. Therefore, the handling of failures is particularly difficult. In general, distributed systems can be designed to mask failures, that is, to hide them from the users.
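A minimal sketch of such masking, under the simplifying assumptions that requests are idempotent and replicas are interchangeable (both assumptions mine, not the text's): the client library retries a failed request on another replica, so the user never sees the partial failure.

    # Failure masking sketch: try each replica until one answers.
    class ServerDown(Exception):
        pass

    def call(server, request):
        # Pretend RPC: a crashed server raises instead of answering.
        if server["up"]:
            return f"{server['name']} handled {request!r}"
        raise ServerDown(server["name"])

    def reliable_call(replicas, request):
        for server in replicas:
            try:
                return call(server, request)
            except ServerDown:
                continue                   # mask the failure, try the next
        raise RuntimeError("all replicas failed")

    replicas = [{"name": "s1", "up": False},    # s1 has crashed...
                {"name": "s2", "up": True}]
    print(reliable_call(replicas, "read /etc/motd"))  # ...yet the call succeeds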

Performance

Always lurking in the background is the issue of performance. Building a transparent, flexible, and reliable distributed system is of little value if the result is slow. In particular, when running a given application on a distributed system, it should not be appreciably worse than running the same application on a single processor.

Various metrics can be used to measure the performance of a distributed system, such as response time, throughput (number of jobs per hour), system utilization, and the amount of network capacity consumed.

The performance problem is compounded by the fact that communication, which is essential in a distributed system, is quite slow. Sending a message and getting a reply over a LAN takes time, and most of this time is due to unavoidable protocol handling on both ends rather than the time the bits spend on the wire. Thus, to optimize performance, one often has to minimize the number of messages. The difficulty with this strategy is that the best way to gain performance is to have many activities running in parallel on different processors, but doing so requires sending many messages.
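A toy model of this trade-off (the timings are made up): with a fixed per-message protocol overhead, sending n small requests separately pays that overhead n times, while batching them into a single message pays it once.

    # Cost of n requests, sent one-per-message versus batched.
    OVERHEAD_MS = 1.0     # protocol handling per message, both ends
    PER_ITEM_MS = 0.01    # marginal cost of one extra item on the wire

    def cost_separate(n):
        return n * (OVERHEAD_MS + PER_ITEM_MS)

    def cost_batched(n):
        return OVERHEAD_MS + n * PER_ITEM_MS

    print(cost_separate(100))   # ~101 ms: overhead paid 100 times
    print(cost_batched(100))    # ~2 ms: overhead paid once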

Scalability

Distributed systems operate effectively and efficiently at many different scales, ranging from a small intranet to the Internet. A system is described as scalable if it will remain effective when there is a significant increase in the number of resources and the number of users.