
Interprocess Communication in Distributed System

We are familiar with the inter-process communication mechanisms of an OS:

  • Named Pipe
  • Anonymous Pipe
  • Message Queue
  • Socket
  • File
  • Signal
  • …etc

This list could be longer, but all of these can be organized into two categories:

  • File Type:
    • simple file
    • named pipe
    • network
  • Memory Type:
    • shared memory
    • message queue
    • signal

But in a distributed system, memory and disk can't be shared (at least not without software assistance), so the network becomes the single, and therefore most important, way to communicate. Today, we focus in detail on how inter-process communication works in a distributed system.

Definition

Inter-process communication is defined as message passing between a pair of processes, whether they run on the same host or not. In a distributed system, we cannot assume the communication happens on the same host, so we hide the difference by using the network.

The following are some design considerations of inter-process communication:

  • destination: how do we locate and address the other process?
  • reliability: should the approach handle message omission and host crashes?
  • ordering: should the approach guarantee the order of messages?

Types

When it comes to the specific ways to achieve inter-process communications, we can have following options:

  • Socket: an abstraction over both UDP and TCP
  • Higher abstractions:
    • Indirect messaging
    • RMI

The higher abstractions use sockets internally and provide easier interfaces to the user. This time, we focus only on the basic, lower level: the socket.

Socket Details

Sockets actually come in two types:

UDP – datagram based: each message has a boundary, so a send is not buffered but transmitted immediately as a single datagram.

TCP – stream based (producer and consumer): there is no message boundary, so TCP may buffer outgoing data, and we can force transmission with a flush. This behavior stems from TCP's stream nature.
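The datagram behavior above can be sketched with two sockets on the loopback interface. This is a minimal illustration, not production code; the class name and payload are made up, and the OS picks the receiver's port:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpEcho {
    public static void main(String[] args) throws Exception {
        // Receiver bound to an OS-chosen ephemeral port on loopback
        try (DatagramSocket receiver = new DatagramSocket(0, InetAddress.getLoopbackAddress());
             DatagramSocket sender = new DatagramSocket()) {

            byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
            // Each send() produces exactly one datagram: the message boundary
            // is preserved, and the packet is handed to the network immediately.
            sender.send(new DatagramPacket(payload, payload.length,
                    receiver.getLocalSocketAddress()));

            byte[] buf = new byte[1024];
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            receiver.receive(in);   // blocks until one whole datagram arrives
            System.out.println(new String(in.getData(), 0, in.getLength(),
                    StandardCharsets.UTF_8));
        }
    }
}
```

Note that `receive` returns exactly one datagram per call; with TCP, a single `read` could return any number of bytes, which is precisely the boundary difference described above.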

Java API for Socket

The API for stream communication assumes that when a pair of processes are establishing a connection, one of them plays the client role and the other plays the server role, but thereafter they could be peers.

Java's socket-related classes mirror the differences between TCP and UDP, representing them as streams and datagrams respectively.

A stream in the Java network API is simplex (one direction only), because the input buffer and output buffer are separate; note, however, that the actual underlying TCP connection is duplex.
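A small sketch of the client/server roles and the two simplex streams, assuming both ends run in one JVM on loopback for demonstration (class and thread names are illustrative):

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpStreams {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Client role: connects and writes one message
            Thread client = new Thread(() -> {
                try (Socket s = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                     DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
                    out.writeUTF("ping");
                    out.flush();   // force the buffered bytes onto the wire
                } catch (Exception e) { throw new RuntimeException(e); }
            });
            client.start();

            // Server role: accepts and reads from its input stream.
            // getInputStream() and getOutputStream() are two simplex streams
            // over the same duplex TCP connection.
            try (Socket s = server.accept();
                 DataInputStream in = new DataInputStream(s.getInputStream())) {
                System.out.println(in.readUTF());
            }
            client.join();
        }
    }
}
```

After `accept` returns, both sockets are symmetric peers: either side may read or write, matching the "thereafter they could be peers" point above.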

Data Representation

Irrespective of the form of communication used, the data structures must be flattened (converted to a sequence of bytes) before transmission and rebuilt on arrival.
A byte is the minimal unit of data transmission and never changes in transit; what does change is:

  • bytes order: The individual primitive data items transmitted in messages can be data values of many different types, and not all computers store primitive values such as integers in the same order. The representation of floating-point numbers also differs between architectures. There are two variants for the ordering of integers: the so-called big-endian order, in which the most significant byte comes first; and little-endian order, in which it comes last.
  • how bytes are interpreted: Another issue is the set of codes used to represent characters: for example, the majority of applications on systems such as UNIX use ASCII character coding, taking one byte per character, whereas the Unicode standard allows for the representation of texts in many different languages and takes two bytes per character.

Three external data formats

  • CORBA's CDR – binary data, does not contain type info; it is assumed that client and server have prior knowledge of the order and types
  • Java serialization – binary data, contains the type info because it is also used for disk storage
  • XML – primitives are converted into a textual format, which is generally longer than a binary format (compare Protocol Buffers and JSON); self-describing
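The Java serialization entry above can be demonstrated with a round trip through a byte array, which stands in for the socket; the `Point` type and field values are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializeDemo {
    // Type information travels with the bytes, so the receiver can
    // reconstruct the object without prior knowledge of its layout.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) throws Exception {
        // Flatten to bytes (what would be written to the socket)
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Point(3, 4));
        }

        // Rebuild on "arrival"
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Point p = (Point) in.readObject();
            System.out.println(p.x + "," + p.y);
        }
    }
}
```

The embedded class descriptor is why the serialized form is larger than a CDR-style encoding, which sends only the raw values and relies on both ends agreeing on the types in advance.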

More: Multicast

Sometimes we need multicast in inter-process communication. It is implemented by IP multicast or by more complex protocols, and it supports:

  • Fault tolerance through replicated services
  • Service discovery
  • Better performance
  • Propagation of event notification
