
Interprocess Communication in Distributed Systems

We are already familiar with the inter-process communication (IPC) mechanisms provided by the operating system:

  • Named Pipe
  • Anonymous Pipe
  • Message Queue
  • Socket
  • File
  • Signal
  • …etc

This list could go on, but all of these mechanisms fall into two categories:

  • File Type:
    • simple file
    • named pipe
    • network
  • Memory Type:
    • shared memory
    • message queue
    • signal

In a distributed system, however, memory and disk cannot be shared (at least not without software assistance), so the network becomes the only, and therefore the most important, way for processes to communicate. Here we look in detail at how inter-process communication works in a distributed system.

Definition

Inter-process communication is defined as message passing between a pair of processes, whether or not they run on the same host. In a distributed system we cannot guarantee that two communicating processes are on the same host, so we hide the difference by always communicating over the network.

Some design considerations for inter-process communication are:

  • destination: how does a process locate (address) the process it wants to talk to?
  • reliability: should the approach handle message omission and host crashes?
  • ordering: should the approach guarantee that messages are delivered in the order they were sent?
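With sockets, the destination question is answered by an Internet address plus a port number, and the calling code looks the same whether that host is local or remote. The minimal Java sketch below assumes a hypothetical peer listening on port 8080 of "localhost"; the class name Destination and the port are illustrative only.

```java
import java.net.InetSocketAddress;

// Hypothetical helper: a destination process is identified by (host address, port).
class Destination {
    // Resolve a host name and pair it with a port number.
    static InetSocketAddress resolve(String host, int port) {
        // The constructor tries to resolve the host name right away;
        // if that fails, the address is marked "unresolved" instead of throwing.
        return new InetSocketAddress(host, port);
    }

    public static void main(String[] args) {
        // Port 8080 is an arbitrary example value, not a convention from the text.
        InetSocketAddress dest = resolve("localhost", 8080);
        System.out.println(dest.getAddress() + " port " + dest.getPort());
    }
}
```

Swapping "localhost" for a remote host name changes nothing else in the code, which is how the network hides whether the peer is on the same machine.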

Types

In practice, inter-process communication can be implemented with the following options:

  • Socket: the programming abstraction over both UDP and TCP
  • Higher-level abstractions:
    • Indirect messaging
    • RMI

The higher-level abstractions use sockets internally and provide easier interfaces to the user. Here we focus only on the basic, lower-level one: the socket.

Socket Details

There are two types of socket:

UDP: datagram based, i.e. every message keeps its boundaries, so a message is not buffered but sent immediately as a single datagram.
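A quick way to see message boundaries is to send two datagrams over the loopback interface and receive them again. This is a minimal sketch, not a complete protocol: the class name UdpBoundaries is hypothetical, the receiver binds to port 0 so the OS picks a free port, and a 2-second timeout is an arbitrary choice to avoid blocking forever if a datagram is lost.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: each string is sent as its own datagram over loopback.
class UdpBoundaries {
    static List<String> sendAndReceive(String... messages) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS choose a free port for the receiver.
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // fail instead of blocking forever if a datagram is lost
            for (String m : messages) {
                byte[] data = m.getBytes(StandardCharsets.UTF_8);
                // One send() call hands one complete datagram to the network.
                sender.send(new DatagramPacket(data, data.length, loopback, receiver.getLocalPort()));
            }
            List<String> received = new ArrayList<>();
            byte[] buf = new byte[1024];
            for (int i = 0; i < messages.length; i++) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                receiver.receive(packet); // each receive() returns exactly one datagram
                received.add(new String(packet.getData(), packet.getOffset(), packet.getLength(),
                        StandardCharsets.UTF_8));
            }
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two separate messages come back, e.g. [hello, world]; UDP itself does not promise order.
        System.out.println(sendAndReceive("hello", "world"));
    }
}
```

The receiver gets two distinct messages rather than one run of bytes; it never has to work out where "hello" ends.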

TCP: stream based. The two ends act as producer and consumer of a byte stream with no message boundaries. Because of this stream nature, written data may be buffered rather than sent immediately, and we can call flush to force buffered data out.
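The Java sketch below illustrates both points on a loopback connection: two separate writes reach the reader as one undivided sequence of bytes, and a BufferedOutputStream holds data until flush is called. The class name TcpStream is hypothetical; the buffering shown is the Java-level buffer in front of the socket, not TCP's internal buffering.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: several writes on one TCP connection, read back as a single stream.
class TcpStream {
    static String sendAsStream(String... messages) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 = any free port; backlog 1 is enough for one client.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) {
            // A buffered stream keeps small writes in memory instead of sending each one.
            OutputStream out = new BufferedOutputStream(client.getOutputStream());
            for (String m : messages) {
                out.write(m.getBytes(StandardCharsets.UTF_8));
            }
            out.flush();             // push the buffered bytes onto the connection
            client.shutdownOutput(); // send end-of-stream so the reader sees EOF
            // The reader only sees bytes; nothing marks where one write ended.
            InputStream in = peer.getInputStream();
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAsStream("hello", "world")); // helloworld
    }
}
```

Since the stream carries no boundaries, an application that needs messages over TCP has to add its own framing, such as a length prefix or a delimiter.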

Java API for Sockets

The API for stream communication assumes that when a pair of processes are establishing a connection, one of them plays the client role and the other plays the server role, but thereafter they could be peers.

Java's socket classes reflect the difference between TCP and UDP: TCP is represented by streams (Socket and ServerSocket) and UDP by datagrams (DatagramSocket and DatagramPacket).

Each stream in the Java network API is simplex (one direction only), because input and output use separate streams with separate buffers. Note, however, that the underlying TCP connection is duplex, so each socket provides one input stream and one output stream.
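The following minimal sketch puts the client/server roles and the two one-way streams together on one loopback connection. Assumptions: the class name DuplexDemo and the "echo: " reply are hypothetical, and everything runs on a single thread, which is safe here only because the messages are small enough to sit in the socket buffers. DataOutputStream.writeUTF prefixes each string with its length, which is one way to add message boundaries on top of a stream.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo: one duplex TCP connection, two simplex streams at each end.
class DuplexDemo {
    static String requestReply(String request) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);      // server role: listen
             Socket client = new Socket(loopback, listener.getLocalPort()); // client role: connect
             Socket server = listener.accept()) {                           // server's end; now peers
            // Each end has its own output stream and its own input stream.
            DataOutputStream clientOut = new DataOutputStream(client.getOutputStream());
            DataInputStream clientIn = new DataInputStream(client.getInputStream());
            DataOutputStream serverOut = new DataOutputStream(server.getOutputStream());
            DataInputStream serverIn = new DataInputStream(server.getInputStream());

            clientOut.writeUTF(request); // client -> server direction
            clientOut.flush();
            String received = serverIn.readUTF();

            serverOut.writeUTF("echo: " + received); // server -> client direction
            serverOut.flush();
            return clientIn.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(requestReply("hi")); // echo: hi
    }
}
```

After accept returns, nothing distinguishes the two sockets: either side may read or write, matching the idea that client and server become peers once the connection exists.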

Data Representation

Irrespective of the form of communication used, the data structures must be flattened (converted to a sequence of bytes) before transmission and rebuilt on arrival.
The byte is the smallest unit of data transmission, and bytes themselves do not change in transit; what can differ between sender and receiver is:

  • byte order: The primitive data items transmitted in messages can be values of many different types, and not all computers store primitive values such as integers in the same byte order. The representation of floating-point numbers also differs between architectures. There are two variants for the ordering of integers: big-endian order, in which the most significant byte comes first, and little-endian order, in which it comes last.
  • how bytes are interpreted: Another issue is the set of codes used to represent characters. For example, most applications on systems such as UNIX use ASCII character coding, taking one byte per character, whereas the Unicode standard can represent text in many different languages and, in its 16-bit UTF-16 encoding, takes two bytes for most characters.
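Both issues can be seen with java.nio.ByteBuffer and String.getBytes. This sketch (the class name DataRepresentation is hypothetical) lays out the same 32-bit integer in both byte orders, shows what happens when the receiver assumes the wrong order, and compares the size of one character under two encodings.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo of byte order and character encoding.
class DataRepresentation {
    // Lay out a 32-bit int in the requested byte order.
    static byte[] intBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        System.out.println(Arrays.toString(intBytes(value, ByteOrder.BIG_ENDIAN)));    // [1, 2, 3, 4]
        System.out.println(Arrays.toString(intBytes(value, ByteOrder.LITTLE_ENDIAN))); // [4, 3, 2, 1]

        // Reading big-endian bytes as if they were little-endian yields a different number.
        byte[] wire = intBytes(value, ByteOrder.BIG_ENDIAN);
        int misread = ByteBuffer.wrap(wire).order(ByteOrder.LITTLE_ENDIAN).getInt();
        System.out.println(Integer.toHexString(misread)); // 4030201

        // The same character occupies a different number of bytes in different encodings.
        System.out.println("A".getBytes(StandardCharsets.US_ASCII).length); // 1
        System.out.println("A".getBytes(StandardCharsets.UTF_16BE).length); // 2
    }
}
```

The bytes on the wire are identical in both readings; only the receiver's assumptions change, which is why sender and receiver must agree on an external data format.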

Three external data formats

  • CORBA CDR (Common Data Representation): binary data without type information; the client and server are assumed to have prior knowledge of the order and types of the data items
  • Java serialization: binary data that includes type information, because it is also used for storing objects on disk
  • XML: primitive data are converted to text, which is generally longer than a binary format (related formats: Protocol Buffers and JSON); the format is self-describing
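To make the point about type information concrete, this sketch flattens an object with ObjectOutputStream, checks that the class name is present in the raw bytes, and rebuilds the object with ObjectInputStream. The Person class and SerializationDemo are hypothetical examples; in a real system the bytes would travel over a socket instead of a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example type; implementing Serializable opts it in to Java serialization.
class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

class SerializationDemo {
    // Flatten an object into a sequence of bytes.
    static byte[] flatten(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object; the stream itself records which class to instantiate.
    static Object rebuild(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = flatten(new Person("Smith", 1984));
        // ISO-8859-1 maps each byte to one char, so the raw bytes can be searched as text.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(raw.contains("Person")); // true: the type travels with the data
        Person copy = (Person) rebuild(data);
        System.out.println(copy.name + " " + copy.age); // Smith 1984
    }
}
```

A CDR-style format would send only the string and the integer, relying on both sides already knowing the layout; the extra class description is what lets Java rebuild the object without that prior agreement.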

More: Multicast

Sometimes inter-process communication needs multicast, i.e. sending one message to every member of a group of processes. It can be implemented with IP multicast or with more complex protocols, and it provides a useful basis for:

  • Fault tolerance based on replicated services
  • Service discovery
  • Better performance
  • Propagation of event notifications


Written with StackEdit.
