Java IO (2): NIO

As noted in the previous post, the simple blocking IO model has its advantages, but also some limitations. Java 1.4 introduced the non-blocking IO model.

Non-Blocking Model

Non-blocking IO, often shortened to NIO (the package name java.nio officially stands for New I/O), is a different way to abstract the underlying devices:

| NIO | IO |
| --- | --- |
| channel/buffer/selector | stream |
| tends to support both read and write | simplex: input or output |
| supports multiple data sources in one thread | one data source in one thread |

Next, we dive into the three main abstractions of NIO: Channel, Buffer, and Selector.

Channel

A Channel is an abstraction of a connection to an entity (a hardware device, file, socket, etc.) that can be used to read or write a block of data. Channels are analogous to streams, with a few differences:

  • While streams are typically one-way (read or write), channels support read and write.
  • Selectable channels (for example, socket channels) can be read and written in a non-blocking way.
  • Channels always read to, or write from, a buffer. All data that is sent to a channel must first be placed in a buffer. Any data that is read from a channel is read into a buffer.
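
To make the last point concrete, here is a minimal sketch of writing to and reading from a single channel through buffers. The class name ChannelRoundTrip, the method roundTrip, and the temp-file prefix are made up for illustration; a FileChannel is used only because it needs no network setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: one channel is used for both writing and reading.
class ChannelRoundTrip {
  static String roundTrip(String text) {
    try {
      Path tmp = Files.createTempFile("nio-channel", ".txt");
      try (FileChannel ch =
          FileChannel.open(tmp, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
        // Data going into the channel is first placed in a buffer.
        ByteBuffer out = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        while (out.hasRemaining()) {
          ch.write(out);
        }

        // Rewind the channel and read the same bytes back into another buffer.
        ch.position(0);
        ByteBuffer in = ByteBuffer.allocate((int) ch.size());
        while (in.hasRemaining() && ch.read(in) != -1) {
          // keep reading until the buffer is full or end of file is reached
        }
        in.flip(); // switch the buffer from being filled to being drained
        return StandardCharsets.UTF_8.decode(in).toString();
      } finally {
        Files.deleteIfExists(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("hello channel"));
  }
}
```

Note that the same channel object serves both directions, while a stream-based version would need a separate input and output stream.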

Buffer

While stream I/O reads a byte or character at a time, channel I/O reads a buffer at a time. Buffer has many subtypes, each holding a different primitive type, such as ByteBuffer, CharBuffer, and IntBuffer. A buffer can live on the heap or in direct memory; the implementation classes carry the prefix Heap or Direct respectively, such as HeapByteBuffer and DirectByteBufferR (the read-only variant of DirectByteBuffer).
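
Whatever its element type, a buffer tracks a position, a limit, and a capacity, and flip() switches it from being filled to being drained. The following is a small sketch of that bookkeeping; the class name BufferStateSketch and the method fillAndFlip are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical demo class showing how position and limit move.
class BufferStateSketch {
  static int[] fillAndFlip() {
    ByteBuffer buf = ByteBuffer.allocate(8);        // position=0, limit=8, capacity=8
    buf.put((byte) 1).put((byte) 2).put((byte) 3);  // position=3
    buf.flip();                                     // limit=3, position=0
    byte first = buf.get();                         // reads 1, position=1
    return new int[] {first, buf.position(), buf.limit(), buf.remaining(), buf.capacity()};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(fillAndFlip())); // prints [1, 1, 3, 2, 8]
  }
}
```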

A HeapBuffer is easy to understand: the buffer is allocated on the Java heap. What about a DirectBuffer? It is allocated in native memory, outside the heap. As introduced in the first post, IO is always carried out by the OS kernel, which means data read from a device is first stored in kernel space and then copied into the Java process. Using a direct buffer may reduce this copy penalty (if the OS can read the data directly into the direct buffer, e.g. when the device driver uses shared memory as its IPC method), but it also brings a higher allocation cost: allocating on the Java heap is a simple pointer bump in most cases, while a direct buffer has to request memory from the OS.
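
From the caller's side, the two kinds differ only in the factory method used. Here is a brief sketch, with a made-up class name HeapVsDirect and an arbitrary size of 1024 bytes:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical demo class comparing the two allocation paths.
class HeapVsDirect {
  static boolean[] describe() {
    ByteBuffer heap = ByteBuffer.allocate(1024);         // backed by a byte[] on the Java heap
    ByteBuffer direct = ByteBuffer.allocateDirect(1024); // backed by native memory
    return new boolean[] {heap.isDirect(), heap.hasArray(), direct.isDirect()};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(describe())); // prints [false, true, true]
  }
}
```

Because of the higher allocation cost, direct buffers tend to suit long-lived, reused buffers better than short-lived ones.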

Because a direct buffer is not on the heap, it is not fully managed by the Java garbage collector. In pre-Java 9 environments, when the Java object of a direct buffer becomes phantom-reachable, it is enqueued in a ReferenceQueue, and the Cleaner thread runs periodically to release the underlying native memory. For more details, you may like to refer to here and here.

Besides HeapBuffer and DirectBuffer, there is a special one, MappedByteBuffer, which represents a file and memory at the same time. It can be obtained from a FileChannel, which can map a region of the channel's file directly into memory; this is called a memory-mapped file.
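
As a rough sketch of a memory-mapped file, the example below maps a temporary file, changes its first byte through the mapping, and then reads the file back normally. The class name MappedFileSketch, the method overwriteFirstByte, and the temp-file prefix are hypothetical, and the input is assumed to be non-empty ASCII text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: edits a file through a memory mapping.
class MappedFileSketch {
  // Assumes text is non-empty ASCII.
  static String overwriteFirstByte(String text, char replacement) {
    try {
      Path tmp = Files.createTempFile("nio-mapped", ".txt");
      tmp.toFile().deleteOnExit();
      Files.write(tmp, text.getBytes(StandardCharsets.US_ASCII));

      try (FileChannel ch =
          FileChannel.open(tmp, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
        // Map the whole file into memory; writes to the buffer are writes to the file.
        MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
        map.put(0, (byte) replacement);
        map.force(); // ask the OS to flush the change to the storage device
      }

      return new String(Files.readAllBytes(tmp), StandardCharsets.US_ASCII);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(overwriteFirstByte("hello", 'j'));
  }
}
```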

Selector

Selector is the core of multiplexed non-blocking IO. If a channel is a SelectableChannel (a network-related channel such as SocketChannel or DatagramChannel), it can be registered with a Selector. From then on, we can query the Selector about the readiness of those channels via the select() method. Selector#select() blocks until at least one channel is ready (or the selector is woken up). To avoid waiting indefinitely, we can use select(long timeout), which blocks for at most the given time, or selectNow(), which returns immediately.
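
The difference between the select variants can be sketched with a server channel on the loopback interface. The class name SelectModes, the method readyCounts, and the 5000 ms timeout are assumptions for illustration; port 0 asks the OS for any free port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class: polls a selector before and after a client connects.
class SelectModes {
  static int[] readyCounts() {
    try (Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open()) {
      server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
      server.configureBlocking(false);
      server.register(selector, SelectionKey.OP_ACCEPT);

      // Nothing has connected yet, so the non-blocking poll reports no ready channels.
      int before = selector.selectNow();

      // A blocking client connects; the server channel now has a pending connection.
      try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
        // Wait at most 5 seconds for the server channel to become acceptable.
        int after = selector.select(5000);
        return new int[] {before, after};
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] counts = readyCounts();
    System.out.println("before=" + counts[0] + ", after=" + counts[1]);
  }
}
```

The return value of each select call is the number of keys whose ready set was updated, which is why it is worth checking before iterating over selectedKeys().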

Conclusion

Putting them all together, we get a simple NIO program:

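import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// the class name is arbitrary
public class NioServer {
  public static void main(String[] args) throws IOException {
    ServerSocketChannel server = ServerSocketChannel.open();
    server.bind(new InetSocketAddress(9300));
    // a channel must be in non-blocking mode before it is registered
    server.configureBlocking(false);
    Selector selector = Selector.open();
    server.register(selector, SelectionKey.OP_ACCEPT);

    while (true) {
      // block until at least one registered channel is ready
      selector.select();
      for (Iterator<SelectionKey> it = selector.selectedKeys().iterator(); it.hasNext(); ) {
        SelectionKey next = it.next();
        // remove the key from the selected set, or it shows up again in the next round
        it.remove();
        try {
          if (next.isAcceptable()) {
            ServerSocketChannel channel = (ServerSocketChannel) next.channel();
            SocketChannel accept = channel.accept();
            // have to configure non-blocking mode, or register() throws IllegalBlockingModeException
            accept.configureBlocking(false);
            SelectionKey key2 = accept.register(selector, SelectionKey.OP_WRITE);
            ByteBuffer buffer = ByteBuffer.allocate(74);
            // fill buffer ...
            // keep the per-connection buffer on the key so the write branch can find it
            key2.attach(buffer);
          } else if (next.isWritable()) {
            SocketChannel client = (SocketChannel) next.channel();
            ByteBuffer buffer = (ByteBuffer) next.attachment();
            // fill buffer ...
            client.write(buffer);
          }
        } catch (IOException e) {
          // drop the broken connection: deregister the key and close the channel
          next.cancel();
          next.channel().close();
        }
      }
    }
  }
}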

It works, but it is somewhat tedious and easy to get wrong (for example, we have to remember to remove each SelectionKey from the selected set when we start to handle it). This is where Netty gets its chance.
