
Java IO (3): Utility & Opt


Java 1.7 brought more IO improvements. Besides the arrival of AIO (Asynchronous IO), many other new classes were introduced in that release and in later versions. Most of them are utility classes and optimizations of the original IO model. In this blog, we will dive into these (and cover AIO in the next blog).

Utility

New Abstraction: Path

The class Path is a programmatic representation of a path in the file system, which can be seen as a better-named and more functional version of File. We say so because File is not a good abstraction. It actually mixes many different functionalities:

  • Operations about path manipulation: getName(), getParent() etc;
  • Operations about file metadata access (actually operations on the inode): length(), canExecute(), setWritable() etc;
  • Operations to read/write content of Directory: listFiles(), createNewFile() etc;
  • No operations to read/write a plain File;

So we say that the File abstraction has a somewhat misleading name and violates the cohesion expected of good class design.

Path, on the other hand, has two main kinds of operations. The first is syntactic operations: manipulating paths without accessing the file system. These are logical manipulations done in memory, much like String operations. The second is registration with a WatchService, which we will cover later.
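The syntactic operations can be sketched as follows. This is a minimal example, not taken from the original post: the class name PathSyntaxDemo and the sample paths are made up for illustration, and the printed forms in the comments assume a Unix-like system with "/" as the separator.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathSyntaxDemo {
    // Purely syntactic: the path does not need to exist,
    // because normalize() never touches the file system.
    static Path normalized(String raw) {
        return Paths.get(raw).normalize();
    }

    public static void main(String[] args) {
        Path base = Paths.get("logs", "2020");   // hypothetical directory
        Path file = base.resolve("app.log");     // join: logs/2020/app.log

        System.out.println(file.getFileName());      // last element
        System.out.println(file.getParent());        // everything but the last element
        System.out.println(base.relativize(file));   // path from base to file
        System.out.println(normalized("logs/./2020/../2021/app.log"));
    }
}
```

None of these calls throws IOException, which is a quick way to tell a syntactic operation from one that really goes to the disk.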

File Utility: Files

Files takes over some of the responsibilities of File: operations on metadata and on file content. It provides a set of isSomething() methods (for example isDirectory() and isReadable()) that we can use to perform various metadata checks before we actually manipulate a file or a directory. It also includes many utility functions to read/write the content of a file or directory, like newDirectoryStream() and lines().
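A small sketch of how these pieces fit together, assuming Java 8 or later (Files.lines() was added in Java 8). The class FilesDemo and its two helper methods are hypothetical names for this example only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class FilesDemo {
    // Metadata check first, then read the content lazily.
    static long countNonBlankLines(Path file) throws IOException {
        if (!Files.isRegularFile(file) || !Files.isReadable(file)) {
            return 0;
        }
        // Files.lines() keeps the file open until the stream is closed.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }

    // Read the "content" of a directory: the entries matching a glob.
    static List<String> listEntries(Path dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        return names;
    }
}
```

Both the Stream and the DirectoryStream hold an open handle, so the try-with-resources blocks are part of the technique, not decoration.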

Exists?

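An interesting detail about Files.exists() is that !Files.exists(...) is not equivalent to Files.notExists(...), i.e. the notExists() method is not the complement of the exists() method. This is because a file can be in a third state, unknown: when the existence of the file cannot be determined (for example, because access is denied), both methods return false.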
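The three states can be made explicit with a small helper. This is only a sketch: the class ExistenceDemo and the State enum are names invented for this example.

```java
import java.nio.file.Files;
import java.nio.file.Path;

class ExistenceDemo {
    enum State { EXISTS, NOT_EXISTS, UNKNOWN }

    static State check(Path path) {
        if (Files.exists(path)) {
            return State.EXISTS;
        }
        if (Files.notExists(path)) {
            return State.NOT_EXISTS;
        }
        // Both calls returned false: the status could not be determined,
        // e.g. because the program may not read the parent directory.
        return State.UNKNOWN;
    }
}
```

Code that treats !Files.exists(path) as "the file is definitely missing" silently folds UNKNOWN into NOT_EXISTS, which is the mistake this distinction is meant to prevent.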

Watch Service

The Watch Service API was introduced in Java 7 (as part of NIO.2, the same update that brought AIO) as a thread-safe service that is capable of watching objects for changes and events. The most common usage is to monitor a directory for changes to its content, such as entries being created, deleted, or modified. It can be used in applications like IDEs, or in applications with config files, so that the program can update its state when a file changes.
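A minimal sketch of watching a directory, using only the standard java.nio.file API. The class WatchDemo, its method names, and the "KIND:name" string format are assumptions made for this example. How quickly an event is delivered depends on the platform: some implementations use native notifications, others fall back to polling.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.TimeUnit;

class WatchDemo {
    // Create a watcher and register dir for create/delete/modify events.
    static WatchService watch(Path dir) throws IOException {
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_DELETE,
                StandardWatchEventKinds.ENTRY_MODIFY);
        return watcher;
    }

    // Wait up to timeoutSeconds and describe the first event as "KIND:name",
    // or return null if nothing happened in time.
    static String nextEvent(WatchService watcher, long timeoutSeconds)
            throws InterruptedException {
        WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
        if (key == null) {
            return null;
        }
        try {
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                    continue; // events were lost; a real program would rescan
                }
                // context() is the entry name relative to the watched directory
                return event.kind().name() + ":" + event.context();
            }
            return null;
        } finally {
            key.reset(); // without reset() the key receives no further events
        }
    }
}
```

A config-reloading program would typically call nextEvent() in a loop on a background thread and re-read the file whenever its name shows up.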

Scatter and Gather

As we said in the first blog of this series, we should avoid accessing the disk and the underlying operating system, and avoid method calls, as much as we can. To help with this, Java provides Vectored IO, also known as scatter/gather IO, which can perform multiple IO operations in one method call.

Besides the performance gain, Vectored IO can also provide atomicity (multiple reads/writes without interleaving from other threads) if the underlying operating system supports it.
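In the Java API, scatter/gather shows up as the read(ByteBuffer[]) and write(ByteBuffer[]) methods of FileChannel. The following is a sketch under assumptions: the class VectoredIoDemo and the header/body record layout are invented for this example, and lengths are counted in bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class VectoredIoDemo {
    // Gathering write: header and body are written from two buffers
    // through a single write(ByteBuffer[]) call (repeated only if partial).
    static void writeRecord(Path file, String header, String body) throws IOException {
        ByteBuffer[] parts = {
            ByteBuffer.wrap(header.getBytes(StandardCharsets.UTF_8)),
            ByteBuffer.wrap(body.getBytes(StandardCharsets.UTF_8))
        };
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (parts[1].hasRemaining()) {
                channel.write(parts);
            }
        }
    }

    // Scattering read: the channel fills the header buffer first,
    // then continues into the body buffer.
    static String[] readRecord(Path file, int headerBytes, int bodyBytes) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(headerBytes);
        ByteBuffer body = ByteBuffer.allocate(bodyBytes);
        ByteBuffer[] parts = { header, body };
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (body.hasRemaining() && channel.read(parts) != -1) {
                // keep reading until both buffers are full or the file ends
            }
        }
        return new String[] {
            new String(header.array(), 0, header.position(), StandardCharsets.UTF_8),
            new String(body.array(), 0, body.position(), StandardCharsets.UTF_8)
        };
    }
}
```

This layout suits protocols with a fixed-size header followed by a payload: each part lands in its own buffer without an extra copy to split or join them.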

File Lock

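File locks are held on behalf of the entire Java virtual machine. They seem less useful than one might expect: whether a lock actually stops other programs from accessing the file is system-dependent (on many systems it is an advisory lock rather than a mandatory one), and inside a single JVM the documentation says:

“They are not suitable for controlling access to a file by multiple threads within the same virtual machine.” (Java Platform SE 7 official documentation)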
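A short sketch of what "held on behalf of the entire JVM" means in practice. The class FileLockDemo and its method are hypothetical; the behaviour shown (OverlappingFileLockException on a second attempt from the same JVM) is what the FileChannel documentation describes for the default configuration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FileLockDemo {
    // Returns true if a second lock attempt from this JVM is rejected
    // while the first lock on the same file is still held.
    static boolean secondLockRejected(Path file) throws IOException {
        try (FileChannel first = FileChannel.open(file,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(file, StandardOpenOption.WRITE);
             FileLock lock = first.tryLock()) {
            if (lock == null) {
                return false; // another process already holds the lock
            }
            try {
                FileLock again = second.tryLock();
                if (again != null) {
                    again.release();
                }
                return false;
            } catch (OverlappingFileLockException e) {
                // The JVM already owns a lock on this region, so a second
                // thread does not block here; it just gets this exception.
                return true;
            }
        }
    }
}
```

So a file lock coordinates separate processes; threads inside one JVM still need ordinary synchronization such as synchronized or java.util.concurrent locks.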

Common Opt Example

We introduced some basic principles of IO optimization in the first blog of this series; in this blog, we dive into more specific examples.

Random Access Buffer

If we have a large file and our accesses show locality, we can use a buffer to reduce the number of IO operations, at the cost of more memory:

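// Reload the buffer only when pos falls outside the cached block.
if (pos < startpos || pos > endpos) {
  // Align the read to the start of the block that contains pos.
  long blockstart = (pos / bufsize) * bufsize;
  int n;
  try {
    raf.seek(blockstart);
    n = raf.read(inbuf);
  } catch (IOException e) {
    return -1;
  }
  startpos = blockstart;
  endpos = blockstart + n - 1;
  // Still out of range after the reload: pos is past the end of the file.
  if (pos < startpos || pos > endpos) {
    return -1;
  }
}
// inbuf is a byte[]; mask with 0xff so bytes >= 0x80 are not sign-extended.
return inbuf[(int) (pos - startpos)] & 0xff;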
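The fragment above relies on fields (raf, inbuf, bufsize, startpos, endpos) that are declared elsewhere. A self-contained sketch of how they might fit together follows; the class name BufferedRandomAccess and the loads counter are assumptions added here to make the saving visible, not part of the original code.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;

class BufferedRandomAccess implements Closeable {
    private final RandomAccessFile raf;
    private final byte[] inbuf;   // one cached block of the file
    private long startpos = -1;   // file offset of inbuf[0]
    private long endpos = -1;     // file offset of the last valid cached byte
    int loads = 0;                // how many times we actually hit the disk

    BufferedRandomAccess(String path, int bufsize) throws IOException {
        this.raf = new RandomAccessFile(path, "r");
        this.inbuf = new byte[bufsize];
    }

    // Returns the unsigned byte at pos, or -1 past the end or on error.
    int read(long pos) {
        if (pos < startpos || pos > endpos) {
            long blockstart = (pos / inbuf.length) * inbuf.length;
            int n;
            try {
                raf.seek(blockstart);
                n = raf.read(inbuf);  // assumed to fill the block except at EOF
                loads++;
            } catch (IOException e) {
                return -1;
            }
            startpos = blockstart;
            endpos = blockstart + n - 1;
            if (pos < startpos || pos > endpos) {
                return -1;
            }
        }
        return inbuf[(int) (pos - startpos)] & 0xff;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```

With a block size of 64, reading offsets 200, 201 and 202 costs a single seek-and-read instead of three, which is the whole point of exploiting locality.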

Compression

Whether compression helps or hurts I/O performance depends a lot on the local hardware configuration, specifically the relative speeds of the processor and the disk drives. Compression using Zip technology typically implies a reduction in data size of around 50%, but at the cost of some time to compress and decompress.

An example of where compression is useful is in writing to very slow media such as floppy disks. A test using a fast processor (300 MHz Pentium) and a slow floppy (the conventional floppy drive found on PCs), showed that compressing a large text file and then writing to the floppy drive results in a speedup of around 50% over simply copying the file directly to the floppy drive.
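To see the size/time trade-off on your own data, a sketch along these lines can help. It uses GZIP from java.util.zip (the same DEFLATE algorithm that Zip uses, chosen here because it needs no archive entries); the class GzipDemo and its method names are assumptions for this example, and the actual ratio depends entirely on the input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipDemo {
    // Compress in memory; the result is what would be written to the slow medium.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reverse step, paid for in CPU time when the data is read back.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Timing compress() plus the write of the smaller array against a plain write of the original array on the target device shows directly whether compression pays off there.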
