跳至主要内容

Iterator Design Pattern

Iterator Design Pattern

We have talked the usage of iterator when we need to interact with collections:

  • Not expose internal state to outside world: Increasing cohesion of class and avoid mis-operation;
  • A unified interface to operate on different kinds of operation;
  • Enforce the order of method call;

Iterator Pitfall

public interface Iterator<E> {  
  boolean hasNext();  
  E next(); 
  default void remove() {  
    throw new UnsupportedOperationException("remove");  
  }
  default void forEachRemaining(Consumer<? super E> action) {  
    Objects.requireNonNull(action);  
    while (hasNext())  
      action.accept(next());  
  }
}

Checkout the interface of iterator (two more default method added in Java8), and we can see the common usage routine of Iterator.

    while (hasNext())  
      T t = next(); 
      // do with t

But it is not so friendly for newbie. We may call the next() directly without checking whether hasNext() and surprised by a horrible NoSuchElementException.

And in order to make it, the collection coder have to do many defensive and duplicative coding. They have to consider: What if the user doesn’t call hasNext() before next()? What if it calls hasNext() more than once?

Additionally, the two-method protocol generally requires the statefulness of Iterator, such as peeking ahead one element (and keeping track of whether you’ve already peeked ahead), then reverting back in some cases.

New Iterator

In order to solve the problem, Java8 introduced a new iterator called Spliterator. Although Spliterator is also a basic Iterator, it doesn’t extend Iterator, instead taking a different approach to element access.

As we have said, an Iterator has two methods to access element, hasNext() and next(); But in Java8, we have lambdas in the language enables Spliterator to take an approach to element access that’s generally more efficient — and easier to code correctly. Spliterator has two methods for accessing elements:

boolean tryAdvance(Consumer<? super T> action);
void forEachRemaining(Consumer<? super T> action);

With the help of Consumer, we can iterate the collection item and just return boolean to indicate whether we have more elements. We make it in one method call, more elegant and easier to do it right.

Besides a better design which eliminates the possible error Iterator may bring, Spliterator also born with ability to do things more efficiently. As its name indicates, it can split a iterator into two:

Spliterator<T> trySplit();

Why it can be more efficient to do element access using Spliterator? Because we can do it in parallel by splitting large collection into multiple small piece, which is used in parallelStream().

Spliterator Examples

An ArrayList with more than one element can always be split evenly and easily because it can split using middle of index without changing the underlying array. Similarly, HashSet split collection based on the underlying Node[] and index.

int hi = getFence(), lo = index, mid = (lo + hi) >>> 1;  
return (lo >= mid) ? null : // divide range in half unless too small  
  new ArrayListSpliterator<E>(list, lo, index = mid,  
                                expectedModCount);

But, a LinkedList always splits poorly, because it will copy the element into an array to split, in which way, it can accelerate further split as ArrayList does.

int n = batch + BATCH_UNIT;  
if (n > s)  
    n = s;  
if (n > MAX_BATCH)  
    n = MAX_BATCH;  
Object[] a = new Object[n];  
int j = 0;  
do { a[j++] = p.item; } while ((p = p.next) != null && j < n);  
current = p;  
batch = j;  
est = s - j;  
return Spliterators.spliterator(a, 0, j, Spliterator.ORDERED);

Things goes a little different in TreeSet: because it is implemented using Red-Black Tree, it is always split by tree’s root. Thanks to the balance of Red-Black Tree, it can always done evenly.

This is some code examples to test.

Ref

Written with StackEdit.

评论

此博客中的热门博文

Spring Boot: Customize Environment

Spring Boot: Customize Environment Environment variable is a very commonly used feature in daily programming: used in init script used in startup configuration used by logging etc In Spring Boot, all environment variables are a part of properties in Spring context and managed by Environment abstraction. Because Spring Boot can handle the parse of configuration files, when we want to implement a project which uses yml file as a separate config file, we choose the Spring Boot. The following is the problems we met when we implementing the parse of yml file and it is recorded for future reader. Bind to Class Property values can be injected directly into your beans using the @Value annotation, accessed via Spring’s Environment abstraction or bound to structured objects via @ConfigurationProperties. As the document says, there exists three ways to access properties in *.properties or *.yml : @Value : access single value Environment : can access multi

Elasticsearch: Join and SubQuery

Elasticsearch: Join and SubQuery Tony was bothered by the recent change of search engine requirement: they want the functionality of SQL-like join in Elasticsearch! “They are crazy! How can they think like that. Didn’t they understand that Elasticsearch is kind-of NoSQL 1 in which every index should be independent and self-contained? In this way, every index can work independently and scale as they like without considering other indexes, so the performance can boost. Following this design principle, Elasticsearch has little related supports.” Tony thought, after listening their requirements. Leader notice tony’s unwillingness and said, “Maybe it is hard to do, but the requirement is reasonable. We need to search person by his friends, didn’t we? What’s more, the harder to implement, the more you can learn from it, right?” Tony thought leader’s word does make sense so he set out to do the related implementations Application-Side Join “The first implementation

Implement isdigit

It is seems very easy to implement c library function isdigit , but for a library code, performance is very important. So we will try to implement it and make it faster. Function So, first we make it right. int isdigit ( char c) { return c >= '0' && c <= '9' ; } Improvements One – Macro When it comes to performance for c code, macro can always be tried. #define isdigit (c) c >= '0' && c <= '9' Two – Table Upper version use two comparison and one logical operation, but we can do better with more space: # define isdigit(c) table[c] This works and faster, but somewhat wasteful. We need only one bit to represent true or false, but we use a int. So what to do? There are many similar functions like isalpha(), isupper ... in c header file, so we can combine them into one int and get result by table[c]&SOME_BIT , which is what source do. Source code of ctype.h : # define _ISbit(bit) (1 << (