
Elasticsearch Problem Lists (2): With Spring

In the last blog, we introduced some confusions about basic Elasticsearch concepts and some configuration problems we met. Now we come to the problems we came across when using Elasticsearch in an application with the help of Spring Data Elasticsearch.

With Spring

After understanding Elasticsearch and configuring the server, we need to write code to interact with it. We chose the Spring Data Elasticsearch framework to assist our implementation. The following are the problems we met when using Spring to access Elasticsearch.

  • spring-boot-starter-data-elasticsearch: 1.5.3.RELEASE
  • Elasticsearch server: 2.4.x

Connection

Clients

When using Java to access Elasticsearch, there are two types of clients we can choose from to communicate with the server:

  • Transport Client: this client is not part of the cluster; it just communicates with the server
  • Node Client: this client becomes part of the cluster – it stores data shards and responds to search requests

In our case, we just want to communicate with a dedicated Elasticsearch cluster, so we choose the Transport Client.

To configure a Transport Client, we can do it in Java code:

@Bean
public Client elasticClient() {
    Settings settings = Settings.builder().put(ClusterName.SETTING, "demo").build();
    return TransportClient.builder().settings(settings).build()
            .addTransportAddress(new InetSocketTransportAddress(new InetSocketAddress("xxx", 9300)));
}

or, even cleaner, using Spring Boot’s property file:

spring.data.elasticsearch.cluster-name=demo
# the following determines the client type to be transport, rather than node
spring.data.elasticsearch.cluster-nodes=xxx:9300

Index Definition

Duplicate id

If we just mark our id field with @Id, we will get a duplicated id in _source:

"_index" : "file-8947",
"_type" : "file",
"_id" : "7685",
"_source" : {
  "id" : "7685",
  "name" : "directory3",
  "uploadRoleId" : "4353",
  "type" : 1
}

and because Spring Data uses Jackson to transform the object to JSON:

indexRequestBuilder.setSource(resultsMapper.getEntityMapper().mapToString(query.getObject()));

we can mark our id with @JsonIgnore to remove the field from _source.
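For illustration, a minimal sketch of such a document class (the FilePO name is hypothetical; the fields mirror the _source above):

```java
// Hypothetical document class: @JsonIgnore keeps the id out of _source,
// while @Id still maps the value to the _id meta field.
@Document(indexName = "file-8947", type = "file")
public class FilePO {

    @Id
    @JsonIgnore
    private String id;

    private String name;
    private String uploadRoleId;
    private int type;
}
```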

DateFormat

When defining a Field, we will notice that there is a DateFormat to fill in. This represents the format we want Elasticsearch to use to interpret the JSON we send to it.

In JSON documents, dates are represented as strings. Elasticsearch uses a set of formats to recognize and parse these strings into a long value representing milliseconds-since-the-epoch in UTC.
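For example, a date field with an explicit format might look like the following sketch (FieldType.Date and the DateFormat enum come from Spring Data Elasticsearch; the field name is illustrative):

```java
// The format tells Elasticsearch how to parse the incoming JSON string
// into milliseconds-since-the-epoch; date_time expects strings like
// "2017-05-01T12:30:45.000Z".
@Field(type = FieldType.Date, format = DateFormat.date_time)
private Date createdAt;
```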

FieldIndex: no vs not_analyzed

FieldIndex can be used to specify how Elasticsearch will handle a field:

  • analyzed: the field is a string and will be processed by an analyzer;
  • not_analyzed: the field is a string, but does not need to be analyzed; it will be stored as an exact value;
  • no: do not index this field at all, i.e. it is not searchable by filter, query, etc.

In the latest Spring Data Elasticsearch builds, this element has been replaced by a boolean that says whether to index the field, plus field types that distinguish analyzed strings from exact-value strings.
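In that newer API (Spring Data Elasticsearch 3.x, targeting Elasticsearch 5+), the three cases above look roughly like the following sketch; note this does not apply to the 2.x versions used in this post:

```java
// Newer annotation style: FieldType distinguishes analyzed (Text) from
// exact-value (Keyword) strings, and index is a plain boolean.
@Field(type = FieldType.Text)                   // analyzed
private String description;

@Field(type = FieldType.Keyword)                // not_analyzed equivalent
private String status;

@Field(type = FieldType.Keyword, index = false) // "no": not searchable
private String rawPayload;
```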

Type Auto Detection Failure

In our project, if we don’t specify the field type but do specify the index mode, like the following shows:

@Field(index = FieldIndex.not_analyzed)
private String modifier;

Spring will log an exception message:

AbstractElasticsearchRepository : failed to load elasticsearch nodes : org.elasticsearch.index.mapper.MapperParsingException: No type specified for field [modifier]
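The fix is simply to specify the type explicitly, as in the other field declarations in this post:

```java
// Declaring the type avoids the MapperParsingException above.
@Field(type = FieldType.String, index = FieldIndex.not_analyzed)
private String modifier;
```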

Repo Definition

Class Cast When Using Slice

When we use Slice as the return value of our query, as the Spring Data documentation suggests, Spring complains with a class cast exception.

After reading the source code, we find that Spring does not treat this query as a paged query:

public final boolean isPageQuery() {
    return org.springframework.util.ClassUtils.isAssignable(Page.class, this.unwrappedReturnType);
}

Because Slice is the supertype of Page, a Slice return type is not assignable to Page.class. As a result, the query is implemented using queryForObject, which only returns one result and causes the exception.
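Until Slice is supported, a straightforward workaround is to declare Page as the return type; since Page extends Slice, callers that only need a Slice still work (the method and entity names below are illustrative):

```java
// Page makes isPageQuery() return true, so Spring builds a real paged
// query instead of falling back to queryForObject.
Page<Announcement> findByTitle(String title, Pageable pageable);
```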

findBy vs findAllBy

Spring seems to make no distinction between the following two notations:

List<Announcement> findByTitle(String title);
List<Announcement> findAllByTitle(String title);

They will produce exactly the same JSON body.

Find by _all

If we want to match against the content of the _all meta field, we can’t write a repo method as for a common field, because Spring can’t find such a field in our document class.

We can do the following as a workaround:

@Query("{\"bool\" : {\"must\" : [ {\"match\" : {\"?0\" : \"?1\"}} ]}}")
Page<MyDoc> getbyAll(String field, String query, Pageable pageable);

@Query

The content of the @Query annotation must be a complete JSON object, including the outer braces:

{"bool" : {"should" : [ {"match" : {"?0" : "?1"}} ]}}

Otherwise, if we miss a { like the following:

@Query(" \"multi_match\": {\n" +
        "        \"query\":    \"?0\",\n" +
        "        \"fields\":   [ \"name^2\", \"path\" ]\n" +
        "    }" +
        "}")
Page<Affair> findByNameOrPath(String info, Pageable pageable);

Spring will fail to interpret the query, and the simple query becomes a strange query_binary:

nested: SearchParseException[failed to parse search source [{"from":0,"size":10,"query_binary":"..."}]]
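Restoring the missing opening brace gives a complete JSON object and a working query (same repository method as above):

```java
// The annotation value is now a complete JSON object, so Spring parses
// it as a real query instead of sending it as query_binary.
@Query("{\"multi_match\": {\n" +
        "        \"query\":    \"?0\",\n" +
        "        \"fields\":   [ \"name^2\", \"path\" ]\n" +
        "    }" +
        "}")
Page<Affair> findByNameOrPath(String info, Pageable pageable);
```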

Add Implementation in Repo

Spring Data Elasticsearch provides a very convenient way to implement many simple queries, auto-generating the query from the method name and parameters:

Page<RolePO> findByTaskIdAndTitle(Long taskId, String title, Pageable pageable);

Or by specifying a string query:

@Query("{\"bool\" : {\"should\" : [ {\"match\" : {\"?0\" : \"?1\"}} ]}}")
Page<AnnouncementPO> findByAll(String field, String info, Pageable pageable);

But sometimes we need a more complex query and the functionality of Spring Data Elasticsearch at the same time. We can add such methods as follows:

interface UserRepositoryCustom {
  public void someCustomMethod(User user);
}

@Component
class UserRepositoryImpl implements UserRepositoryCustom {
  public void someCustomMethod(User user) {
  }
}

interface UserRepository extends CrudRepository<User, Long>, UserRepositoryCustom {
  // Declare query methods here
}

Two points to notice:

  • the Impl postfix of the implementation class name, compared to the core repository interface;
  • @Component, so that the implementation can be found by Spring.

Searching

Nested Class Searching

Say we have Tag as a nested object in class A. If we want to search by tag to find the outer class A, we have to add a toString() method to Tag, like the following:

@Field(type = FieldType.Nested)
private List<Tag> tags;


public class Tag {

    @Field(type = FieldType.String, index = FieldIndex.not_analyzed)
    private String des;

    // **have to add toString()**
}

Otherwise, Spring will fail to convert the query:

 "query_string" : {
   "query" : "com.superid.query.Tag@5d1d9d73",
   "fields" : [ "tags" ]
 }

This is because Spring uses toString() to construct the JSON query:

CriteriaQueryProcessor#processCriteriaEntry(..)

private QueryBuilder processCriteriaEntry(Criteria.CriteriaEntry entry,/* OperationKey key, Object value,*/ String fieldName) {
    Object value = entry.getValue();
    if (value == null) {
        return null;
    }
    OperationKey key = entry.getKey();
    QueryBuilder query = null;

    String searchText = StringUtils.toString(value);
    //...
}
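So a minimal fix is to make toString() return the searchable text, e.g. the des value (a sketch, assuming des is what we want to match on):

```java
// With this toString(), the generated query text becomes the tag's des
// value instead of "com.superid.query.Tag@5d1d9d73".
@Override
public String toString() {
    return des;
}
```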

Page Count From 0 or 1?

When we use Page as the return value of our query, we should pass a Pageable parameter to specify the page. What should be noticed is that page numbering starts from 0, not 1.
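For example, with Spring Data’s PageRequest (reusing the earlier findByTaskIdAndTitle repository method; the argument values are illustrative):

```java
// Page 0 is the first page of 10 hits; page 1 is the next 10.
Page<RolePO> first  = roleRepo.findByTaskIdAndTitle(2L, "dev", new PageRequest(0, 10));
Page<RolePO> second = roleRepo.findByTaskIdAndTitle(2L, "dev", new PageRequest(1, 10));
```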

Completion

In an earlier blog, we introduced how to do auto completion in pure Elasticsearch. Here, we focus on how to do it with Spring Data Elasticsearch.

Mapping

In order to use the auto-complete feature, we can use a JSON file or @CompletionField to define the mapping.

The concise way, using the annotation:

@CompletionField()
private Completion suggest;

Or the more powerful but tedious way:

{
    "file" : {
        "properties" : {
            "title" : { "type" : "string" },
            "suggest" : { "type" : "completion",
                "analyzer" : "simple",
                "search_analyzer" : "simple"
            }
        }
    }
}

Then we can refer to the mapping by @Mapping:

@Setting(settingPath = "elasticsearch-settings.json")
@Document(indexName = "file", type = "file", shards = 1, replicas = 0, createIndex = true, refreshInterval = "-1")
@Mapping(mappingPath = "/mappings/file-mapping.json")
public class File {...}

Index

We can index it like a common entity, for example through the repository:

fileRepo.save(new File(...));

Query

The ElasticsearchTemplate has the method for query suggest:

public SuggestResponse suggest(SuggestBuilder.SuggestionBuilder<?> suggestion, String... indices);
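A usage sketch against the mapping above (the suggestion and index names are illustrative; CompletionSuggestionBuilder comes from the Elasticsearch 2.x client):

```java
// Build a completion suggestion on the "suggest" field and run it
// against the "file" index.
CompletionSuggestionBuilder suggestion =
        new CompletionSuggestionBuilder("file-suggest")
                .field("suggest")
                .text("dir")
                .size(5);
SuggestResponse response = esTemplate.suggest(suggestion, "file");
```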

Dynamic Index Creation/Using

Elasticsearch recommends the use of rolling indices, which can be used to scale our application. Spring Data Elasticsearch currently doesn’t support this directly, but we can work around it using Spring EL.

First, we define a bean to be used as the suffix of the index name:

@Bean
public Suffix suffix(){
    return new Suffix();
}

Then, we can use Spring Expression Language to define our index name:

@Document(indexName = "role_#{suffix.toString()}", type = "role")
public class Role {}

Now we can change the suffix to access a different index:

suffix.setSuffix("123");
roleRepo.save(new Role("7", "后端开发", false, 2L, taskId));
suffix.setSuffix("234");
roleRepo.save(new Role("3", "前端架构", false, 2L, taskId));

A Suffix example can be found here.
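For reference, a minimal Suffix sketch (an assumption of what the linked class looks like): it only needs to hold a mutable string and expose it via toString(), which the SpEL expression calls.

```java
// Minimal holder for the index suffix; @Document's SpEL expression
// reads it through toString() each time the index name is resolved.
public class Suffix {

    private String suffix = "";

    public void setSuffix(String suffix) {
        this.suffix = suffix;
    }

    @Override
    public String toString() {
        return suffix;
    }
}
```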

Furthermore, I have already submitted a pull request for this kind of utility class to assist with rolling indices.

Search Across Index

If we have to search across multiple indices, Spring can’t generate the method for us; we have to write the query manually:

SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withQuery(matchQuery("title", query))
        .withIndices("role_*", "-role_xxx")
        .build();
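The query is then executed through the template rather than a repository method (queryForPage is on ElasticsearchTemplate; esTemplate and Role reuse earlier names):

```java
// "-role_xxx" in withIndices excludes that index from the wildcard match.
Page<Role> result = esTemplate.queryForPage(searchQuery, Role.class);
```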

Partial Update

Sometimes, we want to do partial update:

POST /website/blog/1/_update
{
   "doc" : {
      "tags" : [ "testing" ],
      "views": 0
   }
}

With Spring Data Elasticsearch, the same partial update looks like:

IndexRequest indexRequest = new IndexRequest();
indexRequest.source("name", file.getName());
UpdateQuery updateQuery = new UpdateQueryBuilder().withId(file.getId())
    // class is used to get index and type
    .withClass(FilePO.class)
    // indexRequest will be used as 'doc'
    .withIndexRequest(indexRequest).build();
template.update(updateQuery);

Debug

In this section, we introduce some useful utilities for debugging Elasticsearch.

Explain

curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "explain": true,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}'

Log

  • Set the logging level to DEBUG or lower, which makes Spring print more information and stack traces when an exception occurs.
  • Using the Node Client of Elasticsearch will surface Elasticsearch’s internal errors through stack traces.

Samples

If we can’t find how a functionality is achieved, we can look for samples in the following places:

  • Sample project
  • Test cases in repos

