
Elasticsearch Adventure(2): Better Practice?

As development and testing of the search engine went on, Tony ran into more problems. By now he has learned the basics of Elasticsearch, and he is looking for better practices to solve them.

Query Condition Combination

Tony, “Mentor, for the time being I am using the REST API of Elasticsearch to access it. We now have a requirement for very flexible searching: users can choose and combine different fields with checkboxes to search. I am wondering how to implement this gracefully. So far I can only think of keeping different pre-written ES query strings, but that seems cumbersome, and I think I may need some kind of string builder to compose a customized query.”

Tim, “Hmm, maybe `QueryBuilders` from the Elasticsearch Java API is what you want. Let’s say we want a bool query [1], and some of its conditions may or may not be present. We can do the following:”

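// needs: import static org.elasticsearch.index.query.QueryBuilders.*; (plus BoolQueryBuilder and Lucene's ScoreMode)
BoolQueryBuilder bool = boolQuery()
    .should(matchQuery("must_have1name", info.getQuery()))
    .should(nestedQuery("nested1", matchQuery("nested1.des", info.getQuery()), ScoreMode.Avg));
// add a filter only for the conditions the user actually selected
if (info.A() != null) {
  bool.filter(termQuery("a", info.A()));
}
if (info.B() != null) {
  bool.filter(termQuery("b", info.B()));
}
if (info.C() != null) {
  bool.filter(termQuery("c", info.C()));
}
if (info.D() != null) {
  bool.filter(termQuery("d", info.D()));
}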

Tony, “Great, this is what I want. I can use `QueryBuilders` to compose any query the user requests. Thanks.”
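To make the shape of the composed query concrete, here is a dependency-free sketch that assembles the same kind of `bool` query as nested maps and prints it as JSON. It does not use the Elasticsearch client: the class `BoolQuerySketch`, its method names, and the sample values are all made up for illustration, and the nested clause is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free illustration; not the Elasticsearch Java API.
class BoolQuerySketch {

    // Wraps one field/value pair in a one-key clause such as {"term": {"a": "x"}}.
    static Map<String, Object> clause(String type, String field, String value) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put(field, value);
        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put(type, inner);
        return outer;
    }

    // Builds a bool query: the text match is always present, filters only when set.
    static Map<String, Object> build(String text, String a, String b) {
        List<Object> should = new ArrayList<>();
        should.add(clause("match", "must_have1name", text));

        List<Object> filter = new ArrayList<>();
        if (a != null) filter.add(clause("term", "a", a));
        if (b != null) filter.add(clause("term", "b", b));

        Map<String, Object> bool = new LinkedHashMap<>();
        bool.put("should", should);
        if (!filter.isEmpty()) bool.put("filter", filter);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("bool", bool);
        return root;
    }

    // Minimal JSON rendering for maps, lists and strings (no escaping; sketch only).
    static String toJson(Object node) {
        if (node instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                if (sb.length() > 1) sb.append(",");
                sb.append('"').append(e.getKey()).append("\":").append(toJson(e.getValue()));
            }
            return sb.append("}").toString();
        }
        if (node instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            for (Object item : (List<?>) node) {
                if (sb.length() > 1) sb.append(",");
                sb.append(toJson(item));
            }
            return sb.append("]").toString();
        }
        return "\"" + node + "\"";
    }

    public static void main(String[] args) {
        // Only filter "a" is chosen, so no clause for "b" is emitted.
        System.out.println(toJson(build("elastic", "x", null)));
    }
}
```

The point of the sketch is the same as in Tim's snippet: each optional condition is one `if`, so no pre-written query string per checkbox combination is needed.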

Tree Structure

Tony, “Mentor, I have run into another problem: I have to implement foreign-key-like relationships in ES.”

Tim, “Have you tried the parent-child relationship and nested objects?”

Tony, “Yes. The parent-child relationship is suitable for one-to-many mappings, but a child can’t refer to the same type as its parent. For the same reason, we can’t use nested objects.”

Tim, “So you mean you are storing some kind of self-referential document. For the time being, ES doesn’t support this kind of document. Actually, ES (as far as its maintainers are concerned) doesn’t recommend any relationship between different indices. Every document should be self-contained, and querying one index should be enough.”

Tony, “So do I have to solve this in the business logic layer?”

Tim, "Maybe not. In very special cases, like this file system example, we can use a customized analyzer and multi-fields to achieve a sort of self-reference. For more information about relationships in ES, you may like to read here."
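One way to read the file system example: if each document stores its full path and a custom analyzer indexes every ancestor prefix of that path, a plain term lookup on a directory finds everything beneath it. The sketch below only mimics that prefix expansion in plain Java, under the assumption that the analyzer behaves like Elasticsearch's `path_hierarchy` tokenizer. The class and method names are illustrative, and no Elasticsearch code is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the tokens a path_hierarchy-style analyzer would emit.
class PathPrefixSketch {

    // Splits a path into itself plus every ancestor directory,
    // e.g. "/usr/local/bin" yields "/usr", "/usr/local" and "/usr/local/bin".
    static List<String> ancestorTokens(String path) {
        List<String> tokens = new ArrayList<>();
        // Start at 1 so the leading '/' does not produce an empty token.
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == '/') {
                tokens.add(path.substring(0, i));
            }
        }
        tokens.add(path); // the full path is always the last token
        return tokens;
    }

    // A document "matches" a directory when that directory is one of its tokens.
    static boolean isUnder(String docPath, String directory) {
        return ancestorTokens(docPath).contains(directory);
    }

    public static void main(String[] args) {
        System.out.println(ancestorTokens("/usr/local/bin"));
        System.out.println(isUnder("/usr/local/bin", "/usr/local"));
        System.out.println(isUnder("/usr/local/bin", "/usr/loc"));
    }
}
```

Because whole path segments are indexed rather than raw string prefixes, `/usr/loc` does not match a file under `/usr/local`, which is what makes the parent lookup behave like a reference to an ancestor document.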

Update by Query

Today, Tony invited Tim to review his code. While going through part of the business logic, Tim found some code that needed improvement.

// query the `BPO`s whose `aid` is what we want
List<BPO> bPOs = bRepo.findByAId(aid);
List<String> ids = bPOs.stream().map(BPO::getId).collect(Collectors.toList());
// use each `BPO`'s id to update it, one request per document
for (String id : ids) {
	UpdateQuery updateQuery = new UpdateQueryBuilder()
	    // class is used to infer `index` and `type`
	    .withClass(BPO.class)
	    .withId(id)
	    // indexRequest (built elsewhere) will be used as `doc`
	    .withIndexRequest(indexRequest).build();
	template.update(updateQuery);
}

Tim, “This part of the code seems to need some improvement. What do you think?”

Tony, “Oh, I can use a batch request to update them all in one request. My mistake.”
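Tony's batch idea amounts to sending every partial update in one `_bulk` request instead of one request per document. As a rough sketch of what that single request body looks like, the following plain Java builds the newline-delimited payload by hand. The index name `bpo`, the `status` field, and the helper names are invented for illustration, and a real project would more likely let its client library assemble this.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: hand-builds an Elasticsearch _bulk body for partial updates.
class BulkUpdateSketch {

    // One action line plus one "doc" line per id, each terminated by '\n'
    // as the bulk format requires. Values are assumed to need no JSON escaping.
    static String bulkBody(String index, List<String> ids, String field, String value) {
        StringBuilder body = new StringBuilder();
        for (String id : ids) {
            body.append("{\"update\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(id).append("\"}}\n");
            body.append("{\"doc\":{\"").append(field)
                .append("\":\"").append(value).append("\"}}\n");
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Two documents, one HTTP round trip (e.g. POST /_bulk).
        System.out.print(bulkBody("bpo", Arrays.asList("1", "2"), "status", "done"));
    }
}
```

Note that this still requires fetching the ids first. Older Elasticsearch versions may also expect a type in the action line.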

Tim, "Great. Just one more question: can we do better? Can we skip fetching the `BPO`s and update by query instead, like we do in SQL? ES actually has the `update_by_query` feature. Although it is not as powerful as its SQL counterpart (it can only search and update the same index), it is suitable for your use case. Furthermore, it can also be used to [pick up a new property](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html#picking-up-a-new-property)."

#### Shard Allocation

A shard is not free. Remember:

 - A shard is a Lucene index under the covers, which uses file handles, memory, and CPU cycles.
 - Every search request needs to hit a copy of every shard in the index. That’s fine if every shard is sitting on a different node, but not if many shards have to compete for the same resources.
 - Term statistics, used to calculate relevance, are per shard. Having a small amount of data in many shards leads to poor relevance.

### Ref

 - [Relations in ES](https://www.elastic.co/guide/en/elasticsearch/guide/master/relations.html)
 - Multi-fields in ES: index a field in different ways

### Opt Index Rating

 - [Tune For Indexing Speed](https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html)

> Written with [StackEdit](https://stackedit.io/).

  1. The bool query is a compound query in Elasticsearch; details can be found here.
