In the last blog, we introduced some confusions about basic Elasticsearch concepts and some configuration problems we met. Now we come to the problems we came across when using Elasticsearch in an application with the help of Spring Data Elasticsearch.
With Spring
After understanding Elasticsearch and configuring the server, we need to write code to interact with it. We chose the Spring Data Elasticsearch framework to assist our implementation. The following are the problems we met when using Spring to access Elasticsearch. The versions involved:
- spring-boot-starter-data-elasticsearch: 1.5.3.RELEASE
- Elasticsearch server: 2.4.x
Connection
Clients
When using Java to access Elasticsearch, there are two types of clients to choose from to communicate with the server:
- Transport Client: this client is not part of the cluster; it only communicates with the server
- Node Client: this client joins the cluster as a node, so it can store data shards and respond to search requests
In our case, we just want to communicate with a dedicated Elasticsearch cluster, so we chose the Transport Client.
In order to configure a Transport Client, we can do it in Java code:
@Bean
public Client elasticClient() {
    Settings settings = Settings.builder().put(ClusterName.SETTING, "demo").build();
    return TransportClient.builder().settings(settings).build()
            .addTransportAddress(new InetSocketTransportAddress(new InetSocketAddress("xxx", 9300)));
}
Or, even cleaner, using Spring Boot's property file:
spring.data.elasticsearch.cluster-name=demo
# specifying cluster-nodes makes the client a transport client, rather than a node client
spring.data.elasticsearch.cluster-nodes=xxx:9300
Index Definition
Duplicated id
If we just mark our id field with @Id, we will get a duplicated id inside _source:
{
  "_index" : "file-8947",
  "_type" : "file",
  "_id" : "7685",
  "_source" : {
    "id" : "7685",
    "name" : "directory3",
    "uploadRoleId" : "4353",
    "type" : 1
  }
}
Because Spring Data uses Jackson to transform the object to JSON:
indexRequestBuilder.setSource(resultsMapper.getEntityMapper().mapToString(query.getObject()));
we can mark our id field with @JsonIgnore to remove it from _source.
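Putting the two annotations together, a minimal entity sketch might look like this (the class and field names are illustrative):

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import com.fasterxml.jackson.annotation.JsonIgnore;

@Document(indexName = "file", type = "file")
public class FileDoc {

    // Used as the Elasticsearch _id, but excluded from the
    // serialized _source thanks to @JsonIgnore.
    @Id
    @JsonIgnore
    private String id;

    private String name;
}
```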
DateFormat
When defining a @Field, we will notice that there is a DateFormat attribute to fill in. It represents the format used to interpret the date strings in the JSON we send to Elasticsearch.
In JSON documents, dates are represented as strings. Elasticsearch uses a set of the formats to recognize and parse these strings into a long value representing milliseconds-since-the-epoch in UTC.
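For example, a date field with a custom pattern might be declared like this (the field name and pattern are illustrative):

```java
import java.util.Date;
import org.springframework.data.elasticsearch.annotations.DateFormat;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

public class FileDoc {
    // Tell Elasticsearch how to parse the incoming date strings.
    @Field(type = FieldType.Date, format = DateFormat.custom,
           pattern = "yyyy-MM-dd HH:mm:ss")
    private Date createdAt;
}
```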
FieldIndex: no vs not_analyzed
FieldIndex can be used to specify how Elasticsearch will handle a field:
- analyzed: the field is a string and will be processed by an analyzer
- not_analyzed: the field is a string but needs no analysis; it will be stored as an exact value
- no: the field is not indexed at all, i.e. not searchable by filter, query, etc.
In the latest Spring Data Elasticsearch builds, this element has been replaced by a boolean that says whether to index the field, plus separate field types for analyzed strings and exact-value strings.
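In those newer versions, the three cases above map roughly to the following (a sketch against the newer annotation API; field names are illustrative):

```java
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

public class FileDoc {
    // analyzed -> Text
    @Field(type = FieldType.Text)
    private String description;

    // not_analyzed -> Keyword (stored as an exact value)
    @Field(type = FieldType.Keyword)
    private String modifier;

    // no -> index = false (not searchable)
    @Field(type = FieldType.Keyword, index = false)
    private String internalNote;
}
```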
Type Auto Detection Failure
In our project, if we specify the index attribute but not the field type, like the following shows:
@Field(index = FieldIndex.not_analyzed)
private String modifier;
Spring will log an exception message:
AbstractElasticsearchRepository : failed to load elasticsearch nodes : org.elasticsearch.index.mapper.MapperParsingException: No type specified for field [modifier]
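Declaring the type explicitly fixes it; with the versions used in this post that would be:

```java
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldIndex;
import org.springframework.data.elasticsearch.annotations.FieldType;

public class FileDoc {
    // An explicit type avoids the MapperParsingException above.
    @Field(type = FieldType.String, index = FieldIndex.not_analyzed)
    private String modifier;
}
```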
Repo Definition
Class Cast When Using Slice
When using Slice as the return value of a query, as the Spring Data documentation suggests, Spring throws a ClassCastException.
After reading the source code, we found that Spring does not treat such a query as a paged query:
public final boolean isPageQuery() {
    return org.springframework.util.ClassUtils.isAssignable(Page.class, this.unwrappedReturnType);
}
Because Slice is a supertype of Page, a return type of Slice is not assignable to Page.class. As a result, the query is executed with queryForObject, which returns only a single result and causes the exception.
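Until this is fixed, a simple workaround is to declare the return type as Page (which is itself a Slice), so the isPageQuery() check passes; a sketch (entity and repo names are illustrative):

```java
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface AnnouncementRepo
        extends ElasticsearchRepository<Announcement, Long> {

    // Page is assignable to Page.class, so Spring runs a real paged query.
    Page<Announcement> findByTitle(String title, Pageable pageable);
}
```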
findBy
vs findAllBy
Spring does not seem to distinguish between the following two notations:
List<Announcement> findByTitle(String title);
List<Announcement> findAllByTitle(String title);
They produce exactly the same JSON body.
Find by _all
If we want to match the content of the _all meta field, we can't write a repo method as for a common field, because Spring can't find such a field in our Document class.
As a workaround, we can pass the field name as a parameter:
@Query("{\"bool\" : {\"must\" : [ {\"match\" : {\"?0\" : \"?1\"}} ]}}")
Page<MyDoc> getbyAll(String field, String query, Pageable pageable);
@Query
The content of the @Query annotation must be a complete JSON object, wrapped in braces:
{"bool" : {"should" : [ {"match" : {"?0" : "?1"}} ]}}
Otherwise, if we miss a { like in the following:
@Query(" \"multi_match\": {\n" +
" \"query\": \"?0\",\n" +
" \"fields\": [ \"name^2\", \"path\" ]\n" +
" }" +
"}")
Page<Affair> findByNameOrPath(String info, Pageable pageable);
Spring will fail to recognize the query, and the simple query becomes a strange query_binary nested in the request:
SearchParseException[failed to parse search source [{"from":0,"size":10,"query_binary":"..."}]]
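With the opening brace restored, the same declaration parses fine:

```java
@Query("{\n" +
       "  \"multi_match\": {\n" +
       "    \"query\": \"?0\",\n" +
       "    \"fields\": [ \"name^2\", \"path\" ]\n" +
       "  }\n" +
       "}")
Page<Affair> findByNameOrPath(String info, Pageable pageable);
```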
Add Implementation in Repo
Spring Data Elasticsearch is a very convenient way to implement many simple queries: it auto-generates the query from the method name and parameters:
Page<RolePO> findByTaskIdAndTitle(Long taskId, String title, Pageable pageable);
Or from a specified string query:
@Query("{\"bool\" : {\"should\" : [ {\"match\" : {\"?0\" : \"?1\"}} ]}}")
Page<AnnouncementPO> findByAll(String field, String info, Pageable pageable);
But sometimes we need a more complex query together with the functionality of Spring Data Elasticsearch. We can add such methods like this:
interface UserRepositoryCustom {
    void someCustomMethod(User user);
}

@Component
class UserRepositoryImpl implements UserRepositoryCustom {
    public void someCustomMethod(User user) {
        // custom implementation here
    }
}

interface UserRepository extends CrudRepository<User, Long>, UserRepositoryCustom {
    // Declare query methods here
}
Two points to notice:
- the Impl postfix in the name of the custom implementation, compared to the core repository interface
- the @Component annotation, which lets the implementation be found by Spring
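Inside the implementation we can inject ElasticsearchTemplate and build arbitrary native queries; a minimal sketch (the query itself is illustrative):

```java
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SearchQuery;
import org.springframework.stereotype.Component;

import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

@Component
class UserRepositoryImpl implements UserRepositoryCustom {

    @Autowired
    private ElasticsearchTemplate template;

    @Override
    public void someCustomMethod(User user) {
        // Build a native query that method-name derivation can't express.
        SearchQuery query = new NativeSearchQueryBuilder()
                .withQuery(matchQuery("name", user.getName()))
                .build();
        List<User> hits = template.queryForList(query, User.class);
        // ... work with hits
    }
}
```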
Searching
Nested Class Searching
Say we have Tag as a nested object in class A. If we want to search on tags to find the outer class A, we have to add a toString() method to Tag, like the following:
@Field(type = FieldType.Nested)
private List<Tag> tags;

public class Tag {
    @Field(type = FieldType.String, index = FieldIndex.not_analyzed)
    private String des;
    // **have to add toString()**
}
Otherwise, Spring fails to convert the query and sends the default identity string instead:
"query_string" : {
"query" : "com.superid.query.Tag@5d1d9d73",
"fields" : [ "tags" ]
}
This is because Spring uses toString() to construct the JSON query, in CriteriaQueryProcessor#processCriteriaEntry(..):
private QueryBuilder processCriteriaEntry(Criteria.CriteriaEntry entry,/* OperationKey key, Object value,*/ String fieldName) {
    Object value = entry.getValue();
    if (value == null) {
        return null;
    }
    OperationKey key = entry.getKey();
    QueryBuilder query = null;
    String searchText = StringUtils.toString(value);
    //...
}
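So toString() needs to return the value we actually want to match on; a minimal sketch:

```java
public class Tag {

    private String des;

    public Tag(String des) {
        this.des = des;
    }

    // Return the raw field value so the generated query_string query
    // matches the tag text instead of the default "Tag@5d1d9d73" form.
    @Override
    public String toString() {
        return des;
    }
}
```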
Page Count: From 0 or 1?
When we use Page as the return value of a query, we should pass a Pageable parameter to specify the page. What should be noticed is that page numbering starts from 0.
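So the first page of ten results is requested like this (repo and entity names are illustrative; PageRequest is the constructor-style API of the Spring Data version used here):

```java
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;

// Page numbers are zero-based: (0, 10) is the FIRST page of 10 hits.
Page<Announcement> first = announcementRepo.findByTitle("hello", new PageRequest(0, 10));
// The second page would be new PageRequest(1, 10).
```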
Completion
In an earlier blog, we introduced how to do auto-completion in pure Elasticsearch. Here, we focus on how to do it with Spring Data Elasticsearch.
Mapping
In order to use the auto-completion feature, we can use either a JSON mapping file or @CompletionField to define the completion field.
The concise way, using the annotation:
@CompletionField()
private Completion suggest;
Or the more powerful but tedious way, a JSON mapping file:
{
  "file" : {
    "properties" : {
      "title" : { "type" : "string" },
      "suggest" : {
        "type" : "completion",
        "analyzer" : "simple",
        "search_analyzer" : "simple"
      }
    }
  }
}
Then we can refer to the mapping by @Mapping:
@Setting(settingPath = "elasticsearch-settings.json")
@Document(indexName = "file", type = "file", shards = 1, replicas = 0, createIndex = true, refreshInterval = "-1")
@Mapping(mappingPath = "/mappings/file-mapping.json")
public class File {...}
Index
We can index it like any other entity, e.g. through its repository:
fileRepo.save(new File(...));
Query
ElasticsearchTemplate has a method for suggest queries:
public SuggestResponse suggest(SuggestBuilder.SuggestionBuilder<?> suggestion, String... indices);
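A sketch of calling it for the suggest field defined in the mapping above (the builder API here is the Elasticsearch 2.x one and may differ in other versions; the suggestion name and input text are illustrative):

```java
import org.elasticsearch.action.suggest.SuggestResponse;
import org.elasticsearch.search.suggest.SuggestBuilders;
import org.elasticsearch.search.suggest.completion.CompletionSuggestionBuilder;

CompletionSuggestionBuilder suggestion = SuggestBuilders
        .completionSuggestion("file-suggest")  // name of this suggestion in the response
        .field("suggest")                      // the completion field from the mapping
        .text("dir")                           // the user's partial input
        .size(5);

SuggestResponse response = template.suggest(suggestion, "file");
// The completions are found under the name we registered above:
// response.getSuggest().getSuggestion("file-suggest")
```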
Dynamic Index Creation/Usage
Elasticsearch recommends the use of rolling indices, which can be used to scale an application. Spring Data Elasticsearch currently has no direct support for this, but we can work around it using Spring EL.
First, we define a bean to be used as the suffix of the index name:
@Bean
public Suffix suffix() {
    return new Suffix();
}
Then, we can use Spring Expression Language to define our index name:
@Document(indexName = "role_#{suffix.toString()}", type = "role")
public class Role {}
Now we can change the suffix to access a different index:
suffix.setSuffix("123");
roleRepo.save(new Role("7", "后端开发", false, 2L, taskId));
suffix.setSuffix("234");
roleRepo.save(new Role("3", "前端架构", false, 2L, taskId));
A Suffix example can be found here.
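The Suffix holder itself can be as simple as this sketch (the real class is the one linked above):

```java
// Minimal mutable holder whose toString() is evaluated by the Spring EL
// expression in @Document(indexName = "role_#{suffix.toString()}").
public class Suffix {

    private String suffix = "";

    public void setSuffix(String suffix) {
        this.suffix = suffix;
    }

    @Override
    public String toString() {
        return suffix;
    }
}
```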
Furthermore, I have already submitted a pull request for this kind of utility class to assist rolling indices.
Search Across Indices
If we have to search across multiple indices, Spring can't generate the method for us. We have to write the query manually:
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(matchQuery("title", query))
.withIndices("role_*", "-role_xxx")
.build();
Partial Update
Sometimes we want to do a partial update, which in the REST API looks like:
POST /website/blog/1/_update
{
"doc" : {
"tags" : [ "testing" ],
"views": 0
}
}
With Spring Data Elasticsearch, the same update looks like:
IndexRequest indexRequest = new IndexRequest();
indexRequest.source("name", file.getName());

UpdateQuery updateQuery = new UpdateQueryBuilder().withId(file.getId())
        // the class is used to derive the index and type
        .withClass(FilePO.class)
        // the indexRequest will be used as the 'doc'
        .withIndexRequest(indexRequest).build();
template.update(updateQuery);
Debug
In this section, we introduce some utilities that are useful for debugging Elasticsearch.
Explain
Setting "explain": true in a search request makes Elasticsearch return an explanation of how each hit's score was computed:
curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
"explain": true,
"query" : {
"term" : { "user" : "kimchy" }
}
}'
Log
- Setting the logging level to DEBUG or lower will let Spring print more info and stack traces when an exception occurs.
- Using the Node Client of Elasticsearch will surface Elasticsearch's internal errors through the stack trace.
Samples
If we can't find how a functionality is achieved, we can find samples in the following places:
- Sample project
- Test cases in repos
Ref
Written with StackEdit.