跳至主要内容

Mysql pagination performance

I was assigned a task to do performance tuning of a pagination service method and I collect some performance statistic from both online and offline. The main aim to change is two kinds of code:

  • invoked too many times
  • cost too much time per call

I have made some minor code change to avoid call some functions repeatedly, and the next goal is to handle a very slow database query.

Such slow sql

When implementing pagination of a table query, we use the following two sql statement to accomplish it.

select count(id) from table where isDeleted = 0;
select id from table where isDeleted = 0 limit M, N;

But the this solution is very slow: for a table of 600,000 rows, this sql and rpc call cost 15s on average.
After profiling the application, I find the sql cost the most of time. To optimize the situation, I come up with the following way.

New pagination way

When it comes up to pagination, there actually have two types:

  • user interface used: show count of all item and a page content
  • back end: get the full data from this table

In the second case, we can actually avoid count all item, which may change during the time of executing limit.
We can judge when to stop query page by check whether this page size is smaller than we requested.

Page getPage(int M, int N) {
    select id from table where isDeleted = 0 limit M, N;
}

do {
    Page p = getPage(M, N);
    // handle this page
} while (p.size == N);

This can almost reduce the half of running time, but have to push the user to change this.

Limit opt

When we use the limit of mysql, we should notice the following point.

Skip longer?

A quiz: for the following sql statement, will they have same execution time?

select id from table where isDeleted = 0 limit 0, 10;
select id from table where isDeleted = 0 limit 10000, 10;

The answer is NO for most cases. For some cases, the second will far more slower than first, for the second sql actually need to find 10010 rows out and then return the last ten.

Covering index scan vs Full index scan vs Full table scan

If our query is a full index scan and can we make it better? Yes, in some cases, according to this blog, selectivity can make full index scan slower than a full table scan. For my table, isDeleted will be false for most row and the selectivity can be very high. So ignore index and try a full table scan may increase performance.

So in order to make limit faster, what can we do?
First we have to understand why it is so slow. The execution plan tell us:

type: index
Extra: Using index

This plan show it is a covering index scan which can just read index content and return(no need to access table because a index including id and isDeleted). So it seems no better solution for this query.

Cache opt

Use redis memory cache to store the id will increase the speed a lot, but we have to handle the replication between database and cache. This can be done in aop way which will not affect original code a lot, but increase the complexity. This can be a design decision for you to make.

High level: design better

Use incremental change.
Every time my table changed, we send the change to the client who request them rather then requesting full data at pre-defined interval. In this way, the pressure of data transmision become neglectable.

More

Late row lookup

As this answer explains, mysql can’t perform late row look up which will slow down some kind of pagination a lot.
And more to read about this topic:

Written with StackEdit.

评论

此博客中的热门博文

Spring Boot: Customize Environment

Spring Boot: Customize Environment Environment variable is a very commonly used feature in daily programming: used in init script used in startup configuration used by logging etc In Spring Boot, all environment variables are a part of properties in Spring context and managed by Environment abstraction. Because Spring Boot can handle the parse of configuration files, when we want to implement a project which uses yml file as a separate config file, we choose the Spring Boot. The following is the problems we met when we implementing the parse of yml file and it is recorded for future reader. Bind to Class Property values can be injected directly into your beans using the @Value annotation, accessed via Spring’s Environment abstraction or bound to structured objects via @ConfigurationProperties. As the document says, there exists three ways to access properties in *.properties or *.yml : @Value : access single value Environment : can access multi

Elasticsearch: Join and SubQuery

Elasticsearch: Join and SubQuery Tony was bothered by the recent change of search engine requirement: they want the functionality of SQL-like join in Elasticsearch! “They are crazy! How can they think like that. Didn’t they understand that Elasticsearch is kind-of NoSQL 1 in which every index should be independent and self-contained? In this way, every index can work independently and scale as they like without considering other indexes, so the performance can boost. Following this design principle, Elasticsearch has little related supports.” Tony thought, after listening their requirements. Leader notice tony’s unwillingness and said, “Maybe it is hard to do, but the requirement is reasonable. We need to search person by his friends, didn’t we? What’s more, the harder to implement, the more you can learn from it, right?” Tony thought leader’s word does make sense so he set out to do the related implementations Application-Side Join “The first implementation

Implement isdigit

It is seems very easy to implement c library function isdigit , but for a library code, performance is very important. So we will try to implement it and make it faster. Function So, first we make it right. int isdigit ( char c) { return c >= '0' && c <= '9' ; } Improvements One – Macro When it comes to performance for c code, macro can always be tried. #define isdigit (c) c >= '0' && c <= '9' Two – Table Upper version use two comparison and one logical operation, but we can do better with more space: # define isdigit(c) table[c] This works and faster, but somewhat wasteful. We need only one bit to represent true or false, but we use a int. So what to do? There are many similar functions like isalpha(), isupper ... in c header file, so we can combine them into one int and get result by table[c]&SOME_BIT , which is what source do. Source code of ctype.h : # define _ISbit(bit) (1 << (