I was assigned the task of performance-tuning a pagination service method, and I collected performance statistics both online and offline. The main targets for change are two kinds of code:
- code that is invoked too many times
- code that costs too much time per call
I have already made some minor changes to avoid calling certain functions repeatedly; the next goal is to handle a very slow database query.
The slow SQL query
When implementing pagination over a table, we use the following two SQL statements:
select count(id) from table where isDeleted = 0;
select id from table where isDeleted = 0 limit M, N;
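For context, here is a minimal sketch of how such a service method might wire these two statements together over JDBC. The table and column names mirror the queries above; the Page record and the connection handling are simplified assumptions, not the actual service code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the real page object: the total count plus one page of ids.
record Page(long total, List<Long> ids) {}

class PaginationDao {
    // Runs the two statements above for one page.
    Page getPage(Connection conn, int offset, int pageSize) throws SQLException {
        long total;
        try (PreparedStatement ps = conn.prepareStatement(
                "select count(id) from table where isDeleted = 0");
             ResultSet rs = ps.executeQuery()) {
            rs.next();
            total = rs.getLong(1);
        }
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "select id from table where isDeleted = 0 limit ?, ?")) {
            ps.setInt(1, offset);
            ps.setInt(2, pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong(1));
                }
            }
        }
        return new Page(total, ids);
    }
}
```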
But this solution is very slow: for a table of 600,000 rows, the SQL plus the RPC call costs 15s on average.
After profiling the application, I found that the SQL takes most of the time. To improve the situation, I came up with the following approach.
A new pagination approach
When it comes to pagination, there are actually two use cases:
- user interface: show the total item count plus one page of content
- back end: fetch the full data from the table, page by page
In the second case, we can skip counting all the items, since the count may change anyway while the limit queries are running. We can decide when to stop fetching pages by checking whether the returned page is smaller than the requested size:
Page getPage(int M, int N) {
    // select id from table where isDeleted = 0 limit M, N;
}

Page p;
do {
    p = getPage(M, N);
    // handle this page
    M += N;  // advance the offset to the next page
} while (p.size == N);
This cuts the running time almost in half, but it forces the callers to change how they consume the data.
LIMIT optimization
When we use MySQL's LIMIT, we should be aware of the following points.
Is a larger skip slower?
A quiz: will the following two SQL statements have the same execution time?
select id from table where isDeleted = 0 limit 0, 10;
select id from table where isDeleted = 0 limit 10000, 10;
The answer is NO in most cases. The second one can be far slower than the first, because MySQL actually has to find 10010 rows and then throw away everything but the last ten.
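One commonly used way around this growing-offset cost, when the consumer can iterate in key order, is keyset pagination: remember the last id returned and seek past it instead of skipping rows. This is not what I did here, just a sketch for comparison; it assumes id is an indexed (e.g. primary) key.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class KeysetPager {
    // Fetches the page that starts after lastSeenId. "id > ?" lets MySQL seek
    // directly to the right place instead of scanning and discarding the first
    // M rows the way "limit M, N" does.
    List<Long> nextPage(Connection conn, long lastSeenId, int pageSize) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "select id from table where isDeleted = 0 and id > ? order by id limit ?")) {
            ps.setLong(1, lastSeenId);
            ps.setInt(2, pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong(1));
                }
            }
        }
        return ids;
    }
}
```

To walk the whole table, start with lastSeenId = 0 and stop when a page comes back smaller than pageSize, just like the loop above.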
Covering index scan vs Full index scan vs Full table scan
If our query is a full index scan, can we do better? Yes, in some cases: according to this blog, depending on selectivity a full index scan can be slower than a full table scan. In my table, isDeleted is 0 for most rows, so the predicate filters out almost nothing. Ignoring the index and trying a full table scan may therefore improve performance (see the sketch below).
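A minimal sketch of what that experiment could look like, using MySQL's IGNORE INDEX hint to keep the optimizer away from the secondary index. The index name idx_isDeleted is an assumption; look up the real one with SHOW INDEX before trying this.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class FullScanExperiment {
    // Runs the page query while ignoring the secondary index, so MySQL falls
    // back to a full table scan; compare its timing against the original query.
    int countPage(Connection conn, int offset, int pageSize) throws SQLException {
        String hinted = "select id from table ignore index (idx_isDeleted) "
                      + "where isDeleted = 0 limit ?, ?";
        int rows = 0;
        try (PreparedStatement ps = conn.prepareStatement(hinted)) {
            ps.setInt(1, offset);
            ps.setInt(2, pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows++;
                }
            }
        }
        return rows;
    }
}
```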
So, what can we do to make this LIMIT query faster? First we have to understand why it is slow. The execution plan (from EXPLAIN) tells us:
type: index
Extra: Using index
This plan shows a covering index scan: MySQL can answer the query from the index alone and never touches the table, because an index containing id and isDeleted already covers everything the query selects. So there seems to be no better plan for this query itself.
Cache optimization
Using a Redis in-memory cache to store the ids speeds things up a lot, but then we have to keep the cache consistent with the database. This can be done in an AOP style that barely touches the original code, at the cost of extra complexity; see the sketch below. In the end it is a design decision for you to make.
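A minimal sketch of that AOP style, assuming Spring with a Redis-backed CacheManager is already configured. The cache name, key format, and service shape are all illustrative assumptions, not the actual service code.

```java
import java.util.List;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
class CachedPageService {

    // The page of ids is cached per (offset, pageSize); the first call hits the
    // database, later calls are served from the cache by the caching aspect.
    @Cacheable(value = "pageIds", key = "#offset + ':' + #pageSize")
    public List<Long> pageIds(int offset, int pageSize) {
        return queryIdsFromDatabase(offset, pageSize); // the original slow query
    }

    // Any write invalidates the cached pages so cache and database stay in sync.
    @CacheEvict(value = "pageIds", allEntries = true)
    public void markDeleted(long id) {
        updateIsDeletedInDatabase(id);
    }

    // Placeholders for the existing data-access code.
    private List<Long> queryIdsFromDatabase(int offset, int pageSize) { return List.of(); }
    private void updateIsDeletedInDatabase(long id) { }
}
```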
High level: a better design
Use incremental changes.
Every time the table changes, we push that change to the clients who requested the data, rather than having them fetch the full data at a pre-defined interval. This makes the data-transmission pressure negligible.
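A rough sketch of the idea; the event shape, listener interface, and publisher are hypothetical names for illustration, not an existing API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

enum ChangeType { INSERTED, UPDATED, DELETED }

// One row-level change in the table.
record TableChange(long id, ChangeType type) {}

// Implemented by each consumer that used to poll the full table.
interface TableChangeListener {
    void onChange(TableChange change);
}

// The service pushes every change as it happens instead of waiting to be polled.
class TableChangePublisher {
    private final List<TableChangeListener> listeners = new CopyOnWriteArrayList<>();

    void subscribe(TableChangeListener listener) {
        listeners.add(listener);
    }

    // Call this from the code path that writes the table (e.g. after setting isDeleted = 1).
    void publish(TableChange change) {
        for (TableChangeListener listener : listeners) {
            listener.onChange(change);
        }
    }
}
```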
More
Late row lookup
As this answer explains, MySQL can't perform a late row lookup, which slows down some kinds of pagination considerably.
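A commonly cited workaround (not needed in my case, where the query is already covered by the index) is to emulate late row lookup with a deferred join: page over the narrow index first, then join back to the table for the wide rows. A minimal sketch, assuming the table also has a name column we actually want back; that column is a hypothetical example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class DeferredJoinPager {
    // The inner query pages over the covering index (cheap); only the final N ids
    // are joined back to the table to fetch the wide columns.
    void printPage(Connection conn, int offset, int pageSize) throws SQLException {
        String sql = "select t.id, t.name "
                   + "from table t "
                   + "join (select id from table where isDeleted = 0 "
                   + "      order by id limit ?, ?) p on p.id = t.id";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, offset);
            ps.setInt(2, pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " " + rs.getString("name"));
                }
            }
        }
    }
}
```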
More to read on this topic:
- Documentation: LIMIT optimization
- Late row lookup
- Late row lookup in InnoDB
- Full table scan vs full index scan