Database Research & Development

  • Home
  • NoSQL
    • NoSQL
    • Cassandra
  • Databases
    • Database Theory
    • Database Designing
    • SQL Server Coding Standards
    • SQL Server
    • PostgreSQL
    • MySQL
    • Greenplum
    • Linux
  • Interviews
    • SQL Server Interviews
    • MySQL Interviews
    • SQL Puzzles
  • DBA Scripts
    • SQL Server DBA Scripts
    • PostgreSQL DBA Scripts
    • MySQL DBA Scripts
    • Greenplum DBA Scripts
  • Home
  • Blog Archives !
  • (: Laugh@dbrnd :)
  • Contact Me !
sqlserverinterviews
Home 2018 April PostgreSQL: Improve Query Performance by Extended Statistics

PostgreSQL: Improve Query Performance by Extended Statistics

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

In this post, I am sharing an important demonstration on how to improve query performance by adding Extended Statistics in PostgreSQL.

PostgreSQL: Important Statistics Table, Used by the Query Planner

The database statistics are always important in any database system. Because query optimizer is creating Query Execution Plan basis on stored database statistics.

Understand below official note which copied from here.

It is common to see slow queries running bad execution plans because multiple columns used in the query clauses are correlated.

The planner normally assumes that multiple conditions are independent of each other, an assumption that does not hold when column values are correlated. Regular statistics, because of their per-individual-column nature, cannot capture any knowledge about cross-column correlation.

Because the number of possible column combinations is very large, it’s impractical to compute multivariate statistics automatically. Instead, extended statistics objects, more often called just statistics objects.

Let me explain, by my way,

For example, you give two filters in a where condition like zip code and city_name.
Now, zip code is unique for dataset, but one city has multiple zip codes.
Now, this both conditions and columns called correlated columns.

To inform the planner about functional dependencies, ANALYZE can collect measurements of cross-column dependency.
It is advisable to create dependencies statistics only for column groups that are strongly correlated, to avoid unnecessary overhead in both ANALYZE and later query planning.

Check the below demonstration:

Create a sample table:

1
2
3
4
5
6
CREATE TABLE tbl_TestStats
(
ID SERIAL
,CodeDate TIMESTAMPTZ
,CodeName TEXT
);

Two sample insert for different series:
For example, 2016 year for code ‘abc’ and 2017 year for code ‘xyz’

1
2
3
4
5
6
7
8
9
10
INSERT INTO tbl_TestStats
(CodeDate, CodeName)
SELECT x, 'abc'
FROM generate_series('2016-01-01 00:00:00'::timestamptz, '2016-12-31 00:00:00'::timestamptz,'10 seconds'::interval) a(x);
 
 
INSERT INTO tbl_TestStats
(CodeDate, CodeName)
SELECT x, 'xyz'
FROM generate_series('2017-01-01 00:00:00'::timestamptz, '2017-12-31 00:00:00'::timestamptz,'10 seconds'::interval) a(x);

Total inserted records:

1
2
SELECT COUNT(1) FROM tbl_TestStats
-- '6298562'

Now, Execute ANALYZE:

1
ANALYZE tbl_TestStats;

Check the execution plan of the below query:

1
2
3
4
5
6
7
8
9
10
11
12
13
EXPLAIN ANALYZE
SELECT *FROM tbl_TestStats
WHERE CodeDate = '2016-12-03'
AND CodeName = 'abc'
'Gather (cost=1000.00..80484.88 rows=1 width=16) (actual time=347.355..347.496 rows=1 loops=1)'
' Workers Planned: 2'
' Workers Launched: 2'
' -> Parallel Seq Scan on tbl_teststats (cost=0.00..79484.77 rows=1 width=16) (actual time=278.702..340.557 rows=0 loops=3)'
' Filter: ((codedate = '2016-12-03 00:00:00+05:30'::timestamp with time zone) AND (codename = 'abc'::text))'
' Rows Removed by Filter: 2099520'
'Planning time: 0.168 ms'
'Execution time: 353.298 ms'

Now, Create an Extended Statistics for correlated columns:

1
2
CREATE STATISTICS Ext_Stat_tbl_TestStats (dependencies) ON CodeDate, CodeName
FROM tbl_TestStats;

Now, Execute ANALYZE:

1
ANALYZE tbl_TestStats;

Check execution plan for the same query:
Comparing to first result, you can find slightly performance improvement in the second result.

1
2
3
4
5
6
7
8
9
10
11
12
13
EXPLAIN ANALYZE
SELECT *FROM tbl_TestStats
WHERE CodeDate = '2016-12-03'
AND CodeName = 'abc'
'Gather (cost=1000.00..80484.86 rows=1 width=16) (actual time=150.732..311.567 rows=1 loops=1)'
' Workers Planned: 2'
' Workers Launched: 2'
' -> Parallel Seq Scan on tbl_teststats (cost=0.00..79484.76 rows=1 width=16) (actual time=251.261..304.857 rows=0 loops=3)'
' Filter: ((codedate = '2016-12-03 00:00:00+05:30'::timestamp with time zone) AND (codename = 'abc'::text))'
' Rows Removed by Filter: 2099520'
'Planning time: 0.167 ms'
'Execution time: 316.685 ms'

Apr 3, 2018Anvesh Patel
PostgreSQL: How to switch the Database Connection?PostgreSQL: How to drop all Tables?

Leave a Reply Cancel reply

CAPTCHA
Refresh

*

Anvesh Patel
Anvesh Patel

Database Engineer

ImageApril 3, 2018 PostgreSQLAnvesh Patel, correlated column, database, database research and development, Database Statistics, dbrnd, extended statistics, plpgsql, Postgres Query, postgresql, PostgreSQL Administrator, PostgreSQL Error, PostgreSQL Monitoring, PostgreSQL Performance Tuning, PostgreSQL Programming, PostgreSQL Tips and Tricks, query performance, statistic object
About Me!

I'm Anvesh Patel, a Database Engineer certified by Oracle and IBM. I'm working as a Database Architect, Database Optimizer, Database Administrator, Database Developer. Providing the best articles and solutions for different problems in the best manner through my blogs is my passion. I have more than six years of experience with various RDBMS products like MSSQL Server, PostgreSQL, MySQL, Greenplum and currently learning and doing research on BIGData and NoSQL technology. -- Hyderabad, India.

About DBRND !

dbrnd

This is a personal blog (www.dbrnd.com).

Any views or opinions represented in this blog are personal and belong solely to the blog owner and do not represent those of people, institutions or organizations that the owner may or may not be associated with in professional or personal capacity, unless explicitly stated.

Feel free to challenge me, disagree with me, or tell me I’m completely nuts in the comments section of each blog entry, but I reserve the right to delete any comment for any reason whatsoever (abusive, profane, rude, or anonymous comments) - so keep it polite.

The content of this website is protected by copyright. No portion of this website may be copied or replicated in any form without the written consent of the website owner.

Recent Comments !
  • Anvesh Patel { Sure will do... } – May 27, 12:43 PM
  • Anvesh Patel { Great... } – May 27, 12:41 PM
  • Anvesh Patel { Great... } – May 27, 12:39 PM
  • Anvesh Patel { Great... } – May 27, 12:36 PM
  • Anvesh Patel { Great... } – May 27, 12:28 PM
  • Anvesh Patel { Great... } – May 27, 12:27 PM
  • Anvesh Patel { Great... } – May 27, 12:16 PM
  • Older »
Follow Me !
  • facebook
  • linkedin
  • twitter
  • youtube
  • google
  • flickr
© 2015 – 2019 All rights reserved. Database Research & Development (dbrnd.com)
Posting....