PostgreSQL: CLUSTER – Improve Index Performance (No default cluster index)

I have received a few emails related to PostgreSQL default cluster index and I have received from the developers of Microsoft SQL Server.
If you are Microsoft SQL Server developer, you know that Primary key of the Table is working as Cluster Index and resides in the same data page where data actually stored.

In PostgreSQL, There is no concept like: Table Primary Key means default Cluster Index of that table.
In PostgreSQL, we have one CLUSTER command which is similar to Cluster Index.

Once you create your table primary key or any other Index, you can execute the CLUSTER command by specifying that Index name to achieve the physical order of the Table Data.

CLUSTER instructs PostgreSQL to cluster the table specified by table_name based on the index specified by index_name. The index must already have been defined on table_name.

A clustering on an index forces the physical ordering of the data and you can have only one clustered index per table and should carefully pick which index you will use to cluster.

When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order.

If you wish, you can periodically recluster by issuing the CLUSTER command again and again.

A executing of CLUSTER command requires an ACCESS EXCLUSIVE lock on the table because It reorders your data. You can also get some free space by issuing CLUSTER because It rearranges the data in a particular order.

When you execute the CLUSTER command, It creates a temporary copy of the table therefore, you need free space on disk at least equal to the original table size.

When an index scan is used, a temporary copy of the table is created that contains the table data in the index order. Temporary copies of each index on the table are created as well. Therefore, you need free space on disk at least equal to the sum of the table size and the index sizes.

It is advisable to set maintenance_work_mem to a reasonably large value before clustering.

Once you are done with your CLUSTER, you should execute the ANALYZE command on that table because planner records statistics about the ordering of tables.

Syntax of Cluster:
First time you must execute CLUSTER using the Index Name.

Cluster the table:
Once you have executed CLUSTER with Index, next time you should execute only CLUSTER TABLE because It knows that which index already defined as CLUSTER.

Cluster all tables of database:


Please share your ideas and opinions about this topic with me, your contribution will add true value to this topic.
If anyone has doubts on this topic then please do let me know by leaving comments or send me an email.

If you like this post, then please share it with others.
Please follow dbrnd.com, I will share my experience towards the success of Database Research and Development Activity.

I put up a post every day, please keep reading and learning.
Discover Yourself, Happy Blogging !
Anvesh M. Patel.

More from dbrnd.com