PostgreSQL: Tuning Checkpoint Parameters to Increase the Performance of Bulk Data Load

The Checkpoint parameters are again playing an importance role to improve the performance of PostgreSQL.

Important Note: PostgreSQL 9.5 onwards you can use min_wal_size and max_wal_size instead of checkpoint_segments.

Previously, I have already shared important parameters for tuning the PostgreSQL.

PostgreSQL: Important Parameters for better Performance

If our PostgreSQL set up for heavy bulk operations, after setting the above memory parameters we should also look into Checkpoint related Parameters.

In this post, I am going to share basic details about the PostgreSQL Checkpoint Parameters and its important value for bulk operations.

What is a Checkpoint?

In simple word, It writes all dirty pages from memory to disk and cleans the shared_buffers area.
It does not require any system lock and make sure that data has been written to the disk.

If our Postgres server crashed, we can measure the data loss between last Checkpoint value time and PostgreSQL stopped time. We can also recover our system using this information.

Why Checkpoint is an Important parameter?

If we have 1GB of database and 15 GB of MEMORY, we can easily store our whole database into MEMORY. Which returns the best performance because It does not require any physical page swapping into the Disk.

Now let say, PostgreSQL Checkpoint is not updated since last 48 hours and we are working with only those data pages which are in MEMORY only.

Now the system crashed because of power failure and lots of work not updated into the Disk so next, when Postgres start it takes long time to load because It has to recover last 48 hours data from WAL segments.
That’s why the proper configuration of Checkpoint Parameters is very important.

What are the different Checkpoint Parameters?

checkpoint_segments (integer) : The default value is 3 and every segments has a 16 MB capacity to store log information.
checkpoint_timeout (integer) : The default is 5min and It is time between automatic WAL checkpoints. If checkpoint set for every minute ā€“ only pages from minute would have to be written to disk, 5 minutes ā€“ more pages, 1 hour ā€“ even more pages.
checkpoint_completion_target (floating point): The default is 05, and It is completion time between the Checkpoints.
checkpoint_warning (integer): In case of overlapping, It writes a message to the server.

How we can improve the performance for Bulk data operations?

We should increase the size of checkpoint_segments, checkpoint_completion_target, checkpoint_timeout.

When we increase the size of this parameter like checkpoint_timeout from 5min to 20min, It reduces the Disk I/O by writing again and again.

Whenever we have a large Shared_buffers area, we should increase the size of the Checkpoint parameters.

I suggest to apply these changes only for large bulk load data operations related system.
For regular load, default PostgreSQL setting is fine.

Example of parameters for PostgreSQL optimization:

Anvesh Patel

Leave a Reply

Be the First to Comment!

Notify of