PostgreSQL: Optimized way to populate new Column (UUID) in a Large Table

PostgreSQL is built on an MVCC architecture, which manages concurrency by keeping multiple versions of each row.
You can read more about MVCC here:

What is Multi Version Concurrency Control (MVCC)

I have started this article with MVCC because whenever we update data in a table, PostgreSQL creates a new row version and marks the old record as a dead tuple.
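A quick way to see this behavior yourself is the sketch below. The table name mvcc_demo is my own throwaway example, not from the article:

```sql
-- Create a small table and update one row
CREATE TABLE mvcc_demo (id int, val text);
INSERT INTO mvcc_demo VALUES (1, 'old');
UPDATE mvcc_demo SET val = 'new' WHERE id = 1;  -- the old row version becomes a dead tuple

-- n_dead_tup reports the dead version left behind by the UPDATE
SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'mvcc_demo';
```

The statistics collector may take a moment to reflect the dead tuple count, but the dead row version exists on disk as soon as the UPDATE commits.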

You can visit UUID related article here,

Database Theory: The truth about Universally Unique Identifier – UUID

PostgreSQL: Working with Universally Unique Identifier – UUID Data type

My problem:

One of our tables is more than 30 GB in size, and I got a requirement to add one new column populated with default UUID data. This table is continuously loaded by bulk insertions, so an exclusive lock would never work in this situation.

Adding a column with a default value requires PostgreSQL to create new versions of all rows, rewriting the entire table and its indexes on disk, and during that time the table is locked.

The solution should be like this:

This task had to be performed on a production report server, so it was very important to take care of everything: no exclusive lock, completion within the given time frame, and no data fragmentation.
I tried different approaches, which I am going to share in this article.

Option 1:

Add the new column with a NULL value; do not assign any default value in the ALTER statement.

After adding the new column, run different small batches to update the newly created column.

Find the data fragmentation and dead tuples, and remove them using VACUUM ANALYZE.
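The three steps of Option 1 can be sketched as below. The table name report_data, the column name row_uuid, and the batch size are my own assumptions; uuid_generate_v4() requires the uuid-ossp extension (on PostgreSQL 13+ you could use the built-in gen_random_uuid() instead):

```sql
-- Step 1: add the column with no default; this is a metadata-only change
ALTER TABLE report_data ADD COLUMN row_uuid uuid;

-- Step 2: update in small batches until no NULL rows remain
-- (run repeatedly; each batch touches at most 10000 rows)
UPDATE report_data
SET row_uuid = uuid_generate_v4()
WHERE ctid IN (
    SELECT ctid FROM report_data
    WHERE row_uuid IS NULL
    LIMIT 10000
);

-- Step 3: reclaim the dead tuples left behind by the batched updates
VACUUM ANALYZE report_data;
</imports>
```

Running the UPDATE in small batches keeps each transaction short, so readers are never blocked for long, at the cost of leaving one dead tuple per updated row for VACUUM to clean up afterwards.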

Option 2:

CREATE a new blank table with all the required old and new columns.
Create different small batches and move data from the old table to the newly created table.

While transferring the data, populate the default value of the new column. (In my case, I generated the UUID on the fly.)
ADD all the required constraints and indexes on the newly created table.
Stop all the running data insertion queues for a while, drop the old table, and rename the new table to the original table name.
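The steps above can be sketched as follows. All object names (report_data, report_data_new, row_uuid, the id ranges) are my own illustrative assumptions, and uuid_generate_v4() again assumes the uuid-ossp extension:

```sql
-- Step 1: new blank table with the old columns plus the UUID column
CREATE TABLE report_data_new (LIKE report_data INCLUDING DEFAULTS);
ALTER TABLE report_data_new ADD COLUMN row_uuid uuid;

-- Step 2: copy the data over in small batches, generating the UUID on the fly
-- (repeat with the next id range until the whole table is copied)
INSERT INTO report_data_new
SELECT t.*, uuid_generate_v4()
FROM report_data t
WHERE t.id BETWEEN 1 AND 100000;

-- Step 3: rebuild constraints and indexes on the new table
ALTER TABLE report_data_new ADD PRIMARY KEY (id);
CREATE INDEX idx_report_data_new_uuid ON report_data_new (row_uuid);

-- Step 4: swap the tables (pause all writers first)
BEGIN;
DROP TABLE report_data;
ALTER TABLE report_data_new RENAME TO report_data;
COMMIT;
```

Creating the indexes after the bulk copy, rather than before, is usually much faster than maintaining them row by row during the load.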

I am still not 100% sure these are the best approaches, but I completed my task without any locking issue and within the given time frame.

If you have any other optimized way, please share your ideas.

Anvesh Patel
