Database Research & Development

  • Home
  • NoSQL
    • NoSQL
    • Cassandra
  • Databases
    • Database Theory
    • Database Designing
    • SQL Server Coding Standards
    • SQL Server
    • PostgreSQL
    • MySQL
    • Greenplum
    • Linux
  • Interviews
    • SQL Server Interviews
    • MySQL Interviews
    • SQL Puzzles
  • DBA Scripts
    • SQL Server DBA Scripts
    • PostgreSQL DBA Scripts
    • MySQL DBA Scripts
    • Greenplum DBA Scripts
  • Home
  • Blog Archives !
  • (: Laugh@dbrnd :)
  • Contact Me !
sqlserverinterviews
Home 2016 August PostgreSQL: Optimized way to populate new Column (UUID) in a Large Table

PostgreSQL: Optimized way to populate new Column (UUID) in a Large Table

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

The default nature of PostgreSQL is based on MVCC architecture which managed by different type of ROW versions.
You can read more about MVCC,

What is Multi Version Concurrency Control (MVCC)

Why I have started this article with MVCC because whenever we update data into the table, it creates a new record and marked the old record as an old version or dead tuple.

You can visit UUID related article here,

Database Theory: The truth about Universally Unique Identifier – UUID

PostgreSQL: Working with Universally Unique Identifier – UUID Data type

My problem:

One of our table size is more than 30gb and I got one requirement to add one new column with default UUID data. This table is already loaded by bulk insertion and exclusive lock never work in this situation.

While adding a column with default value requires PostgreSQL to make new versions of all rows, rewritten the entire table – indexes and store them on the disk and during that time Table will be locked.

The Solution should be like:

This task is required to perform on Production report server so it is very much important to take care about everything like: no exclusive lock, within given time frame limit, data should not be fragmented.
I have tried different approaches which are, I am going to share in this article.

Option 1:

Add new column with NULL value, do not assign any default value in ALTER statement.

After adding a new column, we should run different small batches to update the newly created column.

Find out data fragmentation, dead tuples and try to optimize and remove using VACUUM – ANALYZE.

Option 2:

CREATE one new blank table with require all old and new columns.
Create different small batches and move data from Old table to the newly created table.

While transferring the data, populate default value of the new column. (In my case, I have generated UUID on the fly)
ADD all require constraint and indexes on the newly created table.
Stop all the running data insertion queues for a while, drop the old table and rename a new table to require an original table name.

Yet, I am not 100% sure about these approaches, but I have done my task without any locking issue and completed within given time frame.

If you have any other optimized way, Please share your ideas.

Aug 21, 2016Anvesh Patel
PostgreSQL: ERROR - data type character varying has no default operator class for access method "gist"PostgreSQL: Example of Partial UNIQUE Composite Index
Anvesh Patel
Anvesh Patel

Database Engineer

August 21, 2016 PostgreSQLAnvesh Patel, database, database research and development, dbrnd, large column, plpgsql, Postgres Query, postgresql, PostgreSQL Administrator, PostgreSQL Error, PostgreSQL Programming, PostgreSQL Tips and Tricks, Universally Unique Identifier, UUID, UUID Data type
About Me!

I'm Anvesh Patel, a Database Engineer certified by Oracle and IBM. I'm working as a Database Architect, Database Optimizer, Database Administrator, Database Developer. Providing the best articles and solutions for different problems in the best manner through my blogs is my passion. I have more than six years of experience with various RDBMS products like MSSQL Server, PostgreSQL, MySQL, Greenplum and currently learning and doing research on BIGData and NoSQL technology. -- Hyderabad, India.

About DBRND !

dbrnd

This is a personal blog (www.dbrnd.com).

Any views or opinions represented in this blog are personal and belong solely to the blog owner and do not represent those of people, institutions or organizations that the owner may or may not be associated with in professional or personal capacity, unless explicitly stated.

Feel free to challenge me, disagree with me, or tell me I’m completely nuts in the comments section of each blog entry, but I reserve the right to delete any comment for any reason whatsoever (abusive, profane, rude, or anonymous comments) - so keep it polite.

The content of this website is protected by copyright. No portion of this website may be copied or replicated in any form without the written consent of the website owner.

Recent Comments !
  • Anvesh Patel { Sure will do... } – May 27, 12:43 PM
  • Anvesh Patel { Great... } – May 27, 12:41 PM
  • Anvesh Patel { Great... } – May 27, 12:39 PM
  • Anvesh Patel { Great... } – May 27, 12:36 PM
  • Anvesh Patel { Great... } – May 27, 12:28 PM
  • Anvesh Patel { Great... } – May 27, 12:27 PM
  • Anvesh Patel { Great... } – May 27, 12:16 PM
  • Older »
Follow Me !
  • facebook
  • linkedin
  • twitter
  • youtube
  • google
  • flickr
© 2015 – 2019 All rights reserved. Database Research & Development (dbrnd.com)
Posting....