This article is half-done without your Comment! *** Please share your thoughts via Comment ***
In this post, I am sharing an important gp_toolkit view to finding a list of a table on which VACUUM is required to be run in Greenplum Database Server.
Greenplum is based on PostgreSQL 8.2 and MVCC architecture. It generates dead tuples at every UPDATE and DELETE which is a fragmentation in the form of dead tuples which we can remove by executing a VACUUM/VACUUM FULL command.
A gp_toolkit is an administrative schema of Greenplum Database server. It provides the list of various views to monitor the Greenplum server.
A gp_bloat_diag is one of the views of a gp_toolkit schema which gives us a list of tables which may require a VACUUM to be run.
It returns the actual number of pages and the expected number of pages. If a count of an actual number of pages is greater than expected number of pages, that table requires VACUUM.
If you want to re-use the wasted space of table, execute VACUUM FULL on it which requires ACCESS EXCLUSIVE LOCK on a table because it re-writes all the data pages from the scratch.
If you execute only VACUUM, you cannot reclaims the free space of table.
You must execute ANALYZE after the execution of VACUUM/VACUUM FULL for updating database statistics.
Below is a query:
SELECT * FROM gp_toolkit.gp_bloat_diag;
After executing the above query, you can get the list of six columns. The important columns are “bdirelpages” and “bdiexppages” means real pages and expected pages.
If you find your table in the result set, your table requires VACUUM/VACUUM FULL.