Whether a table has 5,000 or 500,000 or 5,000,000,000 records, when the requirement is to find the total row count, most database developers simply execute COUNT(*).
I have found that many of our team members also execute COUNT(*), but imagine the table has 5,000,000,000 rows: COUNT(*) will take a long time to return a result.
Counting the rows of such a big table always creates a performance problem, and it also requires heavy I/O, because the whole table must be scanned.
If you require the exact row count at a given moment, then COUNT(*) is mandatory. But you can speed this up dramatically if the count does not have to be exact: use a metadata (statistics) table to find an estimated row count, which is usually very close to the real one.
COUNT(*) is expensive in PostgreSQL because of the nature of MVCC, which forces a full visibility check of the rows. You can read more about the MVCC architecture here.
Using the statistics tables, you can find an approximate row count within one second. I tested and compared the two approaches on my local machine with a table of 5,000,000,000 rows.
My COUNT(*) returned a result after 8 to 10 minutes and consumed 10% to 25% of CPU and memory.
After that, I fetched the row count from the statistics table, and it took less than one second.
However, I found a small difference between the two counts: the statistics table's count was higher than the actual count (by 12,585 rows) because of MVCC.
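To see how far the estimate may have drifted, you can inspect the dead-tuple counter alongside the live-tuple estimate. This is a minimal sketch; 'TableName' is a placeholder for your own table:

```sql
-- n_dead_tup counts rows deleted/updated but not yet vacuumed;
-- a large value explains a gap between the estimate and COUNT(*).
SELECT relname
      ,n_live_tup
      ,n_dead_tup
      ,last_autovacuum
      ,last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'TableName';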
You should configure autovacuum and analyze on such a table.
After I executed VACUUM and ANALYZE on the table, the two counts matched. In any case, a DBA can schedule this maintenance as a routine activity.
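The manual maintenance step above can be run as follows; the per-table autovacuum settings are a sketch with assumed threshold values, so tune them for your own workload:

```sql
-- Refresh the statistics (and the reltuples / n_live_tup estimates) now:
VACUUM (VERBOSE, ANALYZE) public.TableName;

-- Optionally make autovacuum visit this table more aggressively
-- (example scale factors; the defaults are 0.2 and 0.1):
ALTER TABLE public.TableName
    SET (autovacuum_vacuum_scale_factor = 0.05,
         autovacuum_analyze_scale_factor = 0.02);
```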
I suggest you use the statistics tables for row counts.
Below are two different scripts to find an approximate row count in PostgreSQL.
SELECT reltuples::bigint AS EstimatedCount
FROM pg_class
WHERE oid = 'public.TableName'::regclass;
SELECT relname AS TableName
	,n_live_tup AS EstimatedCount
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
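Both scripts above only estimate the count of a whole table. If you also need a fast estimate for an arbitrary query (with a WHERE clause), a well-known community technique is to read the planner's row estimate from EXPLAIN. This is a sketch; the function name count_estimate is my own choice, not a built-in:

```sql
-- Returns the planner's estimated row count for any SELECT statement.
CREATE OR REPLACE FUNCTION count_estimate(query text)
RETURNS bigint AS $$
DECLARE
    plan json;
BEGIN
    EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
    RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql;

-- Usage:
SELECT count_estimate('SELECT * FROM public.TableName WHERE id > 1000');
```

Like the statistics tables, this relies on up-to-date planner statistics, so its accuracy also depends on regular ANALYZE runs.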
Please execute VACUUM and ANALYZE periodically to keep the estimated row count close to the exact one.