There is an alternative
VACUUM FULL which rewrites all the data from a table into a “new” location on disk and only copies the required data (excluding dead tuples). This allows PostgreSQL to fully clear the space the table occupied. One of the significant drawbacks from
VACUUM FULL is that it’s a slow operation that blocks other operations on the table while it’s working. If you’ve got a large table, this could mean a
VACUUM FULL operation might block other user’s or application’s queries. In a local setting, this may seem trivial, but for production databases, preventing reads and writes on a table for even a few seconds can have lasting effects.
Although a plain
VACUUM won’t reduce table size, plain
VACUUM is designed to be able to run in parallel with normal reading and writing of the table. Let’s work through an example to demonstrate how
VACUUM FULL can aggressively reduce table size.
We have a table (
mock.rand) with 12 million rows of randomly generated values that takes up about 500MB on disk.
SELECT pg_size_pretty( pg_total_relation_size('mock.rand') ) as total_size;
After running a large
UPDATE on the table, re-running the query from above now gives a new total relation size of 559MB. Additionally, a query to
pg_catalog.pg_stat_all_tables shows 1.2M dead tuples.
select relname, n_dead_tup , last_vacuum from pg_catalog.pg_stat_all_tables where relname = 'rand';
VACUUM on this table takes just about 1.5 seconds. Recall from the previous exercises that not much (if any) space will be returned to disk (i.e.
pg_total_relation_size() will still give 559MB), but the table’s dead tuples will be marked for reuse.
Instead, if we use
VACUUM FULL, PostgreSQL takes about 15s to scan the table and fully clear unneeded space. Running
pg_total_relation_size() on the table now shows that the table size has been reduced back to its original size, 507MB.
While this was effective,
VACUUM FULL is quite a heavy operation that should be used sparingly. The best strategy when designing a database maintenance plan is to make sure that
VACUUM runs frequently and autovacuum is enabled. These measures will ensure that table sizes are relatively stable over time.
We have the following table that’s got many dead tuples:
SELECT pg_size_pretty( pg_total_relation_size('mock.orders') ) as total_size;
pg_catalog.pg_stat_all_tables using the query below. Based on the columns
n_dead_tup, what percentage of the table’s data is dead tuples?
select relname, n_live_tup, n_dead_tup , last_vacuum from pg_catalog.pg_stat_all_tables where relname = 'orders';
VACUUM FULL on the table.
Check the total relation size of the table, does the size reduction is size after
VACUUM FULL match your estimation from part 1?