Learn

There is an alternative VACUUM method, VACUUM FULL which rewrites all the data from a table into a “new” location on disk and only copies the required data (excluding dead tuples). This allows PostgreSQL to fully clear the space the table occupied. One of the significant drawbacks from VACUUM FULL is that it’s a slow operation that blocks other operations on the table while it’s working. If you’ve got a large table, this could mean a VACUUM FULL operation might block other user’s or application’s queries. In a local setting, this may seem trivial, but for production databases, preventing reads and writes on a table for even a few seconds can have lasting effects.

Although a plain VACUUM won’t reduce table size, plain VACUUM is designed to be able to run in parallel with normal reading and writing of the table. Let’s work through an example to demonstrate how VACUUM FULL can aggressively reduce table size.

We have a table (mock.rand) with 12 million rows of randomly generated values that takes up about 500MB on disk.

SELECT pg_size_pretty( pg_total_relation_size('mock.rand') ) as total_size;
total_size
507 MB

After running a large UPDATE on the table, re-running the query from above now gives a new total relation size of 559MB. Additionally, a query to pg_catalog.pg_stat_all_tables shows 1.2M dead tuples.

select relname, n_dead_tup , last_vacuum from pg_catalog.pg_stat_all_tables where relname = 'rand';
relname n_dead_tup last_vacuum
rand 1.2M NULL

Running VACUUM on this table takes just about 1.5 seconds. Recall from the previous exercises that not much (if any) space will be returned to disk (i.e. pg_total_relation_size() will still give 559MB), but the table’s dead tuples will be marked for reuse.

Instead, if we use VACUUM FULL, PostgreSQL takes about 15s to scan the table and fully clear unneeded space. Running pg_total_relation_size() on the table now shows that the table size has been reduced back to its original size, 507MB.

While this was effective, VACUUM FULL is quite a heavy operation that should be used sparingly. The best strategy when designing a database maintenance plan is to make sure that VACUUM runs frequently and autovacuum is enabled. These measures will ensure that table sizes are relatively stable over time.

Instructions

1.

We have the following table that’s got many dead tuples:

SELECT pg_size_pretty( pg_total_relation_size('mock.orders') ) as total_size;
total_size
1024 kb

Check pg_catalog.pg_stat_all_tables using the query below. Based on the columns n_live_tuples and n_dead_tup, what percentage of the table’s data is dead tuples?

select relname, n_live_tup, n_dead_tup , last_vacuum from pg_catalog.pg_stat_all_tables where relname = 'orders';
2.

Run a VACUUM FULL on the table.

3.

Check the total relation size of the table, does the size reduction is size after VACUUM FULL match your estimation from part 1?

Take this course for free

Mini Info Outline Icon
By signing up for Codecademy, you agree to Codecademy's Terms of Service & Privacy Policy.

Or sign up using:

Already have an account?