PostgreSQL Storage Management: A Beginner’s Guide to Files, Pages, and Tuples

PostgreSQL Storage Hierarchy Visualization

Imagine you’re organizing a huge library. You have shelves (files), books (pages), and individual words in those books (tuples). Understanding how PostgreSQL organizes data is similar – and just as important for a database administrator.

In this post, I’ll walk you through PostgreSQL’s storage system using simple comparisons and real-world examples, so you can understand how your database actually stores information on disk.

What We’ll Cover

How PostgreSQL organizes data (the big picture)
Files: Where your data lives on disk
Pages: How data is grouped together
Tuples: The individual pieces of information
Why this matters for your daily work
Common problems and simple solutions

The Big Picture: How PostgreSQL Organizes Data

Let’s start with a simple analogy:

Think of PostgreSQL like an office building:

The entire building is your database cluster
Each floor is a separate database
Each room on a floor represents a table
Filing cabinets in those rooms are the files on disk
Each drawer in a cabinet is a page (or block)
The individual folders in each drawer are tuples (your rows of data)

When you run a query, PostgreSQL needs to find the right building, go to the correct floor, enter the right room, open the proper filing cabinet, pull out the right drawer, and finally grab the folders you need.

Understanding this structure helps you organize your data efficiently and troubleshoot problems when they arise.

Files: Where Your Data Lives

What Are Database Files?

Database files are the actual files on your computer’s disk that store all your PostgreSQL data. When you create a table in PostgreSQL, the system creates one or more files to hold that table’s data.

The Filing System

Let’s say you create a simple customer database:

CREATE DATABASE my_store;
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name TEXT,
    email TEXT
);

Behind the scenes, PostgreSQL creates:

A directory for your database (named with a number like 13267)
A file for your customers table (named with a number like 16384)
Additional helper files to track available space and other information

If you could look at your PostgreSQL data directory, you’d see something like:

PostgreSQL_data/
├── base/
│   └── 13267/                  <- Your database directory
│       ├── 16384               <- Your customers table
│       ├── 16384_fsm           <- Helps find free space in your table
│       └── 16384_vm            <- Tracks which parts of your table need cleaning
├── global/                     <- System-wide tables
└── pg_wal/                     <- Transaction logs (in case of crashes)

Real-World Example

Think of it like this: If you run a small bookstore, your filing system might include:

A cabinet for customer information
A cabinet for inventory
A cabinet for sales records

In PostgreSQL, each of these “cabinets” would be a separate file (or set of files) on disk.

Pages: The Building Blocks

What Is a Page?

A page (also called a block) is a fixed-size chunk of data, typically 8KB (8,192 bytes). Think of it as a drawer in your filing cabinet that can hold a certain amount of information.

Why Pages Matter

PostgreSQL reads and writes data one page at a time. It’s like how you might pull out an entire drawer to find a document rather than searching through the cabinet blindly.

Inside a Page

Each page has:

A header (information about what’s in the page)
Pointers (like a table of contents showing where each row is located)
Free space (empty area for new data)
The actual data rows (your information)

Here’s a simple visualization:

+------------------+
| PAGE HEADER      | <- Metadata about this page
+------------------+
| ITEM POINTERS    | <- "Table of contents" for this page
|                  |
+------------------+
| FREE SPACE       | <- Room for more data
|                  |
+------------------+
| ROW 3            | <- Your actual data
+------------------+
| ROW 2            |
+------------------+
| ROW 1            |
+------------------+

Real-World Example

Imagine each page as a form where you can record up to 20 customers (depending on how much information you store for each). If you have 100 customers, you’ll need at least 5 forms (pages).

When you want to find customer #37, PostgreSQL needs to figure out which page contains that customer, then find the specific customer within that page.

Tuples: Your Actual Data

What Is a Tuple?

A tuple is simply a row in your database table. If you have a customers table, each customer record is a tuple.

Inside a Tuple

Each tuple has:

A header (hidden information about the row)
Your actual data (the values you inserted)

The header includes important information like:

Which transaction created this row
Which transaction deleted this row (if applicable)
Information about which columns contain NULL values

Example of a Tuple

Let’s say you insert a customer:

INSERT INTO customers (name, email) 
VALUES ('Alice Smith', 'alice@example.com');

In the database, the tuple might look conceptually like this:

HEADER: [Created by transaction #1234, Not deleted, No NULLs]
DATA: [1, 'Alice Smith', 'alice@example.com']

The TOAST System (Handling Large Values)

What happens if you need to store something big, like a long document or an image?

PostgreSQL uses something called TOAST (The Oversized-Attribute Storage Technique). Think of it like this:

If you’re organizing paper files and come across a document that’s too big for your folder, you might:

Try to fold it up smaller (compression)
If it’s still too big, store it in a separate box and put a note in your folder saying “see box #5” (out-of-line storage)

This is exactly what PostgreSQL does with large values:

Try to compress the data
If still too big, store it separately and keep a reference

Real-World Example

Imagine you’re storing product descriptions. Short descriptions fit directly in your main product table. But for products with very lengthy descriptions, PostgreSQL automatically places them in a separate “storage box” (TOAST table) and leaves a small pointer in your main table.

Why This Matters for Database Administrators

Understanding the storage system helps you:

1. Estimate Storage Needs

If you know each page is 8KB and can hold roughly X number of your typical rows, you can estimate how much disk space a million rows will require.

-- Find the average row size
SELECT avg(pg_column_size(t)) FROM customers t;

-- Find how many rows are likely to fit on a page
-- (Approximately 8KB / avg_row_size)

2. Improve Performance

Knowing that PostgreSQL reads data one page at a time explains why:

Indexes are important (they help PostgreSQL find the right pages quickly)
Sequential scans (reading every page) are slow for large tables
Well-organized tables perform better

3. Understand and Fix Bloat

When you update a row, PostgreSQL usually creates a new version and eventually marks the old one as deleted. But the old version still takes up space until a VACUUM operation reclaims it.

This is like having folders with crossed-out information still taking up space in your filing cabinet. Eventually, you need to clean up – that’s what VACUUM does.

-- Basic cleanup
VACUUM customers;

-- Cleanup and update statistics
VACUUM ANALYZE customers;

4. Diagnose Space Issues

When your database is growing too fast, you can check which tables are taking up the most space:

-- See the size of your biggest tables
SELECT
    tablename,
    pg_size_pretty(pg_table_size(tablename)) AS table_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_table_size(tablename) DESC
LIMIT 5;

Common Problems and Simple Solutions

Problem 1: Database Growing Too Large

Signs:

Disk space alerts
Slow performance
Backups taking too long

Simple Solutions:

Regular VACUUM to reclaim space from deleted rows
Archive old data you rarely need
Check for unnecessarily large columns (like storing entire files in the database)

Example:

-- Schedule regular cleanup
VACUUM ANALYZE;

-- For tables with many deletes/updates
ALTER TABLE frequently_changed_table SET (autovacuum_vacuum_scale_factor = 0.1);

Problem 2: Slow Queries on Large Tables

Signs:

Queries that were once fast become slow
CPU usage spikes during queries

Simple Solutions:

Add appropriate indexes
Break very large tables into smaller ones (partitioning)
Make sure VACUUM ANALYZE runs regularly to update statistics

Example:

-- Add an index for commonly searched columns
CREATE INDEX customers_email_idx ON customers(email);

-- Update statistics
ANALYZE customers;

Problem 3: Database Bloat

Signs:

Database size much larger than the actual data would suggest
Performance degradation over time

Simple Solutions:

Regular VACUUM
Occasional VACUUM FULL (requires downtime)
Consider the pg_repack extension for online rebuilding

Example:

-- Find tables with bloat
SELECT
    schemaname || '.' || tablename AS table_name,
    pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size,
    pg_size_pretty((pg_stat_get_live_tuples(relid) * 
                   (SELECT avg(pg_column_size(t)) FROM tablename t))::bigint) AS estimated_actual_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;

Visualizing the Storage Hierarchy

Let’s tie everything together with a concrete example:

Imagine you have a table with customer orders:

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
);

If you insert 10,000 orders:

Files: PostgreSQL might store this in a single file (e.g., 16385)
Pages: This file would be divided into multiple 8KB pages, perhaps 100-200 pages depending on the row size
Tuples: Each page contains multiple orders, perhaps 40-80 orders per page

When you run a query like:

SELECT * FROM orders WHERE order_id = 5432;

PostgreSQL:

Looks at the index to find which page contains order #5432
Reads that entire page into memory
Finds the specific tuple within the page
Returns the data to you

Understanding this process helps you see why proper indexing is crucial – it’s the difference between checking the right drawer immediately versus opening every drawer in every cabinet!

Practical Tips for Beginners

Monitor Your Database Size

   -- Check sizes of your tables
   SELECT tablename, pg_size_pretty(pg_relation_size(tablename)) 
   FROM pg_tables WHERE schemaname = 'public';

Make VACUUM Part of Your Routine

   -- Basic cleanup (can run while database is being used)
   VACUUM;

   -- More thorough cleanup (requires exclusive lock)
   VACUUM FULL;

Choose Appropriate Data Types

Use the smallest data type that will accommodate your data
For example, use VARCHAR(50) instead of TEXT for short strings if you know they won’t exceed 50 characters

Index Thoughtfully

Add indexes for columns you frequently search or join on
But remember each index increases the storage and slows down writes

Regular Maintenance

Set up autovacuum correctly
Monitor for bloat
Check for fragmentation

Conclusion

Understanding PostgreSQL’s storage system doesn’t need to be complicated. Think of it as an organized filing system with:

Files as your filing cabinets
Pages as the drawers in those cabinets
Tuples as the individual folders in each drawer

By understanding this structure, you can make better decisions about:

How to organize your data
How to optimize queries
How to maintain your database for long-term performance

Even if you’re starting out as a PostgreSQL DBA, this fundamental knowledge will serve as a strong foundation as you develop your skills and take on more complex database challenges.

What aspects of PostgreSQL storage would you like to explore next? Understanding how indexes work with this storage system? Diving deeper into VACUUM operations? Let me know what interests you most!

PostgreSQL Storage Management: A Beginner’s Guide to Files, Pages, and Tuples

What We’ll Cover

The Big Picture: How PostgreSQL Organizes Data

Files: Where Your Data Lives

What Are Database Files?

The Filing System

Real-World Example

Pages: The Building Blocks

What Is a Page?

Why Pages Matter

Inside a Page

Real-World Example

Tuples: Your Actual Data

What Is a Tuple?

Inside a Tuple

Example of a Tuple

The TOAST System (Handling Large Values)

Real-World Example

Why This Matters for Database Administrators

1. Estimate Storage Needs

2. Improve Performance

3. Understand and Fix Bloat

4. Diagnose Space Issues

Common Problems and Simple Solutions

Problem 1: Database Growing Too Large

Problem 2: Slow Queries on Large Tables

Problem 3: Database Bloat

Visualizing the Storage Hierarchy

Practical Tips for Beginners

Conclusion

Related Posts

Leave a Comment Cancel Reply