Backup Strategies and Disaster Recovery in PostgreSQL: A Beginner’s Guide

Outline

  1. Introduction: Why Backups Matter
  2. Understanding the Basics of PostgreSQL Backups
  3. Logical Backups Using pg_dump
  4. Physical Backups: Taking Database Snapshots
  5. Point-in-Time Recovery: The Ultimate Safety Net
  6. Setting Up a Practical Backup Schedule
  7. Disaster Recovery Essentials
  8. Testing Your Backups: The Often Forgotten Step
  9. Automating Your Backup Strategy
  10. Conclusion and Best Practices

Why Backups Matter

Imagine this: It’s Monday morning, you’re sipping your coffee, and suddenly your phone rings. Your company’s database has crashed, and three years of critical customer data is gone. Your heart sinks as you realize you haven’t set up proper backups.

This nightmare scenario happens more often than you’d think. As a PostgreSQL database administrator (DBA) with experience, I’ve seen talented developers and administrators learn this lesson the hard way.

Backups aren’t just a technical requirement—they’re your safety net. They protect against:

  • Hardware failures (that trusty server won’t last forever)
  • Human errors (we’ve all accidentally run that DELETE query without a WHERE clause)
  • Software bugs (even PostgreSQL isn’t perfect)
  • Malicious attacks (ransomware doesn’t care how important your data is)
  • Natural disasters (from power outages to floods)

Let’s walk through how to protect your PostgreSQL databases with proper backup strategies and disaster recovery plans.

Understanding the Basics of PostgreSQL Backups

Before diving into specific techniques, let’s understand the three main types of PostgreSQL backups:

  1. Logical backups: These are SQL statements that recreate your database objects and data. Think of them as detailed instructions to rebuild your database from scratch.
  2. Physical backups: These are direct copies of the database files. Like taking a snapshot of your database exactly as it exists at a moment in time.
  3. Continuous archiving: This involves saving transaction logs continuously, allowing you to recover to any point in time.

Each method has its place in a comprehensive backup strategy. Let me explain when to use each one.

Logical Backups Using pg_dump

Logical backups are the simplest place to start. PostgreSQL provides the pg_dump utility, which creates a file containing all the SQL commands needed to recreate your database.

How to Create a Logical Backup

Here’s a basic example:

pg_dump -U postgres -d mydb > mydb_backup.sql

This command connects as the postgres user to the “mydb” database and saves the output to a file called mydb_backup.sql.

For a more efficient backup, use the custom format:

pg_dump -U postgres -d mydb -Fc -f mydb_backup.custom

The -Fc flag creates a compressed custom format file that’s smaller and can be restored selectively.

When to Use Logical Backups

I recommend logical backups when:

  • Your database is relatively small (under 100GB)
  • You need to migrate data between different PostgreSQL versions
  • You want to back up specific tables or schemas
  • You need to store backups in version control

Restoring a Logical Backup

To restore a plain SQL backup:

psql -U postgres -d mydb < mydb_backup.sql

For custom format backups:

pg_restore -U postgres -d mydb mydb_backup.custom

Physical Backups: Taking Database Snapshots

When I was managing a large e-commerce database that grew to several terabytes, logical backups became too slow. That’s when physical backups saved the day.

Physical backups copy the actual data files PostgreSQL uses to store your database. They’re faster to create and restore for large databases.

Creating a Physical Backup

The simplest way is using pg_basebackup:

pg_basebackup -D /backup/mydb -Ft -z -U postgres

This creates a compressed tar file of your database in the /backup/mydb directory.

When to Use Physical Backups

Physical backups are ideal when:

  • Your database is large (100GB+)
  • You need faster backup and recovery
  • You’re setting up replication
  • You want a complete database copy with minimal overhead

Restoring a Physical Backup

To restore a physical backup:

# Stop PostgreSQL
systemctl stop postgresql

# Clear data directory and extract backup
rm -rf /var/lib/postgresql/data/*
tar -xzf /backup/mydb/base.tar.gz -C /var/lib/postgresql/data/

# Restart PostgreSQL
systemctl start postgresql

Point-in-Time Recovery: The Ultimate Safety Net

Now, let me share what I consider the crown jewel of PostgreSQL backup strategies: Point-in-Time Recovery (PITR).

Imagine this scenario: At 9:15 AM, someone accidentally deletes crucial customer data. You don’t discover this until 2:30 PM. With standard backups from last night, you’d lose all the morning’s transactions. But with PITR, you can recover to exactly 9:14 AM, right before the mistake happened!

How PITR Works

PITR combines two elements:

  1. A base backup (physical backup)
  2. Write-Ahead Log (WAL) archives that record every change

By replaying the WAL archives up to a specific point, PostgreSQL can reconstruct your database to any moment in time.

Setting Up PITR

First, configure WAL archiving in postgresql.conf:

wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/%f'

Next, take a base backup:

pg_basebackup -D /backup/base -Ft -z -U postgres

PostgreSQL will now archive transaction logs to the /archive directory.

Recovering to a Specific Point in Time

When disaster strikes, you can recover to a specific timestamp:

# Restore base backup first, then create recovery.conf (PostgreSQL 11 or earlier)
echo "restore_command = 'cp /archive/%f %p'" > recovery.conf
echo "recovery_target_time = '2023-06-15 09:14:00'" >> recovery.conf

# For PostgreSQL 12+, use recovery.signal and postgresql.conf
touch /var/lib/postgresql/data/recovery.signal
echo "restore_command = 'cp /archive/%f %p'" >> postgresql.conf
echo "recovery_target_time = '2023-06-15 09:14:00'" >> postgresql.conf

Setting Up a Practical Backup Schedule

Let me share a practical backup schedule I’ve implemented for many clients:

  1. Daily logical backups of critical tables (small, frequently changed data)
  2. Daily full physical backups retained for 7 days
  3. Weekly full physical backups retained for 1 month
  4. Monthly full physical backups retained for 1 year
  5. Continuous WAL archiving with 2 weeks retention

Here’s a simple script to implement daily backups:

#!/bin/bash
# Daily backup script
DATE=$(date +%Y-%m-%d)
BACKUP_DIR="/backup/postgresql"

# Create logical backup of critical tables
pg_dump -U postgres -d mydb -t customers -t orders -Fc -f ${BACKUP_DIR}/critical_${DATE}.custom

# Create physical backup
pg_basebackup -D ${BACKUP_DIR}/full_${DATE} -Ft -z -U postgres

# Clean up old backups (keep 7 days)
find ${BACKUP_DIR} -name "critical_*" -mtime +7 -delete
find ${BACKUP_DIR} -name "full_*" -mtime +7 -delete

Disaster Recovery Essentials

Having backups is only half the battle. You need a disaster recovery plan to use them effectively.

Key Elements of a Disaster Recovery Plan

  1. Documentation: Write down exact recovery steps for different scenarios
  2. Recovery Time Objective (RTO): How quickly must you be back online?
  3. Recovery Point Objective (RPO): How much data can you afford to lose?
  4. Team responsibilities: Who does what during a recovery?
  5. Communication plan: How will you update stakeholders?

Sample Disaster Recovery Scenario

Let’s walk through a common scenario:

Situation: Your production database server has failed completely.

Recovery Steps:

  1. Provision a new server with PostgreSQL installed
  2. Restore the latest physical backup
  3. Apply WAL archives to recover transactions since the backup
  4. Verify data integrity with sample queries
  5. Update application connection settings to point to the new server
  6. Monitor the system for any issues

Testing Your Backups: The Often Forgotten Step

When I conduct database audits, I often find untested backups. This is dangerous! An untested backup is potentially no backup at all.

How to Test Your Backups

  1. Regular restore tests: Monthly restore to a test server
  2. Verification queries: Run data validation queries after restore
  3. Application testing: Have developers verify application functionality
  4. Disaster recovery drills: Simulate failures and practice recovery

Here’s a simple verification script:

#!/bin/bash
# Backup verification script
BACKUP_FILE="/backup/postgresql/mydb_latest.custom"
TEST_DB="mydb_test"

# Create test database
dropdb --if-exists -U postgres ${TEST_DB}
createdb -U postgres ${TEST_DB}

# Restore backup
pg_restore -U postgres -d ${TEST_DB} ${BACKUP_FILE}

# Run verification query
RECORD_COUNT=$(psql -U postgres -d ${TEST_DB} -t -c "SELECT COUNT(*) FROM customers;")
echo "Verified backup contains ${RECORD_COUNT} customer records."

# Clean up
dropdb -U postgres ${TEST_DB}

Automating Your Backup Strategy

As your database grows, manual backups become impractical. Automation is key.

Tools for Backup Automation

These tools can simplify PostgreSQL backup management:

  • pgBackRest: Comprehensive backup management
  • Barman: Backup and recovery manager
  • WAL-G: High-performance archival and restoration
  • Cron: Simple scheduling for scripts

Simple Automation with Cron

Here’s how to schedule our backup script:

# Edit crontab
crontab -e

# Add daily backup at 1 AM
0 1 * * * /path/to/backup_script.sh >> /var/log/pg_backup.log 2>&1

Conclusion and Best Practices

After years of managing PostgreSQL databases, here are my top backup best practices:

  1. Implement multiple backup methods – Don’t rely on just one approach
  2. Store backups in multiple locations – Follow the 3-2-1 rule: 3 copies, 2 different media types, 1 off-site
  3. Automate everything – Human memory fails, scripts don’t
  4. Test regularly – Untested backups may give false security
  5. Monitor your backups – Set up alerts for backup failures
  6. Document your procedures – Make sure others can recover if you’re unavailable
  7. Review and update – As your data grows, your backup strategy should evolve

Remember: The time to set up backups is not after a disaster, but well before it. Your future self will thank you!

What backup strategies are you currently using? Have questions about implementing these approaches? Let me know in the comments below!


Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top