The need for making backups should be self-explanatory. There are several strategies you can use. My own strategy for minimizing the odds that I'll lose all my data is quite simple: every night, a scheduled procedure dumps the database with pg_dump, compresses the dump, and ftps it to a remote machine, keeping a month's worth of daily dumps by cycling through files named for the day of the month.
Rather than making remote copies, you might choose to dump to tape or writeable CD media. Whatever strategy you use, it is important to routinely check dumps to make sure they can be reloaded. The strategy outlined above means that in the case of catastrophic failure, I'll lose at most one day's data.
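How thoroughly you verify your dumps is up to you, but the check can be automated in the same style as the backup script below. Here's a minimal sketch: it assumes a scratch database named "birdnotes_check" can be created and dropped on the machine holding the dumps, and "users" is simply a stand-in for some table you expect to find; adjust both to taste.

# Unpack a dump, load it into a scratch database, and run a
# sanity query against a table we expect to be there.
set b "/usr/local/pgsql/bin"
set bak "/home/birdnotes_backup"
set data "birdnotes_1.dmp"

exec gunzip -c "$bak/$data.gz" > "$bak/$data"
exec "$b/createdb" "birdnotes_check"
exec "$b/psql" "-q" "birdnotes_check" "<$bak/$data"
set count [exec "$b/psql" "-t" "-c" "select count(*) from users" "birdnotes_check"]
ns_log Notice "Reload check: users contains [string trim $count] rows."
exec "$b/dropdb" "birdnotes_check"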
By mirroring disks and using a battery backup, preferably one that can trigger an automatic and controlled shutdown of the system when the battery runs low, you greatly lower the odds of ever having to use your nightly backup. Despite this, it is important to take backups seriously if the data stored at your site is valuable to you or your users.
It is also important that you use the Postgres dump utility, pg_dump, rather than simply copying the files in the database directory, unless you stop the Postmaster while making your copy. If you copy the files while the Postmaster is running and updates to the database are being made, you'll end up with inconsistent tables and a potentially unusable database. pg_dump makes a consistent dump even when the database is in use.
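A basic dump is a single command. The pipeline below compresses the dump in one step, so the uncompressed dump never hits the disk (the script below does it in two steps so each can be error-checked separately); the database name and paths are the same placeholders used in that script.

# Dump and compress in a single pipeline; "birdnotes" and the paths
# are examples.
exec "/usr/local/pgsql/bin/pg_dump" "birdnotes" | gzip > "/home/birdnotes_backup/birdnotes.dmp.gz"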
I find it convenient to use AOLserver's scheduled-procedure API (ns_schedule_daily, in the script below) to run a Tcl script that generates my nightly backups, rather than using cron. The major benefit is that you can very easily make a web page that calls this script and link to it from an admin page. If there are problems making your backups (connectivity problems seem to crop up a few times a year, in my experience), you can just click the link to force a backup, with no need to remember where you placed the script, what you called it, what arguments it might need, and so on.
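Hooking the backup proc up to a URL takes only a couple of lines. Here's a minimal sketch, assuming the backup proc defined below has been loaded and that /admin/ URLs are already protected by your server's access controls; the URL itself is an invented example.

# Force a backup on demand by visiting /admin/force-backup.
proc force_backup {} {
    backup
    ns_return 200 text/plain "Backup forced; check the server log for results."
}
ns_register_proc GET /admin/force-backup force_backup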
Currently, Postgres doesn't automatically reclaim space in a database table when an existing row is deleted or updated.
The "vacuum" command must be run periodically to reclaim that space. The "vacuum analyze" form additionally collects statistics on the dispersion of column values, which the optimizer uses when it decides how to execute queries. The availability of this data can make a tremendous difference in the execution speed of queries. I run this command as part of my nightly backup procedure: if "vacuum" is ever going to screw up my database, I'd prefer it to happen immediately after (not before!) I've made a backup. The "vacuum" command is very reliable and has never caused me a problem, but conservatism is the key to good system management.
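The script below runs "vacuum analyze" through psql, but if you'd rather keep everything inside the server, it can also be issued over a database handle. A sketch, assuming your driver runs each statement in its own transaction, which "vacuum" requires (it refuses to run inside an explicit transaction block):

# Vacuum and gather optimizer statistics over a pooled database handle.
set db [ns_db gethandle]
ns_db dml $db "vacuum analyze"
ns_db releasehandle $db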
Here's a sample script based on the one used to back up the database backing my most important personal site, "birdnotes.net". If you're wondering why this procedure doesn't back up the scripts for the site as well, it is because they're developed on a local machine, which is backed up separately.
# Back up the database, scheduled to be run nightly.  As written, it
# keeps a month's worth of daily backups, cycling over the same files,
# which are suffixed with the day of the month on which the backup is
# created.
#
# This version: ftp only.

proc backup {} {

    # Set these to the appropriate values for your installation.
    set b "/usr/local/pgsql/bin"
    set bak "/home/birdnotes_backup"

    # Grab the day of the month to use as the dump file's suffix.
    set db [ns_db gethandle]
    set sql "select date_part('day','today'::date) as day"
    set selection [ns_db 1row $db $sql]
    set_variables_after_query

    set data "birdnotes_$day.dmp"

    ns_log Notice "Backup of [ad_system_name] starting."

    ns_log Notice "pg_dump beginning..."
    if { [catch {append msg [exec "$b/pg_dump" "birdnotes" ">$bak/$data"]} errmsg] } {
        ns_log Error "pg_dump failed: $errmsg"
        ns_sendmail [ad_system_owner] [ad_system_owner] \
            "[ad_system_name] : pg_dump failed..." "$errmsg"
        ns_db releasehandle $db
        return
    }
    append msg "\n"

    ns_log Notice "gzip of data beginning..."
    if { [catch {append msg [exec "gzip" "-f" "$bak/$data"]} errmsg] } {
        ns_log Error "gzip of data failed: $errmsg"
        ns_sendmail [ad_system_owner] [ad_system_owner] \
            "[ad_system_name] : gzip of data failed..." "$errmsg"
        ns_db releasehandle $db
        return
    }
    append msg "\n"

    ns_log Notice "ftp data beginning..."

    # Replace "your_username", "your_password", and "your_remotedir" with
    # the values appropriate for the remote system on which you're keeping
    # backup copies.
    set fd [open "$bak/ftp_data.tmp" w]
    puts $fd "user your_username your_password"
    puts $fd "binary"
    puts $fd "put $bak/$data.gz your_remotedir/$data.gz"
    puts $fd "quit"
    close $fd

    # "your_remoteserver" should be set to the IP of the remote system
    # which stores your backups.
    if { [catch {append msg [exec "ftp" "-n" "your_remoteserver" "<$bak/ftp_data.tmp"]} errmsg] } {
        ns_log Error "ftp data failed: $errmsg"
        ns_sendmail [ad_system_owner] [ad_system_owner] \
            "[ad_system_name] : ftp data failed..." "$errmsg"
        ns_db releasehandle $db
        return
    }
    append msg "\n"

    # Replicate the above code to make remote copies to other systems.

    ns_log Notice "vacuum beginning..."
    if { [catch {append msg [exec "$b/psql" "-q" "-c" "vacuum analyze" "birdnotes"]} errmsg] } {
        ns_log Error "vacuum failed: $errmsg"
        ns_sendmail [ad_system_owner] [ad_system_owner] \
            "[ad_system_name] : vacuum failed..." "$errmsg"
        ns_db releasehandle $db
        return
    }

    ns_db releasehandle $db
    ns_log Notice "Backup succeeded."
    append msg "Backup succeeded"
    ns_sendmail [ad_system_owner] [ad_system_owner] \
        "[ad_system_name] : backup succeeded" "$msg"
}

# Schedule the backup exactly once, at midnight, no matter how many
# interpreters load this file.
ns_share -init {set schedule_backup 0} schedule_backup

if {!$schedule_backup} {
    set schedule_backup 1
    ns_schedule_daily 0 0 backup
    ns_log Notice "Backup has been scheduled."
}