Example One - Part 2 - Archive organisation

Archive structure

As you already saw, the archives for the wcp source are stored under the directory $HOME/zz-backup/wcp.

# ls $HOME/zz-backup/wcp
0001/  0002/  0003-test/

Each of the three directories shows a complete listing of the wcp source directory.

For each archive version you have one directory named with the archive number followed by the optional version label.
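The naming scheme can be taken apart with plain shell parameter expansion. This is only a sketch: the function name split_archive_name is made up here, and it assumes the label is everything after the first hyphen, as in 0003-test.

```shell
# split_archive_name NAME: print the archive number and the optional label.
# Hypothetical helper; assumes "<number>" or "<number>-<label>".
split_archive_name() {
    name=$1
    number=${name%%-*}            # everything before the first hyphen
    case $name in
        *-*) label=${name#*-} ;;  # everything after the first hyphen
        *)   label= ;;            # no label given
    esac
    printf 'number=%s label=%s\n' "$number" "$label"
}

split_archive_name 0003-test   # prints: number=0003 label=test
split_archive_name 0001        # prints: number=0001 label=
```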

Looking into one of the directories

# ls -l $HOME/zz-backup/wcp/0001
total 148
-rwxr-xr-x  4 wzk wzk   713 2006-01-03 22:11 backup-etc*
-rwxr-xr-x  3 wzk wzk  2379 2006-01-03 12:44 check-copy*
-rw-r--r--  4 wzk wzk  5433 2006-01-04 01:08 config.c
-rw-r--r--  4 wzk wzk  1413 2006-01-03 11:57 config.h
-rw-r--r--  4 wzk wzk  2593 2006-01-03 11:35 io-lib.c
-rw-r--r--  4 wzk wzk  1240 2006-01-03 11:27 io-lib.h
-rw-r--r--  4 wzk wzk  7718 2006-01-04 14:08 lib.c
-rw-r--r--  4 wzk wzk  1947 2006-01-04 14:08 lib.h
-rw-r--r--  2 wzk wzk   654 2006-01-05 13:42 makefile
-rw-r--r--  4 wzk wzk 25725 2006-01-05 14:59 server.c
-rw-r--r--  4 wzk wzk  2158 2006-01-04 01:01 server.h
-rw-r--r--  4 wzk wzk  7436 2006-01-04 23:49 stat.c
-rw-r--r--  4 wzk wzk  1538 2006-01-04 23:49 stat.h
-rwxr-xr-x  4 wzk wzk   133 2006-01-04 01:25 test-handler.sh*
-rw-r--r--  4 wzk wzk  2350 2006-01-03 12:17 version.c
-rw-r--r--  4 wzk wzk  4530 2006-01-05 12:19 wcp.1
-rw-r--r--  4 wzk wzk 25580 2006-01-05 14:00 wcp.c
-rw-r--r--  4 wzk wzk  4267 2006-01-05 15:13 wcpd.1
-rw-r--r--  4 wzk wzk  1329 2006-01-02 19:47 wcp.h
drwxr-xr-x  2 wzk wzk  4096 2006-01-05 15:14 zz-test/

shows the second important thing to note.

Archived files are normal files. There is no special processing.

Disk space

Looking at the space used by the directories shows

# pwd
wcp-1.0.12
# du .
8       ./zz-test
166     .

that the source directory is approximately 166 kByte large

# du $HOME/zz-backup/wcp/000* | awk '/./ { if (split($2, x, "/") == 6) print }'
160     /home/wzk/zz-backup/wcp/0001
160     /home/wzk/zz-backup/wcp/0002
160     /home/wzk/zz-backup/wcp/0003-test

and that the archive directories are approximately the same size (Note: the awk statement reduces du's output to the main archive directories, suppressing the zz-test subdirectories).
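The awk filter works by counting path components. A small sketch of that counting step, using a helper name (path_depth) that is made up for this illustration:

```shell
# path_depth PATH: print how many pieces awk's split() finds when PATH is
# cut at every "/". A leading "/" produces an empty first piece.
path_depth() {
    printf '%s\n' "$1" | awk '{ print split($0, x, "/") }'
}

path_depth /home/wzk/zz-backup/wcp/0001          # prints 6
path_depth /home/wzk/zz-backup/wcp/0001/zz-test  # prints 7
```

Only lines whose path has exactly six pieces pass the filter, so the zz-test lines are dropped. The number six depends on how deep your home directory sits and needs adjusting for other paths.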

So what exactly is then wcp's feature? This is revealed by

# du $HOME/zz-backup/wcp | awk '/./ { if (split($2, x, "/") == 5) print }'
204     /home/wzk/zz-backup/wcp

The whole archive space consumes only 204 kByte, which is far less than 3 * 160 = 480 kByte. Looking at the space consumption of the individual archive directories returns

160     /home/wzk/zz-backup/wcp/0003-test
16      /home/wzk/zz-backup/wcp/0001
12      /home/wzk/zz-backup/wcp/0002
204     /home/wzk/zz-backup/wcp/

The numbers look so weird because wcp uses hardlinks instead of file copies when it finds an unmodified file. You can think of a hardlink as a pointer in one directory referring to a file in another directory (this comes close to reality, but real files are never "in a directory" in the sense that they are contained there; directories are only reference lists).
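If hardlinks are new to you, the following sketch reproduces the effect by hand in a scratch directory. It does not use wcp at all; the directory and file names are invented for the demo.

```shell
# Scratch directory; all names below are made up for this demo.
demo=$(mktemp -d)
mkdir "$demo/0001" "$demo/0002"
echo "int main(void) { return 0; }" > "$demo/0001/server.c"

# Create a second directory entry for the same file (no data is copied).
ln "$demo/0001/server.c" "$demo/0002/server.c"

# ls -i prints the inode number first; both names report the same one.
inode1=$(ls -i "$demo/0001/server.c" | awk '{ print $1 }')
inode2=$(ls -i "$demo/0002/server.c" | awk '{ print $1 }')
echo "inodes: $inode1 $inode2"

# The second column of ls -l is the link count, which is now 2.
links=$(ls -l "$demo/0001/server.c" | awk '{ print $2 }')
echo "link count: $links"

rm -rf "$demo"
```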

According to the output above, archive #3 is the largest. Whether this is right or wrong depends on your point of view. For du it is correct, because du finds the 0003-test directory first and counts all previously unseen inodes for this directory. Duplicate inodes which are then found in 0001 are not counted again. However, common sense is also right (making du's interpretation wrong) in thinking that 0001 is the largest directory, because all files were copied there first.
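The order dependence is easy to reproduce without wcp. The sketch below hardlinks one 64 kByte file into two scratch directories and runs du twice with the arguments swapped; the directory names are invented, and the exact numbers depend on your filesystem.

```shell
demo=$(mktemp -d)
mkdir "$demo/0001" "$demo/0002"

# 64 kByte of data, hardlinked into both directories.
dd if=/dev/urandom of="$demo/0001/data" bs=1024 count=64 2>/dev/null
ln "$demo/0001/data" "$demo/0002/data"

# Within one du run, the directory visited first is charged for the data;
# the other one is charged only for its own directory overhead.
run1=$(du -sk "$demo/0001" "$demo/0002")
run2=$(du -sk "$demo/0002" "$demo/0001")
echo "$run1"
echo "$run2"

rm -rf "$demo"
```

In the first run 0001 appears large and 0002 small; in the second run it is the other way round, although nothing on disk has changed.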

The size allocated by an archive directory is the sum of the sizes of the files changed since the last run plus some overhead for the directories in the archive.
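To see which files actually take up fresh space in an archive, you can compare it with its predecessor inode by inode. The following is a sketch only: newly_stored is a made-up helper, not part of wcp, and it relies on the -ef test operator, which common shells such as bash and dash provide.

```shell
# newly_stored PREV NEW: list files in NEW that are not hardlinks to the
# same path in PREV, i.e. files that occupy new space in NEW.
# Hypothetical helper; pass both directories as absolute paths or as
# paths relative to the current directory.
newly_stored() {
    prev=$1
    new=$2
    ( cd "$new" && find . -type f ) | while IFS= read -r rel; do
        # -ef is true when both names refer to the same inode.
        [ "$prev/$rel" -ef "$new/$rel" ] || printf '%s\n' "$rel"
    done
}

# Possible usage:
#   newly_stored "$HOME/zz-backup/wcp/0002" "$HOME/zz-backup/wcp/0003-test"
```

Files that are missing in the previous archive are listed as well, since they also occupy new space.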

This also means that "unnecessary" archives (archive versions where only a few files are modified) consume less space.

Archive directories

Let's look again into an archive directory. This shows an almost normal directory listing.

# ls -l $HOME/zz-backup/wcp/0003-test
total 148
-rwxr-xr-x  3 wzk wzk   713 2006-01-03 22:11 backup-etc*
-rwxr-xr-x  3 wzk wzk  2379 2006-01-03 12:44 check-copy*
-rw-r--r--  3 wzk wzk  5433 2006-01-04 01:08 config.c
-rw-r--r--  3 wzk wzk  1413 2006-01-03 11:57 config.h
-rw-r--r--  3 wzk wzk  2593 2006-01-03 11:35 io-lib.c
-rw-r--r--  3 wzk wzk  1240 2006-01-03 11:27 io-lib.h
-rw-r--r--  3 wzk wzk  7718 2006-01-04 14:08 lib.c
-rw-r--r--  3 wzk wzk  1947 2006-01-04 14:08 lib.h
-rw-r--r--  1 wzk wzk   654 2006-01-05 15:15 makefile
-rw-r--r--  3 wzk wzk 25725 2006-01-05 14:59 server.c
-rw-r--r--  3 wzk wzk  2158 2006-01-04 01:01 server.h
-rw-r--r--  3 wzk wzk  7436 2006-01-04 23:49 stat.c
-rw-r--r--  3 wzk wzk  1538 2006-01-04 23:49 stat.h
-rwxr-xr-x  3 wzk wzk   133 2006-01-04 01:25 test-handler.sh*
-rw-r--r--  3 wzk wzk  2350 2006-01-03 12:17 version.c
-rw-r--r--  3 wzk wzk  4530 2006-01-05 12:19 wcp.1
-rw-r--r--  3 wzk wzk 25580 2006-01-05 14:00 wcp.c
-rw-r--r--  3 wzk wzk  4267 2006-01-05 15:13 wcpd.1
-rw-r--r--  3 wzk wzk  1329 2006-01-02 19:47 wcp.h
drwxr-xr-x  2 wzk wzk  4096 2006-01-05 15:16 zz-test/

Almost normal because the values in the second column are unusual. This is the hardlink count of the listed file. Usually this count is "1" for regular files. Here the link count of e.g. server.c is 3 because the file was unchanged during 3 (we had only 3) wcp runs.

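For each wcp run in which a file is found to be unchanged, its link count is increased by one. Hardlinks always refer to the same file, so a change made through one archive directory shows up in every other archive that links to that file. The archives should therefore not be writable.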
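The reason for keeping archives read-only can be shown in a scratch directory. This sketch does not use wcp; the names are invented, and the final chmod is just one possible way to guard archived files, not something the text above prescribes.

```shell
demo=$(mktemp -d)
mkdir "$demo/0001" "$demo/0002"
echo "version 1" > "$demo/0001/notes.txt"
ln "$demo/0001/notes.txt" "$demo/0002/notes.txt"

# Appending through the 0002 name also changes what 0001 shows,
# because both names point at the same data.
echo "edited later" >> "$demo/0002/notes.txt"
content=$(cat "$demo/0001/notes.txt")
echo "$content"

# One way to guard against this: drop write permission on the files only.
# The directories stay writable so that new archives can still be added.
find "$demo" -type f -exec chmod a-w {} +
perms=$(ls -l "$demo/0001/notes.txt" | cut -c1-10)
echo "$perms"

rm -rf "$demo"
```

Since permissions belong to the file and not to its names, a single chmod covers every archive directory that links to the file.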

Directories like zz-test are never hardlinked by wcp; they are created as "real" directories instead. That is, the link count for zz-test is in no way related to the number of identical archived versions.

Archive verification

When you want to compare an archive's contents against your current directory you can do this either on your own or use check-copy for it.

# ./check-copy $HOME/zz-backup/wcp/0003-test
./makefile: ok
./zz-test: ok
./version.c: ok
./stat.c: ok
./lib.c: ok
zz-test/empty.txt: ok
./check-copy: ok
./wcp.c: ok
./server.c: ok
./wcpd.1: ok
./wcp.1: ok
./io-lib.c: ok
./config.c: ok
./stat.h: ok
./lib.h: ok
zz-test/README: ok
./wcp.h: ok
./server.h: ok
./backup-etc: ok
./test-handler.sh: ok
./io-lib.h: ok
./config.h: ok

check-copy prints a line for each file in the current directory (or below). It prints "ok" if the file's directory information matches that found in the archive directory given as command line argument; otherwise it shows what the differences are.
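If you prefer to do the comparison on your own, a byte-by-byte check with cmp is one option. The sketch below is not how check-copy works (check-copy looks at directory information such as size and modification time); compare_to_archive is a made-up name.

```shell
# compare_to_archive ARCHIVE: compare every regular file below the current
# directory with the same path in ARCHIVE, byte by byte.
# Hypothetical helper, unrelated to the check-copy script.
compare_to_archive() {
    archive=$1
    find . -type f | while IFS= read -r rel; do
        if [ ! -f "$archive/$rel" ]; then
            printf '%s: missing in archive\n' "$rel"
        elif cmp -s "$rel" "$archive/$rel"; then
            printf '%s: ok\n' "$rel"
        else
            printf '%s: differs\n' "$rel"
        fi
    done
}

# Possible usage, from inside the source directory:
#   compare_to_archive "$HOME/zz-backup/wcp/0003-test"
```

Reading every file is slower than comparing size and date, but it also catches changes that leave both untouched.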

NOTE: Whether check-copy works or not depends on the output of ls. check-copy expects exactly nine output fields from ls for each file, and the last modification date in particular might be a problem. To address this issue you can edit the LSOPTS variable in the check-copy script to fit the ls output to this requirement. Let me repeat: if ls does not return exactly nine output fields, you'll get nothing.
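Before editing the script it helps to know how many fields your ls prints. The helper below (ls_field_count, a made-up name) simply counts them for one file. The listings earlier on this page, with dates like 2006-01-03 22:11, have eight fields per line, whereas a date written as month, day and time gives nine.

```shell
# ls_field_count FILE: print the number of whitespace-separated fields that
# "ls -l" produces for FILE. Use a regular file whose name has no spaces.
ls_field_count() {
    ls -l "$1" | awk '{ print NF }'
}

# Possible usage, from inside the source directory:
#   ls_field_count makefile
```

If the result is not 9, the date format is the most likely cause. Which options fix it depends on your ls, so check its manual before changing LSOPTS.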

Assuming you had to edit check-copy, let's run it again.

# ./check-copy /home/wzk/zz-backup/wcp/0003-test/ | grep -v 'ok$'
./check-copy: size [2386, 2379] mtime [Jan-5-15:24, Jan-3-12:44]

This tells you that the file check-copy differs in size and last modification date from what is in the archive. To be up to date you should run another backup.

# make backup
rm -f *.o cut out tags wcp wcpd wcp-1.0.12.tar.gz
wcp store $HOME/zz-backup/wcp -v
0004 0004
# .
S ./check-copy

# ./zz-test

and now

# ./check-copy /home/wzk/zz-backup/wcp/0004/ | grep -v 'ok$'

we don't get any output of course.