Data Redundancy

Copyright 2008 by Stephen Vermeulen
Last updated: 2008 Oct 12


Introduction

This is part of a series of articles on backing up computers. The top page is Design for an Archiving Backup System.

Redundancy of Source Data

On a network with multiple similar computers there can be a lot of redundant data, particularly in the operating system and applications, because each machine has many of the same files installed on it. A backup system that recognizes and exploits this can reduce both the volume of backup media and the time taken to perform backups, especially full backups.

The previous solution, using a cryptographic hash function to detect when a particular file has really changed, can also tell us whether a file on some other machine is really the same as a file on the current machine. If it is, we can avoid storing it redundantly.
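
As a rough illustration of the idea, the sketch below groups files from several machines by content hash and keeps only the first copy of each. It is a minimal sketch and not the backup system's actual code: SHA-256 is an assumed choice (the article only says "a cryptographic hash function"), and the names file_digest and plan_backup are hypothetical.

```python
import hashlib


def file_digest(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks.

    SHA-256 is an assumption; any cryptographic hash would serve.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def plan_backup(records):
    """Split (machine, path, digest) records into unique and duplicate files.

    Returns (to_store, references):
      to_store   maps digest -> (machine, path) of the first file seen
                 with that content; only these need to be written.
      references lists (machine, path, digest) for every later file whose
                 content is already stored, so only a pointer is recorded.
    """
    to_store = {}
    references = []
    for machine, path, digest in records:
        if digest in to_store:
            references.append((machine, path, digest))
        else:
            to_store[digest] = (machine, path)
    return to_store, references
```

In use, each machine would report (machine, path, file_digest(path)) for its files, and the backup server would store only the entries in to_store. Two machines with the same operating system files would then contribute those files once.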

I studied 5 machines on a network; they were running NT 4.0 Workstation, Windows 2000 Pro, XP Pro and NT 4.0 Server. The table below lists the observed byte and file counts for each drive. I also looked at the number of chunks that would be needed with chunk sizes of 8k, 64k, 256k, 1M and 4M bytes (more on chunks or blocks later).

Drive   Type         bytes   files        8k      64k     256k      1M      4M   b/file
1       boot    1548021913   11605    196261    32436    16056   12421   11665   133392
2       boot     812225898    8286    103925    18407    10128    8491    8232    98023
3       data    1071014685    1576    131623    17443     5332    2409    1724   679577
4       boot     206155734    1363     26076     4191     1980    1500    1392   151251
5       boot    3021169456   26426    384212    65983    34490   27861   26642   114325
6       data    5643107327    4021    690901    88539    24409    8711    4955  1403408
7       boot    4968751300   41790    631104   107273    55004   43865   41903   118898
8       data   28664611402    4429   3501985   441250   113556   31671   11214  6472027
9       boot    1158365835    8587    146393    23497    11388    8995    8602   134897
10      boot    2408981577   33682    314035    62977    39657   34729   33810    71521
11      data    1374408623    1512    168630    21971     6441    2641    1717   909000
12      boot     554979501    7471     72186    14127     8683    7632    7471    74284
13      boot    2932169136   35299    379173    72811    43140   36632   35359    83066
14      data    5376407172   49075    685023   119834    63822   51929   49308   109554
15      data  138341370679   85162  16942392  2177605   601115  207317  110688  1624449
Totals        198081740238  320284  24373919  3268344  1035201  486804  354682   618456
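
The chunk columns can be understood as a per-file count: each file is cut into fixed-size chunks and the counts are summed over the drive. The sketch below shows one way such a row might be computed from a list of file sizes. It rests on assumptions the article does not state: chunks never span two files, an empty file still occupies one chunk, and "k" and "M" mean 1024 and 1024*1024 bytes. The names chunk_count and drive_summary are hypothetical. The b/file value is truncated to a whole number, which is what the table's values appear to do.

```python
# Chunk sizes used in the table; 1k = 1024 bytes is an assumption.
CHUNK_SIZES = {
    "8k": 8 * 1024,
    "64k": 64 * 1024,
    "256k": 256 * 1024,
    "1M": 1024 ** 2,
    "4M": 4 * 1024 ** 2,
}


def chunk_count(size, chunk_size):
    """Number of chunks needed to hold one file of `size` bytes.

    Rounds up, and counts an empty file as one chunk (an assumption).
    """
    return max(1, -(-size // chunk_size))  # ceiling division


def drive_summary(file_sizes):
    """Build one table row (as a dict) from the sizes of a drive's files."""
    total = sum(file_sizes)
    row = {"bytes": total, "files": len(file_sizes)}
    for label, chunk_size in CHUNK_SIZES.items():
        row[label] = sum(chunk_count(s, chunk_size) for s in file_sizes)
    # Average file size, truncated to an integer.
    row["b/file"] = total // len(file_sizes) if file_sizes else 0
    return row
```

Under these assumptions the chunk count can never be smaller than the file count, and it approaches the file count as the chunk size grows. The table shows that pattern: on drive 12 the 4M column equals the number of files.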




