Thursday, July 29, 2010

What are the differences between file-level vs. block-level deduplication?


"Many different data backup vendors provide data deduplication services, so expect to find subtle differences in how both file-level deduplication and block-level deduplication are implemented. In general, file-level deduplication watches for multiple copies of the same file, stores the first copy, and then just links the other references to the first file. Only one copy gets stored on the disk/tape archive. Ultimately, the space you save on disk relates to how many copies of the file there were in the file system."
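
To make the file-level idea concrete, here is a minimal Python sketch. It assumes copies are recognized by hashing the whole file's contents (SHA-256 here); that is only one way to do it, and as noted above, vendors differ. The function name `dedupe_files` and the sample file names are hypothetical.

```python
import hashlib

def dedupe_files(files):
    """files maps a file name to its bytes. Returns (store, links)."""
    store = {}   # content digest -> file contents, kept only once
    links = {}   # file name -> digest, i.e. the "link" to the stored copy
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)   # only the first copy is stored
        links[name] = digest                # later copies just point at it
    return store, links

# Hypothetical file system contents, for illustration only.
files = {
    "john/benefits.pdf": b"2010 benefits summary",
    "mary/benefits.pdf": b"2010 benefits summary",
    "mary/notes.txt": b"call HR on Monday",
}
store, links = dedupe_files(files)
print(len(files), "files,", len(store), "stored copies")
```

The space saved grows with the number of identical copies, because every extra copy costs only a link.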

Block-level deduplication, of which variable block-level deduplication is one form, looks at the data blocks themselves to see whether another copy of a block already exists. If so, the second (and subsequent) copies are not stored on the disk/tape; instead, a link/pointer is created to the original copy. For example, John's copy of a file may in fact just be a pointer to Mary's file -- if Mary's file was the first to be archived.
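
The pointer idea can be sketched in a few lines of Python. This is a simplified illustration, not any vendor's implementation: it assumes fixed-size blocks identified by a SHA-256 digest, and the 8-byte block size and the names `store_blocks` and `restore` are made up for the example.

```python
import hashlib

BLOCK_SIZE = 8  # bytes; unrealistically small, chosen so the example is readable

def store_blocks(data, block_store):
    """Split data into fixed-size blocks and return a list of pointers (digests)."""
    pointers = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        block_store.setdefault(digest, block)  # store a block only the first time it is seen
        pointers.append(digest)                # every occurrence becomes a pointer
    return pointers

def restore(pointers, block_store):
    """Rebuild the original data by following the pointers."""
    return b"".join(block_store[p] for p in pointers)

block_store = {}
mary = b"AAAAAAAABBBBBBBBCCCCCCCC"   # archived first
john = b"AAAAAAAABBBBBBBBDDDDDDDD"   # shares its first two blocks with Mary's file
mary_ptrs = store_blocks(mary, block_store)
john_ptrs = store_blocks(john, block_store)
print(len(block_store), "unique blocks stored for",
      len(mary_ptrs) + len(john_ptrs), "block references")
```

John's file is stored as pointers to two of Mary's blocks plus one new block, and `restore(john_ptrs, block_store)` still returns his file intact.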

There are pros and cons associated with both file-level and block-level deduplication. For instance, if 1,000 identical attachments are sent out by a benefits coordinator, file-level dedupe will find those 1,000 attachments that are exactly the same, but it won't find the exact duplicate copies you saved under other names (e.g., Benefits_file.Aug, Benefits_file.Sep, Benefits_file.Oct). Block-level dedupe will find all of the duplicates, even if named differently, and will store the name variations with pointers to the original blocks. Variable block-level dedupe will also account for misaligned data sets on disk, so it detects both the duplicate files and the exact copies with different names.
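
The misalignment point is easiest to see with a small experiment. The Python sketch below assumes that "variable block-level" means content-defined chunking, where a block boundary is placed wherever a checksum of the preceding few bytes hits a chosen value. That is a common approach, but real products use their own schemes. The constants `WINDOW` and `DIVISOR`, the function names, and the random test data are all illustrative choices.

```python
import random
import zlib

WINDOW = 16    # number of bytes examined when deciding on a boundary
DIVISOR = 64   # gives roughly one boundary every 64 bytes on average

def fixed_chunks(data, size=64):
    """Cut the data at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data):
    """Cut the data wherever the checksum of the last WINDOW bytes matches."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        # The decision depends only on the WINDOW bytes just before `end`,
        # so the same content produces the same boundaries at any offset.
        if zlib.crc32(data[end - WINDOW:end]) % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])   # whatever is left after the last boundary
    return chunks

def shared(a, b):
    """Number of distinct chunks that appear in both lists."""
    return len(set(a) & set(b))

rng = random.Random(0)                                    # fixed seed, repeatable run
original = bytes(rng.getrandbits(8) for _ in range(4000))
shifted = b"X" + original                                 # same data, misaligned by one byte

print("fixed blocks shared:   ", shared(fixed_chunks(original), fixed_chunks(shifted)))
print("variable chunks shared:", shared(variable_chunks(original), variable_chunks(shifted)))
```

With fixed offsets, a single inserted byte shifts every block, so the two copies stop matching. With content-defined boundaries, only the chunk containing the inserted byte differs and the rest line up again.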

File-level deduplication will save a relatively small amount of space on your disk/tape archive. Block-level deduplication will save more, and variable block-level deduplication will save more still. However, as with any other data storage technology or technique, be advised that your mileage will vary depending on the amount of replicated data you have in your file systems.
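
As a rough back-of-the-envelope check, the savings for the 1,000-attachment example can be worked out in Python. The 2 MiB attachment size is a made-up figure, and `space_saved` is a hypothetical helper; the sketch also ignores the small overhead of the links themselves.

```python
def space_saved(logical_bytes, stored_bytes):
    """Return (bytes saved, fraction of the logical data that was saved)."""
    saved = logical_bytes - stored_bytes
    return saved, saved / logical_bytes

attachment = 2 * 1024 * 1024   # hypothetical attachment size: 2 MiB
copies = 1000                  # the 1,000 identical attachments from the example

# Without dedupe, every copy is stored; with dedupe, only one is.
saved, fraction = space_saved(copies * attachment, attachment)
print(f"saved {saved / 2**20:.0f} MiB ({fraction:.1%} of the original footprint)")
```

The same arithmetic shows why mileage varies: with only two copies of each file the saving is 50 percent at best, and with no duplicates it is zero.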

NavigateStorage is in its 11th year specializing in storage of all types.

Written by Dave Ellis, Principal Technologist, Instrumental Inc. Dave Ellis is the Principal Technologist at Instrumental Inc. Dave has over 32 years of high tech experience. As a member of Instrumental Inc.'s CTO staff, Dave tracks, evaluates and implements high performance computing solutions that anticipate and meet changing customer requirements. Previously, Dave was the Director of HPC Architecture for the Engenio Storage Group of LSI Logic, and has held Senior Systems Engineering positions at Applied Information Sciences, Computer Sciences Corp., Data General, Dynamac Corp., Intransa Inc., Silicon Graphics, Sperry and Xyratex. Dave also served as an Aircraft Loadmaster in the U.S. Air Force, was a Master Instructor in both the USAF Aircraft Loadmaster and Computer Programming schools, and is a disabled Vietnam Veteran. His Air Force training also led to degrees in Management from Vernon College in Texas, and in Traffic and Transportation Management from the Community College of the Air Force.
