Computing Reviews
Data deduplication for data optimization for storage and network systems
Kim D., Song S., Choi B., Springer International Publishing, New York, NY, 2017. 262 pp. Type: Book (978-3-319-42278-7)
Date Reviewed: Apr 25 2017

I have always thought of data deduplication as something I might use on a Windows server to reduce the space taken up by library files. Of course, it is much more than that. Storage deduplication has been used in commercial services like Dropbox, and network deduplication is used in appliances like the Riverbed SteelHead WAN accelerator.

In the first of the book’s four parts, the characteristics of direct-attached storage (DAS), storage area networks (SANs), and network-attached storage (NAS) arrangements are introduced. Software-defined storage (SDS) concepts and in-memory data-grid (IMDG) products are also discussed. There are lots of diagrams, tables, and photographs.

To decide which chunks of data are duplicates, one can compute an SHA-1 (or other hash) fingerprint for each candidate chunk. These fingerprints can then be kept in an in-memory index and compared. Programs for SHA-1 computation, index-table management, and Bloom-filter comparison operations are included in the appendices.
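
The appendix programs are not reproduced in this review, so what follows is only a minimal Python sketch of the idea, not the authors' code. Data is cut into fixed-size chunks, each chunk's SHA-1 digest serves as its fingerprint, and a small Bloom filter screens lookups before the exact in-memory index is consulted. The chunk size, filter size, number of hash functions, and the names `BloomFilter`, `DedupStore`, and `chunk_fixed` are illustrative assumptions.

```python
import hashlib

# Illustrative parameters (assumed, not taken from the book).
FILTER_BITS = 8192
FILTER_HASHES = 4
CHUNK_SIZE = 4096


class BloomFilter:
    """Small Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = FILTER_BITS, num_hashes: int = FILTER_HASHES) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, digest: bytes):
        # Derive k bit positions from one SHA-1 digest by double hashing.
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, digest: bytes) -> None:
        for pos in self._positions(digest):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, digest: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(digest))


class DedupStore:
    """Keeps each distinct chunk once, keyed by its SHA-1 fingerprint."""

    def __init__(self) -> None:
        self.index = {}  # fingerprint -> chunk (stands in for an on-disk location)
        self.bloom = BloomFilter()

    def put(self, chunk: bytes) -> bytes:
        fp = hashlib.sha1(chunk).digest()
        # A negative Bloom answer means "definitely new"; a positive answer
        # is confirmed against the exact index to rule out false positives.
        if not (self.bloom.might_contain(fp) and fp in self.index):
            self.index[fp] = chunk
            self.bloom.add(fp)
        return fp


def chunk_fixed(data: bytes, size: int = CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


store = DedupStore()
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
recipe = [store.put(c) for c in chunk_fixed(data)]  # list of fingerprints rebuilds the data
print(len(recipe), "chunks,", len(store.index), "stored")  # 4 chunks, 2 stored
```

The filter earns its keep when the full index no longer fits in memory: a negative answer skips the index lookup entirely, while a positive answer still has to be checked.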

The authors illustrate how deduplication can be performed on chunks that are whole files, fixed-size blocks, or variable-size blocks. Hybrid and object-level mechanisms can also be used. Programs for Rabin fingerprinting (used to determine chunk boundaries) and for chunking are included in the appendices.
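
Variable-size (content-defined) chunking can be sketched as follows. This is an assumed illustration, not the appendix program: it uses a Rabin-Karp style polynomial rolling hash modulo a prime as a stand-in for a true Rabin fingerprint (which is computed over polynomials in GF(2)), and the window length, mask, and minimum/maximum chunk sizes are assumed values. A boundary is declared wherever the hash of the last `WINDOW` bytes has its low bits all zero, so boundaries depend on content rather than on byte offsets.

```python
import random

# Assumed parameters; real systems tune these.
WINDOW = 48                # bytes in the sliding window
BASE = 257
MOD = (1 << 61) - 1        # large prime modulus
MASK = (1 << 12) - 1       # low 12 bits zero -> roughly one boundary per 4 KiB
MIN_SIZE, MAX_SIZE = 1024, 16384


def cdc_chunks(data: bytes):
    """Split data at content-defined boundaries found with a rolling hash."""
    chunks = []
    start = 0
    h = 0
    pow_out = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            # Window is full: remove the oldest byte before adding the new one.
            h = (h - data[i - WINDOW] * pow_out) % MOD
        h = (h * BASE + b) % MOD
        length = i - start + 1
        if (length >= MIN_SIZE and (h & MASK) == 0) or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0  # start a fresh window for the next chunk
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


if __name__ == "__main__":
    original = random.Random(0).randbytes(100_000)
    edited = b"0123456789" + original  # insert 10 bytes at the front
    a, b = cdc_chunks(original), cdc_chunks(edited)
    shared = len(set(a) & set(b))
    print(f"{shared} of {len(a)} chunks unchanged after the insertion")
```

Because boundaries follow content, inserting a few bytes near the start of a file typically changes only the first chunk or so, whereas fixed-size blocking would shift, and therefore change, every subsequent block.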

Diagrams show how deduplication can take place on a server, or on a client that queries the server to find out which data chunks the server already holds. A Linux bridge program is used to show how network-wide redundancy elimination can be performed on the fly within routers or appliances.
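
The client-side variant can be sketched as below. `ChunkServer` and `client_backup` are hypothetical, in-process stand-ins for a real client/server protocol (a real system would send the fingerprint list and the missing chunks over the network), and fixed 4 KiB chunks with SHA-1 hex digests are assumptions.

```python
import hashlib


class ChunkServer:
    """Hypothetical server that stores chunks keyed by SHA-1 hex digest."""

    def __init__(self) -> None:
        self.chunks = {}

    def missing(self, fingerprints):
        # Answer the client's query: which of these fingerprints are not stored yet?
        return [fp for fp in fingerprints if fp not in self.chunks]

    def upload(self, chunks):
        for chunk in chunks:
            self.chunks[hashlib.sha1(chunk).hexdigest()] = chunk


def client_backup(server, data: bytes, size: int = 4096):
    """Upload only the chunks the server lacks; return (recipe, bytes sent)."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    recipe = [hashlib.sha1(c).hexdigest() for c in chunks]
    by_fp = dict(zip(recipe, chunks))        # also drops duplicates within this file
    need = server.missing(list(by_fp))       # first round trip: fingerprints only
    server.upload(by_fp[fp] for fp in need)  # second round trip: missing chunks only
    return recipe, sum(len(by_fp[fp]) for fp in need)


server = ChunkServer()
v1 = b"a" * 4096 + b"b" * 4096
v2 = v1 + b"c" * 4096                         # a later version with one new block
print(client_backup(server, v1)[1])           # 8192
print(client_backup(server, v2)[1])           # 4096
```

The trade-off is an extra round trip per upload in exchange for sending only chunk data the server has never seen.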

Part 2 of the book describes two storage deduplication systems developed by the authors. The first is the hybrid email deduplication system (HEDS), a server-side file-level and block-level deduplication system for email that achieves good storage space savings with low processing overhead. A sendmail filter separates the metadata and content components, and the content is split into chunks if it exceeds a threshold size.
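
A rough sketch of this hybrid handling follows, under stated assumptions: the review does not give HEDS's threshold, block size, or filter logic, so `THRESHOLD`, `BLOCK`, and the `HybridStore` class are hypothetical, and splitting the raw message at its first blank line stands in for the sendmail filter. Small message bodies are deduplicated as whole objects; larger ones are split into fixed-size blocks.

```python
import hashlib

THRESHOLD = 8192  # hypothetical size above which content is block-deduplicated
BLOCK = 4096      # hypothetical block size


class HybridStore:
    def __init__(self) -> None:
        self.objects = {}  # fingerprint -> deduplicated content object

    def _put(self, data: bytes) -> str:
        fp = hashlib.sha1(data).hexdigest()
        self.objects.setdefault(fp, data)  # store only the first copy
        return fp

    def store_message(self, raw: bytes) -> dict:
        # Separate metadata (headers) from content (body) at the first blank line.
        # Assumes LF line endings for brevity; real mail uses CRLF.
        header, _, body = raw.partition(b"\n\n")
        if len(body) <= THRESHOLD:
            content = [self._put(body)]  # small content: whole-object deduplication
        else:
            content = [self._put(body[i:i + BLOCK])  # large content: fixed-size blocks
                       for i in range(0, len(body), BLOCK)]
        return {"header": header, "content": content}

    def load_message(self, record: dict) -> bytes:
        body = b"".join(self.objects[fp] for fp in record["content"])
        return record["header"] + b"\n\n" + body


store = HybridStore()
attachment = b"".join(hashlib.sha1(str(i).encode()).digest() for i in range(1000))  # 20,000 bytes
r1 = store.store_message(b"From: a@example.com\n\n" + attachment)
r2 = store.store_message(b"From: b@example.com\n\n" + attachment)
print(len(store.objects))  # 5: the second message adds no new content objects
```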

The second system described is the structure-aware file and email (SAFE) deduplication system for cloud-based storage systems. This uses a client-based approach that is able to remove redundant objects by employing structure-based granularity. A file parser is used to decompose PDF, document, and spreadsheet files, and a corresponding deduplication technique is then applied. The authors show how a SAFE client can be used as a Dropbox client using a Dropbox REST application programming interface (API). This circumvents the limitations associated with Dropbox’s default 4MB fixed-size block deduplication mechanism.
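
The review does not detail SAFE's file parser. As an assumed illustration of structure-based granularity only, the sketch below relies on the fact that .docx and .xlsx files are ZIP containers: it fingerprints each member object rather than the file as a whole, so editing the text leaves an embedded image's object unchanged. PDF parsing is not shown, and the function names and member paths are illustrative.

```python
import hashlib
import io
import zipfile


def structure_objects(container: bytes):
    """Return (member name, SHA-1 of decompressed content) for each member of a ZIP-based document."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return [(name, hashlib.sha1(zf.read(name)).hexdigest()) for name in zf.namelist()]


def new_objects(store: dict, container: bytes) -> list:
    """Record unseen objects in store (fingerprint -> name); return the names that must be uploaded."""
    upload = []
    for name, fp in structure_objects(container):
        if fp not in store:
            store[fp] = name
            upload.append(name)
    return upload


def make_doc(text: bytes, image: bytes) -> bytes:
    """Build a toy document container with a .docx-like layout (hypothetical member paths)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", text)
        zf.writestr("word/media/image1.png", image)
    return buf.getvalue()


store = {}
image = bytes(range(256)) * 40  # stand-in for embedded image bytes
print(new_objects(store, make_doc(b"<w:t>draft</w:t>", image)))
print(new_objects(store, make_doc(b"<w:t>final</w:t>", image)))  # ['word/document.xml']
```

Hashing the whole container, or fixed-size blocks of its compressed bytes, would generally treat the two versions as largely different; per-object hashing uploads only the changed member.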

The authors have also developed SoftDance (software-defined deduplication as a network and storage service), whose characteristics are outlined in Part 3 of the book. SoftDance combines storage deduplication with network redundancy elimination over a software-defined network (SDN) to save both storage space and network bandwidth. SoftDance middle-boxes (SDMBs) identify the packet payloads to be encoded and store the corresponding indexes.
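
The review does not describe SoftDance's encoding format, so the following is a generic sketch of packet-payload redundancy elimination between two middle-boxes, not the SDMB protocol. Each side keeps a fingerprint-to-chunk cache; the sender replaces previously seen chunks with 20-byte SHA-1 references, and the receiver expands them. The fixed 64-byte chunk size and the `RedundancyEliminator` name are assumptions; practical systems usually pick chunk boundaries from content.

```python
import hashlib

CHUNK = 64  # hypothetical payload chunk size


class RedundancyEliminator:
    """One cache per middle-box. Sender and receiver caches stay in step because
    both record every chunk that crosses the link uncompressed."""

    def __init__(self) -> None:
        self.cache = {}  # SHA-1 digest -> chunk

    def encode(self, payload: bytes):
        out = []
        for i in range(0, len(payload), CHUNK):
            chunk = payload[i:i + CHUNK]
            fp = hashlib.sha1(chunk).digest()
            if fp in self.cache:
                out.append(("ref", fp))  # send the 20-byte fingerprint instead
            else:
                self.cache[fp] = chunk
                out.append(("raw", chunk))
        return out

    def decode(self, encoded) -> bytes:
        parts = []
        for kind, value in encoded:
            if kind == "raw":
                self.cache[hashlib.sha1(value).digest()] = value
                parts.append(value)
            else:
                parts.append(self.cache[value])  # expand a reference from the cache
        return b"".join(parts)


def wire_size(encoded) -> int:
    return sum(len(value) for _, value in encoded)


sender, receiver = RedundancyEliminator(), RedundancyEliminator()
payload = bytes(range(256)) * 4  # 1024 bytes: 16 chunks, 4 distinct
for _ in range(2):
    encoded = sender.encode(payload)
    assert receiver.decode(encoded) == payload
    print(wire_size(encoded))  # 496, then 320
```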

In the final part (“Future Directions”), the authors discuss some considerations relating to mobile devices. Such devices often create and use very large files, which are obvious candidates for deduplication. The proposed approach applies structure-aware deduplication, based on each file's structure format, combined with strong and fast encryption. The authors show that Blowfish provides superior performance on the ARM systems commonly used in smartphones.

I learned a lot from this book and can recommend it to anyone who believes their company may benefit from introducing storage and/or network deduplication. My only complaint is that the programs in the appendices cannot currently be downloaded easily from the publisher’s website.

Reviewer: G. K. Jenkins
Review #: CR145216 (1707-0411)
Network Operations (C.2.3)
File Systems Management (D.4.3)
Performance of Systems (C.4)
Other reviews under "Network Operations":
FDDI networking
Nemzow M., McGraw-Hill, Inc., New York, NY, 1993. Type: Book (9780070463226)
Feb 1 1995
Networking the Macintosh
Woodcock B., McGraw-Hill, Inc., New York, NY, 1993. Type: Book (9780070716841)
Aug 1 1994
Network administration survival guide
Plumley S., John Wiley & Sons, Inc., New York, NY, 1999. Type: Book (9780471296218)
Apr 1 1999

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®