What is Data Deduplication

Companies need to store massive amounts of data every day. They need their data backed up quickly, kept safe, and ready for a prompt restore. Nowadays this task is becoming harder and harder because the data to back up is continually growing. In fact, there are currently almost 3 zettabytes of business data in the world, costing USD 1.1 trillion to store and secure. But there is a new concept gaining popularity among IT people. It is Data Deduplication, and it helps with a huge problem: data storage, backup and restore!

What is Data Deduplication?

Data Deduplication is a specialized form of data compression whose main purpose is to eliminate coarse-grained redundant data. This technique is primarily used to improve storage utilization, but also helps to reduce network bandwidth consumption during file transfers.

Data Deduplication for Backups

Let’s say you are backing up some files at your office. Some of those files change a lot during the week, while others don’t, and you need to make a daily full backup of all of them.

Every time you perform a full backup, you copy the modified files plus duplicate copies of the unchanged files. As you can see, this scenario has really poor storage efficiency, so you move to incremental backup.

Incremental backup is better for backup storage requirements because you copy only those files that have changed since the last backup. But if somebody makes a tiny modification to a file, that whole file has to be copied again. So this is still not optimal.
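To make the idea concrete, here is a minimal sketch of how an incremental backup might pick its files. It assumes that "changed" simply means the file's modification time is newer than the last backup; the function name and parameters are hypothetical, and real backup tools may use other signals such as archive bits or change journals.

```python
import os


def files_changed_since(root, last_backup_time):
    """Return the files under root modified after last_backup_time.

    last_backup_time is a Unix timestamp in seconds (an assumption of this sketch).
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # A file is selected as a whole, even if only one byte changed.
            if os.path.getmtime(path) > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

Note that the selection is per file: a one-byte edit to a large file still puts the entire file into the next backup.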

Data Deduplication optimizes storage utilization for backups by storing only the file chunks that have changed. Remember, a file is actually stored in several fragments called file chunks. So, your backup will contain only the changed chunks and no duplicated files.
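The chunk idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: it assumes fixed-size 4 KB chunks, identifies each chunk by its SHA-256 hash, and uses a plain dictionary as the chunk store. The names backup_chunks, restore and store are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # 4 KB fixed-size chunks (an assumption of this sketch)


def backup_chunks(data, store, chunk_size=CHUNK_SIZE):
    """Back up data into store, keeping only chunks not seen before.

    Returns the "recipe": the ordered list of chunk hashes needed to
    rebuild the data later.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # new or changed chunk: store it once
        recipe.append(digest)
    return recipe


def restore(recipe, store):
    """Rebuild the original bytes from a recipe and the chunk store."""
    return b"".join(store[digest] for digest in recipe)


store = {}
monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
tuesday = b"A" * 4096 + b"X" * 4096 + b"C" * 4096  # only the middle chunk changed

monday_recipe = backup_chunks(monday, store)
tuesday_recipe = backup_chunks(tuesday, store)
```

After Monday's backup the store holds three chunks. Tuesday's backup adds only the one chunk that changed, so the store holds four chunks in total instead of six, and both days can still be restored in full.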

As you can see, this technology can improve your storage utilization. Using a very small chunk size of 4 KB, HP says you can reduce the disk space needed for backups to as little as 1/20 of the original (a 20:1 ratio), depending on the nature of the files.

Data Deduplication for File Servers

This technology can also help reduce capital and operating costs in file servers. In this case, deduplication is file-based, so duplicated files are stored only once. Currently, there are some developments aimed at using file chunk deduplication on Windows Storage Servers, but this can negatively impact server performance.
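File-based deduplication can be sketched in the same spirit. The example assumes that two files count as duplicates when the SHA-256 hash of their full content matches; add_file, blobs and index are hypothetical names, and a real file server would keep this bookkeeping on disk rather than in dictionaries.

```python
import hashlib


def add_file(name, content, blobs, index):
    """Record a file by name, storing its content only if it is new.

    blobs maps content hash -> bytes (one copy per distinct content).
    index maps file name -> content hash.
    """
    digest = hashlib.sha256(content).hexdigest()
    blobs.setdefault(digest, content)  # keeps the first copy, skips duplicates
    index[name] = digest
    return digest


blobs, index = {}, {}
add_file("alice/report.doc", b"quarterly report", blobs, index)
add_file("bob/report-copy.doc", b"quarterly report", blobs, index)
add_file("carol/notes.txt", b"meeting notes", blobs, index)
```

Three file names are tracked, but only two pieces of content are stored, because the two copies of the report share one entry.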

In conclusion, Data Deduplication is here to reduce the pressure on storage demands and is going to be a fundamental feature of modern, optimized data centers.

Jose David Gonzalez's Blog
