M4 - Lesson 3: Implementing Data Deduplication
Volumes must not be system or boot volumes:
Because most files used by an operating system are constantly open, Data Deduplication on system volumes would negatively affect performance.
Volumes can be partitioned by using either the master boot record (MBR) or GUID partition table (GPT) format and must be formatted by using the NTFS or ReFS file system.
Volumes can be on shared storage, such as a Fibre Channel or SAS array or an iSCSI SAN.
Volumes must be attached to the Windows Server computer and cannot appear as removable drives. This means that you cannot use USB or floppy drives for Data Deduplication.
Files with extended attributes, encrypted files, files smaller than 32 KB, and reparse point files are not processed by Data Deduplication.
Data Deduplication is not available for Windows client operating systems.
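The volume and file requirements above can be expressed as a simple screening function. This is a minimal Python sketch and not part of Windows Server. The `Volume` and `FileInfo` classes and their fields are hypothetical stand-ins for information you would gather from the system, and the rules encode only the requirements listed above.

```python
from dataclasses import dataclass

# Hypothetical descriptions of a volume and a file; the field names are
# illustrative and do not correspond to any Windows API.
@dataclass
class Volume:
    is_system_or_boot: bool
    file_system: str        # for example "NTFS" or "ReFS"
    is_removable: bool

@dataclass
class FileInfo:
    size_bytes: int
    has_extended_attributes: bool = False
    is_encrypted: bool = False
    is_reparse_point: bool = False

MIN_FILE_SIZE = 32 * 1024   # files smaller than 32 KB are skipped

def volume_is_eligible(vol: Volume) -> bool:
    """Apply the volume requirements listed above."""
    return (not vol.is_system_or_boot
            and vol.file_system in ("NTFS", "ReFS")
            and not vol.is_removable)

def file_is_eligible(f: FileInfo) -> bool:
    """Apply the per-file exclusions listed above."""
    return (f.size_bytes >= MIN_FILE_SIZE
            and not f.has_extended_attributes
            and not f.is_encrypted
            and not f.is_reparse_point)
```

For example, `volume_is_eligible(Volume(False, "ReFS", False))` returns `True`, and a 20 KB file fails `file_is_eligible` because it is below the 32 KB threshold.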
What is Data Deduplication?
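Capacity optimization:
Data Deduplication stores more data in less physical space by identifying duplicate portions of files and storing each unique portion only once.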
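The space saving comes from storing each unique piece of data only once. The following Python sketch illustrates this with a toy chunk store. It is a simplification: it splits files into fixed-size chunks and identifies them by SHA-256 hash, whereas Windows Server splits files into variable-size chunks and uses its own on-disk format. The class and method names are hypothetical.

```python
import hashlib

class ChunkStore:
    """Toy chunk store: each unique chunk is kept once, keyed by its hash."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}     # hash -> chunk data
        self.files: dict[str, list[str]] = {}  # file name -> ordered chunk hashes

    def add_file(self, name: str, data: bytes) -> None:
        hashes = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # store the chunk only if it is new
            hashes.append(digest)
        self.files[name] = hashes

    def read_file(self, name: str) -> bytes:
        # Reassemble the file from its chunk references.
        return b"".join(self.chunks[h] for h in self.files[name])

    def logical_size(self) -> int:
        return sum(len(self.read_file(n)) for n in self.files)

    def physical_size(self) -> int:
        return sum(len(c) for c in self.chunks.values())
```

Adding `b"AAAABBBBCCCC"` and `b"AAAABBBBDDDD"` with 4-byte chunks gives a logical size of 24 bytes but a physical size of 16 bytes, because the `AAAA` and `BBBB` chunks are shared.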
Scale and performance:
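Data Deduplication is highly scalable, resource-efficient, and nonintrusive. It can process multiple volumes at the same time without significantly affecting other workloads on the server.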
Reliability and data integrity:
When you apply Data Deduplication to a volume on a server, it maintains the integrity of the data. Data Deduplication uses checksums, consistency checks, and identity validation to ensure data integrity. It also maintains redundancy so that data can be repaired, or at least recovered, if corruption occurs.
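Checksum validation can be pictured as recomputing a chunk's hash on read and comparing it with the hash the chunk was stored under. Redundancy means a damaged chunk can be replaced from a second copy. The Python sketch below models this. It assumes SHA-256 checksums and a plain dictionary of redundant copies, both hypothetical simplifications, and it does not reflect how Windows Server decides which data to keep redundant copies of.

```python
import hashlib

class ChunkCorruptionError(Exception):
    """Raised when neither the primary nor a redundant copy is valid."""

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def read_chunk(digest: str, primary: dict[str, bytes],
               redundant: dict[str, bytes]) -> bytes:
    """Return a verified chunk, repairing the primary copy if possible."""
    data = primary[digest]
    if sha256(data) == digest:
        return data                      # checksum matches: data is intact
    backup = redundant.get(digest)
    if backup is not None and sha256(backup) == digest:
        primary[digest] = backup         # repair the corrupted primary copy
        return backup
    raise ChunkCorruptionError(f"chunk {digest[:8]} is corrupt and unrecoverable")
```

When the primary copy is damaged but a valid redundant copy exists, the read succeeds and the primary copy is overwritten with the good data. Without a valid copy, the error is reported instead of returning bad data.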
Bandwidth efficiency with BranchCache:
Through integration with BranchCache, the same optimization techniques are applied to data transferred over the WAN to a branch office. The result is faster file download times and reduced bandwidth consumption.
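The bandwidth saving comes from sending only data that the branch office does not already hold. The Python sketch below models that idea with sets of chunk hashes. It is a conceptual illustration under stated assumptions, not the BranchCache protocol. The function names and the use of fixed-size SHA-256 chunk hashes are hypothetical.

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int) -> list[str]:
    """Split data into fixed-size chunks and hash each one."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def chunks_to_transfer(file_data: bytes, branch_cache: set[str],
                       chunk_size: int = 4) -> list[str]:
    """Return hashes of chunks that must cross the WAN (not yet cached)."""
    needed = []
    for digest in chunk_hashes(file_data, chunk_size):
        # Skip chunks already cached at the branch or already queued.
        if digest not in branch_cache and digest not in needed:
            needed.append(digest)
    return needed
```

If the branch has already cached the chunks of `b"AAAABBBB"`, requesting `b"AAAABBBBCCCCCCCC"` transfers only one chunk (`CCCC`), even though the file has four.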
Data Deduplication optimization functionality is built into Server Manager and Windows PowerShell, so you can manage it with familiar tools.
Enhancements to the Data Deduplication role service in Windows Server 2016
Support for volume sizes up to 64 TB:
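Data Deduplication in Windows Server 2012 R2 does not perform well on volumes larger than 10 TB. In Windows Server 2016, optimization is multithreaded and can use multiple CPUs per volume, which allows it to handle volumes of up to 64 TB.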
Support for file sizes up to 1 TB:
In Windows Server 2012 R2, very large files are not good candidates for Data Deduplication. In Windows Server 2016, files of up to 1 TB can be deduplicated efficiently.
Simplified deduplication configuration for virtualized backup applications:
Windows Server 2016 includes a predefined Backup usage type for these workloads.
Support for Nano Server.
Support for cluster rolling upgrades:
Windows servers in a failover cluster running deduplication can include a mix of nodes running Windows Server 2012 R2 and nodes running Windows Server 2016.
Deploying Data Deduplication
Data Deduplication is designed to be applied to primary data volumes, not to logically extended volumes, without requiring any additional dedicated hardware.
Determine which volumes are candidates for deduplication.
Is duplicate data present?
Does the data access pattern allow for sufficient time for deduplication?
Does the server have sufficient resources and time to run deduplication?
Evaluate savings with the Deduplication Evaluation Tool:
You can use the Deduplication Evaluation Tool to determine the expected savings that you would get if you enable deduplication on a particular volume.
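Expected savings are commonly expressed as a savings rate: the share of the logical (undeduplicated) size that no longer needs physical storage. The Python sketch below estimates that rate for an in-memory sample of files by counting unique fixed-size chunks. It is a rough approximation under our own assumptions, not the algorithm the Deduplication Evaluation Tool uses. The function name is hypothetical, and the sketch ignores compression and metadata overhead.

```python
import hashlib

def estimate_savings(files: dict[str, bytes], chunk_size: int = 4096):
    """Return (logical_bytes, physical_bytes, savings_rate) for a sample."""
    logical = 0
    unique: dict[str, int] = {}          # chunk hash -> chunk length
    for data in files.values():
        logical += len(data)
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            unique[hashlib.sha256(chunk).hexdigest()] = len(chunk)
    physical = sum(unique.values())      # each unique chunk is stored once
    rate = (logical - physical) / logical if logical else 0.0
    return logical, physical, rate
```

Two identical 8 KB files made of repeated bytes give a logical size of 16,384 bytes but only one unique 4 KB chunk, so the estimated savings rate is 0.75 (75 percent).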
Plan the rollout, scalability, and deduplication policies:
The default deduplication policy settings are sufficient for most environments.
Software deployment shares. This includes software binaries, cab files, symbol files, images, and updates.
General file share. This includes a mix of all the types of data identified above.
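A deduplication policy essentially acts as a filter that decides which files an optimization job processes. The Python sketch below shows that idea. The `DedupPolicy` class, its field names, and its default values are hypothetical and chosen only for illustration; they are not taken from Windows Server. The only value that comes from this lesson is the 32 KB minimum file size.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DedupPolicy:
    # Hypothetical policy settings; the values are illustrative only.
    minimum_file_age: timedelta = timedelta(days=3)
    minimum_file_size: int = 32 * 1024
    excluded_extensions: frozenset = frozenset({".tmp"})

    def should_optimize(self, name: str, size: int,
                        last_modified: datetime, now: datetime) -> bool:
        """Return True if a file meets the policy and should be optimized."""
        ext = os.path.splitext(name)[1].lower()
        return (size >= self.minimum_file_size
                and now - last_modified >= self.minimum_file_age
                and ext not in self.excluded_extensions)
```

With these illustrative settings, a 64 KB `.cab` file last changed nine days ago is optimized. The same file changed yesterday, a `.TMP` file, or a 1 KB file is skipped.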
Data Deduplication Interoperability
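When a BranchCache-enabled system communicates over a WAN with a remote file server that is enabled for Data Deduplication, the deduplicated files are already indexed and hashed, so requests for data from the branch office can be answered quickly.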
Windows Server 2016 fully supports failover clusters, which means deduplicated volumes will fail over gracefully between nodes in the cluster.
Data Deduplication is compatible with Distributed File System (DFS) Replication. Optimizing or unoptimizing a file will not trigger a replication because the file does not change.
When you use File Server Resource Manager (FSRM), you should not create a hard quota on a volume root folder that is enabled for deduplication, but you can create a soft quota on it.
Backup & Restore considerations
Supported in Windows Server 2016:
Individual file backup/restore
Full volume backup/restore
Optimized file-level backup/restore by using the Volume Shadow Copy Service (VSS) writer
The complete set of Data Deduplication metadata and container files is restored.
The complete set of Data Deduplication reparse points is restored.
All non-deduplicated files are restored.
Not supported in Windows Server 2016:
Backup or restore of only the reparse points.
Backup or restore of only the chunk store.
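Both unsupported cases follow from how deduplicated files are stored. A reparse point only lists the chunks that a file needs, and the chunk store only holds chunks that have no file names attached. The Python sketch below models that dependency with plain dictionaries, which are a hypothetical simplification of the on-disk format, and checks whether each restored file can be read back.

```python
def restorable_files(reparse_points: dict[str, list[str]],
                     chunk_store: dict[str, bytes]) -> dict[str, bool]:
    """For each restored reparse point, report whether all its chunks exist."""
    return {name: all(h in chunk_store for h in hashes)
            for name, hashes in reparse_points.items()}
```

Restoring reparse points together with the chunk store makes every file readable. Restoring only the reparse points leaves every file unreadable, and restoring only the chunk store yields no files at all.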
Data Deduplication components
Filter driver:
This component monitors local or remote I/O and handles the chunks of data on the file system by interacting with the various jobs. There is one filter driver for every volume.
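Conceptually, when a program reads an optimized file, the filter driver recognizes the reparse point and rebuilds the data from the chunk store. Reads of files that were never optimized pass straight through. The Python sketch below models that decision. The dictionaries standing in for the volume and the chunk store are hypothetical, and the real filter driver works on individual I/O requests rather than on whole files.

```python
from typing import Union

# A file is either plain bytes or, if optimized, a list of chunk hashes
# (standing in for a reparse point).
FileEntry = Union[bytes, list]

def filtered_read(name: str, volume: dict[str, FileEntry],
                  chunk_store: dict[str, bytes]) -> bytes:
    entry = volume[name]
    if isinstance(entry, bytes):
        return entry                                  # not optimized: pass through
    return b"".join(chunk_store[h] for h in entry)    # rebuild from chunks
```

To the caller, both kinds of file look the same: `filtered_read` returns the full file contents either way.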
Data scrubbing:
Data Deduplication has built-in data integrity features such as checksum validation and metadata consistency checking.
Garbage collection:
Data Deduplication includes garbage collection jobs that process deleted or modified data on the volume so that any data chunks that are no longer referenced are cleaned up.
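Garbage collection can be thought of as a mark-and-sweep pass. The first pass collects every chunk hash that a file still references, and the second deletes the rest. The Python sketch below shows that logic on dictionaries that are hypothetical stand-ins for the volume's reparse points and chunk store. It illustrates the idea only and does not describe how the Windows job is implemented internally.

```python
def collect_garbage(files: dict[str, list[str]],
                    chunk_store: dict[str, bytes]) -> int:
    """Delete unreferenced chunks in place and return how many were removed."""
    referenced = {h for hashes in files.values() for h in hashes}   # mark
    unreferenced = [h for h in chunk_store if h not in referenced]  # sweep
    for h in unreferenced:
        del chunk_store[h]
    return len(unreferenced)
```

After a file that used chunk `h3` is deleted, one pass removes `h3` and keeps the chunks that other files still reference.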
Optimization:
These jobs perform both deduplication and compression of files according to the Data Deduplication policy for the volume.
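An optimization job combines deduplication and compression. It splits a file into chunks, stores only the chunks that are not already in the chunk store, and compresses those new chunks. The Python sketch below uses fixed-size chunks, SHA-256, and `zlib` purely for illustration. Windows Server uses variable-size chunks and its own chunking and compression, and the function names here are hypothetical.

```python
import hashlib
import zlib

def optimize_file(data: bytes, chunk_store: dict[str, bytes],
                  chunk_size: int = 4096) -> list[str]:
    """Deduplicate and compress one file; return its chunk list (the 'reparse point')."""
    chunk_list = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:                   # deduplication
            chunk_store[digest] = zlib.compress(chunk)  # compression of new chunks
        chunk_list.append(digest)
    return chunk_list

def read_optimized(chunk_list: list[str], chunk_store: dict[str, bytes]) -> bytes:
    """Decompress and reassemble an optimized file."""
    return b"".join(zlib.decompress(chunk_store[h]) for h in chunk_list)
```

Optimizing 8 KB of `A` bytes followed by 4 KB of `B` bytes produces three chunk references but only two stored chunks. Optimizing the same data again adds nothing new to the store.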
Unoptimization:
This job undoes deduplication on all of the optimized files on the volume. Common scenarios for using this type of job include decommissioning a server with volumes enabled for Data Deduplication, troubleshooting issues with deduplicated data, or migrating data to another system that doesn't support Data Deduplication.
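Unoptimization rebuilds every optimized file from its chunks and writes it back as an ordinary file, after which the chunk store is no longer needed. The Python sketch below shows this on a hypothetical dictionary model and does not represent the actual Windows job. It includes a simple free-space check, because rehydrated files take more room than their deduplicated form.

```python
def unoptimize_volume(volume: dict[str, object],
                      chunk_store: dict[str, bytes],
                      free_bytes: int) -> None:
    """Rehydrate every optimized file in place (optimized entries are chunk lists)."""
    rebuilt = {name: b"".join(chunk_store[h] for h in entry)
               for name, entry in volume.items() if isinstance(entry, list)}
    # Conservative check: require room for all rehydrated data before the
    # chunk store is released.
    needed = sum(len(d) for d in rebuilt.values())
    if needed > free_bytes:
        raise OSError("not enough free space to unoptimize the volume")
    volume.update(rebuilt)   # replace chunk lists with full file data
    chunk_store.clear()      # no file references the chunk store any more
```

If the space check fails, the volume is left unchanged, so the optimized files remain readable.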