EMC Corporation, the world leader in information
infrastructure solutions, today announced the EMC Data Domain
Global Deduplication Array (GDA), the industry’s fastest inline
deduplication storage system for enterprise backup applications. The
Global Deduplication Array, based on a new multi-controller extension of
the Data Domain architecture, offers inline global deduplication and a
global namespace for all data stored in the dual-controller system.
With throughput up to 12.8 terabytes per hour (TB/hour), it establishes
consistently high benchmarks across the spectrum of common data center
backup metrics. The Global Deduplication Array provides up to 14.2
petabytes (PB) of logical backup capacity, driving new levels of
simplicity for data center backup consolidation across workloads as
diverse as very large databases, VMware images, and unstructured data.
Unlike most multi-controller deduplication systems, the inline Global
Deduplication Array is tightly coupled with backup software, enabling
industry-leading inline deduplication performance, dynamic distribution
of load and simplicity of operation. The Global Deduplication Array
distributes parts of the deduplication process to the backup servers to
reduce network load and increase the throughput performance of the GDA
controllers. It offers more than 3x faster backup throughput per
controller than competitive deduplication configurations and is the
fastest inline deduplication system available. This distributed
deduplication processing throughput is anchored by the native speed
advantages of the Intel Xeon multi-core CPUs in the GDA controllers and
the Data Domain SISL (Stream-Informed Segment Layout) scaling
architecture that minimizes the number of disk accesses required in the
deduplication process. At
initial release, the platform supports Symantec NetBackup and Backup
Exec through backup server-based OpenStorage plug-in software. Later in
2010, it will also support EMC NetWorker using integrated software.
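
The division of labor described above, in which backup servers take on part of the deduplication work so that less data crosses the network, can be illustrated with a minimal sketch. It is not EMC's implementation: the fixed 8-byte segments, the SHA-256 fingerprints, and the Controller, backup, and restore names are all assumptions chosen for illustration.

```python
import hashlib

SEGMENT_SIZE = 8  # bytes; deliberately tiny so the example is easy to follow


class Controller:
    """Hypothetical stand-in for a deduplication storage controller."""

    def __init__(self):
        self.store = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        """Return the fingerprints this controller has not stored yet."""
        return {fp for fp in fingerprints if fp not in self.store}

    def put(self, fingerprint, segment):
        self.store[fingerprint] = segment


def backup(data, controller):
    """Fingerprint segments on the backup server and send only new ones.

    Returns (recipe, bytes_sent): the ordered fingerprint list needed to
    restore the data, and how many segment bytes crossed the network.
    """
    segments = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]
    recipe = [hashlib.sha256(s).hexdigest() for s in segments]
    needed = controller.missing(recipe)
    bytes_sent = 0
    for fp, seg in zip(recipe, segments):
        if fp in needed:
            controller.put(fp, seg)
            bytes_sent += len(seg)
            needed.discard(fp)  # a segment repeated within one job is sent once
    return recipe, bytes_sent


def restore(recipe, controller):
    """Rebuild the original byte stream from its fingerprint recipe."""
    return b"".join(controller.store[fp] for fp in recipe)
```

In this sketch the backup server does the hashing and the controller only answers which segments it lacks, so a second backup of mostly unchanged data sends only the changed segments.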
The Global Deduplication Array presents a single inline deduplication
storage pool to the backup application across two EMC Data Domain DD880
controllers. Large datacenter backup jobs are dynamically and
transparently load balanced across the controllers, simplifying capacity
management, performance management and backup administration.
— For backup environments with hundreds of terabytes to process, administrators can target their backup policies to a Global Deduplication Array and leverage a common deduplication storage environment for all data protected by those policies.
— The Global Deduplication Array accommodates up to 270 concurrent backup jobs and up to 12.8 TB/hour of throughput, allowing more backups to finish sooner while putting less pressure on limited backup windows.
— A global namespace minimizes the need to reconfigure complex backup policies, while innovative global deduplication technology dynamically load balances policies for performance and capacity management. As a result, very large data sets can be protected with little administrative effort while maximizing overall deduplication efficiency and minimizing the physical storage footprint.
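
To put the throughput figure in terms of a backup window, the arithmetic is simply data size divided by ingest rate. The helper below is a hypothetical illustration, not an EMC sizing tool. Because 12.8 TB/hour is quoted as an "up to" figure, the result is a best case rather than a prediction.

```python
def backup_window_hours(data_tb, throughput_tb_per_hour=12.8):
    """Best-case hours needed to ingest data_tb terabytes at the given rate."""
    if throughput_tb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_tb / throughput_tb_per_hour


if __name__ == "__main__":
    # Example data set sizes are assumptions, not figures from the announcement.
    for size_tb in (50, 100, 200):
        hours = backup_window_hours(size_tb)
        print(f"{size_tb} TB -> {hours:.1f} hours at best")
```

For example, 100 TB at the peak rate works out to about 7.8 hours. Real jobs will take longer whenever sustained throughput falls below the peak.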
"Figuring out how to get backups done within the allotted period of
time in the face of data growth is still the biggest data protection
challenge that organizations face according to our research," said Brian
Babineau, Senior Consulting Analyst with Enterprise Strategy Group.
"With their Data Domain Global Deduplication Array, EMC has far exceeded
the inline deduplication performance benchmark it set with its previous
top-of-line Data Domain system, but more importantly, the company has
given customers a way to protect more of their data in a shorter period
of time. We expect more companies to evaluate integration between
backup software and deduplication storage to maximize these performance
levels and data reduction results while consolidating administrative
tasks."
"The EMC Data Domain Global Deduplication Array, while very
sophisticated under the hood, builds on the mature foundation of the
existing Data Domain platform and retains its appliance simplicity,"
said Brian Biles, Vice President of Product Management, EMC Backup
Recovery Systems Division. "Its deduplication is inline, it’s blistering
fast, and it’s big enough for significant datacenter backup
consolidation, but its dynamic load balancing, single deduplication
storage pool and namespace and tight integration with backup software
means the Global Deduplication Array is easier to operate than
competitors who don’t have its scale. EMC has once again moved the dial
on disk-based data protection."
Unequalled Replication Capabilities
— With the EMC Data Domain Replicator software option, the Global Deduplication Array can automate wide area network (WAN) vaulting for use in disaster recovery (DR), remote office backup, or multi-site tape consolidation.
— A single Global Deduplication Array can support a replication fan-in of up to 270 remote offices using smaller deduplication storage systems such as the Data Domain DD140 or the DD600 series appliances.
— Cross-site deduplication further minimizes the required bandwidth since only the first instance of data is transferred across any of the WAN segments between sites. Additionally, for fast offsite protection and consolidation of tape out operations, the Global Deduplication Array provides up to 54 TB/hour of replication throughput.
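
The bandwidth effect of cross-site deduplication in a fan-in arrangement can be sketched as follows. This is an assumption-laden illustration, not the Data Domain Replicator protocol: the replicate_fan_in name, the site names, and the fingerprint-to-size mapping are hypothetical. The only behavior modeled is the one stated above, that a segment crosses the WAN only the first time any site presents it.

```python
def replicate_fan_in(sites):
    """Estimate WAN bytes per site when the hub deduplicates across sites.

    sites maps a site name to {fingerprint: segment_size_bytes}. Sites are
    processed in the order given; a segment is transferred only if no
    earlier site has already sent it to the hub.
    """
    hub_fingerprints = set()
    wan_bytes = {}
    for name, segments in sites.items():
        new = {fp: size for fp, size in segments.items()
               if fp not in hub_fingerprints}
        wan_bytes[name] = sum(new.values())
        hub_fingerprints.update(new)
    return wan_bytes
```

If two offices hold the same segment, only the office that replicates first pays the WAN cost for it. The second office's transfer shrinks accordingly.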
Like all Data Domain systems, the new Global Deduplication Array is
simple to install and flexible enough to be implemented into existing
user environments without disruption. Backed by available EMC 24x7x365
enterprise class service, it seamlessly integrates into Symantec
NetBackup and Backup Exec backup environments using the EMC Data Domain
OpenStorage software option.
Why Architecture Matters
The Global Deduplication Array is based on the same CPU-centric
approach to inline data deduplication as all EMC Data Domain systems.
In contrast to most deduplication approaches, which are added as
afterthoughts to existing disk arrays, Virtual Tape Libraries (VTLs) or
backup software, the Data Domain architecture combines the following
efficiencies:
— SISL scaling architecture leverages CPU improvements to increase inline deduplication speed while minimizing reliance on disk accesses for performance. Over the last 6 years, Data Domain systems have improved throughput by nearly 90 times and capacity by more than 225 times. Based on Intel’s CPU roadmap, throughput is expected to continue growing significantly.
— High performance inline deduplication for simplicity, to minimize system resources, administration, and internal system process contention.
— Green storage efficiency for a smaller system footprint and lower power consumption.
— Data Domain Data Invulnerability Architecture defends against data integrity issues by providing continuous verification during storage and recovery of data.