Yet harnessing the power of big data is not without challenges. The same massive volumes of structured and unstructured data that create these opportunities for innovation can confound attempts to contain them cost-effectively, let alone extract value from them. The strategic questions surrounding big data are indeed difficult: What data do we actually need? How should we analyze and interpret it? What value will we eventually get from it? But perhaps the most difficult question to answer is the most basic: How will we store it?
George DeBono, General Manager, Middle East & Africa at Red Hat, says that to take full advantage of big data, enterprises must take a holistic approach and transform their view of storage from a 'data destination' to a 'data platform'. Unlike conventional storage methods, a data storage platform enables the enterprise to meet its current and foreseeable storage needs, and it must satisfy five fundamental requirements for managing big data:
1. Deliver cost-effective scale and capacity
The ability to maximize capacity while minimizing cost is critical for a storage platform operating at big data scale. A big data storage system must be readily and cost-effectively scaled even as the enterprise’s storage demands grow dramatically. This requirement stands in stark contrast to what enterprises have come to expect from proprietary scale-up Network Attached Storage (NAS) and Storage Area Network (SAN) systems, whose fixed capacity dwindles away with use until the next data purge or forklift upgrade.
To minimize cost, big data storage platforms take a scale-out, as opposed to a scale-up, approach, achieving scale by pooling industry-standard commodity servers and storage devices. This ensures both low costs today and the ability to benefit from increased buying power as hardware gets better, faster, and cheaper over time.
An effective big data storage system must also be scalable in terms of performance, so that applications experience no degradation as the volume of data in the system increases.
2. Eliminate data migration
Because of the fixed capacity of traditional storage systems and the need to balance future storage needs with current capital expenditures, many businesses are forced to migrate their data to newer systems every few years. Unfortunately for these enterprises, this migration is expensive and time-consuming.
With enterprise data stores now approaching petabyte sizes, wholesale data migration is no longer logistically or financially feasible. A big data platform must remove the need for periodic data migration by providing a system that can grow without bound.
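The following is a minimal sketch of how a scale-out pool can grow without wholesale migration. It assumes a consistent-hashing placement scheme, which the article does not name; this is one common technique, not a description of any particular product. The class `HashRing`, the function `simulate_expansion`, and the server and object names are all hypothetical.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a stable integer position on the hash ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each server is placed at many ring positions to even out its share.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def locate(self, key):
        """Return the server owning the first ring position at or after the key."""
        idx = bisect.bisect_left(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


def simulate_expansion(num_keys=10_000):
    """Add a fifth server to a four-server pool and measure what moves."""
    keys = [f"object-{i}" for i in range(num_keys)]
    ring = HashRing(["server-1", "server-2", "server-3", "server-4"])
    before = {k: ring.locate(k) for k in keys}

    ring.add_node("server-5")  # hypothetical capacity expansion
    after = {k: ring.locate(k) for k in keys}

    moved = [k for k in keys if before[k] != after[k]]
    destinations = {after[k] for k in moved}
    return len(moved) / num_keys, destinations


if __name__ == "__main__":
    fraction, destinations = simulate_expansion()
    print(f"fraction of objects relocated: {fraction:.2%}")
    print(f"relocated objects landed on: {sorted(destinations)}")
```

Under these assumptions, adding one server to a pool of four relocates on the order of one fifth of the objects, and every relocated object moves to the new server. The rest of the data stays where it is, which is the property that lets capacity grow incrementally rather than through a forklift migration.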
3. Bridge legacy storage silos
Over the last few years, storage sprawl (being forced to install entirely new instances of storage systems to keep up with data growth) has become an increasingly common problem within the enterprise. Because these discrete systems are fundamentally disconnected from one another, they immediately become data silos that inhibit an enterprise’s ability to see the big picture.
To fully exploit the opportunities of big data, companies must be able to access and use all of their data without ad hoc interventions. Unlike conventional storage devices, a big data storage platform must bridge these legacy storage silos, rather than simply add yet another storage solution to the mix.
4. Provide global accessibility of data
Before big data, figuring out how to improve data access for globally distributed users presented a significant challenge to IT organizations. Now, the data itself has become globally distributed, transforming the IT challenge into one of making that data readily available to users and applications across the global enterprise.
A centralized approach to data management is no longer workable in the age of big data. Data volumes are too large, WAN bandwidth is too limited, and the consequences of a single point of failure are too costly. A big data storage platform must be able to manage data that is distributed across the global enterprise as a single, unified pool.
5. Protect and maintain the availability of data
Conventional storage systems have historically relied on hardware redundancy and external backups to reduce failures and increase data availability. The sheer size and decentralized nature of big data render a hardware- or backup-dependent strategy too costly and inflexible to implement.
Rather than seeking to protect against failure through the use of proprietary, enterprise-grade hardware, a big data storage platform must assume that hardware failure is inevitable and offer data availability and integrity through intelligent software.
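As an illustration of availability and integrity handled in software, the sketch below keeps several copies of each object on different commodity nodes and verifies a checksum on every read. It is a toy model under stated assumptions: the article does not specify a replication or checksum scheme, and `ReplicatedPool`, its methods, and the node names are hypothetical.

```python
import hashlib


class ReplicatedPool:
    """Toy storage pool: each object is copied to `replicas` distinct nodes."""

    def __init__(self, nodes, replicas=2):
        if replicas > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.replicas = replicas
        self.stores = {node: {} for node in nodes}  # node -> {key: (data, checksum)}
        self.failed = set()

    def _placement(self, key):
        """Pick `replicas` consecutive nodes, starting from a hash of the key."""
        names = sorted(self.stores)
        start = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def put(self, key, data: bytes):
        # Simplification: writes assume every target node is currently healthy.
        checksum = hashlib.sha256(data).hexdigest()
        for node in self._placement(key):
            self.stores[node][key] = (data, checksum)

    def fail(self, node):
        """Simulate a hardware failure of one node."""
        self.failed.add(node)

    def get(self, key) -> bytes:
        """Return the first healthy, uncorrupted copy of the object."""
        for node in self._placement(key):
            if node in self.failed:
                continue  # skip dead hardware
            entry = self.stores[node].get(key)
            if entry is None:
                continue
            data, checksum = entry
            if hashlib.sha256(data).hexdigest() == checksum:
                return data  # integrity verified in software
        raise KeyError(f"no healthy replica for {key!r}")


if __name__ == "__main__":
    pool = ReplicatedPool(["node-a", "node-b", "node-c", "node-d"], replicas=2)
    for i in range(100):
        pool.put(f"object-{i}", f"payload-{i}".encode("utf-8"))

    pool.fail("node-b")  # one commodity server dies
    readable = sum(
        pool.get(f"object-{i}") == f"payload-{i}".encode("utf-8") for i in range(100)
    )
    print(f"objects still readable after one node failure: {readable}/100")
```

With two replicas on distinct nodes, the loss of any single node leaves at least one copy of every object readable, and a copy whose checksum no longer matches is skipped in favor of a good one. A real platform would also re-create lost replicas on the surviving nodes, which this sketch omits.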
Big data holds great promise for the modern enterprise. Yet it comes with a unique set of requirements that place it well beyond the reach of traditional NAS and SAN storage systems. A data storage platform designed with the realities of big data in mind is critical for companies. Without taking steps to transition to a robust and capable data storage platform, efforts to extract value from big data will no doubt be hindered by technical and logistical challenges.
About Red Hat, Inc.
Red Hat is the world's leading provider of open source software solutions, using a community-powered approach to reliable and high-performing cloud, Linux, middleware, storage and virtualization technologies. Red Hat also offers award-winning support, training, and consulting services. As the connective hub in a global network of enterprises, partners, and open source communities, Red Hat helps create relevant, innovative technologies that liberate resources for growth and prepare customers for the future of IT. Learn more at http://www.redhat.com.