I was in a meeting the other day with my staff. We were reviewing our backup strategies and soon discovered several shortcomings in our current operations. In short, our data has grown faster than we anticipated years ago when we created our backup blueprint for the datacenter. In fact, we quickly recognized that we have more data than we realize on a day-to-day basis. This has come about for a multitude of reasons.
As is typical of many organizations, we have more centralized, data-driven applications than ever. We have more servers than ever before due to organizational demands and the fact that creating virtual servers is quick and cost efficient. We have more users recognizing the benefit of centralized data storage and the value of querying and analyzing that data. Though we have fewer physical servers thanks to virtualization, that space in the rack is now taken up by SANs; within one year we have doubled the number of drive shelves in our SAN. What’s more, because our organization is so dependent on all of that centralized data, our current off-site backups would be terribly insufficient should our datacenter fall victim to a massive power outage or devastating storm. In summary, our data is getting ahead of us, and we are working now to find a solution to control it before we are simply overwhelmed.
We are not alone. According to Gartner, 47 percent of respondents to a recent survey ranked data growth among their top three challenges. On average, respondents report that data in their enterprises is growing at 40 to 60 percent year over year due to a number of factors, including an explosion in unstructured data, such as e-mail and documents that must be retained due to “regulatory requirements that continue to evolve and change.”
The most extreme examples of this data explosion can be found at Facebook and YouTube. Facebook surpassed 30,000 servers in its datacenter back in 2009, and the amount of log data amassed in its operations is staggering. Jeff Rothschild, VP of Technology at Facebook, said Facebook manages more than 25 terabytes of logging data per day, which he said was the equivalent of about 1,000 times the volume of mail delivered daily by the U.S. Postal Service.
Similarly, YouTube reports that 48 hours of video are uploaded every minute, a one hundred percent increase in only one year; in total, 13 million hours of video were uploaded in 2010. Oracle’s research indicates the scale of the data overload as well, showing that volumes have soared from an estimated 135 exabytes in 2005 to 2,720 exabytes by 2012, with a staggering prediction of 7,910 exabytes by 2015. “Wrestling big data is going to be the single biggest IT challenge facing businesses over the next two years,” said Oracle senior vice president of systems, Luigi Freguia. “By the end of that period they will either have got it right or they will be very seriously adrift.”
So what are the solutions to this explosion of data that threatens to overwhelm our existing infrastructures? For a problem this big, the solutions are very expensive. Oracle predicts a 40 to 60 percent increase in the number of businesses using external data centers by 2015. Cisco similarly predicts that 24 percent of all data will reside outside the internal data center, contributing to a 12-fold increase in the amount of cloud traffic by 2015. Oracle also reported that 90 percent of those asked how they plan to address this data explosion saw the need for more data centers, 60 percent within only two years and one in five within 12 months. In the Gartner study mentioned earlier, 62 percent said they plan to expand hardware capacity at existing data centers by the end of 2011, and 30 percent plan to build entirely new data centers.
When one considers the enormous number of data centers under construction, one wonders whether a “Data Bubble” is being born, as this type of growth cannot be sustained forever. Just as there is a limit to how many houses the economy can sustain, I am sure there is a limit to the number of data centers as well.
Then there is the issue of how users actually access these continents of data. This is where data warehousing comes into play: a single database draws information from multiple sources across the enterprise (other databases, in fact), allowing organizations to process that information into intelligence and share it with their users.
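To make the idea concrete, here is a minimal sketch of that consolidation pattern in Python. Everything in it is hypothetical for illustration only: the source databases ("sales.db", "support.db"), their tables, and the warehouse table are made-up names, and a real warehouse would use a proper ETL platform rather than a script. The point is simply that one consolidated store is loaded from several operational databases so users can query a single place.

```python
# Illustrative ETL sketch (hypothetical databases and tables, not our setup):
# pull rows from several operational databases and load them into one
# consolidated "warehouse" table that analysts can query in a single place.
import sqlite3

SOURCES = {
    "sales":   ("sales.db",   "SELECT customer_id, order_total, order_date FROM orders"),
    "support": ("support.db", "SELECT customer_id, NULL, opened_date FROM tickets"),
}

def load_warehouse(warehouse_path="warehouse.db"):
    wh = sqlite3.connect(warehouse_path)
    wh.execute("""CREATE TABLE IF NOT EXISTS activity (
                      source TEXT, customer_id INTEGER,
                      amount REAL, event_date TEXT)""")
    for source_name, (db_path, query) in SOURCES.items():
        src = sqlite3.connect(db_path)
        rows = src.execute(query).fetchall()
        # Tag each row with its origin so analysts can trace it back.
        wh.executemany(
            "INSERT INTO activity VALUES (?, ?, ?, ?)",
            [(source_name, *row) for row in rows],
        )
        src.close()
    wh.commit()
    wh.close()

if __name__ == "__main__":
    load_warehouse()
```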
And finally, there is the issue we are dealing with: how do we back up all of this data, from everyday data corruption and loss all the way to disaster recovery? One thing is for sure: tapes aren’t going to cut it. As we recently purchased an HP StorageWorks SAN solution, we are considering its built-in remote copy function, which utilizes thin-provisioned snapshots between the primary and remote locations. This, of course, requires the acquisition of more SAN units to install at the remote location.
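For readers unfamiliar with the approach, the sketch below shows the general idea behind snapshot-based replication in plain Python. This is not HP’s implementation, and the 4 KB block size is an arbitrary assumption; real arrays do this in firmware at far greater scale. The concept is the same, though: compare the current point-in-time view against the last snapshot that reached the remote site, ship only the blocks that changed, and apply them to the remote copy.

```python
# Conceptual sketch only -- NOT the HP StorageWorks remote copy implementation.
# Shows the general idea: hash fixed-size blocks, ship only changed blocks
# since the last replicated snapshot, and apply them to the remote replica.
import hashlib

BLOCK_SIZE = 4096  # arbitrary block size chosen for this example

def block_hashes(volume: bytes) -> list:
    """Hash each fixed-size block so changed blocks can be identified."""
    return [hashlib.sha256(volume[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(volume), BLOCK_SIZE)]

def delta(previous: bytes, current: bytes) -> dict:
    """Return only the blocks that changed since the last replicated snapshot."""
    prev_hashes = block_hashes(previous)
    changed = {}
    for index, digest in enumerate(block_hashes(current)):
        if index >= len(prev_hashes) or digest != prev_hashes[index]:
            start = index * BLOCK_SIZE
            changed[index] = current[start:start + BLOCK_SIZE]
    return changed

def apply_delta(remote_copy: bytearray, changed: dict) -> bytearray:
    """Apply the shipped blocks to bring the remote replica up to date."""
    for index, block in changed.items():
        start = index * BLOCK_SIZE
        remote_copy[start:start + len(block)] = block
    return remote_copy
```

The appeal of this approach over full copies is bandwidth: only the changed blocks cross the WAN to the remote site, which is what makes frequent replication between locations practical.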
As I said earlier, all of the solutions to our data saturation are expensive, which is why I am confident that someone, right now, is working on an idea that will shift the entire paradigm of backup and disaster recovery. One thing is for sure: we are in need of a great idea.
Posted on April 18, 2012