IT Briefcase Exclusive Interview: Transforming Big Data with Robin Schumacher, DataStax
January 8, 2013
There is no doubt that Big Data has exploded over the last 12 months. The real question lies in how many businesses are truly maximizing the opportunities that Big Data can afford. In many cases, the first step is to rise above data management roadblocks that organizations new to Cloud Computing and Big Data inevitably encounter.
In the interview below, Robin Schumacher from DataStax offers expert advice as to how companies can overcome their biggest data management challenges, and capitalize on the ever-growing Big Data and Cloud Computing evolution.
- Q. How do you see Big Data transforming the way people view data management today?
A. Being able to effectively handle big data seriously transforms how smart businesses operate and has major effects on how they both make and save money. Let me give you just one personal example.
On New Year’s Eve 2012, I was using my credit card to make some end-of-year charitable donations, which I usually don’t do; normally I donate by check. After the fourth donation, my credit card was cut off, and I instantly received an email from my credit card company telling me they suspected I was the victim of credit card fraud.
It used to be that card companies pooled people into demographic groups, monitored the group’s buying behaviors and identified fraud by group. Now, using big data technologies, they can go down to the individual’s buying patterns, which they weren’t able to do before because the older technologies just didn’t support that deeper and more granular level of analysis.
The bottom line is that now, using big data applications, credit card companies like mine can identify potential fraud situations much more quickly and ultimately protect themselves and save lots of money in the process.
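The difference between group-level and individual-level monitoring can be shown with a small Python sketch. It is only an illustration, not how any card issuer actually scores fraud: the z-score rule, the threshold of 3.0, the function names, and all of the sample numbers are assumptions made up for this example.

```python
from statistics import mean, stdev


def z_score(value, history):
    """Return how many standard deviations `value` lies from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: any different value is maximally unusual.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


def is_suspicious(value, history, threshold=3.0):
    """Flag `value` if it deviates from `history` by more than `threshold` deviations.

    The threshold is an arbitrary choice for this illustration.
    """
    return abs(z_score(value, history)) > threshold


# Hypothetical data: card donations per day.
# Across a whole demographic group, several donations in a day is unremarkable.
group_history = [0, 1, 4, 2, 0, 5, 3, 1, 4, 2]
# One cardholder who almost never donates by card (he normally writes checks).
individual_history = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

todays_count = 4  # the fourth donation in one day

group_flag = is_suspicious(todays_count, group_history)
individual_flag = is_suspicious(todays_count, individual_history)

print("flagged against group baseline:     ", group_flag)
print("flagged against individual baseline:", individual_flag)
```

With these made-up numbers, four donations in a day look ordinary against the group baseline but stand out sharply against the individual's own history, which is the kind of granularity the answer above describes.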
- Q. What is the value of Cloud Computing within this Big Data evolution?
A. The cloud promises many things to today’s data professional: transparent scale and elasticity, higher availability and data redundancy, easier data distribution across geographies, simpler manageability, and lower operating costs. At DataStax, we have many customers like Netflix who run their entire operations in the cloud and are very happy with it.
At the same time, though, we’ve had other customers who started in the cloud and then found the costs to be prohibitive, so they’ve moved back to on-premises management. So the cloud isn’t for every use case.
- Q. What pointers can you offer to help businesses function more efficiently within private, hybrid, and public cloud environments?
A. The biggest misconception I see where cloud data management is concerned is the thought that taking a legacy relational database and running it on a cloud provider magically gives you all the benefits that the cloud offers. That just isn’t true. Instead, you need to start with a modern database that’s designed to take advantage of how clouds work; one that’s built from the ground up to easily handle the demands of online elasticity, scale, and worldwide data distribution.
The second pointer would be to ensure that the cloud database you choose works seamlessly in all environments, with little to no change required when you want to adjust where your data lives. Being able to move from an all on-premises configuration to a hybrid local/cloud design isn’t easy for most databases, but newer NoSQL solutions like Cassandra make it pretty easy.
- Q. What are the biggest data management challenges your clients are bringing to the table, and how is DataStax working to overcome these challenges?
A. We constantly see the key characteristics of big data moving our customers to utilize modern data solutions like Cassandra and DataStax Enterprise. The normal scenario is that a customer has either a new big data application or an existing system that is taking on big data attributes. They try to use a traditional RDBMS for management and it quickly falls over. Or, sometimes they try another NoSQL database that isn’t equipped to handle big data, and that fails just as fast.
The normal requirements our customers have include the ability to consume data at very high rates (oftentimes in time series format), handle all forms of data (not just structured), and be able to distribute that data across multiple data centers and the cloud. After that comes the need to tackle big volumes of data.
- Q. Can you give us a few examples of how DataStax solutions such as Apache Cassandra work to increase speed, agility, analytic capabilities, and locational independence for enterprises today?
A. All the things that you mention are foundational capabilities in Cassandra. Cassandra’s architecture gives it the edge in the big data marketplace over other solutions that still adhere to older designs of data management, and therefore, end up falling short when push comes to shove in a big data application.
For production environments, we supply DataStax Enterprise, which provides a production-ready version of Cassandra along with an integrated Hadoop distribution for batch analytics and Solr for enterprise search. This combination allows our customers to do everything they want to do with their data (consume it, analyze it, search it) in one database cluster that’s very easy to manage and grow.
- Q. How does DataStax OpsCenter, a visual management and monitoring solution, work to increase locational independence for businesses today?
A. One often-forgotten aspect of big data is the management of it, which can be challenging, especially if you have multiple, separate systems (e.g. real time, Hadoop, search) that house it all. The nice thing about OpsCenter is that it lets you use the browser on your desktop, laptop, or tablet to manage all your big data clusters from a single pane of glass. This includes adding/modifying clusters, monitoring and tuning, backups, and getting proactive alerts on issues that need attention.
- Q. What major trends do you foresee emerging within data management over the next year?
A. I’ve been in databases for quite a while, and in my career I’ve never seen such rapid adoption of new database technology like Cassandra and DataStax Enterprise. It surprises me how fast major companies are putting them in the critical path of key systems that run their business. The reason why, I think, all comes down to one key motivating factor: necessity.
In the same way open source databases were used out of necessity when the Web came alive a decade or so ago, we’re seeing the same thing for big data technologies now. However, whereas the OSS databases took a while to move up the application food chain and were confined to department systems for quite a while, big data technologies are being put front-and-center now.
Given what we’re seeing at DataStax and the major ways we’re watching customers use our software, I look for many more companies to embrace big data solutions and utilize a co-existence strategy where legacy RDBMSs are concerned. You’ll see serious growth of NoSQL deployments where they help transform how modern businesses work and how they both save and make money.
Robin Schumacher, DataStax VP of Products
Robin has spent the last 20 years working with databases and big data. He comes to DataStax from EnterpriseDB, where he built and led a market-driven product management group. Previously, Robin started and led the product management team at MySQL for three years before the company was bought by Sun (the largest open source acquisition in history), and then by Oracle. He also started and led the product management team at Embarcadero Technologies, which was the #1 IPO in 2000. Robin is the author of three database performance books and a frequent speaker at industry events. Robin holds BS, MA, and Ph.D. degrees from various universities.