Big Data, Einstein and the Definition of Madness
July 10, 2017
By Partha Sen, Founder and CEO of Fuzzy Logix
Someone far more intelligent than me apparently once said ‘Insanity is defined as doing something over and over again and expecting different results’. Now, while there’s some debate about whether Einstein did or did not give us this well-worn aphorism, there’s little doubt that whoever did come up with it was NOT talking about Big Data! And yet, I think it’s an entirely appropriate observation to make about the state of the Big Data industry right now.
Whatever the nature of your business’s analytics environment, I am certain that you are being asked to live up to that old adage of delivering more with less: more insights, more efficiency and more simplicity, all while reducing your investment costs. To make matters worse, the volume of data you have to manage is exploding. According to Gartner & IDC, the volume of data currently stands at 2 trillion GB and is doubling each year, with projections of 40 trillion GB by 2020.
In a recent study by Forbes Insights and Dun & Bradstreet, 24 percent of respondents cited data quality and accuracy as a major obstacle to the success of their analytics efforts. Only 42 percent of users and analysts reported that they were confident in the quality of their data. Survey respondents also cited a lack of budget and technology issues as top roadblocks to achieving success with their data strategies.
Now, if your analytics strategy is based on traditional approaches like SAS, you may have to significantly increase your investment each year just to perform the same analytics on double the volume of data. Managing duplicate storage and the network infrastructure needed to move data is expensive and time-consuming, and processing power has to grow as well. Even with all of that, the analytics may only be performed on subsets of the data rather than on the full dataset, and the time-to-insight will likely fall short of your business requirements.
Because the data lives in one place and the models run in another, these traditional approaches force you to move data from where it is stored to separate analytics servers, run the models there, and then feed the results back into the database. This creates several major pain points, including:
– Expensive hardware
– Slow insight delivery, as two-thirds of the time is often spent moving the data
– Sub-par model analysis – due to memory constraints on the analytics servers, models must be built with only the data that fits into memory, rather than the entire dataset
– Outdated analysis – in several industry verticals, the underlying database can change rapidly, while the snapshot moved into memory on the analytics servers stays frozen
And that leads to another problem. The old ‘garbage in, garbage out’ adage holds true. According to the Forbes Insights and Dun & Bradstreet report, the negative impacts of poor-quality data can include undermining confidence, missed opportunities, lost revenue and even reputational damage. The better the data quality, the more confidence users will have in the outputs they produce, lowering the risk in outcomes and increasing efficiency. When outputs are reliable, guesswork and risk in decision-making can be reduced.
But here’s the rub: rather than stepping back and asking how to break this cycle, many organizations are like the proverbial hamster on a wheel, just trying to run faster and faster. And, like the hamster, not getting anywhere. Fast. Rather than simply doing what they have always done and hoping for a different outcome (because, after all, we know that is the definition of madness), they need to stop and take a different tack. They need to start with the objective of scaling their analytics quickly and at lower cost, and then look for an approach that can deliver it.
We turned the problem on its head at Fuzzy Logix with our in-database analytics approach. We move the analytics to the data, rather than moving the data to the analytics, and eliminate the need for separate analytics servers. We leverage the full parallelism of today’s massively parallel processing (MPP) databases. With DB Lytix, data scientists can build models using very large amounts of data and many variables. No more sampling, and no more waiting for data to move from some other place to your analytics server. These models may run 10X to 100X faster than traditional analytics, delivering faster insights at a fraction of the cost.
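To make the idea of moving the analytics to the data more concrete, here is a minimal sketch in Python. It is not DB Lytix and does not use its API. It uses the standard-library `sqlite3` module as a stand-in for a database, and because SQLite is not a massively parallel database, the sketch shows only the difference in data movement, not any parallel speedup. It fits a simple least-squares line y = a + bx in two ways: first by copying every row to the client, and then by having the database compute the five sums the fit needs so that only those numbers leave it. The table name `claims`, its columns `x` and `y`, and all function names are hypothetical.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory table standing in for data that lives in the database.

    The table name 'claims' and columns x, y are hypothetical placeholders.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (x REAL, y REAL)")
    conn.executemany("INSERT INTO claims VALUES (?, ?)", rows)
    return conn


def ols_from_sums(n, sx, sy, sxx, sxy):
    """Ordinary least squares for y = a + b*x, computed from sufficient statistics."""
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has no variance; the slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


def fit_by_moving_data(conn):
    """Traditional pattern: pull every row out of the database, then compute on the client."""
    rows = conn.execute("SELECT x, y FROM claims").fetchall()  # all rows cross the wire
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return ols_from_sums(n, sx, sy, sxx, sxy)


def fit_in_database(conn):
    """In-database pattern: the database computes the aggregates, so only five numbers leave it."""
    n, sx, sy, sxx, sxy = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM claims"
    ).fetchone()
    return ols_from_sums(n, sx, sy, sxx, sxy)


if __name__ == "__main__":
    # Synthetic data on the line y = 2x + 1, used only for illustration.
    data = [(float(i), 2.0 * i + 1.0) for i in range(1000)]
    db = make_db(data)
    print("moving data:   ", fit_by_moving_data(db))
    print("in-database:   ", fit_in_database(db))
```

Both functions return the same coefficients. The difference is that the first transfers every row, while the second always transfers a single row of five aggregates, no matter how large the table is. Because sums can be computed separately on each node and then combined, a parallel database can split this kind of aggregate across its nodes.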
Let me give you a real-world example. At a large US health insurer, moving the data out of the database to the SAS servers meant splitting the work into 25 jobs and then assembling the results, a process that took over six weeks! Using in-database analytics allowed the customer to work on the entire dataset at once and finish the analytics in less than 10 minutes.
So, if you want to significantly accelerate your data analytics capabilities and see your business achieve phenomenal performance gains, with potentially huge cost and resource savings, take heed of the message from old Albert: stop doing whatever you’ve always done and take a different approach. You never know, you may just restore a bit of sanity and accuracy to your big data strategy!
About the Author
Partha Sen is Founder and Chief Executive Officer at Fuzzy Logix. His passion for solving complex business problems using quantitative methods, data mining and pattern recognition began as a hobby before leading him to found Fuzzy Logix and develop its flagship product, DB Lytix, in 2007.
Before founding Fuzzy Logix, Partha held senior management positions at Bank of America, where his achievements included leading the initiative to build a quantitative, model-driven credit rating methodology for the entire commercial loan portfolio. In the portfolio strategies group, Partha led a team that devised strategies for effectively hedging the credit risk of the bank’s commercial loan portfolio and for minimizing the impact of mark-to-market volatility on the portfolio of hedging instruments (Credit Default Swaps, Credit Default Swaptions, and CDS Indexes).
Prior to working at Bank of America, Partha held managerial positions at Ernst and Young and Tata Consultancy Services. He holds a Bachelor of Engineering from the Indian Institute of Technology, with a major in computer science and a minor in mathematics, as well as an MBA from Wake Forest University.