
Anticipating the Unanticipated – the Next Killer App

August 28, 2012

Featured blog by Don Haderl, Aerospike

At the dawn of database technology, from the late 1960s through the 1970s, hierarchical and network databases emerged. Hardware was really slow and really expensive. To get a database to perform for an application, you denormalized the data and made sure that all of the data the application needed was physically adjacent on the same disk drive. In the end, the database design worked for one and only one specific application. When you had a new application with a slightly different twist on the information it needed, the previous physical design didn’t always perform well. So you either suffered the performance for the new application or you created another replica database with a different physical design to meet its performance needs. This was the age of Cullinet’s IDMS database, IBM’s IMS database, and others. The killer application was manufacturing – inventorying parts and assemblies of parts. Banks and other enterprises began using this technology to serve transactions.
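
To make the trade-off concrete, here is a minimal sketch in Python (with hypothetical part/assembly records, not any real IDMS or IMS layout) of why a physical design tuned for one application penalized the next:

```python
# A toy illustration of denormalization tying a database to one application:
# every field the "parts explosion" application needs is stored inline, so
# its query is one sequential scan -- but a new application wants the
# inverse clustering, which this layout serves poorly.

# Denormalized for the parts-of-assembly application: assembly details
# are repeated in every record so everything is physically adjacent.
parts_for_assembly_app = [
    {"part": "bolt-10", "qty": 40, "assembly": "wing", "assembly_weight_kg": 320},
    {"part": "rivet-3", "qty": 900, "assembly": "wing", "assembly_weight_kg": 320},
]

def parts_in_assembly(records, assembly):
    # Fast for this application: one pass over co-located records.
    return [r for r in records if r["assembly"] == assembly]

# A new application ("which assemblies use this part?") wants data
# clustered by part instead; with this layout it must scan everything,
# which in practice meant building a second replica database.
def assemblies_using_part(records, part):
    return sorted({r["assembly"] for r in records if r["part"] == part})

print(parts_in_assembly(parts_for_assembly_app, "wing"))
print(assemblies_using_part(parts_for_assembly_app, "bolt-10"))
```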

Out of this experience came a demand from enterprise management for technology that would allow many applications to share the same database and still get reasonable performance. “Data Independence” was born, separating the logical schema from the physical schema – a cornerstone of relational technology. Relational research prototypes (IBM’s System R, UC Berkeley’s Ingres, and others) introduced the relational model, focusing on making relational queries respond well given this separation of logical and physical schema (namely, through optimization and different methods for accessing and clustering the data). Whereas the initial implementations of relational databases couldn’t match the hierarchical and network databases for a well-honed application, they did reasonably well for new, unanticipated applications without the need for another replica database, and they provided a speedier application development cycle, leaving the business better able to respond to its demands.
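
As a rough illustration of data independence, the sketch below uses Python’s built-in sqlite3 module (standing in for any relational engine, with a hypothetical orders table): the logical query never changes, while the physical access path does once an index is added.

```python
# A minimal sketch of "data independence": the application states WHAT it
# wants (logical schema); the optimizer decides HOW to fetch it (physical
# schema). Adding an index changes the access path, not the query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "acme", 120.0), (2, "acme", 75.5), (3, "globex", 19.9)])

query = "SELECT customer, SUM(total) FROM orders GROUP BY customer"
print(conn.execute(query).fetchall())      # executed as a table scan

conn.execute("CREATE INDEX idx_customer ON orders(customer)")
print(conn.execute(query).fetchall())      # same query, new access path
```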

By the mid-1980s, relational technology was mature enough, and the hardware powerful enough, to satisfy the needs of the majority of enterprise applications. These database management systems have served us well ever since.

The initial killer application for relational databases was query and report writing. At this time, business users rarely accessed data directly – the tools were too difficult for anyone but technical staff (IT professionals) to use. Instead, users submitted requirements to IT, and programmers manually coded the logic necessary to obtain the desired data. Such logic was typically written in RPG, COBOL, or a similar programming language, although 4GL tools (such as Ramis, Nomad, and Focus) were sometimes used. Generating new reports was a large part of the application demand facing IT leaders. Relational databases responded very nicely to this need.

As time progressed into the late 1980s and early 1990s, business use of computer technology became pervasive, driving most of the business transactions in large enterprises. Every segment (banks, insurance companies, governments, …) became dependent on computer technology to drive the basic transaction services within its domain. High availability (no downtime) became a fundamental requirement for database systems. SLAs (Service Level Agreements) became pervasive. Transaction rates and transaction complexity (the number of database operations per transaction) rose exponentially. Relational databases were the backbone of enterprise transaction systems.

In the 1990s, the internet pushed the use of relational technology further, pushing transaction rates higher and driving the need for even faster deployment of new applications into production (pushing the technology to respond to unanticipated applications). Client/server topologies rose up, and within the decade we had many variants of distributed topologies, increasing the demand on relational databases to support many varied distributed needs (codepages, …). Cobbling together many computers to satisfy a single application within the enterprise became the norm, and relational databases responded with shared-nothing and shared-everything architectures, together with inter- and intra-query parallelism, to get lots of computers and disks working to answer a single query over large data (see the sketch below). We also saw a shift within enterprises from “build our own applications” to “buy our applications.” SAP, PeopleSoft, and thousands of others provided the basic application needs of most enterprises, often with embedded databases, and enterprises began to shift their efforts to integrating applications and databases. A business transaction (e.g., an insurance claim) is really a business process that drives many different computer transactions (and non-transactional events) to fulfill the need of that business process. Business Process Modelling (BPM) and workflow technology responded to this need.
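
The following sketch, with hypothetical data, gives a feel for that shared-nothing shape: hash-partition the rows across nodes, compute partial aggregates in parallel, and merge at a coordinator.

```python
# A minimal sketch of shared-nothing, intra-query parallelism: rows are
# hash-partitioned across "nodes", each node aggregates only its local
# partition, and a coordinator merges the partials -- the same shape as a
# parallel SUM ... GROUP BY over large data. (Threads stand in for nodes.)
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

N_NODES = 4
rows = [("acme", 120.0), ("globex", 19.9), ("acme", 75.5), ("initech", 5.0)] * 1000

# Partitioning step: route each row to one node by hashing its key.
partitions = defaultdict(list)
for customer, total in rows:
    partitions[hash(customer) % N_NODES].append((customer, total))

def partial_sum(part):
    # Runs independently on one node, over that node's local data only.
    acc = defaultdict(float)
    for customer, total in part:
        acc[customer] += total
    return acc

# Coordinator: fan the work out to all nodes, then merge the partials.
with ThreadPoolExecutor(max_workers=N_NODES) as pool:
    partials = list(pool.map(partial_sum, partitions.values()))

merged = defaultdict(float)
for p in partials:
    for customer, total in p.items():
        merged[customer] += total
print(dict(merged))
```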

As we entered the 2000s, the simple ACID transaction of the 1970s was now merely an event within a distributed business process. Applications integrated services and data distributed within and outside enterprise boundaries, and business partners were linked together in satisfying a common business transaction. This pushed heavily on security, privacy, performance, isolation, concurrency, and availability.

Commercial relational databases responded and did well. But they couldn’t handle many of the needs of business systems. It’s not the fault of the relational model. It’s simply the result of too much for one database management system to do, given the decades of adaptations described above. Or, as Michael Stonebraker puts it, “one size does not fit all.” Database management systems that attempt to do it all suffer from bloat: everything takes longer, more complexity is added, and with that complexity comes more breakage. To overcome this, vendors create a family of database management systems in the hope of sharing technology and maintaining compatible interfaces and application views across the family. I have not seen a single family that could be tailored to fit all application system needs. Hence, I see a need to address this with multiple families, which may or may not conform to the relational model. We’ve recently seen major relational database vendors (IBM, Oracle, …) offer alternatives.

That’s the period we’re in right now and have been in for the past ten years, driven by web-based ventures. The Googles and Amazons of the world had this need for extremely flexible physical designs. They didn’t know whether they would be dealing with a 1-terabyte or a 200-exabyte database. They didn’t know the number of transactions they would have to handle. They had to have 99.9999% service availability. And they had to scale their databases and transaction rates from nothing to near-infinite in the blink of an eye, with no blockages. They didn’t know which of the services they offered would be successful or the degree to which they’d be used. But they knew that if a service was successful, they needed to scale it instantly to meet demand. The commercial versions of relational databases couldn’t handle this elasticity. They tried to build structures around the relational databases, gluing them together to form one big federated store, but this proved quite complex. So many built their own stores to provide their basic business transaction services, introducing softer isolation, eventual consistency, and other innovations to meet the extreme demands for availability and elasticity.
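
A toy model of the eventual-consistency trade-off they accepted might look like the following (hypothetical code; real systems layer on quorums, vector clocks, hinted handoff, and the like):

```python
# A minimal sketch of eventual consistency: a write is acknowledged by one
# replica immediately and propagates to the others later, reconciled by
# last-write-wins on a timestamp. Reads can be stale until replicas converge.
import itertools

clock = itertools.count()  # stand-in for a timestamp source

class Replica:
    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, ts):
        # Last-write-wins: apply only if newer than what we already hold.
        if key not in self.store or ts > self.store[key][0]:
            self.store[key] = (ts, value)

    def read(self, key):
        return self.store.get(key, (None, None))[1]

replicas = [Replica(), Replica(), Replica()]

def put(key, value):
    ts = next(clock)
    replicas[0].write(key, value, ts)  # acknowledged after one replica
    return ts                          # propagation to the rest is deferred

def anti_entropy():
    # Background sync: push newest versions around until replicas converge.
    for a in replicas:
        for key, (ts, value) in a.store.items():
            for b in replicas:
                b.write(key, value, ts)

put("cart:42", ["book"])
print([r.read("cart:42") for r in replicas])  # stale reads are possible here
anti_entropy()
print([r.read("cart:42") for r in replicas])  # ...but replicas converge
```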

Relational databases grew in response to unanticipated needs. Now there’s a host of new needs they haven’t responded to. And that, in fact, is what these new databases – call them NoSQL (Not Only SQL) if you wish – respond to: they scale database sizes and transaction rates elastically, they offer extreme flexibility in handling real-time transactions with super-fast data acquisition, and they keep analytics outside of the database to stay lightweight (bloat-free). The killer app is real-time information serving – serving information about what’s happening NOW. Businesses want to understand what consumers are doing and provide them services in the context of where they are and what they are doing NOW. They want to monitor sensor-based pipelines and railroads, the stock market, medical care, and “fill in the blank” to make decisions based on what’s happening NOW. Business needs historical information and NOW information, analyzed in real time, to make decisions. And that’s the market Aerospike is responding to, and Aerospike can do it a hundred times better than the next thing that’s around…
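
As a generic sketch of that serving pattern (toy code only; this is not the Aerospike client API): the operational store does nothing but fast point reads and writes of current state, while analytics consume a separate event feed outside it.

```python
# A minimal sketch of "NOW" serving: the key-value store holds only current
# state per key (O(1) reads and writes), and analytics run elsewhere over an
# event feed so the store stays lightweight. (Hypothetical names throughout.)
import time
from collections import Counter, deque

kv = {}            # the operational store: current state per key, only
events = deque()   # the feed that external analytics consume

def record(user, action):
    kv[user] = {"last_action": action, "seen_at": time.time()}  # O(1) write
    events.append((user, action))        # analytics happen outside the store

def whats_happening_now(user):
    return kv.get(user)                  # O(1) read of current context

record("u1", "viewed:checkout")
record("u2", "search:flights")
print(whats_happening_now("u1"))

# Outside the store, an analytic job tallies the feed without slowing writes.
print(Counter(action.split(":")[0] for _, action in events))
```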
