Twitter and Open Source: BackType Storm
August 5, 2011
Twitter acquired BackType last month. In a post on the Twitter blog, they announced that they will release Storm at the Strange Loop Conference on September 19th.
BackType Storm is a real-time data processing technology. According to Twitter’s Nathan Marz, Storm “is a distributed, reliable, and fault-tolerant stream processing system. Its use cases are so broad that we consider it to be a fundamental new primitive for data processing.”
Here are the key properties of Storm:
- Stream processing: Storm can be used to process a stream of new data and update databases in realtime. Unlike the standard approach of doing stream processing with a network of queues and workers, Storm is fault-tolerant and scalable.
- Continuous computation: Storm can do a continuous query and stream the results to clients in realtime. An example is streaming trending topics on Twitter into browsers. The browsers will have a realtime view on what the trending topics are as they happen.
- Distributed RPC: Storm can be used to parallelize an intense query on the fly. The idea is that your Storm topology is a distributed function that waits for invocation messages. When it receives an invocation, it computes the query and sends back the results. Examples of Distributed RPC are parallelizing search queries or doing set operations on large numbers of large sets.
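To make the Distributed RPC idea a little more concrete, here is a rough single-machine sketch of the "set operations on large sets" example. It is not Storm's API (Storm has not been released yet); the function names, the hash-based sharding, and the use of a thread pool are all assumptions made purely for illustration. The point is only the shape of the computation: split the query into independent pieces, compute them in parallel, and merge the partial results into one answer.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    """Split items into n shards by hash, so equal values always land in the same shard."""
    shards = [set() for _ in range(n)]
    for item in items:
        shards[hash(item) % n].add(item)
    return shards


def intersect_shard(pair):
    """The work done by one (hypothetical) worker: intersect two matching shards."""
    left, right = pair
    return left & right


def distributed_intersection(a, b, workers=4):
    """Answer one 'invocation': intersect a and b by fanning out over shards."""
    a_shards = partition(a, workers)
    b_shards = partition(b, workers)
    # Threads stand in for the machines of a real cluster (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(intersect_shard, zip(a_shards, b_shards))
        result = set()
        for part in partials:
            result |= part  # merge the partial answers
    return result


if __name__ == "__main__":
    common = distributed_intersection(range(0, 1000), range(500, 1500))
    print(len(common))
```

Because both sets are sharded with the same hash function, a value that appears in both always ends up in the same pair of shards, so each worker can compute its piece without talking to the others.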
The post on the Twitter blog gives more details about Storm.