Twitter will open-source Storm, BackType's "Hadoop of Real-Time Processing" (readwriteweb.com)
99 points by canistr on Aug 6, 2011 | 14 comments


There are a lot more details about Storm in the actual announcement: http://engineering.twitter.com/2011/08/storm-is-coming-more-...

If you have any questions about Storm, feel free to ask me here or on the Google group (http://groups.google.com/group/storm-user).


I'm interested in hearing about how you guys upgrade topologies in production (assuming you do). It's designed to be able to run forever, but obviously once in a while you find a bug, want to track a new stat, etc. I guess if you're pulling data off of a queue, you might be able to get away with letting things queue up for a few seconds while everything restarts with the new code, and then catching up. Is that how you handle it, or do you do something more clever?


Currently you let things queue up while you redeploy, but I'm working on a new feature that lets you "swap" two topologies. The new one is deployed in an inactive state, and then the two topologies are swapped. This lets you minimize the downtime to almost nothing.
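
As a rough illustration of the swap idea described above, here is a toy sketch in Python. It is not Storm's API: the `TopologyRegistry` class, its methods, and the topology names are all hypothetical, and a real cluster would also have to move the input stream over. It only shows the ordering: deploy the new topology inactive, then flip both states in one step.

```python
import threading


class TopologyRegistry:
    """Toy registry of deployed topologies (hypothetical, not Storm's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {}  # topology name -> "active" or "inactive"

    def deploy(self, name, active=False):
        # New code is deployed inactive by default, so it consumes nothing yet.
        with self._lock:
            self._state[name] = "active" if active else "inactive"

    def swap(self, old, new):
        # Flip both states under one lock so there is never a moment
        # with two active topologies or with none.
        with self._lock:
            if self._state.get(old) != "active":
                raise ValueError(f"{old!r} is not active")
            if self._state.get(new) != "inactive":
                raise ValueError(f"{new!r} is not deployed as inactive")
            self._state[old] = "inactive"
            self._state[new] = "active"

    def active(self):
        with self._lock:
            return sorted(n for n, s in self._state.items() if s == "active")


registry = TopologyRegistry()
registry.deploy("stats-v1", active=True)  # currently running version
registry.deploy("stats-v2")               # new code, deployed inactive
registry.swap("stats-v1", "stats-v2")
```

After the swap, `registry.active()` lists only the new topology, and the old one can be killed at leisure.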


The Twitter Engineering Blog post is way more interesting: http://engineering.twitter.com/2011/08/storm-is-coming-more-...

Despite having a "master" node, it sounds like this actually has no single points of failure. Since all the state for the master is in ZooKeeper, I think you could just fail over to another server running the master if your first main one gets messed up. Pretty cool. (I may be totally wrong here ... All my distributed systems knowledge has come from being around people who know about distributed systems.)

Though one thing that's not especially satisfying is that his answer for when a Bolt needs to store state is "use a database". I guess the Hadoop answer is that if your reducer fails, it just gets run again, and there's no real analog to doing that when your bolts are meant to run indefinitely.
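
To make the "use a database" answer concrete, here is a minimal sketch in Python with the standard `sqlite3` module. It is an assumption-laden toy, not Storm code: `CountingBolt`, its `execute` method, and the `counts` table are hypothetical names. The point is only that the bolt keeps nothing in memory, so a replacement instance picks up where the failed one stopped.

```python
import sqlite3


class CountingBolt:
    """Toy word-counting bolt that keeps its state in a database (hypothetical)."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS counts ("
            "word TEXT PRIMARY KEY, n INTEGER NOT NULL)"
        )
        conn.commit()

    def execute(self, word):
        # Make sure a row exists, then increment it; no in-memory state.
        self.conn.execute(
            "INSERT OR IGNORE INTO counts (word, n) VALUES (?, 0)", (word,)
        )
        self.conn.execute("UPDATE counts SET n = n + 1 WHERE word = ?", (word,))
        self.conn.commit()

    def count(self, word):
        row = self.conn.execute(
            "SELECT n FROM counts WHERE word = ?", (word,)
        ).fetchone()
        return row[0] if row else 0


conn = sqlite3.connect(":memory:")  # stand-in for a shared external database
bolt = CountingBolt(conn)
bolt.execute("storm")
bolt.execute("storm")

# Simulate the bolt dying and a fresh instance taking over the same store.
replacement = CountingBolt(conn)
replacement.execute("storm")
```

This does not address what happens if a tuple is replayed after a failure; without some deduplication the count above would be incremented twice for the same input.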


That's correct, the design will make it easy to cluster the master node later on.


I'll believe it when I see it. Not that Twitter never opens any of its code (http://twitter.com/about/opensource), but in the past they've said they'd open some tools and ended up not doing it (I'm thinking about Crane here).

It's nice to even think about open-sourcing things, and even better when they actually do, though :-)


(I am one of the authors of Crane) The thing that happened with Crane is that it grew a lot of Twitter-specific cruft and separating that from the generally-useful bits is a serious undertaking -- and I'm not sure how much useful stuff would be left once we cut the twittery bits out. Now and then I chat with the Sqoop folks about how we can merge the two, but that path isn't clear either, so it's in a bit of limbo as far as being open-sourced. Sorry about that.


Not something you have to be sorry about, it happens. But the thing here is that announcements like that create expectations. So, IMHO, you shouldn't announce too far ahead; announce when you know for sure there will be an actual release. Otherwise grumpy sceptics like me criticize ;-)


This is an almighty big deal. I missed it (everyone was talking about distributed computing), but what Hadoop's done is make conventional BI providers look increasingly unimpressive. Systems like Storm are going to do the same - as the Twitter Engineering blog post points out - to CEP.


Frankly, I'm getting tired of the hype. Just release it already. It has been promised for a long time now, with no code in sight. Now new owners have re-promised it and it is a hackernewsworthy item all of a sudden?

Don't get me wrong - the description sounds great and I'll be one of the first users. But this is not techcrunch.


Perhaps this will help quell some of the skepticism about them actually releasing it. Here's the abstract for the talk in which Nathan will be releasing Storm as open source: https://thestrangeloop.com/sessions/storm-twitters-scalable-...


Twitter acquired a good team with BackType. Too bad for BackType's customers though. Not the end of the world, but yet another example of the perils (in addition to the benefits) of using another company's web services.


I think that goes further than "a company's web services".

Every business should be looking at its weaknesses. If you are using someone else's service at the core of your business, and that service could go away at any time, you should have a contingency plan. The same goes if you source something for manufacturing from another country, or have your factories in another country. What if that country becomes unstable? What will the impact on your business be?

There is nothing wrong with building your business on something that might disappear, you just need to understand that.


If this is true, I'm glad that Twitter is choosing to honor a promise that BackType made pre-acquisition.



