Cluster Management and Task Distribution, Part 3

Summary

This is a follow-up to my Zookeeper vs. Jgroups post and the Hazelcast addendum.

I ended up needing caching before I needed cluster management, so after evaluating a few options I implemented a caching facade that could switch between Hazelcast and Infinispan (which is based on JGroups). I’ve been really impressed with Hazelcast so far – no major problems. We did hit a bug with 1.9.2, but found that a patch release with a fix was already out.
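
To make the facade idea concrete, here is a minimal sketch of what such a caching facade could look like. It is not the actual implementation; the names CacheFacade and MapBackedCache are made up for illustration. It leans on one assumption: as far as I know, both Hazelcast's IMap and Infinispan's Cache extend java.util.concurrent.ConcurrentMap, so a single adapter over ConcurrentMap could wrap either one. A plain ConcurrentHashMap stands in for the distributed map here so the example runs without any clustering library.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical facade: application code depends on this, never on a vendor API. */
interface CacheFacade<K, V> {
    V get(K key);
    void put(K key, V value);
    void remove(K key);
}

/**
 * Adapter over any ConcurrentMap. The assumption is that the map handed in
 * could be a Hazelcast IMap or an Infinispan Cache; that wiring is not shown.
 */
final class MapBackedCache<K, V> implements CacheFacade<K, V> {
    private final ConcurrentMap<K, V> delegate;

    MapBackedCache(ConcurrentMap<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public V get(K key) {
        return delegate.get(key);
    }

    @Override
    public void put(K key, V value) {
        delegate.put(key, value);
    }

    @Override
    public void remove(K key) {
        delegate.remove(key);
    }
}

/** Single place where the backing implementation is chosen. */
final class CacheFactory {
    private CacheFactory() {
    }

    /** Local, non-distributed stand-in, useful for unit tests. */
    static <K, V> CacheFacade<K, V> local() {
        return new MapBackedCache<>(new ConcurrentHashMap<K, V>());
    }
}
```

Callers would only ever see CacheFacade, for example `CacheFacade<String, Integer> cache = CacheFactory.local();`. Swapping providers then means adding one more factory method rather than touching the call sites.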

It also has a convenient cluster service built in, so for our cluster management needs, it’s likely to win for us, provided we don’t run across any more problems in testing. The interfaces are easy and intuitive, and I don’t really have anything to complain about – except that there aren’t many heavy hitters that are openly using it.

Conclusion

For our particular set of requirements, Hazelcast appears to be the winner:

  • it provides several things we need (distributed data structures, cache, cluster management)
  • it’s easy to use
  • it’s easy to hide behind a set of facades so that we can switch it out if necessary
  • it’s very simple to configure for different client scenarios (it works on EC2, for example)
  • it’s worked well in testing

But if our requirements were different, I think Zookeeper would have been my choice. Specifically:
  • If I were coordinating more than a handful of servers – say, 50 or more – I’d go with Zookeeper. It was specifically made for coordinating large numbers of servers reliably. And it’s now the de facto standard for that purpose: Facebook, Twitter, LinkedIn, and a lot of others are all using it for node management across large numbers of nodes. At that scale I don’t want to roll my own if I don’t have to.
  • If I owned installation and maintenance, I’d go with Zookeeper. Its deployment complications are one of the main downsides for our purpose; it’s another server to install and manage, which raises the operational footprint of our app. If we were running the app only within our own datacenter, no big deal – there are Puppet or Chef recipes to install it, and even pre-built EC2 images. That’s a very manageable hurdle. But we’re handing the app over to clients to install and manage, and Zookeeper adds more hardware requirements and an installation process that is good for at least a few more pages in the install guide. Hazelcast, on the other hand, adds just one line to a property file (cluster.nodes=x,y,z). Be kind to your ops teams.
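
As a rough illustration of how little that one property line asks of an installer, here is a sketch of reading it with nothing but the JDK. The ClusterNodes class is hypothetical, and the key name cluster.nodes is taken from the example above. How the resulting host list gets passed to Hazelcast's TCP/IP member configuration varies by version, so that step is deliberately left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper that turns "cluster.nodes=x,y,z" into a list of hosts. */
final class ClusterNodes {
    private ClusterNodes() {
    }

    /** Reads the comma-separated cluster.nodes value; a missing key yields an empty list. */
    static List<String> parse(Properties props) {
        String raw = props.getProperty("cluster.nodes", "");
        List<String> nodes = new ArrayList<>();
        for (String part : raw.split(",")) {
            String host = part.trim();
            if (!host.isEmpty()) {
                nodes.add(host); // skip blanks left by stray commas or spaces
            }
        }
        return nodes;
    }

    /** Convenience overload that parses the text of a property file directly. */
    static List<String> parse(String propertyFileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertyFileContents));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return parse(props);
    }
}
```

Calling `ClusterNodes.parse("cluster.nodes=x,y,z")` gives back the three host names in order, and that list is all the clustering layer would need from the client's configuration.
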
There are probably also cases where JGroups is the right answer. Its main advantage is its configurability – you can put together a nearly infinite combination of protocol stacks to suit various purposes. Check out the comments on my first post, for example, for sample code to handle node discovery in a non-multicast (e.g. EC2) environment. But if all you need is cluster management and some pretty typical services on top of that, there are simpler options. I’m going to be honest – after looking into Zookeeper closely, I’m looking for excuses to use it. But I don’t think that will happen right away.
