On Configuration

I’ve been musing recently on how to scale our configuration system up and out: making it work more dynamically for single nodes, and handle multiple nodes…well, at all.

How it works now

Right now, our configuration is spread across the following places, roughly in order of precedence:
  • System properties (-D properties set on the command line)
  • A user-editable property file, for overrides to default properties. This file is preserved during system upgrades.
  • A “system defaults” property file, user-visible but overwritten during upgrades.
  • A database table: for a while, we were standardizing on properties in the database. More on this later…
  • Some property files buried inside jars, which hold some database queries, some default values, and so on.
  • log4j.xml
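Conceptually, the merge across those layers is simple – something like this sketch in plain java.util.Properties, where later (higher-precedence) sources win. The class and method names here are hypothetical, purely for illustration:

```java
import java.util.Properties;

// Hypothetical sketch: later sources win, so pass them from lowest to
// highest precedence, e.g.
//   merge(jarDefaults, systemDefaultsFile, databaseTable,
//         userOverridesFile, System.getProperties());
public class MergedProperties {

    /** Merge property sources; later arguments override earlier ones. */
    public static Properties merge(Properties... lowestToHighest) {
        Properties merged = new Properties();
        for (Properties source : lowestToHighest) {
            merged.putAll(source);
        }
        return merged;
    }
}
```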
We use a MergedPropertyPlaceholderConfigurer (along with a couple of other configuration classes, available on GitHub) that merges properties from most of the above locations, ranks them in order of precedence, and sets them at startup time using Spring’s standard placeholder syntax (${property.name}). Database properties are loaded from a table using a special Spring property loader. Any property can be set in a higher-precedence location, and it will override one set in a lower-precedence location.

In practice, new properties tend to get set in the property file. Why? Because database changes require a patch (like a migration in the Rails world), which then has to be applied to each environment: deploying the code to a development or test server requires both a code update and a database update. That dependency between the code and particular database patches is a hassle – certainly far more so than adding the property to a file that gets deployed along with the code. A bad motivation for keeping properties in files? Perhaps… but it is the reality of it. A system that raises the barrier to entry for doing “the right thing” is a bad system. Which brings us to…

Problems with current system

  1. Property files are cumbersome in a distributed environment. Many of our deployments are single-node, but more and more they’re distributed, and distributed deployments should be our default going forward.
  2. For properties stored in the DB, adding, removing, or updating any property requires a DB task and then a DB refresh, which discourages parameterizing anything. You tend to think, “eh, well… I’ll just hard-code this for now…”
  3. Properties are loaded at startup time only – you can’t change property values without changing each property file and then restarting each node.

Requirements for a new system:

I’d like to borrow requirements from http://jim-mcbeath.blogspot.com/2010/01/reload-that-config-file.html:
  1. Reloading a configuration should be a simple operation for the operator to trigger.
  2. It should not be possible to load an invalid configuration. If the operator tries to do so, the application should continue running with the old configuration.
  3. When reloading a configuration, the application should smoothly switch from the old configuration to the new configuration, ensuring that it is always operating with a consistent configuration. More precisely, an operational sequence that requires a consistent set of configuration parameters for the entire sequence should complete with the same set of parameters that were active when the sequence started. – For us, this is actually pretty easy: our app depends on a task distribution framework, meaning that work is defined as a series of tasks with defined beginnings and endings. So we merely need to load the configuration at the beginning of each discrete unit of work (see the sketch after this list).
  4. The application should provide feedback so that the operator knows what the application is doing. Logging, notification or statistics about configuration reloads should be available.

    …and I’d add:
  5. We should be able to set configurations for all nodes at once (this could mean using the database, or perhaps a command-line tool that sprays configurations out to the various nodes, plus a web service to tell nodes to reload… or something else entirely).
  6. We should be able to view the current configuration for each node easily.
  7. We should be able to share configuration between our app and other related applications; again, this could be the database, or a web service that exposes our properties to other applications.
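To make requirement 3 concrete, here’s a minimal sketch of the per-task snapshot idea. Everything here is hypothetical – a base class our framework doesn’t have yet – but it shows the shape:

```java
import java.util.Properties;

// Hypothetical sketch for requirement 3: each unit of work takes an
// immutable snapshot of the configuration when it starts, so a reload
// mid-task can't change the values that task sees.
public abstract class ConfigSnapshotTask implements Runnable {

    // The live configuration; swapped atomically on reload, never mutated.
    private static volatile Properties current = new Properties();

    /** Called by the (hypothetical) reload mechanism after validation. */
    public static void reload(Properties validated) {
        current = (Properties) validated.clone();
    }

    @Override
    public final void run() {
        Properties snapshot = current; // one volatile read; the task uses only this
        execute(snapshot);
    }

    /** Subclasses read configuration only through the snapshot they're given. */
    protected abstract void execute(Properties config);
}
```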

Current thoughts

At the code level, I’m thinking of loading properties at the beginning of each task, using a base class or something built into the framework. Reloading and interrogating the configuration could be via a web service (get_configuration / set_configuration). For requirement 3, the easiest option seems to be to use Configgy as a configuration base. As far as centralized configuration goes, I’m up in the air. Some options:
  • Spraying config files (scp’ing configuration files to each server, which would have to be tied to either an automatic poll of the files or a manual “reload_configuration” web service call)
  • Distributing configuration via a web service (node 2 calls get_all_configuration on node 1 and sets its own configuration accordingly) – but the result would need to be saved somewhere in case node 2 restarts while node 1 isn’t available. The database is an option, but has the development-time issues noted above.
  • Saving all configuration in ZooKeeper.
What I’d really like, though, is a configuration system that kept properties in an immutable data structure and tracked where each property came from – so I could define the locations properties should come from, and then in the application say config.getProperty(‘foo’) and get the value with the highest precedence (whether that’s from an override file, a database table, or whatever). But I could also say config.getPropertyDetails(‘foo’) and get a list that said “property ‘foo’ is set to ‘bar’ by local override, is set to ‘groo’ by the central configuration server, and falls back to ‘moo’ as the default.” Now, why do I want that? Mainly for on-site debugging: “I set the property in this property file, but it’s not working!”
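A rough sketch of what I’m picturing – entirely hypothetical, with example source names:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: sources are registered from highest to lowest
// precedence, and a lookup can report every source that defines a key,
// not just the winning value.
public class LayeredConfig {

    // source name -> properties, kept in precedence order (highest first)
    private final Map<String, Map<String, String>> sources =
            new LinkedHashMap<String, Map<String, String>>();

    public void addSource(String name, Map<String, String> props) {
        sources.put(name, props);
    }

    /** The highest-precedence value, or null if the key is unset everywhere. */
    public String getProperty(String key) {
        for (Map<String, String> props : sources.values()) {
            if (props.containsKey(key)) {
                return props.get(key);
            }
        }
        return null;
    }

    /** Every definition of the key, e.g. "foo = 'bar' (local override)". */
    public List<String> getPropertyDetails(String key) {
        List<String> details = new ArrayList<String>();
        for (Map.Entry<String, Map<String, String>> source : sources.entrySet()) {
            if (source.getValue().containsKey(key)) {
                details.add(key + " = '" + source.getValue().get(key)
                        + "' (" + source.getKey() + ")");
            }
        }
        return details;
    }
}
```

With sources registered as, say, “local override”, “central configuration server”, and “fallback default”, getPropertyDetails(‘foo’) would produce exactly the kind of trace described above.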

I’m open to ideas, as well… Anyone have best-practices for distributed configuration?

Oh, Oracle

So, I was responsible for a pretty unfortunate bug today — no way around it, I messed up. It was classic — there was a “TODO” block where I meant to come back and finish some code, and I no doubt got distracted by some very valid crisis.

Fortunately, it was caught before it affected production data, but it was visible in test, and it was scary that it had gotten that far.

But I couldn’t help but be bitter about how that block of code came to be in the first place: the code that contained the bug was part of an elaborate scheme designed to avoid joining to a particularly large table in certain circumstances.

Now, it’s big data (well, at least tens of millions of rows) …we have to do what we can for efficiency. But it made me slightly bitter that the more we optimize for relational databases, trying to eke more and more performance, the farther we move from a clear data model, and the less we’re using the “R” in RDBMS. We go through contortions, and in the process introduce bugs.

I’m not sure who the lesson is for, but if anyone is dead-set on using a tried-and-true RDBMS to avoid the bugs in newer systems – whether MapReduce or a “NoSQL” store – it’s worth noting the tradeoff: bugs in the data store in question versus bugs you introduce into your own code through the contortions you put your data model through to make it scale where you need it to go.

On the Cusp of Big Data

Alternate title: Episode IV: A New Hope

I’m restarting this blog after a long hiatus – a couple of years, at least. It looks like my old posts were purged in the meantime, but that’s probably for the best [1]. My musings about Hibernate from 2007 are probably not that interesting now [2].

Where I’m coming from:

I work on a team that, in a lot of ways, is on the cusp of Big Data:  we deal with gigabytes, but not terabytes of data, and we don’t have endless racks of commodity servers.  We have a homegrown task framework that follows a typical master-worker pattern and allows for tasks to be distributed among nodes on different servers.  That works well, and I like the framework in general – it could use some cleaning up, but it’s simple, clean and functional.

Data, though, is all stored inside an Oracle database, and we’re knocking at the edge of its capabilities. We haven’t entirely maxed it out yet, but each performance gain has been harder to come by, and we can easily see the time approaching when it will be cheaper to rearchitect how we store and serve data than to eke out more performance through smarter partitioning, better queries, or a faster SAN.

So over the past several months I’ve been reading about some of the competitors in the big-data field and sketching ideas for what I’d like our architecture to look like going forward: MapReduce (Hadoop), HBase, Cassandra, Terracotta, and a number of others – different types of products, all with the goal of scaling data beyond a single server. But unlike a lot of folks looking at these options, we have an existing product in production, based on a framework that does 80% of what we need. So I find myself on a seesaw, swinging between the newest, coolest thing I’ve read about and the pain of rewriting what we have on a non-existent timeline, when what we have works so well – at today’s data volumes.

I decided to reinstate this blog to collect what I’ve learned so far.

[1] Does anyone else have the problem of starting blogs like New Year’s resolutions and then losing track of them? I probably have three out there, tied to some forgotten username on goodness knows which host, and they’re probably saying very insightful things about Hibernate 2.0.

[2] Turns out, I did recover some old posts from a Typo blog from 2006. A couple of the most boring didn’t make the move over, but most of them are here, for the morbidly curious, as a record of what seemed interesting at the time.

SSL Curiosity

Ok, perhaps this isn’t a curiosity to you all. Perhaps everyone knew this. But I didn’t, and I didn’t find it in a quick Google search, so I thought I’d put it out here.

I’m working on a project that uses 2-way (client-auth) SSL; in doing so, I’ve created a test root CA and used it to generate an array of test certs for clients and servers. I’ve done a bit of testing on my local PC, and it’s working well; the server uses client certificates for authentication and authorization of web service requests, and so on. Acegi is our friend.

However, another developer on the project tried accessing the site not long ago and told me it didn’t work. “I just get ‘Cannot find server’,” he said. Turns out that he was using Internet Explorer, and I had been using Firefox. I could replicate his results with IE – the browser would give the prompt saying “This site’s certificate is not trusted” or something to that effect, and then, when you said to accept the certificate, it would throw up a “Cannot find server” error. The same series of steps in Firefox worked without a problem.

After fiddling a bit, I found that adding the test CA to my Trusted Root Certificate Authorities list fixed the problem. Honestly, I’m not sure why this should be necessary if Internet Explorer had already asked me whether to accept the site’s certificate, but my best guess is that IE was simply ignoring my answer and refusing to load a site whose certificate was signed by an unknown root CA.

Again, maybe this is a known bug, but hey, I thought I’d throw it out there. Adding the test CA to IE’s “trusted root certificates” list fixed the problem.

Another Library

I’ve been teased that my current project has more external libraries than it does actual lines of code. Now, I’m not convinced that’s a bad thing – I’ve said before that the hardest thing about leaving Java for some other language would be giving up all of the 3rd-party libraries and projects that do 90% of your work for you.

Today’s is JAMon, a small library with a really simple purpose: keeping track of performance statistics. I’ve written static classes or singletons that track min, max, and average times for various things on several projects; this just keeps me from having to write it again. Not saving a ton of time, but saving some.

And here is a simple step-by-step for integrating it into Spring as an interceptor. There are several discussions out there about how to use JAMon with Spring, and whether Spring’s built-in JAMon interceptor (is there anything Spring doesn’t have built in?) should use logging semantics to activate it, but the one above is simply the XML to cut and paste into Appfuse’s Spring config, which is all I really want. I understand what it does; it’s just saving me the five minutes of thinking about it. Which is what I’m after.

It’s just a beautiful feeling when you want something done, no matter how small, and find that it’s already been done for you. Like realizing you want another cup of coffee and finding a steaming mug already on your desk.
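For anyone who hasn’t seen it, the manual version of what the interceptor does for you is tiny – a sketch from memory of JAMon’s core API, with a stand-in method for the work being measured:

```java
import com.jamonapi.Monitor;
import com.jamonapi.MonitorFactory;

// Start/stop a monitor around the work; JAMon accumulates hits,
// min, max, and average under the given label.
public class TimedLookup {

    public String lookup(String id) {
        Monitor mon = MonitorFactory.start("TimedLookup.lookup");
        try {
            return doLookup(id);  // the work being measured
        } finally {
            mon.stop();
        }
    }

    private String doLookup(String id) {  // stand-in for the real work
        return "result-for-" + id;
    }
}
```

The Spring interceptor just wraps your beans’ method calls in that same start/stop pair so you never write it yourself.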

With that said, I’m going to go make some coffee now…

Perhaps I’m Missing Something…

I’ve been playing with Axis2 for a couple of days. One of the features they tout (and added to XFire’s SOAP Stack Comparison) is hot-deployable services.

Has anyone ever wanted to hot-deploy a web service?

Really now…

Granted, I’m sure someone could point out the features in some of my projects that no one really wanted. Or, perhaps, someone should point out why in fact this is a killer feature and I’ve just missed it. But it seems particularly weird to me…

For the Record…

For the record, a custom StAX serializer and deserializer for a moderately small object graph will run you about 1000 lines of code.

Just in case you were curious.

Update, 2006-08-23:

Correction: it’s now up to 2000. It also supports another (rather different) schema that is being mapped to the same domain object, which accounts for much of the increase.

Just thought you might be interested.

The Love-Hate of JiBX

Ok, I love JiBX’s flexibility. I have two different versions of an existing schema that I’m mapping to the same Java object, which is the same Java object I’m then storing in Hibernate. This is the first time I’ve been able to use the same domain object throughout the application, and I love it. No more relying on framework-generated objects, and no more translation layers.

Framework-generated objects are a bad fit for my schemas, which have a lot of nested anonymous complex types (hey, I didn’t write them) that frameworks tend to handle badly. XMLBeans, for instance, creates nested inner types, which gets very ugly. It may be personal preference, but that seems very messy; it means one very large object. Hibernate, I imagine, would also be unhappy with that setup.

So the alternative is a translation layer – a class that takes in a WebserviceFriendlyDomainObject and spits back a HibernateFriendlyDomainObject. This is a pain to write, but mostly it just smells bad – it shouldn’t be necessary. I’ve done it as a workaround so far, but it seems like there should be a way around it.
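For the uninitiated, the shape of that layer, using the hypothetical class names above (field lists abbreviated; the real objects have dozens of fields):

```java
// Two hypothetical domain objects carrying nominally identical data...
class WebserviceFriendlyDomainObject {
    private String id;
    private String name;
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

class HibernateFriendlyDomainObject {
    private String id;
    private String name;
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// ...and the translation layer: copy every field by hand, and touch
// this class every time either schema changes.
public class DomainObjectTranslator {
    public HibernateFriendlyDomainObject translate(WebserviceFriendlyDomainObject in) {
        HibernateFriendlyDomainObject out = new HibernateFriendlyDomainObject();
        out.setId(in.getId());
        out.setName(in.getName());
        // ...repeat for every remaining field in the real objects...
        return out;
    }
}
```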

JiBX lets me use the same object from front to back. But that brings me to one of the hates – I have to write a JiBX binding file instead. It’s funny, because for whatever reason this passes under the “ugly architecture” radar, but it’s actually not much different from writing a translation layer. It’s just writing your translation layer in XML instead of in Java. Which, if anything, adds difficulty, because I find myself trying to express complex logic in the binding file that could really benefit from a Turing-complete language. Which is ironic, since I’m writing it to save me from writing the same thing in Java.

Now, it’s still a bit better than a translation layer. The typical translator is converting one object to another object to be turned into XML. The JiBX binding is going straight to XML. It’s certainly faster than the translation step. But from the coder’s point of view, the XMLBeans-generated domain object is one method call away from being XML, so it doesn’t feel substantially different.

My other complaint about JiBX so far is web service support – it’s pretty new, and support in Axis2 (another topic entirely) and XFire is either brand-new or still in CVS. I’m finding that my data binding layer is informing my choice in web service stacks, which I resent. I haven’t tried XFire’s JiBX support yet (since Codehaus’s SVN server is down) but they were planning on getting it into 1.1 and it didn’t make it, which isn’t a good sign.

The annoyances of bytecode manipulation are already documented elsewhere; I’ve found that every now and again I have to run an Ant task that inserts the JiBX bindings into Eclipse’s class files, and that’s the extent of it. It irks some people, but doesn’t bother me.

So I’m not sure if JiBX will stay around in this project. That’s in spite of substantial positives – at this point, it could easily fall either way.

Update, 2006-05-16:

It didn’t stay around. In the end, expressing logic (“if this element… else if this element…”) in XML was too much. I moved over to a hand-rolled StAX serializer and deserializer – see “For the Record…” for a quick peek into the main downside there.
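For a taste of what “hand-rolled” means here, a minimal sketch (the element name is hypothetical) – extrapolate this dispatch pattern across two schemas’ worth of elements and you arrive at the line counts mentioned above:

```java
import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// A cursor loop with per-element dispatch: the basic building block of
// a hand-rolled StAX deserializer.
public class NameParser {

    public static String parseName(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        String name = null;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "name".equals(reader.getLocalName())) {
                name = reader.getElementText();  // text content of the element
            }
        }
        return name;
    }
}
```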

Why I went this route instead of XMLBeans or JAXB merits more discussion. Briefly, JAXB didn’t meet the performance requirements for this part of the system, and XMLBeans refuses to fully parse the schemas in question. It tries, but something in the deeply nested anonymous complex types gives it fits. Writing code against those types as they’re generated in XMLBeans is also ugly, but then so is the custom parser.

All in all, I’m coming to terms with the translation layer. But it still feels like it oughtn’t be necessary…