On Configuration

I've been musing recently on how to scale our configuration system up and out: making it work more dynamically for single nodes, and handle multiple nodes...well, at all.

How it works now

Right now, our configuration is spread across the following places, roughly in order of precedence:
  • System properties (-D properties set on the command line)
  • a user-editable property file for overriding default properties. This file is preserved during system upgrades.
  • a "system defaults" property file, user-visible but overwritten during upgrades.
  • a database table: for a while, we were standardizing on properties in the database. More on this later...
  • some property files buried inside jars, which hold some database queries, some default values, and so on.
  • log4j.xml
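
The precedence rule boils down to "the first source that defines the key wins." Here is a minimal Java sketch of that lookup. The PrecedenceResolver class is hypothetical and is not the Spring machinery we actually use. It assumes each source has already been loaded into a java.util.Properties object, listed from highest to lowest precedence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch: resolves a key against property sources listed
// from highest to lowest precedence. The first source that defines the key wins.
final class PrecedenceResolver {
    private final List<Properties> sources; // index 0 = highest precedence

    PrecedenceResolver(List<Properties> sourcesHighestFirst) {
        this.sources = new ArrayList<>(sourcesHighestFirst);
    }

    Optional<String> get(String key) {
        for (Properties source : sources) {
            String value = source.getProperty(key);
            if (value != null) {
                return Optional.of(value); // higher-precedence value shadows the rest
            }
        }
        return Optional.empty(); // no source defines the key
    }
}
```

Wiring it up might look like new PrecedenceResolver(List.of(System.getProperties(), userOverrides, systemDefaults, databaseProperties)), where the last three names are placeholders for whatever loads each file or table.
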
We use a MergedPropertyPlaceholderConfigurer (along with a couple of other configuration classes available on GitHub) that merges properties from most of the above locations, ranks them in order of precedence, and then sets them at startup time using Spring's standard placeholder syntax (${property.name}). Database properties are loaded from a table using a special Spring property loader. So any property can be set in a higher-precedence location, and it will override a value set in a lower-precedence location.

In practice, new properties tend to get set in the property file. Why? Because a database change requires a patch (like a migration in the Rails world) that has to be applied to each applicable environment. Deploying the code to a development or test server then requires both a code update and a database update. Managing the dependency between the code and particular database patches is a bit of a hassle, certainly far more so than just adding the property to a file that gets deployed along with the code. A bad motivation for keeping properties in files? Perhaps... but that's the reality. A system that raises the barrier to entry for doing "the right thing" is a bad system. Which brings us to...

Problems with the current system

  1. Property files are cumbersome in a distributed environment. Many of our deployments are single-node, but more and more they're distributed, and distributed deployments should be our default going forward.
  2. For properties stored in the DB, adding, removing or updating any property requires a DB task and then a DB refresh, which discourages parameterizing things. You tend to think, "eh, well... I'll just hard-code this for now..."
  3. Properties are loaded at startup time only – you can't change property values without changing each property file and then restarting each node.

Requirements for a new system

I'd like to borrow requirements from http://jim-mcbeath.blogspot.com/2010/01/reload-that-config-file.html:
  1. Reloading a configuration should be a simple operation for the operator to trigger.
  2. It should not be possible to load an invalid configuration. If the operator tries to do so, the application should continue running with the old configuration.
  3. When reloading a configuration, the application should smoothly switch from the old configuration to the new configuration, ensuring that it is always operating with a consistent configuration. More precisely, an operational sequence that requires a consistent set of configuration parameters for the entire sequence should complete its sequence with the same set of configuration parameters as were active when the sequence started. (For us, this is actually pretty easy. Our app depends on a task distribution framework, so work is defined as a series of tasks with well-defined beginnings and endings. We merely need to load the configuration at the beginning of each discrete unit of work.)
  4. The application should provide feedback so that the operator knows what the application is doing. Logging, notification or statistics about configuration reloads should be available.

    ...and I'd add:
  5. We should be able to set configurations for all nodes at once (this could mean using the database, or perhaps a command-line tool that sprays configurations out to the various nodes plus a web service to tell nodes to reload, or something else entirely).
  6. We should be able to view the current configuration for each node easily.
  7. We should be able to share configuration between our app and other related applications. Again, this could be the database, or a web service that exposes our properties to other applications.
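
Requirements 1, 2 and 4 can be met with fairly little code if the active configuration is an immutable map held in an AtomicReference: validate the candidate first, swap only if it passes, and log either way. The sketch below is hypothetical. The validator function and the use of java.util.logging are assumptions, and how a reload gets triggered (web service, file poll, or something else) is left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical sketch: a reload is validated first, and only a valid
// candidate replaces the active configuration, in one atomic swap.
final class ReloadableConfig {
    private static final Logger LOG = Logger.getLogger(ReloadableConfig.class.getName());

    // Assumed contract: the validator returns a list of problems; empty means "valid".
    private final Function<Map<String, String>, List<String>> validator;
    private final AtomicReference<Map<String, String>> active;

    ReloadableConfig(Map<String, String> initial,
                     Function<Map<String, String>, List<String>> validator) {
        this.validator = validator;
        List<String> problems = validator.apply(initial);
        if (!problems.isEmpty()) {
            throw new IllegalArgumentException("Invalid initial configuration: " + problems);
        }
        this.active = new AtomicReference<>(freeze(initial));
    }

    /** Returns true if the candidate was applied, false if it was rejected. */
    boolean reload(Map<String, String> candidate) {
        List<String> problems = validator.apply(candidate);
        if (!problems.isEmpty()) {
            // Requirement 2: keep running with the old configuration.
            LOG.warning("Configuration reload rejected: " + problems);
            return false;
        }
        active.set(freeze(candidate));
        // Requirement 4: tell the operator what happened.
        LOG.info("Configuration reloaded (" + candidate.size() + " properties)");
        return true;
    }

    /** An immutable snapshot; a caller can hold it for a whole unit of work. */
    Map<String, String> snapshot() {
        return active.get();
    }

    private static Map<String, String> freeze(Map<String, String> m) {
        return Collections.unmodifiableMap(new HashMap<>(m));
    }
}
```

Because snapshot() hands out the whole immutable map, a task that grabs it once at the start keeps a consistent view even if a reload happens mid-task, which is what requirement 3 asks for.
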

Current thoughts

At the code level, I'm thinking of loading properties at the beginning of each task, using a base class or something built into the framework. Reloading and interrogating the configuration could be via a web service (get_configuration / set_configuration). For requirement 3, the easiest option seems to be to use Configgy as a configuration base. As far as centralized configuration goes, I'm up in the air. Some options:
  • Spraying config files (scp'ing configuration files to each server, which would have to be tied to either an automatic poll of files, or a manual "reload_configuration" web service call)
  • distributing configuration using a web service (node 2 calls get_all_configuration on node 1, and sets its own configuration accordingly) – but it would need to be saved somewhere in case node 2 restarts when node 1 isn't available. The database is an option, but has development-time issues as noted above.
  • saving all configuration in ZooKeeper.
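
The "load properties at the beginning of each task" idea could look something like the following hypothetical base class. It assumes the framework calls run() once per task and that some Supplier returns the current configuration as an immutable map. Neither ConfiguredTask nor that Supplier comes from Configgy or from our task framework.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical base class: one configuration snapshot is taken when a task
// starts, and the task reads only that snapshot until it finishes.
abstract class ConfiguredTask<R> {
    private final Supplier<Map<String, String>> configSource;

    protected ConfiguredTask(Supplier<Map<String, String>> configSource) {
        this.configSource = configSource;
    }

    /** Called by the task framework; the snapshot is fixed for the whole run. */
    public final R run() {
        Map<String, String> config = configSource.get();
        return execute(config);
    }

    /** Subclasses do their work against the snapshot they were handed. */
    protected abstract R execute(Map<String, String> config);
}
```

Subclasses only ever see the config argument, so a reload that lands mid-task affects the next task rather than the one in flight.
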
What I'd really like, though, is a configuration system that keeps properties in an immutable data structure that tracks where each property came from. I could define the locations properties should come from, and then in the application I could say "config.getProperty('foo')" and get the value with the highest precedence (whether that comes from an override file, a database table, or whatever). But I could also say "config.getPropertyDetails('foo')" and get a list saying "property 'foo' is set to 'bar' by the local override, to 'groo' by the central configuration server, and to 'moo' as a fallback default." Why do I want that? Mainly for on-site debugging: "I set the property in this property file, but it's not working!"
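
Here is a minimal sketch of that idea, assuming Java 16+ (for the record) and that each source has already been loaded into a map. getProperty and getPropertyDetails are the names from the wish above. TrackedConfig, Setting and the source labels are hypothetical, and the constructor assumes the caller passes an insertion-ordered map (such as a LinkedHashMap) so that precedence order is preserved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a provenance-tracking configuration. Layers are
// supplied highest precedence first; the object is immutable once built.
final class TrackedConfig {
    /** One value for a property, and the source that set it. */
    record Setting(String source, String value) {
        @Override
        public String toString() {
            return value + " (from " + source + ")";
        }
    }

    // Source name -> that source's properties, kept in precedence order.
    private final Map<String, Map<String, String>> layers;

    // Assumes layersHighestFirst iterates in precedence order (e.g. a LinkedHashMap).
    TrackedConfig(Map<String, Map<String, String>> layersHighestFirst) {
        Map<String, Map<String, String>> copy = new LinkedHashMap<>();
        layersHighestFirst.forEach((name, props) -> copy.put(name, Map.copyOf(props)));
        this.layers = Collections.unmodifiableMap(copy);
    }

    /** The value with the highest precedence, if any source defines the key. */
    Optional<String> getProperty(String key) {
        return getPropertyDetails(key).stream().findFirst().map(Setting::value);
    }

    /** Every source that sets the key, highest precedence first. */
    List<Setting> getPropertyDetails(String key) {
        List<Setting> details = new ArrayList<>();
        layers.forEach((source, props) -> {
            String value = props.get(key);
            if (value != null) {
                details.add(new Setting(source, value));
            }
        });
        return Collections.unmodifiableList(details);
    }
}
```

Built from the layers "local override" (foo=bar), "central configuration server" (foo=groo) and "fallback default" (foo=moo), added to a LinkedHashMap in that order, getProperty("foo") returns Optional[bar], and printing getPropertyDetails("foo") gives [bar (from local override), groo (from central configuration server), moo (from fallback default)].
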


I'm open to ideas as well... does anyone have best practices for distributed configuration?