If you follow my blog, you're probably all too aware that Kynetx provides a cloud-based execution engine for a programming language called the Kynetx Rule Language, or KRL. The execution engine is a big, programmable event-loop in the cloud and KRL is the language that is used to program it. As with any language, the first step in executing it is to parse it into an abstract syntax tree.
The KRL parser has had its ups and downs. A while ago, I reported on swapping out the HOP parser for Damian Conway's Parse::RecDescent (note that the system is written in Perl). The KRL parser is also what prompted me, a long time ago, to write a large set of tests for the engine code, including the parser.
When you're building a parser for a language that's compiled on the developer's machine, performance certainly matters, but you're spared the problems you run into when the parser is the first step in a service that has to respond quickly. The most important mitigation is caching the abstract syntax tree so that you only parse a ruleset when it changes.
Whenever there's a cache miss, the KRL parsing process basically works like this (there's a sketch of this path in code below):
- Retrieve the KRL source for the ruleset from the rule repository
- Parse the ruleset
- Optimize the ruleset
- Compile event expressions into state machines
- Cache the result
The optimization step does things like dependency analysis on variables so that loop-invariant computations can be hoisted out of loops.
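To make the flow concrete, here's a minimal sketch of the cache-miss path. It assumes a memcached-style cache via Cache::Memcached; the helper functions (fetch_ruleset_source, parse_ruleset, optimize_ast, compile_eventexps, current_ast_version) are hypothetical stand-ins for the engine's real code, not its actual API:

```perl
use strict;
use warnings;
use Cache::Memcached;

my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

sub regenerate_ast {
    my ($rid) = @_;

    my $src = fetch_ruleset_source($rid);          # pull the KRL from the rule repository
    my $ast = parse_ruleset($src);                 # parse into an abstract syntax tree
    $ast = optimize_ast($ast);                     # e.g., hoist loop-invariant variables
    $ast->{event_sms}   = compile_eventexps($ast); # event expressions -> state machines
    $ast->{ast_version} = current_ast_version();   # stamp the AST format version

    $memd->set("ruleset:$rid", $ast);              # cache the result for later requests
    return $ast;
}
```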
The caching presents its own problems. Flushing a single ruleset from the cache when it changes is a simple enough procedure. But when the format of the abstract syntax tree (AST) changes, they all need to be flushed. My preference is to do this in an emergent way so that we're not dependent on a step in the code deployment process that might get forgotten or messed up. So, when a ruleset is optimized, an AST version number is placed in the AST. When the engine retrieves a ruleset from the cache, it checks the AST version number against the current number and, if it's old, automatically flushes the ruleset and proceeds with the process given above to recreate the AST.
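Here's a sketch of how that emergent flush might look on the retrieval side, reusing the hypothetical regenerate_ast and current_ast_version helpers from the sketch above:

```perl
sub get_ast {
    my ($rid) = @_;

    my $ast = $memd->get("ruleset:$rid");
    if (defined $ast && $ast->{ast_version} == current_ast_version()) {
        return $ast;                               # cache hit with a current-format AST
    }

    $memd->delete("ruleset:$rid") if defined $ast; # stale format: flush it
    return regenerate_ast($rid);                   # fall back to the cache-miss path
}
```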
So far, so good. When I make a change to the parser that would change the format of the AST, I update the version number and when the code is deployed, the rulesets will automatically flush and get regenerated. This is, unfortunately, where my intuition failed me.
You can imagine that the engine is a number of individual servers behind a load balancer. When multiple requests come in, they are handed to various servers to process. Like most people, I've been conditioned by the physical world to expect Gaussian distributions around an average, but intuition developed around Gaussian distributions often leads to poor decisions in the world of computers, where distributions are much more likely to follow a power curve.
Here's what happened the last time I made a change to the AST. We deployed code and immediately the system stopped responding to requests. On further examination, we discovered that the machines were all busy parsing the very same ruleset. Because some rulesets are much more likely to be executed than others, the first requests seen by the new code were, for the most part, all for the same ruleset, and every machine was busy parsing it. Worse, the requests started to stack up, causing the machines to thrash. Everything stopped working.
The answer to this problem has two parts:
- The parser needs to be faster, especially for large rulesets. In the short term we'll make some point improvements to the existing Perl-based parser. In the long term we'll replace it with a C-based parser.
- The ruleset retrieval process detailed above needs to use semaphores so that only one parsing task for any given ruleset is launched at a time.
We've put the short-term fixes and the semaphore in place and things are looking pretty good so far. We're already partly down the road to the new parser, and because we have good specifications of what the parser has to do and tests that exercise it thoroughly, I'm not worried about the change.
I used the cache as an interprocess communication system to create the semaphores. It's not, strictly speaking, atomic, but it's close enough and was dirt simple to implement.
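Here's a minimal sketch of what a cache-backed, per-ruleset semaphore can look like, again reusing the hypothetical helpers from the earlier sketches. It uses memcached's add, which only succeeds when the key is absent; a plain get-then-set check (closer to the "not strictly atomic but close enough" approach described above) would work the same way with a small race window. Key names, timeouts, and retry counts are illustrative:

```perl
use Time::HiRes qw(sleep);

sub get_ast_guarded {
    my ($rid) = @_;

    my $ast = $memd->get("ruleset:$rid");
    return $ast
      if defined $ast && $ast->{ast_version} == current_ast_version();

    my $lock_key = "parse_lock:$rid";
    if ($memd->add($lock_key, $$, 60)) {    # we won the race; do the parse ourselves
        $ast = regenerate_ast($rid);
        $memd->delete($lock_key);           # release the semaphore
        return $ast;
    }

    # Another process is already parsing this ruleset; poll the cache
    # instead of launching yet another parse.
    for (1 .. 20) {
        sleep(0.5);
        $ast = $memd->get("ruleset:$rid");
        return $ast
          if defined $ast && $ast->{ast_version} == current_ast_version();
    }
    return regenerate_ast($rid);            # give up waiting and parse anyway
}
```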