JavaScript Parsing and Ruby Hacking

Well, I finally got my primitive JavaScript parser hobbling along: it runs my small collection of time unit DSL test cases, with operator precedence and associativity. This is good enough to post the work in progress, so I can feel like I accomplished something.

This version is based on Compilers: Principles, Techniques, and Tools (aka the Dragon Book). It covers the high-end techniques for compilers expected to run for years on huge codebases. These techniques are about seven kinds of overkill for the time unit DSL, which I did in an evening by regex hacking. The book techniques took me a couple of weeks of head scratching.

But, you see, I have a lot of ideas for languages, both in JavaScript, where a generic engine could be quite handy, and out of it, where I simply want to understand the territory. I actually intend to run this together with my Silly JavaScript project, and try out a couple of different methods of JS parsing.

The big holdup was due to a rather big error on my part. Flipping back through the book, I found a reference to the problems with left-recursive grammars in top-down parsers. So, in an expression language, if “E -> E/E” is a possible production, the parse will go “Well, an expression might start with E, so let’s try that. Oh, look at that, it might be E, let’s try that. Oh, look at that…” and so on. The resolution is to convert that into two rules, “E -> TX” and “X -> /TX (or) (nothing)”, where T is a simpler term that cannot itself start with E. This is greatly oversimplifying, but basically it makes sure that there is always something on the left to match, so that the parse is making progress and not going around in loops.
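To make the transformation concrete, here is a minimal sketch of a top-down parser for the rewritten rules, in the textbook form “E -> TX” and “X -> /TX (or) (nothing)”. It is not my actual parser: the class name DivisionParser is made up for illustration, and T is assumed to be nothing more than a plain integer.

```ruby
# Hypothetical toy parser (not the parser described in this post) for:
#   E -> T X
#   X -> "/" T X | (nothing)
#   T -> integer
class DivisionParser
  def initialize(source)
    # Split the input into integer and "/" tokens, ignoring whitespace.
    @tokens = source.scan(%r{\d+|/})
    @pos = 0
  end

  def parse
    result = parse_e
    raise "unexpected token #{peek.inspect}" unless peek.nil?
    result
  end

  private

  def peek
    @tokens[@pos]
  end

  def advance
    token = @tokens[@pos]
    @pos += 1
    token
  end

  # E -> T X: always consumes a T first, so the parse makes progress.
  def parse_e
    parse_x(parse_t)
  end

  # X -> "/" T X | (nothing)
  # The left operand is passed along so the tree stays left-associative.
  def parse_x(left)
    return left unless peek == "/"
    advance
    parse_x([:div, left, parse_t])
  end

  # T -> integer
  def parse_t
    token = advance
    raise "expected a number, got #{token.inspect}" unless /\A\d+\z/.match?(token.to_s)
    token.to_i
  end
end

p DivisionParser.new("8 / 4 / 2").parse
# prints [:div, [:div, 8, 4], 2]
```

Every rule consumes a token before it recurses, which is exactly what the left-recursive version could not promise.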

There’s just one little problem: the automaton parser I pulled out of later sections of the book isn’t top-down, it’s bottom-up, and it also builds its expressions from the right, not the left. Not only is left recursion not an issue there, removing it really quite messes things up: the resulting right-recursive rules force the parser to read the entire input before it can reduce anything, and they also generally obscure the relation between constructs when trying to implement new features. Unfortunately, it wasn’t so much a matter of realizing my mistake as of thinking “all my problems seem to come from this one operation, what if I just remove it?” And then I read back to find out what law I broke, only to discover I was doing things wrong in the first place.

Since I had this realization Saturday, I went back to work on my Ruby-based board game domain language for a bit. A lot of it was getting back into the meta-programming tricks I’ve been using, which I haven’t really talked about yet, and probably should before I forget.

I started the project once and got some things mostly working, but things started to get messy when it got to actually running the game. The notation was also still very… programmy. You have to expect that a bit when using an embedded language (one whose programs are legal programs in the host language), in contrast to an external language, wherein one enters the interpreter or compiler tarpit I was dipping my toes in above. However, Ruby offers a quite lax and varied syntax (or enough rope to hang yourself and the rest of the neighborhood, take your pick), and I decided the end result would be much better if I used more of it.

So, I started over, this time writing an acceptable semi-structured form first, which I am slowly going through and converting into a Ruby-acceptable syntax, writing the necessary back end code as I go. Making a text that reads well in two languages takes a few ugly/beautiful hacks, and, though this may be a bit premature, here is a survey of some of them.

method_missing

Ruby is object oriented, and method calls can be thought of as message sends. In fact, you can just use object.send if you have a string you’d like to use as a method name. If that string isn’t a method, you (or rather, the object) get a chance to do something creative with it in method_missing before a not-found exception (NoMethodError) is raised: forward to another object, create a property, look it up in a hash table, or whatever suits your fancy. Another fun trick is mixing a module I call Prototype into my classes, causing them to emit a message for missing methods, but not halt the program.
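A rough sketch of both tricks follows. The Settings and Board classes are invented for illustration, and the Prototype module shown here is only my guess at a minimal version of the idea, not the module from my project.

```ruby
# Hypothetical example: look up missing methods in a hash.
class Settings
  def initialize(values)
    @values = values
  end

  def method_missing(name, *args, &block)
    # name arrives as a Symbol, even when the call was made via send("...").
    return @values[name] if @values.key?(name)
    super # fall back to the normal NoMethodError
  end

  # Keep respond_to? consistent with what method_missing handles.
  def respond_to_missing?(name, include_private = false)
    @values.key?(name) || super
  end
end

# Assumed minimal stand-in for the Prototype idea: report the missing
# method on stderr and carry on instead of raising.
module Prototype
  def method_missing(name, *args, &block)
    warn "#{self.class}##{name} is not implemented yet (args: #{args.inspect})"
    nil
  end
end

class Board
  include Prototype
end

settings = Settings.new(players: 4)
puts settings.players          # found through method_missing
puts settings.send("players")  # same lookup, method name given as a string
Board.new.shuffle_deck         # warns on stderr, returns nil, keeps running
```

Swallowing every missing method like the Prototype sketch does is handy while prototyping, but it also hides typos, so it is probably something to take back out once the design settles.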

Class.new()

This is one of those things where one either says “Huh?”, “Oh my god”, or “So?” I was most recently in the middle category. In a sense I should have known this, having read The Art of the Metaobject Protocol; however, I didn’t really get into it enough to fully understand what was going on. It might have moved me from the first to the second category, however. In my case, I wanted to create a class for each resource type in the game, which might then be instanced for each player. Class.new() is the object creation method of the class of classes; it returns a brand-new class, created at runtime. You can hide this in an API, add properties and methods to the new class, and then use it just like normal.
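Something like the following shows the shape of it. The helper name define_resource, the amount and add methods, and the :gold resource are all assumptions made up for this sketch, not the API of my game code.

```ruby
# Hypothetical helper that hides Class.new behind a small API.
def define_resource(resource_name)
  Class.new do
    # Class-level method; the block closes over the resource_name argument.
    define_singleton_method(:resource_name) { resource_name }

    attr_reader :amount

    def initialize(amount = 0)
      @amount = amount
    end

    # Returns self so calls can be chained.
    def add(n)
      @amount += n
      self
    end
  end
end

gold = define_resource(:gold)  # a new class, built at runtime

# One instance per player, used like any ordinary class.
alice_gold = gold.new(3)
bob_gold = gold.new
alice_gold.add(2)

puts gold.resource_name
puts alice_gold.amount
puts bob_gold.amount
```

The new class is anonymous until it is assigned to a constant; keeping it in a variable or a hash keyed by resource name works just as well for this purpose.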

instance_eval

One of the more popular and dangerous language techniques, this can run a string or block as if it belongs to the object. This raises a lot of issues about scoping and especially instance variables, but it can enable some very nice notation when used carefully. I’m not sure if my case should be called careful: I’m not flinging them around willy-nilly, but I am using a trick recommended by my dear friend Google to run my entire domain language file in the context of the ‘game object’.
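In outline, the trick presumably looks something like this. The Game class, its name and resource methods, and the from_source helper are all invented for the sketch, and a heredoc stands in for the domain language file.

```ruby
# Hypothetical game object whose methods double as the words of the DSL.
class Game
  attr_reader :title, :resources

  def initialize
    @resources = []
  end

  def name(title)
    @title = title
  end

  def resource(resource_name)
    @resources << resource_name
  end

  # Run DSL source text with self set to a fresh Game instance.
  def self.from_source(source)
    game = new
    game.instance_eval(source)
    game
  end
end

# Stand-in for the contents of a domain language file.
rules = <<~RULES
  name "Example Game"
  resource :wood
  resource :stone
RULES

game = Game.from_source(rules)
puts game.title
p game.resources

# The block form works the same way: bare calls go to the game object.
game.instance_eval { resource :gold }
p game.resources
```

With a real file, the source would come from File.read, and instance_eval also accepts a file name as a second argument so that error messages point at the right place. The scoping caveat is visible here too: any instance variable written in the DSL text would belong to the Game instance, not to the code that loaded it.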

Posted Tuesday, October 21st, 2008 under Devlog.
