Sunday, April 3, 2011

Expression parsing and evaluation...

Something which has been annoying for me for a while was the lack of proper expression parsing in libdecl, my engine's declaration parsing library.

This library is responsible for parsing the sound and table/material declarations as defined by id Software, and making them available to the renderer (or other backend: sound engine, etc) in easy to read structures guaranteed to have sane and correct values. libdecl is the component which will complain about your syntax, the backend doesn't have to care about any of this. Really makes the code look a lot nicer. :-)

I was debating the best way to parse expressions, because they can either be very simple, or quite complicated. I realized that I could not rely on line terminators, because it's very common to see an expression span multiple lines with all kinds of tabs and white-space in the file... Of course the lexer ignores white-space, but keeps track of the line count.

After a bit of thinking about this, I decided to rip out the current shunting yard algorithm from the renderer, move it into libdecl, clean it up and rework it quite a bit. Now, it does the following:
  1. Get a token from the lexer,
  2. Check this token matches our expected rules,
    1. the token is an operator or operand, or
    2. the token is a table name. (Detected from 1st-pass.)
  3. If our checks pass, go ahead with the normal algorithm, otherwise,
  4. Push the token back to the lexer, and return the completed expression in Postfix notation.
Here's a hypothetical example that I wrote with more comments than necessary...

/* "material" keyword is optional when inside the material directory */
material textures/screenBlur
        sort postProcess

        /* first stage */
                if glslPrograms != 0 &&
                        !isMultiplayer  /* don't blur the screen in multi-player */
                map textures/blur       /* any image format supported */

        /* ... */

Obviously simply parsing the "if" expression until the end of the line would fail, furthermore, "if" expressions may be written as if ... or if (...), so matching on parentheses won't work either.

It turns out the easiest thing to do is first a pre-pass which only looks at tables (so that we know "foobarTable" is valid), then a secondary pass which looks at everything (sounds, tables, materials.)

In this example "glslPrograms" and "isMultiplayer" are built-in variables which the parser correctly detects as operands. Therefore we'll only stop parsing after reading the "map" token, realize it's not a valid operator or operand, push it back to the lexer, and return the expression (after some processing.)

This solution seems to be quite elegant, despite the 2-pass algorithm, and handles all the cases correctly.

The next step is to get rid of static stack allocation. Some expressions are very long, so the stack is setup for 64 elements, but this is ridiculous for "time * 0.001" (for example.)

1 comment:

  1. Actually, thinking about this, I should implement a basic expression optimizer. There is no need to keep the expression "glslPrograms != 0" in memory when it can be replaced by a static 0 or 1 at parse time.

    Of course this requires that the renderer (and other engine) subsystems free the declarations and re-parse them during a restart, but this is the current situation anyway. :-)

    I should also optimize away "foo == 1.0", "foo == 0.0", and "foo != 1.0", "foo == 0.0" into "!foo" and "foo" as appropriate.