Librarian of Alexandria

2011-03-12

Matzo Prototype

I've had an idea kicking around in the back of my head for a while, and I finally got around to making a working prototype (which I will probably end up declaring the final version at some point). It has to do with generating random words based on grammar rules, with an emphasis on making certain things more likely and generating 'average' words. It came out of early D&D playing, where I wanted to have a consistent phonology for every fake language that D&D had.

It is called Matzo, because I asked a friend for a random name and that's what she said. She also suggested, among other things, Yoko, but I decided against that.

Here's the basic idea: the following is a Matzo source file, verbatim, which I have saved as aquan.mtz for testing:

word := syllable . syllable . (6 @ (syllable));
use word;
syllable := 2: vowel
          |    vowel . "'"
          | 4: consonant . vowel
          |    consonant . vowel . "'";
consonant ::= p t k h wh l m n ng r w;
vowel ::= a i u e o;

These statements can be in any order, and whitespace isn't significant except as a separator for certain tokens, so you can format/arrange the lines however you want. There are three kinds of statements:

  • use statements tell Matzo which rule to start generating from. If you omit one, the file won't run. If you have two, it picks the first one.
  • Normal assignments (represented by :=) take an expression that contains a mixture of disjunction (a | b), concatenation (a . b), weighting (5: b) and repetition (5@(b)) and which can be parenthesized. Expressions also include both literals, which are surrounded by quotation marks, and identifiers, which refer to other rules.
  • Literal assignments (represented by ::=) differ in that the right-hand side is assumed to be a space-separated list of literals, possibly with weighting. This is for the common case where you want one rule to contain a simple disjunction of literals, such as consonant and vowel above.
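To make the three statement kinds concrete, here is a minimal sketch in Python of how such a grammar could be evaluated. This is not Matzo's actual implementation: the tuple encoding, the generate function, and the literals helper are all hypothetical names made up for illustration, and it assumes a weight applies to the whole alternative it precedes.

```python
import random

# Hypothetical in-memory form of a grammar (not Matzo's real internals):
#   ("lit", s)              a quoted literal
#   ("ref", name)           an identifier referring to another rule
#   ("cat", [e1, e2, ...])  concatenation, a . b
#   ("alt", [(w, e), ...])  weighted disjunction, w: a | b

def generate(expr, rules, rng):
    """Produce one random string from an expression."""
    kind = expr[0]
    if kind == "lit":
        return expr[1]
    if kind == "ref":
        return generate(rules[expr[1]], rules, rng)
    if kind == "cat":
        return "".join(generate(e, rules, rng) for e in expr[1])
    if kind == "alt":
        weights = [w for w, _ in expr[1]]
        options = [e for _, e in expr[1]]
        chosen = rng.choices(options, weights=weights, k=1)[0]
        return generate(chosen, rules, rng)
    raise ValueError(f"unknown expression kind: {kind}")

def literals(words):
    """Build the expression for `name ::= a b c;` with equal weights."""
    return ("alt", [(1, ("lit", w)) for w in words.split()])

# The consonant, vowel, and syllable rules from aquan.mtz, encoded by hand.
rules = {
    "consonant": literals("p t k h wh l m n ng r w"),
    "vowel": literals("a i u e o"),
    "syllable": ("alt", [
        (2, ("ref", "vowel")),
        (1, ("cat", [("ref", "vowel"), ("lit", "'")])),
        (4, ("cat", [("ref", "consonant"), ("ref", "vowel")])),
        (1, ("cat", [("ref", "consonant"), ("ref", "vowel"), ("lit", "'")])),
    ]),
}

rng = random.Random(0)
print(generate(("ref", "syllable"), rules, rng))
```

A use statement would then amount to picking which rule name is passed to generate first.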

A lot of these things are shorthand for other things; really, all you need is concatenation, disjunction, and literals, and you can do almost everything. The syntax 5: foo is shorthand for listing foo five times in a disjunction (foo | foo | foo | foo | foo, or foo foo foo foo foo in a literal assignment), so it is used to make an option more prevalent. For example,

vowel ::= i u;

chooses i and u about equally often, whereas

vowel ::= 9:i u;

will choose i nine times out of ten.
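The "shorthand for repetition" reading of weights can be checked with a short Python sketch. The expand_weights helper is a made-up name for illustration, not part of Matzo; it simply builds the expanded list of options and samples from it uniformly.

```python
import random
from collections import Counter

def expand_weights(options):
    """Turn [(9, "i"), (1, "u")] into nine copies of "i" and one "u"."""
    pool = []
    for weight, value in options:
        pool.extend([value] * weight)
    return pool

# Equivalent of `vowel ::= 9:i u;` under the repetition reading.
pool = expand_weights([(9, "i"), (1, "u")])

rng = random.Random(42)
counts = Counter(rng.choice(pool) for _ in range(10_000))
print(counts["i"] / 10_000)  # should land near 0.9
```

Sampling uniformly from the expanded pool gives the same distribution as a weighted choice over the two distinct options, which is why the weight can be treated as pure shorthand.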

The syntax 5@(foo) is another shorthand, somewhat less useful. The statement

word := 3@(syllable);

is equivalent to

word := syllable | syllable . syllable | syllable . syllable . syllable;

I use it in many places in my own grammars, although it's less useful elsewhere.
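The expansion of the repetition shorthand is mechanical, as this small Python sketch shows. The expand_repetition function is a hypothetical helper that only produces the expanded rule text; it is not how Matzo itself is implemented.

```python
def expand_repetition(n, name):
    """Spell out n@(name) as a disjunction of 1 to n concatenated copies."""
    alternatives = [" . ".join([name] * k) for k in range(1, n + 1)]
    return " | ".join(alternatives)

print(expand_repetition(3, "syllable"))
# syllable | syllable . syllable | syllable . syllable . syllable
```

Read this way, the word rule in aquan.mtz, which is two syllables followed by 6@(syllable), yields words of three to eight syllables.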

There are still problems with my implementation, which I am going to fix before putting it up on GitHub, but the grammar shown above (and others) works correctly. This is an idea I've had for ages; I can't believe it's taken me so long to sit down and write it, especially as it took no time at all. (I did the whole thing while an autograder was running for something I was grading.)

Here's an example of running the above file:

[getty@arjuna matzo]$ ./matzo aquan.mtz
nu'melamio'o
[getty@arjuna matzo]$ ./matzo aquan.mtz
hopuwho
[getty@arjuna matzo]$ ./matzo aquan.mtz
iloa
[getty@arjuna matzo]$ ./matzo aquan.mtz
nenopungau

It's not necessarily limited to random words, but it lacks a lot of the features that would make it useful for other purposes, which I will eventually add. Still, as an example of what it could also be used for:

gender ::= man woman;
hair-color ::= black brown red blonde pink;
build ::= fat fit skinny;
job := "doctor" | "lawyer" | "janitor" | "systems analyst";
description := "This " . gender . " is a " . job . " with "
             . hair-color . " hair and a " . build . " build.";
use description;

Running this yields:

[getty@arjuna matzo]$ ./matzo description.mtz
This man is a systems analyst with brown hair and a skinny build.
[getty@arjuna matzo]$ ./matzo description.mtz
This woman is a janitor with blonde hair and a fit build.
[getty@arjuna matzo]$ ./matzo description.mtz
This woman is a lawyer with blonde hair and a fat build.
[getty@arjuna matzo]$ ./matzo description.mtz
This woman is a janitor with red hair and a fat build.

In the future: variables (e.g. reusing a single generated value) and predicates (e.g. pronoun(man) := he, pronoun(woman) := she) for use in complicated expressions. Still, I'm proud of how far it has come with so little work.
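To illustrate what those two features would buy, here is a rough Python sketch of the intended behavior. It is only a guess at the semantics, since neither feature exists in Matzo yet; the PRONOUN table and the describe function are hypothetical.

```python
import random

# Stand-in for a predicate such as pronoun(man) := he, pronoun(woman) := she.
PRONOUN = {"man": "he", "woman": "she"}

def describe(rng):
    # A "variable": the gender is generated once and then reused,
    # so the pronoun always agrees with it.
    gender = rng.choice(["man", "woman"])
    job = rng.choice(["doctor", "lawyer", "janitor", "systems analyst"])
    hair = rng.choice(["black", "brown", "red", "blonde", "pink"])
    return f"This {gender} is a {job}; {PRONOUN[gender]} has {hair} hair."

print(describe(random.Random()))
```

Without a variable, referring to the gender rule twice would pick a fresh value each time, so the pronoun could disagree with the noun.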