I've had an idea kicking around in the back of my head for a while, and I finally got around to making a working prototype (which, at some point, I will probably end up deciding is the final version). It has to do with generating random words based on grammar rules, with an emphasis on making certain things more likely and generating 'average' words. It grew out of my early D&D games, where I wanted a consistent set of phonologies for every fake language that D&D had.
It is called Matzo, because I asked a friend for a random name and that's what she said. She also suggested, among other things, Yoko, but I decided against that.
Here's the basic idea: the following is, verbatim, a Matzo source file, which I have saved as aquan.mtz for testing:
word := syllable . syllable . (6 @ (syllable));
use word;
syllable := 2: vowel
          | vowel . "'"
          | 4: consonant . vowel
          | consonant . vowel . "'";
consonant ::= p t k h wh l m n ng r w;
vowel ::= a i u e o;
These statements can be in any order, and whitespace isn't significant except as a separator for certain tokens, so you can format/arrange the lines however you want. There are three kinds of statements:
- use statements tell Matzo which rule to start from. If you omit one, nothing runs; if you have two, it picks the first one.
- Normal assignments (represented by :=) take an expression that contains a mixture of disjunction (a | b), concatenation (a . b), weighting (5: b), and repetition (5@(b)), any of which can be parenthesized. Expressions can also contain literals, which are surrounded by quotation marks, and identifiers, which refer to other rules.
- Literal assignments (represented by ::=) differ in that their right-hand side is assumed to be a space-separated list of literals, possibly with weighting. They exist for the common case where you want a rule to be a simple disjunction of literals, such as the consonant and vowel rules in aquan.mtz.
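To make the semantics of these statements concrete, here is a minimal Python sketch of an evaluator for the expression forms listed above. It is not Matzo's implementation: the node names (Lit, Ref, Seq, Alt, Rep), the generate and lits functions, and the dictionary-of-rules representation are all hypothetical. It assumes a weighted disjunction picks each option in proportion to its weight and that n@(e) produces between 1 and n copies of e; the aquan.mtz grammar is transcribed by hand rather than parsed.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical AST node types; the names are illustrative, not Matzo's own.
@dataclass
class Lit:
    text: str  # a quoted literal such as "'"

@dataclass
class Ref:
    name: str  # an identifier that refers to another rule

@dataclass
class Seq:
    parts: List["Expr"]  # concatenation: a . b . c

@dataclass
class Alt:
    options: List[Tuple[int, "Expr"]]  # weighted disjunction: 2: a | b

@dataclass
class Rep:
    count: int  # n @ (e): between 1 and n copies of e
    body: "Expr"

Expr = Union[Lit, Ref, Seq, Alt, Rep]

def generate(expr: Expr, rules: Dict[str, Expr], rng: random.Random) -> str:
    """Expand an expression into a random string."""
    if isinstance(expr, Lit):
        return expr.text
    if isinstance(expr, Ref):
        return generate(rules[expr.name], rules, rng)
    if isinstance(expr, Seq):
        return "".join(generate(p, rules, rng) for p in expr.parts)
    if isinstance(expr, Alt):
        # Each option is picked in proportion to its weight (assumption).
        weights = [w for w, _ in expr.options]
        _, chosen = rng.choices(expr.options, weights=weights)[0]
        return generate(chosen, rules, rng)
    if isinstance(expr, Rep):
        # Every count from 1 to n is equally likely (assumption).
        n = rng.randint(1, expr.count)
        return "".join(generate(expr.body, rules, rng) for _ in range(n))
    raise TypeError(f"unknown node: {expr!r}")

def lits(words: str) -> Alt:
    """A '::=' rule: equally weighted, space-separated literals."""
    return Alt([(1, Lit(w)) for w in words.split()])

# The aquan.mtz grammar, transcribed by hand.
APOS = Lit("'")
rules: Dict[str, Expr] = {
    "word": Seq([Ref("syllable"), Ref("syllable"), Rep(6, Ref("syllable"))]),
    "syllable": Alt([
        (2, Ref("vowel")),
        (1, Seq([Ref("vowel"), APOS])),
        (4, Seq([Ref("consonant"), Ref("vowel")])),
        (1, Seq([Ref("consonant"), Ref("vowel"), APOS])),
    ]),
    "consonant": lits("p t k h wh l m n ng r w"),
    "vowel": lits("a i u e o"),
}

# 'use' statements in file order: the first one wins, and none means no output.
uses = ["word"]
if uses:
    print(generate(Ref(uses[0]), rules, random.Random()))
```

Treating a literal assignment as a disjunction of equally weighted literals (the hypothetical lits helper) mirrors the idea that ::= is only a convenience form of :=.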
A lot of these syntactic features are shorthand for other things; really, all you need is concatenation, disjunction, and literals, and you can do almost everything. The syntax 5: foo is shorthand for listing foo five times (foo | foo | foo | foo | foo), so it is used to make an option more prevalent in a disjunction. For example,
vowel ::= i u;
chooses i and u equally often, whereas
vowel ::= 9:i u;
will choose i nine times out of ten.
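Since weighting is just repetition inside a disjunction, an option's probability is its weight divided by the total weight. The Python sketch below makes that concrete for literal assignments; the regex-based parser and the helper names parse_literals, expand and probability are hypothetical, and it assumes a weight is written as a number followed by a colon, as in 9:i.

```python
import re
from fractions import Fraction
from typing import List, Tuple

# Hypothetical helpers illustrating weighting; not Matzo's actual code.
# A token is an optional "N:" weight followed by a literal, e.g. "9:i" or "9: i".
TOKEN = re.compile(r"(?:(\d+)\s*:\s*)?(\S+)")

def parse_literals(rhs: str) -> List[Tuple[int, str]]:
    """Parse the right-hand side of a '::=' rule into (weight, literal) pairs."""
    return [(int(w) if w else 1, lit) for w, lit in TOKEN.findall(rhs)]

def expand(options: List[Tuple[int, str]]) -> List[str]:
    """Desugar weights: 9:i becomes nine separate copies of i."""
    return [lit for weight, lit in options for _ in range(weight)]

def probability(options: List[Tuple[int, str]], literal: str) -> Fraction:
    """Chance of picking `literal` when every expanded copy is equally likely."""
    pool = expand(options)
    return Fraction(pool.count(literal), len(pool))

print(probability(parse_literals("i u"), "i"))    # 1/2
print(probability(parse_literals("9:i u"), "i"))  # 9/10
```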
5@(foo) is another shorthand, somewhat less useful. The statement
word := 3@(syllable);
is equivalent to
word := syllable | syllable . syllable | syllable . syllable . syllable;
I use it fairly often in my own grammars, although it's less useful elsewhere.
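The desugaring of repetition shown above is mechanical, so it can be sketched as a small text rewrite. This is a hypothetical Python helper (expand_repeat is not part of Matzo) that works on source text only and reproduces the equivalence for 3@(syllable).

```python
def expand_repeat(rule: str, n: int) -> str:
    """Desugar n@(rule) into a disjunction of 1..n concatenated copies.

    Hypothetical helper for illustration; it works on source text only.
    """
    if n < 1:
        raise ValueError("repetition count must be at least 1")
    alternatives = [" . ".join([rule] * k) for k in range(1, n + 1)]
    return " | ".join(alternatives)

print(f"word := {expand_repeat('syllable', 3)};")
# word := syllable | syllable . syllable | syllable . syllable . syllable;
```

If the equivalence holds exactly, the result is an unweighted disjunction, so each length from 1 to n is equally likely; in aquan.mtz, 6 @ (syllable) would then add one to six extra syllables with equal probability.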
There are still problems with my implementation, which I am going to fix before putting it up on GitHub, but the grammar shown above (and others) works correctly. This is an idea I've had for ages; I can't believe it's taken me so long to sit down and write it, especially as it took no time at all. (I did the whole thing while an autograder was running for something I was grading.)
Here's an example of running the above file:
[getty@arjuna matzo]$ ./matzo aquan.mtz
nu'melamio'o
[getty@arjuna matzo]$ ./matzo aquan.mtz
hopuwho
[getty@arjuna matzo]$ ./matzo aquan.mtz
iloa
[getty@arjuna matzo]$ ./matzo aquan.mtz
nenopungau
It's not necessarily limited to random words, but it still lacks a lot of the features that would make it suitable for other purposes, which I will eventually add. Still, as an example of what else it could be used for:
gender ::= man woman;
hair-color ::= black brown red blonde pink;
build ::= fat fit skinny;
job := "doctor" | "lawyer" | "janitor" | "systems analyst";
description := "This " . gender . " is a " . job . " with " . hair-color . " hair and a " . build . " build.";
use description;
Running this yields:
[getty@arjuna matzo]$ ./matzo description.mtz
This man is a systems analyst with brown hair and a skinny build.
[getty@arjuna matzo]$ ./matzo description.mtz
This woman is a janitor with blonde hair and a fit build.
[getty@arjuna matzo]$ ./matzo description.mtz
This woman is a lawyer with blonde hair and a fat build.
[getty@arjuna matzo]$ ./matzo description.mtz
This woman is a janitor with red hair and a fat build.
In the future, I plan to add variables (e.g. reusing a single generated value) and predicates (e.g. pronoun(man) := he, pronoun(woman) := she) for use in more complicated expressions.
Still, I'm proud of how far it is with so little work.