Sunday, October 25, 2015

Parsing in ZIL, part 2: Noun phrases

This is the second in a series of posts describing ZILF's parser. Read part 1 here, or read part 3 here.

In the first post, we covered the basic structure of a command: verbs, prepositions, and noun phrases. Recognizing a verb or preposition is easy; it's just a word tagged with that part of speech. Once the parser has the verb and prepositions, it uses them to find a matching grammar line. But how does it recognize noun phrases, and what does it do with them?

What is a noun phrase, anyway?

Here are some sample commands, with the noun phrases in quotes:
throw "axe" at "dwarf"
get "shiny brass lamp, tasty food, and bottle"
drop "any"
get "set of keys"
get "all except sign"
put "all cubes except red and blue" in "the chute"

A noun phrase is anything the player can type to identify the object of a command. It can refer to a single object, like "axe"; a list of objects, like "lamp, bottle, and food"; a set of objects answering to the same name, like "cubes"; or a set of objects implicitly defined by what the player can see, like "all" or "any". It can exclude objects with "except" or "but". And it can quantify what you mean when there's more than one similar object: do you want all of them, a specific one, or any random one?

The grammar of an acceptable noun phrase is, more or less:

    [all/any] [article] [adjective...] [noun]

The individual parts are optional, but at least one must be given. Larger phrases can also be made by stringing small ones together with commas or words like "and", "except", "but", and "of":

    [noun phrase], [noun phrase] and [noun phrase] of [noun phrase] except [noun phrase] and [noun phrase]

When the parser encounters a noun phrase, it records the important parts in a data structure called NOUN-PHRASE, which it uses later to find matches within the set of objects available to the player. The structure itself is very simple, containing only a "mode", a list of adjective/noun pairs (called OBJSPECs) to include, and another list of OBJSPECs to exclude.
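
To make the shape of that structure concrete, here is a minimal Python model of it. This is only a sketch, not ZILF's actual definition: the class and field names (ObjSpec, NounPhrase, mode, include, exclude) are our own stand-ins, and None plays the role of the "*" used in the examples in this post.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ObjSpec:
    """One adjective/noun pair. None means that part was not given ('*')."""
    adjective: Optional[str]
    noun: Optional[str]


@dataclass
class NounPhrase:
    """Hypothetical Python stand-in for the parser's NOUN-PHRASE record."""
    mode: str = "default"  # "default", "all", or "any"
    include: List[ObjSpec] = field(default_factory=list)
    exclude: List[ObjSpec] = field(default_factory=list)


# "axe": one noun, no adjective, default mode
axe = NounPhrase(include=[ObjSpec(None, "AXE")])

# "all cubes except red and blue"
cubes = NounPhrase(
    mode="all",
    include=[ObjSpec(None, "CUBES")],
    exclude=[ObjSpec("RED", None), ObjSpec("BLUE", None)],
)
```

Everything the parser needs to remember about a phrase fits in those three fields, which is why the examples in this post can each be written as a three-row table.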

Let's see what the parser does with some of the noun phrases from above:

"axe"

Mode default
Include (*, AXE)
Exclude none

Simple enough: we're just looking for an object that lists AXE in its SYNONYM property. We're using the default match mode, which means if there's more than one "axe" in sight, the parser will ask which one you mean.

"shiny brass lamp, tasty food, and bottle"

Mode default
Include (SHINY, LAMP)
        (TASTY, FOOD)
        (*, BOTTLE)
Exclude none

This time we have a few pairs in the Include list, and two of them have adjectives. The parser will look for objects with SHINY in their ADJECTIVE properties and LAMP in their SYNONYM properties. Likewise for TASTY and FOOD. Notice that BRASS is nowhere to be seen -- the parser only remembers one adjective per OBJSPEC.

But BOTTLE can be surprising in a game like Adventure, where "bottle" and "bottled water" are both objects. The Z-machine dictionary stores only a fixed number of characters per word (six in version 3 story files), so "bottle" and "bottled" are treated as the same word: if the player types "get bottled", referring to the bottled water by its adjective, it looks the same to the parser as "get bottle", referring to the bottle by its noun. The parser needs to support both. So even though the structure lists BOTTLE as a noun, the parser sees that there's no adjective paired with it, notices that BOTTLE can also be used as an adjective, and looks for it in the ADJECTIVE property as well. This is a lower-quality match, so if another nearby object has the word in its SYNONYM property, the parser will pick that one instead.
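
The fallback can be sketched in Python as a small scoring function. This is an illustration under assumptions, not the parser's real code: the scores 2, 1, and 0, the helper names, the dictionary-based objects, and the extra words "jar" and "glass" are all made up for the example. The six-character cutoff is the dictionary word length of version 3 story files.

```python
DICT_WORD_LEN = 6  # characters kept per dictionary word in Z-machine version 3


def dict_word(word):
    """Reduce a typed word to the form the dictionary would store."""
    return word.upper()[:DICT_WORD_LEN]


def match_quality(adjective, noun, obj):
    """Score one (adjective, noun) pair: 2 = noun match, 1 = fallback, 0 = none.

    `adjective` may be None, meaning no adjective was typed.
    """
    synonyms = {dict_word(w) for w in obj["synonym"]}
    adjectives = {dict_word(w) for w in obj["adjective"]}
    if adjective is not None and dict_word(adjective) not in adjectives:
        return 0
    if dict_word(noun) in synonyms:
        return 2
    # Lower-quality match: a lone "noun" that is really the object's adjective.
    if adjective is None and dict_word(noun) in adjectives:
        return 1
    return 0


def best_matches(adjective, noun, objects):
    """Keep only the objects that share the highest nonzero score."""
    scored = [(match_quality(adjective, noun, o), o) for o in objects]
    top = max((s for s, _ in scored), default=0)
    return [o for s, o in scored if s == top and s > 0]


# Hypothetical objects; only "bottle" and "bottled" come from the example above.
bottle = {"name": "bottle", "synonym": ["bottle", "jar"], "adjective": ["glass"]}
water = {"name": "bottled water", "synonym": ["water"], "adjective": ["bottled"]}
```

With both objects in reach, best_matches(None, "bottled", [bottle, water]) keeps only the bottle, because its noun match outranks the water's adjective match. With the bottle out of the picture, the same word finds the bottled water.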

"set of keys"

Mode default
Include (*, KEYS)
Exclude none

Hey, what happened to SET?

Object names containing "of" are special. SET and KEYS are both in the object's SYNONYM property, but the parser only thinks about one adjective and one noun at a time, so they can't both be used. Adjacent nouns normally belong to separate noun phrases, e.g. "give troll money".

However, when the parser sees "of" in a noun phrase, it forgets the last noun it saw! So "set of keys" becomes simply "keys". This makes the parser's job easier, but at the cost of being unable to know whether the player wants the set of keys, the pile of keys, or the stack of boxes of pictures of keys. (It can still tell them apart by asking questions, as we'll see in the post on orphaning.)
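
Here is a rough Python sketch of how a single simple phrase (no commas, "and", or "except") might be boiled down, including the "of" rule. The word lists are a hypothetical vocabulary invented for this example, and two details are assumptions on our part: the sketch keeps the first adjective it sees (which agrees with SHINY surviving over BRASS earlier), and it keeps any adjective already seen when it reaches "of".

```python
ARTICLES = {"A", "AN", "THE"}
MODES = {"ALL": "all", "ANY": "any"}
# Hypothetical vocabulary, just enough for the examples in this post.
ADJECTIVES = {"SHINY", "BRASS", "TASTY"}
NOUNS = {"LAMP", "FOOD", "SET", "PILE", "STACK", "BOXES", "PICTURES", "KEYS"}


def parse_simple_phrase(text):
    """Return (mode, adjective, noun) for one phrase with no list words."""
    mode, adjective, noun = "default", None, None
    for word in text.upper().split():
        if word in MODES:
            mode = MODES[word]
        elif word in ARTICLES:
            continue  # articles carry no information
        elif word == "OF":
            noun = None  # forget the last noun: "set of keys" -> "keys"
        elif word in ADJECTIVES:
            if adjective is None:  # assumption: only the first adjective is kept
                adjective = word
        elif word in NOUNS:
            noun = word
        else:
            raise ValueError("unknown word: " + word)
    return mode, adjective, noun


print(parse_simple_phrase("set of keys"))
print(parse_simple_phrase("stack of boxes of pictures of keys"))
```

Both calls end up with KEYS as the only noun, which is exactly why the parser can't tell those objects apart without asking. The sketch does not try to split adjacent nouns into separate phrases the way "give troll money" requires.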

"any"

Mode any
Include none
Exclude none

This time, we didn't give any nouns or adjectives, only a mode. Since the Include list is empty, the parser will choose from all available objects, and since the match mode is "any", it'll choose a random object if there's more than one match.

"all except sign"

Mode all
Include none
Exclude (*, SIGN)

Again, the Include list is empty, so the parser chooses from all available objects, then excludes any object with the word SIGN in its SYNONYM property. Since the match mode is "all", if there's more than one "sign" available, the parser will exclude all of them.

"all cubes except red and blue"

Mode all
Include (*, CUBES)
        (RED, *)

This time, the Include and Exclude lists are both present. The parser will find all available objects with CUBES in their SYNONYM property, then exclude any that have RED or BLUE in their ADJECTIVE property.
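
Putting the three fields together, the matching step described in these examples can be modeled roughly like this in Python. Treat it as a sketch under assumptions rather than ZILF's algorithm: objects are plain dictionaries, an OBJSPEC is an (adjective, noun) tuple with None standing for "*", the room contents are invented, and raising LookupError stands in for the parser asking which object you mean.

```python
import random


def spec_matches(spec, obj):
    """True if obj has the adjective and noun the spec asks for (None = '*')."""
    adjective, noun = spec
    if adjective is not None and adjective not in obj["adjective"]:
        return False
    if noun is not None and noun not in obj["synonym"]:
        return False
    return True


def resolve(mode, include, exclude, available, rng=random):
    """Turn mode + Include + Exclude into a list of matched objects."""
    if include:
        groups = [[o for o in available if spec_matches(s, o)] for s in include]
    else:
        groups = [list(available)]  # empty Include: choose from everything
    result = []
    for group in groups:
        # Drop anything named by the Exclude list.
        group = [o for o in group if not any(spec_matches(s, o) for s in exclude)]
        if mode == "all" or len(group) <= 1:
            picked = group
        elif mode == "any":
            picked = [rng.choice(group)]  # any random one will do
        else:
            # Default mode: more than one candidate means we have to ask.
            raise LookupError("Which do you mean: "
                              + ", ".join(o["name"] for o in group) + "?")
        result.extend(o for o in picked if o not in result)
    return result


def thing(name, adjective, noun):
    return {"name": name, "adjective": {adjective}, "synonym": {noun}}


# Hypothetical room contents.
room = [
    thing("red cube", "RED", "CUBES"),
    thing("blue cube", "BLUE", "CUBES"),
    thing("green cube", "GREEN", "CUBES"),
    thing("sign", "WOODEN", "SIGN"),
]

# "all cubes except red and blue"
found = resolve("all", [(None, "CUBES")], [("RED", None), ("BLUE", None)], room)
print([o["name"] for o in found])  # prints ['green cube']
```

Calling resolve("all", [], [(None, "SIGN")], room) models "all except sign" and returns the three cubes, while resolve("default", [(None, "CUBES")], [], room) raises LookupError, which corresponds to the parser asking which cube you mean.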
