Sunday, September 20, 2015

Parsing in ZILF, part 1: The ideal sentence

This is the first in a series of posts describing ZILF's parser. Read part 2 here.

Writing one's own interactive fiction language is a popular pastime, but people who attempt it quickly find that the language itself is the easy part. The hard part is fleshing out the parser and world model to the standards that players have come to expect.

We've come a long way since Colossal Cave and its two-word sentences ("get lamp", "throw axe"). Most modern games understand things like:
  • Prepositions (the "up" in "pick up lamp")
  • Different actions for the same verb word ("look at" vs. "look in" vs. "look under")
  • Commands with two or more noun phrases ("put coin in slot")
  • Complex noun phrases ("take key, lamp, and all cubes except the red cube")
  • Multiple commands per line ("north. west. open door then go in.")
  • Ordering ("watson, take the prints to the crime lab")
  • Disambiguation ("take polish" => "Which do you mean, the shoe polish or the Polish sausage?")
  • Implied nouns ("take" by itself when there's only one thing to pick up)
  • Correcting mistakes ("oops") and repeating commands ("again")
In fact, even in the 80s, Infocom's games understood those, so ZILF really should as well.

In this series, I'll explain how ZILF deals with all of the above, but for now let's look at a simple command like "pick up lamp". Here's the source code that defines the syntax for that command:
<SYNTAX PICK UP OBJECT (FIND TAKEBIT) (MANY ON-GROUND IN-ROOM) = V-TAKE>
Look at the parts in turn:
  • PICK: The verb word. For this syntax to take effect, the verb in the player's command has to be "pick" (or one of its synonyms, if it had any). The compiler assigns a new verb number to "pick", which is given the constant name ACT?PICK; this would also become the verb number of any synonyms.
  • UP: A preposition. The compiler assigns a preposition number to "up", the constant PR?UP.
  • OBJECT: Shows how many objects the command wants. OBJECT can appear zero, one, or two times in the syntax line; since it appears once here, this is a one-object command.
  • (FIND TAKEBIT): The "find flag", hinting at what kind of objects the command prefers.
  • (MANY ON-GROUND IN-ROOM): The "search options", hinting at where and how the command prefers to find its objects.
  • V-TAKE: The name of the action routine, which contains the code that implements taking. The compiler assigns an action number based (mostly) on the name of the action routine, in this case the constant V?TAKE; the action number identifies what the player wants to do. There might be many different syntaxes — "get lamp", "take lamp" — that all use different verbs to represent the same action. (And yes, unfortunately, V for action numbers and ACT for verb numbers is the opposite of what you'd expect.)
Notice that nothing in that line says exactly which words have to appear in which order, and in fact the parser will accept "lamp pick up"! Unlike Inform, ZILF doesn't define a precise grammar for each command; it has a general idea of what makes a sentence, and it uses its rudimentary knowledge of English to extract the parts of a command and fit them into a pattern.

So, what makes a sentence? A verb, a preposition, a noun phrase, another preposition, and another noun phrase. Everything except the verb is optional, but there are a few rules: you can't have a second noun phrase without a first noun phrase, each preposition needs a corresponding noun phrase, and two or more prepositions in a row collapse into one (so "look up in air" becomes "look up air").

The parser looks for a verb, extracts the prepositions and noun phrases from the player's command to fit them into the ideal sentence pattern, then searches through all the syntax lines for that verb, hoping to find one that has the same prepositions and number of noun phrases that the player typed. If it finds one, it can move on to identifying which "lamp" the player is talking about.

If that doesn't work, the parser may still be able to find a syntax line that can be made to work with more information, in which case it'll ask a clarifying question ("What do you want to pick up?") and orphan the command, keeping it around just long enough to see if the next input gives more information. Or it may decide that none of the syntaxes can possibly match and issue an error.

[Update: Corrected the name of the preposition number constant.]

Saturday, May 31, 2014

Generating better code in ZILF 0.5

I've done a lot of work on the code generator and optimizer for the upcoming version of ZILF. Here's one example of how the generated code has been improved.

Parser.zil has a routine called TAKE-CONT-SEARCH, which helps V-TAKE determine whether the object being taken is in a container:
<ROUTINE TAKE-CONT-SEARCH (A "AUX" H)  
    <OBJECTLOOP I .A
        <COND (<OR <FSET? .I ,CONTBIT> <==? .I ,WINNER>>
                <COND (<IN? ,PRSO .I>
                        <SET H .I>
                        <RETURN>)
                    (ELSE
                        <SET H <TAKE-CONT-SEARCH .I>>
                        <AND .H <RETURN>>)>)>>
    <AND .H <RETURN .H>>>
In ZILF 0.4, that turned into 14 instructions:
      .FUNCT TAKE-CONT-SEARCH,A,H,I FIRST? A >I /?L1 ?L1: ZERO? I /?L3 FSET? I,CONTBIT /?L8 EQUAL? I,WINNER \?L12 ?L8: IN? PRSO,I \?L9 SET 'H,I JUMP ?L3 ?L9: CALL TAKE-CONT-SEARCH,I >H ZERO? H \?L3 ?L12: NEXT? I >I /?L1 JUMP ?L1 ?L3: ZERO? H \?L18 RETURN H ?L18: RETURN H 
In ZILF 0.5, it's only 11 after eliminating a ZERO?, a JUMP, and the duplicate RETURN:
.FUNCT TAKE-CONT-SEARCH,A,H,I
FIRST? A >I \?L3
?L19: FSET? I,CONTBIT /?L8
EQUAL? I,WINNER \?L12
?L8: IN? PRSO,I \?L9
SET 'H,I
JUMP ?L3
?L9: CALL TAKE-CONT-SEARCH,I >H
ZERO? H \?L18
?L12: NEXT? I >I /?L19
?L3: ZERO? H /FALSE
?L18: RETURN H
Here's another example. The REFERS? routine checks whether a noun, or adjective-noun pair, matches the vocabulary of some object:
<ROUTINE REFERS? (A N O)
    <AND <OR <0? .A> <IN-PB/WTBL? .O ,P?ADJECTIVE .A>>
         <IN-PWTBL? .O ,P?SYNONYM .N>>>
The ZIL code is pretty simple, but ZILF 0.4 handled it poorly, needing 12 instructions and a temporary variable:
.FUNCT REFERS?,A,N,O,?TMP
ZERO? A /?L5
PUSH 0
JUMP ?L6
?L5: PUSH 1
?L6: POP '?TMP
ZERO? ?TMP /?L4
PUSH ?TMP
JUMP ?L3
?L4: CALL IN-PBTBL?,O,P?ADJECTIVE,A >STACK
?L3: ZERO? STACK /FALSE
CALL IN-PWTBL?,O,P?SYNONYM,N >STACK
RSTACK
Nasty! ZILF 0.5 does it in five instructions, eliminating the temp variable and a whole lot of nonsense:
.FUNCT REFERS?,A,N,O
ZERO? A /?L2
CALL IN-PBTBL?,O,P?ADJECTIVE,A >STACK
ZERO? STACK /FALSE
?L2: CALL IN-PWTBL?,O,P?SYNONYM,N >STACK
RSTACK
Much of the benefit comes from better understanding the relationship between values and conditions, such as knowing when a value is immediately going to be tested against zero, or which branch instructions are testing conditions that have already been tested. The Cloak example is smaller by more than 200 bytes thanks to these optimizations.

Wednesday, May 7, 2014

Inform 7 version 6L02 is out

After a long wait, the new version of Inform 7 is out. Pick it up at inform7.com.

Friday, May 31, 2013

ZLR extended version support

ZLR is slowly but surely gaining support for more Z-machine versions. The latest code now works, at least partially, with all versions except V6.

The differing screen models are always fun. V3 has a top window that you can't do much with, and the status line is not counted as one of its lines. V4 is pretty much like V5 except the bottom window scrolls from the bottom instead of the top. V6 has an entirely different model that, if I choose to implement it, will stretch the practicality of using a single IZMachineIO interface for all of these -- so far I've only had to add a few new members to support the other versions.

I've also thought about adding some interpreter functionality, in order to (1) make ZLR work on platforms where Reflection.Emit isn't supported, and (2) possibly move toward a tracing JIT. No movement on that yet, though.

Monday, May 14, 2012

Guncho development update

I'm taking the plunge and separating the code into pluggable modules, if only so that I can test one part of this massive refactoring at a time.

The introduction of Playfic has encouraged me to step up my game. Supporting I7 extensions is a great idea, and while automatically importing them into Guncho is a while off, my design already includes a way to share extensions between realms, so I can at least import them by hand.

Monday, March 26, 2012

Guncho development update

I've resumed work on Guncho and am basically working along the same roadmap I outlined in December. The first big changes, developed in parallel, will be:
  1. A new persistence layer based on NHibernate, which will store realm and player data in a relational database in real time and keep track of revision history.
  2. A REST API layer based on WCF, which will replace the .NET Remoting channel for communication between the control panel and game engine, and will eventually be exposed to the public.
  3. Decoupling of "Realm" from "Instance", allowing multiple copies of a game to run side by side.

Monday, March 12, 2012

ZLR bug fixes

I've pushed(*) a few changes to the ZLR Subversion repository to make Infocom games more playable. So far I'm only testing with their V5 games, since ZLR only supports V5/V8. But an eventual goal is for ZLR to be able to run and debug everything ZILF can compile, so it'll be getting V3/V4 as well, and maybe V6.

Anyway, I believe these changes will fix every Infocom V5 game -- except Beyond Zork, where the parser is still broken under ZLR. If you're brave enough to download the latest code from SVN, let me know if I missed anything in the other games!

(* yes, pushed, because I'm using Mercurial locally)