DM4 §33: Helping the parser out of trouble

Once you begin programming the parser on a large scale, you soon reach the point where the parser's ordinary error messages no longer appear sensible. The ParserError entry point can change the rules even at this last hurdle. It takes one argument, the error type, and should return true to tell the parser to shut up (because a better error message has already been printed), or false to tell the parser to print its usual message. The error types are defined as constants:

`STUCK_PE`	I didn't understand that sentence.
`UPTO_PE`	I only understood you as far as…
`NUMBER_PE`	I didn't understand that number.
`CANTSEE_PE`	You can't see any such thing.
`TOOLIT_PE`	You seem to have said too little!
`NOTHELD_PE`	You aren't holding that!
`MULTI_PE`	You can't use multiple objects with that verb.
`MMULTI_PE`	You can only use multiple objects once on a line.
`VAGUE_PE`	I'm not sure what ‘it’ refers to.
`EXCEPT_PE`	You excepted something not included anyway!
`ANIMA_PE`	You can only do that to something animate.
`VERB_PE`	That's not a verb I recognise.
`SCENERY_PE`	That's not something you need to refer to…
`ITGONE_PE`	You can't see ‘it’ (the whatever) at the moment.
`JUNKAFTER_PE`	I didn't understand the way that finished.
`TOOFEW_PE`	Only five of those are available.
`NOTHING_PE`	Nothing to do!
`ASKSCOPE_PE`	whatever the scope routine prints

Each unsuccessful grammar line ends in one of these conditions. By the time the parser wants to print an error, every one of the grammar lines in a verb will have failed. The error message it chooses to print is the most “interesting” one: that is, the one lowest down this list.
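The selection rule can be modelled in a few lines of Python. This is a sketch of the rule, not the library's code. Each failed grammar line contributes the error it ended in, and the parser reports whichever of those is ranked lowest in the table. The ranking list copies the table's order and makes no assumption about the constants' actual numeric values.

```python
# Error types in the order of the table above, least interesting first.
# Only the ordering is taken from the text; the real constants' values
# are not modelled.
ERROR_RANKING = [
    "STUCK_PE", "UPTO_PE", "NUMBER_PE", "CANTSEE_PE", "TOOLIT_PE",
    "NOTHELD_PE", "MULTI_PE", "MMULTI_PE", "VAGUE_PE", "EXCEPT_PE",
    "ANIMA_PE", "VERB_PE", "SCENERY_PE", "ITGONE_PE", "JUNKAFTER_PE",
    "TOOFEW_PE", "NOTHING_PE", "ASKSCOPE_PE",
]

def most_interesting(line_errors):
    """Given the error each failed grammar line ended in, return the one
    the parser would report: the one lowest down the table."""
    return max(line_errors, key=ERROR_RANKING.index)

print(most_interesting(["STUCK_PE", "CANTSEE_PE", "UPTO_PE"]))  # CANTSEE_PE
```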

If a general parsing routine you have written returns GPR_FAIL, then the grammar line containing it normally ends in plain STUCK_PE, the least interesting of all errors (unless you did something like calling the library's ParseToken routine before giving up, which might have set a more interesting error like CANTSEE_PE). But you can choose to create a new error and put it in the parser's variable etype, as in the following example:

[ Degrees d;
  d = TryNumber(wn++);              ! -1000 if the word is not a number
  if (d == -1000) return GPR_FAIL;  ! leaves the plain STUCK_PE error
  if (d <= 360) { parsed_number = d; return GPR_NUMBER; }
  etype = "There are only 360 degrees in a circle.";  ! a custom error
  return GPR_FAIL;
];

This parses a number of degrees between 0 and 360. Although etype normally only holds values like VERB_PE, which are numbers lower than 100, here we've set it equal to a string. Since this is a value the parser doesn't recognise, we need to write a ParserError routine to take care of it, reacting to a string in the obvious way: by printing it out.

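[ ParserError error_type;
  ! A string can only be one of our own errors: print it and return true
  if (error_type ofclass String) print_ret (string) error_type;
  rfalse;  ! otherwise let the parser print its usual message
];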

This will result in conversation like so:

>steer down
I didn't understand that sentence.
>steer 385
There are only 360 degrees in a circle.

In the first case, Degrees failed without setting any special error message on finding that the second word wasn't a number; in the second case it gave the new, specific error message.
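The following hypothetical Python model shows how Degrees and ParserError work together. GPR_FAIL, GPR_NUMBER and STUCK_PE here are stand-in values, not the library's. `try_number` only accepts digits, whereas the real TryNumber also understands some number words. `steer` is an invented driver that plays the part of the parser.

```python
# Stand-in values (assumptions, not the library's real constants).
GPR_FAIL, GPR_NUMBER = "fail", "number"
STUCK_PE = 1
DEFAULT_MESSAGES = {STUCK_PE: "I didn't understand that sentence."}

def try_number(word):
    """Like TryNumber, but digits only: the value, or -1000 if not a number."""
    return int(word) if word.isdigit() else -1000

def degrees(word, state):
    d = try_number(word)
    if d == -1000:
        return GPR_FAIL                    # etype left as plain STUCK_PE
    if d <= 360:
        state["parsed_number"] = d
        return GPR_NUMBER
    state["etype"] = "There are only 360 degrees in a circle."
    return GPR_FAIL

def parser_error(error_type, out):
    """True: a message was printed here; False: use the usual message."""
    if isinstance(error_type, str):        # one of our own errors
        out.append(error_type)
        return True
    return False

def steer(word):
    """Hypothetical driver for 'steer <word>': the number, or the error shown."""
    state = {"etype": STUCK_PE}
    if degrees(word, state) == GPR_NUMBER:
        return state["parsed_number"]
    out = []
    if not parser_error(state["etype"], out):
        out.append(DEFAULT_MESSAGES[state["etype"]])
    return out[0]
```

Calling `steer("down")` and `steer("385")` reproduces the two responses in the transcript above.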

· · · · ·

The VAGUE_PE and ITGONE_PE errors apply to all pronouns (in English, “it”, “him”, “her” and “them”). The variable vague_word contains the dictionary address of whichever pronoun is involved ('it', 'him' and so on).

You can find out the current setting of a pronoun using the library's PronounValue routine: for instance, PronounValue('it') gives the object which “it” currently refers to, possibly nothing. Similarly SetPronoun('it', magic_ruby) would set “it” to mean the magic ruby object. You might want this because, when something like a magic ruby suddenly appears in the middle of a turn, players will habitually call it “it”. A better way to adjust the pronouns is to call PronounNotice(magic_ruby), which sets whatever pronouns are appropriate. That is, it works out if the object is a thing or a person, of what number and gender, which pronouns apply to it in the parser's current language, and so on. In code predating Inform 6.1 you may see variables called itobj, himobj and herobj holding the English pronoun values: these still work properly, but please use the modern system in new games.
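A simplified model of the pronoun table may help. This Python sketch is English-only and uses a hypothetical `Obj` record with `animate`, `female` and `plural` flags. The real PronounNotice consults the language definition, so treat `pronoun_notice` as an illustration of the idea rather than of the library's code.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Obj:
    name: str
    animate: bool = False
    female: bool = False
    plural: bool = False

# Hypothetical English-only pronoun table: pronoun -> object (or None).
pronouns = {"it": None, "him": None, "her": None, "them": None}

def set_pronoun(word, obj):
    pronouns[word] = obj

def pronoun_value(word):
    return pronouns[word]

def pronoun_notice(obj):
    """Simplified stand-in for PronounNotice: pick the pronoun that fits
    the object's number, animation and gender, and point it at obj."""
    if obj.plural:
        word = "them"
    elif not obj.animate:
        word = "it"
    elif obj.female:
        word = "her"
    else:
        word = "him"
    set_pronoun(word, obj)
    return word
```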

· · · · ·

▲ The Inform parser resolves ambiguous object names with a pragmatic algorithm which has evolved over the years (see below). Experience also shows that no two people ever quite agree on what the parser should “naturally” do. Designers have an opportunity to influence this by providing an entry point routine called ChooseObjects:

ChooseObjects(obj, code)

is called in two circumstances. If code is false or true, the parser is considering including the given obj in an “all”: false means the parser has decided against, true means it has decided in favour. The routine should reply

0	to accept the parser's decision;
1	to force the object to be included; or
2	to force the object to be excluded.

It may want to decide using verb_word (the variable storing the current verb word, e.g., 'take') and action_to_be, which is the action which would happen if the current line of grammar were successfully matched.

The other circumstance is when code is 2. This means the parser is choosing between a list of items which made equally good matches against some text, and would like a hint. ChooseObjects should then return a number from 0 to 9 to give obj a score for how appropriate it is.
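The two calling conventions can be sketched in Python. `decide_inclusion` and `hint_score` are hypothetical helpers that stand in for the parser's side of the protocol. `choose_objects` is whatever routine the designer supplies, and `no_lamp` is an invented example of one.

```python
# Replies when code is false (0) or true (1), as listed above.
ACCEPT, FORCE_IN, FORCE_OUT = 0, 1, 2

def decide_inclusion(choose_objects, obj, parser_says_include):
    """Final verdict on including obj in an "all"."""
    reply = choose_objects(obj, int(parser_says_include))
    if reply == FORCE_IN:
        return True
    if reply == FORCE_OUT:
        return False
    return parser_says_include          # ACCEPT: keep the parser's decision

def hint_score(choose_objects, obj):
    """Score from 0 to 9 when the parser asks for a hint (code 2)."""
    c = choose_objects(obj, 2)
    if not 0 <= c <= 9:
        raise ValueError("ChooseObjects must return 0 to 9 when code is 2")
    return c

# Hypothetical designer routine: keep the lamp out of "all", and
# otherwise give every object a middling score of 5.
def no_lamp(obj, code):
    if code < 2:
        return FORCE_OUT if obj == "lamp" else ACCEPT
    return 5
```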

For instance, some designers would prefer “take all” not to attempt to take scenery objects (which Inform, and the parsers in most of the Infocom games, will do). Let us code this, and also teach the parser that edible things are more likely to be eaten than inedible ones:

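[ ChooseObjects obj code;
  ! code 0 or 1: deciding about "all"; returning 2 forces exclusion
  if (code < 2) { if (obj has scenery) return 2; rfalse; }
  ! code 2: give the object a score from 0 to 9
  if (action_to_be == ##Eat && obj has edible) return 3;
  if (obj hasnt scenery) return 2;
  return 1;
];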

Scenery is now excluded from “all” lists, and is further penalised in that non-scenery objects are always preferred over scenery, all else being equal. Most objects score 2, but edible things in the context of eating score 3, so “eat black” will now always choose a Black Forest gateau in preference to a black rod with a rusty iron star on the end.
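Transcribing the same routine into Python shows both effects. A hypothetical `Thing` record stands in for Inform objects, and `action_to_be` is passed as a plain string rather than read from a global. The `take_all` helper assumes the parser has already decided in favour of every object, so only a reply of 2 removes one.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Thing:
    name: str
    scenery: bool = False
    edible: bool = False

def choose_objects(obj, code, action_to_be):
    if code < 2:                          # deciding about "all"
        return 2 if obj.scenery else 0    # 2 forces scenery out
    if action_to_be == "Eat" and obj.edible:
        return 3
    if not obj.scenery:
        return 2
    return 1

def take_all(objects):
    """Names kept in 'take all', assuming the parser said yes to each."""
    return [o.name for o in objects if choose_objects(o, 1, "Take") != 2]

def preferred(objects, action_to_be):
    """The object with the highest ChooseObjects score (first on a tie)."""
    return max(objects, key=lambda o: choose_objects(o, 2, action_to_be)).name
```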

•▲ EXERCISE 105
Allow “lock” and “unlock” to infer their second objects without being told, if there's an obvious choice (because the player's only carrying one key), but to issue a disambiguation question otherwise. (Use Extend, not ChooseObjects.)

•▲ EXERCISE 106
Joyce Haslam's Inform edition of the classic Acornsoft game ‘Gateway to Karos’ requires a class called FaintlyLitRoom for rooms so dimly illuminated that “take all” is impossible. How might this work?

· · · · ·

▲▲ Suppose we have a set of objects which have all matched equally well against the textual input, so that some knowledge of the game world is needed to resolve which of the objects (possibly only one, possibly more) is or are intended. Deciding this is called “disambiguation”, and here in full are the rules used by library 6/10 to do it. The reader is cautioned that after six years, these rules are still evolving.

(1) Call an object “good” according to a rule depending on what kind of token is being matched:

`held`	Good if its parent is the actor.
`multiheld`	Good if its parent is the actor.
`multiexcept`	Good if not also the second object, if that's known yet.
`multiinside`	Good if not inside the second object, if that's known yet.
`creature`	Good if `animate`, or if the proposed action is `Ask`, `Answer`, `Tell` or `AskFor` and the object is `talkable`.
other tokens	All objects are good.

If only a single object is good, this is immediately chosen.

(2) If the token is creature and no objects are good, fail the token altogether, as no choice can make sense.
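Steps (1) and (2) amount to a filter followed by two early exits. In this Python sketch the object record and its fields are hypothetical, and "inside the second object" is simplified to "is a direct child of it". The "continue" result carries the good objects forward because step (5) rewards them.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Obj:
    name: str
    parent: object = None
    animate: bool = False
    talkable: bool = False

TALK_ACTIONS = ("Ask", "Answer", "Tell", "AskFor")

def is_good(obj, token, actor, action=None, second=None):
    if token in ("held", "multiheld"):
        return obj.parent is actor
    if token == "multiexcept":
        return second is None or obj is not second
    if token == "multiinside":
        # simplification: "inside" taken to mean "direct child of"
        return second is None or obj.parent is not second
    if token == "creature":
        return obj.animate or (action in TALK_ACTIONS and obj.talkable)
    return True                              # other tokens: all good

def steps_1_and_2(objs, token, actor, **context):
    good = [o for o in objs if is_good(o, token, actor, **context)]
    if len(good) == 1:
        return ("choose", good[0])           # (1) a single good object
    if token == "creature" and not good:
        return ("fail", None)                # (2) no choice can make sense
    return ("continue", good)                # good ones remembered for (5)
```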

(3) Objects which don't fit “descriptors” used by the player are removed:

if “my”, an object whose parent isn't the actor is discarded;

if “that”, an object whose parent isn't the actor's location is discarded;

if “lit”, an object which hasn't light is discarded;

if “unlit”, an object which has light is discarded;

if “his” or some similar possessive pronoun, an object not owned by the person implied is discarded.

Thus “his lit torches” will invoke two of these rules at once.

(4) If there are no objects left, fail the token, as no choice can make sense.
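Steps (3) and (4) can be sketched the same way. The `light` flag and the treatment of ownership as "is a direct child of the person" are simplifying assumptions. A result of `None` stands for failing the token.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Obj:
    name: str
    parent: object = None
    light: bool = False

def apply_descriptors(objs, descriptors, actor, location, possessor=None):
    """Step (3) then step (4): the surviving objects, or None to fail."""
    kept = []
    for o in objs:
        if "my" in descriptors and o.parent is not actor:
            continue
        if "that" in descriptors and o.parent is not location:
            continue
        if "lit" in descriptors and not o.light:
            continue
        if "unlit" in descriptors and o.light:
            continue
        if "his" in descriptors and o.parent is not possessor:
            continue                 # simplification: owned = carried
        kept.append(o)
    return kept or None
```

For “his lit torches”, the “his” and “lit” rules both apply to each torch.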

(5) It is now certain that the token will not fail. The remaining objects are assigned a score as follows:

1000 × C points, where C is the return value of ChooseObjects(object,2). (0 ≤ C ≤ 9. If the designer doesn't provide this entry point at all then C = 0.)
500 points for being “good” (see (1) above).
100 points for not having concealed.

P points depending on the object's position:

$$
P = \begin{cases}
A & \text{if the object belongs to the actor,}\\
L & \text{if the object belongs to the actor's visibility ceiling,}\\
20 & \text{if the object belongs anywhere else except the compass,}\\
0 & \text{if the object belongs to the compass.}
\end{cases}
$$

(Recall that “visibility ceiling” usually means “location” and that the objects belonging to the compass are exactly the compass directions.) The values A and L depend on the token being parsed:

$$
(A, L) = \begin{cases}
(60, 40) & \text{for } \texttt{held} \text{ or } \texttt{multiheld} \text{ tokens,}\\
(40, 60) & \text{otherwise.}
\end{cases}
$$

10 points for not having scenery.
5 points for not being the actor object.
1 point if the object's gender, number and animation (GNA) matches one possibility implied by some pronoun typed by the player: for instance “them” in English implying plural, or “le” in French implying masculine singular.
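The score is plain arithmetic, sketched here in Python with hypothetical field names. `c` is the value ChooseObjects returned for code 2 (0 when the entry point is absent), and `good` is the verdict of step (1).

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Obj:
    name: str
    parent: object = None
    concealed: bool = False
    scenery: bool = False

def position_points(obj, token, actor, ceiling, compass):
    a, l = (60, 40) if token in ("held", "multiheld") else (40, 60)
    if obj.parent is actor:
        return a
    if obj.parent is ceiling:
        return l
    if obj.parent is compass:
        return 0
    return 20                                # anywhere else

def score(obj, token, actor, ceiling, compass, c=0, good=False, gna_match=False):
    """Step (5) score for one remaining object."""
    s = 1000 * c
    s += 500 if good else 0
    s += 0 if obj.concealed else 100
    s += position_points(obj, token, actor, ceiling, compass)
    s += 0 if obj.scenery else 10
    s += 0 if obj is actor else 5
    s += 1 if gna_match else 0
    return s
```

For a key held by the player, a `held` token gives 500 + 100 + 60 + 10 + 5 = 675 points. A key lying in the room gets 100 + 40 + 10 + 5 = 155, because it is not good. All the terms other than C add up to at most 500 + 100 + 60 + 10 + 5 + 1 = 676. A single extra point of C therefore always outweighs them.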

(6d) In “definite mode” (for instance, when the player has typed a definite article like “the”), if a single object has the highest score, choose that object.

(7ip) The following rule applies only in indefinite mode and provided the player has typed something definitely implying a plural, such as the words “all” or “three” or “coins”. Here the parser already has a target number of objects to choose: for instance 3 for “three”, or the special value of 100, meaning “an unlimited number”, for “all” or “coins”.
Go through the list of objects in “best guess” order (see below). Mark each as “accept” unless:

it has worn or concealed;
or the action is Take or Remove and the object is held by the actor;
or the token is multiheld or multiexcept and the object isn't held by the actor;
or the target number is “unlimited” and S/20 (rounded down to the nearest integer), where S is the score of the object, has fallen below its maximum value, that is, its value for the first, highest-scoring object in the list.

The entry point ChooseObjects(object,accept_flag) is now called and can overrule the “accept”/“reject” decision either way. We keep accepting objects like this until the target is reached, or proves impossible to reach.
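Rule (7ip) is a loop over the best-guess order. This Python sketch takes (object, score) pairs already in that order, and its object fields are hypothetical. `UNLIMITED` is the special value 100 from the text. The S/20 band is fixed by the first object examined, which is the highest-scoring one.

```python
from dataclasses import dataclass

UNLIMITED = 100    # the special target for "all" or a bare plural

@dataclass(eq=False)
class Obj:
    name: str
    parent: object = None
    worn: bool = False
    concealed: bool = False

def choose_plural(ordered, target, action, token, actor, choose_objects=None):
    chosen = []
    top_band = None
    for obj, s in ordered:
        if len(chosen) >= target:
            break                                  # target reached
        accept = not (obj.worn or obj.concealed)
        if top_band is None:
            top_band = s // 20                     # set by the best guess
        elif target == UNLIMITED and s // 20 < top_band:
            accept = False
        if action in ("Take", "Remove") and obj.parent is actor:
            accept = False
        if token in ("multiheld", "multiexcept") and obj.parent is not actor:
            accept = False
        if choose_objects is not None:             # the designer may overrule
            reply = choose_objects(obj, int(accept))
            if reply == 1:
                accept = True
            elif reply == 2:
                accept = False
        if accept:
            chosen.append(obj)
    return chosen
```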

(8) The objects are now grouped so that any set of indistinguishable objects forms a single group. “Indistinguishable” means that no further text typed by the player could clarify which is meant (see §29). Note that there is no reason to suppose that two indistinguishable objects have the same score, because they might be in different places.
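Grouping can be sketched as sorting objects into buckets by whatever would let the player tell them apart. Here `key` is a hypothetical stand-in for "the words that could name this object". The real test is the one described in §29.

```python
def group_indistinguishable(objs, key):
    """Bucket objects that no further typed text could tell apart."""
    groups = {}
    for obj in objs:
        groups.setdefault(key(obj), []).append(obj)
    return list(groups.values())    # groups in order of first appearance
```

The gold coins below form a single group even though they lie in different places.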

(9d) In definite mode, we know that there's a tie for highest score, as otherwise a choice would have been made at step (6d). If these highest-scoring objects belong to more than one group, then ask the player to choose which group:

You can see a bronze coin and four gold coins here.
>get coin
Which do you mean, the bronze coin or a gold coin?
>gold

The player's response is inserted textually into the original input and the parsing begins again from scratch with “get gold coin” instead of “get coin”.

(10) Only two possibilities remain: either (i) we are in indefinite but singular mode, or (ii) we are in definite mode and there is a tie for highest-scoring object and all of these equal-highest objects belong to the same group. Either way, choose the “best guess” object (see below). Should this parsing attempt eventually prove successful, print an “inference” on screen, such as

>get key
(the copper key)

only if the number of groups found in (8) is more than 1.
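Steps (6d), (9d) and (10) together decide whether to choose outright, ask a question or guess. The following Python sketch covers singular parsing only. Candidates are (name, score) pairs in the order the parser matched them. `group_key` is a hypothetical function that names each object's group from step (8).

```python
def resolve_singular(candidates, group_key, definite):
    best = max(score for _, score in candidates)
    top = [name for name, score in candidates if score == best]
    if definite and len(top) == 1:
        return ("choose", top[0])                       # step (6d)
    top_groups = list(dict.fromkeys(group_key(n) for n in top))
    if definite and len(top_groups) > 1:
        return ("ask", top_groups)                      # step (9d)
    all_groups = {group_key(n) for n, _ in candidates}
    # step (10): best guess, plus whether to print an inference
    return ("guess", top[0], len(all_groups) > 1)
```

With one bronze coin and four gold coins on equal scores, “get coin” leads to the question in (9d).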

(BG) It remains to define “best guess”. From a set of objects, the best guess is the highest-scoring one not yet guessed; and if several objects have equal highest scores, it is the earliest one to have been matched by the parser. In practice this means the one most recently taken or dropped, because the parser tries to match against objects by traversing the object-tree, and the most recently moved object tends to be the first in the list of children of its parent.
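Python's `sorted` is stable, so "highest score first, earliest match among ties" is a one-line sort. The `move` helper is a hypothetical model of the object tree in which a moved object becomes the first child of its new parent. This is what makes recently handled objects win ties, assuming the parser matches children in list order.

```python
def move(obj, children):
    """Model of 'move': obj becomes the first child of its new parent."""
    if obj in children:
        children.remove(obj)
    children.insert(0, obj)

def best_guess_order(matches):
    """matches: (name, score) pairs in the order the parser matched them.
    Highest score first; ties keep match order because sorted() is stable."""
    return [name for name, _ in sorted(matches, key=lambda m: -m[1])]
```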

• REFERENCES
See ‘Balances’ for a usage of ParserError.
•Infocom's parser typically produces error messages like “I don't know the word ‘tarantula’.” when the player types a word not in the game's dictionary, and some designers prefer this style to Inform's give-nothing-away approach (Inform tries not to let the player carry out experiments to see what is, and what is not, in the dictionary). Neil Cerutti's "dunno.h" library extension restores the Infocom format.
•The library extension "calyx_adjectives.h", which resolves ambiguities in parsing by placing more weight on matches with nouns than with adjectives, works by using ChooseObjects.