The Perl INTERCAL compiler

... Parsers

CLC-INTERCAL 1.-94 no longer includes a parser. Instead, it contains a parser generator. The source language for the parser generator is a includes the CREATE statement and the ability to assign to special registers (to control the runtime operating mode). This document describes the syntax of a CREATE statement, and shows an example of a simple parser for a subset of INTERCAL-72

Syntax

The CREATE statement has the form:
    CREATE grammar class template AS code
The grammar is optional, and defaults to the contents of the value contained in special register %GR. If a grammar is specified, it must be a constant (computed grammar are not yet supported at the syntax level, although fully supported if you CREATE the appropriate syntax), specified by a crawling horror ("_") followed by a number. For example, assuming that %GR contains #2, the following are equivalent:
    CREATE _2 THE ,UNIVERSE, AS GOD
    CREATE THE ,UNIVERSE, AS GOD
(Please forgive us the minor poetic licence: there is actually no code for GOD in the distribution, although you can always add it to the assembler, asm.iacc).

The class specifies a syntactic class (some other languages might call it a nonterminal). Usually, this takes the form of a what ("?") followed by some alphanumerics, although anything which evaluates to a number will do. Please note that this in CLC-INTERCAL the what does not introduce a unary logical operator as in C-INTERCAL, and it always produces a named constant (of sorts).

The template is composed of a sequence of terminals and nonterminals which vaguely resemble the syntax you are trying to define. Nonterminals are specified as a special type of constant, usually introduced by the "what" discussed before. Terminals are specified as "array slices", that is sequences of numbers enclosed in tails or hybrids and representing a 16- or 32-bit array. The text which would be produced by READing OUT these arrays specifies the syntax to define. For example, consider the following line from 1972.iacc:

    DO CREATE _1 ?VERB ?LVALUE ,#91 + #91 + #79, ,#95 + #91 + #67,
       ?EXPRESSION AS GER + #1 + STO + ?EXPRESSION #1 + ?LVALUE #1
Ignore for just now the bit after the AS. We are creating something in class ?VERB of grammar _1 (which by default contains the compiler - the compiler compiler is in grammar _2). There is a nonterminal (?LVALUE), which presumably will match a register and anything else you can assign to; then there are two terminals. To see what these are, you can run the following program:
    PLEASE ,1 <- #3
    DO ,1 SUB #1 <- #91
    DO ,1 SUB #2 <- #91
    DO ,1 SUB #3 <- #72
    PLEASE ,2 <- #3
    DO ,2 SUB #1 <- #95
    DO ,2 SUB #2 <- #91
    DO ,2 SUB #3 <- #67
    PLEASE READ OUT ,1 + ,2
    DO GIVE UP
We assign the values to 16-bit arrays because the slices are encloses in tails. If they had been hybrids, we'd use 32-bit arrays. Running the program, we get "<-". This is not a coincidence: the above statement CREATEs the assignment. It should not surprise anybody that the next nonterminal is called ?EXPRESSION.

ICBM

A suitable sequence of CREATE statements can define the syntax for all the known variants of INTERCAL. The semantics is defined by the code part of the CREATE statements. Before detailing how the semantics is specified, we must take a detour.

The INTERCAL Common Bytecode Machine (ICBM) is an assembler-like language which is used by the CLC-INTERCAL compiler as an intermediate language. It can be interpreted directly by the bytecode interpreter, or run directly on any computer with the INTERCAL Operating System; alternatively, a second level compiler will be made available to convert ICBM programs to other languages, for example assembler, Perl, or DD/SH. Please note that it is no longer possible to create a code generator for C, FORTRAL, COBOL or Pascal because the target language must allow self-modifying programs.

ICBM assumes a memory model with multiple stashes: any value can be stashed and retrieved on any of the stashes, which function independently of each other. There is no limit to the size of an element on a stash, or to the number of elements in each stash. Memory allocated to a stash element is automatically freed when the element is removed from the stash.

ICBM programs are inherently multi-threaded. Memory between the threads is shared unless otherwise specified. When a thread terminates, it remains in memory until it is absolutely clear that the thread cannot came back to life. This is important, since threads in quantum programs can have their behaviour modified after they have completed execution.

ICBM programs also contain a number of compilers. Usually, there will be a "program compiler" and a "compiler compiler", provided by the system. When compiling a compiler, however, just the "compiler compiler" is provided: the "program compiler" is built during the execution. A compiler compiler is a special type of compiler, which replaces the original compiler compiler after (re-)building itself. In other words, the compiler compiler coincides with the compiler compiler compiler; this is necessary to avoid infinite regression.

Each block of ICBM code in a program has an associated source code and a compiler used to compile it. This allows to avoid to recompile a block if the compiler has not changed since the last time it has been compiled. A separate document describes this "just too late" compiler in some more detail.

ICBM code is described in detail in the documentation provided with the Perl module Language::INTERCAL::ByteCode. Use the perldoc program to access it.

Semantics

The semantics of a compiled object is specified in the CREATE statement after the AS keyword. It consists of a list of mnemonics (separated by intersection symbols). If the template refers to n other templates, the codes generated by these templates will be available in the code by repeating the template identifier (with its own "what" or whatever was used to introduce it) followed by an expression. If there is only one instance of each different template, the expression must be #1, otherwise it indicates which instance to use. For example, considering again the assignment statement from 1972.iacc:
    DO CREATE _1 ?VERB ?LVALUE ,#91 + #91 + #79, ,#95 + #91 + #67,
       ?EXPRESSION AS GER + #1 + STO + ?EXPRESSION #1 + ?LVALUE #1
We see that we refer to both the ?EXPRESSION and the ?LVALUE: since there is one of each, these references are followed by #1. Suppose the statement to execute is DO .1 <- #2. Then the ?LVALUE will generate SPO + #1 and the ?EXPRESSION generates #2. So, replacing what needs to be replaced, the assignment produces:
    GER + #1 + STO + #2 + SPO + #1
Executing it, we first check whether gerund #1 is to be ABSTAINed FROM; if it is not, we STOre #2 to register SPOt #1. Just as we required.

Example

See the various compiler parsers provided with the distribution for plenty of examples.

These examples should be self-explanatory. If not, reread this document.


Back