TDF Notation Compiler

January 1998

1 - Introduction

2 - Input classes

2.1 - Delimiters
2.2 - White space
2.3 - Comments
2.4 - Identifiers
2.5 - Numbers
2.6 - Strings
2.7 - Blanks
2.8 - Bars

3 - Input syntax

3.1 - Basic syntax
3.2 - Sorts
3.3 - Numbers and strings
3.4 - Tokens, tags, alignment tags and labels
3.5 - Outer level syntax
3.6 - Included files
3.7 - Internal and external names
3.8 - Token declarations
3.9 - Token definitions
3.10 - Alignment tag declarations
3.11 - Alignment tag definitions
3.12 - Tag declarations
3.13 - Tag definitions

4 - Shape checking

5 - Remarks

6 - Limitations

7 - Manual Page for tnc

1. Introduction

The TDF notation compiler, tnc, is a tool for translating TDF capsules to and from text. This paper gives a brief introduction to how to use this utility and to the syntax of the textual form of TDF. The version described here supports version 3.1 of the TDF specification.

tnc has four modes: two input modes and two output modes. These are as follows:

decode - translate an input TDF capsule into the tnc internal representation,
read - translate an input text file into the internal representation,
encode - translate the internal representation into an output TDF capsule,
write - translate the internal representation into an output text file.

Due to the modular nature of the program it is possible to form versions of tnc in which not all the modes are available. Passing the -version flag to tnc causes it to report which modes it has implemented.

Any application of tnc consists of the composite of an input mode and an output mode. The default action is read-encode, i.e. translate an input text file into an output TDF capsule. Other modes may be specified by passing the following command line options to tnc:

-decode or -d,
-read or -r,
-encode or -e,
-write or -w.

The only other really useful action is decode-write, i.e. translate an input TDF capsule into an output text file. This may also be specified by the -print or -p option. The actions decode-encode and read-write are not precise identities; they do, however, give equivalent input and output files.

In addition, the decode mode may be modified to accept a TDF library as input rather than a TDF capsule by passing the additional flag:

-lib or -l,

to tnc.

The overall syntax for tnc is as follows:

	tnc [ options ... ] input_file [ output_file ]

If the output file is not specified, the standard output is used.
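
As an illustration of how the input and output modes combine, the following Python sketch builds a tnc argument list from a pair of modes. The function name tnc_command and the file names prog.t and prog.j are hypothetical, chosen only for this example; the flags are those listed above.

```python
def tnc_command(input_file, output_file=None, input_mode="read",
                output_mode="encode", library=False):
    """Build a tnc argument list from one input mode and one output mode."""
    input_flags = {"read": "-r", "decode": "-d"}
    output_flags = {"encode": "-e", "write": "-w"}
    if input_mode not in input_flags or output_mode not in output_flags:
        raise ValueError("unknown mode")
    if library and input_mode != "decode":
        # -l only modifies decode mode
        raise ValueError("a TDF library can only be read in decode mode")
    argv = ["tnc", input_flags[input_mode], output_flags[output_mode]]
    if library:
        argv.append("-l")
    argv.append(input_file)
    if output_file is not None:
        # with no output file, tnc writes to the standard output
        argv.append(output_file)
    return argv


# Hypothetical file names: text in prog.t, capsule in prog.j.
print(tnc_command("prog.t", "prog.j"))
print(tnc_command("prog.j", input_mode="decode", output_mode="write"))
```

The resulting list could be handed to subprocess.run on a system where tnc is installed. The second call describes the same action as the -p option.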

2. Input classes

The rest of this paper is concerned with the form required of the input text file. The input can be divided into eight classes.

2.1. Delimiters

The characters ( and ) are used as delimiters to impose a syntactic structure on the input.

2.2. White space

White space comprises sequences of space, tab and newline characters, together with comments (see below). It is not significant to the output (TDF notation is completely free-form), and serves only to separate syntactic units. Every identifier, number etc. must be terminated by a white space or a delimiter.

2.3. Comments

Comments may be inserted in the input at any point. They begin with a # character and run to the end of the line.

2.4. Identifiers

An identifier consists of any sequence of characters drawn from the following set: upper case letters, lower case letters, decimal digits, underscore (_), dot (.), and tilde (~), which does not begin with a decimal digit. tnc generates names beginning with double tilde (~~) for unnamed objects when in decode mode, so the use of such identifiers is not recommended.

2.5. Numbers

Numbers can be given in octal (prefixed by 0), decimal, or hexadecimal (prefixed by 0x or 0X). Both upper and lower case letters can be used for hex digits. A number can be preceded by any number of + or - signs.
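
The number forms above can be sketched in Python as follows. This is an illustration rather than the tnc implementation: the name parse_number is hypothetical, and the rule that each - sign negates the value (so that two cancel) is an assumption, since the text only says that any number of signs may appear.

```python
import re

# signs, then hexadecimal (0x...), octal (leading 0) or decimal digits
_NUMBER = re.compile(
    r"([+-]*)(?:0[xX]([0-9a-fA-F]+)|(0[0-7]*)|([1-9][0-9]*))"
)


def parse_number(text):
    """Convert a number in the textual notation to a Python int."""
    m = _NUMBER.fullmatch(text)
    if m is None:
        raise ValueError("not a number: %r" % text)
    signs, hex_digits, octal_digits, decimal_digits = m.groups()
    if hex_digits is not None:
        value = int(hex_digits, 16)
    elif octal_digits is not None:
        value = int(octal_digits, 8)
    else:
        value = int(decimal_digits, 10)
    # Assumption: every '-' flips the sign, '+' has no effect.
    if signs.count("-") % 2:
        value = -value
    return value


for sample in ("017", "0x1F", "-42", "--5"):
    print(sample, parse_number(sample))
```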

2.6. Strings

A string consists of a sequence of characters enclosed in double quotes ("). The following escape sequences are recognised:

\n represents a newline character,
\t represents a tab character,
\xxx, where xxx consists of three octal digits, represents the character with ASCII code xxx.

Newlines are not allowed in strings unless they are escaped. For all other escaped characters, \x represents x.
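
The escape rules can be illustrated by the following Python sketch, which turns a quoted string into the characters it denotes. The name decode_string is hypothetical. A backslash followed by fewer than three octal digits is treated here under the "all other escaped characters" rule, which is an assumption about a case the text does not spell out.

```python
def decode_string(literal):
    """Decode a double-quoted string of the notation into its characters."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("string must be enclosed in double quotes")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\n":
            raise ValueError("unescaped newline in string")
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        octal = body[i + 1:i + 4]
        if len(octal) == 3 and all(c in "01234567" for c in octal):
            out.append(chr(int(octal, 8)))   # \xxx, three octal digits
            i += 4
        elif body[i + 1] == "n":
            out.append("\n")                 # \n is a newline
            i += 2
        elif body[i + 1] == "t":
            out.append("\t")                 # \t is a tab
            i += 2
        else:
            out.append(body[i + 1])          # \x represents x
            i += 2
    return "".join(out)


print(repr(decode_string(r'"\101\102\tC"')))
```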

2.7. Blanks

A single minus character (-) has a special meaning. It may be used to indicate the absence of an optional argument or optional group of arguments.

2.8. Bars

A single vertical bar (|) has a special meaning. It may be used to indicate the end of a sequence of repeated arguments.
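
To summarise the eight input classes, here is a minimal Python sketch of a lexer that classifies input text. It is not taken from tnc: the names TOKEN_RE and tokenize and the class labels are hypothetical, and the sketch does not enforce the rule that every identifier or number must be followed by white space or a delimiter.

```python
import re

# One alternative per input class; white space and comments are skipped.
TOKEN_RE = re.compile(r'''
    (?P<space>\s+)
  | (?P<comment>\#[^\n]*)
  | (?P<delim>[()])
  | (?P<string>"(?:\\.|[^"\\\n])*")
  | (?P<number>[+-]*(?:0[xX][0-9a-fA-F]+|[0-9]+))
  | (?P<ident>[A-Za-z_.~][A-Za-z0-9_.~]*)
  | (?P<blank>-)
  | (?P<bar>\|)
''', re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Return (class, text) pairs for the significant items in the input."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(
                "unexpected character %r at offset %d" % (text[pos], pos))
        if m.lastgroup not in ("space", "comment"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for item in tokenize('( make_nof 1 -2 | - "a b" ) # note'):
    print(item)
```

The number alternative is tried before the blank alternative, so that -2 is read as a number while a lone - is read as a blank.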

3. Input syntax

3.1. Basic syntax

The basic input syntax is very simple. A construct consists of an identifier followed by a list of arguments, all enclosed in brackets in a Lisp-like fashion. Each argument can be an identifier, a number, a string, a blank, a bar, or another construct. There are further restrictions on this basic syntax, described below.

	construct	: ( identifier arglist )

	argument	: construct
			| identifier
			| number
			| string
			| blank
			| bar

	arglist		: (empty)
			| argument arglist

The construct ( identifier ), with an empty argument list, is equivalent to the identifier argument identifier. The two may be used interchangeably.
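
The grammar above is small enough to be parsed by a short recursive routine. The following Python sketch is one possible reading of it, not the tnc parser: the name parse is hypothetical, tokens are obtained simply by splitting on white space (so strings containing spaces and comments are not handled), and the head of a construct is not checked to be an identifier. A construct with an empty argument list is returned as the bare identifier, reflecting the equivalence just described.

```python
def parse(text):
    """Parse notation text into nested lists, one entry per outer argument."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def argument():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected )")
        if tok != "(":
            return tok                      # identifier, number, blank or bar
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("a construct must begin with an identifier")
        head = tokens[pos]
        pos += 1
        args = []
        while pos < len(tokens) and tokens[pos] != ")":
            args.append(argument())
        if pos >= len(tokens):
            raise ValueError("missing )")
        pos += 1                            # consume the closing bracket
        # ( identifier ) is treated the same as identifier
        return [head] + args if args else head

    result = []
    while pos < len(tokens):
        result.append(argument())
    return result


print(parse("( contents arg1 ( true ) )"))
```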

3.2. Sorts

Except at the outermost level, which forms a special case discussed below, every construct and argument has an associated sort. This is one of the basic TDF sorts: access, al_tag, alignment, bitfield_variety, bool, callees, error_code, error_treatment, exp, floating_variety, label, nat, ntest, procprops, rounding_mode, shape, signed_nat, string, tag, transfer_mode, variety, tdfint or tdfstring.

Ignoring for the moment the shorthands discussed below, the ways of creating constructs of a given sort, say exp, correspond to the TDF constructs delivering that sort. For example, contents takes a shape and an exp and delivers an exp. Thus:

	( contents arg1 arg2 )

where arg1 is an argument of sort shape and arg2 is an argument of sort exp, is a sort-correct construct. Only constructs which are sort correct in this sense are allowed.

As another example, because of the rule concerning constructs with no arguments, both

	( true )

and

	false

are valid constructs of sort bool.

TDF constructs which take lists of arguments are easily dealt with. For example:

	( make_nof arg1 ... argn )

where arg1, ..., argn are all arguments of sort exp, is valid. A vertical bar may be used to indicate the end of a sequence of repeated arguments.

Optional arguments should be entered normally if they are present. Their absence may be indicated by means of a blank (minus sign), or by simply omitting the argument.

The vertical bar and blank should be used whenever the input is potentially ambiguous. Particular care should be taken with apply_proc (which is genuinely ambiguous) and labelled.

The TDF specification should be consulted for a full list of valid TDF constructs and their argument sorts. Alternatively the tnc help facility may be used. The command:

	tnc -help cmd1 ... cmdn

prints sort information on the constructs or sorts cmd1, ..., cmdn. Alternatively:

	tnc -help

prints this information for all constructs. (To obtain help on the sort alignment as opposed to the construct alignment use alignment_sort. This confusion cannot occur elsewhere.)

3.3. Numbers and strings

Numbers can occur in two contexts: as the argument to the TDF construct make_nat or to the TDF construct make_signed_nat. In the former case the number must not be negative. The following shorthands are understood by tnc:

	number for ( make_nat number )
	number for ( make_signed_nat number )

depending on whether a construct of sort nat or signed_nat is expected.

Strings are nominally of sort tdfstring. They are taken to be simple strings (8 bits per character). Multibyte strings (those with other than 8 bits per character) may be represented by means of the multi_string construct. This takes the form:

	( multi_string b c1 ... cn )

where b is the number of bits per character and c1, ..., cn are the codes of the characters comprising the string. These multibyte strings cannot be used as external names.

In addition, a simple (8 bit) string can be used as a shorthand for a TDF construct of sort string, as follows:

	string for ( make_string string )

3.4. Tokens, tags, alignment tags and labels

In TDF simple tokens, tags, alignment tags and labels are represented by numbers which may, or may not, be associated with external names. In tnc, however, they are represented by identifiers. This introduces a problem of scoping which does not occur in TDF. The rules are that all tokens, tags, alignment tags and labels must be declared before they are used. Externally defined objects have global scope, and the scope of a formal argument in a token definition is the definition body. For those constructs which introduce a local tag or label - for example, identify, make_proc, make_general_proc and variable for tags and conditional, labelled and repeat for labels - the scope of the object is as set out in the TDF specification.

The following shorthands are understood by tnc, according to the argument sort expected:

	tag_id for ( make_tag tag_id )
	al_tag_id for ( make_al_tag al_tag_id )
	label_id for ( make_label label_id )
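
The shorthands for numbers, strings, tags, alignment tags and labels all depend on the sort expected at that point. The following Python sketch shows the idea of such a sort-directed expansion. The names SHORTHANDS and expand_shorthand are hypothetical, and the sketch works on plain text rather than on tnc's internal representation.

```python
# Expected sort -> construct that the bare argument stands for.
SHORTHANDS = {
    "nat": "make_nat",
    "signed_nat": "make_signed_nat",
    "string": "make_string",
    "tag": "make_tag",
    "al_tag": "make_al_tag",
    "label": "make_label",
}


def expand_shorthand(expected_sort, argument):
    """Return the full construct for a shorthand argument of the given sort."""
    constructor = SHORTHANDS.get(expected_sort)
    if constructor is None or argument.startswith("("):
        # No shorthand for this sort, or already written out in full.
        return argument
    return "( %s %s )" % (constructor, argument)


print(expand_shorthand("nat", "42"))
print(expand_shorthand("signed_nat", "42"))
print(expand_shorthand("label", "loop"))
```

The same text, 42, expands differently depending on whether a nat or a signed_nat is expected, which is why care is needed where the expected sort is ambiguous.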

The syntax for token applications is as follows:

	( apply_construct ( token_id arg1 ... argn ) )

where apply_construct is the appropriate TDF token application construct, for example, exp_apply_token for tokens declared to deliver exp's. The token arguments arg1, ..., argn must be of the sorts indicated in the token declaration or definition. For tokens without any arguments the alternative form:

	( apply_construct token_id )

is allowed.

The token application above may be abbreviated to:

	( token_id arg1 ... argn )

the result sort being known from the token declaration. This in turn may be abbreviated to:

	token_id

when there are no token arguments.

Care needs to be taken with these shorthands, as they can lead to confusion, particularly when, due to optional arguments or lists of arguments, tnc is not sure what sort is coming next. The five categories of objects represented by identifiers - TDF constructs, tokens, tags, alignment tags and labels - occupy separate name spaces, but it is a good idea to try to avoid duplication of names.

By default all these shorthands are used by tnc in write mode. If this causes problems, the -V flag should be passed to tnc.

3.5. Outer level syntax

At the outer level tnc is expecting a sequence of constructs of the following forms:

an included file,
a token declaration,
a token definition,
an alignment tag declaration,
an alignment tag definition,
a tag declaration,
a tag definition.

3.6. Included files

Included files may be of three types - text, TDF capsule or TDF library. For TDF capsules and libraries there are two include modes. The first just decodes the given capsule or set of capsules. The second scans through them to extract token declaration information. These declarations appear in the output file only if they are used elsewhere.

The syntax for an included text file is:

	( include string )

where string is a string giving the pathname of the file to be included. tnc applies read to this sub-file before continuing with the present file.

Similarly, the syntaxes for included TDF capsules and libraries are:

	( code string )
	( lib string )

respectively. tnc applies decode to this capsule or set of capsules (provided this mode is available) before continuing with the present file.

The syntaxes for extracting the token declaration information from a TDF capsule or library are:

	( use_code string )
	( use_lib string )

Again, these rely on the decode mode being available.

3.7. Internal and external names

All tokens, tags and alignment tags have an internal name, namely the associated identifier, but this name does not necessarily appear in the corresponding TDF capsule. There must firstly be an associated declaration or definition at the outer level - tags internal to a piece of TDF do not have external names. Even then we may not wish this name to appear at the outer level, because it is local to this file and is not required for linking purposes. Alternatively we may wish a different external name to be associated with it in the TDF capsule.

As an example of how tnc allows for this, consider token declarations (although similar remarks apply to token definitions, alignment tag definitions etc.). The basic form of the token declaration is:

	( make_tokdec token_id ... )

This creates a token with both internal and external names equal to token_id. Alternatively:

	( local make_tokdec token_id ... )

creates a token with internal name token_id but no external name. This allows the creation of tokens local to the current file. Again:

	( make_tokdec ( string_extern string ) token_id ... )

creates a token with internal name token_id and external name given by the string string. For example, to create a token whose external name is not a valid identifier, it would be necessary to use this construct. Finally:

	( make_tokdec ( unique_extern string1 ... stringn ) token_id ... )

creates a token with internal name token_id and external name given by the unique name consisting of the strings string1, ..., stringn.

The local quantifier should be used consistently on all declarations and definitions of the token, tag or alignment tag. The alternative external name, however, should be given only on the first occasion. Thereafter the object is identified by its internal name.

3.8. Token declarations

The basic form of a token declaration is:

	( make_tokdec token_id ( arg1 ... argn ) res )

where the token token_id is declared to take argument sorts arg1, ..., argn and deliver the result sort res. These sorts are given by their sort names, al_tag, alignment, bitfield_variety etc. For a token with no arguments the declaration may be given in the form:

	( make_tokdec token_id res )

A token may be declared any number of times, provided the declarations are consistent.

This basic declaration may be modified in the ways outlined above to specify the external token name.

3.9. Token definitions

The basic form of a token definition is:

	( make_tokdef token_id ( arg1 id1 ... argn idn ) res def )

where the token token_id is defined to take formal arguments id1, ..., idn of sorts arg1, ..., argn respectively and have the value def, which is a construct of sort res. The scope of the tokens id1, ..., idn is def.

For a token with no arguments the definition may be given in the form:

	( make_tokdef token_id res def )

A token may be defined more than once. All definitions must be consistent with any previous declarations and definitions (the renaming of formal arguments is allowed however).

This basic definition may be modified in the ways outlined above to specify the external token name.

3.10. Alignment tag declarations

The basic form of an alignment tag declaration is:

	( make_al_tagdec al_tag_id )

where the alignment tag al_tag_id is declared to exist.

This basic declaration may be modified in the ways outlined above to specify the external alignment tag name.

3.11. Alignment tag definitions

The basic form of an alignment tag definition is:

	( make_al_tagdef al_tag_id def )

where the alignment tag al_tag_id is defined to be def, which is a construct of sort alignment. An alignment tag may be declared or defined more than once, provided the definitions are consistent.

This basic definition may be modified in the ways outlined above to specify the external alignment tag name.

3.12. Tag declarations

The basic forms of a tag declaration are:

	( make_id_tagdec tag_id info dec )
	( make_var_tagdec tag_id info dec )
	( common_tagdec tag_id info dec )

where the tag tag_id is declared to be an identity, variable or common tag with access information info, which is an optional construct of sort access, and shape dec, which is a construct of sort shape. A tag may be declared more than once, provided all declarations and definitions are consistent (including agreement of whether the tag is an identity, a variable or common).

These basic declarations may be modified in the ways outlined above to specify the external tag name.

3.13. Tag definitions

The basic forms of a tag definition are:

	( make_id_tagdef tag_id def )
	( make_var_tagdef tag_id info def )
	( common_tagdef tag_id info def )

where the tag tag_id is defined to be an identity, variable or common tag with value def, which is a construct of sort exp. Non-identity tag definitions also have an optional access construct, info. A tag must have been declared before it is defined, but may be defined any number of times. All declarations and definitions must be consistent (except that common tags may be defined inconsistently) and agree on whether the tag is an identity, a variable, or common.

These basic definitions may be modified in the ways outlined above to specify the external tag name.

4. Shape checking

The input in read (and to a lesser extent decode) mode is checked for shape correctness if the -check or -c flag is passed to tnc. This is not guaranteed to pick up all shape errors, but is better than nothing.

When in write mode the results of the shape checking may be viewed by passing the -cv flag to tnc. Each expression is associated with its shape by means of the:

	( exp_with_shape exp shape ) -> exp

pseudo-construct. Unknown shapes are indicated by ....

5. Remarks

The target independent TDF capsules produced by the C -> TDF compiler, tcc, do not contain declarations or definitions for all the tokens they use. Thus tnc cannot fully decode them as they stand. However the necessary token declaration information may be made available to tnc by using the use_lib construct. The commands:

	( use_lib library )
	( code capsule )

will decode the TDF capsule capsule which uses tokens defined in the TDF library library.
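
A small text file containing these two constructs can be generated mechanically. The following Python sketch does so; the name decode_wrapper and the file names lib.tl and prog.j are hypothetical, and the quoting of the pathnames follows the string escape rules described earlier.

```python
def decode_wrapper(library, capsules):
    """Return notation text that decodes capsules using tokens from library."""

    def quote(path):
        # Escape backslashes and double quotes inside the string.
        return '"' + path.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["( use_lib %s )" % quote(library)]
    lines += ["( code %s )" % quote(capsule) for capsule in capsules]
    return "\n".join(lines) + "\n"


# Hypothetical file names, for illustration only.
print(decode_wrapper("lib.tl", ["prog.j"]), end="")
```

Saving the result to a text file and running tnc on it in read and write modes (-r -w) should then produce the textual form of the capsule, assuming the decode mode is available.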

6. Limitations

The main limitations in the current version of tnc are as follows:

There is no error recovery,
There is no support for foreign sorts,
The support for tokenised tokens is limited and undocumented.

In addition, far more of the checks (scopes, shape checking, checking of consistency of declarations and definitions etc.) are implemented for read mode than for decode mode. To shape check a TDF capsule, it will almost certainly be more effective to translate it into text and check that.

Another limitation is that the scoping rules for local tags do not allow such tags to be accessed outside their scopes using env_offset.

7. Manual Page for tnc

Here is the manual page for tnc.

NAME: tnc - TDF notation compiler

SYNOPSIS: tnc [ options ] input-file [ output-file ]

DESCRIPTION: tnc translates TDF capsules to and from text. It has two input modes, read and decode. In the first, which is default, input-file is a file containing TDF text. In the second input-file is a TDF capsule. There are also two output modes, encode and write. In the first, which is default, a TDF capsule is written to output-file (or the standard output if this argument is absent). In the second, TDF text is written to output-file.

Combinations of these modes give four actions: text to TDF (which is default), TDF to text, text to text and TDF to TDF. The last two actions are not precise identities, but they do give equivalent files.

The form of the TDF text format and more information about tnc can be found in the document The TDF Notation Compiler.

OPTIONS:

-c or -cv or -check Specifies that tnc should apply extra checks to input-file. For example, simple shape checking is applied. These checks are more efficient in read mode than in decode mode. If the -cv option is used in write mode, all the information gleaned from the shape checking appears in output-file.

-d or -decode Specifies that tnc should be in decode mode. That is, that input-file is a TDF capsule.

-e or -encode Specifies that tnc should be in encode mode. That is, that output-file is a TDF capsule.

-help subject ... Makes tnc print its help message on the given subject(s). If no subject is given, all the help messages are printed.

-Idir Adds the directory dir to the search path used by tnc to find included files in read mode.

-l or -lib In decode mode, specifies that input-file is not a TDF capsule, but a TDF library. All the capsules comprising the library are decoded.

-o output-file Gives an alternative method of specifying the output file.

-p or -print Specifies that tnc should be in decode and write modes. That is, that input-file is a TDF capsule and output-file should consist of TDF text. This option makes tnc into a TDF pretty-printer.

-q Specifies that tnc should not check duplicate tag declarations etc. for consistency, but should use the first declaration given.

-r or -read Specifies that tnc should be in read mode. That is, that input-file should consist of TDF text.

-V In write mode, specifies that the output should be in the "verbose" form, with no shorthand forms.

-version Makes tnc print its version number.

-w or -write Specifies that tnc should be in write mode. That is, that output-file should consist of TDF text.

SEE ALSO: tdf(1tdf).