The TDF notation compiler, tnc, is a tool for translating TDF capsules to and from text. This paper gives a brief introduction to how to use this utility and to the syntax of the textual form of TDF. The version described here is the one supporting version 3.1 of the TDF specification.
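tnc has four modes: two input modes and two output modes. These are as follows:
read, in which the input is a file containing TDF text,
decode, in which the input is a TDF capsule,
encode, in which the output is a TDF capsule,
write, in which the output is TDF text.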
Due to the modular nature of the program it is possible to form versions of tnc in which not all the modes are available. Passing the -version flag to tnc causes it to report which modes it has implemented.
Any application of tnc consists of the composite of an input mode and an output mode. The default action is read-encode, i.e. translate an input text file into an output TDF capsule. Other modes may be specified by passing the following command line options to tnc:
-decode or -d,
-read or -r,
-encode or -e,
-write or -w.
The only other really useful action is decode-write, i.e. translate an input TDF capsule into an output text file. This may also be specified by the -print or -p option. The actions decode-encode and read-write are not precise identities, but they do give equivalent input and output files.
In addition, the decode mode may be modified to accept a TDF library as input rather than a TDF capsule by passing the additional flag -lib or -l.
The overall syntax for tnc is as follows:
tnc [ options ... ] input_file [ output_file ]
If the output file is not specified, the standard output is used.
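The way an input mode and an output mode combine into one command line can be sketched in Python. This is only an illustration, not part of tnc: the helper name tnc_command and the file names in the usage note are hypothetical, and the only flags used are -r, -d, -e, -w and -l, as listed in the manual page later in this paper. The function builds an argument list and does not run anything.

```python
# Hypothetical helper (not part of tnc): build the argument list for one
# tnc invocation from an input mode and an output mode.
INPUT_FLAGS = {"read": "-r", "decode": "-d"}
OUTPUT_FLAGS = {"encode": "-e", "write": "-w"}


def tnc_command(input_file, output_file=None, input_mode="read",
                output_mode="encode", library=False):
    """Return the argv list for tnc; nothing is executed."""
    if input_mode not in INPUT_FLAGS:
        raise ValueError("unknown input mode: " + input_mode)
    if output_mode not in OUTPUT_FLAGS:
        raise ValueError("unknown output mode: " + output_mode)
    if library and input_mode != "decode":
        # The paper describes -l only as a modifier of decode mode.
        raise ValueError("-l only modifies decode mode")
    argv = ["tnc", INPUT_FLAGS[input_mode], OUTPUT_FLAGS[output_mode]]
    if library:
        argv.append("-l")
    argv.append(input_file)
    # With no output file, tnc writes to the standard output.
    if output_file is not None:
        argv.append(output_file)
    return argv
```

For instance, tnc_command("prog.j", input_mode="decode", output_mode="write") gives the decode-write action with the text going to the standard output. Passing both mode flags explicitly is a choice made here for clarity; the defaults alone would give read-encode.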
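The rest of this paper is concerned with the form required of the input text file. The input can be divided into eight classes: delimiters, white space, comments, identifiers, numbers, strings, blanks and bars.
The characters ( and ) are used as delimiters to impose a syntactic structure on the input.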
White space comprises sequences of space, tab and newline characters, together with comments (see below). It is not significant to the output (TDF notation is completely free-form), and serves only to separate syntactic units. Every identifier, number etc. must be terminated by a white space or a delimiter.
Comments may be inserted in the input at any point. They begin with a # character and run to the end of the line.
An identifier consists of any sequence of characters drawn from the following set: upper case letters, lower case letters, decimal digits, underscore (_), dot (.), and tilde (~), which does not begin with a decimal digit. tnc generates names beginning with double tilde (~~) for unnamed objects when in decode mode, so the use of such identifiers is not recommended.
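The identifier rule can be expressed as a regular expression. The Python sketch below is an illustration only, assuming that the permitted punctuation is exactly underscore, dot and tilde; the function names are hypothetical and this is not how tnc itself is implemented.

```python
import re

# Letters, digits, "_", "." and "~" are allowed, but the first character
# must not be a decimal digit (assumption: no other characters are legal).
IDENTIFIER = re.compile(r"[A-Za-z_.~][A-Za-z0-9_.~]*")


def is_identifier(text):
    """True if text satisfies the identifier rule stated in this paper."""
    return IDENTIFIER.fullmatch(text) is not None


def clashes_with_generated_names(text):
    """True for names starting with "~~", the style tnc uses in decode mode."""
    return text.startswith("~~")
```

A name such as ~~tag1 is a valid identifier under this rule, but the second function flags it as one that may collide with a name generated in decode mode.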
Numbers can be given in octal (prefixed by 0), decimal, or hexadecimal (prefixed by 0x). Both upper and lower case letters can be used for hex digits. A number can be preceded by any number of + or - signs.
A string consists of a sequence of characters enclosed in double quotes
"). The following escape sequences are recognised:
\n represents a newline character,
\t represents a tab character,
\xxx, where xxx consists of three octal digits, represents the character with ASCII code xxx.
Newlines are not allowed in strings unless they are escaped. For all other escaped characters, \x represents x.
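As an illustration of the escape rules, the following Python sketch decodes the text between the double quotes of a string. It is not tnc's own code. It assumes that any escaped character other than n, t or three octal digits stands for itself, and that a backslash followed by fewer than three octal digits falls under that same rule; the function name is hypothetical.

```python
def decode_string_body(body):
    """Decode the characters found between the quotes of a string."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            if ch == "\n":
                # Newlines are only allowed when escaped.
                raise ValueError("unescaped newline in string")
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("backslash at end of string")
        octal = body[i + 1:i + 4]
        if len(octal) == 3 and all(c in "01234567" for c in octal):
            # \xxx: three octal digits give the character code.
            out.append(chr(int(octal, 8)))
            i += 4
        elif body[i + 1] == "n":
            out.append("\n")
            i += 2
        elif body[i + 1] == "t":
            out.append("\t")
            i += 2
        else:
            # Assumption: any other escaped character stands for itself.
            out.append(body[i + 1])
            i += 2
    return "".join(out)
```

Under these assumptions the body \101BC decodes to ABC, since octal 101 is the ASCII code of A.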
A single minus character (-) has a special meaning. It may be used to indicate the absence of an optional argument or optional group of arguments.
A single vertical bar (|) has a special meaning. It may be used to indicate the end of a sequence of repeated arguments.
The basic input syntax is very simple. A construct consists of an identifier followed by a list of arguments, all enclosed in brackets in a Lisp-like fashion. Each argument can be an identifier, a number, a string, a blank, a bar, or another construct. There are further restrictions on this basic syntax, described below.
construct : ( identifier arglist )
argument : construct | identifier | number | string | blank | bar
arglist : (empty) | argument arglist
The construct ( identifier ), with an empty argument list, is equivalent to the identifier argument identifier. The two may be used interchangeably.
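The basic syntax is small enough to be read by a short recursive parser. The Python sketch below is only an illustration under simplifying assumptions: it does not check sorts, does not verify that the head of a construct is a valid identifier, and keeps numbers, strings, blanks and bars as plain text. All function names are hypothetical and do not describe tnc's implementation.

```python
def tokenize(text):
    """Split input into delimiters, strings and other space-separated units."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in " \t\n":
            i += 1                      # white space only separates units
        elif ch == "#":
            while i < n and text[i] != "\n":
                i += 1                  # comment runs to the end of the line
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1   # skip escaped characters
            if j >= n:
                raise SyntaxError("unterminated string")
            tokens.append(text[i:j + 1])
            i = j + 1
        else:
            j = i
            while j < n and text[j] not in " \t\n()":
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


def parse_argument(tokens, pos):
    """Parse one argument starting at tokens[pos]; return (value, next_pos)."""
    tok = tokens[pos]
    if tok == ")":
        raise SyntaxError("unexpected )")
    if tok != "(":
        return tok, pos + 1
    if pos + 1 >= len(tokens) or tokens[pos + 1] in ("(", ")"):
        raise SyntaxError("construct must start with an identifier")
    head = tokens[pos + 1]
    args = []
    pos += 2
    while pos < len(tokens) and tokens[pos] != ")":
        arg, pos = parse_argument(tokens, pos)
        args.append(arg)
    if pos >= len(tokens):
        raise SyntaxError("missing )")
    # ( identifier ) with no arguments is equivalent to the bare identifier.
    return (head if not args else [head] + args), pos + 1


def parse(text):
    """Parse a whole input text into a list of arguments."""
    tokens = tokenize(text)
    pos = 0
    items = []
    while pos < len(tokens):
        item, pos = parse_argument(tokens, pos)
        items.append(item)
    return items
```

A construct is represented as a Python list whose first element is its identifier, so parse("( make_nof a b )") yields a single list of three strings, while parse("( true )") yields just the string true.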
Except at the outermost level, which forms a special case discussed below, every construct and argument has an associated sort. This is one of the basic TDF sorts, such as exp, shape, nat, signed_nat, string, alignment, access, tag or label.
Ignoring for the moment the shorthands discussed below, the ways of creating constructs of sort exp, say, correspond to the TDF constructs delivering an exp. For example, contents takes a shape and an exp and delivers an exp. Thus:
( contents arg1 arg2 )
where arg1 is an argument of sort shape and arg2 is an argument of sort exp, is a sort-correct construct. Only constructs which are sort correct in this sense are allowed.
As another example, because of the rule concerning constructs with no arguments, both:
( true )
and:
false
are valid constructs of sort bool.
TDF constructs which take lists of arguments are easily dealt with. For example:
( make_nof arg1 ... argn )
where arg1, ..., argn are all arguments of sort exp, is valid. A vertical bar may be used to indicate the end of a sequence of repeated arguments.
Optional arguments should be entered normally if they are present. Their absence may be indicated by means of a blank (minus sign), or by simply omitting the argument.
The vertical bar and blank should be used whenever the input is potentially ambiguous. Particular care should be taken with apply_proc (which is genuinely ambiguous) and labelled.
The TDF specification should be consulted for a full list of valid TDF constructs and their argument sorts. Alternatively the tnc help facility may be used. The command:
tnc -help cmd1 ... cmdn
prints sort information on the constructs or sorts cmd1, ..., cmdn, and:
tnc -help
prints this information for all constructs. (To obtain help on the sort alignment as opposed to the construct alignment, use alignment_sort. This confusion cannot occur elsewhere.)
Numbers can occur in two contexts, as the argument to the TDF constructs make_nat and make_signed_nat. In the former case the number must be positive. The following shorthands are understood by tnc:
number for ( make_nat number )
number for ( make_signed_nat number )
depending on whether a construct of sort nat or signed_nat is expected.
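The effect of the number shorthand can be sketched as a function that writes a bare number out in full once the expected sort is known. This Python fragment is an illustration only; the function name is hypothetical, and accepting zero for make_nat is an assumption that "positive" above means "not negative".

```python
def expand_number(value, expected_sort):
    """Return the full construct that a bare number is shorthand for."""
    if expected_sort == "nat":
        if value < 0:
            # Assumption: zero is acceptable, negative values are not.
            raise ValueError("make_nat needs a non-negative number")
        return "( make_nat %d )" % value
    if expected_sort == "signed_nat":
        return "( make_signed_nat %d )" % value
    raise ValueError("numbers are not a shorthand for sort " + expected_sort)
```

The same text, 42 say, therefore stands for ( make_nat 42 ) in one context and ( make_signed_nat 42 ) in another, which is why the expected sort has to be known before the shorthand can be resolved.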
Strings are nominally of sort tdfstring. They are taken to be simple strings (8 bits per character). Multibyte strings (those with other than 8 bits per character) may be represented by means of the multi_string construct. This takes the form:
( multi_string b c1 ... cn )
where b is the number of bits per character and c1, ..., cn are the codes of the characters comprising the string. These multibyte strings cannot be used as external names.
In addition, a simple (8 bit) string can be used as a shorthand for
a TDF construct of sort
string, as follows:
string for ( make_string string )
In TDF simple tokens, tags, alignment tags and labels are represented by numbers which may, or may not, be associated with external names. In tnc, however, they are represented by identifiers. This brings the problem of scoping, which does not occur in TDF. The rules are that all tokens, tags, alignment tags and labels must be declared before they are used. Externally defined objects have global scope, and the scope of a formal argument in a token definition is the definition body. For those constructs which introduce a local tag or label (for example, variable for tags and repeat for labels) the scope of the object is as set out in the TDF specification.
The following shorthands are understood by tnc, according to the argument sort expected:
tag_id for ( make_tag tag_id )
al_tag_id for ( make_al_tag al_tag_id )
label_id for ( make_label label_id )
The syntax for token applications is as follows:
( apply_construct ( token_id arg1 ... argn ) )
where apply_construct is the appropriate TDF token application construct, for example, exp_apply_token for tokens declared to deliver an exp. The token arguments arg1, ..., argn must be of the sorts indicated in the token declaration or definition. For tokens without any arguments the alternative form:
( apply_construct token_id )
is allowed.
The token application above may be abbreviated to:
( token_id arg1 ... argn )
the result sort being known from the token declaration. This in turn may be abbreviated to:
token_id
when there are no token arguments.
Care needs to be taken with these shorthands, as they can lead to confusion, particularly when, due to optional arguments or lists of arguments, tnc is not sure what sort is coming next.
The five categories of objects represented by identifiers - TDF constructs,
tokens, tags, alignment tags and labels - occupy separate name spaces,
but it is a good idea to try to avoid duplication of names.
By default all these shorthands are used by tnc in write mode. If this causes problems, the -V flag should be passed to tnc.
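At the outer level tnc is expecting a sequence of constructs of the following forms: included files, token declarations, token definitions, alignment tag declarations, alignment tag definitions, tag declarations and tag definitions.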
Included files may be of three types - text, TDF capsule or TDF library. For TDF capsules and libraries there are two include modes. The first just decodes the given capsule or set of capsules. The second scans through them to extract token declaration information. These declarations appear in the output file only if they are used elsewhere.
The syntax for an included text file is:
( include string )
where string is a string giving the pathname of the file to be included. tnc applies read to this sub-file before continuing with the present file.
Similarly, the syntaxes for included TDF capsules and libraries are:
( code string )
( lib string )
respectively. tnc applies decode to this capsule or set of capsules (provided this mode is available) before continuing with the present file.
The syntaxes for extracting the token declaration information from a TDF capsule or library are:
( use_code string )
( use_lib string )
Again, these rely on the decode mode being available.
All tokens, tags and alignment tags have an internal name, namely the associated identifier, but this name does not necessarily appear in the corresponding TDF capsule. There must firstly be an associated declaration or definition at the outer level - tags internal to a piece of TDF do not have external names. Even then we may not wish this name to appear at the outer level, because it is local to this file and is not required for linking purposes. Alternatively we may wish a different external name to be associated with it in the TDF capsule.
As an example of how tnc allows for this, consider token declarations (although similar remarks apply to token definitions, alignment tag definitions etc.). The basic form of the token declaration is:
( make_tokdec token_id ... )
This creates a token with both internal and external names equal to token_id. Alternatively:
( local make_tokdec token_id ... )
creates a token with internal name token_id but no external name. This allows the creation of tokens local to the current file. Alternatively:
( make_tokdec ( string_extern string ) token_id ... )
creates a token with internal name token_id and external name given by the string string. For example, to create a token whose external name is not a valid identifier, it would be necessary to use this construct. Finally:
( make_tokdec ( unique_extern string1 ... stringn ) token_id ... )
creates a token with internal name token_id and external name given by the unique name consisting of the strings string1, ..., stringn.
The local quantifier should be used consistently on all declarations and definitions of the token, tag or alignment tag. The alternative external name should only be given on the first occasion, however. Thereafter the object is identified by its internal name.
The basic form of a token declaration is:
( make_tokdec token_id ( arg1 ... argn ) res )
where the token token_id is declared to take arguments of sorts arg1, ..., argn and deliver the result sort res. These sorts are given by their sort names, exp, shape etc. For a token with no arguments the declaration may be given in the form:
( make_tokdec token_id res )
A token may be declared any number of times, provided the declarations are consistent.
This basic declaration may be modified in the ways outlined above to specify the external token name.
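The declaration forms described above can be generated mechanically. The Python sketch below writes out a make_tokdec construct from its parts; it is an illustration only, with hypothetical function names, and it covers the local and string_extern variants but not unique_extern. Refusing to combine local with an external name is a choice made for this sketch (the paper presents them as alternatives), and the quoting relies on the string escape rules given earlier.

```python
def quote(text):
    """Wrap text in double quotes, escaping backslash and double quote."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def make_tokdec(token_id, result_sort, arg_sorts=(), local=False,
                extern=None):
    """Return the text of a token declaration.

    arg_sorts lists the argument sort names; leave it empty for a token
    with no arguments. extern, if given, is the external name string.
    """
    if local and extern is not None:
        # Assumption of this sketch: a local token has no external name.
        raise ValueError("a local token cannot be given an external name")
    parts = ["("]
    if local:
        parts.append("local")
    parts.append("make_tokdec")
    if extern is not None:
        parts.append("( string_extern " + quote(extern) + " )")
    parts.append(token_id)
    if arg_sorts:
        parts.append("( " + " ".join(arg_sorts) + " )")
    parts.append(result_sort)
    parts.append(")")
    return " ".join(parts)
```

For example, make_tokdec("f", "exp", ["exp", "shape"]) gives ( make_tokdec f ( exp shape ) exp ), and leaving arg_sorts empty gives the short form for tokens without arguments.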
The basic form of a token definition is:
( make_tokdef token_id ( arg1 id1 ... argn idn ) res def )
where the token token_id is defined to take formal arguments id1, ..., idn of sorts arg1, ..., argn respectively and have the value def, which is a construct of sort res. The scope of the tokens id1, ..., idn is def.
For a token with no arguments the definition may be given in the form:
( make_tokdef token_id res def )
A token may be defined more than once. All definitions must be consistent with any previous declarations and definitions (the renaming of formal arguments is allowed, however).
This basic definition may be modified in the ways outlined above to specify the external token name.
The basic form of an alignment tag declaration is:
( make_al_tagdec al_tag_id )
where the alignment tag al_tag_id is declared to exist.
This basic declaration may be modified in the ways outlined above to specify the external alignment tag name.
The basic form of an alignment tag definition is:
( make_al_tagdef al_tag_id def )
where the alignment tag al_tag_id is defined to be def, which is a construct of sort alignment. An alignment tag may be declared or defined more than once, provided the definitions are consistent.
This basic definition may be modified in the ways outlined above to specify the external alignment tag name.
The basic forms of a tag declaration are:
( make_id_tagdec tag_id info dec )
( make_var_tagdec tag_id info dec )
( common_tagdec tag_id info dec )
where the tag tag_id is declared to be an identity, variable or common tag with access information info, which is an optional construct of sort access, and shape dec, which is a construct of sort shape. A tag may be declared more than once, provided all declarations and definitions are consistent (including agreement of whether the tag is an identity, a variable or common).
These basic declarations may be modified in the ways outlined above to specify the external tag name.
The basic forms of a tag definition are:
( make_id_tagdef tag_id def )
( make_var_tagdef tag_id info def )
( common_tagdef tag_id info def )
where the tag tag_id is defined to be an identity, variable or common tag with value def, which is a construct of sort exp. Non-identity tag definitions also have an optional access construct, info. A tag must have been declared before it is defined, but may be defined any number of times. All declarations and definitions must be consistent (except that common tags may be defined inconsistently) and agree on whether the tag is an identity, a variable, or common.
These basic definitions may be modified in the ways outlined above to specify the external tag name.
The input in read (and to a lesser extent decode) mode is checked for shape correctness if the -check or -c flag is passed to tnc. This is not guaranteed to pick up all shape errors, but is better than nothing.
When in write mode the results of the shape checking may be viewed by passing the -cv flag to tnc. Each expression is associated with its shape by means of the:
( exp_with_shape exp shape ) -> exp
pseudo-construct. Unknown shapes are indicated by
The target independent TDF capsules produced by the C -> TDF compiler, tcc, do not contain declarations or definitions for all the tokens they use. Thus tnc cannot fully decode them as they stand. However, the necessary token declaration information may be made available to tnc by using the use_lib construct. The commands:
( use_lib library )
( code capsule )
will decode the TDF capsule capsule which uses tokens defined in the TDF library library.
The main limitations in the current version of
In addition, far more of the checks (scopes, shape checking, checking of consistency of declarations and definitions etc.) are implemented for read mode rather than decode mode. To shape check a TDF capsule, it will almost certainly be more effective to translate it into text and check that.
Another limitation is that the scoping rules for local tags do not
allow such tags to be accessed outside their scopes using
Here is the manual page for tnc.
tnc - TDF notation compiler
tnc [ options ] input-file [ output-file ]
tnc translates TDF capsules to and from text. It has two input modes, read and decode. In the first, which is default, input-file is a file containing TDF text. In the second input-file is a TDF capsule. There are also two output modes, encode and write. In the first, which is default, a TDF capsule is written to output-file (or the standard output if this argument is absent). In the second, TDF text is written to output-file.
Combinations of these modes give four actions: text to TDF (which is default), TDF to text, text to text and TDF to TDF. The last two actions are not precise identities, but they do give equivalent files.
The form of the TDF text format and more information about tnc can be found in the document The TDF Notation Compiler.
-c or -cv or -check
Specifies that tnc should apply extra checks to input-file. For example, simple shape checking is applied. These checks are more efficient in read mode than in decode mode. If the -cv option is used in write mode, all the information gleaned from the shape checking appears in output-file.
-d or -decode
Specifies that tnc should be in decode mode. That is, that input-file is a TDF capsule.
-e or -encode
Specifies that tnc should be in encode mode. That is, that output-file is a TDF capsule.
-help subject ...
Makes tnc print its help message on the given subject(s). If no subject is given, all the help messages are printed.
Adds the directory dir to the search path used by tnc to find included files in read mode.
-l or -lib
In decode mode, specifies that input-file is not a TDF capsule, but a TDF library. All the capsules comprising the library are decoded.
Gives an alternative method of specifying the output file.
-p or -print
Specifies that tnc should be in decode and write modes. That is, that input-file is a TDF capsule and output-file should consist of TDF text. This option makes tnc into a TDF pretty-printer.
Specifies that tnc should not check duplicate tag declarations etc. for consistency, but should use the first declaration given.
-r or -read
Specifies that tnc should be in read mode. That is, that input-file should consist of TDF text.
-V
In write mode, specifies that the output should be in the "verbose" form, with no shorthand forms.
-version
Makes tnc print its version number.
-w or -write
Specifies that tnc should be in write mode. That is, that output-file should consist of TDF text.
SEE ALSO: tdf(1tdf).
Crown Copyright © 1998.