TDF Guide, Issue 4.0

January 1998

2.1 - Token applications and first-class SORTs
2.2 - Token definitions and declarations
2.3 - A simple use of a TOKEN
2.4 - Second class SORTs

2 SORTs and TOKENs

In the syntax of language like C or Pascal, we find various syntactic units like <Expression>, <Identifier> etc. A SORT bears the same relation to TDF as these syntactic units bear to the language; roughly speaking, the syntactic unit <Expression> corresponds to the SORT EXP and <Identifier> to TAG . However, instead of using BNF to compose syntactic units from others, TDF uses explicit constructors to compose its SORTs; each constructor uses other pieces of TDF of specified SORTs to make a piece of its result SORT. For example, the constructor plus uses an ERROR_TREATMENT and two EXPs to make another EXP.

At the moment, there are 58 different SORTS, from ACCESS to VARIETY given in tables 1 and 2. Some of these have familiar analogues in standard language construction as with EXP and TAG above. Others will be less familiar since TDF must concern itself with issues not normally addressed in language definitions. For example, the process of linking together TDF programs is at the root of the architecture neutrality of TDF and so must form an integral part of its definition. On the other hand, TDF is not meant to be a language readily accessible to the human reader or writer; computers handle it much more easily. Thus a great many choices have been made in the definition which would be intolerable in a standard language definition for the human programmer but which, paradoxically enough, make it much simpler for a computer to produce and analyse TDF

The SORTs and constructors in effect form a multi-sorted algebra. There were two principal reasons for choosing this algebraic form of definition. First, it is easy to extend - a new operation on existing constructs simply requires a new constructor. Secondly, the algebraic form is highly amenable to the automatic construction of programs. Large parts of both TDF producers and TDF translators have been created by automatic transformation of the text of the specification document itself, by extracting the algebraic signature and constructing C program which can read or produce TDF. To this extent, one can regard the specification document as a formal description of the free algebra of TDF SORTs and constructors. Of course, most of the interesting parts of the definition of TDF lies in the equivalences of parts of TDF, so this formality only covers the easy bit.

Another distinction between the TDF definition and language syntactic description is that TDF is to some extent conscious of its own SORTs so that it can specify a new construction of a given SORT. The analogy in normal languages would be that one could define a new construction with new syntax and say this is an example of an <Expression>, for example; I don't know of any standard language which permits this, although those of you with a historical bent might remember Algol-N which made a valiant attempt at it. Of course, the algebraic method of description makes it much easier to specify, rather than having to give syntax to provide the syntax for the new construction in a language.

2.1. Token applications and first-class SORTs

A new construction is introduced by the SORT TOKEN; the constructors involving TOKENs allow one to give an expansion for the TOKEN in terms of other pieces of TDF, possibly including parameters. We can encapsulate a (possibly parameterised) fragment of TDF of a suitable SORT by giving it a TOKEN as identification. Not all of the SORTs are available for this kind of encapsulation - only those which have a SORTNAME constructor (from access to variety). These are the "first-class" SORTs given in table 1 on page 8. Each of these have an appropriate _apply_token constructor (e.g. exp_apply_token ) give the expansion. Most of these also have _cond constructors (e.g.see exp_cond in section 9.1 on page 43) which allows translate time conditional expansion of the SORT.

Every TOKEN has a result SORT, i.e. the SORT of its resulting expansion and before it can be expanded, one must have its parameter SORTs. Thus, you can regard a TOKEN as having a type defined by its result and parameter SORTs and the _apply_token as the operator which expands the encapsulation and substitutes the parameters.

However, if we look at the signature of exp_apply_token:

	token_value:	TOKEN 
	token_args:	BITSTREAM param_sorts(token_value)
		   -> EXP x

we are confronted by the mysterious BITSTREAM where one might expect to find the actual parameters of the TOKEN.

To explain BITSTREAMs requires a diversion into the bit-encoding of TDF. Constructors for a particular SORT are represented in a number of bits depending on the number of constructors for that SORT; the context will determine the SORT required, so no more bits are required. Thus since there is only one constructor for UNITs, no bits are required to represent make_unit; there are about 120 different constructors for EXPs so 7 bits are required to cover all the EXPs. The parameters of each constructor have known SORTs and so their representations are just concatenated after the representation of the constructor *. While this is a very compact representation, it suffers from the defect that one must decode it even just to skip over it. This is very irksome is some applications, notably the TDF linker which is not interested detailed expansions. Similarly, in translators there are places where one wishes to skip over a token application without knowledge of the SORTs of its parameters. Thus a BITSTREAM is just an encoding of some TDF, preceded by the number of bits it occupies. Applications can then skip over BITSTREAMs trivially. Similar considerations apply to BYTESTREAMs used elsewhere; here the encoding is preceded by the number of bytes in the encoding and is aligned to a byte boundary to allow fast copying.

2.2. Token definitions and declarations

Thus the token_args parameter of exp_apply_token is just the BITSTREAM formed from the actual parameters in the sequence described by the definition of the token_value parameter. This will be given in a TOKEN_DEFN somewhere with constructor token_definition:

	result_sort:	SORTNAME
	tok_params:	LIST(TOKFORMALS)
	body:	result_sort
		   -> TOKEN_DEFN

The result_sort is the SORT of the construction of body; e.g. if result_sort is formed from exp then body would be constructed using the EXP constructors and one would use exp_apply_token to give the expansion. The list tok_params gives the formal parameters of the definition in terms of TOKFORMALS constructed using make_tok_formals:

	sn:	SORTNAME
	tk:	TDFINT
		   -> TOKFORMALS

The TDFINT tk will be the integer representation of the formal parameter expressed as a TOKEN whose result sort is sn (see more about name representation in section 3.1 on page 13). To use the parameter in the body of the TOKEN_DEFN, one simply uses the _apply_token appropriate to sn.Note that sn may be a TOKEN but the result_sort may not.

Hence the BITSTREAM param_sorts(token_value) in the actual parameter of exp_apply_token above is simply formed by the catenation of constructions of the SORTs given by the SORTNAMEs in the tok_params of the TOKEN being expanded.

Usually one gives a name to a TOKEN_DEFN using to form a TOKDEF using make_tokdef :

	tok:	TDFINT
	signature:	OPTION(STRING)
	def:	BITSTREAM TOKEN_DEFN
		   -> TOKDEF

Here, tok gives the name that will be used to identify the TOKEN whose expansion is given by def. Any use of this TOKEN (e.g. in exp_apply_token) will be given by make_token(tok ) . Once again, a BITSTREAM is used to encapsulate the TOKEN_DEFN.

The significance of the signature parameter is discussed in section 3.2.2 on page 18.

Often, one wishes a token without giving its definition - the definition could, for example, be platform-dependent. A TOKDEC introduces such a token using make_tokdec:

	tok:	TDFINT
	signature:	OPTION(STRING)
	s:	SORTNAME
		   -> TOKDEC

Here the SORTNAME, s, is given by token:

	result:	SORTNAME
	params:	LIST(SORTNAME)
		   -> SORTNAME

which gives the result and parameter SORTs of tok.

One can also use a TOKEN_DEFN in an anonymous fashion by giving it as an actual parameter of a TOKEN which itself demands a TOKEN parameter. To do this one simply uses use_tokdef :

	tdef:	BITSTREAM TOKEN_DEFN
		   -> TOKEN

2.3. A simple use of a TOKEN

The crucial use of TOKENs in TDF is to provide abstractions of APIs (see section 10 on page 45) but they are also used as shorthand for commonly occurring constructions. For example, given the TDF constructor plus, mentioned above, we could define a plus with only two EXP parameters more suitable to C by using the wrap constructor as the ERROR_TREATMENT:

make_tokdef (C_plus, empty,
	token_definition( 
		exp(), 
		(make_tokformals(exp(), l), make_tokformals(exp(), r)),
		plus(wrap(), exp_apply_token(l, ()), exp_apply_token(r,())
	)
)

2.4. Second class SORTs

Second class SORTs (given in table 2 on page 11) cannot be TOKENised. These are the "syntactic units" of TDF which the user cannot extend; he can only produce them using the constructors defined in core-TDF.

Some of these constructors are implicit. For example, there are no explicit constructors for LIST or SLIST which are both used to form lists of SORTs; their construction is simply part of the encoding of TDF. However, it is forseen that LIST constructors would be highly desireable and there will probably extensions to TDF to promote LIST from a second-class SORT to a first-class one. This will not apply to SLIST or to the other SORTs which have implicit constructions. These include BITSTREAM, BYTESTREAM, TDFINT, TDFIDENT and TDFSTRING.