TDF Specification, Issue 4.0

January 1998

7.1 - Binding

7.2 - Character codes

7.3 - Constant evaluation

7.4 - Division and modulus

7.5 - Equality of EXPs

7.6 - Equality of SHAPEs

7.7 - Equality of ALIGNMENTS

7.8 - Exceptions and jumps

7.13 - Memory Model

7.13.1 - Simple model
7.13.2 - Comparison of pointers and offsets
7.13.3 - Circular types in languages
7.13.4 - Special alignments
7.13.5 - Atomic assignment

7.14 - Order of evaluation

7.15 - Original pointers

7.16 - Overlapping

7.17 - Incomplete assignment

7.18 - Representing integers

7.19 - Overflow and Integers

7.20 - Representing floating point

7.21 - Floating point errors

7.22 - Rounding and floating point

7.23 - Floating point accuracy

7.24 - Representing bitfields

7.25 - Permitted limits

7.26 - Least Upper Bound

7.27 - Read-only areas

7.28 - Tag and Token signatures

7.29 - Dynamic initialisation

7. Notes

7.1. Binding

The following constructions introduce TAGs: identify, variable, make_proc, make_general_proc, make_id_tagdec, make_var_tagdec, common_tagdec.

During the evaluation of identify and variable a value, v, is produced which is bound to the TAG during the evaluation of an EXP or EXPs. The TAG is "in scope" for these EXPs. This means that in the EXP a use of the TAG is permissible and will refer to the declaration.

The make_proc and make_general_proc construction introduces TAGs which are bound to the actual parameters on each call of the procedure. These TAGs are "in scope" for the body of the procedure.

If a make_proc or make_general_proc construction occurs in the body of another make_proc or make_general_proc, the TAGs of the inner procedure are not in scope in the outer procedure, nor are the TAGs of the outer in scope in the inner.

The apply_general_proc construction permits the introduction of TAGs whose scope is the postlude argument. These are bound to the values of caller parameters after the evaluation of the body of the procedure.

The make_id_tagdec, make_var_tagdec and common_tagdec constructions introduce TAGs which are "in scope" throughout all the tagdef UNITs. These TAGs may have values defined for them in the tagdef UNITs, or values may be supplied by linking.

The following constructions introduce LABELs: conditional, repeat, labelled.

The construction themselves define EXPs for which these LABELs are "in scope". This means that in the EXPs a use of the LABEL is permissible and will refer to the introducing construction.

TAGs, LABELs and TOKENs (as TOKEN parameters) introduced in the body of a TOKEN definition are systematically renamed in their scope each time the TOKEN definition is applied. The scope will be completely included by the TOKEN definition.

Each of the values introduced in a UNIT will be named by a different TAG, and the labelling constructions will use different labels, so no visibility rules are needed. The set of TAGs and LABELs used in a simple UNIT are considered separately from those in another simple UNIT, so no question of visibility arises. The compound and link UNITs provide a method of relating the items in one simple UNIT to those in another, but this is through the intermediary of another set of TAGs and TOKENs at the CAPSULE level.

7.2. Character codes

TDF does not have a concept of characters. It transmits integers of various sizes. So if a producer wishes to communicate characters to an installer, it will usually have to do so by encoding them in some way as integers.

An ANSI C producer sending a TDF program to a set of normal C environments may well choose to encode its characters using the ASCII codes, an EBCDIC based producer transmitting to a known set of EBCDIC environments might use the code directly, and a wide character producer might likewise choose a specific encoding. For some programs this way of proceeding is necessary, because the codes are used both to represent characters and for arithmetic, so the particular encoding is enforced. In these cases it will not be possible to translate the characters to another encoding because the character codes will be used in the TDF as ordinary integers, which must not be translated.

Some producers may wish to transmit true characters, in the sense that something is needed to represent particular printing shapes and nothing else. These representations will have to be transformed into the correct character encoding on the target machine.

Probably the best way to do this is to use TOKENs. A fixed representation for the printing marks could be chosen in terms of integers and TOKENs introduced to represent the translation from these integers to local character codes, and from strings of integers to strings of local character codes. These definitions could be bound on the target machine and the installer should be capable of translating these constructions into efficient machine code. To make this a standard, unique TOKENs should be used.

But this raises the question, who chooses the fixed representation and the unique TOKENs and their specification? Clearly TDF provides a mechanism for performing the standardisation without itself defining a standard.

Here TDF gives rise to the need for extra standards, especially in the specification of globally named unique TOKENs.

7.3. Constant evaluation

Some constructions require an EXP argument which is "constant at install time". For an EXP to satisfy this condition it must be constructed according to the following rules after substitution of token definitions and selection of exp_cond branches.

If it contains obtain_tag then the tag will be introduced within the EXP, or defined with make_id_tagdef within the current capsule.

It may not contain any of the following constructions: apply_proc, apply_general_proc, assign_with_mode, contents_with_mode, continue, current_env, error_jump, goto_local_lv, make_local_lv, move_some, repeat, round_as_state.

Unless it is the EXP argument of a TAGDEF, a "constant at install time" may not contain env_offset or env_size.

Any use of contents or assign will be applied only to POINTERs derived from variable constructions.

If it contains labelled there will only be jumps to the LABELs from within starter, not from within any of the places.

Any use of obtain_tag defined with make_id_tagdef will occur after the end of the make_id_tagdef.

Note specifically that a constant EXP forming the defining value of a TAGDEF construct may contain env_offset and/or env_size.

7.4. Division and modulus

Two classes of division (D) and remainder (M) construct are defined. The two classes have the same definition if both operands have the same sign. Neither is defined if the second argument is zero.

Class 1:

	p D1 q = n

where:

	p = n*q + (p M1 q)
	sign(p M1 q) = sign(q)
	0 <= |p M1 q| < |q|

Class 2:

	p D2 q = n

where:

	p = n*q + (p M2 q)
	sign(p M2 q) = sign(p)
	0 <= |p M2 q| < |q|

7.5. Equality of EXPs

A definition of equality of EXPs would be a considerable part of a formal specification of TDF, and is not given here.

7.6. Equality of SHAPEs

Equality of SHAPEs is defined recursively:

Two SHAPEs are equal if they are both BOTTOM, or both TOP or both PROC.
Two SHAPEs are equal if they are both integer or both floating, or both bitfield, and the corresponding parameters are equal.
Two SHAPEs are equal if they are both NOF, the numbers of items are equal and the SHAPE parameters are equal.
Two OFFSETs or two POINTERs are equal if their ALIGNMENT parameters are pairwise equal.
Two COMPOUNDs are equal if their OFFSET EXPs are equal.
No other pairs of SHAPEs are equal.

7.7. Equality of ALIGNMENTs

Two ALIGNMENTs are equal if and only if they are equal sets.

7.8. Exceptions and jumps

TDF allows simply for labels and jumps within a procedure, by means of the conditional, labelled and repeat constructions, and the goto, case and various test constructions. But there are two more complex jumping situations.

First there is the jump, known to stay within a procedure, but to a computed destination. Many languages have discouraged this kind of construction, but it is still available in Cobol (implicitly), and it can be used to provide other facilities (see below). TDF allows it by means of the POINTER({code}). TDF is arranged so that this can usually be implemented as the address of the label. The goto_local_lv construction just jumps to the label.

The other kind of construction needed is the jump out of a procedure to a label which is still active, restoring the environment of the destination procedure: the long jump. Related to this is the notion of exception. Unfortunately long jumps and exceptions do not co-exist well. Exceptions are commonly organised so that any necessary destruction operations are performed as the stack of frames is traversed; long jumps commonly go directly to the destination. TDF must provide some facility which can express both of these concepts. Furthermore exceptions come in several different versions, according to how the exception handlers are discriminated and whether exception handling is initiated if there is no handler which will catch the exception.

Fortunately the normal implementations of these concepts provide a suggestion as to how they can be introduced into TDF. The local label value provides the destination address, the environment (produced by current_env) provides the stack frame for the destination, and the stack re-setting needed by the local label jumps themselves provides the necessary stack information. If more information is needed, such as which exception handlers are active, this can be created by producing the appropriate TDF.

So TDF takes the long jump as the basic construction, and its parameters are a local label value and an environment. Everything else can be built in terms of these.

The TDF arithmetic constructions allows one to specify a LABEL as destination if the result of the operation is exceptional. This is sufficient for the kind of explicit exception handling found in C++ and, in principle, could also be used to implement the kind of "automatic" exception detection and handling found in Ada, for example.

However many architectures have facilities for automatically trapping on exceptions without explicit testing. To take advantage of this, there is a trap ERROR_TREATMENT with associated ERROR_CODEs. The action taken on an exception with trap ERROR_TREATMENT will be to "throw" the ERROR_CODE. Since each language has its own idea of how to interpret the ERROR_CODE and handle exceptions, the onus is on the producer writer to describe how to throw an ERROR_CODE.

The producer writer must give a definition of a TOKEN ~Throw : NAT -> EXP where the NAT will be the error_val of some ERROR_CODE. The expansion of this token will be consistent with the interpretation of the relevant ERROR_CODE and the method of handling exceptions. Usually this will consist of decoding the ERROR_CODE and doing a long_jump on some globals set up by the procedure in which the exception handling takes place.

The translator writer will provide a parameterless EXP TOKEN, ~Set_signal_handler. This TOKEN will use ~Throw and must be applied before any possible exceptions. This implies that the definition of both ~Throw and ~Set_signal_handler must be bound before translation of any CAPSULE which uses them, presumeably by linking with some TDF libraries.

These tokens are specified in more detail in the companion document, TDF Token Register.

7.9. Procedures

The var_intro of a make_proc, if present, may be used under one of two different circumstances. In one circumstance, the POINTER TAG provided by the var_intro is used to access the actual var_param of an apply_proc. If this is the case, all uses of apply_proc which have the effect of calling this procedure will have the var_param option present, and they will all have precisely the same number of params as params_intro in the make_proc. The body of the make_proc can access elements of the var_param by using OFFSET arithmetic relative to the POINTER TAG. This provides a method of supplying a variable number of parameters, by composing them into a compound value which is supplied as the var_param.

However, this has proved to be unsatisfactory for the implementation of variable number of parameters in C - one cannot choose the POINTER alignment of the TAG a priori in non-prototype calls.

An alternative circumstance for using var_intro is where all uses of apply_proc which have the effect of calling this procedure may have more params present than the number of params_intro, and none of them will have their var_param option present. The body of the make_proc can access the additional params by using installer-defined TOKENs specified in the companion document TDF Token Register, analogous to the use of variable argument lists in C. A local variable v of shape ~va_list must be initialised to ~__va_start(p), where p is the POINTER obtained from the var_intro. Successive elements of the params list can then be obtained by successive applications of ~va_arg(v,s) where s is the SHAPE of element obtained. Finally, ~va_end(v) completes the use of v.

The definition of caller parameters in general procedures addesses this difficulty in a different way, by describing the layout of caller parameters qualified by PROCPROPS var_callers. This allows both the call and the body to have closely associated views of the OFFSETs within a parameter set, regardless of whether or not the particular parameter has been named. The installer-defined TOKEN ~next_caller_offset provides access to successive caller parameters, by using OFFSETs relative to the current frame pointer current_env, adjusting for any differences there may be between the closely associated views. The caller_intro list of the make_general_proc must not be empty, then the sequence of OFFSETs can start with an appropriate env_offset. Similar consideration applies to accessing within the callee parameters, using the installer-defined TOKEN ~next_callee_offset.

All uses of return, untidy_return and tail_call in a procedure will return values of the same SHAPE, and this will be the result_shape specified in all uses of apply_proc or apply_general_proc calling the procedure.

The use of untidy_return gives a generalisation of local_alloc. It extends the validity of pointers allocated by local_alloc within the immediatly enclosing procedure into the calling procedure. The original space of these pointers may be invalidated by local_free just as if it had been generated by local_alloc in the calling procedure.

The PROCPROPS check_stack may be used to check that limit set by set_stack_limit is not exceeded by the allocation of the static locals of a procedure body to be obeyed. If it is exceeded then the producer-defined TOKEN ~Throw: NAT -> EXP will be invoked as ~Throw(error_val(stack_overflow)). Note that this will not include any space generated by local_alloc; an explicit test is required to do check these.

Any SHAPE is permitted as the result_shape in an apply_proc or apply_general_proc.

7.10. Frames

TDF states that while a particular procedure activation is current, it is possible to create a POINTER, by using current_env, which gives access to all the declared variables and identifications of the activation which are alive and which have been marked as visible. The construction env_offset gives the OFFSET of one of these relative to such a POINTER. These constructions may serve for several purposes.

One significant purpose is to implement such languages as Pascal which have procedures declared inside other procedures. One way of implementing this is by means of a "display", that is, a tuple of frame pointers of active procedures.

Another purpose is to find active variables satisfying some criterion in all the procedure activations. This is commonly required for garbage collection. TDF does not force the installer to implement a frame pointer register, since some machines do not work best in this way. Instead, a frame pointer is created only if required by current_env. The implication of this is that this sort of garbage collection needs the collaboration of the producer to create TDF which makes the correct calls on current_env and env_offset and place suitable values in known positions.

Programs compiled especially to provide good diagnostic information can also use these operations.

In general any program which wishes to manipulate the frames of procedures other than the current one can use current_env and env_offset to do so.

A frame consists of three components, the caller parameters, callee parameters and locals of the procedure involved. Since each component may have different internal methods of access within the frame, each has a different special frame alignment associated with pointers within them. These are callers_alignment, callees_alignment and locals_alignment. The POINTER produced by current_env will be some union of these special alignments depending on how the procedure was defined.

Each of these frame alignments are considered to contain any ALIGNMENT produced by alignment from any SHAPE. Note that this does not say that they are the set union of all such ALIGNMENTs. This is because the interpretation of pointer and offset operations (notably add_to_pointer) may be different depending on the implementation of the frames; they may involve extra indirections.

Accordingly, because of the constraints on add_to_ptr, an OFFSET produced by env_offset can only be added to a POINTER produced by current_env. It is a further constraint that such an OFFSET will only be added to a POINTER produced from current_env used on the procedure which declared the TAG.

7.11. Lifetimes

TAGs are bound to values during the evaluation of EXPs, which are specified by the construction which introduces the TAG. The evaluation of these EXPs is called the lifetime of the activation of the TAG.

Note that lifetime is a different concept from that of scope. For example, if the EXP contains the application of a procedure, the evaluation of the body of the procedure is within the lifetime of the TAG, but the TAG will not be in scope.

A similar concept applies to LABELs.

7.12. Alloca

The constructions involving alloca (last_local, local_alloc, local_free, local_free_all) as well as the untidy_return construction imply a stack-like implementation which is related to procedure calls. They may be implemented using the same stack as the procedure frames, if there is such a stack, or it may be more convenient to implement them separately. However note that if the alloca mechanism is implemented as a stack, this may be an upward or a downward growing stack.

The state of this notional stack is referred to here as the alloca state. The construction local_alloc creates a new space on the alloca stack, the size of this space being given by an OFFSET. In the special case that this OFFSET is zero, local_alloc in effect gives the current alloca state (normally a POINTER to the top of the stack).

A use of local_free_all returns the alloca state to what it was on entry to the current procedure.

The construction last_local gives a POINTER to the top item on the stack, but it is necessary to give the size of this (as an OFFSET) because this cannot be deduced if the stack is upward growing. This top item will be the whole of an item previously allocated with local_alloc.

The construction local_free returns the state of the alloca machine to what it was when its parameter POINTER was allocated. The OFFSET parameter will be the same value as that with which the POINTER was allocated.

The ALIGNMENT of the POINTER delivered by local_alloc is alloca_alignment. This shall include the set union of all the ALIGNMENTs which can be produced by alignment from any SHAPE.

The use of alloca_alignment arises so that the alloca stack can hold any kind of value. The sizes of spaces allocated must be rounded up to the appropriate ALIGNMENT. Since this includes all value ALIGNMENTs a value of any ALIGNMENT can be assigned into this space. Note that there is no necessary relation with the special frame alignments (see section 7.10) though they must both contain all the ALIGNMENTs which can be produced by alignment from any SHAPE.

Stack pushing is local_alloc. Stack popping can be performed by use of last_local and local_free. Remembering the state of the alloca stack and returning to it can be performed by using local_alloc with a zero OFFSET and local_free.

Note that stack pushing can also be achieved by the use of a procedure call with untidy_return.

A transfer of control to a local label by means of goto, goto_local_lv, any test construction or any error_jump will not change the alloca stack.

If an installer implements identify and variable by creating space on a stack when they come into existence, rather than doing the allocation for identify and variable at the start of a procedure activation, then it may have to consider making the alloca stack into a second stack.

7.13. Memory Model

The layout of data in memory is entirely determined by the calculation of OFFSETs relative to POINTERs. That is, it is determined by OFFSET arithmetic and the add_to_ptr construction.

A POINTER is parameterised by the ALIGNMENT of the data indicated. An ALIGNMENT is a set of all the different kinds of basic value which can be indicated by a POINTER. That is, it is a set chosen from all VARIETYs, all FLOATING_VARIETYs, all BITFIELD_VARIETYs, proc, code, pointer and offset. There are also three special ALIGNMENTs, frame_alignment, alloca_alignment and var_param_alignment.

The parameter of a POINTER will not consist entirely of BITFIELD_VARIETYs.

The implication of this is that the ALIGNMENT of all procedures is the same, the ALIGNMENT of all POINTERs is the same and the ALIGNMENT of all OFFSETs is the same.

At present this corresponds to the state of affairs for all machines. But it is certainly possible that, for example, 64-bit pointers might be aligned on 64-bit boundaries while 32-bit pointers are aligned on 32-bit boundaries. In this case it will become necessary to add different kinds of pointer to TDF. This will not present a problem, because, to use such pointers, similar changes will have to be made in languages to distinguish the kinds of pointer if they are to be mixed.

The difference between two POINTERs is measured by an OFFSET. Hence an OFFSET is parameterised by two ALIGNMENTs, that of the starting POINTER and that of the end POINTER. The ALIGNMENT set of the first must include the ALIGNMENT set of the second.

The parameters of an OFFSET may consist entirely of BITFIELD_VARIETYs.

The operations on OFFSETs are subject to various constraints on ALIGNMENTs. It is important not to read into offset arithmetic what is not there. Accordingly some rules of the algebra of OFFSETs are given below.

offset_add is associative.
offset_mult corresponds to repeated offset_addition.
offset_max is commutative, associative and idempotent.
offset_add distributes over offset_max where they form legal expressions.
offset_test(prob, >= , a, b) continues if offset_max(a,b) = a.

7.13.1. Simple model

An example of the representation of OFFSET arithmetic is given below. This is not a definition, but only an example. In order to make this clear a machine with bit addressing is hypothesized. This machine is referred to as the simple model.

In this machine ALIGNMENTs will be represented by the number by which the bit address of data must be divisible. For example, 8-bit bytes might have an ALIGNMENT of 8, longs of 32 and doubles of 64. OFFSETs will be represented by the displacement in bits from a POINTER. POINTERs will be represented by the bit address of the data. Only one memory space will exist. Then in this example a possible conforming implementation would be as follows.

add_to_ptr is addition.
offset_add is addition.
offset_div and offset_div_by_int are exact division.
offset_max is maximum.
offset_mult is multiply.
offset_negate is negate.
offset_pad(a, x) is ((x + a - 1) / a) * a
offset_subtract is subtract.
offset_test is integer_test.
offset_zero is 0.
shape_offset(s) is the minimum number of bits needed to be moved to move a value of SHAPE s.

Note that these operations only exist where the constraints on the parameters are satisfied. Elsewhere the operations are undefined.

All the computations in this representation are obvious, but there is one point to make concerning offset_max, which has the following arguments and result.

	arg1:		EXP OFFSET(x, y)
	arg2:		EXP OFFSET(z, y)
		   -> EXP OFFSET(unite_alignments(x, z), y)

The SHAPEs could have been chosen to be:

	arg1:		EXP OFFSET(x, y)
	arg2:		EXP OFFSET(z, t)
		   -> EXP OFFSET(unite_alignments(x, z),
				 intersect_alignments(y, t))

where unite_alignments is set union and intersect_alignments is set intersection. This would have expressed the most general reality. The representation of unite_alignments(x, z) is the maximum of the representations of x and z in the simple model. Unfortunately the representation of intersect_alignments(y, t) is not the minimum of the representations of y and t. In other words the simple model representation is not a homomorphism if intersect_alignments is used. Because the choice of representation in the installer is an important consideration the actual definition was chosen instead. It seems unlikely that this will affect practical programs significantly.

The ALIGNMENT of a parameter of SHAPE s is given by parameter_alignment(s) which will always contain alignment(s).

The other five special ALIGNMENTs are alloca_alignment, callees_alignment, callers_alignment, locals_alignment and var_param_alignment. Each of these contains the set union of all the ALIGNMENTs which can be produced by alignment from any SHAPE. But they need not be equal to that set union, nor need there be any relation between them.

In particular they are not equal (in the sense of Equality of ALIGNMENTs).

Each of these five refer to alignments of various components of a frame.

Notice that pointers and offsets such as POINTER(callees_alignment(true)) and OFFSET(callees_alignment(true), x) etc. can have some special representation and that add_to_ptr and offset_add can operate correctly on these representations. However it is necessary that

	alignment(POINTER(A))={pointer}

for any ALIGNMENT A.

7.13.5. Atomic assignment

At least one VARIETY shall exist such that assign and assign_with_mode are atomic operations. This VARIETY shall be specified as part of the installer specification. It shall be capable of representing the numbers 0 to 127.

Note that it is not necessary for this to be the same VARIETY on each machine. Normal practice will be to use a TOKEN for this VARIETY and choose the definition of the TOKEN on the target machine.

7.14. Order of evaluation

The order of evaluation is specified in certain constructions in terms of equivalent effect with a canonical order of evaluation. These constructions are conditional, identify, labelled, repeat, sequence and variable. Let these be called the order-specifying constructions.

The constructions which change control also specify a canonical order. These are apply_proc, apply_general_proc, case, goto, goto_local_lv, long_jump, return, untidy_return, return_to_label, tail_call, the test constructions and all instructions containing the error_jump and trap ERROR_TREATMENTs.

The order of evaluation of the components of other constructions is as follows. The components may be evaluated in any order and with their components - down to the TDF leaf level - interleaved in any order. The constituents of the order specifying constructions may also be interleaved in any order, but the order of the operations within an order specifying operation shall be equivalent in effect to a canonical order.

Note that the rule specifying when error_jumps or traps are to be taken (error_jump) relaxes the strict rule that everything has to be "as if" completed by the end of certain constructions. Without this rule pipelines would have to stop at such points, in order to be sure of processing any errors. Since this is not normally needed, it would be an expensive requirement. Hence this rule. However a construction will be required to force errors to be processed in the cases where this is important.

7.15. Original pointers

Certain constructions are specified as producing original pointers. They allocate space to hold values and produce pointers indicating that new space. All other pointer values are derived pointers, which are produced from original pointers by a sequence of add_to_ptr operations. Counting original pointers as being derived from themselves, every pointer is derived from just one original pointer.

A null pointer is counted as an original pointer.

If procedures are called which come from outside the TDF world (such as calloc) it is part of their interface with TDF to state if they produce original pointers, and what is the lifetime of the pointer.

As a special case, original pointers can be produced by using current_env and env_offset (see current_env).

Note that:

	add_to_ptr(p, offset_add(q, r))

is equivalent to:

	add_to_ptr(add_to_ptr(p, q), r)

In the case that p is the result of current_env and q is the result of env_offset:

	add_to_ptr(p, q)

is defined to be an original pointer. For any such expression q will be produced by env_offset applied to a TAG introduced in the procedure in which current_env was used to make p.

7.16. Overlapping

In the case of move_some, or assign or assign_with_mode in which arg2 is a contents or contents_with_mode, it is possible that the source and destination of the transfer might overlap.

In this case, if the operation is move_some or assign_with_mode and the TRANSFER_MODE contains overlap, then the transfer shall be performed correctly, that is, as if the data were copied from the source to an independent place and then to the destination.

In all cases, if the source and destination do not overlap the transfer shall be performed correctly.

Otherwise the effect is undefined.

7.17. Incomplete assignment

If the arg2 component of an assign or assign_with_mode operation is left by means of a jump, the question arises as to what value is in the destination of the transfer.

If the TRANSFER_MODE complete is used, the destination shall be left unchanged if the arg2 component is left by means of a jump. If complete is not used and arg2 is left by a jump, the destination may be affected in any way.

7.18. Representing integers

Integer VARIETYs shall be represented by a range of integers which includes those specified by the given bounds. This representation shall be twos-complement.

If the lower bound of the VARIETY is non-negative, the representing range shall be from 0 to 2⁸ⁿ-1 for some n. n is called the number of bytes in the representation. The number of bits in the representation is 8n.

If the lower bound of the VARIETY is negative the representing range shall be from -2^8n-1 to 2^8n-1-1 for some n. n is called the number of bytes in the representation. The number of bits in the representation is 8n

Installers may limit the size of VARIETY that they implement. A statement of such limits shall be part of the specification of the installer. In no case may such limits be less than 64 bits, signed or unsigned.

It is intended that there should be no upper limit allowed at some future date.

Operations are performed in the representing VARIETY. If the result of an operation does not lie within the bounds of the stated VARIETY, but does lie in the representation, the value produced in that representation shall be as if the VARIETY had the lower and upper bounds of the representation. The implication of this is usually that a number in a VARIETY is represented by that same number in the representation.

If the bounds of a VARIETY, v, include those of a VARIETY, w, the representing VARIETY for v shall include or be equal to the representing VARIETY for w.

The representations of two VARIETYs of the form var_limits(0, 2ⁿ-1) and var_limits(-2^n-1, 2^n-1-1) shall have the same number of bits and the mapping of their ALIGNMENTs into the target alignment shall be the same.

7.19. Overflow and Integers

It is necessary first to define what overflow means for integer operations and second to specify what happens when it occurs. The intention of TDF is to permit the simplest possible implementation of common constructions on all common machines while allowing precise effects to be achieved, if necessary at extra cost.

Integer varieties may be represented in the computer by a range of integers which includes the bounds given for the variety. An arithmetic operation may therefore yield a result which is within the stated variety, or outside the stated variety but inside the range of representing values, or outside that range. Most machines provide instructions to detect the latter case; testing for the second case is possible but a little more costly.

In the first two cases the result is defined to be the value in the representation. Overflow occurs only in the third case.

If the ERROR_TREATMENT is impossible overflow will not occur. If it should happen to do so the effect of the operation is undefined.

If the ERROR_TREATMENT is error_jump a LABEL is provided to jump to if overflow occurs.

If the ERROR_TREATMENT is trap(overflow), a producer-defined TOKEN ~Throw: NAT -> EXP must be provided. On an overflow, the installer will arrange that ~Throw(error_val(overflow)) is evaluated.

The wrap ERROR_TREATMENT is provided so that a useful defined result may be produced in certain cases where it is usually easily available on most machines. This result is available on the assumption that machines use binary arithmetic for integers. This is certainly so at present, and there is no close prospect of other bases being used.

If a precise result is required further arithmetic and testing may be needed which the installer may be able to optimise away if the word lengths happen to suit the problem. In extreme cases it may be necessary to use a larger variety.

7.20. Representing floating point

FLOATING_VARIETYs shall be implemented by a representation which has at least the properties specified.

Installers may limit the size of FLOATING_VARIETY which they implement. A statement of such limits shall be part of the specification of an installer.

The limit may also permit or exclude infinities.

Any installer shall implement at least one FLOATING_VARIETY with the following properties (c.f. IEEE doubles):

mantissa_digs shall not be less than 53.
min_exponent shall not be less than 1023.
max_exponent shall not be less than 1022.

Operations are performed and overflows detected in the representing FLOATING_VARIETY.

There shall be at least two FLOATING_VARIETYs, one occupying the same number of bytes and having the same alignment as a VARIETY representation, and one occupying twice as many bytes.

7.21. Floating point errors

The only permitted ERROR_TREATMENTs for operations delivering FLOATING_VARIETYs are impossible, error_jump and trap (overflow).

The kinds of floating point error which can occur depend on the machine architecture (especially whether it has IEEE floating point) and on the definitions in the ABI being obeyed.

Possible floating point errors depend on the state of the machine and may include overflow, divide by zero, underflow, invalid operation and inexact. The setting of this state is performed outside TDF (at present).

If an error_jump or trap is taken as the result of a floating point error the operations to test what kind of error it was are outside the TDF definition (at present).

7.22. Rounding and floating point

Each machine has a rounding state which shall be one of to_nearest, toward_larger, toward_smaller, toward_zero. For each operation delivering a FLOATING_VARIETY, except for make_floating, any rounding necessary shall be performed according to the rounding state.

7.23. Floating point accuracy

While it is understood that most implementations will use IEEE floating arithmetic operations, there are machines which use other formats and operations. It is intended that they should not be excluded from having TDF implementations.

For TDF to have reasonably consistent semantics across many platforms, one must have some minimum requirements on the accuracies of the results of the floating point operations defined in TDF. The provisional requirements sketched below would certainly be satisfied by an IEEE implementation.

Let @ be some primitive dyadic arithmetic operator and @' be its TDF floating-point implementation. Let F be some non-complex FLOATING_VARIETY and F' be a representational variety of F.

Condition 1:

If a, b and a @ b can all be represented exactly in F, then they will also be represented exactly in F'. Extending the '-notation in the obvious manner:

	(a @ b)' = (a' @' b')

This equality will also hold using the TDF equality test, i.e.:

	(a @ b)' =' (a' @' b')

Condition 2:

The operator @' is monotonic in the sense apposite to the operator @. For example, consider the operator +; if x is any number and a and b are as above:

	(x > b) => ((a' +' x') >=  (a + b)')

and:

	(x < b) =>  ((a' +' x') <= (a + b)')

and so on, reflecting the weakening of the ordering after the operation from > to >= and < to <=. Once again, the inequalities will hold for their TDF equivalents e.g., >=' and >'.

Similar conditions can be expressed for the monadic operations.

For the floating-point test operation, there are obvious analogues to both conditions. The weakening of the ordering in the monotonicity condition, however, may lead to surprising results, arising mainly from the uncertainty of the result of equality between floating numbers which cannot be represented exactly in F.

Accuracy requirements for complex FLOATING_VARIETYs could follow directly by considering the above conditions applied to real and imaginary parts independently. The following proviso is added for some complex operations however, to allow for possible intermediate error conditions. With floating_div, floating_mult and floating_power for complex FLOATING_VARIETYs, errors are guaranteed not to occur only if the square of the modulus of each argument is representable and the square of the modulus of the result is representable. Whenever these additional constraints are not met, the operation will either complete with the accuracy conditions above applying, or it will complete according to the ERROR_TREATMENT specified.

7.24. Representing bitfields

BITFIELD_VARIETYs specify a number of bits and shall be represented by exactly that number of bits in twos-complement notation. Producers may expect them to be packed as closely as possible.

Installers may limit the number of bits permitted in BITFIELD_VARIETYs. Such a limit shall be not less than 32 bits, signed or unsigned.

It is intended that there should be no upper limit allowed at some future date.

Some offsets of which the second parameter contains a BITFIELD alignment are subject to a constraint defined below. This constraint is referred to as variety_enclosed.

The intent of this constraint is to force BITFIELDs to be implemented (in memory) as being included in some properly aligned VARIETY value. The constraint applies to:

	x: offset(p, b)

and to:

	sh = bitfield(bfvar_bits(s, n))

where alignment(sh) is included in b. The constraint is as follows:

There will exist a VARIETY, v, and r: offset(p, q) where v is in q.

	offset_pad(b, r) <= x

and:

	offset_pad(b, r + sz(v)) >= offset_pad( b, x + sz(sh))

where the comparisons are in the sense of offset_test, + is offset_add and sz is shape_offset.

7.25. Permitted limits

An installer may specify limits on the sizes of some of the data SHAPEs which it implements. In each case there is a minimum set of limits such that all installers shall implement at least the specified SHAPEs. Part of the description of an installer shall be the limits it imposes. Installers are encouraged not to impose limits if possible, though it is not expected that this will be feasible for floating point numbers.

7.26. Least Upper Bound

The LUB of two SHAPEs, a and b is defined as follows:

If a and b are equal shapes, then a.
If a is BOTTOM then b.
If b is BOTTOM then a.
Otherwise TOP.

7.27. Read-only areas

Consider three scenarios in increasingly static order:

Dynamic loading. A new module is loaded, initialising procedures are obeyed and the results of these are then marked as read-only.
Normal loading. An ld program is obeyed which produces various (possibly circular) structures which are put into an area which will be read-only when the program is obeyed.
Using ROM. Data structures are created (again possibly circular) and burnt into ROM for use by a separate program.

In each case program is obeyed to create a structure, which is then frozen. The special case when the data is, say, just a string is not sufficiently general.

This TDF specification takes the attitude that the use of read-only areas is a property of how TDF is used - a part of the installation process - and there should not be TDF constructions to say that some values in a CAPSULE are read-only. Such constructions could not be sufficiently general.

7.28. Tag and Token signatures

In a TDF program there will usually be references to TAGs which are not defined in TDF; their definitions are intended to be supplied by a host system in system specific libraries.

These TAGs will be declared (but not defined) in a TDF CAPSULE and will be specified by external linkages of the CAPSULE with EXTERNALs containg either TDFIDENTs or UNIQUEs. In previous versions of TDF, the external names required by system linking could only be derived from those EXTERNALs.

Version 4.0 gives an alternative method of constructing extra-TDF names. Each global TAG declaration can now contain a STRING signature field which may be used to derive the external name required by the system.

This addition is principally motivated by the various "name mangling" schemes of C++. The STRING signature can be constructed by concatenations and token expansions. Suitable usages of TOKENs can ensure that the particular form of name-mangling can be deferred to installation time and hence allow, at least conceptually, linking with different C++ libraries.

As well as TAG declarations, TAG definitions are allowed to have signatures. The restriction that the signature (if present) of a TAG definition being identical to its corresponding definition could allow type checking across seperately compiled CAPSULEs.

Similar considerations apply to TOKENs; although token names are totally internal to TDF, it would allow one to check that a token declared in one CAPSULE has the same "type" as its definition in another.

7.29. Dynamic initialisation

The dynamic initialisation of global variables is required for languages like C++. Previous to version 4.0, the only initialisations permissable were load-time ones; in particular no procedure calls were allowed in forming the initialising value. Version 4.0 introduces the constructor initial_value to remedy this situation.

Several different implementation strategies could be considered for this. Basically, one must ensure that all the initial_value expressions are transformed into assignments to globals in some procedure. One might expect that there would be one such procedure invented for each CAPSULE and that somehow this procedure is called before the main program.

This raises problems on how we can name this procedure so that it can be identified as being a special initialising procedure. Some UNIX linkers reserve a name like __init specially so that all instances of it from different modules can be called before the main procedure. Other cases may require a pre-pass on the .o files prior to system linking.