Unfinished text with some information about the TopSpeed assembler

brahn · January 20, 2016, 11:09pm

and one more from the newsgroups!

Re: Clarion Assembler documents?

Overview
Invoking the assembler
Program layout
Segments
Tokens
Generic tokens
Keywords
Operators
Syntax
Assembly language considerations
The symbol table
Operand sizes
Jump and calls
Strings
References
Variables and Data
File inclusion
Conditional assembly
Predefined identifiers
Memory models and predefined identifiers
Miscellaneous points
Smart linking
TSA error messages
2. Calling conventions
3. Using masm with cw
4. Examples TS Blowfish

#1. TopSpeed assembler

This is a transcript of “TopSpeed (TS) TechKit, Advanced Programming
Guide”, Chapter 6, TS Assembler, (c) by Jensen & Partners Int., and
documents the TSA for DOS, release 3. We have used it, coding in TS
assembler for windows v5.x, and found it useful in understanding the
peculiarities of the TSA.
We updated the tokens and instructions with information from the
binaries of c55asmx.dll, but we haven’t tested all of the syntax
documented here in cw5.X.

The TSA is a single-pass assembler used to assemble source files into
.OBJ form. The resulting object files can be linked with programs
written in other TS languages. The TS Project System automatically
detects that a particular module is written in assembly language and
invokes the TSA as required. Assembly language files are identified by a
.A extension on the filename. Thus, INITDLL.A is the assembly language
source code for one part of the TS Library.

Overview

The TSA is designed to be simple to use and fast to operate. The TSA
supports smart linking and interfaces to the other languages in the TS
family. The TS assembly language differs from “standard” 8086 assemblers
in a number of ways:

The lexical structure is derived from Modula-2. In particular, the
Assembler is case sensitive, comments are marked with (* and *), and
hexadecimal constants must be in upper case.
A semicolon (;) is used to separate multiple statements on a line. In
fact, a newline and a semicolon are regarded as identical by the assembler.
There is only one kind of label and there are no data types. Instead of
data types, TSA uses specifiers. Thus, the specifiers near, far, byte
and word can be used to define the data type of the operands in an
instruction.
Memory operands and segment overrides must always be explicitly specified.
(No macros or other complicated features are used *** transcriber's note: there are macros)

This document describes the assembly language which is recognized by the
TSA. This chapter does not contain a detailed discussion of any
assembly-level programming.
The Assembler achieves its high speed by assembling the source file in a
single pass. This means that variables and labels should be defined before
they are used. If an item has not been defined when it is first
referenced, the Assembler assumes that it will be defined later in the
same segment. If the item is not found there, an error message is produced.

##Invoking the Assembler
The TS project system automatically invokes the assembler for any source
file in a project with a .A extension. This means that you do not have
to worry about the interdependencies of a project, or how those
interdependencies are maintained.
The Assembler can be invoked from within the TS environment by selecting
compile from the main menu when a .A file is loaded (or by using the
shortcut, ALT-C). Assembly errors are reported in the same way as
compiler errors: the error window is displayed and you can enter
corrections.

Program Layout
A TSA program has the following general layout:

     module <name>           (* names the module *)
     segment <name>...       (* define segments *)
     <code and data> ...
     section                 (* another section - optional *)
     ... <and so on>
     end                     (* marks the end of the module *)

##Segments
An 8086 program consists of one or more segments. Each segment contains
either code or data. Every program contains at least one segment: a CODE
segment consisting of the executable code of the program.
Segments are declared using the following syntax:

     segment <segment-name> (<class>,<alignment>)

For example, the statement:

     segment SPOT(CODE, 28H)

declares a segment called SPOT of the class CODE, with an alignment code
of 28H. The alignment code determines the type of address the segment is
to be located at when the program is linked.
The normal values are:

 28H     a byte aligned segment. The segment can be loaded at any address.
 48H     a word aligned segment. The segment must be loaded at an even address, with a gap of one byte being left if necessary.
 68H     a paragraph aligned segment. The segment must be loaded at an address which is exactly divisible by 16. A gap of up to 15 bytes is left if necessary.
 74H     a stack segment. This is only used for segments which are to be used as a stack.
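
The gap rules above amount to rounding the load address up to the next multiple of the alignment. Here is a minimal Python sketch of that arithmetic. The function name `alignment_gap` is made up for illustration and is not part of any TS tool.

```python
def alignment_gap(address, alignment):
    """Return the number of padding bytes needed before `address` so that
    a segment starts on a multiple of `alignment` (1 = byte, 2 = word,
    16 = paragraph)."""
    if alignment <= 0:
        raise ValueError("alignment must be a positive number of bytes")
    return (-address) % alignment


print(alignment_gap(0x1003, 1))    # 0: byte alignment never needs a gap
print(alignment_gap(0x1003, 2))    # 1: one byte to reach an even address
print(alignment_gap(0x1001, 16))   # 15: the worst case for a paragraph
```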

from Fred Meyer:

USE16               = 00H
USE32               = 01H

ABS_ALIGN           =  00H
BYTE_ALIGN          =  20H
WORD_ALIGN          =  40H
DWORD_ALIGN         = 0A0H
PARA_ALIGN          =  60H
PAGE_ALIGN          =  80H

DONT_COMBINE        = 00H
MEMORY_COMBINE      = 04H
PUBLIC_COMBINE      = 08H
STACK_COMBINE       = 14H
COMMON_COMBINE      = 18H

(* %T _WIN32 *)

segment _DATA(DATA, USE32 + DWORD_ALIGN + PUBLIC_COMBINE)
segment _TEXT(CODE, USE32 + DWORD_ALIGN + PUBLIC_COMBINE)
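
Fred Meyer's constants suggest that the segment code is the sum of three independent fields. The Python sketch below decodes a code on that assumption. The masks 01H, 0E0H and 1CH are inferred from the constant values above and are not taken from any TS documentation, and `decode_segment_code` is a made-up helper name.

```python
USE = {0x00: "USE16", 0x01: "USE32"}
ALIGN = {
    0x00: "ABS_ALIGN", 0x20: "BYTE_ALIGN", 0x40: "WORD_ALIGN",
    0x60: "PARA_ALIGN", 0x80: "PAGE_ALIGN", 0xA0: "DWORD_ALIGN",
}
COMBINE = {
    0x00: "DONT_COMBINE", 0x04: "MEMORY_COMBINE", 0x08: "PUBLIC_COMBINE",
    0x14: "STACK_COMBINE", 0x18: "COMMON_COMBINE",
}


def decode_segment_code(code):
    """Split a segment code into (use, alignment, combine) names.

    Assumes (not confirmed by the manual) that the fields occupy
    bit 0, bits 5-7 and bits 2-4 respectively."""
    return USE[code & 0x01], ALIGN[code & 0xE0], COMBINE[code & 0x1C]


for code in (0x28, 0x48, 0x68, 0x74, 0x01 + 0xA0 + 0x08):
    print(format(code, "02X") + "H", decode_segment_code(code))
```

Under this reading, 28H, 48H and 68H are byte, word and paragraph alignment combined with PUBLIC_COMBINE, 74H is paragraph alignment with STACK_COMBINE, and the _WIN32 segments above use the code 0A9H.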

Segments can also be grouped together. Groups must be declared after the segments they contain, using the following syntax:

     group <group-name> (<segment-name-list>)

For example:

     group  G_FOO (A_FOO, B_FOO, C_FOO)

declares a group called G_FOO consisting of the three segments A_FOO, B_FOO and C_FOO. This must be preceded by the definitions of A_FOO, B_FOO and C_FOO.

The TS compiler generates object files which use the following segments:

 _TEXT is the segment for executable code.
 _DATA is the segment for initialized data.
 _BSS is the segment for uninitialized data.

In addition, other segment names are used with a standard meaning. For
example, the _INIT segment is used by the C start-up routine to access
the initialization routines for library sub-systems. These segments are
reserved. They must only be used for their allotted purpose(s).

Example: A simple function
The following example returns the value of a global variable in the AX
register. Points to note are the declarations of the DATA and CODE
segments of the program:

     module example

     segment _DATA(DATA,28H)
     public _i: dw 1

     segment _TEXT(CODE,48H)
     public _p:
         mov   ax, [_i]
         ret   0
     end

The equivalent of this example in “C” would be as follows:

     int i = 1;
     int p(void) {
         return i;
     }

Tokens

The basic lexical tokens of TSA are:

 Generic tokens  tokens relating to programmer-selected items
 Keywords        tokens relating to the control of the generated code
 Instructions    tokens that are mnemonics for 8086 instructions

Neither keywords nor instructions can be used for user-defined names.

###Generic Tokens
Generic tokens in TSA are one of the following:

 <strings>       are sequences of characters enclosed in single quotation marks (').
 <numbers>       are signed decimal integers or hexadecimal numbers; a hexadecimal number must start with a digit, its letter digits must be in upper case, and it must be suffixed with an H.
 <names>         are case-sensitive labels for segments, groups, subroutines and local jump destinations; they must begin with a letter (or an underscore, $ or @) and may contain letters, digits, underscores, $ or @.

                 Some legal names are:

                         $loop
                         Main@Start
                         _again

 <registers>     are the names of the 8086 registers:
                 (updated from c55asmx.dll)

                         es cs ss ds fs gs
                         ax cx dx bx sp bp si di
                         eax ecx edx ebx esp ebp esi edi
                         al cl dl bl ah ch dh bh
                         cr0 cr1 cr2 cr3 cr4 cr5 cr6 cr7
                         dr0 dr1 dr2 dr3 dr4 dr5 dr6 dr7
                         tr0 tr1 tr2 tr3 tr4 tr5 tr6 tr7
                         st0 st1 st2 st3 st4 st5 st6 st7
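
The rules for numbers and names can be written down as regular expressions. This Python sketch is one reading of the rules above and has not been checked against the assembler. In particular, allowing a leading minus sign on decimal numbers is an assumption, and the check covers only the lexical shape of a name, not the ban on keywords and instruction mnemonics. The helper names are made up.

```python
import re

# Decimal integer (optionally negative), or hex: leading digit,
# upper-case hex digits, trailing H.
NUMBER = re.compile(r"-?[0-9]+|[0-9][0-9A-F]*H")

# Names start with a letter, underscore, $ or @, and may continue
# with letters, digits, underscores, $ or @.
NAME = re.compile(r"[A-Za-z_$@][A-Za-z0-9_$@]*")


def is_number(token):
    return NUMBER.fullmatch(token) is not None


def is_name(token):
    return NAME.fullmatch(token) is not None


for token in ("28H", "0FFH", "FFH", "0ffH", "$loop", "Main@Start", "2fast"):
    print(token, is_number(token), is_name(token))
```

Note that FFH is shaped like a name, not a number, which is why a hexadecimal constant has to start with a digit (0FFH).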

Keywords

The TSA uses a number of keywords (updated from c55asmx.dll):

     byte word dword fword qword tbyte
     db dw dd df
     log2 offset power2 ptr seg vdisp far near align end extrn forward
     group include macro macrod module org public purge section segment
     select st

Instructions (from c55asmx.dll, see the end of the document)

##Operators
The following operators are defined in TSA. The operators are listed in order of precedence (highest precedence first):

     power2      power of 2
     log2        log base 2 (truncated to integer)
     / % *       division, modulus, multiplication
     + -         addition, subtraction
     :           segment reference
     ~           bitwise not
     &           bitwise and
     |           bitwise or
     seg         segment of address

Operators of equal precedence are left-associative: they bind to the expression on their left.
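
Read literally, power2 and log2 are integer operators evaluated at assembly time. The Python sketch below shows the arithmetic they describe for non-negative operands. How the TSA treats zero or negative operands is not documented here, so rejecting them is an assumption.

```python
def power2(n):
    """2 raised to the power n, as an integer."""
    if n < 0:
        raise ValueError("negative exponent (behaviour in TSA unknown)")
    return 1 << n


def log2(n):
    """Base-2 logarithm of n, truncated to an integer."""
    if n <= 0:
        raise ValueError("log2 needs a positive operand")
    return n.bit_length() - 1


print(power2(4))   # 16
print(log2(16))    # 4
print(log2(17))    # 4, the fractional part is truncated
```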

##Syntax
This section describes the TSA assembly language.
The conventions used are as follows:

 ::=     is used to mean "is defined to be"
 |       is used to mean "or, alternatively"
 .       is used to terminate a production of the syntax
 <>      are used to enclose syntax elements
 ''      are used to enclose required typographic symbols

The syntax elements <string>, <number>, <name>, <register> and <instruction>
are not defined here. The interpretation of these elements is
described in the section Tokens earlier in this chapter. The syntax
element <empty> is used to indicate that an alternative to a production
may be omitted entirely.

     <compilation>    ::=  <globalsdef><modhead><statements> end <entry>.
     <globalsdef>     ::=  <globaldef> | <globalsdef>';'<globaldef>.
     <globaldef>      ::=  <name>'='<operand> | <empty>.
     <entry>          ::=  <empty> | '*'.
     <modhead>        ::=  module <name>.
     <statements>     ::=  <statement> | <statements>';'<statement>.
     <statement>      ::=  <instruction><specifier><operand-list> | (* instruction *)
                           db <byteexps> | (* define bytes *)
                           dw <wordexps> | (* define word *)
                           dd <wordexps> | (* define dword *)
                           fixup <exp> | (* generates fixup *)
                           org <exp> | (* reserve <exp> bytes *)
                           <name> '=' <exp> | (* equate *)
                           public <name> '=' <exp> | (* absolute public *)
                           extrn <name> | (* external symbol/label *)
                           section | (* delimits minimum linker units *)
                           select <exp> | (* selects new output segment *)
                           segment <name>'('<name>','<exp>')' | (* segment *)
                           group <name>'('<segnames>')' | (* group *)
                           <label>':'<statement> | (* label a statement *)
                           <empty>. (* empty statement *)
     <segnames>       ::=  <groupcomponent> | <segnames>','<groupcomponent>.
     <groupcomponent> ::=  <exp>.
     <label>          ::=  public <name> | (* public (global) label *)
                           <name>. (* normal (local) label *)
     <specifier>      ::=  <empty> | (* default *)
                           near | far | byte | word | dword | qword | tbyte.
     <byteexps>       ::=  <byteexp> | <byteexps>','<byteexp>.
     <byteexp>        ::=  <string> | (* outputs a string *)
                           <exp>. (* outputs a byte *)
     <wordexps>       ::=  <wordexp> | <wordexps>','<wordexp>.
     <wordexp>        ::=  <exp>. (* outputs a word *)
     <operand-list>   ::=  <empty> | (* no operands *)
                           <operand> | (* one operand *)
                           <operand>','<operand>. (* two operands *)
     <operand>        ::=  <register> | (* register operand *)
                           <exp> | (* immediate operand *)
                           <register>':'<addr-list> | (* segment override *)
                           <addr-list> | (* memory operand *)
                           st '('<exp>')' | (* 8087 register *)
                           seg <exp>. (* segment of <exp> *)
     <addr-list>      ::=  <addr> | <addr-list><addr>.
     <addr>           ::=  '['<exp>']' | (* displacement *)
                           '['<register>']'. (* index register *)
     <exp>            ::=  <number> | (* decimal or hex *)
                           <name> |
                           '~'<exp> | (* bitwise NOT *)
                           '-'<exp> | (* arithmetic negation *)
                           <exp>'*'<exp> | (* multiplication *)
                           <exp>'/'<exp> | (* integer division *)
                           <exp>'%'<exp> | (* modulus *)
                           <exp>'+'<exp> | (* addition *)
                           <exp>'-'<exp> | (* subtraction *)
                           <exp>'&'<exp> | (* bitwise AND *)
                           <exp>'|'<exp> | (* bitwise OR *)
                           '('<exp>')' |
                           power2 <exp> | (* 2 to the power <exp> *)
                           log2 <exp>. (* log base 2 of <exp> *)

Assembly Language considerations

This section describes a number of points of interest for assembly
level programmers who are used to other assemblers.
You may find it useful to use the TS disassembler to generate examples of
TSA from high-level language object files.

The symbol table

The TSA discards the current symbol table every time a new section is
started, except for any global equates which occur before the module
keyword in the source file. This means that symbols may be redefined. If
you redefine a public symbol, the linker issues a warning message.
If you wish to reference external objects, the extrn keyword should be
used, for example:

     extrn ReturnOne
       ...
       call far ReturnOne
       ...

To make an object accessible from another module, the name must be
preceded by the public keyword.

Operand sizes

In most cases, the operands to an instruction have an implied size
(byte, word, etc). This size can be calculated by the assembler and
requires no intervention on your part. However, when there is a chance
of ambiguity, you must use a specifier to indicate the size of the operand:

     inc   word [bp][-6]
     mov   byte es:[ebx],1
     mov   es:[bx],ax         (* no specifier needed, ax implies word *)
     push  es:[bx]

###Jump and calls
Jumps and calls default to the smallest possible form. Thus, unless a
label has been defined before use, the assembler assumes that it is a
label in the same segment as the statement referencing it.
By default, a jump to a label must span between -128 bytes and +127 bytes
in the output object code. If a jump exceeds this range, a specifier must
be added. For example:

 jmp  near _loop
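
The -128 to +127 limit is the range of the signed 8-bit displacement in an x86 short jump, which is measured from the end of the two-byte jump instruction. The Python sketch below shows that check. The helper names are made up, and whether the TSA measures the distance in exactly this way is an assumption.

```python
SHORT_JUMP_LENGTH = 2  # one opcode byte plus one displacement byte


def short_jump_displacement(jump_address, target_address):
    """Displacement a short jump would need, counted from the byte
    that follows the jump instruction."""
    return target_address - (jump_address + SHORT_JUMP_LENGTH)


def needs_near_specifier(jump_address, target_address):
    """True when the target is out of reach of a short jump."""
    displacement = short_jump_displacement(jump_address, target_address)
    return not -128 <= displacement <= 127


print(needs_near_specifier(0x100, 0x181))   # False: displacement is +127
print(needs_near_specifier(0x100, 0x182))   # True: displacement is +128
```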

###Strings
Single character string constants are treated as numbers by the TSA.
The assembler syntax does not provide a segment override for string
instructions. If you need one, emit the override prefix byte yourself
using the db keyword. For example:

 db  2EH;  (* cs: *) stosb
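
The byte 2EH is the x86 prefix for a cs: override. The other segment registers have prefix bytes of their own, so the same trick works for them. The Python sketch below builds such a statement. The prefix values are the standard x86 encodings, the helper names are made up, and the output has not been run through the TSA.

```python
# Standard x86 segment override prefix bytes (fs and gs need a 386 or later).
SEGMENT_OVERRIDE_PREFIX = {
    "es": 0x26, "cs": 0x2E, "ss": 0x36, "ds": 0x3E, "fs": 0x64, "gs": 0x65,
}


def tsa_hex(value):
    """Format a number as a TSA hex constant: upper case, leading digit,
    H suffix."""
    text = format(value, "X")
    if not text[0].isdigit():
        text = "0" + text
    return text + "H"


def override_statement(segment, instruction):
    """Build 'db <prefix>; (* seg: *) <instruction>' for a string instruction."""
    prefix = SEGMENT_OVERRIDE_PREFIX[segment]
    return "db  {};  (* {}: *) {}".format(tsa_hex(prefix), segment, instruction)


print(override_statement("cs", "lodsb"))
print(override_statement("es", "lodsb"))
```

lodsb is used here because on x86 the prefix only changes the DS:SI source of a string instruction; the ES:DI destination is fixed.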

References

Within TSA, forward references are assumed for labels that have not yet
been defined. The TSA assumes that such labels are in the currently
selected segment within the current section.
Within a single section, equated symbols may be redefined; labels, on
the other hand, may not be redefined.

Variables and Data

Variables can be defined, and uninitialized data areas reserved, using
the following directives:

 db      initialize storage as bytes (expects byte or string operands).
 dw      initialize storage as words (expects word operands).
 org     reserve some uninitialized memory of a size equal to the value of the operand.

****** to be completed

File inclusion

The TSA has an include directive which allows a header file to be
included. This is used in the library to define commonly used identifiers:

 include  "corelib.inc"

Conditional assembly

The TSA has the ability to conditionally assemble sections of code based
on the value of an identifier. There are three special directives that
mark which blocks to include/exclude. These are:

 (*%T<identifier>*)      means process this section only when <identifier>=1
 (*%F<identifier>*)      means process this section only when <identifier>=0
 (*%E*)                  means end of conditional section

These directives are commonly used to write the return code for a procedure that is memory model independent:

 (*%F _fcall *)
   ret 0
 (*%E *)
 (*%T _fcall *)
   ret far 0
 (*%E *)

Conditional sections can be nested.
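
To make the nesting rule concrete, here is a small Python model of how the three directives could select lines. It is only a sketch of the behaviour described above, not the TSA's actual implementation. It assumes that each directive sits on a line of its own and that an identifier missing from the symbol table counts as 0. The function name and the sample identifiers are made up.

```python
import re

# Matches (*%T name*), (*%F name*) and (*%E*), with optional spaces.
DIRECTIVE = re.compile(
    r"^\s*\(\*\s*%([TFE])\s*([A-Za-z_$@][A-Za-z0-9_$@]*)?\s*\*\)\s*$"
)


def filter_conditionals(source, symbols):
    """Return only the lines that the %T/%F/%E directives leave active."""
    kept = []
    active = []  # one flag per open conditional section
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if match is None:
            if all(active):
                kept.append(line)
            continue
        kind, name = match.group(1), match.group(2)
        if kind == "E":
            if not active:
                raise ValueError("%E without a matching %T or %F")
            active.pop()
        else:
            if name is None:
                raise ValueError("%" + kind + " needs an identifier")
            value = symbols.get(name, 0)  # assumption: undefined means 0
            active.append(value == 1 if kind == "T" else value == 0)
    if active:
        raise ValueError("conditional section not closed with %E")
    return "\n".join(kept)


sample = "\n".join([
    "(*%T _WIN32 *)",
    "mov ax, 1",
    "(*%E *)",
    "(*%F _WIN32 *)",
    "mov ax, 2",
    "(*%E *)",
])
print(filter_conditionals(sample, {"_WIN32": 1}))   # mov ax, 1
print(filter_conditionals(sample, {"_WIN32": 0}))   # mov ax, 2
```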

For further examples of using conditional assembly, please refer to the
file COREGRAP.A in the \TS\SRC directory. This file is also an excellent
example of assembly language procedures providing memory model and
register passed parameter support.

Predefined identifiers

The assembler predefines a symbol for each #pragma define(<identifier>=<value>) which is active in the project file at the point where the #compile occurs. This allows conditional assembly according to which model has been selected. The value is 1 if the project value is on, otherwise the value is 0.

Memory models and predefined identifiers

The following identifiers are defined by the Project System, and have
the meaning shown:

 _fcall      calls and returns are far.
 _fptr       pointers are 32 bit.
 _fdata      the ds register is not equal to dgroup on function entry.
 _mthread    the program is multi-threaded.
 _jpicall    parameters are passed in registers.
             target system:
 _WINDOW       windows (16 bit)
 _DLL          DLL
 _WINDLL       windows DLL
 _OVL          overlay
 _OS2          OS/2
 MSDOS       MSDOS
 _WIN32        win32
 M_I86xM       8086, memory model indicated by x.
                     x may be S, M, C, L, T, X, O or D.

Miscellaneous points

Other assemblers allow instruction formats such as:

 mov  ax, es:[bx+6]

In TSA, these must be expressed as:

 mov  ax, es:[bx][6]
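
When porting code from another assembler, this rewrite is mechanical: each term inside the brackets gets a bracket pair of its own. The Python sketch below does this for simple sums and differences. It follows the two forms shown in this document ([bx][6] and [bp][-6]), does not handle scaled indexes or nested expressions, and uses a made-up function name.

```python
import re


def split_address_terms(statement):
    """Rewrite [a+b-c] style operands as [a][b][-c]."""
    def expand(match):
        inner = match.group(1).replace(" ", "")
        # Each term keeps a leading minus sign; a leading plus is dropped.
        terms = re.findall(r"[+-]?[^+-]+", inner)
        return "".join("[" + term.lstrip("+") + "]" for term in terms)

    return re.sub(r"\[([^\]]+)\]", expand, statement)


print(split_address_terms("mov  ax, es:[bx+6]"))   # mov  ax, es:[bx][6]
print(split_address_terms("inc   word [bp-6]"))    # inc   word [bp][-6]
```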

The instructions rep, repne and lock are regarded by the assembler as
separate instructions. They must be followed by a semicolon, for example:

 rep; movsb      (* note the semicolon *)

Smart Linking

When the TS linker links your program, it includes only the CODE and
DATA segments which are actually referenced. In the following example _p
and _q are placed in separate segments (of the same name) so that they
will be ‘smart’ linked into the final program:

 module  Example2
 segment _TEXT(CODE,48H)
 public _p:
     mov  ax, 1
     ret  0

 segment _TEXT(CODE,48H)
 public _q:
     mov  ax, 2
     ret  0
 end

TS error messages TO BE COMPLETED

cw5.5:
   c55asm.dll
   c55asmx.dll

NONE
Error

Imm Mem String Disp
Ib IV IX Iw Iv Ix Jb Jv Pv
Av AV EB EW EV Eb Ev Ew
M MD MQ MT MW
Ob Ov Cd Dd Gb Gv Gw
Qb Qv Rd STi Sw Td
eAX sb sv db dv xb

ZERO ONE THREE TEN

Instructions:

aaa      aad    aam    aas    adc    add    and    arpl
bound    bsf    bsr    bswap  bt     btc    btr    bts
call     clc    cld    cli    clts   cmc    cmp    cmps
cmpxchg  daa    das    dec    div    enter  esc    halt
idiv     imul   in     inc    ins    int    into   invd
invlpg   jmp    jb     jae    je     jne    jbe    ja
jp       jpo    jl     jge    jle    jg     jo     jno
js       jns    lahf   lar    lds    lea    leave  les
lfs      lgdt   lgs    lidt   lldt   lmsw   lods   lsl
lss      ltr    mov    movs   movsx  movzx  mul    neg
nop      not    or     out    outs   pop    push   rcl
rcr      rol    ror    sahf   sar    sbb    scas   retf
retn     setb   setae  sete   setne  setbe  seta   setp
setpo    setl   setge  setle  setg   seto   setno  sets
setns    sgdt   shl    shld   shr    shrd   sidt   sldt
smsw     stc    std    sti    stos   str    sub    test
verr     verw   wbinvd fwait  xadd   xchg   xlats  xor

f2xm1    fabs     fadd     faddp    fbld   fbstp    fchs     fclex
fcom     fcomp    fcompp   fdecstp  fdiv   fdivp    fdivr    fdivrp
ffree    fiadd    ficom    ficomp   fidiv  fidivr   fild     fimul
fincstp  finit    fist     fistp    fisub  fisubr   fld      fld1
fldcw    fldenv   fldl2e   fldl2t   fldlg2 fldln2   fldpi    fldz
fmul     fmulp    fnop     fpatan   fprem  fptan    frndint  frstor
fsave    fscale   fsetpm   fsqrt    fst    fstcw    fstenv   fstp
fstsw    fsub     fsubp    fsubr    fsubrp ftst     fxam     fxch
fxtract  fyl2x    fyl2xp1  fsin     fcos   fsincos  fprem1   fucom
fucomp   fucompp  fdisi    feni     lock

rep     repne   cbw    cwd    iret    pusha  pushf   popa
popf    jcxz    loop   loope  loopne  cwde   cdq     iretd
pushad  pushfd  popad  popfd  jecxz   loopd  looped  loopned
ret

mul_x
imul_x
div_x
idiv_x
rmovsb
rmovsw
rmovsd

addrsiz  datasiz

grp1_1 grp1_2 grp1_3
grp2_1 grp2_2 grp2_3 grp2_4 grp2_5 grp2_6
grp3_1 grp3_2
grp4   grp5   grp6   grp7   grp8
grp_f0 grp_f1 grp_f2 grp_f3 grp_f4

over
twobyte
xx
data
org
align
abspub
public
flat

*** end