and one more from the newsgroups!
Re: Clarion Assembler documents?
Overview
Invoking the assembler
Program layout
Segments
Tokens
Generic tokens
Keywords
Operators
Syntax
Assembly language considerations
The symbol table
Operand sizes
Jump and calls
Strings
References
Variables and Data
File inclusion
Conditional assembly
Predefined identifiers
Memory models and predefined identifiers
Miscellaneous points
Smart linking
TSA error messages
2. Calling conventions
3. Using masm with cw
4. Examples TS Blowfish
#1. TopSpeed assembler
This is a transcript of “TopSpeed (TS) TechKit, Advanced Programming
Guide”, Chapter 6, TS Assembler, (c) by Jensen & Partners Int., and
documents the TSA for DOS, release 3. We have used it, coding in TS
assembler for windows v5.x, and found it useful in understanding the
peculiarities of the TSA.
We updated the tokens and instructions with information from the
binaries of c55asmx.dll, but we have not tested all of the syntax
documented here in cw5.x.
The TSA is a single-pass assembler used to assemble source files into
.OBJ form. The resulting object files can be linked with programs
written in other TS languages. The TS Project System automatically
detects that a particular module is written in assembly language and
invokes the TSA as required. Assembly language files are identified by a
.A extension on the filename. Thus, INITDLL.A is the assembly language
source code for one part of the TS Library.
Overview
The TSA is designed to be simple to use and fast to operate. The TSA
supports smart linking and interfaces to the other languages in the TS
family. The TS assembly language differs from “standard” 8086 assemblers
in a number of ways:
- The lexical structure is derived from Modula-2. In particular, the
Assembler is case sensitive, comments are marked with (* and *) and
hexadecimal constants must be in upper case.
- A semicolon (;) is used to separate multiple statements on a line. In
fact, a newline and a semicolon are regarded as identical by the assembler.
- There is only one kind of label and there are no data types. Instead of
data types, TSA uses specifiers. Thus, the specifiers near, far, byte
and word can be used to define the data type of the operands in an
instruction.
- Memory operands and segment overrides must always be explicitly specified.
- No macros or other complicated features are used. (Transcriber's note:
macros do exist; see the macro, macrod and purge keywords below.)
This document describes the assembly language which is recognized by the
TSA. This chapter does not contain a detailed discussion of any
assembly-level programming.
The Assembler achieves its high speed by assembling the source file in a
single pass. This means that variables and labels must be defined before
they are used, otherwise the Assembler assumes that the item is to be
defined later in the same segment. If the item is not found there, an
error message is produced.
##Invoking the Assembler
The TS project system automatically invokes the assembler for any source
file in a project with a .A extension. This means that you do not have
to worry about the interdependencies of a project, or how those
interdependencies are maintained.
The Assembler can be invoked from within the TS environment by selecting
compile from the main menu when a .A file is loaded (or by using the
shortcut, ALT-C). Assembly errors are reported in the same way as
compiler errors: the error window is displayed and you can enter
corrections.
Program Layout
A TSA program has the following general layout:
module <name> (* names the module *)
segment <name>... (* define segments *)
<code and data> ...
section (* another section - optional *)
... <and so on>
end (* marks the end of the module *)
##Segments
An 8086 program consists of one or more segments. Each segment contains
either code or data. Every program contains at least one segment: a CODE
segment consisting of the executable code of the program.
Segments are declared using the following syntax:
segment <segment-name> (<class>,<alignment>)
For example, the statement:
segment SPOT(CODE, 28H)
declares a segment called SPOT of the class CODE, with an alignment code
of 28H. The alignment code determines the type of address at which the
segment is located when the program is linked.
The normal values are:
28H a byte aligned segment. The segment can be loaded at any address.
48H a word aligned segment. The segment must be loaded at an even address, with a gap of one byte being left if necessary.
68H a paragraph aligned segment. The segment must be loaded at an address which is exactly divisible by 16. A gap of up to 15 bytes is left if necessary.
74H a stack segment. This is only used for segments which are to be used as a stack.
from Fred Meyer:
USE16 = 00H
USE32 = 01H
ABS_ALIGN = 00H
BYTE_ALIGN = 20H
WORD_ALIGN = 40H
DWORD_ALIGN = 0A0H
PARA_ALIGN = 60H
PAGE_ALIGN = 80H
DONT_COMBINE = 00H
MEMORY_COMBINE = 04H
PUBLIC_COMBINE = 08H
STACK_COMBINE = 14H
COMMON_COMBINE = 18H
(* %T _WIN32 *)
segment _DATA(DATA, USE32 + DWORD_ALIGN + PUBLIC_COMBINE)
segment _TEXT(CODE, USE32 + DWORD_ALIGN + PUBLIC_COMBINE)
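Read together with the "normal values" listed earlier, these constants suggest that an alignment code is the sum of a use field, an alignment field and a combine field (for example 48H = WORD_ALIGN + PUBLIC_COMBINE). That reading is an inference from the numbers, not something the original chapter states. A minimal Python sketch of the arithmetic, in which the helper names segment_code and tsa_hex are hypothetical:

```python
# Field values copied from the constants listed above.
USE16, USE32 = 0x00, 0x01
ALIGN = {"ABS": 0x00, "BYTE": 0x20, "WORD": 0x40,
         "PARA": 0x60, "PAGE": 0x80, "DWORD": 0xA0}
COMBINE = {"DONT": 0x00, "MEMORY": 0x04, "PUBLIC": 0x08,
           "STACK": 0x14, "COMMON": 0x18}


def segment_code(align, combine, use=USE16):
    """Hypothetical helper: add the three fields into one alignment code."""
    return use + ALIGN[align] + COMBINE[combine]


def tsa_hex(value):
    """Format a value as a TSA hex constant: leading digit, upper case, H suffix."""
    text = format(value, "X") + "H"
    return text if text[0].isdigit() else "0" + text


print(tsa_hex(segment_code("WORD", "PUBLIC")))           # 48H
print(tsa_hex(segment_code("PARA", "STACK")))            # 74H
print(tsa_hex(segment_code("DWORD", "PUBLIC", USE32)))   # 0A9H
```

Under this assumption, USE32 + DWORD_ALIGN + PUBLIC_COMBINE in the _WIN32 example evaluates to 0A9H.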
Segments can also be grouped together. Groups must be declared after the segments they contain, using the following syntax:
group <group-name> (<segment-name-list>)
For example:
group G_FOO (A_FOO, B_FOO, C_FOO)
declares a group called G_FOO consisting of the three segments A_FOO, B_FOO and C_FOO. This must be preceded by the definitions of A_FOO, B_FOO and C_FOO.
The TS compiler generates object files which use the following segments:
_TEXT is the segment for executable code.
_DATA is the segment for initialized data.
_BSS is the segment for uninitialized data.
In addition, other segment names are used with a standard meaning. For
example, the _INIT segment is used by the C start-up routine to access
the initialization routines for library sub-systems. These segments are
reserved. They must only be used for their allotted purpose(s).
Example: A simple function
The following example returns a global variable in the AX register.
Points to note are the declarations of the DATA and CODE segments of the
program:
module example
segment _DATA(DATA,28H)
public _i: dw 1
segment _TEXT(CODE,48H)
public _p:
mov ax, [_i]
ret 0
end
The equivalent of this example in “C” would be as follows:
int i = 1;
int p(void) {
return i;
}
Tokens
The basic lexical tokens of TSA are:
Generic Tokens tokens relating to programmer selected items
Keywords tokens relating to the control of the generated code
Instructions tokens that are mnemonics for 8086 instructions
Neither keywords nor instructions can be used as user-defined names.
###Generic Tokens
Generic tokens in TSA are one of the following:
<strings> are sequences of characters enclosed in single quotation marks (').
<numbers> are signed decimal integers or hexadecimal numbers; a hexadecimal number must start with a decimal digit, all of its digits must be in upper case, and it must be suffixed with an H.
<names> are case-sensitive labels for segments, groups, subroutines and local jump destinations; they must begin with a letter (or an underscore, $ or @) and may contain letters, digits, underscores, $ or @.
Some legal names are:
$loop
Main@Start
_again
<registers> which are the names of the 8086 registers:
(updated from c55asmx.dll)
es cs ss ds fs gs
ax cx dx bx sp bp si di
eax ecx edx ebx esp ebp esi edi
al cl dl bl ah ch dh bh
cr0 cr1 cr2 cr3 cr4 cr5 cr6 cr7
dr0 dr1 dr2 dr3 dr4 dr5 dr6 dr7
tr0 tr1 tr2 tr3 tr4 tr5 tr6 tr7
st0 st1 st2 st3 st4 st5 st6 st7
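The rules for <numbers> and <names> given above can be expressed as regular expressions. The following Python sketch is only an illustration of those rules; the function name classify is hypothetical, the optional leading minus sign is an assumed spelling of "signed", and the sketch does not reject keywords, instructions or register names.

```python
import re

# Hexadecimal: leading decimal digit, upper-case digits only, H suffix.
HEX_RE = re.compile(r"[0-9][0-9A-F]*H")
# Decimal: digits with an optional leading minus sign (assumption).
DEC_RE = re.compile(r"-?[0-9]+")
# Names: letter, underscore, $ or @ first; then letters, digits, _, $ or @.
NAME_RE = re.compile(r"[A-Za-z_$@][A-Za-z0-9_$@]*")


def classify(token):
    """Return 'hex', 'decimal', 'name' or 'invalid' for a single token."""
    if HEX_RE.fullmatch(token):
        return "hex"
    if DEC_RE.fullmatch(token):
        return "decimal"
    if NAME_RE.fullmatch(token):
        return "name"
    return "invalid"


for token in ["0A0H", "A0H", "0a0H", "42", "$loop", "Main@Start", "_again"]:
    print(token, classify(token))
```

Note that A0H is classified as a name rather than a number, which is why a hexadecimal constant has to start with a decimal digit.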
Keywords
The TSA uses a number of keywords (updated from c55asmx.dll):
byte word dword fword qword tbyte
db dw dd df
log2 offset power2 ptr seg vdisp far near align end extrn forward
group include macro macrod module org public purge section segment
select st
Instructions (from c55asmx.dll, see the end of the document)
##Operators
The following operators are defined in TSA. The operators are listed in order of precedence (highest precedence first):
power2 power of 2
log2 log base 2 (truncated to integer)
/ % * division, modulus, multiplication
+ - addition, subtraction
: segment reference
~ bitwise not
& bitwise and
| bitwise or
seg segment of address
Operators of equal precedence associate with the expression to their left.
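The power2 and log2 operators are the least familiar entries in this list. A short Python sketch of what the descriptions above imply, assuming non-negative integer operands (the chapter does not say how the assembler treats other values):

```python
def power2(n):
    """2 raised to the power n, for a non-negative integer n."""
    if n < 0:
        raise ValueError("power2 is sketched for non-negative operands only")
    return 1 << n


def log2(n):
    """Log base 2 of n, truncated to an integer, for a positive integer n."""
    if n <= 0:
        raise ValueError("log2 is sketched for positive operands only")
    return n.bit_length() - 1


print(power2(4))   # 16
print(log2(16))    # 4
print(log2(20))    # 4, because the result is truncated
```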
##Syntax
This section describes the TSA assembly language.
The conventions used are as follows:
::= is used to mean "is defined to be"
| is used to mean "or, alternatively"
. is used to terminate a production of the syntax
<> are used to enclose syntax elements
'' are used to enclose required typographic symbols
The syntax elements <string>, <number>, <name> and <register>
are not defined here. The interpretation of these elements is
described in the section Tokens earlier in this chapter. The syntax
element <empty> is used to indicate that an alternative to a production
may be omitted entirely.
<compilation> ::= <globalsdef><modhead><statements> end <entry>.
<globalsdef> ::= <globaldef> | <globalsdef>';'<globaldef>.
<globaldef> ::= <name>'='<operand> | <empty>.
<entry> ::= <empty> | '*'.
<modhead> ::= module <name>.
<statements> ::= <statement> | <statements>';'<statement>.
<statement> ::= <instruction><specifier><operand-list> | (* instruction *)
db <byteexps> | (* define bytes *)
dw <wordexps> | (* define word *)
dd <wordexps> | (* define dword *)
fixup <exp> | (* generates fixup *)
org <exp> | (* reserve <exp> bytes *)
<name> '=' <exp> | (* equate *)
public <name> '=' <exp> | (* absolute public *)
extrn <name> | (* external symbol/label *)
section | (* delimits minimum linker units *)
select <exp> | (* selects new output segment *)
segment <name>'('<name>','<exp>')' | (* segment *)
group <name>'('<segnames>')' | (* group *)
<label>':'<statement> | (* label a statement *)
<empty>. (* empty statement *)
<segnames> ::= <groupcomponent> | <segnames>','<groupcomponent>.
<groupcomponent> ::= <exp>.
<label> ::= public <name> | (* public (global) label *)
<name>. (* normal (local) label *)
<specifier> ::= <empty> | (* default *)
near | far | byte | word | dword | qword | tbyte.
<byteexps> ::= <byteexp> | <byteexps>','<byteexp>.
<byteexp> ::= <string> | (* outputs a string *)
<exp>. (* outputs a byte *)
<wordexps> ::= <wordexp> | <wordexps>','<wordexp>.
<wordexp> ::= <exp>. (* outputs a word *)
<operand-list> ::= <empty> | (* no operands *)
<operand> | (* one operand *)
<operand>','<operand>. (* two operands *)
<operand> ::= <register> | (* register operand *)
<exp> | (* immediate operand *)
<register>':'<addr-list> | (* segment override *)
<addr-list> | (* memory operand *)
st '('<exp>')' | (* 8087 register *)
seg <exp>. (* segment of <exp> *)
<addr-list> ::= <addr> | <addr-list><addr>.
<addr> ::= '['<exp>']' | (* displacement *)
'['<register>']'. (* index register *)
<exp> ::= <number> | (* decimal or hex *)
<name> |
'~'<exp> | (* bitwise NOT *)
'-'<exp> | (* arithmetic negation *)
<exp>'*'<exp> | (* multiplication *)
<exp>'/'<exp> | (* integer division *)
<exp>'%'<exp> | (* modulus *)
<exp>'+'<exp> | (* addition *)
<exp>'-'<exp> | (* subtraction *)
<exp>'&'<exp> | (* bitwise AND *)
<exp>'|'<exp> | (* bitwise OR *)
'('<exp>')' |
power2 <exp> | (* 2 to the power <exp> *)
log2 <exp>. (* log base 2 of <exp> *)
Assembly Language considerations
This section describes a number of points of interest for the
assembly-level programmer who is used to other assemblers.
You may find it useful to use the TS disassembler to generate examples of
TSA from high-level language object files.
The symbol table
The TSA discards the current symbol table every time a new section is
started, except for any global equates which occur before the module
keyword in the source file. This means that symbols may be redefined. If
you redefine a public symbol, the linker issues a warning message.
If you wish to reference external objects, the extrn keyword should be
used, for example:
extrn ReturnOne
...
call far ReturnOne
...
To make an object accessible from another module, the name must be
preceded by the public keyword.
Operand sizes
In most cases, the operands to an instruction have an implied size
(byte, word, etc). This size can be calculated by the assembler and
requires no intervention on your part. However, when there is a chance
of ambiguity, you must use a specifier to indicate the size of the operand:
inc word [bp][-6]
mov byte es:[ebx],1
mov es:[bx],ax (* no specifier needed, ax implies word *)
push es:[bx]
###Jump and calls
Jump and calls default to the smallest possible case. Thus, unless a
label has been defined before use, the assembler assumes that it is a
label in the same segment as the statement referencing it.
Labels should result in a jump of between -128 bytes and +127 bytes in
the output object code. If a jump exceeds this range, a specifier must
be added. For example:
jmp near _loop
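The range rule above is the range of a signed byte. As an illustration only, the Python sketch below decides whether the near specifier is needed for a given displacement. The function names are hypothetical, and the displacement is taken as given rather than computed from addresses.

```python
def fits_short_jump(displacement):
    """True if the displacement fits the -128..+127 range of a default jump."""
    return -128 <= displacement <= 127


def jump_statement(label, displacement):
    """Build a TSA jmp statement, adding 'near' only when it is required."""
    specifier = "" if fits_short_jump(displacement) else "near "
    return "jmp " + specifier + label


print(jump_statement("_loop", 100))    # jmp _loop
print(jump_statement("_loop", -300))   # jmp near _loop
```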
###Strings
Single character string constants are treated as numbers by the TSA.
The assembler syntax does not provide a segment override for string
instructions. If you need one, emit the override prefix byte yourself
with the db keyword. For example:
db 2EH; (* cs: *) stosb
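The byte 2EH in this example is the standard x86 prefix byte for a cs: override. The other segment registers have their own prefix bytes, so the same trick works for them. The Python sketch below generates such a line; the table holds the standard x86 prefix values, while the helper name override_string_op is hypothetical and not part of the TSA.

```python
# Standard x86 segment override prefix bytes.
SEGMENT_PREFIX = {
    "es": 0x26, "cs": 0x2E, "ss": 0x36,
    "ds": 0x3E, "fs": 0x64, "gs": 0x65,
}


def override_string_op(segment, instruction):
    """Build a TSA line that emits the prefix byte before a string instruction."""
    text = format(SEGMENT_PREFIX[segment], "X") + "H"
    if not text[0].isdigit():
        text = "0" + text  # TSA hex constants must start with a digit
    return "db %s; (* %s: *) %s" % (text, segment, instruction)


print(override_string_op("cs", "stosb"))   # db 2EH; (* cs: *) stosb
print(override_string_op("es", "lodsb"))   # db 26H; (* es: *) lodsb
```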
References
Within TSA, forward references are assumed for labels that have not yet
been defined. The TSA assumes that such labels are in the currently
selected segment within the current section.
Within a single section, equated symbols may be redefined; labels, on
the other hand, may not be redefined.
Variables and Data
Variables can be defined, and uninitialized data areas reserved, using
the following directives:
db initialize storage as bytes (expects byte or string).
dw initialize storage as words (expects word operands).
org reserve some uninitialized memory of a size equal to the value of the operand.
(to be completed)
File inclusion
The TSA has an include directive which allows a header file to be
included. This is used in the library to define commonly used identifiers:
include "corelib.inc"
Conditional assembly
The TSA has the ability to conditionally assemble sections of code based
on the value of an identifier. There are three special directives that
mark which blocks to include/exclude. These are:
(*%T<identifier>*) means process this section only when <identifier>=1
(*%F<identifier>*) means process this section only when <identifier>=0
(*%E*) means end of conditional section
These directives are commonly used to select the return instruction for a procedure that is memory model independent:
(*%T _fcall *)
ret 0
(*%E *)
(*%F _fcall *)
ret far 0
(*%E *)
Conditional sections can be nested.
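The effect of these directives, including nesting, can be modelled with a small stack. The Python sketch below is an illustration of the rules as described here, not the assembler's actual implementation. It assumes that each directive sits on a line of its own and that an undefined identifier counts as 0; the function name filter_conditionals is hypothetical.

```python
import re

# Matches (*%T name*), (*%F name*) and (*%E*), with optional spaces.
DIRECTIVE = re.compile(
    r"^\(\*\s*%([TFE])\s*([A-Za-z_$@][A-Za-z0-9_$@]*)?\s*\*\)$")


def filter_conditionals(lines, symbols):
    """Return the lines that remain after conditional assembly."""
    stack = []   # one boolean per open conditional section
    kept = []
    for line in lines:
        match = DIRECTIVE.match(line.strip())
        if match is None:
            if all(stack):            # every enclosing section is active
                kept.append(line)
            continue
        kind, ident = match.group(1), match.group(2)
        if kind == "E":
            if not stack:
                raise ValueError("(*%E*) without an open section")
            stack.pop()
            continue
        if ident is None:
            raise ValueError("missing identifier in: " + line)
        value = symbols.get(ident, 0)  # assumption: undefined means 0
        stack.append(value == 1 if kind == "T" else value == 0)
    if stack:
        raise ValueError("unterminated conditional section")
    return kept


source = ["(*%T _fcall *)", "ret 0", "(*%E *)",
          "(*%F _fcall *)", "ret far 0", "(*%E *)"]
print(filter_conditionals(source, {"_fcall": 1}))   # ['ret 0']
print(filter_conditionals(source, {"_fcall": 0}))   # ['ret far 0']
```

A stack is used so that a nested section is only emitted when every enclosing section is also active.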
For further examples of using conditional assembly, please refer to the
file COREGRAP.A in the \TS\SRC directory. This file is also an excellent
example of assembly language procedures providing memory model and
register passed parameter support.
Predefined identifiers
The assembler predefines a symbol for each #pragma define(<identifier>=<value>) that is active in the project file at the point where the #compile occurs. This allows conditional assembly according to which memory model has been selected. The symbol's value is 1 if the project value is on; otherwise it is 0.
Memory models and predefined identifiers
The following identifiers are defined by the Project System, and have
the meaning shown:
_fcall calls and returns are far.
_fptr pointers are 32 bit.
_fdata the ds register is not equal to dgroup on function entry.
_mthread the program is multi-threaded.
_jpicall parameters are passed in registers.
target system:
_WINDOW windows (16 bit)
_DLL DLL
_WINDLL windows DLL
_OVL overlay
_OS2 OS/2
MSDOS MSDOS
_WIN32 win32
M_I86xM 8086, memory model indicated by x.
x may be S, M, C, L, T, X, O or D.
Miscellaneous points
Other assemblers allow instruction formats such as:
mov ax, es:[bx+6]
In TSA, these must be expressed as:
mov ax, es:[bx][6]
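When porting source from another assembler, this rewrite can be done mechanically. The Python sketch below is a simplified illustration: it splits every + or - term inside a bracket into its own bracket, which matches the examples in this chapter (including negative displacements such as [bp][-6]) but makes no attempt to keep constant expressions together. The function names are hypothetical.

```python
import re

BRACKET = re.compile(r"\[([^\[\]]+)\]")


def split_terms(expression):
    """Split 'bx+6' into ['bx', '6'] and 'bp-6' into ['bp', '-6']."""
    parts = re.findall(r"[+-]?[^+-]+", expression.replace(" ", ""))
    return [p[1:] if p.startswith("+") else p for p in parts]


def to_tsa_address(statement):
    """Rewrite [a+b] style operands as the [a][b] form."""
    def expand(match):
        return "".join("[%s]" % term for term in split_terms(match.group(1)))
    return BRACKET.sub(expand, statement)


print(to_tsa_address("mov ax, es:[bx+6]"))   # mov ax, es:[bx][6]
print(to_tsa_address("inc word [bp-6]"))     # inc word [bp][-6]
```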
The instructions rep, repne and lock are regarded by the assembler as
separate instructions. They must be followed by a semicolon, for example:
rep; movsb (* note the semicolon *)
Smart Linking
When the TSA linker links your program, it includes only the CODE and
DATA segments which are actually referenced. In the following example _p
and _q are placed in separate segments (of the same name) so that they
will be ‘smart’ linked into the final program:
module Example2
segment _TEXT(CODE,48H)
public _p:
mov ax, 1
ret 0
segment _TEXT(CODE,48H)
public _q:
mov ax, 2
ret 0
end
TS error messages TO BE COMPLETED
cw5.5:
c55asm.dll
c55asmx.dll
NONE
Error
Imm Mem String Disp
Ib IV IX Iw Iv Ix Jb Jv Pv
Av AV EB EW EV Eb Ev Ew
M MD MQ MT MW
Ob Ov Cd Dd Gb Gv Gw
Qb Qv Rd STi Sw Td
eAX sb sv db dv xb
ZERO ONE THREE TEN
Instructions:
aaa aad aam aas adc add and arpl
bound bsf bsr bswap bt btc btr bts
call clc cld cli clts cmc cmp cmps
cmpxchg daa das dec div enter esc halt
idiv imul in inc ins int into invd
invlpg jmp jb jae je jne jbe ja
jp jpo jl jge jle jg jo jno
js jns lahf lar lds lea leave les
lfs lgdt lgs lidt lldt lmsw lods lsl
lss ltr mov movs movsx movzx mul neg
nop not or out outs pop push rcl
rcr rol ror sahf sar sbb scas retf
retn setb setae sete setne setbe seta setp
setpo setl setge setle setg seto setno sets
setns sgdt shl shld shr shrd sidt sldt
smsw stc std sti stos str sub test
verr verw wbinvd fwait xadd xchg xlats xor
f2xm1 fabs fadd faddp fbld fbstp fchs fclex
fcom fcomp fcompp fdecstp fdiv fdivp fdivr fdivrp
ffree fiadd ficom ficomp fidiv fidivr fild fimul
fincstp finit fist fistp fisub fisubr fld fld1
fldcw fldenv fldl2e fldl2t fldlg2 fldln2 fldpi fldz
fmul fmulp fnop fpatan fprem fptan frndint frstor
fsave fscale fsetpm fsqrt fst fstcw fstenv fstp
fstsw fsub fsubp fsubr fsubrp ftst fxam fxch
fxtract fyl2x fyl2xp1 fsin fcos fsincos fprem1 fucom
fucomp fucompp fdisi feni lock
rep repne cbw cwd iret pusha pushf popa
popf jcxz loop loope loopne cwde cdq iretd
pushad pushfd popad popfd jecxz loopd looped loopned
ret
mul_x
imul_x
div_x
idiv_x
rmovsb
rmovsw
rmovsd
addrsiz datasiz
grp1_1 grp1_2 grp1_3
grp2_1 grp2_2 grp2_3 grp2_4 grp2_5 grp2_6
grp3_1 grp3_2
grp4 grp5 grp6 grp7 grp8
grp_f0 grp_f1 grp_f2 grp_f3 grp_f4
over
twobyte
xx
data
org
align
abspub
public
flat
*** end