A modern Forth 2012 standard compliant system for the vintage SHARP PC-G850(V)(S) pocket computer or any Z80 system (with a few tweaks to port).
Forth850 is under 8K and has 295 words.
A more complete 11K version forth850-full is also included with:
-
single precision floating point math words implemented with a new and efficient Z80 IEEE 754 floating point math library I wrote for Forth850. More floating point math functions that use this library are defined in Forth in examples/MATH.FTH
-
a more capable Forth line editor with replay back feature (cursor keys)
Forth850 includes stack under/overflow checks, dictionary overflow checks and can be interrupted by pressing BREAK.
You can write Forth source code in the PC-G850(V)(S) built-in TEXT editor and compile it into Forth850 with the TEXT word included in the full version.
You can extend Forth850 as you wish, including assembly code written on the
machine itself in the PC-G850(V)(S) TEXT editor and assembled with its Z80
Assembler. See ASMDEMO1.FTH for an example with an
explanation. You can also use the Monitor to set breakpoints and run Forth850
from the Monitor with G100
to trigger them.
If you want to rebuild Forth850 from source code, you will need to install the asz80 assembler part of the ASxxxx Cross Assemblers.
If you plan to use parts of Forth850 and/or the optimized Z80 code in a project that you plan to share or redistribute, then please give me credit for my work as per BCD-3 license.
I've implemented Forth850 as efficiently as possible in direct threaded code (DTC) with new Z80 code written from scratch, including faster Z80 integer and float math routines compared to other Z80 Forth implementations. See the technical implementation sections why Forth850 is fast for a DTC implementation. The Forth850 source code is included and extensively documented.
The n-queens benchmark is solved in 0.865 seconds, the fastest Forth implementation of the benchmarks. Forth850 n-queens runs 5 times faster than the C n-queens benchmark on the Sharp PC-G850VS.
In RUN MODE enter MON
to enter the Monitor, then enter USER3FFF
to reserve
16K memory space:
*USER3FFF
FREE:0100-3FFF
Loading via SIO (serial) requires a serial adapter. See my post on the
HP Forum
how to construct one as a DIY project. After reserving memory in the Monitor
as described above, use the R
command to read the forth850.ihx file or the
forth850-full.ihx full version sent from your PC to your PC-G850(V)(S):
*R100
The R
command is used to transmit/receive data in Intel hex format over SIO.
This command is for receiving machine code from a personal computer or other
device. See the Sharp PC-G850(V)(S) manual.
To load via the cassette interface, press BASIC
to return to RUN MODE. Load
forth850.wav using a cassette interface CE-126P or a CE-124:
BLOADM
Load the forth850-full.wav "full version" to include many additional words and floating point words. The full version will continue to evolve with new features.
To return to Forth, enter CALL256
in RUN MODE.
To return to BASIC from Forth, press the BASIC key. The TEXT key takes you to the TEXT editor.
To turn the machine off, press the OFF key. The machine will also power off automatically after about 10 minutes waiting for user input at the prompt.
Memory allocation can be adjusted without affecting the Forth dictionary.
In RUN MODE enter MON
to enter the Monitor, then enter USERaddr
with an
upper address addr
larger than 23ff
(9K bytes.) If words are added to
Forth850, you must make sure that addr
is large enough, i.e. equal or larger
than the hex value displayed with:
HERE #708 + HEX . DECIMAL
23FF
In the Monitor specify USERaddr
with the address displayed. This leaves
about 200 bytes free dictionary space plus 40 bytes for the "hold area" to run
Forth850. The largest size is USER75FF
which gives about 21K free dictionary
space (but there won't be space left on the machine for files, BASIC or TEXT.)
Forth850 is 2012 standard compliant. For help, see the manual included with Forth for the Sharp PC-E500(S) and Forth 2012 Standard.
Forth850 implements a subset of the standard Forth words. A list of Forth850 words with an explanation is given below.
List of Forth850 built-in words. Reference implementations in Forth are included when applicable, although many words are implemented in Z80 code for speed rather than in Forth.
-- ; R: -- ip call colon definition; runtime of the : compile-only word
-- ; R: ip -- return to caller from colon definition; runtime of the ; compile-only word
-- ; R: ip -- return to caller from colon definition; runtime of the EXIT compile-only word
-- ; R: ip -- set LASTXT cfa to ip and return from colon definition; a runtime word compiled by the DOES> compile-only word
addr -- addr ; R: -- ip calls the DOES> definition with pfa addr; a runtime word compiled by the DOES> compile-only word coded as call dodoes
-- addr leave parameter field address (pfa) of variable; runtime word of a VARIABLE coded as call dovar
-- x fetch value; runtime word of a VALUE coded as call doval
-- dx fetch double value; runtime word of a 2VALUE coded as call dotwoval
-- x fetch constant; runtime word of a CONSTANT coded as call docon
-- x fetch double constant; runtime word of a 2CONSTANT coded as call dotwocon
-- execute deferred word; runtime word of a DEFER coded as call dodef
-- x fetch literal; runtime word compiled by EVALUATE, INTERPRET and NUMBER
-- x1 x2 fetch double literal; runtime word compiled by EVALUATE, INTERPRET and NUMBER
-- c-addr u fetch literal string; runtime word compiled by S" and ."
-- 0 leave constant 0
0 CONSTANT 0
-- 1 leave constant 1
1 CONSTANT 1
-- -1 leave constant -1
-1 CONSTANT -1
-- 32 leave constant 32 (space)
#32 CONSTANT BL
-- c-addr leave address of the PAD; the PAD is a free buffer space of 256 bytes not used by Forth850
-- c-addr leave address of TIB; the terminal input buffer used by Forth850
-- c-addr leave address of the next temp string buffer; switches between two string buffers of 256 free bytes each; used by S" to store a string when interpreting
x -- drop TOS
x -- x x duplicate TOS
x -- x x or 0 -- 0 duplicate TOS if nonzero
x1 x2 -- x2 x1 swap TOS with 2OS
x1 x2 -- x1 x2 x1 copy 2OS over TOS
x1 x2 x3 -- x2 x3 x1 rotate cells
: ROT >R SWAP R> SWAP ;
x1 x2 x3 -- x3 x1 x2 undo (or back, or left) rotate cells
: -ROT ROT ROT ;
x1 x2 -- x2 nip 2OS
: NIP SWAP DROP ;
x1 x2 -- x2 x1 x2 tuck TOS under 2OS
: TUCK SWAP OVER ;
xd1 xd2 -- xd1 drop double TOS
: 2DROP DROP DROP ;
xd -- xd xd duplicate double TOS
: 2DUP OVER OVER ;
xd1 xd2 -- xd2 xd1 swap double TOS with double 2OS
: 2SWAP ROT >R ROT R> ;
: 2SWAP 3 ROLL 3 ROLL ;
xd1 xd2 -- xd1 xd2 xd1 copy double 2OS over double TOS
: 2OVER >R >R 2DUP R> R> 2SWAP ;
: 2OVER 3 PICK 3 PICK ;
-- u parameter stack depth
: DEPTH sp0 @ SP@ - 2- 2/ ;
... -- purge parameter stack
: CLEAR sp0 @ SP! ;
-- display parameter stack
: .S DEPTH 0 ?DO sp0 @ I 2+ CELLS - ? LOOP ;
-- addr fetch stack pointer
addr -- store stack pointer
x -- ; R: -- x move TOS to the return stack
x -- x ; R: -- x duplicate TOS to the return stack, a single word for DUP >R
R: x -- ; -- x move cell from the return stack
R: x -- ; -- drop cell from the return stack, a single word for R> DROP
R: x -- x ; -- x fetch cell from the return stack
x1 x2 -- ; R: -- x1 x2 move double TOS to the return stack, a single word for SWAP >R >R
R: x1 x2 -- ; -- x1 x2 move double cell from the return stack, a single word for R> R> SWAP
R: x1 x2 -- x1 x2 ; -- x1 x2 fetch double cell from the return stack
-- addr fetch return stack pointer
addr -- store return stack pointer
xu ... x0 u -- xu ... x0 xu pick u'th cell from the parameter stack; 0 PICK is the same as DUP; 1 PICK is the same as OVER
: PICK 1+ CELLS SP@ + @ ;
addr -- x fetch from cell
c-addr -- char fetch char
addr -- x1 x2 fetch from double cell
: 2@ DUP CELL+ @ SWAP @ ;
x addr -- store in cell
x -- store in value; runtime of the TO compile-only word
char c-addr -- store char
x1 x2 addr -- store in double cell
: 2! TUCK ! CELL+ ! ;
dx -- store in double value; runtime of the TO compile-only word
n addr -- increment cell
n -- increment value; runtime of the +TO compile-only word
addr -- store TRUE (-1) in cell
: ON -1 SWAP ! ;
addr -- store FALSE (0) in cell
: OFF 0 SWAP ! ;
n1 n2 -- n3 sum n1+n2
d1 n -- d2 double sum d1+n
d1 d2 -- d3 double sum d1+d2
: D+ >R M+ R> + ;
n1 n2 -- n3 difference n1-n2
d1 d2 -- d3 double difference d1-d2
: D- DNEGATE D+ ;
u1 u2 -- ud unsigned double product u1*u2
n1 n2 -- d signed double product n1*n2
: M*
2DUP XOR >R
ABS SWAP ABS UM*
R> 0< IF DNEGATE THEN ;
n1|u1 n2|u2 -- n3|u3 signed and unsigned product n1*n2
: * UM* DROP ;
ud1 u -- ud2 unsigned double product ud1*u
: UMD* DUP>R UM* DROP SWAP R> UM* ROT + ;
d1 n -- d2 signed double product d1*n
: MD*
2DUP XOR >R
ABS -ROT DABS ROT
UMD*
R> 0< IF DNEGATE THEN ;
ud u1 -- u2 u3 unsigned remainder and quotient ud/u1; the result is undefined when u1 = 0
d1 n1 -- n2 n3 symmetric remainder and quotient d1/n1 rounded towards zero; the result is undefined when n1 = 0
: SM/REM
2DUP XOR >R
OVER >R
ABS -ROT DABS ROT
UM/MOD
R> 0< IF SWAP NEGATE SWAP THEN
R> 0< IF NEGATE THEN ;
d1 n1 -- n2 n3 floored signed modulus and quotient d1/n1 rounded towards negative (floored); the result is undefined when n1 = 0
: FM/MOD
DUP>R
SM/REM
DUP 0< IF
SWAP R> + SWAP 1-
ELSE
RDROP
THEN ;
n1 n2 -- n3 n4 symmetric remainder and quotient n1/n2; the result is undefined when n2 = 0
: /MOD SWAP S>D ROT SM/REM ;
n1 n2 -- n3 symmetric remainder of n1/n2; the result is undefined when n2 = 0
: MOD /MOD DROP ;
n1 n2 -- n3 quotient n1/n2; the result is undefined when n2 = 0
: / /MOD NIP ;
n1 n2 n3 -- n4 n5 product with symmetric remainder and quotient n1*n2/n3; the result is undefined when n3 = 0
: */MOD -ROT M* ROT SM/REM ;
n1 n2 n3 -- n4 product with quotient n1*n2/n3; the result is undefined when n3 = 0
: */ */MOD NIP ;
d1 n1 n2 -- d2 double product with quotient d1*n1/n2; the result is undefined when n2 = 0
: M*/ >R MD* R> SM/REM NIP ;
x1 x2 -- x1&x2 bitwise and x1 with x2
x1 x2 -- x1|x2 bitwise or x1 with x2
x1 x2 -- x1^x2 bitwise xor x1 with x2
x1 x2 -- flag true if x1 = x2
x1 x2 -- flag true if x1 <> x2
n1 n2 -- flag true if n1 < n2 signed
: <
2DUP XOR 0< IF
DROP 0<
EXIT
THEN
- 0< ;
n1 n2 -- flag true if n1 > n2 signed
: > SWAP < ;
u1 u2 -- flag true if u1 < u2 unsigned
: U<
2DUP XOR 0< IF
NIP 0<
EXIT
THEN
- 0< ;
u1 u2 -- flag true if u1 > u2 unsigned
: U> SWAP U< ;
x -- flag true if x = 0
n -- flag true if n < 0
dx -- flag true if dx = 0
: D0= OR 0= ;
d -- flag true if d < 0
: D0< NIP 0< ;
n -- d widen single to double
d -- n narrow double to single; may throw -11 "result out of range" valid range is -32768 to 65535
n1 n2 -- n3 signed max of n1 and n2
: MAX
2DUP < IF SWAP THEN
DROP ;
n1 n2 -- n3 signed min of n1 and n2
: MIN
2DUP > IF SWAP THEN
DROP ;
u1 u2 -- u3 unsigned max of u1 and u2
: UMAX
2DUP U< IF SWAP THEN
DROP ;
u1 u2 -- u3 unsigned min of u1 and u2
: UMIN
2DUP U> IF SWAP THEN
DROP ;
x1 x2 x3 -- flag true if x1 is within x2 up to x3 exclusive
: WITHIN OVER - >R - R> U< ;
x1 -- x2 one's complement ~x1
: INVERT 1+ NEGATE ;
: INVERT -1 XOR ;
n1 -- n2 two's complement -n1
: NEGATE 0 SWAP - ;
: NEGATE INVERT 1+ ;
n1 -- n2 absolute value |n1|
: ABS DUP 0< IF NEGATE THEN ;
d1 -- d2 two's complement -d1
: DNEGATE SWAP INVERT SWAP INVERT 1 M+ ;
d1 -- d2 absolute value |d1|
: DABS DUP 0< IF DNEGATE THEN ;
x1 u -- x2 logical shift left x1 << u
x1 u -- x2 logical shift right x1 >> u
n1 -- n2 increment n1+1
: 1+ 1 + ;
n1 -- n2 increment n1+2
: 2+ 2 + ;
n1 -- n2 decrement n1-1
: 1- 1 - ;
n1 -- n2 decrement n1-2
: 2- 2 - ;
n1 -- n2 arithmetic shift left n1 << 1
: 2* 2 * ;
n1 -- n2 arithmetic shift right n1 >> 1
: 2/ 2 / ;
c-addr1 -- c-addr2 u convert counted string to string
: COUNT DUP 1+ SWAP C@ ;
c-addr1 u1 c-addr2 u2 -- -1|0|1 compare strings, leaves -1 = less or 0 = equal or 1 = greater
c-addr1 u1 c-addr2 u2 -- flag true if strings match
: S= COMPARE 0= ;
c-addr1 u1 c-addr2 u2 -- c-addr3 u3 flag true if the second string is in the first; leaves matching address, remaining length and true; or leaves the first string and false
c-addr1 c-addr2 u -- move u bytes from c-addr1 to c-addr2 (from begin)
: CMOVE
SWAP >R
BEGIN DUP WHILE
NEXT-CHAR R@ C!
R> 1+ >R
REPEAT
RDROP
2DROP ;
c-addr1 c-addr2 u -- move u bytes from c-addr1 to c-addr2 up (from end)
c-addr1 c-addr2 u -- move u bytes from c-addr1 to c-addr2
: MOVE
-ROT
2DUP U< IF
ROT CMOVE>
ELSE
ROT CMOVE
THEN ;
c-addr u char -- fill memory with char
c-addr u -- fill memory with zeros
: ERASE 0 FILL ;
c-addr u -- fill memory with 0x20 (BL) chars
: ERASE BL FILL ;
c-addr u1 char -- c-addr u2 truncate a string up to a matching char; leaves the string if char not found; char = 0x20 (BL) chops 0x00 to 0x20 (white space and control)
c-addr1 u1 char -- c-addr2 u2 trim initial chars from a string; char = 0x20 (BL) trims 0x00 to 0x20 (white space and control)
c-addr u1 char -- c-addr u2 trim trailing chars from a string; char = 0x20 (BL) trims 0x00 to 0x20 (white space and control)
c-addr u1 -- c-addr u2 trim trailing white space and control characters from a string
: -TRAILING BL -TRIM ;
c-addr1 u1 n -- c-addr2 u2 slice n characters off the start of a string
: /STRING ROT OVER + -ROT - ;
c-addr1 u1 -- c-addr2 u2 char get next char from a string; increments the string address and decrements its length by one
: NEXT-CHAR OVER C@ >R 1- SWAP 1+ SWAP R> ;
: NEXT-CHAR OVER C@ -ROT 1- SWAP 1+ SWAP ROT ;
u -- set cursor column 0 to 23
u -- set cursor row 0 to 5
-- u fetch cursor column 0 to 23, or 24 when beyond the right window edge
-- u fetch cursor row 0 to 5
u1 u2 -- set column x to u1 (0 to 23) and row y to u2 (0 to 5)
: AT-XY Y! X! ;
char -- emit char to screen; supports the following control codes: 8 (BS backspace, cursor left), 9 (TAB), 10 (LF line feed), 11 (VT scroll), 12 (FF clear screen), 13 (CR carriage return), 28 (cursor right), 29 (cursor left), 30 (cursor up), 31 (cursor down)
c-addr u -- type string to output; string may contain control codes, see EMIT
: TYPE
BEGIN DUP WHILE
NEXT-CHAR EMIT
REPEAT
2DROP ;
-- carriage return and line feed
: CR $A EMIT ;
-- emit a space (BL)
: SPACE BL EMIT ;
n -- emit n spaces (zero or negative n does nothing)
: SPACES
DUP 0< IF
DROP
EXIT
THEN
0 ?DO SPACE LOOP ;
-- clear screen
: PAGE $C EMIT ;
-- addr variable with numeric base for conversion
VARIABLE BASE
-- set BASE to 10
: DECIMAL #10 BASE ! ;
-- set BASE to 16
: HEX #16 BASE ! ;
-- addr hold pointer
0 VALUE HP
-- begin pictured numeric output
: <# HERE h_size + TO HP ;
char -- hold char for pictured numeric output
: HOLD HP 1- DUP TO HP C! ;
ud1 -- ud2 hold digit
: #
0 BASE @ UM/MOD >R
BASE @ UM/MOD
SWAP DUP #9 > IF
#7 +
THEN
'0 + HOLD
R> ;
ud -- 0 0 hold all remaining digits
: #S BEGIN # 2DUP D0= UNTIL ;
n -- hold minus sign if n < 0
: SIGN 0< IF '- HOLD THEN ;
ud -- c-addr u end pictured numeric output, leave string
: #> 2DROP HP HERE h_size + OVER - ;
d +n -- output signed double d right aligned in field of +n chars wide
: D.R -ROT TUCK DABS <# #S ROT SIGN #> ROT OVER - SPACES TYPE ;
d -- output signed double d with a trailing space
: D. 0 D.R SPACE ;
u +n -- output unsigned u right aligned in field of +n chars wide
: U.R 0 SWAP D.R ;
u -- output unsigned u with a trailing space
: U. 0 D. ;
n +n -- output signed n right aligned in field of +n chars wide
: .R SWAP S>D ROT D.R ;
n -- output signed n with a trailing space
: . S>D D. ;
addr -- output signed cell stored at addr
: ? @ . ;
u1 u2 -- output byte u1 to port u2
u1 -- u2 input from port u1
c-addr u -- draw pixel patterns on screen at xy; writes string c-addr u of pixel patterns at xy; specify xy with AT-XY, xy not changed after DRAW
c-addr u -- view screen pixels at xy; read string of screen pixel patterns at xy into buffer c-addr u specify xy with AT-XY, xy not changed after VIEW
+n -- reverse video of the +n characters displayed at xy; specify xy with AT-XY
-- x check for key press and read key code of a key is pressed; 0x00 = no key pressed and 0x52 = multiple keys pressed
-- char wait and read key; leaves ASCII char or special key code: ON =$05, BS =$08, DEL =$09, CA =$0b, CLS =$0c, ENTER =$0d, DIGIT =$0e, F-E =$0f, INS =$12, ANS =$15, CONST =$17, RCM =$19, M+ =$1a, M- =$1b, right =$1c, left =$1d, up =$1e, down =$1f; a space is produced for the TAB key by the GETCHR system call, calc keys and BASIC keys produce BASIC tokens as key code $fe: SIN =$fe register B = $95 BASIC token for SIN (ignored)
-- char display cursor and wait to read key; same as GETKEY leaves ASCII char or special key code
c-addr +n1 n2 n3 n4 -- c-addr +n5 edit buffer c-addr; buffer size +n1; string in buffer has length n2; place cursor at n3; non-editable left margin n4; leaves c-addr and length +n5
c-addr +n1 -- +n2 accept user input into buffer c-addr +n1; leaves length +n2
: ACCEPT 0 0 0 EDIT NIP ;
-- addr variable with offset into input buffer (TIB)
VARIABLE >IN
-- 0|-1 value with 0 = source input or -1 = string input
0 VALUE SOURCE-ID
-- c-addr u double value with input source
TIB 0 2VALUE SOURCE
-- flag attempt to refill the input buffer; leaves false when end of input
char "" -- skips chars in input when present, 0x20 (BL) skips 0x00 to 0x20 (white space and control)
: SKIPS SOURCE >IN @ /STRING ROT TRIM DROP SOURCE DROP - >IN ! ;
char "ccc" -- c-addr u parse "ccc" up to char when present
: PARSE SOURCE >IN @ /STRING ROT CHOP DUP 1+ >IN @ + SOURCE NIP UMIN >IN ! ;
char "ccc" -- c-addr u parse char-delimited word; may throw -18 "parsed string overflow"
: PARSE-WORD
DUP SKIPS PARSE
DUP tmp_size-1 U> IF -18 THROW THEN ;
c-addr u -- c-addr u check if name is valid; may throw -16 "attempt to use a zero-length string as a name"; may throw -19 "definition name too long"
: CHECK-NAME
DUP 0= IF -16 THROW THEN
DUP length_bits U> IF -19 THROW THEN ;
"name" -- c-addr u parse space-delimited name; check if name length is valid
: PARSE-NAME BL PARSE-WORD CHECK-NAME ;
"ccc" -- start a comment block; parse and skip input up to the closing )
: (
') PARSE
BEGIN
+ DROP
SOURCE + = IF
DROP REFILL
ELSE
C@ ') <> IF
REFILL
ELSE
FALSE
THEN
THEN
0= UNTIL ; IMMEDIATE
"ccc" -- start a comment line; parse and skip input up to the end of line; note that the PC-G850 symbol for \ is ÂĄ
: \ $A PARSE 2SROP ;
"ccc" -- emit CR then type "ccc" up to the closing )
: .( ') PARSE CR TYPE ; IMMEDIATE
char -- n convert char digit to numeric digit when within BASE; leaves -1 if char is invalid
ud1 c-addr1 u1 -- ud2 c-addr2 u2 convert string to number; updates accumulated double ud1 to ud2; leaves string with the remaining unconvertable chars or empty
: >NUMBER
BEGIN DUP WHILE
NEXT-CHAR >DIGIT
DUP 0< IF
DROP -1 /STRING
EXIT
THEN
>R
2SWAP
BASE @ UMD*
R> M+
2SWAP
REPEAT ;
-- flag true if >DOUBLE or NUMBER produced a double
0 VALUE DBL
c-addr u -- d true | false convert string to signed double; leaves the double and true if string is converted; leaves false if string is unconvertable
lfa -- nt convert link field address to name token (nfa)
nt -- c-addr u convert name token (nfa) to string
nt -- xt convert name token (nfa) to execution token (cfa)
xt -- nt convert execution token (cfa) to name token (lfa); may throw -24 "invalid numeric argument"
xt -- pfa convert execution token to parameter field address
c-addr u -- c-addr 0 | xt 1 | xt -1 search dictionary for matching word; leaves execution token and 1 = immediate or -1 = not immediate; leaves c-addr and 0 when not found
"name" -- xt parse name and get execution token; may throw -13 "undefined word"
: ' PARSE-NAME FIND-WORD 0= IF -13 THROW THEN ;
-- display context vocabulary words
-- addr address of free memory after the dictionary; new definitions are added here; note that numeric output words use HERE for conversion
-- xt leaves the last execution token defined
0 VALUE LASTXT
-- addr compilation state; STATE @ leaves TRUE when compiling; STATE @ leaves FALSE when interpreting
VARIABLE STATE
-- switch state to interpreting
: [ STATE OFF ;
-- switch state to compiling
: ] STATE ON ;
-- hide the last definition
: HIDE CURRENT @ L>NAME DUP C@ smudge_bits OR SWAP C! ;
-- reveal the last definition
: REVEAL CURRENT @ L>NAME DUP C@ ~smudge_bits AND SWAP C! ;
-- make the last definition immediate
: IMMEDIATE CURRENT @ L>NAME DUP C@ immediate_bits OR SWAP C! ;
-- check if compiling; may throw -14 "interpreting a compile-only word"
-- ; C: x -- check if compiled control structure matches x; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
-- u unused dictionary space
: UNUSED top @ HERE - ;
n -- allocate n bytes starting from HERE in the dictionary; undo the last ALLOT with negative n to reclaim memory (only do this when no new words are defined); may throw -8 "dictionary overflow"
xt -- append execution token to dictionary; may throw -8 "dictionary overflow"
: COMPILE, , ;
x -- append cell to dictionary; may throw -8 "dictionary overflow"
char -- append char to dictionary; may throw -8 "dictionary overflow"
x1 x2 -- append double cell to dictionary; may throw -8 "dictionary overflow"
: 2, , , ;
"name" -- parse name and append dictionary entry with name; set LASTXT to HERE; may throw -8 "dictionary overflow"
: NFA, PARSE-NAME HERE CURRENT @ , CURRENT ! DUP C, HERE SWAP DUP ALLOT CMOVE HERE TO LASTXT ;
addr -- append cfa call addr to dictionary; may throw -8 "dictionary overflow"
-- addr colon_sys append cfa colon definition to dictionary; make CONTEXT the CURRENT vocabulary; start compiling; may throw -8 "dictionary overflow"
: CFA:, ] HERE colon_sys ['] (:) CFA, CURRENT TO CONTEXT ;
"name" -- postpone compile action of name; if name is immediate, then compile name instead of executing it; otherwise compile name into the current colon definition; can be used to create macros, e.g. : TRUE POSTPONE -1 ; IMMEDIATE; may throw -13 "undefined word"; may throw -14 "interpreting a compile-only word"
-- ; C: "name" -- addr colon_sys define name and start compiling
: : NFA, HIDE CFA:, ;
-- ; C: addr colon_sys -- end colon definition and stop compiling; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
: ; ?COMP colon_sys <> IF -22 THROW THEN DROP POSTPONE (;) REVEAL [ ; IMMEDIATE
-- exit colon definition
: EXIT ?COMP POSTPONE (EXIT) ; IMMEDIATE
"name" -- ; -- addr create name; executing name leaves address (HERE addr after CREATE)
: NFA, ['] (VAR) CFA, ;
-- ; ... -- ... change CREATE name behavior to execute code after DOES>
: DOES> ?COMP POSTPONE (;CODE) ['] (DOES) CFA, ; IMMEDIATE
"name" -- ; -- addr define a variable; executing name leaves address of value (initialized to zero)
: VARIABLE CREATE 0 , ;
"name" -- ; -- addr define a double variable; executing name leaves address of double value (initialized to zero)
: 2VARIABLE CREATE 0 0 2, ;
x "name" -- ; -- x define a constant; executing name leaves x
: CONSTANT NFA, ['] (CON) CFA, , ;
: CONSTANT CREATE , DOES> @ ;
x1 x2 "name" -- ; -- x1 x2 define a double constant; executing name leaves x1 x2
: 2CONSTANT NFA, ['] (2CON) CFA, 2, ;
: 2CONSTANT CREATE 2, DOES> 2@ ;
x "name" -- ; -- x define a value; executing name leaves x
: VALUE NFA, ['] (VAL) CFA, , ;
dx "name" -- ; -- dx define a double value; executing name leaves dx
: 2VALUE NFA, ['] (2VAL) CFA, 2, ;
"name" -- ; x -- assign value name; may throw -32 "invalid name argument"
: TO
'
DUP VALUE? IF
>BODY
STATE @ IF
POSTPONE (TO)
,
EXIT
THEN
!
EXIT
THEN
DUP 2VALUE? IF
>BODY
STATE @ IF
POSTPONE (2TO)
,
EXIT
THEN
2!
EXIT
THEN
#-32 THROW ; IMMEDIATE
"name" -- ; n -- increment value name; may throw -32 "invalid name argument"
: +TO
'
DUP VALUE? IF
>BODY
STATE @ IF
POSTPONE (+TO)
,
EXIT
THEN
+!
EXIT
THEN
#-32 THROW ; IMMEDIATE
"name" -- ; ... -- ... define a deferred name
: DEFER NFA, ['] (DEF) CFA, ['] UNDEF , ;
-- throw -256 "execution of an uninitialized deferred word"
: UNDEF -256 THROW ;
xt1 xt2 -- store xt1 in deferred xt2
: DEFER! >BODY ! ;
xt1 -- xt2 fetch execution token from deferred xt1
: DEFER@ >BODY @ ;
xt "name" -- assign execution token to deferred name; may throw -32 "invalid name argument"
: IS
'
DUP DEFER? IF
STATE @ IF
LITERAL
POSTPONE DEFER!
EXIT
THEN
DEFER!
EXIT
THEN
#-32 THROW ; IMMEDIATE
"name" -- xt fetch execution token of deferred name; may throw -32 "invalid name argument"
: ACTION-OF
'
DUP DEFER? IF
STATE @ IF
LITERAL
POSTPONE DEFER@
EXIT
THEN
DEFER@
EXIT
THEN
#-32 THROW ; IMMEDIATE
x -- ; -- x compile a literal
: LITERAL ?COMP POSTPONE (LIT) , ; IMMEDIATE
x1 x2 -- ; -- x1 x2 compile a double literal
: 2LITERAL ?COMP POSTPONE (2LIT) 2, ; IMMEDIATE
c-addr u -- ; -- c-addr u compile a string literal; max literal string length is 255
: SLITERAL
?COMP
DUP 255 U> IF -18 THROW THEN
POSTPONE (SLIT)
DUP C,
HERE OVER ALLOT SWAP CMOVE ; IMMEDIATE
"ccc" -- ; -- type "ccc" (compiled)
: ." '" PARSE SLITERAL POSTPONE TYPE ; IMMEDIATE
"ccc" -- ; -- c-addr u leave string "ccc" (compiled and interpreted)
: S"
'" PARSE
STATE @ IF
SLITERAL
EXIT
THEN
TMP SWAP
2DUP 2>R
CMOVE
2R> ; IMMEDIATE
xt -- flag true if xt is a VALUE
: VALUE? DUP C@ $CD = SWAP 1+ @ ['] (VAL) = AND ;
xt -- flag true if xt is a 2VALUE
: 2VALUE? DUP C@ $CD = SWAP 1+ @ ['] (2VAL) = AND ;
xt -- flag true if xt is a DEFER word
: DEFER? DUP C@ $CD = SWAP 1+ @ ['] (DEF) = AND ;
-- addr only permit FORGET past the dictionary FENCE address
0 VALUE FENCE
"name" -- delete name and all following definitions; beware of vocabulary definitions crossings; may throw -15 "invalid FORGET"
[']
"name" -- ; -- xt compile xt of name as literal; may throw -14 "interpreting a compile-only word"
: ['] ?COMP ' LITERAL ; IMMEDIATE
... -- ... recursively call the currently defined word; may throw -14 "interpreting a compile-only word"
: RECURSE ?COMP LASTXT COMPILE, ; IMMEDIATE
-- check parameter stack bounds; may throw -3 "stack overflow"; may throw -4 "stack underflow"
x -- branch if x = 0; runtime of the UNTIL compile-only word
x -- branch if x = 0; runtime of the IF and WHILE compile-only words
-- branch; runtime of the AGAIN and REPEAT compile-only words
-- branch; runtime of the AHEAD, ELSE and ENDOF compile-only words
x1 x2 -- x1 or x1 x2 -- branch if x1 <> x2; runtime of the OF compile-only word
-- repeat loop unless loop counter crosses the limit; runtime of the LOOP compile-only word
-- increment counter and repeat loop unless counter crosses the limit; runtime of the +LOOP compile-only word
n1|u1 n2|u2 -- begin loop with limit n1|u1 and initial value n2|u2; skip loop when zero trip loop; runtime of the ?DO compile-only word
n1|u1 n2|u2 -- begin loop with limit n1|u1 and initial value n2|u2; loop at least once; runtime of the DO compile-only word
R: ... -- remove loop parameters; runtime of the UNLOOP compile-only word
-- discard the loop parameters and exit the innermost do-loop; runtime of the LEAVE compile-only word
-- ; C: -- addr orig branch ahead to THEN; may throw -14 "interpreting a compile-only word"
-- ; C: -- addr dest begin WHILE REPEAT; may throw -14 "interpreting a compile-only word"
-- ; C: addr dest -- branch back to BEGIN; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
x -- ; C: addr dest -- branch back to BEGIN if x = 0; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
x -- ; C: -- addr orig branch to closest ELSE or THEN if x = 0; may throw -14 "interpreting a compile-only word"
-- ; C: addr orig -- close AHEAD, IF, ELSE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
-- ; C: addr orig -- addr orig close IF and branch to THEN; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
x -- ; C: addr sys -- addr orig addr sys branch to exit REPEAT if x = 0; may throw -14 "interpreting a compile-only word"
-- ; C: addr orig addr dest -- branch back to BEGIN after WHILE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
n1|u1 n2|u2 -- ; C: -- addr do_sys begin loop from initial value n2|u2 to the limit n1|u1; loop at least once; may throw -14 "interpreting a compile-only word"
n1|u1 n2|u2 -- ; C: -- addr do_sys begin loop from initial value n2|u2 to the limit n1|u1; skip loop when zero trip loop; may throw -14 "interpreting a compile-only word"
-- ; C: addr do_sys -- repeat loop unless loop counter crosses the limit; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
n|u -- ; C: addr do_sys -- increment counter and repeat loop unless counter crosses the limit; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
-- remove loop parameters; may throw -14 "interpreting a compile-only word"
-- exit the innermost do-loop; may throw -14 "interpreting a compile-only word"
-- n counter of innermost do loop
-- n counter of outer (second) do loop
x -- ; C: -- 0 begin CASE ENDCASE switch; may throw -14 "interpreting a compile-only word"
x1 x2 -- x1 or x1 x2 -- ; C: n1 -- orig n2 take CASE arm if x1 = x2; otherwise branch to next OF; may throw -14 "interpreting a compile-only word"
-- ; C: n -- orig n branch to ENDCASE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
x -- ; C: n*orig n -- close CASE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
-- addr variable with saved return stack pointer
VARIABLE HANDLER
... xt -- ... execute execution token xt
... xt -- ... 0 or xt -- n execute xt leaving nonzero exception code n or 0 when no exception occurred; when an exception was caught, the parameter and return stacks are restored to their state before execution of xt
: CATCH
SP@ >R
HANDLER @ >R
RP@ HANDLER !
EXECUTE
R> HANDLER !
RDROP
0 ;
0 -- or ... n -- ... n throw exception n if nonzero
: THROW
?DUP IF
HANDLER @ ?DUP IF
RP!
R> HANDLER !
R> SWAP >R
SP!
DROP
R>
EXIT
THEN
>R CLEAR R>
ERROR
REPL
THEN ;
... -- ; R: ... -- throw -56 "QUIT"; no exception error is displayed; unlike ABORT, the parameter stack is not cleared
: QUIT -56 THROW ;
... flag c-addr u -- ; R: ... -- if flag then abort with string message unless an active catch is present; runtime of the ABORT" compile-only word; throw -2 "ABORT""
: (ABORT")
ROT IF
HANDLER @ IF
2DROP
ELSE
TYPE
THEN
-2 THROW
THEN
2DROP ;
... flag -- ; C: "ccc" -- ; R: ... -- if flag then abort with string message unless an active catch is present; throw -2 "ABORT""; clears the parameter stack unless caught with CATCH; may throw -14 "interpreting a compile-only word"
: ABORT" ?COMP POSTPONE S" POSTPONE (ABORT") ; IMMEDIATE
... -- ; R: ... -- throw -1 "ABORT"; clears the parameter stack unless caught with CATCH
: ABORT -1 THROW ;
n -- display exception n at the offending location in the input; n = -1 ABORT and n = -2 ABORT" clear the stack; n = -56 QUIT stays silent; List of Forth850 errors:
code | error |
---|---|
-1 | ABORT |
-2 | ABORT" |
-3 | stack overflow |
-4 | stack underflow |
-8 | dictionary overflow |
-10 | division by zero |
-11 | result out of range |
-13 | undefined word |
-14 | interpreting a compile-only word |
-15 | invalid FORGET |
-16 | attempt to use zero-length string as a name |
-18 | parsed string overflow |
-19 | definition name too long |
-22 | control structure mismatch |
-24 | invalid numeric argument |
-28 | user interrupt (BREAK was pressed) |
-32 | invalid name argument (invalid TO name) |
-42 | floating-point divide by zero |
-43 | floating-point result out of range |
-46 | floating-point invalid argument |
-56 | QUIT |
-256 | execution of an uninitialized deferred word |
c-addr u -- n|u|d|ud convert string to number; value DBL is set to -1 when the number is a double; may throw -13 "undefined word" when string is not numeric
-- interpret input while input is available
... c-addr u -- ... evaluate string
-- read-evaluate-print loop
: REPL
rp0 @ RP!
HANDLER OFF
0 TO SOURCE-ID
CR
[
BEGIN
BEGIN ['] REFILL CATCH ?DUP WHILE
ERROR CR
REPEAT
WHILE
SPACE
['] INTERPRET CATCH ?DUP IF
ERROR
REPL
THEN
STATE @ INVERT IF
." OK["
DEPTH 0 U.R
'] EMIT
THEN
CR
REPEAT
BYE ;
-- return to BASIC
-- addr leaves address of link of the last vocabulary context definition
' FORTH VALUE CONTEXT
-- addr leaves address of link of the last current vocabulary definition
' FORTH VALUE CURRENT
-- make CURRENT the CONTEXT vocabulary
: DEFINITIONS CONTEXT TO CURRENT ;
"name" -- define a new vocabulary
: VOCABULARY CREATE , fig_kludge , DOES> TO CONTEXT ;
-- make FORTH the CONTEXT vocabulary
VOCABULARY FORTH
-- 0 leave 0
0 CONSTANT FALSE
-- -1 leave -1
-1 CONSTANT TRUE
xd1 xd2 xd3 -- xd2 xd3 xd1 rotate double cells
: 2ROT 5 ROLL 5 ROLL ;
xu x(u+1) ... x1 x0 u -- x(u+1) ... x1 x0 xu roll u cells on the parameter stack; 0 ROLL does nothing; 1 ROLL is the same as SWAP; 2 ROLL is the same as ROT
d1|ud1 d2|ud2 -- d3|ud3 signed and unsigned double product d1*d2
: D* >R ROT DUP>R -ROT MD* 2R> * 0 SWAP D+ ;
ud1 ud2 -- ud3 ud4 unsigned double remainder and quotient ud1/ud2; the result is undefined when ud2 = 0
d1 d2 -- d3 d4 double symmetric remainder and quotient d1/d2; the result is undefined when d2 = 0
: D/MOD
DUP 3 PICK DUP>R XOR >R
DABS 2SWAP DABS 2SWAP
UD/MOD
R> 0< IF DNEGATE THEN
R> 0< IF 2SWAP DNEGATE 2SWAP THEN ;
d1 d2 -- d3 double symmetric remainder of d1/d2; the result is undefined when d2 = 0
: DMOD D/MOD 2DROP ;
d1 d2 -- d3 double quotient d1/d2; the result is undefined when d2 = 0
: D/ D/MOD 2SWAP 2DROP ;
d1 d2 -- flag true if d1 = d2
: D= D- D0= ;
d1 d2 -- flag true if d1 < d2
: D<
DUP 3 PICK XOR 0< IF
2DROP D0<
EXIT
THEN
D- D0< ;
du1 du2 -- flag true if ud1 < ud2
: DU<
DUP 3 PICK XOR 0< IF
2SWAP 2DROP D0<
EXIT
THEN
D- D0< ;
d1 d2 -- d3 signed double max of d1 and d2
: DMAX
2OVER 2OVER D< IF 2SWAP THEN
2DROP ;
d1 d2 -- d3 signed double min of d1 and d2
: DMIN
2OVER 2OVER D< INVERT IF 2SWAP THEN
2DROP ;
addr -- addr increment to next cell
: CELL+ 2+ ;
n1 -- n2 convert to cell unit
: CELLS 2* ;
n1 -- n1 increment to next char
: CHAR+ 1+ ;
n1 -- n2 convert to char unit
: CHARS ;
c-addr u -- dump memory in hex
: DUMP
BASE @ >R
HEX
BEGIN DUP WHILE
NEXT-CHAR .
REPEAT
2DROP
R> BASE ! ;
c-addr u -- hold string for pictured numeric output
: HOLDS
BEGIN DUP WHILE
1- 2DUP + C@ HOLD
REPEAT
2DROP ;
-- sound the speaker for a short ~2KHz beep
-- wait until no keys are pressed
: KEY-CLEAR BEGIN INKEY 0= UNTIL ;
-- flag true if a key is pressed
: KEY? INKEY 0= 0= ;
char "ccc" -- c-addr parse word as a counted string
: WORD TMP DUP ROT PARSE-WORD ROT 2DUP C! 1+ SWAP CMOVE ;
"name" -- char parse char; note that the syntax 'char is preferred instead of this legacy word
: CHAR PARSE-NAME DROP C@ ;
c-addr -- c-addr 0 | xt 1 | xt -1 search dictionary for counted string; see WORD, COUNT and FIND-WORD
n "name" -- ; -- addr define buffer with n bytes; executing name leaves address of n bytes
: BUFFER: CREATE ALLOT ;
-- xt colon definition without name; leaves execution token of definition to be used or saved
"ccc" -- ; -- c-addr leave counted string "ccc" (compiled); may throw -18 "parsed string overflow"
: C" ?COMP POSTPONE S" POSTPONE DROP POSTPONE 1- ;
xt -- flag true if xt is a MARKER word
"name" -- ; -- define a dictionary marker; executing name deletes marker and all definitions made after; beware of vocabulary definitions crossings
: MARKER
CURRENT
DUP @
HERE
CREATE
, 2,
DOES>
DUP CELL+ 2@
SWAP TO CONTEXT
DUP CONTEXT !
DEFINITIONS
L>NAME NAME> TO LASTXT
@ HERE - ALLOT ;
"name" -- ; -- define a dictionary marker; deletes previously defined name and all following definitions; beware of vocabulary definitions crossings
: ANEW
>IN @ >R
PARSE-NAME FIND-WORD
OVER MARKER?
AND IF
EXECUTE
ELSE
DROP
R> >IN !
MARKER ;
"char" -- ; -- char compile char as literal; note that the syntax 'char is preferred instead of this legacy word; may throw -14 "interpreting a compile-only word"
: [CHAR] ?COMP CHAR LITERAL ; IMMEDIATE
"name" -- ; ... -- ... compile name; note that POSTPONE is preferred instead of this legacy word; may throw -14 "interpreting a compile-only word"
: [COMPILE] ?COMP ' COMPILE, ; IMMEDIATE
-- n counter of outer (third) do loop
-- read and evaluate TEXT editor area with Forth source code; caveat: .( and ( in TEXT cannot span more than one line, they end at EOL
: TEXT
$7973 @ 1+ >R
BEGIN
R> \ -- addr
DUP C@ $FF <> WHILE
2+ DUP C@ SWAP 1+ \ -- len addr
2DUP + >R
SWAP 1- EVALUATE
REPEAT
DROP ;
Floating point values are doubles on the stack. Double words, such as 2DUP, can be used to manipulate floats. Floats can be stored in 2CONSTANT, 2VARIABLE and 2VALUE assigned with TO (but not with +TO.)
Beware that HEX prevents inputting floats and garbles the output of floats.
r1 r2 -- r3 sum r1+r2; may throw -43 "floating-point result out of range"
r1 r2 -- r3 difference r1-r2; may throw -43 "floating-point result out of range"
r1 r2 -- r3 product r1*r2; may throw -43 "floating-point result out of range"
r1 r2 -- r3 quotient r1/r2 may throw -42 "floating-point divide by zero"; may throw -43 "floating-point result out of range"
r1 -- r2 truncate float towards zero
r1 -- r2 floor float towards negative infinity may throw -43 "floating-point result out of range"
r1 -- r2 round float to nearest; may throw -43 "floating-point result out of range"
r1 -- r2 negate float
r1 -- r2 absolute value |r1|
: FABS 2DUP F0< IF FNEGATE THEN ;
r1 r2 -- flag true if r1 = r2
: F= D= ; ( works for IEEE 754 floating point without negative zero and inf/nan )
r1 r2 -- flag true if r1 < r2
: F<
DUP 3 PICK AND 0< IF
2SWAP
D< ; ( works for IEEE 754 floating point without negative zero and inf/nan )
r -- flag true if r = 0.0e0
: F0= D0= ; ( works for IEEE 754 floating point without negative zero and inf/nan )
r -- flag true if r < 0.0e0
: F0< D0< ; ( works for IEEE 754 floating point without negative zero and inf/nan )
r1 r2 -- r3
max of r1 and r2
: FMAX
2OVER 2OVER F< IF 2SWAP THEN
2DROP ;
r1 r2 -- r3
min of r1 and r2
: FMIN
2OVER 2OVER F< INVERT IF 2SWAP THEN
2DROP ;
d -- r widen signed double to float
n -- r widen signed single to float
r -- d narrow float to a signed double; may throw -11 "result out of range"
r -- n narrow float to a signed single; may throw -11 "result out of range"
c-addr u -- r true | false convert string to float; leaves the float and true if string is converted; leaves false if string is unconvertable
r c-addr u -- n flag true convert float to string; store decimal digits of the float in the non-empty buffer c-addr u; leaves decimal exponent n+1 and flag = true if negative
-- +n floating point output precision, the number of decimal digits displayed is 7 by default
7 VALUE PRECISION
r -- output float with a trailing space; output fixed notation when 1e-1 <= |r| < 1e+7, otherwise output scientific notation
: F.
HERE PRECISION REPRESENT DROP IF
'- EMIT
THEN
DUP 0 PRECISION 1+ WITHIN IF
HERE OVER TYPE
'. EMIT
HERE OVER +
PRECISION ROT - '0 -TRIM TYPE SPACE
EXIT
THEN
HERE C@ EMIT
'. HERE C!
HERE PRECISION '0 -TRIM TYPE
'E EMIT
1- . ;
The Forth850 dictionary is organized as follows:
low address
_________
+--->| $0000 | last link is zero (2 bytes)
^ |---------|
| | 3 | length of "(:)" (1 byte)
| |---------|
| | (:) | "(:)" word characters (3 bytes)
| |---------|
| | code | machine code
| |=========|
+<==>+ link | link to previous entry (2 bytes)
^ |---------|
: : :
: : :
: : :
| |=========|
+<==>| link | link to previous entry (2 bytes)
^ |---------|
| | $80+5 | length of "aword" (1 byte) with IMMEDIATE bit set
| |---------|
| | aword | "aword" word characters (5 bytes)
| |---------|
| | code | Forth code and/or data
| |=========|
+<---| link |<--- last link to previous entry (2 bytes)
|---------|
| 7 | length of "my-word" (1 byte)
|---------|
| my-word | "my-word" word characters (7 bytes)
|---------|
| code |<--- LASTXT points to code (last xt)
|=========|<--- HERE pointer
| hold | hold area for numerical output (40 bytes)
|---------|
| |
| free | unused dictionary space
| space |
| |
|=========|<--- dictionary limit
| |
| data | stack of 256 bytes (128 cells)
| stack | grows toward lower addresses
| |<--- SP stack pointer
|=========|
| |
| return | return stack of 256 bytes (128 cells/calls)
| stack | grows toward lower addresses
| |<--- RP return stack pointer
|_________|<--- USER address
<--- USER+1 address
high address set with USER in Monitor MON
A link field points to the previous link field. The last link field at the lowest address of the dictionary is zero.
LASTXT
returns the execution token of the last definition, which is the
location where the machine code of the last word starts.
Forth850 is a Direct Threaded Code Forth implementation. Code is either
machine code or starts with a jump or call machine code instruction of 3 bytes,
followed by Forth code (a sequence of execution tokens in a colon definition)
or data (constants, variables, values and other words created with CREATE
.)
Immediate words are marked with the length byte high bit 7 set ($80). Hidden
words have the "smudge" bit 6 ($40) set. A word is hidden until successfully
compiled. HIDE
hides the last defined word by setting the smudge bit.
REVEAL
reveals it. Incomplete colon definitions with compilation errors
should never be revealed.
The following sections explain parts of the technical implementation of Forth850. I will explain the new Forth system routines, the new Z80 math routines and the string routines.
Forth850 is built with the asz80 assembler and aslink linker.
Forth850 uses direct threaded code (DTC). Faster would be to use subroutine threaded code (STC), but this would significantly increase the overall code size and Forth compilation complexity, which are less desirable for a small Z80-based system.
The following Z80 Forth routines are inspired by the article Moving Forth. However, I've decided to use a different Z80 register mapping that is more efficient:
- BC: instruction pointer (IP)
- DE: top of stack (TOS)
- IY: address of the "next routine", for
jp (iy)
By contrast to the article, having the TOS in DE makes it quicker to perform
address arithmetic with the TOS, because we can exchange DE with HL with ex de,hl
in just 4 CPU cycles. Moving BC to HL takes 8 CPU cycles.
I've placed the return stack pointer (RP) in RAM. There is no advantage to use
register IX for RP as the article suggests. In fact, the colon call and return
have the same cycle counts, but almost all of the return stack operations, such
as >R
, require more cycles with the RP in IX compared to the RP in RAM.
A jump to the "next routine" is with jp (iy)
takes 8 CPU cycles, compared to
a jp next
that takes 10 cycles. Inlining the "next routine" eliminates
this overhead, but increases the code size. Inlining should only be applied to
performance-critical words that are frequently used. See macros NEXT
and
JP_NEXT
defined in the section below.
Fetching an execution token (xt) from the instruction pointer (IP) address, incrementing IP and executing the token takes 38 CPU cycles in the "next routine":
.macro NEXT
ld a,(bc) ; 7 ;
ld l,a ; 4 ;
inc bc ; 6 ;
ld a,(bc) ; 7 ;
ld h,a ; 4 ;
inc bc ; 6 ; [ip++] -> hl with xt
jp (hl) ; 4(38); jump to hl
.endif
The "next routine" cycles contribute to the overhead of DTC, which cannot be
removed to speed up DTC execution. To improve speed by 10% on average, the
fetch and execute routine is inlined with the NEXT
macro for
performance-critical words. When performance is not critical, a JP_NEXT
macro is used, which simply expands into jp (iy)
with the IY register
pointing to the "next routine":
.macro JP_NEXT
jp (iy) ; 8(46); jump to next routine
.endm
Each colon definition in memory starts with a call docol
. The docol
routine associated with the (:)
word saves the instruction pointer in BC on
the return stack and pops the new instruction pointer from the parameter stack
(since call docol
leaves the address after the call on the stack.) The
routine checks for ON/BREAK key and begins executing the colon definition with
the "next routine":
docol: ld hl,(rp) ; 16 ; [rp] -> hl
dec hl ; 6 ;
ld (hl),b ; 7 ;
dec hl ; 6 ;
ld (hl),c ; 7 ; save bc -> [--rp] with caller ip on the return stack
ld (rp),hl ; 16 ; ip - 2 -> rp
pop bc ; 10(68); pop ip saved by call docol
; continue with ON/BREAK key check
cont: in a,(0x1f) ; 11 ; port 0x1f bit 7 is set if ON/BREAK is depressed
add a ; 4 ; test ON/BREAK key
jr c,break ; 7(22); if ON/BREAK pressed then break
; next
next: ld a,(bc) ; 7 ;
ld l,a ; 4 ;
inc bc ; 6 ;
ld a,(bc) ; 7 ;
ld h,a ; 4 ;
inc bc ; 6 ; [bc++] -> hl with xt
jp (hl) ; 4(38); jump to hl
A return from a colon definition with the (;)
word pops the return
instruction pointer off the return stack to continue executing the caller's
next instruction.
doret: ld hl,(rp) ; 16 ; [rp] -> hl
ld c,(hl) ; 7 ;
inc hl ; 6 ;
ld b,(hl) ; 7 ;
inc hl ; 6 ;
ld (rp),hl ; 16(58); restore [rp++] -> bc with ip of the caller
NEXT ; 38 ; continue
A colon call takes 145 cycles (17 + 68 + 22 + 38 cycles) and a colon return takes 96 cycles (58 + 38 cycles.) This includes the 38 cycle overhead of the "next routine" to fetch and execute the next token.
A Forth variable leaves its address on the parameter stack. A call dovar
is
used to push the address on the stack, which is then retrieved to set the new
TOS:
dovar: pop hl ; 10 ; pop hl with pfa addr saved by call dovar
push de ; 11 ; save TOS
ex de,hl ; 4(25); set new TOS to hl with pfa addr
NEXT ; 38 ; continue
Executing a word defined as a variable takes 80 cycles (17 + 25 + 38 cycles), which includes the "next routine" overhead.
A Forth constant or value leaves its value on the parameter stack. A call doval
is used to push the address of the constant/value on the stack. The
constant/value is then fetched:
doval: pop hl ; 10 ; pop hl with pfa addr saved by call doval
push de ; 11 ; save TOS
ld e,(hl) ; 7 ;
inc hl ; 6 ;
ld d,(hl) ; 7(41); set [hl] -> de as new TOS
NEXT ; 38 ; continue
Executing a word defined as a constant or value takes 96 cycles (17 + 41 + 38 cycles), which includes the "next routine" overhead.
The @
fetch and !
store words make good use of ex de,hl
:
fetch: ex de,hl ; 4 ; addr -> hl
ld e,(hl) ; 7 ;
inc hl ; 6 ;
ld d,(hl) ; 7(24); set [hl] -> de as new TOS
NEXT ; 38 ; continue
store: pop hl ; 10 ; pop addr -> hl
ex de,hl ; 4 ; x -> de, addr -> hl
ld (hl),e ; 7 ;
inc hl ; 6 ;
ld (hl),d ; 7 ; de -> [hl] with x
pop de ; 10(44); pop new TOS
NEXT ; 38 ; continue
A Forth definer word that uses CREATE with DOES> to define new words is
compiled to execute the (;CODE)
token with label doscode
, followed by a
call dodoes
to start interpreting the DOES> code:
doscode: ld hl,(lastxt+3) ; LASTXT -> hl with last defined word xt
inc hl ;
ld (hl),c ;
inc hl ;
ld (hl),b ; ip -> [LASTXT+1] overwrite call address
jr doret ; (;) return to caller
dodoes: ld hl,(rp) ; 16 ; [rp] -> hl
dec hl ; 6 ;
ld (hl),b ; 7 ;
dec hl ; 6 ;
ld (hl),c ; 7 ;
ld (rp),hl ; 16 ; save bc -> [--rp] with old ip on the return stack
pop bc ; 10 ; pop bc with new ip of the DOES> routine saved by call dodoes
pop hl ; 10 ; pop pfa addr
push de ; 11 ; save TOS
ex de,hl ; 4(93); set new TOS to hl with pfa addr
NEXT ; 38 ; continue
A word defined by a CREATE/DOES> definer makes a call
to the call dodoes
routine. For example, suppose we define CONSTANT
as follows:
: CONSTANT CREATE , DOES> @ ;
123 CONSTANT X
then CONSTANT
and X
are compiled as:
CONSTANT: call docol
.dw create
.dw comma
.dw doscode
CONSTANT_does: call dodoes
.dw fetch
.dw doret
X: call CONSTANT_does
.dw 123
Executing X
takes 192 cycles (17 + 17 + 24 + 38 + 96 cycles.) When
more optimally defined as a CONSTANT
in code, this takes 96 cycles.
Forth words are parsed with my new CHOP
and TRIM
words that efficiently
parse and extract white-space-delimited words from the input.
Entry:
- DE with TOS: a char to truncate the string with
- 2OS: string length u1
- 3OS: string address c-addr
Exit:
- DE with TOS: truncated string length u2
- 2OS: string address c-addr
Performance: 21 cycles per character for non-BL char to chop, 40 cycles per character for BL to chop white space
chop: ld a,e ; char -> a
exx ; save bc with ip
ex af,af' ; save a with char
pop bc ; pop u1 -> bc
ld e,c ;
ld d,b ; u1 -> de
ld a,c ;
or b ; test bc = 0, 0 -> cf
jr z,2$ ; if bc = 0 then not found
pop hl ;
push hl ; c-addr -> hl
ex af,af' ; restore a with char
cp 0x20 ;
jr z,3$ ; if a = 0x20 then find white space
or a ; 0 -> cf not found
; find char in string
cpir ; 21/16 ; repeat until a = [hl++] or --bc = 0
jr nz,2$ ; if match then
1$: ccf ; complement to correct cpi bc--
2$: ex de,hl ; u1 -> hl
sbc hl,bc ; u1 - bc - cf -> hl
push hl ; save hl as TOS
exx ; restore bc with ip
pop de ; pop new TOS
JP_NEXT ; continue
; find white space in string
3$: cp (hl) ; 7 ; loop to compare a to [hl]
cpi ; 16 ; hl++, bc--
jr nc,1$ ; 7 ; if [hl]<a then found
jp pe,3$ ; 10 ; until bc = 0
jr 1$ ; not found
Entry:
- DE with TOS: char to trim the string by removing them from its beginning
- 2OS: string length u1
- 3OS: string address c-addr1
Exit:
- DE with TOS: updated string length u2
- 2OS: updated string address c-addr2
Performance: 33 cycles to trim non-BL char, 106 cycles to trim white space with BL char
trim: ld a,e ; char -> a
exx ; save bc with ip
pop bc ; u1 -> bc
pop hl ; c-addr1 -> hl
1$: ex af,af' ; 4 ; save a
ld a,c ; 4 ;
or b ; 4 ;
jr z,3$ ; 7 ; if bc <> 0 then
ex af,af' ; 4 ; restore a
2$: cpi ; 16 ; loop
jr nz,4$ ; 7/12 ; while a = [hl++], --bc
jp pe,2$ ; 10 ; until b = 0
3$: push hl ; save hl as 2OS
push bc ; save bc as TOS
exx ; restore bc with ip
pop de ; pop new TOS
JP_NEXT ; continue
4$: cp 0x20 ; 7 ;
jr nz,5$ ; 7 ; if char = 0x20 then
dec hl ; 6 ;
cp (hl) ; 7 ;
inc hl ; 6 ;
jr nc,1$ ; 12 ; if [hl-1] <= 0x20 then keep trimming
5$: inc bc ; correct bc++ for cpi match
dec hl ; correct hl-- for cpi match
jr 3$ ; finalize trimming
To parse a white-space-delimited word is efficiently performed with BL PARSE
where the PARSE
word is defined as:
: PARSE ( char "ccc<char>" -- c-addr u )
SOURCE
>IN @ /STRING
ROT CHOP
DUP 1+ >IN @ +
SOURCE NIP UMIN >IN ! ;
To skip input until the next non-white-space character is efficiently performed
with BL SKIPS
, where SKIPS
is defined as:
: SKIPS ( char "<chars>" -- )
SOURCE >IN @ /STRING
ROT TRIM
DROP
SOURCE DROP - >IN ! ;
The FIND-WORD
word searches the dictionary starting with CONTEXT
for a
matching word. The search is case insensitive. Smudged words are skipped.
Entry:
- DE with TOS: size of the string to search u
- 2OS: address of the string to search c-addr
Exit:
- DE with TOS: 0 = not found, 1 = found immediate, -1 = found (not immediate)
- 2OS: string address if not found or execution token when found
Performance: 95 cycles per dictionary entry, 51 or 102 cycles per character comparison when characters match
findword: ld a,d ;
or a ; test d = 0 high order byte of u
jp nz,zero_next ; if u is too large then set new TOS to 0
sla e ; shift u to compare w/o immediate bit
jp c,zero_next ; if u is too large then set new TOS to 0
jp z,zero_next ; if u = 0 then set new TOS to 0
push de ; save de with 2*u
exx ; save bc with ip
pop bc ; pop 2 * u -> bc
pop de ; pop c-addr -> de
ld hl,(context+3) ; CONTEXT -> hl
jr 3$ ; start searching
; loop over dictionary
1$: pop de ; restore de with c-addr
2$: pop hl ; 10 ; loop, restore hl with lfa
3$: ld a,(hl) ; 7 ;
inc hl ; 6 ;
ld h,(hl) ; 7 ;
ld l,a ; 4 ; [hl] -> hl follow link at hl = lfa
or h ; 4 ;
jr z,6$ ; 7 ; if hl = 0 then not found
push hl ; 11 ; save hl with lfa
inc hl ; 6 ;
inc hl ; 6 ; hl + 2 -> hl with nt (nfa)
ld a,(hl) ; 7 ; word length
add a ; 4 ; shift away immediate bit
cp c ; 4 ; test a = c word length match (both shifted)
jr nz,2$ ; 12(95); if lengths differ then continue searching
; compare string to word
push de ; save de with c-addr
inc hl ; hl++ point to nfa chars
ld b,c ; 2 * u -> b
srl b ; u -> b word length (nonzero)
; loop over word chars
4$: ld a,(de) ; 7 ; loop
cp (hl) ; 7 ; compare [de] = [hl]
jr z,5$ ; 12/7 ; if mismatch then
and 0xdf ; 7 ; make upper case
cp 'A ; 7 ;
jr c,1$ ; 7 ; if a<'A' then continue search
cp 'Z+1 ; 7 ;
jr nc,1$ ; 7 ; if a>'Z' then continue search
xor (hl) ; 7 ;
and 0xdf ; 7 ; case insensitive compare [de] = [hl]
jr nz,1$ ; 7 ; if mismatch then continue search
5$: inc de ; 6 ; de++ point to next char of c-addr
inc hl ; 6 ; hl++ point to next char of word
djnz 4$ ; 13(51/102);until --b = 0
; found a matching word
pop de ; discard saved c-addr
ex (sp),hl ; save hl with xt as 2OS, restore hl with lfa
inc hl ;
inc hl ; hl + 2 -> hl with nt (nfa)
bit immediate_bit,(hl) ; test immediate bit of [hl] word length
exx ; restore bc with ip
jp nz,one_next ; set new TOS to 1 if word is immediate
jp mone_next ; set new TOS to -1
; not found
6$: push de ; save de with c-addr as 2OS
exx ; restore bc with ip
jp zero_next ; set new TOS to 0
JP_NEXT ; continue
I've written the following Z80 math routines. My objective was to make them as efficient as possible. The second objective was to keep the code size small by using tricks with CPU arithmetic and flags.
Entry:
- BC: signed or unsigned multiplier n1
- DE: signed or unsigned multiplicand n2
Exit:
- HL: signed product or unsigned product n3
Perfomance: max 51 cycles x 16 iterations = 816 cycles or max 51 cycles x 8 iterations + 45 x 8 = 768 cycles, excluding entry/exit overhead
mult1616: ld hl,0 ; 0 -> hl
ld a,c ; c -> a low order byte of n1
ld c,b ; b -> c save high order byte of n1
ld b,8 ; 8 -> b loop counter
1$: rra ; 4 ; loop, a >> 1 -> a set cf
jr nc,2$ ; 7 ; if cf = 1 then
add hl,de ; 11 ; hl + de -> hl
2$: sla e ; 8 ;
rl d ; 8 ; de << 1 -> de
djnz 1$ ; 13(51); until --b = 0
ld a,c ; c -> a high order byte of n1
ld b,8 ; 8 -> b loop counter
3$: rra ; 4 ; loop, a >> 1 -> a set cf
jr nc,4$ ; 7 ; if cf = 1 then
add hl,de ; 11 ; hl + de -> hl
4$: sla e ; 8 ;
rl d ; 8 ; de << 1 -> de
djnz 3$ ; 13(51); until --b = 0
ret ; done
We can make an additional speed improvement, which only costs us one more instruction byte. To calculate the high order byte we do not need to iterate over all 8 bits of the high order multiplier stored in register c, but only over the nonzero bits. We also can ignore the lower order result stored in register e. This reduces the max loop iteration cycle time to 32 and 33 per bit. Furthermore, the second loop only runs until the last bit of register c is shifted out. If register c is zero, the second loop does not execute thereby saving hundreds of cycles. We also use jp instead of jr to improve and balance the cycle time per bit:
mult1616: ld hl,0 ; 0 -> hl
ld a,c ; c -> a low order byte of n1
ld c,b ; b -> c save high order byte of n1
ld b,8 ; 8 -> b loop counter
1$: rra ; 4 ; loop, a >> 1 -> a set cf
jr nc,2$ ; 7 ; if cf = 1 then
add hl,de ; 11 ; hl + de -> hl
2$: sla e ; 8 ;
rl d ; 8 ; de << 1 -> de
djnz 1$ ; 13(51); until --b = 0
ld a,h ; h -> a do high order, low order is done
jr 5$ ; jump to shift c and loop
3$: add d ; 4 ; loop, a + d -> d
4$: sla d ; 8 ; d << 1 -> d
5$: srl c ; 8 ; c >> 1 -> c set cf and z if no bits left
jr c,3$ ; 12/7(32); until cf = 0 repeat with addition
jp nz,4$ ; 10(33); until c = 0 repeat without addition
ret ; done
Note: unrolling the loops would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.
Entry:
- DE: unsigned multiplicand u1
- BC: unsigned multiplier u2
Exit:
- DE: low order unsigned product u3
- HL: high order unsigned product u3
Perfomance: max 64 cycles x 17 iterations = 1088 cycles, excluding entry/exit overhead
umult1632: xor a ; 0 -> cf
ld l,a ;
ld h,a ; 0 -> hl
ld a,17 ; 17 -> a loop counter
1$: rr h ; 8 ; loop
rr l ; 8 ;
rr d ; 8 ;
rr e ; 8 ; hlde + cf >> 1 -> hlde
jr nc,2$ ; 7 ; if cf = 1 then
add hl,bc ; 11 ; hl + bc -> hl
2$: dec a ; 4 ;
jp nz,1$ ; 10(64); until --a = 0
ret ; done
Note: unrolling the loop would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.
Entry:
- BC': low order signed or unsigned multiplicand d1
- DE: high order signed or unsigned multiplicand d1
- DE': low order signed or unsigned multiplier d2
- HL': high order signed or unsigned multiplier d2
Exit:
- HL': low order signed product or unsigned product d3
- HL: high order signed product or unsigned product d3
Perfomance: max 98 cycles x 32 iterations = 3136 cycles, excluding entry/exit overhead
mult3232: ld hl,0 ; 0 -> hl high order d3, de with d2 high order
exx ; save bc with ip
ld a,h ;
push af ; save d1 high order byte 3
ld a,l ;
push af ; save d1 high order byte 2
ld a,b ;
push af ; save d1 low order byte 1
ld a,c ;
push af ; save d1 low order byte 0
ld hl,0 ; 0 -> hl' low order d3
ld c,4 ; 4 -> c outer loop counter
1$: pop af ; loop, [sp++] -> a next d1 byte
ld b,8 ; 8 -> b inner loop counter
2$: rra ; 4 ; loop, a >> 1 -> a set cf
jr nc,3$ ; 7 ; if cf = 1 then
add hl,de ; 11 ; hl' + de' -> hl add low order
exx ; 4 ;
adc hl,de ; 15 ; hl + de + cf -> hl add high order
exx ; 4 ;
3$: sla e ; 8 ;
rl d ; 8 ; de' << 1 -> de' shift low order
exx ; 4 ;
rl e ; 8 ;
rl d ; 8 ; de << 1 + cf -> de shift high order
exx ; 4 ;
djnz 2$ ; 13(98); until --b = 0
dec c ;
jr nz,1$ ; until --c = 0
ret ; done
The same tricks as the 16x16->16 multiplication method can be used to reduce the cycle time, but at a cost of increased code size. We also assign different registers to the low and high order parts:
Entry:
- BC': low order signed or unsigned multiplicand d1
- DE: high order signed or unsigned multiplicand d1
- HL': low order signed or unsigned multiplier d2
- DE': high order signed or unsigned multiplier d2
Exit:
- HL: low order signed product or unsigned product d3
- HL': high order signed product or unsigned product d3
Perfomance: max 8 x (98+87+59+33) = 2216 cycles, excluding entry/exit overhead
mult3232: ld hl,0 ; 0 -> hl low order d3, de with d2 low order
exx ; save bc with ip
ld a,h ;
push af ; save d1 high order byte 3
ld a,l ;
push af ; save d1 high order byte 2
ld a,b ;
push af ; save d1 low order byte 1
ld a,c ; d1 -> a low order byte 0
ld hl,0 ; 0 -> hl' high order d3
ld b,8 ; 8 -> b loop counter
1$: rra ; 4 ; loop, a >> 1 -> a set cf
jr nc,2$ ; 7 ; if cf = 1 then
exx ; 4 ;
add hl,de ; 11 ; hl + de -> hl add low order
exx ; 4 ;
adc hl,de ; 15 ; hl' + de' + cf -> hl add high order
2$: exx ; 4 ;
sla e ; 8 ;
rl d ; 8 ; de << 1 -> de shift low order
exx ; 4 ;
rl e ; 8 ;
rl d ; 8 ; de' << 1 + cf -> de' shift high order
djnz 1$ ; 13(98); until --b = 0
pop af ;
ld c,a ; d1 -> c low order byte1
exx ;
ld a,h ; h -> a low order d3
exx ;
ld b,8 ; 8 -> b loop counter
3$: rr c ; 8 ; loop, c >> 1 -> c set cf
jr nc,4$ ; 7 ; if cf = 1 then
exx ; 4 ;
add d ; 4 ; a + d -> a add low order
exx ; 4 ;
adc hl,de ; 15 ; hl' + de' + cf -> hl add high order
4$: exx ; 4 ;
sla d ; 8 ; d << 1 -> d shift low order
exx ; 4 ;
rl e ; 8 ;
rl d ; 8 ; de' << 1 + cf -> de' shift high order
djnz 3$ ; 13(87); until --b = 0
exx ;
ld h,a ; a -> h low order d3
exx ;
pop af ; d1 -> a high order byte 2
ld b,8 ; 8 -> b loop counter
5$: rra ; 8 ; loop, c >> 1 -> c set cf
jr nc,6$ ; 7 ; if cf = 1 then
add hl,de ; 15 ; hl' + de' + cf -> hl add high order
6$: rl e ; 8 ;
rl d ; 8 ; de' << 1 + cf -> de' shift high order
djnz 5$ ; 13(59); until --b = 0
pop af ;
ld c,a ; d1 -> c high order byte 3
ld a,h ; h -> a high order
jr 9$ ; jump to shift c and loop
7$: add d ; 4 ; loop, a + d -> a
8$: sla d ; 8 ; d << 1 -> d
9$: srl c ; 8 ; c >> 1 -> c set cf and z if no bits left
jr c,7$ ; 12/7(32); until cf = 0 repeat with addition
jp nz,8$ ; 10(33); until c = 0 repeat without addition
ld h,a ; a -> h high order
exx ;
ret ; done
Note: unrolling the inner loop would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.
This implementation is used by all division and remainder (modulo) Forth words
by calling UM/MOD
. As such, it is an important and versatile algorithm.
Entry:
- HL: high order dividend ud
- BC: low order dividend ud
- DE: divisor u1
Exit:
- HL: remainder u2
- BC: quotient u3
Performance: max 85 cycles x 16 iterations = 1360 cycles, excluding entry/exit overhead
udiv3216: xor a ;
sub e ;
ld e,a ;
sbc a ;
sub d ;
ld d,a ; -de -> de with -u1
ld a,b ; b -> a low order dividend in ac
ld b,16 ; 16 -> b loop counter
sla c ;
rla ; ac << 1 -> ac
1$: adc hl,hl ; 15 ; loop, hl << 1 + cf -> hl
jr nc,2$ ; 12/ 7 ; if cf = 1 then
add hl,de ; 11 ; hl + -u1 -> hl
scf ; 4 ; 1 -> cf
jr 3$ ; 12 ; else
2$: add hl,de ; 11 ; hl + -u1 -> hl
jr c,3$ ; 12/ 7 ; if cf = 0 then
sbc hl,de ; 15 ; hl - -u1 -> hl to undo, no carry
3$: rl c ; 8 ;
rla ; 4 ; ac << 1 + cf -> ac
djnz 1$ ; 13(85); until --b = 0
ld b,a ; a -> b quotient bc, remainder in hl
ret ; done
The algorithm negates the divisor first to speed up subtraction by adding the negative of the divisor instead. Another benefit of using the negated divisor is that this produces the right carry value to shift into the quotient, otherwise the carry should be inverted or the resulting quotient must be inverted.
By moving the first conditional block out of the loop, we can save 5 CPU cycles on the critical path (the most expensive path through the loop) to reduce to max 80 cycles per iteration at the cost of making the code more cluttered.
Entry:
- HL: high order dividend ud
- BC: low order dividend ud
- DE: divisor u1
Exit:
- HL: remainder u2
- BC: quotient u3
Performance: max 80 cycles x 16 iterations = 1280 cycles, excluding entry/exit overhead
udiv3216: xor a ;
sub e ;
ld e,a ;
sbc a ;
sub d ;
ld d,a ; -de -> de with -u1
ld a,b ; b -> a low order dividend in ac
ld b,16 ; 16 -> b loop counter
sla c ;
rla ; ac << 1 -> ac
1$: adc hl,hl ; 15 ; loop, hl << 1 + cf -> hl
jr c,3$ ; 7/12 ; if cf = 1 then hl + -u1 -> hl, 1 -> cf else
add hl,de ; 11 ; hl + -u1 -> hl
jr c,2$ ; 12/ 7 ; if cf = 0 then
sbc hl,de ; 15 ; hl - -u1 -> hl to undo, no carry
2$: rl c ; 8 ;
rla ; 4 ; ac << 1 + cf -> ac
djnz 1$ ; 13(80); until --b = 0
ld b,a ; a -> b quotient bc, remainder in hl
ret ; done
3$: add hl,de ; 11 ; hl + -u1 -> hl
scf ; 4 ; 1 -> cf
jr 2$ ; 12 ;
By comparison, the CamelForth Z80 code is also fast, but slower than my implemenation with 90 cycles x 16 iterations = 1440 cycles, excluding entry/exit overhead:
udiv3216: ld a,16 ; 16 -> a loop counter
sla e ;
rl d ; de << 1 -> de
1$: adc hl,hl ; 15 ; loop, hl << 1 + cf -> hl
jr nc,2$ ; 12/ 7 ; if cf = 1 then
or a ; 4 ; 0 -> cf
sbc hl,bc ; 15 ; hl - u1 -> hl
or a ; 4 ; 0 -> cf
jp 3$ ; 10 ; else
2$: sbc hl,bc ; 15 ; hl - u1 -> hl
jr nc,3$ ; 12/ 7 ; if cf = 1 then
add hl,bc ; 11 ; hl + u1 -> hl to undo sbc, sets cf
3$: rl e ; 8 ;
rl d ; 8 ; de << 1 + cf -> de with inverse cf we'll need
dec a ; 4 ;
jp nz,1$ ; 10(90); until --a = 0
ld a,e ;
cpl ;
ld e,a ;
ld a,d ;
cpl ;
ld d,a ; complement de, faster than ccf in loop
ret ; done
Note: unrolling the loop would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.
This algorithm uses the shadow registers BC', DE' and HL'. Because of this register pressure, there is little room for further optimization. Registers IX and IY cannot be used since they lack the necessary adc and sbc instructions.
Entry:
- BC: high order dividend ud1
- BC': low order dividend ud1
- DE': high order divisor ud2
- DE: low order divisor ud2
Exit:
- HL': high order remainder ud3
- HL: low order remainder ud3
- BC: high order quotient ud4
- BC': low order quotient ud4
Performance: max 162 cycles x 32 iterations = 5184 cycles, excluding entry/exit overhead
udiv3232: exx ;
xor a ;
ld h,a ;
ld l,a ; 0 -> hl'
rl c ;
rl b ;
exx ;
ld h,a ;
ld l,a ; 0 -> hl
ld a,b ; b -> a
rl c ;
rla ; ac << 1 -> ac
ld b,32 ; 32 -> b loop counter
1$: adc hl,hl ; 15 ;
exx ; 4 ;
adc hl,hl ; 15 ;
exx ; 4 ; hl'.hl << 1 + cf -> hl'.hl no carry
sbc hl,de ; 15 ;
exx ; 4 ;
sbc hl,de ; 15 ; hl'.hl - de'.de -> hl'.hl
jr nc,2$ ; 12/ 7 ; if cf = 1 then
exx ; 4 ;
add hl,de ; 11 ;
exx ; 4 ;
adc hl,de ; 15 ; hl'.hl + de'.de -> hl'.hl to undo, sets carry
2$: ccf ; 4 ; complement cf
rl c ; 8 ;
rl b ; 8 ;
exx ; 4 ;
rl c ; 8 ;
rla ; 4 ; ac.bc' << 1 + cf
djnz 1$ ; 13(162); until --b = 0
ld b,a ;
ld e,c ;
ret ;
I've written a collection of Z80 IEEE 754 single precision floating point math routines:
- math.asm (960 bytes of code) a simple version with truncation
- mathr.asm (1085 bytes of code) includes three IEEE 754 rounding modes, where the default rounding mode is to round to nearest, ties to even;
- mathri.asm (1296 bytes of code) includes the three IEEE 754 rounding modes, and inf/nan and signed zero. This version is not intended for Forth850, because Forth850 raises floating point exceptions.
My objective was to make the floating point routines as efficient as possible, such as by using the shadow registers instead of memory. No memory is used, except at most one push-pop pair to move a value between the (shadow) registers. The second objective was to keep the code size small by using tricks with CPU arithmetic and flags. The floating point library is about 1KB.
Single precision floating point values are stored in registers BC (high order)
and DE (low order) to form a 32 bit float bcde
and shadow float bcde'
.
fadd
: floatbcde
+bcde'
->bcde
; cf set on overflowfsubx
: floatbcde
-bcde'
->bcde
; cf set on overflowfsuby
: floatbcde'
-bcde
->bcde
; cf set on overflowfneg
: float -bcde
->bcde
; no errors (cf reset)fabs
: float |bcde
| ->bcde
; no errors (cf reset)fmul
: floatbcde
*bcde'
->bcde
; cf set on overflowfdivx
: floatbcde
/bcde'
->bcde
; cf set on overflow or when dividing by zerofdivy
: floatbcde'
/bcde
->bcde
; cf set on overflow or when dividing by zeroftoi
: floatbcde
-> signed 32 bit integerbcde
truncated towards zero; cf set when out of rangeitof
: signed 32 bit integerbcde
-> floatbcde
; no errors (cf reset)ftrunc
: float trunc(bcde
) ->bcde
; no errors (cf reset)ffloor
: float floor(bcde
) ->bcde
; cf set on overflowfround
: float round(bcde
) ->bcde
; cf set on overflowfpow10
: 10^a
*bcde
->bcde
for -128 <=a
< 39; cf set on overflowatof
: string [hl
..hl
+a
-1] -> floatbcde
; cf set on parsing error andhl
points after the charftoa
: floatbcde
-> [hl
...hl
+a
-1] string of digits, exponente
and signd
bit 7; no errors (flags undefined)fzero
: setbcde
to 0.0
mathri.asm includes inf/nan and signed zero. In this version the routines listed above may return signed inf or nan with cf set to indicate overflow and errors. In addition, this version includes the following routines:
ftype
: floatbcde
->bcde
unchanged; cf set ifbcde
is nan, cf reset and z set ifbcde
is +/-inffnan
: setbcde
to nan; cf setfinf
: setbcde
to inf with sign in register A bit 7 (negative when set); cf set
I've written the following Z80 string routines. My objective was to make them
as efficient as possible, such as by making the obvious choice to use the cpi
and cpir
Z80 instructions to minimize cycle count. The second objective was
to keep the code size small by using tricks with CPU arithmetic and flags.
Entry:
- IX: address of the first string c-addr1
- HL: size of the first string u1
- DE: address of the second string c-addr2
- BC: size of the second string u2
Exit:
- A: -1 (less), 0 (equal), 1 (greater)
- F: zero flag set when equal, sign flag set when less
Performance: 46 cycles per character comparison when characters match
compare: push ix ; save c-addr1
push hl ; save u1
xor a ; 0 -> a flags u1 = u2, 0 -> cf
sbc hl,bc ;
jr z,1$ ; if u1 <> u2 then
inc a ; 1 -> a flags u1 > u2
jr nc,1$ ; if u1 < u2 then
pop bc ; pop u1 -> bc
push bc ; rebalance stack
ld a,-1 ; -1 -> a flags u1 < u2
1$: pop hl ; pop to discard u1
pop hl ; pop c-addr1 -> hl
ex af,af' ; save a with -1|0|1 flag
ld a,c ;
or b ;
jr z,3$ ; if bc <> 0 then
; compare chars
2$: ld a,(de) ; 7 ; loop
cpi ; 16 ; compare [hl++] to [de], --bc
jr nz,4$ ; 7 ; while characters [hl] and [de] are equal
inc de ; 6 ; de++
jp pe,2$ ; 10(46); until bc = 0
; chars match, check lengths
3$: ex af,af' ; restore a with -1|0|1 flag
ret ;
; strings differ
4$: dec hl ; hl-- to correct cpi overshoot
cp (hl) ; test a<[hl]
ccf ; complement cf, cf = 1 if [hl]<a
sbc a ; a = -1 if cf = 1 else 0
add a ; a = -2 if cf = 1 else 0
inc a ; a = -1 if cf = 1 else 1
ret ; done
Naive string search, i.e. not Knuth-Morris-Pratt which is faster but would require a table and more code.
Entry:
- HL: address of the string searched c-addr1
- IX: size of the string searched u1
- DE: address of the string to search c-addr2
- BC: size of the string to search u2
Exit:
- F: carry set when no match found
- HL: address of the string position found c-addr3
- BC: size of the remaining characters after the match
Performance: 21 cycles per character to search the first or next character match and 46 cycles per character comparison when characters match
search: or a ; 0 -> cf
sbc ix,bc ; u1 - u2 -> ix
ret c ; if u2>u1 then impossible search, cf = 1
ld a,c ;
or b ;
ret z ; if u2 = 0 then done (found), cf = 0
push ix ;
push bc ;
pop ix ; u2 -> ix
pop bc ; u1 - u2 -> bc
inc bc ; u1 - u2 + 1 -> bc correct for cpir
push hl ; save c-addr1 on the stack
; find char match
1$: push de ; loop, save de with c-addr2
ld a,(de) ; [de] -> a
cpir ; 21/16 ; repeat until a = [hl++] or --bc = 0
jr nz,4$ ; if no match then not found
pop de ; restore de with c-addr2
push bc ;
push de ;
push hl ; save bc,de,hl
push ix ;
pop bc ; u2 -> bc
; compare substrings
dec bc ; u2 - 1 -> bc since u2 > 0
ld a,c ;
or b ;
jr z,3$ ; if bc<> 0 then
inc de ; de++ to start matching at c-addr2+1
2$: ld a,(de) ; 7 ; loop
cpi ; 16 ; compare [hl++] to [de], --bc
jr nz,3$ ; 7 ; while characters [hl] and [de] are equal
inc de ; 6 ; de++
jp pe,2$ ; 10(46); until bc = 0
3$: pop hl ;
pop de ;
pop bc ; restore bc,de,hl
jr nz,1$ ; repeat
; substrings match
dec hl ; hl-- to correct cpir overshoot
ret ; done, cf = 0
; not found
4$: scf ; 1 -> cf
ret ; done, cf = 1
- Manual: http://basic.hopto.org/basic/manual/Sharp%20PC-G850V.pdf
- HP Forum thread: https://www.hpmuseum.org/forum/thread-10520.html
- Sharp PC-G850(V)(S) software (Japanese site): http://ver0.sakura.ne.jp/g800/index.html
- Forth for the Sharp PC-E500(S): https://github.com/Robert-van-Engelen/Forth500
- Forth 2012 Standard: https://forth-standard.org/standard/intro
- Moving Forth: https://www.bradrodriguez.com/papers/moving1.htm
Forth850 benefits from the work done by many others to offer inspiration, but the system does not include licensed code of the following implementations or any other implementation not listed here. Some parts of Forth850 are derived from freely available Forth resources listed above and the Z80 resources listed further below:
- CamelForth for the Z80: http://www.camelforth.com/page.php?5
- eForth: https://github.com/lispnik/eforth/blob/master/z80efort/EFZ80.ASM
- Jupiter Ace ROM listing: http://www.jupiter-ace.co.uk/romlisting.html#L085F
- asz80 assembler and linker: https://shop-pdp.net/ashtml/asz80.htm download https://shop-pdp.net/ashtml/asxget.php
- Zilog Z80 CPU User Manual: https://www.zilog.com/docs/z80/um0080.pdf
- Z80 Instruction Set: https://wikiti.brandonw.net/index.php?title=Z80_Instruction_Set
- Z80 Instruction Set: https://www.smspower.org/Development/InstructionSet
- Z80 the 8-bit number cruncher: http://www.andreadrian.de/oldcpu/Z80_number_cruncher.html
- Z80 integer math routines: http://map.grauw.nl/sources/external/z80bits.html
- Z80 bits (integer math routines): https://wikiti.brandonw.net/index.php?title=Category:Z80_Routines:Math
- Z80 advanced math: http://z80-heaven.wikidot.com/advanced-math
- Z80 classic maths libraries: https://github.com/z88dk/z88dk/wiki/Classic--Maths-Libraries
- Z80 IEEE754 floating point library: https://github.com/Zeda/z80float