-
Notifications
You must be signed in to change notification settings - Fork 30
/
startup.mu4
1865 lines (1439 loc) · 62.9 KB
/
startup.mu4
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
token | .forth. (unlinked-name) <:> ] (lit) [ trailing
drop c@ , ] parse 2drop ^ [ show
| This file is part of muforth: https://muforth.dev/
|
| Copyright 2002-2024 David Frech. (Read the LICENSE for details.)
| This file is muforth/mu/startup.mu4. It contains high-level Forth code
| necessary to the useful execution of muforth. This file is loaded and
| interpreted every time muforth starts up.
|
| The idea is to move as much code as possible *out* of muforth's C kernel
| and instead implement it in Forth. Hence the name "muforth": mu is the
| Greek letter often used in engineering to represent "micro".
|
| However, this "micro-ness" only refers to the C kernel; once everything
| in this file is loaded, muforth has over 500 words defined!
|
| This file exemplifies a Forth strength - shared by Lisp and Smalltalk,
| among other interpretive/compiled languages - that I like to call
| "writing the reader"; the reader being, in this case, the Forth
| interpreter/compiler.
|
| As defined in the kernel, the interpreter/compiler is very simple; it
| only knows how to do the following things:
|
| 1. parse a whitespace-delimited token out of the input stream;
|
| 2. look up a token in the dictionary, and complain if it is not found;
|
| 3. execute the code that is associated with a token;
|
| 4. compile a "call" to the code that is associated with a token, by
| appending its execution address to the end of the current dictionary
| entry;
|
| 5. create a new dictionary entry.
|
| That's it! No numbers, no control structures, and no error reporting
| other than "xyz isn't defined".
|
| In this file, in Forth, we need to extend the interpreter/compiler to do
| the following:
|
| 1. compile control structures: if/then, for/next, begin/while/repeat;
|
| 2. compile data structures: variables, constants, create/does words;
|
| 3. read and write numbers - an interesting exercise since muforth starts
| life not even knowing the constants 0 or 1;
|
| 4. read and write strings;
|
| 5. create return stack exception frames - for error handling and "fluid
| binding" of global variables - and unwind through these frames when
| errors occur.
|
| Once these are complete we will have a useful Forth for doing real work.
|
| The order of business will sometimes seem haphazard; words can only be
| defined after the words they depend on have been defined, so we end up
| jumping around a bit in the "semantics" of the language.
|
| Hopefully the reader will find this an interesting exercise in
| bootstrapping, which was precisely my intention.
|
| So, here goes; now we start extending the language, bit by bit.
| The bit of inscrutable poetry at the beginning of this file creates the
| word | which uses parse to look for a newline character, and throws
| away the parsed text. I think that | is the *the* best choice for
| a "treat the rest of the line as a comment" character, because it creates
| a running vertical bar along the left side of the comment text that
| nicely sets it off from the code nearby.
|
| Sometimes, however, we want to embed comments *within* the code. The
| Forth convention for this has always been to use ( to start a comment,
| and ) to end it. The ( has to be followed by a space, in order to be
| tokenized properly; the ) needs no whitespace around it.
|
| Let's define ( as well. Like | it looks forward in the input for
| a following ) and throws away the parsed text. The
|
| token ) drop c@
|
| parses the ) as token, drops its length, and fetches its first
| character. This way we can pass the ASCII value of ) to parse .
token ( .forth. (unlinked-name)
<:> ] (lit) [ token ) drop c@ , ] parse 2drop ^ [ show
| We want to create the current variable, but we have no defining words
| to help us. We need to create a create/does word completely by hand. We
| do it in two phases.
| First, create an empty does> clause, and leave on the stack the IP
| pointing to the ^/EXIT/UNNEST.
here ] ^ [
| Next, create in the .forth. chain a <does> word named current, point it
| to the empty does> clause above, and copy in the address of the .forth.
| chain as its initial value.
token current .forth. (unlinked-name) show
<does> , ( ip of empty does> clause)
.forth. , ( initial value)
| With current defined we can now create the word new which we will use
| from now on to create new names in the dictionary. Instead of using
| .forth. as above, we create new names in the chain pointed to by current .
| But before we define new let's create another ( word, this time in the
| .compiler. chain, so we can comment the following code. It's a little
| complicated, since they have the same name. So we will first find the forth
| ( and then compile it by hand.)
token ( .compiler. (unlinked-name)
<:> token ( .forth. find huh? compile, ] ^ [ show | call the forth (
token new .forth. (unlinked-name) ( new is defined in the .forth. chain)
<:> ]
token
nope ( placeholder for code we paste in later!)
current @ (unlinked-name) ( link new word to current chain, not .forth.)
^ [ show
| Finally, we can create the word : which we will use to make colon
| definitions. It creates a new word, compiles the code field for colon
| words, and then uses ] to switch to compiling state.
new :
<:> ]
new <:> ]
^ [ show
| We don't have ; so let's define it. It lives in the compiler chain, and
| when executed, it compiles ^ and then executes [ to switch back to
| interpret state from compiling state. Since [ lives in the .compiler.
| chain, we have to search for it in a complicated way.
.compiler. current ! ( make .compiler. the current chain)
: ; compile ^
[ token [ .compiler. find huh? compile, ] show ^ [ show
.forth. current ! ( switch back to .forth.)
| By convention, names of dictionary chains start and end with dots. When a
| dictionary chain word is executed, it pushes the address of its link
| pointer - which in turns points to the last word defined on the chain.
|
| To make it the current chain, we store this pointer into current. Let's
| create a meaningful name for that.
: definitions current ! ;
| If we want to make a chain the "current" one that receives all new
| definitions, we execute the chain, and then execute "definitions". But
| that's a lot to type, and we do this a lot, so in general when I create a
| new chain - with the dots in its name - I also create a word *without* the
| dots, that simply calls the chain word and then calls definitions.
|
| Let's do that for our three existing chains.
: forth .forth. definitions ;
: compiler .compiler. definitions ;
: runtime .runtime. definitions ;
( Create a literal in the word currently being defined.)
: literal compile (lit) , ;
( End a bracketed computation and use the result to create a literal.)
: #] literal ] ;
( Stack manipulations.)
: rot >r swap r> swap ; ( a b c - b c a) ( fig!)
: -rot swap >r swap r> ; ( a b c - c a b)
: nip swap drop ; ( a b - b)
: tuck swap over ; ( a b - b a b)
| Back up over last token parsed from the input so we can re-parse it.
|
| NOTE: We have to be careful about line numbers. If the token was followed
| immediately by a newline, line will have been incremented. When backing
| up over the token, we need to reset line to its previous value, otherwise
| parsing the token again will increment line *again* and it will no longer
| refer to the correct line!
|
| Happily, @line captures the value of line at the *beginning* of a token.
| Let's reset line to @line when we back up over the token.
: untoken source@ drop ( end) parsed drop ( new-first) source!
@line line ! ;
| Let's define some words that are useful for searching specific dictionary
| chains and compiling words from them.
( Roll tokenizing and searching into one.)
: token' token rot find ; ( chain - a u F | body T)
| Compiling from specific chains. Note that `\' is an elaboration of the
| basic scheme of `\chain'. These words will be handy in the assembler
| and target compiler.
( Tick)
: chain' token' huh? ;
: \chain chain' compile, ;
( 28-apr-2000. Do we ever -really- want to search anything other than .forth.?)
: ' .forth. chain' ;
( : ' current @ chain' ; ( XXX)
compiler
( XXX: should this and ' do the same thing?)
( : ['] .forth. chain' literal ;)
( XXX should this search .runtime. rather than .forth. ??)
: ['] ' literal ;
( XXX: is this useful? Here? Maybe in a target compiler...)
: \f .runtime. \chain ;
: \c .compiler. \chain ; ( until we have \ ; we need this for "if")
forth
( We don't even have any constants yet! So we make the easiest one first...)
: 0 [ dup dup xor #] ;
( From 0 we can create a few more useful constants!)
: -1 [ 0 invert #] ;
: 1 [ -1 negate #] ;
: 2 [ 1 2* #] ;
( On and off)
: on -1 swap ! ;
: off 0 swap ! ;
: bl [ 2 2* 2* 2* 2* #] ; ( space character)
: char token drop c@ ; ( grab the first character of the following token)
: ctrl char [ bl 2* ( 64) #] xor ; ( how you get ^? = 127.)
compiler
: char \f char literal ;
: ctrl \f ctrl literal ;
forth
| Before I figured out the trick above, which yields the correct answer for
| ctrl ?, I defined ctrl thus:
( : ctrl char [ bl 1- #] and ; ( 31 and)
( Some useful tidbits.)
: - negate + ;
: u+ ( a b c - a+c b) rot + swap ; ( "under-plus")
: v+ ( x1 y1 x2 y2 - x1+x2 y1+y2) push u+ pop + ; ( add 2-vectors)
: 1+ 1 + ; ( these are common)
: 1- -1 + ;
( cells are always 64 bits - 8 bytes.)
: cell [ 1 cells #] ;
: cell+ [ cell #] + ;
: cell- [ cell negate #] + ;
( For fetching and storing a series of bytes.)
: c@+ ( a - b a+1) dup c@ swap 1+ ;
: c!+ ( b a - a+1) tuck c! 1+ ;
( For fetching and storing a series of cells.)
: @+ ( a - n a+) dup @ swap cell+ ;
: !+ ( n a - a+) tuck ! cell+ ;
( Two-cell fetch and store.)
: 2@ @+ @ swap ; ( cell at lower address to TOP)
: 2! !+ ! ;
: 2dup ( a b - a b a b) over over ;
: 2swap ( a b c d - c d a b) rot push rot pop ;
: 2over ( a b c d - a b c d a b) [ 2 1+ #] nth [ 2 1+ #] nth ;
: 2tuck ( a b c d - c d a b c d) 2swap 2over ;
: = xor 0= ;
: not 0= ; ( warning! this is NOT 1's complement)
: bic invert and ;
: @execute @ execute ;
| jump allows jumping thru a table of addresses; you are responsible for
| making sure the index is within range! It must be used at the end of a
| word. Common usage looks like this: jump nope do1 do2 do3 [
|
| That example assumes the top of stack has a number from 0 to 3.
|
| Since no UNNEST needs to be compiled, use of [ rather than ; to end the
| word is common.
runtime
: jump ( which) cells pop + @execute ;
forth
( Control structures and \ )
( Mark a branch source for later fixup.)
: mark ( - src) here 0 , ;
( Resolve a forward or backward jump, from src to dest.)
( When using absolute branch addresses, this is easy: just store dest at src.)
: <resolve ( dest src) ! ;
: >resolve ( src dest) swap <resolve ;
compiler
: then ( src) here >resolve ;
: =if ( - src) compile (=0branch) mark ;
: ?if ( - src) compile (?0branch) mark ;
: if ( - src) compile (0branch) mark ;
: again ( dest) compile (branch) mark <resolve ;
: else ( src0 - src1) compile (branch) mark swap \c then ;
: begin ( - dest) here ;
: =until ( dest) \c =if <resolve ;
: ?until ( dest) \c ?if <resolve ;
: until ( dest) \c if <resolve ;
: =while ( dest - src dest) \c =if swap ;
: ?while ( dest - src dest) \c ?if swap ;
: while ( dest - src dest) \c if swap ;
: repeat ( src dest) \c again \c then ;
( n for .. next goes n times; 0 if n=0 )
: for ( - src dest) \c ?if compile push \c begin ;
: next ( dest) compile (next) mark <resolve \c then ;
( do, loop, +loop)
: do ( - src dest) compile (do) mark \c begin ;
: loop ( src dest) compile (loop) mark <resolve \c then ;
: +loop ( src dest) compile (+loop) mark <resolve \c then ;
( make \ more like ANS-Forth's POSTPONE)
| Now, the confusion happens because we need to write code _in this word_
| that will compile the above code into _other_ words. How about that?
| Read a token out of the input stream. If the token is on the compiler
| chain, postpone its execution until the word we're compiling executes. If
| the token is on the runtime or forth chains, postpone its compilation
| until the word that we're compiling executes. Got that? ;-)
: \ .compiler. token' if compile, ^ then
.runtime. find huh? compile compile compile, ;
forth
| Our definition of | at the beginning of this file cheated a bit, so now
| that we have if/then we can define it properly.
|
| The word | is a nice way to do full-line comments with no trailing
| delimiter. It throws away the rest of the line, scanning for a newline,
| but only if there was a space after the | . Without this test,
| | followed directly by a newline will throw away the *following* line,
| which is a bit mystifying. ;-)
: | trailing if c@ bl = if ctrl J parse 2drop then then ;
: -- | ; | legacy comment word; the previous name for |
compiler
: | \f | ;
: -- \f | ; | legacy comment word; the previous name for |
forth
| Defining words are next. Right now we only `know' how to make `colon'
| definitions. We need some structural help first.
| I wanted to gain a little of the clarity that Chuck Moore's colorForth
| gains by getting rid of "[ <calculate something here> ] literal". He
| replaces the whole construct with colored words that are executed or
| compiled depending on their color, but with a little added twist: when
| switching from executed to compiled words - yellow to green -
| colorForth assumes that the yellow words calculated a literal; just
| before starting to compile the first green word after the transition,
| colorForth compiles a literal.
|
| Even though we don't have color in muforth, we can make things a bit
| cleaner by creating a new word - #] - that compiles a literal *before*
| restarting the colon compiler.
|
| We retain the normal Forth behaviour that ] simply restarts the colon
| compiler, doing no other work.
| Dictionary structure words. Link fields point to link fields. Roughly, a
| dictionary entry is the following cell-sized things: suffix, link, code;
| where suffix is the last 3 characters of the name, followed by its
| byte-sized length. If length is zero, the word is *hidden*.
: link>name ( 'link - a u) 1- dup c@ ( 'len len) tuck - swap ;
| These words all assume we're calculating to or from a code field
| address.
: >link ( 'code - 'link) cell- ;
: link> ( 'link - 'code) cell+ ;
: >name ( 'code - a u) >link link>name ;
: >ip ( 'code - 'ip) cell+ ;
: ip> ( 'ip - 'code) cell- ;
: >body ( 'code - 'body) >ip cell+ ;
: body> ( 'body - 'code) cell- ip> ;
( Undefine a word by zeroing out the length byte of the name.)
: undef token current @ find if >link 1- 0 swap c! ^ then complain ;
( create and does>. Everything old is new again. ;-)
| 2010-nov-30. After many iterations, I have finally arrived at fig-forth's
| implementation of create/does>. The only difference is the names of the
| words.
| In fig-forth there are several kinds of words:
|
| * CODE words, whose code field points to machine code
|
| * COLON words, whose code field points to docolon, and whose body
| contains a list of execution tokens
|
| * CONSTANTS, whose code field points to doconst, and whose body
| contains a value
|
| * VARIABLES, whose code field points to dovar, and whose body contains
| a value
|
| * DOES words, whose code field points to dodoes, and whose body
| contains an IP pointer, followed optionally by data.
|
| In muforth there are only three kinds of words:
|
| * CODE words - primitives defined in C whose code field points to the C
| code implementation
|
| * COLON words, whose code field points to docolon, and whose body
| contains a list of execution tokens
|
| * DOES words, whose code field points to dodoes, and whose body
| contains an IP pointer, followed optionally by data.
|
| fig and muforth share this inefficient but simple implementation. In the
| case of fig, it was because they didn't know any better. In my case, I
| knew better but in the interest of avoiding machine-code dependencies -
| the efficient way of compiling does> words essentially being a form of
| DTC (direct-threaded code) - I had no choice.
|
| If you want a threaded-code implementation using only pure pointers, you
| need two pointers in each "child" word defined with create/does: one to
| point to C (dodoes) and one to point to Forth (the body of the parent
| defining word).
| last-created contains the address of the ip address slot of the last
| <does> word defined.
|
| We make the variable by hand, since we are about to create the variable
| and constant defining words, but we don't have them yet!
new last-created show
<does>
current body> >ip @ , ( re-use current's IP pointer to empty does> body)
0 , ( initial value)
| does> fixes up the does ip of the last <does> word to point to the code
| after "does>" in the caller.
: does> pop last-created @ ! ;
| We'll use this version of "create" for all host words, since they will be
| *data* words. But for the target compiler we are going to have a *target*
| colon compiler, which will have the same hide/show problem that we have on
| the host. So we need to be able to create hidden target words, and we will
| use create-hidden for this.
: create-hidden
new <does>
here last-created ! 0 , ( placeholder for does ip)
does> ; ( make the does ip point *somewhere*)
: create create-hidden show ; ( always immediately show host create'd words)
: constant ( value)
create , ( compile the constant) does> @ ;
: 2constant ( v1 v2)
create , , ( compile the constants) does> 2@ ( - v1 v2) ;
( An array with every cell set to a default value.)
: defarray ( default cells) create for dup , next drop ;
: array ( cells) 0 swap defarray ;
( A byte array; length is rounded up to cell boundary.)
: buffer ( bytes) aligned ( round up) cell/ array ;
( A self-indexing array with every cell set to a default value.)
: defarray+ ( default cells) defarray does> ( i - a) swap cells + ;
: array+ ( cells) 0 swap defarray+ ;
: variable create 0 , ;
: 2variable variable 0 , ;
| NOTE
|
| Since we are now using "|" for block comments, both comment and the
| clever-but-complicated code to create "self-comments" is now
| deprecated.
|
| Using comment *does*, however, have the following nice feature:
|
| To bracket comments in a flexible way. If you've bracketed some text
| using comment, changing "comment" to "uncomment" will interpret the
| bracketed text - the delimiter becomes a noop.
: comment
token ( the comment end token to match)
begin 2dup token =while string= until 2drop ^ then
2drop 2drop 2drop ;
: uncomment new <:> \ ^ ; ( create a noop word)
| How about a really cool word that makes self-parsing comment words? In
| other words, like using "comment" - defined above - but instead of having
| to say "comment **foobar** <commented text> **foobar**", you define
| **foobar** to skip tokens until it comes to a matching **foobar**!!
| comment no-self-comments
| : make-comment create does> drop untoken comment ;
|
| ( Here is one to get you started - good for block comments. It's 75
| characters long:)
|
| make-comment
| ===========================================================================
| no-self-comments
| I guess we can have deferred words, even though they are, in some ways,
| inelegant. The alternative - creating a variable and a colon word that
| calls through that variable, for _every_ deferred word - is also in some
| ways inelegant - and clumsy.
|
| Actually, the way we define this is exactly equivalent to what we would
| have to do with variables; the difference is that instead of two named
| objects - the variable and the colon word that calls thru it - we have
| one - the deferred word - and we need an extra mechanism to get to its
| value to change it.
|
| The main argument _against_ deferred words is that they aren't orthogonal
| w.r.t. _user_ variables. The way we are defining them here they are
| implemented using a global, system variable. On muforth, we don't care,
| because we don't _have_ user variables; but on a properly multithreaded
| target machine things are different. There we probably wouldn't implement
| deferred words at all, using instead the "<variable> @execute" idiom; or,
| indeed, we could have all deferred use _user_ variables instead of
| globals. But that's what the fuss is.
|
| That and that "vectoring" them isn't strictly postfix. And it requires
| architecture-specific code!
variable undeferred ' nope undeferred !
variable last-deferred-executed
: defer create undeferred @ ,
does> dup last-deferred-executed ! @execute ;
( Syntactic sugar - from Rod Crawford's 4ARM.)
: now ' ;
: is ' >body ! ; ( as in `now host-interpret is interpret')
compiler
: now ' literal ;
: is ' >body literal \ ! ;
forth
( Defining new dictionary chains.)
| These used to be in an array but are now independent of each other. They
| are structures, created in the body of a does word, that look just like a
| name entry in the dictionary - a name-suffix followed by a link field.
|
| The name entry is always the string "muchain" followed by a zero length
| byte. This is exactly 8 bytes long - the length of a suffix now that
| muforth is 64-bit. The name identifies the word as the head of a dictionary
| chain.
|
| The name is hidden - by setting the length to zero - so that dictionary
| searches and word listings won't see it.
|
| The link field points to the link field within the name entry of the last
| word defined on the chain.
|
| We create new chains by reusing the code field and the "muchain" name
| field from .forth. - an existing chain that is created by C code in
| src/dict.c that is executed at startup.
: chain ( anchor-link)
new [ ' .forth. @ #] , ( code field: mu_do_chain)
[ .forth. cell- @ #] , ( hidden "muchain" name field)
, ( anchor-link) show ;
: sealed 0 chain ; ( create an independent vocab chain)
: chained current @ chain ; ( chain to the current vocab)
| It's also possible to chain to an -arbitrary- vocab by simply doing this:
|
| .arbitrary. chain .new-is-chained-to-arbitrary.
| When executed, a chain pushes the address of the link field following the
| fake "muchain" name. To print out the name of a chain, execute it - or
| fetch current to get the current chain - and then execute
|
| >chain-name type
| The first cell- skips backward over "muchain"; the second skips backward
| to point to the code field; from there >name gets us to the name!
: >chain-name ( 'chain - a u) cell- cell- >name ;
( Conditional compilation.)
sealed .conditional.
: conditional .conditional. definitions ;
| eat consumes tokens until it either consumes all the input - in which
| case the while loop will exit - or an execute'd word returns _true_ to
| exit the containing loop. ?toss processes each token. If it exists in
| .conditional. , it executes it; otherwise, it throws it away.
: ?toss .conditional. find if execute ^ then 2drop 0 ;
: eat 0 ( nesting) begin token =while ?toss until drop ( nesting) ^
then 2drop ( token) drop ( nesting) ;
compiler
: .if 0= if eat then ;
: .else eat ;
: .then ;
( Consume a token, search a chain, and return only the "found or not" flag.)
: .contains ( chain - found) token' nip =if ^ then nip ;
: .def .forth. \ .contains ;
: .ndef \ .def 0= ;
: .ifdef \ .def \ .if ;
: .ifndef \ .ndef \ .if ;
conditional
( nesting - nesting exitflag)
: .if 1+ 0 ; ( .if nests, never exits)
: .else dup 0= ; ( .else doesn't nest, exits if nesting at 0)
: .then 1- dup 0< ; ( .then unnests, exits if nesting -was- at 0)
: .ifdef 1+ 0 ; ( these are like .if)
: .ifndef 1+ 0 ;
forth
: .if \ .if ;
: .else \ .else ;
: .then ;
: .def \ .def ;
: .ndef \ .ndef ;
: .ifdef \ .ifdef ;
: .ifndef \ .ifndef ;
: .contains \ .contains ;
: .and and ;
: .or or ;
: .not 0= ;
| -----------------------------------------------------------------------
| Schleisiek-style return stack words.
| -----------------------------------------------------------------------
| Trying out, after all these years, the techniques that Klaus Schleisiek
| presented in 1984 (at FORML) and that I read about in 1993.
|
| The basic idea is that, in addition to return address pointers (saved
| IPs), there are stack frames on the return stack. These can be for any
| purpose, but we're interested here the following: local variable storage,
| "fluid" rebinding of variables - aka dynamic scoping, and
| cleanup-on-return - eg, to close a file that we opened.
| Here is a picture of the return stack, with high memory towards the top of
| the page, and low memory further down:
|
| ^ | |
| | +--------------------+
| | | prev return addr |
| | +--------------------+
| | | ... | several cells could be here; depends on the
| | +--------------------+ type of frame
| | | ... |
| | +--------------------+
| | | cfa of cleanup |
| | +--------------------+
| +---+ prev frame |<--- fp
| +--------------------+
| | ip of remove |<--- rp remove calls unlink
| +--------------------+
runtime
variable fp ( the "top" - most recently pushed - frame)
( fp points to a frame ptr, which pts to a frame ptr...)
| link creates a new frame. It fetches the cfa of the following word and
| pushes it onto the return stack. This is the cleanup routine. Then it
| links this frame into the list rooted at fp, and then returns to its
| caller, skipping the following cfa. link is called by a word that builds
| a new stack frame.
: link r> @+ swap >r ( fetch & skip following cfa & push to r)
fp @ >r rp@ fp ! ( link this frame to previous)
>r ( restore return address) ;
| unlink undoes what link did. It unlinks the frame from the list rooted at
| fp, and then runs the cleanup routine, which will do whatever is
| necessary to de-allocate the frame and undo any state changes made by the
| word that called link.
: unlink r> ( save return address)
fp @ rp! r> fp ! ( unlink frame)
r> execute ( execute cleanup word)
>r ( restore return address) ;
create remove ] unlink ; ( remove pushes IP when executed!)
( Now some interesting applications.)
| -----------------------------------------------------------------------
| Catch and throw
| -----------------------------------------------------------------------
variable cf ( catch frame pointer)
( These versions of catch and throw don't save or restore SP.)
( Call the word following catch, and push 0 if it returned normally.)
: catch ( - 0 | error)
r> @+ >r ( fetch & skip following cfa)
cf @ >r ( push prev catch frame pointer)
rp@ cf ! ( now point to this frame)
execute
r> cf ! ( restore prev catch frame pointer)
0 ;
| catch can only return an error value if throw is called with a non-zero -
| error - value during the execution of the word following catch.
|
| If throw is called with 0, it drops it and does nothing.
|
| If throw is passed an error value - in this implementation this is a
| string pointer - it returns to the return address on the stack with the
| error code on the stack.
|
| It's easier to describe than explain. Here is an example:
|
| catch a b
|
| catch calls a; if no non-zero value is throw'n during the execution of a,
| catch pushes 0 and execution continues with b.
|
| If something non-zero *is* throw'n, then throw "pretends" to return from
| a, but this time b is executed with the non-zero error value on the stack,
| instead of 0.
|
| It is up to the code following a call to catch - b in our example - to
| handle both the zero and non-zero cases, and to print the error and
| unwind the stack in case of an error.
: throw ( 0 | error)
?if
( pretend to return from catch!)
cf @ ( fetch most recently created catch frame)
cell+ @ ( skip catch frame ptr, fetch return address)
>r ( push return address so that we return from catch!)
then ;
| unwind is useful in the context of exceptions. It starts at fp and
| unlinks each frame in turn until fp is zero or points to a frame above
| the current catch frame.
| XXX Right now we are using unwinding as an on/off toggle, but in the
| future we could have different bits that could be tested by the various
| cleanup routines.
variable unwinding
: unwind ( unwind-flags)
unwinding !
r> ( ra)
( While fp non-zero and pushed frames are below last catch frame, unlink them.)
begin fp @ dup cf @ u< and while unlink repeat
cf @ rp!
r> cf ! ( restore prev catch frame pointer)
rdrop ( discard return address from catch - we've already executed it!)
>r ( ra)
unwinding off ;
| -----------------------------------------------------------------------
| Fluid binding (dynamically-scoped variables)
| -----------------------------------------------------------------------
( Restore saved value of a normal cell-sized variable.)
: restore
r> ( ra) r> r> ( value addr) ! >r ( ra) ;
| Preserve the value of a variable for the duration of the execution of the
| calling word.
: preserve ( addr) ( address of variable)
r> ( ra)
over ( addr) >r swap @ ( value) >r
link restore ( push cleanup)
remove >r ( normal return - unlink and cleanup)
>r ( ra) ;
| -----------------------------------------------------------------------
| Cleanup on return
| -----------------------------------------------------------------------
: cleanup
r> ( ra) r> ( value) r> ( cfa) execute >r ( ra) ;
| Push value and following cfa to R stack; on exit or unwind, execute cfa
| with value on the stack.
: on-exit ( value)
r> ( ra)
@+ swap >r ( fetch & skip following cfa & push to r)
swap >r ( push value)
link cleanup ( push code to undo whatever needs undoing)
remove >r ( normal return - unlink and cleanup)
>r ( ra) ;
( There are times when we need to do something with more than one value.)
: cleanup2
r> ( ra) r> r> ( v1 v2) r> ( cfa) execute >r ( ra) ;
| Push v1, v2, and following cfa to R stack; on exit or unwind, execute cfa
| with v1 and v2 on the stack.
: on-exit2 ( v1 v2)
r> ( ra)
@+ swap >r ( fetch & skip following cfa & push to r)
-rot >r >r ( push v2, then push v1)
link cleanup2 ( push code to undo whatever needs undoing)
remove >r ( normal return - unlink and cleanup)
>r ( ra) ;
| -----------------------------------------------------------------------
| Local variable frames
| -----------------------------------------------------------------------
( Deallocate local variables.)
: unroom
r> ( ra)
r> ( #cells) rp+! ( rp+! takes cell count!)
>r ( ra) ;
( Allocate space for local variables.)
| NOTE: do -not- try to use a for loop to push cells! It doesn't work! The
| return stack is being used to store the loop index, but you're busy
| pushing stuff there! All hell breaks loose! If you absolutely want to
| zero locals as they are allocated, do a begin/until loop with the count
| on the data stack.
: room ( #cells)
r> ( ra)
( choose one! mark, zero, allocate)
| swap dup begin "55aa55aa >r 1- dup 0= until drop ( mark)
| swap dup begin 0 >r 1- dup 0= until drop ( zero)
swap dup negate rp+! ( allocate)
( #cells) >r
link unroom
remove >r ( normal return - unlink and cleanup)
>r ( ra) ;
forth
| -----------------------------------------------------------------------
| End of fancy R-stack goodies, and back to pedestrian Forth.
| -----------------------------------------------------------------------
( Number input)
variable dpl ( location of last . ) dpl on ( -1)
variable radix
: radixer constant does> @ radix ! ;
2 2* 2* dup 2* ( 16!) radixer hex
dup ( 8!) radixer octal
2 + ( 10!) radixer decimal
2 radixer binary
decimal
( Punctuation in numbers: sign, radix, decimal point, separators.)
| NOTE WELL: This code - the number parsing code - has been a thorn in my
| side for ever. You'll see, as you read the following code and comments,
| that over the years I have made changes, but it has never been as simple
| or as elegant as I would like. It needs a really good whacking.
| 2006-mar-26. Ok, so this *totally* sucks. The presence of these bits of
| punctuation can mask a word not being found in the dictionary. A bare /,
| for instance, with no digits to keep it company, is happily parsed as a
| number. The number? 0. Urgh.
: punct ( a u ch - a' u' matched)
over if ( still chars to process) swap push over c@ xor if
( no match) pop 0 ^ then
( match) pop 1 -1 v+ -1 ^ then
( end of input) drop 0 ;
: ?sign ( a u - a' u' neg) char - punct if -1 ^ then 0 ;
| I wanted to add Michael Pruemm's '0' as a hex specifier, but it's not as
| simple as adding it to this list. It will match a bare 0, which won't be
| matched as a number.
: ?radix ( a u - a' u')
( char 0 punct if hex ^ then )
char " punct if hex ^ then ( " for hex and ' for octal are Donald Knuthisms)
char ' punct if octal ^ then
char $ punct if hex ^ then ( $ for hex is a time-worn convention)
char # punct if decimal ^ then
char % punct if binary ^ then ;
| . resets dpl; others leave it unchanged; this means that embedding . in a
| number causes dpl to be set to the count of digits _after_ the _last_ .
| in the number.
: dot? ( a u - a' u' matched)
char . punct if dpl off -1 ^ then
char , punct if -1 ^ then
char - punct if -1 ^ then
char / punct if -1 ^ then
char : punct if -1 ^ then
char _ punct if -1 ^ then 0 ;
( This is scary. We need a bunch of literals for `digit>'.)
: digit> ( ch - digit | junk)
char 0 - [ 2 2* 2* 1+ #] ( 9) over u< if ( !decimal)
[ 2 2* 2* 2* 1+ #] ( 17) -
[ 2 1+ 2* 2* 2* 1+ #] ( 25) over u< if ( !hex, UPPERCASE)
[ 2 2* 2* 2* 2* #] ( 32) -
[ 2 1+ 2* 2* 2* 1+ #] ( 25) over u< if ( !hex, lowercase)
( junk) ^
then then ( hex) [ 2 2* 1+ 2* #] ( 10) + then ( decimal) ;
: digit? ( ch - digit T | junk F) digit> dup radix @ u< ;
: @digit? ( a - a digit T | a junk F) dup c@ digit? ;
: *digit ( accum a digit - accum*base+digit a)
rot radix @ * + swap dpl @ 0< 1+ dpl +! ;
| 2002-mar-23. I still don't like how number parsing works. On the one
| hand, we know ahead of time exactly how many characters we have (in the
| token we are trying to convert); on the other, the way the prefix (sign
| and radix) and embedded (. , - : /) characters work, we can't simply put
| them in a loop: there should be at most one sign and one radix at the
| beginning. Right now I have >number (which converts digits) and punct
| words _both_ checking if there are any characters left to process. This
| seems clumsy.
|
| And that "dpl!" in ?dot bugs me, too.
| ANS compatible! - or at least it was when it converting with double numbers.
|
| If >number finds a non-digit, it pops the return stack - which contains
| the for loop counter - and returns this value, which is number of
| characters left in the token.
: >number ( accum a u - accum' a' u') ( a' is first unconvertible char)
for @digit? 0= if drop pop ^ then *digit 1+ next 0 ;
: digits ( accum a u - accum' a' u' #converted)
dup push ( chars left) >number pop over - ;
| XXX 2009-sep-01. The following doesn't make sense, and it's a lie as