There's wind and sun and grass,
people and colour and show.
It's all free. And if there's some
racing too, well good luck to it.
It does no harm.
J.P.W. Mallalieu - Sporting Days
STRENGTH IN FORTH
part nine
IN THE STRINGDOM OF FORTH
A very sophisticated paperware Dutch computer magazine - PCM - of
April 1988 wrote about this magazine and the love-affairs of its
chief editor and founder. The subject of that particular article:
three computer magazines published on disk. In one and only one
packed character string: ST-NEWS won the race; it's on top of the
stack.
But as far as I am concerned, a packed FORTH-string is of more
importance than most racing: that is so much noise, and at the
end ....you are where you started. With FORTH you never get to
the end; there's always something to (n)extend.
Some of the subjects regarding strings I could equally well have
discussed in relation to input and output, but I thought it
better to bring them all together in one issue of this course.
The emphasis here is on string manipulation, which is usually
said to be very easy to implement in FORTH with standard words.
But it would have been much easier if the work had been done
already.
In BASIC, string manipulation has grown into a very extensive
bunch of statements and functions such as RIGHT$, LEFT$, MID$,
STRING$ etc. The Standard doesn't specify any of these string
manipulators at all. In parts 6, 7 and 8 we had some discussions
on strings in relation to input. Remember for instance {WORD},
{QUERY}, {EXPECT} and the 'MOVE'-words. And reading about number
formatting you undoubtedly met {PAD}.
But first let's clarify the expression 'a packed string'. A
packed string in FORTH is a sequence of bytes - a one-dimensional
byte-array you might say - of which the first byte contains the
number of bytes involved and the last byte in one way or another
marks the end of the string. The first byte is called the
count-byte. The number it holds never includes the count-byte
itself: it counts only the bytes of the string that follow it.
For marking the end of the string a delimiter is used. An ASCII
null (0) is regarded as an unconditional delimiter. Other
delimiters may be specified by the user. You met {"} as the
delimiter of a text string beginning with {."}. The contents of a
string are always byte values, mostly readable characters, in
ASCII format.
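To make that layout concrete, here is a minimal sketch that builds a packed string by hand, byte by byte, and reads it back. The name GREET is made up for the illustration, and the sketch assumes that your FORTH offers {C,} (compile one byte) next to {COUNT} and {TYPE}.

```forth
CREATE GREET                      ( GREET is a made-up name )
   5 C,                           ( the count-byte: 5 characters follow )
   72 C, 69 C, 76 C, 76 C, 79 C,  ( ASCII codes for H E L L O )
GREET C@ .                        ( shows the count-byte: 5 )
GREET COUNT TYPE                  ( types the characters: HELLO )
```

Note that the count-byte alone tells {TYPE} where to stop; no delimiter was stored behind the last character in this sketch.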
A MONSTROUS STRING
There are several ways in which one can place strings into the
computer's memory. In part six you learned to {CREATE} a
buffer. With the help of {EXPECT} and {CMOVE} we succeeded in
transporting a string to a region of memory which we could
call by name later. By calling the name we got the address
where the string started. Here is another method by which we can
place strings in a string buffer we created in advance. We will
use the scratchpad area {PAD}, an already existing buffer, to
store the string temporarily before we move it to its own
safe buffer, from where we can call the string for further
manipulation if needed.
: TEXT PAD 80 32 FILL WORD DUP C@ 1+ PAD SWAP MOVE ;
First we clear {PAD} with {PAD} 80 32 {FILL}, which means: insert
ASCII 32 - a space - into 80 bytes starting at the address which
{PAD} places on the stack and which is the start address of
the scratchpad. Then {WORD} is called into action. {WORD}
needs a delimiter, which we will not provide in the definition of
{TEXT}, but when {TEXT} is executed. {WORD} transfers the
character string to the word buffer, of which it leaves the
starting address on the stack. This address is the address of the
count-byte. {DUP} {C@} fetches the length of the string from the
count-byte and {1+} adds one for the count-byte itself, so that
it travels along with the characters: the words defined further
on expect to find it in the first byte of the buffer. ({COUNT}
would fetch the same length, but it also steps the address past
the count-byte, and here we want to keep that byte.) {PAD} leaves
the start address of the scratchpad and {SWAP} sets the three
stack items in the right order for {MOVE}. Think of {MOVE} as an
intelligent move-word: it knows which 'move' to use here.
Let's try:
CREATE MBUF 80 ALLOT
To enter a textstring of up to 80 characters in {MBUF} use the
sequence
: STRIN 44 TEXT PAD MBUF 80 MOVE ;
The word {STRIN} is used as {STRIN} Caelospheridium, Notice the
comma at the end of that terrifying word. (It's the first part of
the name of a trilobite, a blind member of a quite extensive
group of animals that lived on earth some 500,000,000 years
ago. Its full name was Caelospheridium cyclocrinophilum. Who
cares ?) That comma is a must, as we specified ASCII 44 as
the delimiter for {WORD}. We achieved our goal, and we did it in
two steps: first we {CREATE}d something and after that we defined
the actual action, storing characters into memory. It's rather a
roundabout way of string handling, don't you think ? We could
improve matters if we could in some way create an empty string,
say. The time to worry about that is when we hear about
{CREATE} and {DOES>}. For the time being we have enough to
wrestle with, using strings as far as we got; a third method of
creating strings wouldn't have much influence on what comes next
either.
KILLING AN ELEPHANT
And next to come is some string handling with MID$ and the like.
As I stated earlier, there are no such words implemented in FORTH
and there probably never will be. I can make that prediction with
good reason. The BASIC language, for instance, has to offer
statements that are useful in all programming situations
involving strings. FORTH, on the contrary, offers programmers
tools with which they can shape 'statements' to the exact need of
a particular program. In this respect one could call BASIC a
complex language and FORTH a split-up one. BASIC will kill an ant
using the very same weapon as for shooting an elephant. FORTH
will construct an elephant-rifle to do that job and smash the ant
with its little finger.
You will see that defining a MID$-word is easily done, and doing
it you will get the impression of glancing at BASIC's MID$
through that fine looking-glass of FORTH.
The following definition of MID$ assumes the words above to be
previously defined and placed into the dictionary. That again
indicates that the MID$-word is meant to act in a specific
setting and not to be used in a general way. A second remark on
MID$ is that it can be shrunk, once the word does its job well in
the program in which it was meant to be incorporated.
: MID$ MBUF SWAP >R SWAP ( n1 n2 --- )
1 MAX
2DUP R> + SWAP
C@ 1+ MIN >R OVER
C@ MIN R> OVER -
>R + R> DUP
0< IF ABORT" Negative count !"
ELSE TYPE
THEN
;
MID$ is used as follows.
{STRIN} This is part nine, OK.
And then:
3 5 {MID$} is is OK.
{MID$} needs two parameters. n1 indicates the first character -
counted from the left - to be included in the extraction of n2
characters. Notice that a space is a character too ! Got it ?
Let's trace {MID$}.
STACK                WORDS    COMMENT
3 5                           n1 n2
3 5 addr             MBUF     addr is the address of MBUF
3 addr 5             SWAP
3 addr               >R       save the number of chars on the
                              return stack
addr 3               SWAP
addr 3               1 MAX    not before the first character
addr 3 addr 3        2DUP     enough for the manipulation to come
addr 3 addr 3 5      R>       put 5 back on the stack
addr 3 addr 8        +        8 = max offset into the string
addr 3 8 addr        SWAP
addr 3 8 17          C@       the first byte of MBUF is the
                              count-byte of the string ( = 17 )
addr 3 8 18          1+       length including the count-byte
addr 3 8             MIN      don't go past the end of the string
addr 3               >R
addr 3 addr          OVER
addr 3 17            C@
addr 3               MIN      the same check for the starting char
addr 3 8             R>       put 8 back on the stack
addr 3 8 3           OVER     prepare for the subtraction
addr 3 5             -        5 chars to extract
addr 3               >R       save that number for later use
addr+3               +        address of the first char to
                              extract ( = 'i' )
addr+3 5             R>       and how many to extract
addr+3 5 5           DUP
addr+3 5 0           0<       negative count ?
addr+3 5             IF       flag is 0: no, skip the if-clause
                              (a non-zero flag: yes, abort with
                              the error message)
addr+3 5             ELSE
                     TYPE     display addr+3 ... addr+7
                     THEN     job done !
As you may have noticed, the inner life of {MID$} checks on
several occasions whether the parameters fit the string at
hand.
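A few more calls show what that checking amounts to in practice. This is only a sketch: it assumes that 'This is part nine' sits in {MBUF} as a packed string, count-byte (17) first, exactly as in the trace above. The results in the comments were worked out by hand from the definition of {MID$}, not copied from a particular FORTH.

```forth
STRIN This is part nine,
 1  4 MID$      ( This : the first four characters )
 0  4 MID$      ( This : 1 MAX moves the start to the first character )
14 10 MID$      ( nine : the count is cut off at the end of the string )
```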
SOME THOUGHTS ON POETRY
Note the use of {ABORT"}. This word is ideal for error-checking,
as you can insert any text desired. (In the Forth-83 Standard
{ABORT"} itself takes a flag from the stack and aborts only if
that flag is true; the definitions in this part use it as an
unconditional word inside {IF}, so look up how your system treats
it.) Speaking of error-checking and -reporting in FORTH: a number
of words in FORTH have an associated error condition. The
documentation provided with your system should specify the system
action on each of those error conditions. How your system alarms
you when you are making a mess of it can also be found in your
documentation. There are no general rules ! There are some
conventions, although I have a FORTH that allows division by
zero, breaking all conventions and rules in one.........
The next word to define of course is {RIGHT$} or {LEFT$},
whichever you like most. First a general remark. Having the
know-how for a {MID$}-word, it is, or should be, quite clear that
{LEFT$} and {RIGHT$} are more or less too much of a good thing.
My first BASIC - on the TI99/4a - performed all string extraction
with one word: SEG$ (SEG for SEGment). I'll give you the
definition of {RIGHT$} first.
: RIGHT$ ( n --- )
MBUF C@ OVER - 1+ SWAP MID$ ( !!!!!!!!!!! ) ;
( !!!!!!!!!!! ) means: be surprised ! The only extra work
{RIGHT$} does over {MID$} is to calculate one parameter and to
put the two parameters in the right order for {MID$}. So {RIGHT$}
expects one parameter on the stack: the number of characters to
be extracted. Assuming that the string 'This is part nine' is
still in {MBUF}, 6 {RIGHT$} will type t nine OK, being the six
characters seen from the right. Mind the following: {MBUF} {C@}
fetches the contents of the count-byte, i.e. the length of the
string stored at {MBUF} (= 17), so the stack holds 6 17. {OVER}
copies the 6 over the 17 and {-} gives 6 11. Now {1+} and {SWAP}
leave 12 6 on the stack, which tells {MID$} to start extracting
at {MBUF} + 12 and to take 6 characters, ending at {MBUF} + 17.
Quite simple !! As {LEFT$} is even simpler to define, you can do
it !!
Now you have three (three ?) tools with which you can play with
strings in any way you like. Maybe some applications will call
for slight modifications of the string-words, but that shouldn't
worry you by now.
Two more handy tools would be a string comparison-word and a word
to concatenate two strings. Most FORTHs are lucky enough to
include both words for your convenience. As FORTH has a
dictionary, it seems logical that it has some method to look up
the words in it. And that's right ! When you enter a new
definition with the name of an already existing word, FORTH will
point out to you that you now have a twin word: nnnn isn't
unique. So it must have found a character string of the same
constitution, and the only way to accomplish that is by
comparison. And the other way round too: if you want to know
whether a word is in the dictionary, you can look for it with
{FIND}. {FIND} {MBUF} will leave the address of {MBUF}, if found;
otherwise a zero is placed on top of the stack. The address left
after a successful search is the code field address. Of course
{FIND} is not much good if you want to compare two strings that
are not FORTH-words. What we need should be made of other stuff.
Our comparison-word should compare two strings of any length <80
bytes and it should terminate its action as soon as it encounters
differing characters. As we, poor mortals, are stuck with ASCII
if we want to compare characters, 'AsCII' will not be equal to
'ASCII' in this context, as ASCII 's' differs from ASCII 'S'. In
defining the string-comparison word I won't use any word that you
do not already know. When learning a new language - natural or
computer - you will always try to express yourself on the level
of the language you know best, and that often implies a
complexity of expression you can never match in the language you
are learning. But that is not to say that you can't manage to
express yourself with equal power by using a small vocabulary in
an efficient way. The power of the poet is not the perennial
complexity, but the pleasing simplicity.
We are going to define six words. Here they are !
CREATE MBUF1 80 ALLOT CREATE MBUF2 80 ALLOT
: (TEXT) PAD 80 32 FILL WORD COUNT 80 >
IF
ABORT" String too long !"
THEN
1- PAD 80 MOVE
;
: TEXT$1 44 (TEXT) PAD MBUF1 80 MOVE ;
: TEXT$2 44 (TEXT) PAD MBUF2 80 MOVE ;
: COMPARE                   ( addr1 addr2 count --- 0 | 1 | -1 )
   >R 2DUP C@ SWAP
   C@ R@ >=                 ( string 1 at least count long ? )
   SWAP R@ >=               ( and string 2 as well ? )
   AND 0=
   IF
      ABORT" Lengthcount error !"
   THEN
   R> 1+ 1 DO
      2DUP I + C@ SWAP I + C@ 2DUP >
      IF
         2DROP 2DROP 1 LEAVE
      ELSE
         2DUP <
         IF
            2DROP 2DROP -1 LEAVE
         THEN
      THEN 2DROP
   LOOP
   DUP 1 >
   IF
      2DROP 0
   THEN
;
First we enter both strings.
{TEXT$1} SANTA CLAUS,
{TEXT$2} SANTA CLAUS,
Now we have two identical packed strings at two different
addresses, {MBUF1} and {MBUF2}. We are going to compare the
strings in two different ways. The first one will give an error
message, because the count is larger than the eleven characters
of the strings:
{MBUF1} {MBUF2} 12 {COMPARE} Lengthcount error ! OK
As we didn't destroy the contents of the two string buffers, we
can happily use them for the second try.
{MBUF1} {MBUF2} 8 {COMPARE} OK
With {.} you can find out that a 0 has been pushed onto the
stack, as the code for 'strings matched !'. For further testing
of the {COMPARE}-word, let's change one string, e.g. {MBUF2}:
{TEXT$2} SANTA PLAUS, The comparison {MBUF1} {MBUF2} 4 {COMPARE}
will leave a 0 on the stack, but {MBUF1} {MBUF2} 10 {COMPARE}
returns a 1, because ASCII C in {MBUF1} is lower than ASCII P in
{MBUF2}. You will get -1 with {TEXT$1} WANTA PLAUS, and {MBUF1}
{MBUF2} 10 {COMPARE}. So much for the comparison of strings.
Let's see what we can do about turning two strings into one by
glueing them together.
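{COMPARE} wants the count handed to it. Where only a plain 'equal or not' is needed, a small wrapper could take the count from the count-bytes themselves. The word below is a sketch of that idea, not part of the original set: its name {$=} is invented here, and it assumes {COMPARE} as defined above and two non-empty packed strings.

```forth
: $=                        ( addr1 addr2 --- flag )
   OVER C@ OVER C@ =        ( are both count-bytes equal ? )
   IF
      OVER C@ COMPARE 0=    ( yes: compare that many characters )
   ELSE
      2DROP 0               ( no: different lengths never match )
   THEN
;
```

With SANTA CLAUS in both buffers, {MBUF1} {MBUF2} {$=} leaves a true flag. After {TEXT$2} SANTA, it leaves a 0, and it does so without running into the Lengthcount error, because {COMPARE} is not called at all when the lengths differ.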
WHAT FORTH HAS GLUED TOGETHER...
It really is so very easily done, this glueing, that in some way
I feel uncomfortable bringing up the subject, and still more
demonstrating how it is done. Create three buffers, store two
strings in two of them and, having done that, move the strings
one by one into the third buffer.
We will use the buffers {MBUF1} and {MBUF2} and create a third
one by
CREATE MBUF3 255 ALLOT
It is important to be aware of the fact, that we are going to
{TYPE} the concatenated string sooner or later. That implies,
that the first byte of the concatenated string should contain the
lengthcount. Well, fasten your seatbelts !!
: GLUE SWAP COUNT DUP >R ( ADDR1 ADDR2 --- )
MBUF3 1+ SWAP MOVE
COUNT SWAP OVER
MBUF3 R@ 1+ + SWAP MOVE
MBUF3 SWAP R> + SWAP C!
;
{GLUE} enables two strings to be chained together to form one
string, which can be typed out on a terminal with {COUNT} {TYPE}.
Enter the strings by
{TEXT$1} Vapor, {TEXT$2} ware, OK
{MBUF1} {MBUF2} {GLUE} {MBUF3} {COUNT} {TYPE} Vaporware OK
I didn't have to include any further error-checking, as the
concatenated string can't grow any longer than 160 bytes, while
the buffer for that string is 255 bytes long.
If you like, I'll show you how to copy strings, how to delete
them, how to search for a string inside another one, how to wipe
a string out of another string, and even some additional
string-manipulating words.
Deleting a string is the same as filling the string buffer of
that particular string with zeroes or spaces. The count-byte
supplies the exact number of zeroes or spaces. So the definition
looks like
: DEL$ DUP C@ 1+ ERASE ( addr --- ) ;
If the string to be deleted is at {MBUF1}, the actual action is
carried out by {MBUF1} {DEL$}.
The second easy-to-define word is {COPY$}, which will copy the
string from {MBUF1} to {MBUF2}.
: COPY$ OVER C@ 1+ MOVE ( addr1 addr2 --- ) ;
To copy a string, act as follows: {MBUF1} {MBUF2} {COPY$}. When a
certain application involves a lot of string copying, an
intelligent {MOVE} becomes indispensable, as the source and
destination buffers may vary all the time from high to low and
low to high addresses. So be sure to have one !!
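A short session shows the two words working together. It is a sketch that assumes {TEXT$1}, {MBUF1} and {MBUF2} from the comparison section are still in the dictionary; the results in the comments follow from the definitions above and were worked out by hand.

```forth
TEXT$1 Vapor,
MBUF1 MBUF2 COPY$          ( count-byte and characters go to MBUF2 )
MBUF2 COUNT TYPE           ( Vapor )
MBUF1 DEL$                 ( zero the count-byte and the 5 characters )
MBUF1 C@ .                 ( 0 : MBUF1 now holds an empty string )
MBUF2 COUNT TYPE           ( Vapor : the copy is untouched )
```

Because {DEL$} uses {ERASE}, the count-byte itself becomes zero as well, which is what makes the buffer read as an empty string afterwards.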
SOME COMPLICATIONS
Finding a (smaller) string inside the body of another (longer)
string is somewhat more complicated.
VARIABLE POINT 0 POINT !
: IN$ DUP C@ 1+ 0 DO          ( addr1 addr2 --- 1 | 0 )
      0 POINT !               ( every attempt counts afresh )
      OVER C@ 1+ 1 DO
         2DUP J I + + C@ SWAP
         I + C@ =
         IF
            1 POINT +!
         ELSE
            0 POINT ! LEAVE
         THEN
      LOOP OVER C@ POINT @ =
      IF
         2DROP 1 LEAVE
      THEN
   LOOP DUP 1 <>
   IF
      2DROP 0
   THEN
;
Some observations: the first {LEAVE} jumps out of the inner
{DO...LOOP} if the two characters at the same places in the two
strings don't match, and execution is handed over to the
following {OVER}. The variable {POINT} is used to store the
number of times the characters of the smaller string, one after
another, match the characters of the longer string. The sequence
between the two {LOOP}-words checks whether the number stored in
the count-byte of the smaller string - found by {OVER} {C@} - is
equal to the number of matching characters - {POINT} {@}. If so,
then the smaller string is found in the longer one and a 1 is
pushed onto the stack, indicating a positive result of the
search-labour. The search for the matching string continues until
the outer {DO...LOOP} has run its course or until the string is
found. This outer loop uses the contents of the count-byte of the
longer string at address addr2 as its limit. The inner loop
takes its limit from the count-byte of the smaller string.
If you are going to use a longer string at address addr1 than at
address addr2, the {IN$}-word will not work properly.
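The text gives no session for {IN$}, so here is a hedged one. It assumes {TEXT$1} and {TEXT$2} from the comparison section, with the string to look for in {MBUF1} and the string to search through in {MBUF2}; the results in the comments were traced by hand through the definition.

```forth
TEXT$1 come,
TEXT$2 I will come tonight,
MBUF1 MBUF2 IN$ .          ( 1 : 'come' is found )
TEXT$1 gone,
MBUF1 MBUF2 IN$ .          ( 0 : 'gone' is not )
```

In the second search the 'g' of 'tonight' matches the first character of 'gone', but the very next comparison fails, {POINT} is reset and the search moves on.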
Removing a (smaller) string out of a longer one can be done by
: REMOVE$ ( INDEX COUNT ADDR --- )
OVER >R >R SWAP DUP 1+ ROT OVER + ROT R> DUP C@ ROT
- >R SWAP OVER + SWAP ROT OVER + ROT SWAP R> MOVE
DUP C@ R> - SWAP C!
;
Enter {TEXT$1} This is well done, OK. 8 5 {MBUF1} {REMOVE$} will
erase the word 'well' together with the space behind it and
shorten the string to 'This is done'. If you unravel the
definition, you will see that removing 'well' is more complicated
than you probably thought. You can type the shortened string with
{MBUF1} {COUNT} {TYPE} This is done OK. The first parameter tells
{REMOVE$} how many characters to leave alone before the removal
starts (here the eight of 'This is ') and the second indicates
the number of characters to remove. The section following {MOVE}
adapts the count-byte to hold the length of the shortened string.
For inserting a string into another one the following words will
do.
: <MOVE 0 DO 2DUP I - C@ SWAP I - C! ( addr-to addr-from n --- )
LOOP 2DROP
;
Why do we need this byte-moving word ? Well, we know two methods
of moving bytes. This method adds a third one. It takes the
address of the last byte of the destination and the address of
the last byte of the source (in that order), starts at these
'high' ends and works in decreasing direction as depicted below,
so that when a string is shifted towards higher addresses inside
its own buffer, no byte is overwritten before it has been moved.
 source                       destination
 |¯¯|¯¯|¯¯|¯¯|¯¯|¯¯|          |¯¯|¯¯|¯¯|¯¯|¯¯|¯¯|
  ¯¯¯¯¯¯¯¯¯¯¯¯¯|¯¯|¯¯          ¯¯¯¯¯¯¯¯¯¯¯¯¯^¯¯^¯¯
              |  |_____ first move ________|__|
              |_______ second move ________|
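The difference with a move that starts at the low end shows up as soon as source and destination overlap. The sketch below shifts the three bytes A B C two places up inside one small buffer, once with the standard {CMOVE} (which works from low to high addresses) and once with {<MOVE}. DEMO and ABC are made-up names, and the results in the comments were worked out by hand.

```forth
CREATE DEMO 8 ALLOT                  ( DEMO is a made-up name )
: ABC   DEMO 8 32 FILL               ( blank the buffer ... )
   65 DEMO C!  66 DEMO 1+ C!
   67 DEMO 2 + C! ;                  ( ... and put A B C at its start )
ABC  DEMO  DEMO 2 +  3 CMOVE
DEMO 5 TYPE                          ( ABABA : the C was overwritten )
ABC  DEMO 4 +  DEMO 2 +  3 <MOVE
DEMO 5 TYPE                          ( ABABC : nothing is lost )
```

Note the different parameters: {CMOVE} takes the first byte of source and destination, source first, while {<MOVE} takes the last byte of each, destination first.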
: INSERT$ ( ADDR1 ADDR2 INDEX --- )
-ROT 2DUP >R >R DUP DUP C@ + >R SWAP C@ R@ + R>
2OVER C@ 1+ SWAP - <MOVE
+ R@ DUP C@ SWAP 1+ ROT ROT MOVE
R> C@ R> DUP C@ ROT + SWAP C!
;
Its use is
{TEXT$1} surely , {TEXT$2} I will come tonight,
{MBUF1} {MBUF2} 3 {INSERT$} will insert the string 'surely ' at
the third byte of {MBUF2}, after (!) 'will come tonight' has been
moved out of the way. Typing {MBUF2} {COUNT} {TYPE} will give I
surely will come tonight OK. Notice the space at the end of
'surely '. {INSERT$} will not work if you try to 'insert' a
string at the end of another one. But that's not insertion,
that's concatenating strings, and we defined another word for
that purpose.
As I used a lot of stack manipulators, it is a worthwhile
exercise to investigate the stack behaviour of the various
string-words.
A closer look at the various string-manipulating words defined in
the above section will teach you that they are not integrated,
i.e. it is not possible, for instance, to use {STRIN} in
combination with {GLUE}. There are, of course, a few reasons why
it is as it is. One reason is that there are few new words to
explain in the SUMMARY section this time, leaving much room for
expanding the EXERCISES part. The second reason is related to the
first: creating more opportunities to tease and tickle your
brains by asking questions about that intellectual problem of
integration.
Next time we will observe the creations of {CREATE...DOES>} and
discover that FORTH lets you expand its compiler. We will meet
the problem of strings again, but things will be easier then with
{CREATE...DOES>}.
Happy Birthday !!
SUMMARY
 _______________________________________________________________
|    WORDS     |   STACKNOTATION   |        DESCRIPTION         |
|¯¯¯¯¯¯¯¯¯¯¯¯¯¯|¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯|¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯|
| ABORT"       | ( --- )           |Used to terminate execution |
|              |                   |of a word on a user-defined |
|              |                   |error-condition. A message  |
|              |                   |is given. The text of the   |
|              |                   |message is to be put after  |
|              |                   |the " and should be closed  |
|              |                   |with a ".                   |
| FIND         | ( --- addr )      |Leaves the codefield address|
|              | Used as           |of the next word nnnn in the|
|              | FIND nnnn         |input stream. If not found, |
|              |                   |a zero is left.             |
 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
EXTRAS
In the previous part I defined a word called {INPUT}. In that
definition the sequence {2DROP} {2DROP} occurs. I can tell you
now, that there is a mistake. One of the {2DROP}s should have
been a {DROP}. Perhaps you found your system in distress, when
you tried to execute {INPUT}. It was just a typing mistake. I
checked the original definition; it was right ! If you correct
the {INPUT}-word, it will work properly on your system too !
Sorry !!!
SOLUTIONS TO PART 8
1. The definition is : BASE? BASE @ DUP DECIMAL . BASE ! ; .
2. A practical upper limit could be 36, while staying within the
alpha-numerical range of the characterset (26 letters and 10
figures). Even in a radix of 36 there is no chance of getting
numbers like "#~&6G or something like that.
3. U. -1 HEX.
4. : HEX. BASE @ SWAP HEX . BASE ! ;
: DEC. BASE @ SWAP DECIMAL . BASE ! ;
5. As {CONVERT} itself uses {BASE}, the solution is simply to
change {BASE} with {HEX} e.g.
6. : H. ." &H" HEX. ;
7. : H. DUP 0< IF ." -&H" ELSE ." &H" THEN ABS HEX. ;
8. Because 32 {BASE} {!} means 50 {BASE} {!} in decimal !!
EXERCISES
1. Why did we use a comma (ASCII 44) as a delimiter in the
{TEXT}-words and not a space ?
2. Define a {LEFT$}-word, which uses {MID$} as defined in the
text above.
3. The {COMPARE}-word could be optimized a fairly great deal by
dropping one of the two {2DUP}-words. As a result 3 {2DROP}-
words can be left out. Now which {2DUP} and which 3 {2DROP} ?
4. Why is - practically - 255 bytes the absolute maximum length
of a packed string as described above ?
5. The {COMPARE}-word consists of two parts with clearly distinct
tasks. It is possible to define these two parts as two
separate words. Do it !!
6. The {MID$}-word can be optimized too, as I stipulated above.
Try it !!
7. Install an error-message in the {IN$}-word to prohibit that a
search is made for a longer string in a small one. Use
{ABORT"}.
8. The {GLUE}-word is 'attached' to the buffer {MBUF3}. Could
you write a new 'independent' {GLUE} ? (I did it !!).
9. The optimized {MID$}-word - as asked for in exercise 6 - is no
longer dependent on a certain buffer. That surely has its
repercussions on related words such as {RIGHT$}. Adapt those
related words.
Disclaimer
The text of the articles is identical to the originals like they appeared
in old ST NEWS issues. Please take into consideration that the author(s)
was (were) a lot younger and less responsible back then. So bad jokes,
bad English, youthful arrogance, insults, bravura, over-crediting and
tastelessness should be taken with at least a grain of salt. Any contact
and/or payment information, as well as deadlines/release dates of any
kind should be regarded as outdated. Due to the fact that these pages are
not actually contained in an Atari executable here, references to scroll
texts, featured demo screens and hidden articles may also be irrelevant.