Programming GfA: SimpleGrep by Stefan Posthuma
Maybe it is time to start a new series. A series about
programming. Now this won't be another 'GfA for beginners'
course, because they can be found in every piece of ST literature
that floods the market. This will be more of a series in which
we focus on one particular program each time. These programs are
not intended to be useful, very user-friendly or in other ways
valuable. They are just meant to cover the topics I talk about in
the article. I assume that you have a good working knowledge of
GfA. I will use things like Varptrs, Gemdos, Xbios and Bios calls.
The program itself (the source that is) can be found in the
PROGRAMS folder on this disk.
This time it will be a very, very simple version of the 'grep'
utility that can be found on UNIX and UNIX-like operating
systems. Grep is a program that searches files for occurrences
of a regular expression and prints the filename and the line
number whenever it finds a match. In case you
don't know what a regular expression is, this is a search string
which can contain special characters which will instruct grep how
to search the line. It can roughly be compared to the
fileselector. If you enter *.DOC, all filenames ending with .DOC
will be listed. In this case, '*.DOC' is a regular expression.
But a real regular expression can be much more complex. For
instance, the regular expression '^[0-9][0-9] *[Ss]T' will search
for a line beginning with two digits, followed by zero or more
spaces and the word 'ST' in which the S can be upper or lower
case. In our grep, we will forget about regular expressions and
stick to simple search strings. (UNIX experts...this is fgrep!)
In order to make grep as flexible as possible, we want to make it
a Tos Takes Parameters (.TTP) application. Whenever you run a
.TTP program, a dialogue box appears prompting you for a command
line. This command line contains certain values which will be
passed to the program. Our little grep program needs to know two
things: which files to search through and what to search for. So
on our command line, we have to enter a filename and a search
string. It would be nice if the output could be sent to the
printer. So we have to give another parameter which tells grep to
send the output to the printer instead of displaying it on the
screen. We'll call this the -p option. So a command line like
*.C hello -p
will search through all .C files (C sources) for occurrences
of the string 'hello' and will print the filename, the line number
and the line itself wherever it finds 'hello' (on the printer; omit
the -p option and the output is displayed on the screen).
Let's grab the GfA interpreter and start programming...
The first thing we want our program to do is to read the command
line and see if it understands it. To do this you need to know
that the command line is stored at the address Basepage+129 and is
127 bytes long. So the only thing we have to do is to create a
string, let's say C$, 127 bytes long and transfer the command
line to it:
The Bmove command moves bytes from one address to another. The
first address is where to move from, the second address is where to
move to and the third value is the number of bytes to be moved.
We had to use the statement C$=Space$(127) first to allocate 127
bytes of memory for C$. If we did not do this, the Varptr of C$
would have been 0 and the Bmove command would have resulted in a
bus error (2 bombs).
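Put together, the two statements described above would look roughly like the sketch below. The last three lines are an extra that is not part of the description: they assume that the byte at Basepage+128 holds the length of the command line (as GEMDOS normally sets it up) and use it to throw away the unused part of the buffer. Cl% is just a name picked for this example.

```gfabasic
C$=Space$(127)                      ! reserve 127 bytes of string space
Bmove Basepage+129,Varptr(C$),127   ! copy the command line into C$
Cl%=Peek(Basepage+128)              ! assumption: length byte kept by GEMDOS
C$=Left$(C$,Cl%)                    ! keep only what was actually typed
Print "Command line: ";C$
```

Run from the interpreter this will usually show an empty command line, because the basepage then belongs to GfA itself. Compile it and start it as a .TTP to see your own input.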
Now the command line is happily residing in C$, ready to be
taken apart!
First we have to extract two strings from it, the filename and the
search string. Before that, we have to strip all trailing spaces,
newlines and other rubbish:
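One possible way to do the stripping is sketched here. It is my own guess at a reasonable approach, not necessarily what the program on the disk does: everything from the first control character (zero byte, carriage return, linefeed) onwards is cut off, and then the trailing spaces are removed one by one. The sample contents of C$ and the counter I% are made up for the example.

```gfabasic
C$="*.C hello -p   "+Chr$(13)+Chr$(0)+"junk"   ! sample buffer contents
I%=1
' look for the first character below 32 (Asc of an empty string is 0)
While I%<=Len(C$) And Asc(Mid$(C$,I%,1))>=32
  Inc I%
Wend
C$=Left$(C$,I%-1)                   ! drop the rubbish behind the text
While Right$(C$,1)=" "              ! drop trailing spaces
  C$=Left$(C$,Len(C$)-1)
Wend
Print "[";C$;"]"                    ! shows [*.C hello -p]
```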
Now we have to find the first string (I call it a word: a
sequence of characters delimited by spaces):
While Mid$(C$,Pos1%,1)=" " And Pos1%<=Len(C$)
While Mid$(C$,Pos2%,1)<>" " And Pos2%<=Len(C$)
Pos1% points to the beginning of the word and Pos2% points to the
end of the word + 1.
Now get the filename:
Now get the second word, the search string:
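The two While loops shown earlier can be wrapped in a small procedure so that the same code fetches the filename and the search string. This is only a sketch: the procedure name Get_word and the result variable W$ are hypothetical, and the sample command line is filled in by hand.

```gfabasic
C$="*.C hello -p"              ! sample command line
Pos2%=1                        ! scanning starts at the first character
Gosub Get_word
Filename$=W$
Gosub Get_word
Search$=W$
Print "Files : ";Filename$
Print "Search: ";Search$
End
'
Procedure Get_word
  ' hypothetical helper: the next word of C$ is returned in W$
  Pos1%=Pos2%
  While Mid$(C$,Pos1%,1)=" " And Pos1%<=Len(C$)
    Inc Pos1%                  ! skip the spaces in front of the word
  Wend
  Pos2%=Pos1%
  While Mid$(C$,Pos2%,1)<>" " And Pos2%<=Len(C$)
    Inc Pos2%                  ! run to the end of the word + 1
  Wend
  W$=Mid$(C$,Pos1%,Pos2%-Pos1%)
Return
```

Because Pos2% is left just behind the word that was found, every next call simply continues where the previous one stopped. If there are no more words, W$ comes back empty.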
If there is a third argument (the -p option), read it:
Print "Grep: unknown option ";Opt$
While Gemdos(17)=0 ! printer ready?
Print "Grep: printer not ready .. Abort Retry Ignore?"
Until K$="A" Or K$="R" Or K$="I"
Exit If K$="A" Or K$="I"
The The_end routine will ask the user to press a key and will
exit the program, back to the desktop.
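The lines above are only fragments, so here is a sketch of how the option check and the printer test might fit together. Several things are assumptions on my part: Opt$ is filled by hand, 'Abort' leaves the program, 'Ignore' simply carries on, and The_end uses Quit to get back to the desktop.

```gfabasic
Opt$="-p"                          ! sample: third word of the command line
Prt%=0                             ! 0 = screen, 1 = printer
If Opt$<>""
  If Upper$(Opt$)="-P"
    Prt%=1
  Else
    Print "Grep: unknown option ";Opt$
    Gosub The_end
  Endif
Endif
If Prt%=1
  K$=""
  While Gemdos(17)=0               ! printer ready? (0 = not ready)
    Print "Grep: printer not ready .. Abort Retry Ignore?"
    Repeat
      K$=Upper$(Chr$(Inp(2)))      ! wait for a key
    Until K$="A" Or K$="R" Or K$="I"
    Exit If K$="A" Or K$="I"
  Wend
  If K$="A"
    Gosub The_end                  ! assumption: Abort leaves the program
  Endif
Endif
Print "Options are fine, Prt% = ";Prt%
Gosub The_end
'
Procedure The_end
  Print "Press any key."
  Void Inp(2)
  Quit                             ! back to the desktop
Return
```

Note that Quit also leaves the interpreter, so save your source before trying this.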
Now that we have extracted the necessary data from the command
line, it is time to call our actual grep program:
Gosub Grep(Filename$, Search$, Prt%)
The first thing we have to do is to extract the possible pathname
from the filename. Later on we use this path to open the files.
Knowing that directory names will be separated by '\' characters,
we can construct the following routine:
Print "Grep: incorrect filename specification."
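A path routine along these lines could look as follows. It is a sketch under my own assumptions: the path is taken to be everything up to and including the last '\', and a specification that ends in '\' (so no filename at all) is treated as the error case. The variables P% and Q% and the sample specification are invented for the example.

```gfabasic
Filename$="A:\SOURCE\*.C"          ! sample file specification
P%=0                               ! position of the last "\" found so far
Repeat
  Q%=Instr(Filename$,"\",P%+1)     ! next "\" behind position P%
  If Q%>0
    P%=Q%
  Endif
Until Q%=0
Path$=Left$(Filename$,P%)          ! stays empty if there is no path
If P%=Len(Filename$)               ! nothing behind the last "\"
  Print "Grep: incorrect filename specification."
Else
  Print "Path: ";Path$             ! shows A:\SOURCE\
Endif
```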
Now we have to search all files that correspond to the given
filename. If you entered '*.C', all files with the extension
'.C' have to be examined. Fortunately there are Gemdos calls
which can take care of this. But before we can use them, we have
to set the Disk Transfer Address (DTA). This is the address of a
44-byte buffer which will be used by the aforementioned Gemdos
calls to pass information to our program. It will contain things
like filename, size, date etc. This must be a fixed address and is
not allowed to change. So we can't allocate a 44-byte string and
pass its Varptr to the Set DTA routine, because GfA moves
strings around in memory during string processing, and the Varptr
of the 44-byte string will change while Gemdos still uses the old
address we gave it, which usually results in a mess. So we have to
get a fixed address.
Memory Management: assigning buffers and stuff
The answer is simple, and will cover a lot of buffer needs that
require fixed addresses. When a GfA program is
run, it claims all available memory. Most programs don't need
700-plus K of storage (on a 1040 or Mega). So we can drastically
reduce this amount with the Reserve command. Now if you are a
good programmer (and we all are, aren't we) you can make a good
guess of how many bytes your program is going to use. This grep
program won't need more than 1000 bytes, so a Reserve 1000 will
do the job. But what the heck, plenty of memory, and to make sure
nothing will go wrong (we all want to write bug-free programs,
don't we?) we'll make that 5000. Boy, on the Commodore 64 (waves
of sentiment are now disturbing the calm seas of thought I just
had) I struggled to save yet another 100 bytes and here I am
sacrificing 4000 bytes like it is nothing! Times have changed....
But let's get back to the point. Five thousand bytes are
allocated and the remaining 695000 are released and never touched
again by GfA. Now there is a nice system variable (a GfA system
variable, that is) called Himem. This contains the last address
used by GfA. All addresses beyond this value are free. But there
is one trap here: the fileselector. There are some pretty weird
things going on here. Since GfA does not treat GEM with respect
and the fileselector is a typical GEM object, the two will clash.
When the fileselector is closed, an ugly grey spot will remain.
This is the underlying desktop which quickly grabs the
opportunity to exhibit itself. GfA swiftly covers it with the
part of the screen it saved before calling the fileselector, but
you'll notice it. The memory used to store this part of the
screen lies directly behind Himem. But there are differences
between the compiler and the interpreter here, and if you do not
Reserve enough bytes, the program will run correctly under the
interpreter, but will crash (out of memory error) when it has
been compiled. So use Himem + 4000 to be sure. After all this
small talk, let's do some programming:
The Gemdos call will tell the ST where to find the DTA buffer.
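Going by the description, the setup boils down to three statements. This is a sketch and not the original listing: the numbers 5000 and 4000 come from the text, Gemdos function &H1A is the Set DTA call, and Dta% is the name the rest of the program appears to use for the buffer address.

```gfabasic
Reserve 5000                  ! GfA keeps 5000 bytes, the rest is released
Dta%=Himem+4000               ! fixed address, well behind Himem
Void Gemdos(&H1A,L:Dta%)      ! Set DTA: Gemdos now uses the 44 bytes at Dta%
```

The L: in front of Dta% makes GfA pass the address as a long word, which is what Gemdos expects for pointers.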
Now we will have to inform the ST which files to search for, and
what their attributes should be. A file attribute is a value
telling what kind of file we are dealing with. For our Grep, only
the value 0 (normal read-write) and 1 (read-only) are important.
The following call will do the job:
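The call meant here is presumably Gemdos function &H4E (search first). A minimal sketch, with a sample pattern and a result variable E% that are made up for the example:

```gfabasic
Filename$="*.C"+Chr$(0)                  ! sample pattern, zero-terminated
E%=Gemdos(&H4E,L:Varptr(Filename$),1)    ! search first, attribute 1
If E%=0
  Print "At least one file matches."
Else
  Print "No matching file."
Endif
```

The first parameter is the address of the search pattern, the second the file attribute. With attribute 1, read-only files are found as well as normal ones.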
This call will return a value, and a value of 0 will tell us that
at least one file was found. The exact name of the file can be
found in positions 30 to 43 from the DTA buffer. After processing
the file, search the next one and see if there are any more files
to examine with the Gemdos(&H4F) call. The following routine can
Filename$=Filename$+Chr$(0) ! Gemdos wants zeroes behind strings
Empty$=Space$(13) ! Need it to clean DTA buffer
Bmove Varptr(Empty$),Dta%+30,13 ! clean filename in DTA buffer
N$=String$(14,0) ! create string for filename
Bmove Dta%+30,Varptr(N$),13 ! get filename from DTA
Bmove Varptr(Empty$),Dta%+30,13 ! clean filename in DTA
Print "Grep: ";Filename$;" not found."
The reason that I clean the filename in the DTA buffer is that a
short filename will not completely overwrite a longer one which
was previously there. If this happens, parts of the longer
filename will appear behind the shorter one and this file will
(probably) not be found.
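Since only parts of the routine are shown above, here is a sketch of how the complete search loop could be put together. The structure (search first, then search next until Gemdos reports that there is nothing left) follows the description. The sample pattern, the variable E%, the trimming of N$ at the zero byte and the Print that stands in for the real processing are my own additions.

```gfabasic
Reserve 5000
Dta%=Himem+4000                          ! fixed DTA address
Void Gemdos(&H1A,L:Dta%)                 ! Set DTA
Filename$="*.C"+Chr$(0)                  ! sample pattern plus zero byte
Empty$=Space$(13)                        ! need it to clean DTA buffer
Bmove Varptr(Empty$),Dta%+30,13          ! clean filename in DTA buffer
E%=Gemdos(&H4E,L:Varptr(Filename$),1)    ! search first
If E%<>0
  Print "Grep: ";Left$(Filename$,Len(Filename$)-1);" not found."
Else
  Repeat
    N$=String$(14,0)                     ! create string for filename
    Bmove Dta%+30,Varptr(N$),13          ! get filename from DTA
    N$=Left$(N$,Instr(N$,Chr$(0))-1)     ! cut at the terminating zero
    Print "Found: ";N$                   ! the real program searches it here
    Bmove Varptr(Empty$),Dta%+30,13      ! clean filename in DTA
    E%=Gemdos(&H4F)                      ! search next
  Until E%<>0
Endif
```

N$ is created with 14 zero bytes and only 13 bytes are copied into it, so there is always a zero left for Instr to find.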
Processing a file is simply a matter of opening it, reading a
line, checking whether the search string is in that line, and
reading on until the end of the file is reached.
Lc%=0 ! file line counter
While Not Eof(#1)
Line Input #1,A$
If Asc(Inkey$)=27 ! you pressed Escape?
If Instr(A$,Search$)<>0 ! search string there?
If Prt%=0 ! print or display
Print N$;" line ";Lc%;":";A$
Add Sc%,((Len(A$)-62)/80)+1 ! adjust for longer lines
If Sc%>23 ! avoid screen-scrolling
Print "-- more --";
Lprint N$;" line ";Lc%;":";A$
The variable Sc% is used to count the number of lines that have
been printed on the screen. When the screen is full, you can
press a key and the next screenful will be shown.
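To show how these lines hang together, here is a sketch of the whole thing as one procedure. The procedure name Search_file and the sample call are hypothetical, and so is the way the screen counter is handled after '-- more --': I clear the screen and start counting again. The line count also assumes that the text in front of each line takes about 18 columns, so lines of up to 62 characters fit on one screen line.

```gfabasic
Sc%=0                                     ! lines shown on this screen so far
Gosub Search_file("GREP.LST","Print",0)   ! hypothetical sample call
End
'
Procedure Search_file(N$,Search$,Prt%)
  Local A$,Lc%
  Lc%=0                                   ! file line counter
  Open "I",#1,N$                          ! the file must exist
  While Not Eof(#1)
    Line Input #1,A$
    Inc Lc%
    Exit If Asc(Inkey$)=27                ! you pressed Escape?
    If Instr(A$,Search$)<>0               ! is Search$ somewhere in A$?
      If Prt%=0                           ! print or display
        Print N$;" line ";Lc%;":";A$
        Add Sc%,Int((Len(A$)+17)/80)+1    ! long lines take more room
        If Sc%>23                         ! avoid screen-scrolling
          Print "-- more --";
          Void Inp(2)                     ! wait for a key
          Cls
          Sc%=0
        Endif
      Else
        Lprint N$;" line ";Lc%;":";A$
      Endif
    Endif
  Wend
  Close #1
Return
```

Mind the order of the Instr parameters: the string to search in comes first, the string to look for second.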
This concludes the first little program that has been used to
tell you a little more about programming the ST. You should do
one more thing and that is to compile the source and save it as a
The things that were important here:
- How to read the command line from a TTP program
- How to search files matching wildcards (*.C, X?.*, *.*, etc.)
- How to allocate fixed buffers and other stuff (Reserve, Himem)
I have to make one more remark, and that is that the command line
buffer is always there. So if your program (even if it is a .PRG)
is called by another program which also supplies a command line,
you can read and use it in the same way. If you for instance
write an editor, always read the command line, so if a program
like a C-shell calls your editor with a command line, your editor
will immediately read in and display that file.
Keep programming the ST, and when you have written a program that
might be useful to other people and you feel like writing about
it, do not hesitate to send it to our correspondence address. It
is great to have your program and article published in a world-
wide magazine, so start programming right away!
My fingers are tired. I typed in all this in one go. My throat is
signalling symptoms of extreme dryness and my stomach could do
with some crispy-chips. I will go downstairs and raid the