PART III
THE ULTIMATE VIRUS KILLER BOOK APPENDICES
D - A SHORT GUIDE ON BOOTSECTOR VIRUS DISASSEMBLY
This appendix has been included for those of you who have
encountered a new virus and who want to find out what it does;
the boffins of machine code and Operating System programming.
Please note that this appendix is not meant for novice
programmers - and even experts will find a hex calculator and
some experience coming in handy! Note also that this is no crash
course in machine code, so some knowledge is a prerequisite.
By the way: Even if you find a virus to be harmless after
checking it by means of this appendix, please still forward it to
us for analysis, as most other people will not be able to find
out something like this!
DISASSEMBLING
All you have to do is load the .IMG file that can be created
using the "Ultimate Virus Killer", and then disassemble it using
a reassembler (e.g. "Detective" or "Easy Rider") or a debugger
program that supports external protocol files (e.g. "Templemon").
As both "Easy Rider" and "Detective" are commercial programs
with which I have little experience, I would like to explain the
necessary procedure using the Public Domain program "Templemon"
by Thomas Tempelmann and Johannes Hill. It should not be too hard
to get this program through your local PD library. It is written
by two Germans but uses only the English language. It should be
noted that "Templemon", at least in its current version, does not
work all too well on the Falcon (I use it only in ST high res
mode myself).
1) Load "Templemon" by double-clicking on the file. It will now
load and install itself, and then return to the desktop. The
program is now accessible by pressing CTRL-HELP (version 1.20 and
higher) or ALT-HELP (lower versions). This can be done at any
moment, whatever you happen to be doing. Do note that "Templemon"
can also be installed automatically from the AUTO folder.
2) Open a window on the screen, and display the directory that
contains the bootsector .IMG files. The system will now consider
that to be the default directory - which will be used by
"Templemon" for all its file operations. On the Falcon this does
not work, so you have to specify the full path below.
3) Enter "Templemon" by pressing the key combination mentioned
above.
4) Enter "l filename.ext", without the quotes (and with the real
filename, of course!). "Templemon" will now load the file at the
lowest available bit of memory it can find. If you find that
address tedious, you can just load the file again at a more
convenient, higher address (if it was $14BA8, for example, you
would want $20000). This can be done by entering "l filename.ext
20000". Note that the value of the address is in hexadecimal, but
without the "$" that normally precedes hexadecimal values.
5) Open a protocol file. To this file, all screen output will be
echoed. Therefore you need to enter "p filename.txt". This will
echo the screen output into an ASCII disk file called
'filename.txt'.
Note: Don't use the same filename and extension as the virus
.IMG file, as this will overwrite the original!
6) Disassemble the code. If located at $20000, it will reach up
to $20200, as a bootsector is always $200 (hexadecimal) bytes
long. Use the "d 20000." command (don't forget the period,
otherwise only one line will be disassembled). The code will
scroll through the screen while simultaneously being dumped into
the protocol file. [SPACE] toggles pause; any other key stops the
listing. Press any key except [SPACE] once the address $20200 has
scrolled past.
Usually, a virus starts with a BRanch Always (BRA) instruction.
It is useful to skip the BIOS Parameter Block across which it
leaps, and then continue listing at the address it branches to.
7) Close the protocol file (very IMPORTANT!). This may be done
by entering "pc" (protocol close).
8) Exit "Templemon" by means of the "G" command.
You now have a disassembled source code of the bootsector. Get
rid of garbage code and all the ASCII and hexadecimal values
beyond column 40 using a text editor ("Tempus", for example). I
usually first buffer the bit of the virus that contains text
messages, then I force the editor to have "40" as new line length
(in the next alert box I select "cut off"). Then I set the line
length to 160 again, and get rid of the first four characters
(usually "!,00") at each line. In the end you will have a list of
addresses followed by machine code instructions with their
parameters. This can now be documented using a text editor like
"Tempus" or "Ed Hak" (or even a word processor). To some hack
experts this may seem an intricate method (which indeed it may
be) but it works for me.
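The manual clean-up described above can also be scripted. The following is a minimal Python sketch, assuming the protocol file is plain ASCII, that everything beyond column 40 is the unwanted ASCII/hex dump, and that every line starts with a four-character prefix. The function name `clean_listing` and both default widths are assumptions for illustration; check them against your own protocol file first.

```python
def clean_listing(text, keep_columns=40, prefix_length=4):
    """Cut each line off after keep_columns characters, then drop the
    first prefix_length characters.

    Both widths are assumptions taken from the manual method in the
    text; adjust them to what your protocol file actually looks like.
    """
    cleaned = []
    for line in text.splitlines():
        line = line[:keep_columns]      # "cut off" everything past column 40
        line = line[prefix_length:]     # drop the leading prefix characters
        cleaned.append(line.rstrip())   # remove padding left at the end
    return "\n".join(cleaned)
```

Text messages inside the virus are lost by the cut-off, so copy that part of the listing somewhere safe first, just as with the manual method.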
GENERAL STRUCTURE OF A VIRUS
When looking at the disassembled listing of a virus in your text
editor or word processor, stripped of all unnecessary
information, you can quickly get some structure in it. For
starters you can add an empty line after each BRA, RTS, RTE and
JMP instruction. This should set apart the individual subroutines
rather effectively.
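Adding the empty lines can be automated as well. This is a small Python sketch, assuming one instruction per line with the mnemonic in upper or lower case; the name `separate_subroutines` is hypothetical. It simply looks for the four mnemonics anywhere on the line, so a comment that happens to contain "RTS" would also trigger an empty line.

```python
import re

# Instructions after which control does not fall through to the next line.
FLOW_BREAKERS = re.compile(r"\b(BRA|RTS|RTE|JMP)\b")


def separate_subroutines(lines):
    """Return a copy of lines with an empty line after BRA, RTS, RTE, JMP."""
    result = []
    for line in lines:
        result.append(line)
        if FLOW_BREAKERS.search(line.upper()):
            result.append("")           # visual gap between subroutines
    return result
```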
Every virus starts off with a branch instruction to make sure it
skips the BIOS Parameter Block. In most cases this is a BRA
(BRanch Always) instruction, but it may also be a BLS (Branch
Lower or Same) or, indeed, a BSR (Branch SubRoutine) followed by
an RTS. Other instructions are possible as well, but they all
generally just function to get across the BIOS Parameter Block
area.
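To find out where that first branch leads without a disassembler, the opcode can be decoded by hand. The Python sketch below assumes the standard 68000 Bcc/BRA/BSR encoding (high nibble 6, low byte holding an 8-bit displacement, or 0 when a 16-bit displacement follows). The name `branch_target` and the `base` parameter (the address at which the sector was loaded) are hypothetical, and other opening instructions are not handled.

```python
from typing import Optional


def branch_target(sector: bytes, base: int = 0) -> Optional[int]:
    """Return the address the first instruction branches to, or None
    if the sector does not start with a BRA/BSR/Bcc instruction."""
    opword = int.from_bytes(sector[0:2], "big")
    if opword >> 12 != 0x6:             # $6xxx covers BRA, BSR and Bcc
        return None
    displacement = opword & 0xFF
    if displacement == 0:               # 16-bit displacement in next word
        displacement = int.from_bytes(sector[2:4], "big", signed=True)
    elif displacement >= 0x80:          # sign-extend the 8-bit value
        displacement -= 0x100
    # Displacements are relative to the instruction address plus 2.
    return base + 2 + displacement
```

For a sector loaded at $20000 that starts with the word $6038, `branch_target(sector, 0x20000)` gives $2003A, which is where the listing should be continued.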
Some modern viruses emulate MS-DOS bootsectors by starting off
with a word value starting with $EB (although $E9 also works).
The branch command is then usually located one word or one
longword off the beginning, as these particular MS-DOS specific
bytes may not be functional but don't crash when executed on a
680x0 processor.
The real virus starts at the address where the first real virus
instruction branches to. First, of course, it needs to install
itself in memory somewhere. The initial stage of this usually
involves a copy loop that copies x words or longwords to the
target address (for regular target addresses, please refer to
appendix F). Once that is done, the old values of the vectors
that the virus wants to bend are buffered somewhere (either on a
special longword buffer at the end of or somewhere within the
virus or directly on the last longword of the opcode for a JMP
instruction). After that, it moves the appropriate addresses of
the alternative routines contained in its own code to those same
vectors. Here it is possible to establish the addresses of the
specific virus routines, and document them in the source with
"new Hdv_bpb routine" or such.
Do note that the installation step might be more complex where
absolute addresses are concerned. A hex calculator is a
prerequisite to get that mess sorted out. Some viruses (usually
reset-resistant ones) check if they're already present in your
system. Depending on what the routine finds, they install
themselves completely or partly.
Next in the virus structure is a set of subroutines. These
contain the alternative routines belonging to the respective
vectors that have been bent. They often end with a JMP (x) to the
old address (i.e. the normal address) that the virus found and
buffered when it installed itself. With some clever viruses, this
JMP instruction may sit in the middle of the routine instead.
When the bent vector is one that serves to read a disk sector,
the routine may check whether the sector that was loaded contains
the virus itself - after which the virus can be wiped from the
copy of the disk sector in memory, effectively hiding it from
disk monitors and virus killers that read the bootsector's
contents!
Further routines contained in the main chunk of virus are the
trigger routine (the routine that establishes whether or not the
destruction routine should be executed) and, of course, the
destruction routine itself.
The destruction routine is usually the last routine contained in
the virus. It either ends with an RTS or never ends at all, in
which case it usually locks or crashes the system. It is
generally followed by a small area of memory that is used to
buffer counters and old vector addresses.
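Last: Each bootsector, whether executable or not, ends with a
word value that serves to adjust the overall bootsector checksum.
The bootsector is treated as executable when the sum of all its
256 words, truncated to 16 bits, equals $1234; any other sum
marks it as non-executable.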
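The checksum can be verified on a loaded .IMG file with a few lines of Python. This is a sketch, assuming the file holds exactly one 512-byte bootsector and that the checksum is the 16-bit sum of its 256 big-endian words; the function names are hypothetical.

```python
def boot_checksum(sector: bytes) -> int:
    """Return the 16-bit sum of the 256 big-endian words in a bootsector."""
    if len(sector) != 512:
        raise ValueError("a bootsector is always $200 (512) bytes long")
    total = 0
    for offset in range(0, 512, 2):
        total = (total + int.from_bytes(sector[offset:offset + 2], "big")) & 0xFFFF
    return total


def is_executable(sector: bytes) -> bool:
    """A bootsector is executable when its checksum equals $1234."""
    return boot_checksum(sector) == 0x1234


def balancing_word(sector: bytes) -> int:
    """Return the value the last word would need for the checksum to be $1234."""
    without_last = boot_checksum(sector[:510] + b"\x00\x00")
    return (0x1234 - without_last) & 0xFFFF
```

Comparing `balancing_word` with the word actually found at offset $1FE shows at a glance whether the final word was chosen to make the sector executable.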
SYSTEM VARIABLES AND OPERATING SYSTEM CALLS
Whenever addresses between $400 and $500 are accessed, please
refer to appendix B. Whenever you encounter a TRAP command,
please refer to appendix C. GEMDOS is accessed through TRAP #1,
BIOS through TRAP #13 and XBIOS through TRAP #14.
More detailed information, the kind that may not be contained in
this book, can be found in a book like "ST Profibuch" (German) or
"ST Internals" (English). Do note that system variables may be
accessed without first getting into supervisor mode, as the
bootsector is always executed in supervisor mode.
Remember that viruses are programmed as compactly as possible
(well, most of them anyway). This means that you may sometimes
find alternative constructions for things you would otherwise
recognise immediately. This especially tends to happen around
Operating System calls (BIOS and XBIOS). Apart from saving space,
such constructions also serve to fool the 'Virus Probability
Factor' algorithms of some of the older virus killers:

Optimised code:                 Regular code:

MOVE.L  #$00040003,-(A7)        MOVE.W  #$0003,-(A7)
                                MOVE.W  #$0004,-(A7)

LEA     $0014(A7),A7            ADD.L   #$00000014,A7

PEA     $004000                 MOVE.L  #$00004000,-(A7)

CLR.L   -(A7)                   MOVE.W  #$0000,-(A7)
                                MOVE.W  #$0000,-(A7)
                                or
                                MOVE.L  #$00000000,-(A7)

MOVE.L  #$2F1841D1,D0           MOVE.L  #$31415926,D0
ADDI.L  #$02291755,D0

The last example shows how virus code can be made more difficult
to recognise (though also slightly less compact). It makes sure
that the magic longword $31415926 is not present as such in the
boot code, so that the 'Virus Probability Factor' calculated by
older virus killer versions comes out too low.
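The arithmetic behind such a split constant is easy to check. The Python sketch below uses illustrative values: given the first constant, it computes the second one itself rather than taking it from a listing, and it shows why a plain search for the magic longword misses the split form. The opcode words $203C (MOVE.L #x,D0) and $0680 (ADDI.L #x,D0) are standard 68000 encodings; the function names are hypothetical.

```python
MAGIC = 0x31415926


def second_half(first: int, wanted: int = MAGIC) -> int:
    """Return the value that must be added to first (32-bit wraparound)
    to arrive at wanted."""
    return (wanted - first) & 0xFFFFFFFF


def contains_literal(code: bytes, value: int = MAGIC) -> bool:
    """Naive scan: is the longword present as such in the code?"""
    return value.to_bytes(4, "big") in code


first = 0x2F1841D1
addend = second_half(first)

# MOVE.L #MAGIC,D0 versus MOVE.L #first,D0 followed by ADDI.L #addend,D0
regular = bytes.fromhex("203C") + MAGIC.to_bytes(4, "big")
split = (bytes.fromhex("203C") + first.to_bytes(4, "big")
         + bytes.fromhex("0680") + addend.to_bytes(4, "big"))
```

Here `contains_literal(regular)` is true while `contains_literal(split)` is false, although both sequences leave the same value in D0.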
SIGNAL VALUES
Even when virus code seems impenetrable at first sight, you can
always recognise certain values and immediately conclude that
they belong to a certain part of a certain routine that does
certain things. Each time you see a TRAP #x command, for example,
you can work up from that point.
Some viruses really make things awfully difficult to recognise.
The following example is rather nasty (taken from the Kobold #2
Virus).

MOVE.W  D1,-(A7)        !Execflag
MOVE.W  D4,-(A7)        !Disk type
MOVE.L  D4,-(A7)        !Serial number
MOVE.L  A4,-(A7)        !Buffer address
MOVE.W  #$0012,-(A7)    !Protobt
TRAP    #14             !XBIOS
MOVEQ   #$0E,D6         !Correct stack value
ADDA.W  D6,A7           !Correct stack

As you can see, it really makes things more complicated - aimed
at making it more difficult to document, and at making it less
obvious that we're dealing with a virus here. When starting at
the TRAP #14 command, it is quickly seen that we're dealing with
a Protobt function here. Appendix C will tell you what the values
have to be. It is clear that, somewhere earlier in the virus, the
data registers are supplied with their appropriate values. Do
note that the above example also corrects the stack in a rather
creative way.
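When a stack correction is hidden like this, it helps to add up what was actually pushed. A minimal Python sketch, assuming you note each push as "W" (word, 2 bytes) or "L" (longword, 4 bytes); the function name is hypothetical.

```python
PUSH_SIZES = {"W": 2, "L": 4}   # bytes taken on the stack per operand size


def stack_correction(pushes: str) -> int:
    """Return the number of bytes to add to A7 after the call,
    given the operand sizes of all pushes, e.g. "WWLLW"."""
    return sum(PUSH_SIZES[size] for size in pushes)
```

For the Protobt call above the pushes are word, word, longword, longword and word, so `stack_correction("WWLLW")` gives 14, i.e. the $0E that ends up in D6.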
Some viruses even throw in other commands in the middle that,
however, never affect the stack (of which the address is in A7).
Some others even have more values (including the function number)
put in registers. All of this with the intention to confuse, or
to make their code more flexible.

Signal value:   Signals for:

$1234           Bootsector is made executable
$12123456       Virus is made reset-resistant in the illegal way
$5678           Virus calculates the checksum for itself to become
                reset-resistant in the illegal way
$42A/$426       Virus is legally reset-resistant
$31415926       Magic longword needed for the virus to become
                legally reset-resistant
TRAP #1         GEMDOS is called
TRAP #13        BIOS is called
TRAP #14        XBIOS is called

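A first pass over a bootsector can look for these signal values automatically. The Python sketch below is only a rough aid: it matches raw byte patterns at even offsets, so data that happens to look like a signal value is reported too, and values built up at run time are missed. The TRAP opcodes follow the standard 68000 encoding ($4E40 plus the trap number); the table layout and the name `find_signals` are assumptions.

```python
SIGNALS = {
    bytes.fromhex("1234"): "$1234: bootsector is made executable",
    bytes.fromhex("12123456"): "$12123456: reset-resistant, illegal way",
    bytes.fromhex("5678"): "$5678: checksum for illegal reset-resistance",
    bytes.fromhex("0426"): "$426: legally reset-resistant",
    bytes.fromhex("042A"): "$42A: legally reset-resistant",
    bytes.fromhex("31415926"): "$31415926: magic longword",
    bytes.fromhex("4E41"): "TRAP #1: GEMDOS is called",
    bytes.fromhex("4E4D"): "TRAP #13: BIOS is called",
    bytes.fromhex("4E4E"): "TRAP #14: XBIOS is called",
}


def find_signals(code: bytes):
    """Return (offset, meaning) pairs for every signal value found
    at an even offset (68000 instructions are word-aligned)."""
    hits = []
    for offset in range(0, len(code) - 1, 2):
        for pattern, meaning in SIGNALS.items():
            if code.startswith(pattern, offset):
                hits.append((offset, meaning))
    return hits
```

Each reported offset is a starting point from which to work up through the listing, as described above for the TRAP commands.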
THE DIFFERENCE BETWEEN 68030 AND ST 68000 TRAP HANDLERS
Every time your computer encounters a TRAP instruction it jumps
through one of the TRAP vectors. These start at address $80 with
the address of TRAP #0 and increase by 4 bytes (the length of one
longword) for every consecutive trap. This causes a jump to a
specific address in the Operating System. At that address, the
values on the stack are examined. First the handler checks which
function was called: it fetches a word from the stack and jumps
to another set of routines according to that value. If the TRAP
value had been #13, for example, the Operating System would have
jumped through the BIOS trap vector at address $B4. That trap
handler routine would then have checked the function value. Had
this been 4, for example, it would have jumped to the appropriate
rwabs routine to read or write (a) disk sector(s).
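The vector address belonging to a TRAP number follows directly from the layout just described. A short Python sketch (the function name is hypothetical):

```python
TRAP_VECTOR_BASE = 0x80   # address of the TRAP #0 vector


def trap_vector_address(trap_number: int) -> int:
    """Return the address of the vector used by TRAP #trap_number."""
    if not 0 <= trap_number <= 15:
        raise ValueError("the 68000 has TRAP #0 to TRAP #15 only")
    return TRAP_VECTOR_BASE + 4 * trap_number
```

This gives $84 for GEMDOS (TRAP #1), $B4 for BIOS (TRAP #13) and $B8 for XBIOS (TRAP #14), so a virus writing to one of these addresses is bending the corresponding trap vector.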
The above is the same for all Atari TT/ST systems. However,
certain details differ between the 68000 processor (in the
ST/STE) and other 680x0 processors (like the 68030 in the TT and
the Falcon). These details are important for viruses, which
sometimes get values (like which device, which sector and which
track to read from or write to) directly off the stack.
When a TRAP is called on a 68000 processor, two additional
values are pushed onto the stack: first the program counter
(acronym PC) with a length of 4 bytes (one longword) and second
the Status Register (acronym SR) with a length of 2 bytes (one
word). This means that the actual values that the TRAP routine
requires can be found at offset 6 on the stack. The later 680x0
processors, however, store the offset of the TRAP vector they
will jump through (with a length of one word, i.e. 2 bytes) on
the stack before the PC and the SR are put there. This means that
the TRAP routine parameters will be obtainable starting at offset
8.
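The difference between the two stack layouts can be modelled in Python. This is a sketch that assumes, as the text does, that the parameters sit on the same stack as the values pushed by the processor. The frames below are built by hand with made-up SR and PC values; the function names and the `has_vector_word` flag are hypothetical.

```python
import struct


def parameter_offset(has_vector_word: bool) -> int:
    """Offset of the first TRAP parameter from the stack pointer:
    SR (2) + PC (4), plus 2 when a vector offset word is stored."""
    return 8 if has_vector_word else 6


def read_function_number(stack: bytes, has_vector_word: bool) -> int:
    """Fetch the function number, the first word of the parameters."""
    (function,) = struct.unpack_from(">H", stack, parameter_offset(has_vector_word))
    return function


# Stack contents from the stack pointer upwards, for a BIOS call with
# function number 4 (rwabs). SR and PC values are made up.
frame_68000 = struct.pack(">HIH", 0x2300, 0x00FC1234, 4)
frame_68030 = struct.pack(">HIHH", 0x2300, 0x00E01234, 0x00B4, 4)
```

Reading `frame_68030` with the 68000 offset returns $00B4, the vector offset word, instead of the function number. A virus that takes its values from a fixed offset of 6 therefore misreads them on a TT or Falcon.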