This was causing a problem where resources with strings would accumulate
the strings of previous resources in the directory.
For example, here is the output of test.py on
3f0961b7942f12bc96848509c04da2b6:
Resources: (4)
[+] MD5: (191649) 33a6345b919c7c733da9d33ee4ac64eb
Type string: BINARY
Name string:
1.165.3106.0_TO_1.165.3138.0_MPASDLTA.VDM._P
Lang: 0x0
Codepage: 0x4e4
RVA: 0x51dc
Size: 0x2eca1
First 10 bytes: 0x4d50535091ec0200c263
[+] MD5: (293587) e4c9b9aa65e0b236cb180fa489502700
Type string: BINARY
Name string: 1.165.3106.0_TO_1.165.3138.0_MPASDLTA.VDM._P1.165.3106.0_TO_1.165.3138.0_MPAVDLTA.VDM._P
The second resource has the first resource's name string in it.
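The failure mode can be illustrated with a minimal Python sketch. This is not the parser's actual code: the function names and the list-of-strings input are hypothetical, chosen only to show how an accumulator that is never reset produces the output above.

```python
def read_names_buggy(raw_names):
    """Hypothetical illustration: one accumulator shared by all resources."""
    results = []
    name = ""                  # never reset between resources (the bug)
    for raw in raw_names:
        for ch in raw:
            name += ch
        results.append(name)
    return results


def read_names_fixed(raw_names):
    """Same loop, but each resource starts from an empty string."""
    results = []
    for raw in raw_names:
        name = ""              # fresh accumulator for every resource
        for ch in raw:
            name += ch
        results.append(name)
    return results
```

With two names as input, the buggy variant returns the second name prefixed by the first, which matches the symptom in the dump.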
Teach the parser to properly handle PE32+ binaries.
The major differences are:
- Fields in the OptionalHeader which are not relative are now 64 bits.
- Base addresses should all be 64 bits.
- The BaseOfData field is not available on PE32+.
There is now a 16 bit field tacked on to the end of nt_header_32 called
OptionalMagic. This is a duplicate of the Magic field in optional_header_32
and optional_header_64, but is stored in nt_header_32 to make it easier
to determine which optional header is being used.
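For readers unfamiliar with the Magic field, here is a rough Python sketch of how it distinguishes the two formats. It is independent of the parser's own structures: the function names are made up for illustration, and only the file-format constants (0x10b for PE32, 0x20b for PE32+, e_lfanew at offset 0x3c) come from the PE format itself.

```python
import struct

PE32_MAGIC = 0x10B        # optional header magic for PE32
PE32PLUS_MAGIC = 0x20B    # optional header magic for PE32+


def optional_magic(image: bytes) -> int:
    """Return the 16-bit Magic value at the start of the optional header."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("missing DOS header")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The optional header follows the signature (4 bytes) and the
    # COFF file header (20 bytes).
    magic_off = e_lfanew + 4 + 20
    if magic_off + 2 > len(image):
        raise ValueError("truncated optional header")
    (magic,) = struct.unpack_from("<H", image, magic_off)
    return magic


def is_pe32_plus(image: bytes) -> bool:
    """True for PE32+, False for PE32, ValueError for anything else."""
    magic = optional_magic(image)
    if magic not in (PE32_MAGIC, PE32PLUS_MAGIC):
        raise ValueError("unexpected optional header magic 0x%x" % magic)
    return magic == PE32PLUS_MAGIC
```

Keeping a copy of this value next to the header union, as OptionalMagic does, saves callers from repeating this lookup.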
I also added support for better error reporting. Now when something fails
to parse you can use a couple of functions to find out what happened and
where it happened:
- GetPEErr(): Return the error as an integer.
- GetPEErrString(): Return the error as a string.
- GetPEErrLoc(): Return the function and line number of the error.
Updated pepy to account for these changes. The interface into pepy is
unchanged. The only externally visible differences are that pepy.parse()
will now return the error string and location when parsing fails, and
the baseofdata attribute will throw an exception if the binary is
PE32+.
to_string.h is now included from parse.h, so remove it from dump.cpp.
While here do a bunch of cleanups to make printing consistent. Use '0x'
where appropriate and ensure exceptions are punctuated correctly.
Instead of constantly defining and redefining the macros to read values,
just define them once. There are now three main ones (READ_WORD,
READ_DWORD and READ_BYTE) along with READ_DWORD_PTR and READ_DWORD_NULL.
Each macro takes a pointer to a bounded_buffer (what to read from), an
offset (where to read), and a structure and member (what to read into).
Use READ_DWORD_PTR when you have a pointer to a structure. Use
READ_DWORD_NULL when a failed read should return NULL; all the other
macros return false on failure.
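The calling convention can be sketched in Python as follows. This is only an analogy, not the macros themselves: BoundedBuffer, read_field and the read_* helpers are hypothetical names, a dict stands in for the target structure, and little-endian reads are assumed.

```python
import struct


class BoundedBuffer:
    """Hypothetical stand-in for bounded_buffer: bytes with a bounds check."""

    def __init__(self, data: bytes):
        self.data = data

    def read(self, offset: int, fmt: str):
        size = struct.calcsize(fmt)
        if offset < 0 or offset + size > len(self.data):
            return None        # out of bounds: signal failure to the caller
        return struct.unpack_from(fmt, self.data, offset)[0]


def read_field(buf, offset, target, member, fmt):
    """Read one value into target[member]; return False on failure."""
    value = buf.read(offset, fmt)
    if value is None:
        return False
    target[member] = value
    return True


def read_byte(buf, offset, target, member):
    return read_field(buf, offset, target, member, "<B")


def read_word(buf, offset, target, member):
    return read_field(buf, offset, target, member, "<H")


def read_dword(buf, offset, target, member):
    return read_field(buf, offset, target, member, "<I")
```

Each helper takes the same four arguments as the macros (buffer, offset, structure, member) and reports failure through its return value rather than an exception.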
Fixes #7.
I have a UPX packed sample that corrupted the resource directory. These changes
allow the resources to be properly parsed.
They add an RVA and size to the resource struct. This is the address and size
of the resource as it is declared in the directory. If the address is invalid,
create a zero-length buffer for the data. If the size is invalid (i.e., it goes
off the end of the .rsrc section), create a zero-length buffer for the data.
Otherwise, return the actual data.
This allows consumers of the rsrc to figure out if the resource is corrupt
or not by comparing the length of the buffer to the size element. If the
size is greater than 0 but the buffer is empty, then it's invalid.
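As a minimal Python sketch of this rule (the Resource class, load_resource and is_corrupt are hypothetical names for illustration, not the library's or pepy's API):

```python
class Resource:
    """Hypothetical mirror of the resource struct's new fields."""

    def __init__(self, rva: int, size: int, buf: bytes):
        self.rva = rva      # address as declared in the resource directory
        self.size = size    # size as declared in the resource directory
        self.buf = buf      # actual data, or empty if it could not be read


def load_resource(section: bytes, section_rva: int, rva: int, size: int):
    """Slice the resource out of .rsrc, or fall back to an empty buffer."""
    start = rva - section_rva
    end = start + size
    if start < 0 or start > len(section):
        return Resource(rva, size, b"")    # invalid address
    if size < 0 or end > len(section):
        return Resource(rva, size, b"")    # runs off the end of the section
    return Resource(rva, size, section[start:end])


def is_corrupt(res: Resource) -> bool:
    """Declared size is non-zero but no data could be read."""
    return res.size > 0 and len(res.buf) == 0
```

A consumer would call is_corrupt() on each resource and skip, or flag, those that return True.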
Also, although it should never happen, just to be safe make pepy catch NULL
buffers (in pepy_data_converter) and return an empty bytearray.
I had initially written this in such a way that it would break if there
were multiple entries anywhere other than the first table. This change
now works across more complex samples that I have tested against.
While here, I did a little moving around and had to create a structure
that isn't used other than to know how far to move the offset when
parsing. This is because the struct into which I am parsing the data
keeps track of other things along the way, so its size is incorrect.
While here, rename parse_resource() to parse_resource_table(), as that
name more accurately describes what it really does.