I have a UPX-packed sample with a corrupted resource directory. These
changes allow its resources to be parsed properly.
They add an RVA and a size to the resource struct: the address and size
of the resource as declared in the directory. If the address is invalid,
create a zero-length buffer for the data. If the size is invalid (i.e., it
runs off the end of the .rsrc section), also create a zero-length buffer.
Otherwise, return the actual data.
This lets consumers of the rsrc determine whether a resource is corrupt
by comparing the length of the buffer to the size element: if the size
is greater than 0 but the buffer is empty, the resource is invalid.
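A minimal Python sketch of both halves of this scheme, assuming the .rsrc
section is available as raw bytes along with its virtual address. The
`Resource` class, `extract_resource_data()` and `is_corrupt()` are
hypothetical names for illustration, not pepy's actual API.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical stand-in for a pepy resource object."""
    rva: int         # RVA of the data as declared in the resource directory
    size: int        # size as declared in the resource directory
    data: bytearray  # actual bytes, or empty if rva/size were invalid


def extract_resource_data(rsrc: bytes, rsrc_va: int, rva: int, size: int) -> bytearray:
    """Return the resource bytes, or an empty bytearray if rva/size are invalid."""
    start = rva - rsrc_va
    # Invalid address: the RVA does not point inside the .rsrc section.
    if start < 0 or start >= len(rsrc):
        return bytearray()
    # Invalid size: the data would run off the end of the .rsrc section.
    if start + size > len(rsrc):
        return bytearray()
    return bytearray(rsrc[start:start + size])


def is_corrupt(res: Resource) -> bool:
    """A non-zero declared size with an empty buffer means the entry was invalid."""
    return res.size > 0 and len(res.data) == 0
```

The point of keeping `size` alongside `data` is that an empty buffer alone
is ambiguous: it could be a legitimately empty resource or a rejected one.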
Also, although it should never happen, make pepy catch NULL buffers (in
pepy_data_converter) and return an empty bytearray, just to be safe.
I had initially written this in a way that would break if there were
multiple entries in any table other than the first. This change works
on the more complex samples I have tested against.
While here, I moved some things around and had to create a structure
that is used only to know how far to advance the offset when parsing.
This is because the struct into which I parse the data keeps track of
other things along the way, so its size is not the on-disk size.
While here, rename parse_resource() to parse_resource_table(), which
more accurately describes what it does.
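A rough Python sketch of the same idea, assuming the standard PE on-disk
layouts: IMAGE_RESOURCE_DIRECTORY (16 bytes) followed by 8-byte
IMAGE_RESOURCE_DIRECTORY_ENTRY records, where the high bit of
OffsetToData marks a subdirectory. The offset always advances by these
on-disk sizes, never by the size of the record the parser builds. The
function names and the depth limit are illustrative assumptions, not
pepy's code.

```python
import struct
from typing import List, Tuple

# On-disk layouts from the PE format; only these sizes decide how far to advance.
RESOURCE_DIRECTORY = struct.Struct("<IIHHHH")    # IMAGE_RESOURCE_DIRECTORY, 16 bytes
RESOURCE_DIRECTORY_ENTRY = struct.Struct("<II")  # IMAGE_RESOURCE_DIRECTORY_ENTRY, 8 bytes
SUBDIRECTORY_FLAG = 0x80000000


def parse_resource_table(rsrc: bytes, offset: int) -> List[Tuple[int, int, bool]]:
    """Return (name_or_id, target_offset, is_subdirectory) for every entry in one table."""
    (_chars, _timestamp, _major, _minor,
     num_named, num_ids) = RESOURCE_DIRECTORY.unpack_from(rsrc, offset)
    offset += RESOURCE_DIRECTORY.size
    entries = []
    # Named entries come first, then ID entries; handle all of them, in every table.
    for _ in range(num_named + num_ids):
        name, data = RESOURCE_DIRECTORY_ENTRY.unpack_from(rsrc, offset)
        offset += RESOURCE_DIRECTORY_ENTRY.size  # on-disk size, not our record size
        entries.append((name, data & 0x7FFFFFFF, bool(data & SUBDIRECTORY_FLAG)))
    return entries


def walk_resource_tree(rsrc: bytes, offset: int = 0, depth: int = 0,
                       max_depth: int = 8) -> List[int]:
    """Collect leaf data-entry offsets from every table, not just the first one."""
    if depth > max_depth:  # arbitrary guard against corrupt, self-referencing trees
        return []
    leaves = []
    for _name, target, is_dir in parse_resource_table(rsrc, offset):
        if is_dir:
            leaves.extend(walk_resource_tree(rsrc, target, depth + 1, max_depth))
        else:
            leaves.append(target)
    return leaves
```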
Iterating through the bytearray would crash Python if a byte value was
0x78. I have a test sample where the first 8 bytes at the entry point
are 0xe8 0xa6 0x4e 0x0 0x0 0xe9 0x78 0xfe. Without this workaround, it
crashes when trying to get the 7th byte (0x78, at index 6) out of the
array.
If get_bytes() does not fill the list, take a slice of what was filled
and convert that to a bytearray. I would still like to find a way to
use a bytearray from the start. Luckily, with the rest of this commit I
no longer need to call get_bytes() on sections.
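A small Python model of the partial-fill handling, assuming a reader that
allocates a fixed-size list and reports how many slots it actually
filled. This `get_bytes()` is a hypothetical pure-Python stand-in for
illustration, not pepy's C implementation; the sample bytes are the ones
quoted above.

```python
from typing import List, Tuple


def get_bytes(buf: bytes, offset: int, count: int) -> Tuple[List[int], int]:
    """Hypothetical reader: returns a count-sized list and how many slots were filled."""
    out = [0] * count
    filled = 0
    for i in range(count):
        if offset + i >= len(buf):
            break  # ran off the end of the buffer; remaining slots stay 0
        out[i] = buf[offset + i]
        filled += 1
    return out, filled


def to_bytearray(values: List[int], filled: int) -> bytearray:
    # Convert only the filled prefix so the zero padding never reaches the caller.
    return bytearray(values[:filled])


entry = bytes([0xE8, 0xA6, 0x4E, 0x00, 0x00, 0xE9, 0x78, 0xFE])
values, n = get_bytes(entry, 4, 10)
print(n, to_bytearray(values, n))
```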
Sections now have a data attribute, which is a bytearray of the data
that makes up that section. This way you can use the section.data
attribute to get the entire contents and operate on it as you wish.
Make test.py use section.data to generate an MD5 of each section. It
also prints the first 10 bytes of each section (if it has any).
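A sketch of what such a test.py loop body might do, assuming only that a
section exposes `name` and `data` (a bytearray). The `Section` class and
`describe()` helper are hypothetical stand-ins so the snippet runs
without pepy; `hashlib.md5` accepts a bytearray directly.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical stand-in for a pepy section object."""
    name: str
    data: bytearray


def describe(section: Section) -> str:
    """One-line summary: MD5 of the whole section plus up to 10 leading bytes."""
    md5 = hashlib.md5(section.data).hexdigest()
    line = f"{section.name}: md5={md5}"
    if section.data:  # only print bytes when the section has any
        preview = " ".join(f"{b:#04x}" for b in section.data[:10])
        line += f" first10=[{preview}]"
    return line


print(describe(Section(".text", bytearray(b"abc"))))
```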
It probably isn't the best way to do it, but I couldn't get anything to
work when trying to create a bytearray object directly. As a workaround,
I first put each byte into a list and then convert the list to a
bytearray.
Instead of having two macros for each object, simplify by having one
set of macros that works across all objects except the parsed object. I
could make this work for the parsed object by having it store PyObject
pointers to the parsed values instead of creating them on the fly when
an attribute is accessed.
Might as well do some general cleanup too:
- Rename the len attribute of a section to length.
- Make the section, import, and export callbacks return 0 on success
  and non-zero on failure.
- Whitespace fixes.
- Fix a bunch of copy/paste mistakes in the test script.
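A Python illustration of the 0-on-success callback convention. The
`iterate()` driver is hypothetical, and the choice to stop at the first
non-zero return and propagate it is an assumption made for the sketch,
not a documented pepy behavior.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def iterate(items: Iterable[T], callback: Callable[[T], int]) -> int:
    """Hypothetical driver: 0 from the callback means success; anything else is failure."""
    for item in items:
        rc = callback(item)
        if rc != 0:
            return rc  # assumed: stop at the first failure and report its code
    return 0
```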
This means I don't have to store anything in the pepy_parsed object
(neither PyObject pointers nor native C types). Use a macro to get
values out of the parsed structures and into Python objects.
There was some odd memory corruption caused by how pepy_parsed_init()
parsed its arguments. As a result, accessing attributes or methods that
didn't exist would intermittently cause segfaults. This code was left
over from an earlier approach and doesn't need to work this way.
Instead, parse the argument straight into a C-style string.
Also implement support for the signature and machine values.
Also, add Py_TPFLAGS_BASETYPE so the type can be subclassed.
Convert the PyObject pointers used inside pepy_parsed into their corresponding
native types and use those. Teach the members array to return them accordingly.
While here, might as well add support for the signature and machine
values.
Also, convert test.py to have shorter output by not using pprint.
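For reference, a standalone Python sketch of where these two values live
in a PE file: e_lfanew at offset 0x3C of the DOS header points at the
4-byte "PE\0\0" signature, which is immediately followed by the 2-byte
Machine field of the file header. The helper names and the short
two-entry machine table are illustrative only; this reads raw bytes
rather than using pepy.

```python
import struct

PE_SIGNATURE = 0x00004550  # b"PE\0\0" read as a little-endian DWORD
MACHINE_NAMES = {
    0x014C: "IMAGE_FILE_MACHINE_I386",
    0x8664: "IMAGE_FILE_MACHINE_AMD64",
}


def read_signature_and_machine(image: bytes) -> tuple:
    """Return (signature, machine) using e_lfanew from the DOS header."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    # Signature DWORD followed directly by IMAGE_FILE_HEADER.Machine (WORD).
    signature, machine = struct.unpack_from("<IH", image, e_lfanew)
    return signature, machine


def format_header(signature: int, machine: int) -> str:
    """Compact one-line output instead of a pprint dump."""
    name = MACHINE_NAMES.get(machine, "unknown")
    return f"signature={signature:#x} machine={machine:#06x} ({name})"
```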