pepy
====
pepy (pronounced p-pie) is a Python binding to the pe-parse parser.

pepy supports Python versions 3.6 and above.

The easiest way to use pepy is to install it via pip:

```bash
$ pip3 install pepy
```

## Building

If you can build pe-parse and have a working Python environment (headers and
libraries), you can build pepy.

1. Build pepy:
  * `python3 setup.py build`
2. Install pepy:
  * `python3 setup.py install`

**Building on Windows:** Python 3.x is typically installed as _python.exe_,
**NOT** _python3.exe_.

## Using

### Parsed object

There are a number of objects involved in pepy. The main one is the **parsed**
object, which is returned by the `parse` function.

```python
import pepy
p = pepy.parse("/path/to/exe")
```

The **parsed** object has a number of methods:

* `get_entry_point`: Return the entry point address
* `get_machine_as_str`: Return the machine as a human-readable string
* `get_subsystem_as_str`: Return the subsystem as a human-readable string
* `get_bytes`: Return the first N bytes at a given address
* `get_sections`: Return a list of section objects
* `get_imports`: Return a list of import objects
* `get_exports`: Return a list of export objects
* `get_relocations`: Return a list of relocation objects
* `get_resources`: Return a list of resource objects

The **parsed** object has a number of attributes:

* `signature`
* `machine`
* `numberofsections`
* `timedatestamp`
* `numberofsymbols`
* `characteristics`
* `magic`
* `majorlinkerver`
* `minorlinkerver`
* `codesize`
* `initdatasize`
* `uninitdatasize`
* `entrypointaddr`
* `baseofcode`
* `baseofdata`
* `imagebase`
* `sectionalignement`
* `filealignment`
* `majorosver`
* `minorosver`
* `win32ver`
* `imagesize`
* `headersize`
* `checksum`
* `subsystem`
* `dllcharacteristics`
* `stackreservesize`
* `stackcommitsize`
* `heapreservesize`
* `heapcommitsize`
* `loaderflags`
* `rvasandsize`

Example:

```python
import time
import pepy

p = pepy.parse("/path/to/exe")
print("Timedatestamp: %s" % time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(p.timedatestamp)))
ep = p.get_entry_point()
print("Entry point: 0x%x" % ep)
```
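The `get_entry_point` and `get_bytes` methods can be combined to peek at the first few bytes of code. The following is a minimal sketch, not part of pepy: `entry_point_bytes` is a hypothetical helper, and it assumes that `get_bytes` takes the address first and the byte count second and returns a bytes-like object.

```python
def entry_point_bytes(parsed, count=8):
    """Return (entry_point, hex_string) for the bytes at the entry point.

    `parsed` is expected to be the object returned by pepy.parse().
    """
    ep = parsed.get_entry_point()
    # Assumption: get_bytes(address, length), returning a bytes-like object.
    data = bytes(parsed.get_bytes(ep, count))
    # Render each byte as two lowercase hex digits separated by spaces.
    return ep, " ".join("%02x" % b for b in data)
```

With pepy installed, this could be called as `entry_point_bytes(pepy.parse("/path/to/exe"))`.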

The `get_sections`, `get_imports`, `get_exports`, `get_relocations` and
`get_resources` methods each return a list of objects. The type of object
depends upon the method called. `get_sections` returns a list of `section`
objects, `get_imports` returns a list of `import` objects, etc.

### Section Object

The `section` object has the following attributes:

* `base`
* `length`
* `virtaddr`
* `virtsize`
* `numrelocs`
* `numlinenums`
* `characteristics`
* `data`
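
As an illustration of these attributes, the sketch below builds one summary line per section. `describe_sections` is a hypothetical helper, not a pepy function; it only assumes that the numeric attributes are integers and that `data` supports `len()`.

```python
def describe_sections(sections):
    """Return one human-readable summary line per section object."""
    lines = []
    for sec in sections:
        lines.append(
            "base=0x%x length=0x%x virtaddr=0x%x virtsize=0x%x data=%d bytes"
            % (sec.base, sec.length, sec.virtaddr, sec.virtsize, len(sec.data))
        )
    return lines
```

With pepy installed, the input would typically come from `pepy.parse(path).get_sections()`.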

### Import Object

The `import` object has the following attributes:

* `sym`
* `name`
* `addr`
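
A common task is grouping imports by the module they come from. The sketch below assumes that `name` holds the module (DLL) name and `sym` the imported symbol; this README does not spell that out, so treat it as an assumption. `group_imports` is a hypothetical helper, not part of pepy.

```python
from collections import defaultdict


def group_imports(imports):
    """Map each module name to a list of (symbol, address) tuples."""
    grouped = defaultdict(list)
    for imp in imports:
        # Assumption: `name` is the module (DLL), `sym` is the symbol.
        grouped[imp.name].append((imp.sym, imp.addr))
    # Return a plain dict so missing keys raise KeyError as usual.
    return dict(grouped)
```

With pepy installed, the input would typically come from `pepy.parse(path).get_imports()`.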

### Export Object

The `export` object has the following attributes:

* `mod`
* `func`
* `addr`
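
The sketch below formats exports as `module!function @ address` lines, sorted by address. `format_exports` is a hypothetical helper, not part of pepy, and it assumes `mod` and `func` are strings and `addr` is an integer.

```python
def format_exports(exports):
    """Return formatted export lines, ordered by ascending address."""
    ordered = sorted(exports, key=lambda exp: exp.addr)
    return ["%s!%s @ 0x%x" % (exp.mod, exp.func, exp.addr) for exp in ordered]
```

With pepy installed, the input would typically come from `pepy.parse(path).get_exports()`.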

### Relocation Object

The `relocation` object has the following attributes:

* `type`
* `addr`
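
To get a quick overview of a binary's relocations, the entries can be tallied by their `type` value. This is a minimal sketch; `count_relocation_types` is a hypothetical helper, not part of pepy, and it assumes `type` is a hashable value such as an integer.

```python
from collections import Counter


def count_relocation_types(relocations):
    """Return a Counter mapping each relocation type to its occurrence count."""
    return Counter(reloc.type for reloc in relocations)
```

With pepy installed, the input would typically come from `pepy.parse(path).get_relocations()`.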

### Resource Object

The `resource` object has the following attributes:

* `type_str`
* `name_str`
* `lang_str`
* `type`
* `name`
* `lang`
* `codepage`
* `RVA`
* `size`
* `data`

The `resource` object has the following methods:

* `type_as_str`

Resources are stored in a directory structure. The first three levels of the
directory are called `type`, `name` and `lang`. Each of these levels can have
either a pre-defined value or a custom string. The pre-defined values are
stored in the `type`, `name` and `lang` attributes. If a custom string is
found, it is stored in the `type_str`, `name_str` and `lang_str`
attributes. The `type_as_str` method can be used to convert a pre-defined
type value to a string representation.

The following code shows how to iterate through resources:

```python
import pepy

from hashlib import md5
import sys

p = pepy.parse(sys.argv[1])
resources = p.get_resources()
print("Resources: (%i)" % len(resources))
for resource in resources:
    print("[+] MD5: (%i) %s" % (len(resource.data), md5(resource.data).hexdigest()))
    if resource.type_str:
        print("\tType string: %s" % resource.type_str)
    else:
        print("\tType: %s (%s)" % (hex(resource.type), resource.type_as_str()))
    if resource.name_str:
        print("\tName string: %s" % resource.name_str)
    else:
        print("\tName: %s" % hex(resource.name))
    if resource.lang_str:
        print("\tLang string: %s" % resource.lang_str)
    else:
        print("\tLang: %s" % hex(resource.lang))
    print("\tCodepage: %s" % hex(resource.codepage))
    print("\tRVA: %s" % hex(resource.RVA))
    print("\tSize: %s" % hex(resource.size))
```

Note that some binaries (particularly packed ones) may have corrupt resource
entries. In these cases you may find that `len(resource.data)` is 0 but
`resource.size` is greater than 0. The `size` attribute is the size of the data
as declared by the resource data entry.
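
The mismatch described above can be checked for directly. The sketch below is one way to do it; `find_empty_resources` is a hypothetical helper, not part of pepy.

```python
def find_empty_resources(resources):
    """Return resources that declare a non-zero size but carry no data."""
    return [res for res in resources if res.size > 0 and len(res.data) == 0]
```

With pepy installed, calling this on `pepy.parse(path).get_resources()` would return the entries worth a closer look.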

## Authors

pe-parse was designed and implemented by Andrew Ruef (andrew@trailofbits.com).

pepy was written by Wesley Shields (wxs@atarininja.org).