Reven API Examples
This book gathers the examples of the Reven Python API.
These examples are distributed with Reven and can be found in the package.
Other scripts for the Reven Python API include ltrace and File activity that are available on GitHub.
Analyze examples
The examples in this section use the reven2 Python API to analyze a Reven scenario.
Search in memory
Purpose
Search the memory at a specific context for a string or for an array of bytes.
The memory range to search in is defined by a starting address and a search_size.
All unmapped addresses are ignored during the search.
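As a rough illustration of the idea, and not the script's actual implementation (which is listed in the Source section), the sketch below searches a sequence of fixed-size pages, skips unmapped ones, and carries the last `len(pattern) - 1` bytes across page boundaries so that matches spanning two pages are still found. The `search_pages` name and the use of `None` to stand for an unmapped page are assumptions made for this example.

```python
PAGE_SIZE = 0x1000  # 4K pages, as in the script


def search_pages(pages, pattern, page_size=PAGE_SIZE):
    """Yield the byte offsets at which `pattern` occurs in a list of pages.

    `pages[i]` holds the bytes of page i, or None when the page is unmapped
    (a hypothetical representation chosen for this sketch).
    """
    overlap = len(pattern) - 1
    tail = b""  # last bytes of the previous mapped page
    for page_index, page in enumerate(pages):
        if page is None:
            # Unmapped page: skip it and drop the carried-over bytes,
            # because a match cannot span an unmapped hole.
            tail = b""
            continue
        buffer = tail + page
        base = page_index * page_size - len(tail)
        index = buffer.find(pattern)
        while index != -1:
            yield base + index
            index = buffer.find(pattern, index + 1)
        # Keep just enough bytes to detect a match crossing into the next page.
        tail = page[-overlap:] if overlap > 0 else b""
```

Because the carried-over tail is one byte shorter than the pattern, a match can never lie entirely inside it, so no match is reported twice.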
How to use
usage: search_in_memory.py [-h] --host HOST -p PORT --transition TRANSITION
--address ADDRESS --pattern PATTERN
[--search-size SEARCH_SIZE] [--backward]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
--transition TRANSITION
transition id. the context before this id will be
searched
--address ADDRESS The start address of the memory area to search in. It
can be a hex offset as 0xfff123 (same as ds:0xfff123),
hex offset prefixed by segment register as
gs:0xfff123, hex offset prefixed by hex segment index
as 0x20:0xfff123, hex offset prefixed by 'lin' for
linear address, or offset prefixed by 'phy' for
physical address.
--pattern PATTERN pattern that will be searched. It can be a normal
string as 'test', or a string of bytes as
'\x01\x02\x03\x04'. Maximum accepted length is 4096
--search-size SEARCH_SIZE
The size of memory area to search in. accepted value
can take a suffix, like 1000, 10kb or 10mb. Default
value is 1000mb
--backward If present the search will go in backward direction.
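The --pattern option accepts escape sequences such as '\x01\x02\x03\x04' typed literally on the command line. One possible way to turn such an argument into raw bytes is sketched below. `decode_pattern` is a hypothetical helper, not a function of the script, and it assumes the argument contains only ASCII characters.

```python
def decode_pattern(text):
    """Convert a command-line pattern, with or without escape sequences, to bytes."""
    # "unicode_escape" interprets backslash sequences such as \x01; "latin-1"
    # then maps each resulting code point (0-255) back to a single byte.
    return text.encode("ascii").decode("unicode_escape").encode("latin-1")
```

A plain string such as `test` passes through unchanged, while a literal backslash-x sequence is collapsed into the byte it denotes.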
Known limitations
- Currently, this script cannot handle logical addresses that are not aligned on a memory page (4K) with their corresponding physical address. In 64 bits, this can happen mainly for the gs and fs segment registers. If you encounter this limitation, you can manually translate your virtual address using its translate method, and then restart the search on the resulting physical address (limiting the search range to 4K, so as to remain in the boundaries of the virtual page).
- Pattern length must be less than or equal to the page size (4k).
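The alignment condition in the first limitation can be stated in a couple of lines. The sketch below works on plain integer offsets and does not use the Reven API. `is_page_aligned` and `bytes_left_in_page` are hypothetical names introduced for illustration only.

```python
PAGE_SIZE = 0x1000  # 4K pages


def is_page_aligned(virtual_offset, physical_offset, page_size=PAGE_SIZE):
    """Return True when both addresses sit at the same offset inside their page."""
    return virtual_offset % page_size == physical_offset % page_size


def bytes_left_in_page(offset, page_size=PAGE_SIZE):
    """Return the largest forward search size from `offset` that stays in its page."""
    return page_size - (offset % page_size)
```

When `is_page_aligned` is false for a translated address, the workaround described above amounts to searching from the physical address with a search size of at most `bytes_left_in_page(physical_offset)`.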
Supported versions
Reven 2.6+.
Supported perimeter
Any Reven scenario.
Dependencies
None.
Source
import argparse
import sys
from copy import copy
import reven2
import reven2.address as _address
import reven2.arch.x64 as x64_regs
"""
# Search in memory
## Purpose
Search the memory at a specific context for a string or for an array of bytes.
The memory range to search in is defined by a starting address and a search_size.
All unmapped addresses are ignored during the search.
## How to use
```bash
usage: search_in_memory.py [-h] --host HOST -p PORT --transition TRANSITION
--address ADDRESS --pattern PATTERN
[--search-size SEARCH_SIZE] [--backward]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
--transition TRANSITION
transition id. the context before this id will be
searched
--address ADDRESS The start address of the memory area to search in. It
can be a hex offset as 0xfff123 (same as ds:0xfff123),
hex offset prefixed by segment register as
gs:0xfff123, hex offset prefixed by hex segment index
as 0x20:0xfff123, hex offset prefixed by 'lin' for
linear address, or offset prefixed by 'phy' for
physical address.
--pattern PATTERN pattern that will be searched. It can be a normal
string as 'test', or a string of bytes as
'\x01\x02\x03\x04'. Maximum accepted length is 4096
--search-size SEARCH_SIZE
The size of memory area to search in. accepted value
can take a suffix, like 1000, 10kb or 10mb. Default
value is 1000mb
--backward If present the search will go in backward direction.
```
## Known limitations
- Currently, this script cannot handle logical addresses that are not aligned on a memory page (4K)
with their corresponding physical address. In 64 bits, this can happen mainly for
the `gs` and `fs` segment registers.
If you encounter this limitation, you can manually translate your virtual address
using its `translate` method, and then restart the search on the resulting physical address
(limiting the search range to 4K, so as to remain in the boundaries of the virtual page).
- Pattern length must be less than or equal to the page size (4k).
## Supported versions
Reven 2.6+.
## Supported perimeter
Any Reven scenario.
## Dependencies
None.
"""


class MemoryFinder(object):
    r"""
    This class is a helper class to search the memory at a specific context for a string or for an array of bytes.
    The memory range to search in is defined by a starting address and a search_size.
    The matching addresses are returned.

    Known limitation
    ================

    Currently, this class cannot handle logical addresses that are not aligned on a memory page (4K)
    with their corresponding physical address. In 64 bits, this can happen mainly for
    the `gs` and `fs` segment registers.

    If you encounter this limitation, you can manually translate your virtual address
    using its `translate` method, and then restart the search on the resulting physical address
    (limiting the search range to 4K, so as to remain in the boundaries of the virtual page).

    Pattern length must be less than or equal to page size (4k).

    Examples
    ========

    >>> # Search the first context starting from the address ds:0xfffff123123 for the string 'string'
    >>> # Search_size default value is 1000MB.
    >>> # Memory range to search in is: [ds:0xfffff123123, ds:0xfffff123123 + 1000MB]
    >>> finder = MemoryFinder(context, 0xfffff123123)
    >>> for address, progress in finder.query('string'):
    ...     sys.stderr.write("progress: %d%s\r" % (int(progress / finder.search_size * 100), '%'))
    ...     if address:
    ...         print("found match at {}".format(address))
    found match at ds:0xfffff123444
    ...

    >>> # Search the first context starting from the address lin:0xfffff123123 for the
    >>> # array of bytes '\\x35\\xfe\\x0e\\x4a'
    >>> # Search size default value is 1000MB
    >>> # Memory range to search in is: [lin:0xfffff123123, lin:0xfffff123123 + 1000MB]
    >>> address = reven2.address.LinearAddress(0xfffff123123)
    >>> finder = MemoryFinder(context, address)
    >>> for address, progress in finder.query('\\x35\\xfe\\x0e\\x4a'):
    ...     sys.stderr.write("progress: %d%s\r" % (int(progress / finder.search_size * 100), '%'))
    ...     if address:
    ...         print("found match at {}".format(address))
    found match at ds:0xfffff125229
    ...

    >>> # Search the first context starting from the address gs:0x180 for the string 'string'
    >>> # Search size value is 100MB
    >>> # Memory range to search in is: [gs:0x180, gs:0x180 + 100MB]
    >>> address = reven2.address.LogicalAddress(0x180, reven2.arch.x64.gs)
    >>> finder = MemoryFinder(context, address, 100*1024*1024)
    >>> for address, progress in finder.query('string'):
    ...     sys.stderr.write("progress: %d%s\r" % (int(progress / finder.search_size * 100), '%'))
    ...     if address:
    ...         print("found match at {}".format(address))
    found match at ds:0xfffff123444
    ...

    >>> # Search the first context starting from the address ds:0xfffff123123 for the string 'string'
    >>> # in backward direction.
    >>> # Search_size default value is 1000MB.
    >>> # Memory range to search in is: [ds:0xfffff123123 - 1000MB, ds:0xfffff123123]
    >>> finder = MemoryFinder(context, 0xfffff123123)
    >>> for address, progress in finder.query('string', False):
    ...     sys.stderr.write("progress: %d%s\r" % (int(progress / finder.search_size * 100), '%'))
    ...     if address:
    ...         print("found match at {}".format(address))
    found match at ds:0xfffff123004
    ...
    """

    page_size = 0x1000
    progress_step = 0x10000

    def __init__(self, context, address, search_size=1000 * 1024**2):
        r"""
        Initialize a C{MemoryFinder} from context and address

        Information
        ===========

        @param context: C{reven2.trace.Context} where searching will be done.
        @param address: an instance of a class from C{reven2.address}, the address where the search will be started.
        @param search_size: an C{Integer} representing the size, in bytes, of the search range.

        @raises TypeError: if context is not a C{reven2.trace.Context} or address is not a C{Integer} or
                           one of the address classes on C{reven2.address}.
        @raises RuntimeError: If the address is a virtual address that is not aligned to its
                              corresponding physical address.
        """
        if not isinstance(context, reven2.trace.Context):
            raise TypeError("context must be an instance of reven2.trace.Context class")
        self._context = context
        search_addr = copy(address)
        if not isinstance(search_addr, _address._AbstractAddress):
            try:
                # if address is of type int make it a logical address with ds as segment register
                search_addr = _address.LogicalAddress(address)
            except TypeError:
                raise TypeError(
                    "address must be an instance of a class from reven2.address " "module or an integer value."
                )
        self._search_size = search_size
        self._start = search_addr

    @property
    def search_size(self):
        return self._search_size

    def query(self, pattern, is_forward=True):
        r"""
        Iterate the search range looking for the specified pattern.

        This method returns a generator of tuples, of the form C{(A, processed_bytes)}, such that:

        - C{processed_bytes} indicates the number of bytes already processed in the search range.
        - C{A} is either an address of the same type as the input address, or C{None}.
          If an address is returned, it corresponds to an address matching the searched pattern.
          C{None} is returned as a means of indicating progress, each time C{processed_bytes} reaches
          a multiple of C{progress_step} (0x10000 bytes, i.e. 64KB) while skipping unreadable pages.

        Information
        ===========

        @param pattern: A C{str} or C{bytearray}. The pattern to look for in memory.
                        Note: a C{str} pattern is converted to a bytearray using the default
                        C{str.encode} encoding (UTF-8).
        @param is_forward: C{bool}, C{True} to search in forward direction and C{False}
                           to search in backward direction

        @returns: a generator of tuples, where the tuples are either:
                  - C{(None, processed_bytes)} on 64KB progress boundaries while skipping unreadable pages,
                  - C{(matching_address, processed_bytes)} each time a matching_address is found.
        """
        # pattern is a byte array or a string
        search_pattern = copy(pattern)
        if not isinstance(search_pattern, bytearray):
            if isinstance(search_pattern, str):
                search_pattern = bytearray(str.encode(pattern))
            else:
                raise RuntimeError("Cannot parse pattern, bad format.")
        if len(search_pattern) > self.page_size:
            raise RuntimeError("Maximum length of pattern must be less than or equal to %d." % self.page_size)
        return self._search(search_pattern, is_forward)

    def _search(self, pattern, is_forward):
        def loop_condition(curr, end):
            return curr < end if is_forward else curr > end

        cross_page_addition = len(pattern) - 1
        iteration_step = self.page_size if is_forward else -self.page_size
        curr = self._start
        end = curr + self._search_size if is_forward else curr - self._search_size
        prev = None
        progress = 0

        # The first loop finds the first mapped address, then checks whether it is aligned.
        # This step is skipped for physical addresses.
        if not isinstance(curr, reven2.address.PhysicalAddress):
            while loop_condition(curr, end):
                phy = curr.translate(self._context)
                if phy is None:
                    curr += iteration_step
                    progress += self.page_size
                    if progress % self.progress_step == 0:
                        yield None, progress
                    continue
                # linear -> physical alignment is guaranteed on 4k boundary:
                #   If linear is 0xxxxx123, physical will be 0xyyyy123
                # logical -> linear alignment is not guaranteed because segment offset goes down to the byte
                #   (or at least down to less than 4k): logical gs:0x123 could be linear 0xzzzzz456
                # Problem is: gs:0x0 might not be at start of page, 0x0:0x1000 might span on two pages
                #   instead of one. To solve: we need to translate logical -> physical for start address,
                #   and take note of offset to use that to compute actual start of page
                # currently, we don't treat the case where logical -> linear alignment isn't valid.
                if curr.offset % self.page_size != phy.offset % self.page_size:
                    raise RuntimeError(
                        "The provided address is not aligned on a memory page (4K) "
                        "with its corresponding physical address. Only aligned "
                        "addresses can be handled."
                    )
                break

        # The second loop performs the search
        while loop_condition(curr, end):
            # Offset between the current address and the start of its page.
            # It is zero on every iteration except possibly the first one.
            offset = curr.offset % self.page_size
            # Length of the buffer to read.
            # It equals the page size on every iteration except possibly the first one.
            buffer_length = self.page_size if offset == 0 else (self.page_size - offset if is_forward else offset)
            # the iteration step to go forward or backward
            iteration_step = buffer_length if is_forward else -buffer_length
            # Address to read from.
            # Forward: the current address.
            # Backward: we read up to the current address, so it is current - buffer length.
            read_address = curr if is_forward else curr - buffer_length
            # if the read buffer would exceed the search range, adjust it
            if is_forward and read_address + buffer_length > end:
                buffer_length = end.offset - read_address.offset
            elif not is_forward and read_address < end:
                read_address = end
                buffer_length = curr.offset - read_address.offset
            try:
                buffer = self._context.read(read_address, buffer_length, raw=True)
            except Exception:
                curr += iteration_step
                progress += self.page_size
                prev = None
                if progress % self.progress_step == 0:
                    yield None, progress
                continue

            # Add necessary bytes from previous page to allow cross-page matches
            addr_offset = 0
            if prev is not None:
                if is_forward:
                    prev_buf_len = -len(prev) if cross_page_addition > len(prev) else -cross_page_addition
                    buffer = prev[prev_buf_len:] + buffer if prev_buf_len < 0 else buffer
                    addr_offset = prev_buf_len
                else:
                    prev_buf_len = len(prev) if cross_page_addition > len(prev) else cross_page_addition
                    buffer = buffer + prev[:prev_buf_len]

            index = 0
            addr_res = []
            while True:
                index = buffer.find(pattern, index)
                if index == -1:
                    break
                addr_res.append(read_address + index + addr_offset)
                index += 1
            for addr in addr_res if is_forward else reversed(addr_res):
                yield addr, progress
            progress += self.page_size
            prev = buffer
            curr += iteration_step


def parse_address(string_address):
    segments = [x64_regs.ds, x64_regs.cs, x64_regs.es, x64_regs.ss, x64_regs.gs, x64_regs.fs]

    def _str_to_seg(str_reg):
        for segment in segments:
            if str_reg == segment.name:
                return segment
        return None

    try:
        # Try to parse address as offset only as 0xfff123.
        return _address.LogicalAddress(int(string_address, base=16))
    except ValueError:
        pass

    # Try to parse address as prefix:offset as 0x32:0xfff123, gs:0xfff123, lin:0xfff123 or phy:0xff123.
    res = string_address.split(":")
    if len(res) != 2:
        raise RuntimeError("Cannot parse address, bad format")
    try:
        offset = int(res[1].strip(), base=16)
    except ValueError:
        raise RuntimeError("Cannot parse address, bad format")
    try:
        # Try to parse it as 0x32:0xfff123.
        segment_index = int(res[0].strip(), base=16)
        return _address.LogicalAddressSegmentIndex(segment_index, offset)
    except ValueError:
        pass

    lower_res0 = res[0].lower().strip()
    # Try to parse it as ds:0xfff123, cs:0xfff123, es:0xfff123, ss:0xfff123, gs:0xfff123 or fs:0xfff123.
    sreg = _str_to_seg(lower_res0)
    if sreg:
        return _address.LogicalAddress(offset, sreg)
    elif lower_res0 == "lin":
        # Try to parse it as lin:0xfff123.
        return _address.LinearAddress(offset)
    elif lower_res0 == "phy":
        # Try to parse it as phy:0xfff123.
        return _address.PhysicalAddress(offset)
    else:
        raise RuntimeError("Cannot parse address, bad format")


def parse_search_size(string_size):
    try:
        # try to convert it to int
        return int(string_size)
    except ValueError:
        pass

    # try to convert it to int without the last two characters
    lower_string = string_size.lower()
    ssize = lower_string[:-2]
    try:
        size = int(ssize)
    except ValueError:
        raise RuntimeError("Cannot parse search size, bad format")

    # convert it according to its suffix
    if lower_string.endswith("kb"):
        return size * 1024
    elif lower_string.endswith("mb"):
        return size * 1024 * 1024
    else:
        raise RuntimeError("Cannot parse search size, bad format")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--host", type=str, default="localhost", required=True, help='Reven host, as a string (default: "localhost")'
    )
    parser.add_argument(
        "-p", "--port", type=int, default="13370", required=True, help="Reven port, as an int (default: 13370)"
    )
    parser.add_argument(
        "--transition", type=int, required=True, help="transition id. the context before this id will be searched"
    )
    parser.add_argument(
        "--address",
        type=str,
        required=True,
        help="The start address of the memory area to search in. "
        "It can be a hex offset as 0xfff123 (same as ds:0xfff123), "
        "hex offset prefixed by segment register as gs:0xfff123, "
        "hex offset prefixed by hex segment index as 0x20:0xfff123, "
        "hex offset prefixed by 'lin' for linear address, "
        "or offset prefixed by 'phy' for physical address.",
    )
    parser.add_argument(
        "--pattern",
        type=str,
        required=True,
        help="pattern that will be searched. "
        "It can be a normal string as 'test', "
        "or a string of bytes as '\\x01\\x02\\x03\\x04'. "
        "Maximum accepted length is 4096",
    )
    parser.add_argument(
        "--search-size",
        type=str,
        default="1000mb",
        help="The size of memory area to search in. "
        "accepted value can take a suffix, like 1000, 10kb or 10mb. "
        "Default value is 1000mb",
    )
    parser.add_argument(
        "--backward", default=False, action="store_true", help="If present the search will go in backward direction."
    )
    args = parser.parse_args()

    try:
        # Interpret escape sequences such as \x01 typed literally on the command line.
        pattern = bytearray(map(ord, bytearray(map(ord, args.pattern.strip())).decode("unicode_escape")))
    except Exception as e:
        raise RuntimeError("Cannot parse pattern, bad format (%s)" % str(e))
    address = parse_address(args.address.strip())

    reven_server = reven2.RevenServer(args.host, args.port)
    context = reven_server.trace.context_before(args.transition)
    finder = MemoryFinder(context, address, parse_search_size(args.search_size.strip()))
    for address, progress in finder.query(pattern, not args.backward):
        sys.stderr.write("progress: %d%s\r" % (int(progress / finder.search_size * 100), "%"))
        if address:
            print("found match at {}".format(address))
Bookmarks to WinDbg breakpoints
Purpose
This notebook and script are designed to convert the bookmarks of a scenario to WinDbg breakpoints. The meat of the script uses the ability of the API to iterate on the bookmarks of a Reven scenario, as well as the OSSI location, to generate a list of breakpoint commands for WinDbg where the addresses are independent of the Reven scenario itself:
for bookmark in self._server.bookmarks.all():
    location = bookmark.transition.context_before().ossi.location()
    print(f"bp {location.binary.name}+{location.rva:#x}\r\n")
The output of the script is a list of WinDbg breakpoint commands corresponding to the relative virtual address of the location of each of the bookmarks. This list of commands can either be copy-pasted into WinDbg or output to a file, which can then be executed in WinDbg using the following syntax:
$<breakpoints.txt
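To make the shape of the generated file concrete, here is a minimal sketch that builds the same kind of command list from plain `(binary_name, rva)` pairs instead of a live Reven scenario. `format_breakpoints` and the sample values are hypothetical. The replacement of `ntoskrnl` by `nt` mirrors what the script's source does for the kernel.

```python
def format_breakpoints(locations):
    """Build WinDbg `bp` commands from (binary_name, rva) pairs."""
    lines = []
    for name, rva in locations:
        if name is None or rva is None:
            # Nothing to resolve for this bookmark: skip it.
            continue
        if name == "ntoskrnl":
            # The script's source uses "nt" as the kernel module name for WinDbg.
            name = "nt"
        lines.append(f"bp {name}+{rva:#x}\r\n")
    return "".join(lines)
```

Writing the returned string to `breakpoints.txt` gives a file that can be passed to the `$<` command shown above.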
How to use
Bookmarks can be converted from this notebook or from the command line. The script can also be imported as a module for use from your own script or notebook.
From the notebook
- Upload the bk2bp.ipynb file in Jupyter.
- Fill out the parameters cell of this notebook according to your scenario and desired output.
- Run the full notebook.
From the command line
- Make sure that you are in an environment that can run Reven scripts.
- Run python bk2bp.py --help to get a tour of available arguments.
- Run python bk2bp.py --host <your_host> --port <your_port> [<other_option>] with your arguments of choice.
Imported in your own script or notebook
- Make sure that you are in an environment that can run Reven scripts.
- Make sure that bk2bp.py is in the same directory as your script or notebook.
- Add import bk2bp to your script or notebook. You can access the various functions and classes exposed by the module from the bk2bp namespace.
- Refer to the Argument parsing cell for an example of use in a script, and to the Parameters cell and below for an example of use in a notebook (you just need to prepend bk2bp in front of the functions and classes from the script).
Known limitations
- For the breakpoints to be resolved by WinDbg, the debugged program/machine/Reven scenario needs to be in a state where the corresponding modules have been loaded. Otherwise, WinDbg will add the breakpoints in an unresolved state, and may mix up modules and symbols.
- When importing breakpoints generated from the bookmarks of a scenario using this script in WinDbg, make sure that the debugged system is "similar enough" to the VM that was used to record the scenario. In particular, if a binary changed and has symbols at different offsets in the debugged system, importing the breakpoints will not lead to the correct location in the binary, and may render the debugged system unstable.
Supported versions
Reven 2.8+
Supported perimeter
Any Windows Reven scenario.
Dependencies
The script requires that the target Reven scenario have:
- The Fast Search feature replayed.
- The OSSI feature replayed.
Source
# ---
# jupyter:
# jupytext:
# formats: ipynb,py:percent
# text_representation:
# extension: .py
# format_name: percent
# kernelspec:
# display_name: reven
# language: python
# name: reven-python3
# ---
# %% [markdown]
# # Bookmarks to WinDbg breakpoints
#
# ## Purpose
#
# This notebook and script are designed to convert the bookmarks of a scenario to WinDbg breakpoints.
#
# The meat of the script uses the ability of the API to iterate on the bookmarks of a Reven scenario, as well as the
# OSSI location, to generate a list of breakpoint commands for WinDbg where the addresses are independent of the Reven
# scenario itself:
#
# ```py
# for bookmark in self._server.bookmarks.all():
#     location = bookmark.transition.context_before().ossi.location()
#     print(f"bp {location.binary.name}+{location.rva:#x}\r\n")
# ```
#
# The output of the script is a list of WinDbg breakpoint commands corresponding to the relative virtual address
# of the location of each of the bookmarks.
#
# This list of commands can either be copy-pasted into WinDbg or output to a file, which can then be executed in WinDbg
# using the following syntax:
#
# ```kd
# $<breakpoints.txt
# ```
#
# ## How to use
#
# Bookmarks can be converted from this notebook or from the command line.
# The script can also be imported as a module for use from your own script or notebook.
#
#
# ### From the notebook
#
# 1. Upload the `bk2bp.ipynb` file in Jupyter.
# 2. Fill out the [parameters](#Parameters) cell of this notebook according to your scenario and desired output.
# 3. Run the full notebook.
#
#
# ### From the command line
#
# 1. Make sure that you are in an environment that can run Reven scripts.
# 2. Run `python bk2bp.py --help` to get a tour of available arguments.
# 3. Run `python bk2bp.py --host <your_host> --port <your_port> [<other_option>]` with your arguments of
# choice.
#
# ### Imported in your own script or notebook
#
# 1. Make sure that you are in an environment that can run Reven scripts.
# 2. Make sure that `bk2bp.py` is in the same directory as your script or notebook.
# 3. Add `import bk2bp` to your script or notebook. You can access the various functions and classes
# exposed by the module from the `bk2bp` namespace.
# 4. Refer to the [Argument parsing](#Argument-parsing) cell for an example of use in a script, and to the
# [Parameters](#Parameters) cell and below for an example of use in a notebook (you just need to prepend
# `bk2bp` in front of the functions and classes from the script).
#
# ## Known limitations
#
# - For the breakpoints to be resolved by WinDbg, the debugged program/machine/Reven scenario needs to be in a state
# where the corresponding modules have been loaded. Otherwise, WinDbg will add the breakpoints in an unresolved state,
# and may mix up modules and symbols.
#
# - When importing breakpoints generated from the bookmarks of a scenario using this script in WinDbg,
# make sure that the debugged system is "similar enough" to the VM that was used to record the scenario.
# In particular, if a binary changed and has symbols at different offsets in the debugged system, importing
# the breakpoints will not lead to the correct location in the binary, and may render the debugged system unstable.
#
# ## Supported versions
#
# Reven 2.8+
#
# ## Supported perimeter
#
# Any Windows Reven scenario.
#
# ## Dependencies
#
# The script requires that the target Reven scenario have:
#
# * The Fast Search feature replayed.
# * The OSSI feature replayed.
# %% [markdown]
# ### Package imports
# %%
import argparse
from typing import Optional
import reven2 # type: ignore
# %% [markdown]
# ### Utility functions
# %%
# Detect if we are currently running a Jupyter notebook.
#
# This is used e.g. to display rendered results inline in Jupyter when we are executing in the context of a Jupyter
# notebook, or to display raw results on the standard output when we are executing in the context of a script.
def in_notebook():
    try:
        from IPython import get_ipython  # type: ignore

        if get_ipython() is None or ("IPKernelApp" not in get_ipython().config):
            return False
    except ImportError:
        return False
    return True
# %% [markdown]
# ### Main function
# %%
def bk2bp(server: reven2.RevenServer, output: Optional[str]):
    text = ""
    for bookmark in server.bookmarks.all():
        ossi = bookmark.transition.context_before().ossi
        if ossi is None:
            continue
        location = ossi.location()
        if location is None:
            continue
        if location.binary is None:
            continue
        if location.rva is None:
            continue
        name = location.binary.name
        # WinDbg requires the precise name of the kernel, which is difficult to get.
        # WinDbg seems to always accept "nt" as name for the kernel, so replace that.
        if name == "ntoskrnl":
            name = "nt"
        text += f"bp {name}+{location.rva:#x}\r\n"  # for windows it is safest to have the \r
    if output is None:
        print(text)
    else:
        try:
            with open(output, "w") as f:
                f.write(text)
        except OSError as ose:
            raise ValueError(f"Could not open file {output}: {ose}")
# %% [markdown]
# ### Argument parsing
#
# Argument parsing function for use in the script context.
# %%
def script_main():
    parser = argparse.ArgumentParser(
        description="Convert the bookmarks of a scenario to WinDbg breakpoint commands.",
        epilog="Requires the Fast Search and the OSSI features replayed.",
    )
    parser.add_argument(
        "--host",
        type=str,
        default="localhost",
        required=False,
        help='Reven host, as a string (default: "localhost")',
    )
    parser.add_argument(
        "-p",
        "--port",
        type=int,
        default="13370",
        required=False,
        help="Reven port, as an int (default: 13370)",
    )
    parser.add_argument(
        "-o",
        "--output-file",
        type=str,
        required=False,
        help="The target file of the script. If absent, the results will be printed on the standard output.",
    )
    args = parser.parse_args()

    try:
        server = reven2.RevenServer(args.host, args.port)
    except RuntimeError:
        raise RuntimeError(f"Could not connect to the server on {args.host}:{args.port}.")

    bk2bp(server, args.output_file)
# %% [markdown]
# ### Parameters
#
# These parameters have to be filled out to use in the notebook context.
# %%
# Server connection
#
host = "localhost"
port = 13370
# Output target
#
# If set to a path, writes the breakpoint commands file there
output_file = None # display bp commands inline in the Jupyter Notebook
# output_file = "breakpoints.txt" # write bp commands to a file named "breakpoints.txt" in the current directory
# %% [markdown]
# ### Execution cell
#
# This cell executes according to the [parameters](#Parameters) when in notebook context, or according to the
# [parsed arguments](#Argument-parsing) when in script context.
#
# When in notebook context, if the `output_file` parameter is `None`, then the output will be displayed in the last cell of
# the notebook.
# %%
if __name__ == "__main__":
    if in_notebook():
        try:
            server = reven2.RevenServer(host, port)
        except RuntimeError:
            raise RuntimeError(f"Could not connect to the server on {host}:{port}.")
        bk2bp(server, output_file)
    else:
        script_main()
# %%
Percent
Purpose
Get the transition that performs the opposite operation to the given transition.
The opposite operations are the following:
- The transition switches between user and kernel land.
  Examples:
  - a syscall transition => the related sysret transition
  - a sysret transition => the related syscall transition
  - an exception transition => the related iretq transition
  - an iretq transition => the related exception transition
- The transition does memory accesses:
  - case 1: a unique access. The access is selected.
  - case 2: multiple write accesses. The first one is selected.
  - case 3: multiple read accesses. The first one is selected.
  - case 4: multiple read and write accesses. The first write access is selected. This makes it possible to get the matching ret transition on an indirect call transition, e.g. call [rax + 10].
  If the selected access is a write, then the next read access on the same memory is searched for. If the selected access is a read, then the previous write access on the same memory is searched for.
  Examples, percent on:
  - a call transition => the related ret transition.
  - a ret transition => the related call transition.
  - a push transition => the related pop transition.
  - a pop transition => the related push transition.
If no related transition is found, None is returned.
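The memory-access part of the heuristic can be sketched without a Reven server. The example below is a simplified model, not the script's implementation: the `Access` tuple, the `percent_sketch` function and the sample trace are all hypothetical, and the trace is assumed to be sorted by transition id.

```python
from collections import namedtuple

# Hypothetical record of one memory access; operation is "read" or "write".
Access = namedtuple("Access", "transition address operation")


def select_access(accesses):
    """Pick the access to follow: the first write if there is one, else the first read."""
    writes = [a for a in accesses if a.operation == "write"]
    if writes:
        return writes[0]
    return accesses[0] if accesses else None


def percent_sketch(trace, transition):
    """Return the transition id paired with `transition`, or None if there is none."""
    selected = select_access([a for a in trace if a.transition == transition])
    if selected is None:
        return None
    if selected.operation == "write":
        # A write is paired with the next read of the same memory.
        for a in trace:
            if a.transition > transition and a.address == selected.address and a.operation == "read":
                return a.transition
    else:
        # A read is paired with the previous write of the same memory.
        for a in reversed(trace):
            if a.transition < transition and a.address == selected.address and a.operation == "write":
                return a.transition
    return None
```

With a trace where transition 10 (a call) writes a return address that transition 42 (the ret) later reads, `percent_sketch(trace, 10)` gives 42 and `percent_sketch(trace, 42)` gives 10.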
How to use
usage: percent.py [-h] [--host HOST] [-p PORT] transition
positional arguments:
transition Transition id, as an int
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
Known limitations
percent is a heuristic that sometimes doesn't produce the expected result.
Supported versions
Reven 2.2+. For Reven 2.5+, prefer to use the Transition.find_inverse method.
Supported perimeter
Any Reven scenario.
Dependencies
The script requires that the target Reven scenario have the Memory History feature replayed.
Source
import argparse
import reven2
import reven_api
"""
# Percent
## Purpose
Get the transition that performs the `opposite` operation to the given transition.

The opposite operations are the following:

* The transition switches between user and kernel land.
  Examples:
    * a `syscall` transition => the related `sysret` transition
    * a `sysret` transition => the related `syscall` transition
    * an exception transition => the related `iretq` transition
    * an `iretq` transition => the related exception transition

* The transition does memory accesses:
    * case 1: a unique access.
      The access is selected.
    * case 2: multiple write accesses.
      The first one is selected.
    * case 3: multiple read accesses.
      The first one is selected.
    * case 4: multiple read and write accesses.
      The first write access is selected.
      This makes it possible to get the matching `ret` transition
      on an indirect call transition e.g. `call [rax + 10]`.

  If the selected access is a write then the next read access
  on the same memory is searched for.
  If the selected access is a read then the previous write access
  on the same memory is searched for.

  Examples, percent on:
    * a `call` transition => the related `ret` transition.
    * a `ret` transition => the related `call` transition.
    * a `push` transition => the related `pop` transition.
    * a `pop` transition => the related `push` transition.

If no related transition is found, `None` is returned.
## How to use
```bash
usage: percent.py [-h] [--host HOST] [-p PORT] transition
positional arguments:
transition Transition id, as an int
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
```
## Known limitations
`percent` is a heuristic that sometimes doesn't produce the expected result.
## Supported versions
Reven 2.2+. For Reven 2.5+, prefer to use the `Transition.find_inverse` method.
## Supported perimeter
Any Reven scenario.
## Dependencies
The script requires that the target Reven scenario have the Memory History feature replayed.
"""


def previous_register_change(reven, register, from_transition):
    """
    Get the previous transition where the register's value changed.
    """
    range_size = 5000000
    start = reven_api.execution_point(from_transition.id)
    stop = reven_api.execution_point(max(from_transition.id - range_size, 0))
    result = reven._rvn.run_search_next_register_use(
        start, forward=False, read=False, write=True, register_name=register.name, stop=stop
    )
    # Nothing found in this range: move the search window further back.
    while result == stop:
        start = stop
        stop = reven_api.execution_point(max(start.sequence_identifier - range_size, 0))
        result = reven._rvn.run_search_next_register_use(
            start, forward=False, read=False, write=True, register_name=register.name, stop=stop
        )
    if result.valid():
        return reven.trace.transition(result.sequence_identifier)
    return None
def next_register_change(reven, register, from_transition):
"""
Get the next transition where the register's value changed.
"""
range_size = 5000000
start = reven_api.execution_point(from_transition.id)
stop = reven_api.execution_point(from_transition.id + range_size)
result = reven._rvn.run_search_next_register_use(
start, forward=True, read=False, write=True, register_name=register.name, stop=stop
)
while result == stop:
start = stop
stop = reven_api.execution_point(start.sequence_identifier + range_size)
result = reven._rvn.run_search_next_register_use(
start, forward=True, read=False, write=True, register_name=register.name, stop=stop
)
if result.valid():
return reven.trace.transition(result.sequence_identifier)
return None
def previous_memory_use(reven, address, size, from_transition, operation=None):
"""
Get the previous transition where the memory range [address ; size] is used (read/write).
"""
try:
access = next(
reven.trace.memory_accesses(address, size, from_transition, is_forward=False, operation=operation)
)
return access.transition
except StopIteration:
return None
def next_memory_use(reven, address, size, from_transition, operation=None):
"""
Get the next transition where the memory range [address ; size] is used (read/write).
"""
try:
access = next(
reven.trace.memory_accesses(address, size, from_transition, is_forward=True, operation=operation)
)
return access.transition
except StopIteration:
return None
def percent(reven, transition):
    """
    This function is a helper to get the transition that performs
    the `opposite` operation to the given transition.
    If no opposite transition is found, `None` is returned.
    Opposite operations
    ===================
    * The transition switches between user and kernel land.
      Examples:
        * a `syscall` transition => the related `sysret` transition
        * a `sysret` transition => the related `syscall` transition
        * an exception transition => the related `iretq` transition
        * an `iretq` transition => the related exception transition
    * The transition does memory accesses:
        * case 1: a unique access.
          The access is selected.
        * case 2: multiple write accesses.
          The first one is selected.
        * case 3: multiple read accesses.
          The first one is selected.
        * case 4: multiple read and write accesses.
          The first write access is selected.
          This makes it possible to get the matching `ret` transition
          on an indirect call transition, e.g. `call [rax + 10]`.
      If the selected access is a write, then the next read access
      on the same memory is searched for.
      If the selected access is a read, then the previous write access
      on the same memory is searched for.
      Examples, percent on:
        * a `call` transition => the related `ret` transition.
        * a `ret` transition => the related `call` transition.
        * a `push` transition => the related `pop` transition.
        * a `pop` transition => the related `push` transition.
    Dependencies
    ============
    The script requires that the target Reven scenario have the Memory History feature replayed.
    Usage
    =====
    It can be combined with other features like backtrace to obtain interesting results.
    For example, to jump to the end of the current function:
    >>> import reven2
    >>> from percent import percent
    >>> reven_server = reven2.RevenServer('localhost', 13370)
    >>> current_transition = reven_server.trace.transition(10000000)
    >>> ret_transition = percent(reven_server,
    ...                          current_transition.context_before().stack.frames[0].creation_transition)
    """
    ctx_b = transition.context_before()
    ctx_a = transition.context_after()
    # basic heuristic on cs to handle sysenter/sysexit
    cs_b = ctx_b.read(reven2.arch.x64.cs)
    cs_a = ctx_a.read(reven2.arch.x64.cs)
    if cs_b > cs_a:
        # cs is modified by the transition: look for the next change of cs
        return next_register_change(reven, reven2.arch.x64.cs, transition)
    if cs_b < cs_a:
        # cs is modified by the transition: look for the previous change of cs
        return previous_register_change(reven, reven2.arch.x64.cs, transition)
    # memory heuristic
    # first: check write accesses (get the first one)
    # this is to avoid failure on indirect call (1 read access then 1 write access)
    for access in transition.memory_accesses(operation=reven2.memhist.MemoryAccessOperation.Write):
        if access.virtual_address is None:
            # ignoring physical access
            continue
        return next_memory_use(
            reven, access.virtual_address, access.size, transition, reven2.memhist.MemoryAccessOperation.Read
        )
    # second: check read accesses (get the first one)
    for access in transition.memory_accesses(operation=reven2.memhist.MemoryAccessOperation.Read):
        if access.virtual_address is None:
            # ignoring physical access
            continue
        return previous_memory_use(
            reven, access.virtual_address, access.size, transition, reven2.memhist.MemoryAccessOperation.Write
        )
    return None
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="localhost", help='Reven host, as a string (default: "localhost")')
    parser.add_argument("-p", "--port", type=int, default=13370, help="Reven port, as an int (default: 13370)")
    parser.add_argument("transition", type=int, help="Transition id, as an int")
    args = parser.parse_args()
    rvn = reven2.RevenServer(args.host, args.port)
    transition = rvn.trace.transition(args.transition)
    result = percent(rvn, transition)
    if result is not None:
        if result >= transition:
            print("=> {}".format(transition))
            print("<= {}".format(result))
        else:
            print("<= {}".format(transition))
            print("=> {}".format(result))
    else:
        print("No result found for {}".format(transition))
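The access-selection rule described in the docstring can be sketched in isolation. This is a minimal illustration and not part of `percent.py`: the `Access` tuple and the `select_access` and `search_direction` helpers are hypothetical stand-ins for the objects returned by `Transition.memory_accesses`, and the sketch leaves out the physical-only accesses that the script skips.

```python
from typing import NamedTuple, Optional, Sequence


class Access(NamedTuple):
    # Hypothetical stand-in for a memory access: only the fields the rule needs.
    operation: str  # "read" or "write"
    address: int


def select_access(accesses: Sequence[Access]) -> Optional[Access]:
    """Pick the access that drives the search: the first write if any, else the first read."""
    for access in accesses:
        if access.operation == "write":
            return access
    for access in accesses:
        if access.operation == "read":
            return access
    return None


def search_direction(access: Access) -> str:
    """A write is matched with the next read, a read with the previous write."""
    return "next read" if access.operation == "write" else "previous write"


# An indirect call such as `call [rax + 10]` reads the call target, then writes the return address.
indirect_call = [Access("read", 0x1000), Access("write", 0x7FF0)]
selected = select_access(indirect_call)
print(selected, search_direction(selected))
```

Preferring the write access is what lets an indirect `call` be paired with the `ret` that later reads the return address, rather than with whatever last wrote the call target.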
Reporting
Examples in this section gather information to output synthetic reports.
Export bookmarks
Purpose
This notebook and script are designed to export the bookmarks of a scenario, for example for inclusion in a report. The meat of the script uses the ability of the API to iterate on the bookmarks of a Reven scenario:
for bookmark in self._server.bookmarks.all():
    # do something with the bookmark.id, bookmark.transition and bookmark.description
See the Document class and in particular its add_bookmarks
function for details.
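As a rough sketch of what `add_bookmarks` does with that iterator, the snippet below orders bookmarks either by creation or by position in the trace. To stay runnable without a Reven server, it uses a hypothetical `FakeBookmark` stand-in with plain integers; with the real API, the list would come from `server.bookmarks.all()` and `transition` would be a transition object rather than an int.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FakeBookmark:
    # Hypothetical stand-in for a Reven bookmark, for illustration only.
    id: int
    transition: int
    description: str


def sort_bookmarks(bookmarks: Iterable[FakeBookmark], by_creation: bool) -> List[FakeBookmark]:
    # The script uses the bookmark id as the creation order,
    # and the bookmark transition as the position in the trace.
    if by_creation:
        return sorted(bookmarks, key=lambda bookmark: bookmark.id)
    return sorted(bookmarks, key=lambda bookmark: bookmark.transition)


bookmarks = [
    FakeBookmark(id=0, transition=5000, description="crash site"),
    FakeBookmark(id=1, transition=120, description="buffer allocation"),
    FakeBookmark(id=2, transition=3100, description="first overflow write"),
]

for bookmark in sort_bookmarks(bookmarks, by_creation=False):
    print(f"#{bookmark.transition}\t{bookmark.description}")
```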
How to use
Bookmarks can be exported from this notebook or from the command line. The script can also be imported as a package for use from your own script or notebook.
From the notebook
- Upload the `export_bookmarks.ipynb` file in Jupyter.
- Fill out the parameters cell of this notebook according to your scenario and desired output.
- Run the full notebook.
From the command line
- Make sure that you are in an environment that can run Reven scripts.
- Run `python export_bookmarks.py --help` to get a tour of available arguments.
- Run `python export_bookmarks.py --host <your_host> --port <your_port> [<other_option>]` with your arguments of choice.
Imported in your own script or notebook
- Make sure that you are in an environment that can run Reven scripts.
- Make sure that `export_bookmarks.py` is in the same directory as your script or notebook.
- Add `import export_bookmarks` to your script or notebook. You can access the various functions and classes exposed by `export_bookmarks.py` from the `export_bookmarks` namespace.
- Refer to the Argument parsing cell for an example of use in a script, and to the Parameters cell and below for an example of use in a notebook (you just need to prepend `export_bookmarks` in front of the functions and classes from the script).
Customizing the notebook/script
To add a new format or change the output, you may want to:
- Modify the various enumeration types that control the output to add your new format or option.
- Modify the Formatter class to account for your new format.
- Modify the Document class to account for your new output control option.
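The first two steps can be sketched on a reduced copy of the script's types. The `Rst` format below is a hypothetical addition used only for illustration; it is not part of the shipped script, and only the `code` method is reproduced here.

```python
from enum import Enum


class OutputFormat(Enum):
    Raw = 0
    Markdown = 1
    Html = 2
    Rst = 3  # hypothetical new format, not part of the shipped script


class Formatter:
    """Reduced copy of the script's Formatter, showing a single method."""

    def __init__(self, format: OutputFormat):
        self._format = format

    def code(self, code: str) -> str:
        if self._format == OutputFormat.Html:
            return f"<code>{code}</code>"
        elif self._format == OutputFormat.Markdown:
            return f"`{code}`"
        elif self._format == OutputFormat.Raw:
            return f"{code}"
        elif self._format == OutputFormat.Rst:
            # New branch: reStructuredText inline literals use double backquotes.
            return f"``{code}``"
        raise NotImplementedError(f"'code' with {self._format}")


print(Formatter(OutputFormat.Rst).code("mov rax, rbx"))
```

In the full script, every `Formatter` method would need a matching branch, and `get_format` together with the `--format` choices would need to accept the new value; otherwise the existing `NotImplementedError` fallbacks signal which method was missed.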
Known limitations
N/A.
Supported versions
Reven 2.8+
Supported perimeter
Any Reven scenario.
Dependencies
None.
Source
# -*- coding: utf-8 -*-
# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:percent
#     text_representation:
#       extension: .py
#       format_name: percent
#   kernelspec:
#     display_name: reven
#     language: python
#     name: reven-python3
# ---
# %% [markdown]
# # Export bookmarks
#
# ## Purpose
#
# This notebook and script are designed to export the bookmarks of a scenario, for example for inclusion in a report.
#
# The meat of the script uses the ability of the API to iterate on the bookmarks of a Reven scenario:
#
# ```py
# for bookmark in self._server.bookmarks.all():
# # do something with the bookmark.id, bookmark.transition and bookmark.description
# ```
#
# See the [Document](#Document) class and in particular its `add_bookmarks` function for details.
#
# ## How to use
#
# Bookmarks can be exported from this notebook or from the command line.
# The script can also be imported as a package for use from your own script or notebook.
#
# ### From the notebook
#
# 1. Upload the `export_bookmarks.ipynb` file in Jupyter.
# 2. Fill out the [parameters](#Parameters) cell of this notebook according to your scenario and desired output.
# 3. Run the full notebook.
#
#
# ### From the command line
#
# 1. Make sure that you are in an environment that can run Reven scripts.
# 2. Run `python export_bookmarks.py --help` to get a tour of available arguments.
# 3. Run `python export_bookmarks.py --host <your_host> --port <your_port> [<other_option>]` with your arguments of
# choice.
#
# ### Imported in your own script or notebook
#
# 1. Make sure that you are in an environment that can run Reven scripts.
# 2. Make sure that `export_bookmarks.py` is in the same directory as your script or notebook.
# 3. Add `import export_bookmarks` to your script or notebook. You can access the various functions and classes
# exposed by `export_bookmarks.py` from the `export_bookmarks` namespace.
# 4. Refer to the [Argument parsing](#Argument-parsing) cell for an example of use in a script, and to the
#    [Parameters](#Parameters) cell and below for an example of use in a notebook (you just need to prepend
#    `export_bookmarks` in front of the functions and classes from the script).
#
# ## Customizing the notebook/script
#
# To add a new format or change the output, you may want to:
#
# - Modify the various [enumeration types](#Output-option-types) that control the output to add your new format or
# option.
# - Modify the [Formatter](#Formatter) class to account for your new format.
# - Modify the [Document](#Document) class to account for your new output control option.
#
#
# ## Known limitations
#
# N/A.
#
# ## Supported versions
#
# Reven 2.8+
#
# ## Supported perimeter
#
# Any Reven scenario.
#
# ## Dependencies
#
# None.
# %% [markdown]
# ### Package imports
# %%
import argparse # for argument parsing
import datetime # Date generation
import sys # printing to stderr
from enum import Enum
from html import escape as html_escape
from typing import Iterable, Optional
import reven2 # type: ignore
try:
    # Jupyter rendering
    from IPython.display import display, HTML, Markdown  # type: ignore
except ImportError:
    pass
# %% [markdown]
# ### Utility functions
# %%
# Detect if we are currently running a Jupyter notebook.
#
# This is used to display rendered results inline in Jupyter when we are executing in the context of a Jupyter
# notebook, or to display raw results on the standard output when we are executing in the context of a script.
def in_notebook():
    try:
        from IPython import get_ipython  # type: ignore
        if get_ipython() is None or ("IPKernelApp" not in get_ipython().config):
            return False
    except ImportError:
        return False
    return True
# %% [markdown]
# ### Output option types
#
# The enum types below are used to control the output of the script.
#
# Modify these enums to add more options if you want to add e.g. new output formats.
# %%
class HeaderOption(Enum):
    NoHeader = 0
    Simple = 1
class OutputFormat(Enum):
    Raw = 0
    Markdown = 1
    Html = 2
class SortOrder(Enum):
    Transition = 0
    Creation = 1
# %% [markdown]
# ### Formatter
#
# This is the rendering boilerplate.
#
# Modify this if you e.g. need to add new output formats.
# %%
class Formatter:
    def __init__(
        self,
        format: OutputFormat,
    ):
        self._format = format
    def header(self, title: str) -> str:
        if self._format == OutputFormat.Html:
            return f"<h1>{title}</h1>"
        elif self._format == OutputFormat.Markdown:
            return f"# {title}\n\n"
        elif self._format == OutputFormat.Raw:
            return f"{title}\n\n"
        raise NotImplementedError(f"'header' with {self._format}")
    def paragraph(self, paragraph: str) -> str:
        if self._format == OutputFormat.Html:
            return f"<p>{paragraph}</p>"
        elif self._format == OutputFormat.Markdown:
            return f"\n\n{paragraph}\n\n"
        elif self._format == OutputFormat.Raw:
            return f"\n{paragraph}\n"
        raise NotImplementedError(f"'paragraph' with {self._format}")
    def horizontal_ruler(self) -> str:
        if self._format == OutputFormat.Html:
            return "<hr/>"
        elif self._format == OutputFormat.Markdown:
            return "\n---\n"
        elif self._format == OutputFormat.Raw:
            return "\n---\n"
        raise NotImplementedError(f"'horizontal_ruler' with {self._format}")
    def transition(self, transition: reven2.trace.Transition) -> str:
        if transition.instruction is not None:
            tr_desc = str(transition.instruction)
        else:
            tr_desc = str(transition.exception)
        if self._format == OutputFormat.Html:
            if in_notebook():
                tr_id = f"{transition.format_as_html()}"
            else:
                tr_id = f"#{transition.id} "
            return f"{tr_id} <code>{tr_desc}</code>"
        elif self._format == OutputFormat.Markdown:
            return f"`#{transition.id}` `{tr_desc}`"
        elif self._format == OutputFormat.Raw:
            return f"#{transition.id}\t{tr_desc}"
        raise NotImplementedError(f"'transition' with {self._format}")
    def newline(self) -> str:
        if self._format == OutputFormat.Html:
            return "<br/>"
        elif self._format == OutputFormat.Markdown:
            return "  \n"  # two EOL spaces to have a newline in markdown
        elif self._format == OutputFormat.Raw:
            return "\n"
        raise NotImplementedError(f"'newline' with {self._format}")
    def paragraph_begin(self) -> str:
        if self._format == OutputFormat.Html:
            return "<p>"
        elif self._format == OutputFormat.Markdown:
            return "\n\n"
        elif self._format == OutputFormat.Raw:
            return "\n"
        raise NotImplementedError(f"'paragraph_begin' with {self._format}")
    def paragraph_end(self) -> str:
        if self._format == OutputFormat.Html:
            return "</p>"
        elif self._format == OutputFormat.Markdown:
            return "\n\n"
        elif self._format == OutputFormat.Raw:
            return "\n"
        raise NotImplementedError(f"'paragraph_end' with {self._format}")
    def important(self, important: str) -> str:
        if self._format == OutputFormat.Html:
            return f"<strong>{important}</strong>"
        elif self._format == OutputFormat.Markdown:
            return f"**{important}**"
        elif self._format == OutputFormat.Raw:
            return f"{important} <- HERE"
        raise NotImplementedError(f"'important' with {self._format}")
    def warning(self, warning: str) -> str:
        if self._format == OutputFormat.Html:
            return f'<div class="alert alert-warning"><strong>Warning:</strong> {warning}</div>'
        elif self._format == OutputFormat.Markdown:
            return f"**Warning: {warning}**"
        elif self._format == OutputFormat.Raw:
            return f"WARNING: {warning}"
        raise NotImplementedError(f"'warning' with {self._format}")
    def code(self, code: str) -> str:
        if self._format == OutputFormat.Html:
            return f"<code>{code}</code>"
        elif self._format == OutputFormat.Markdown:
            return f"`{code}`"
        elif self._format == OutputFormat.Raw:
            return f"{code}"
        raise NotImplementedError(f"'code' with {self._format}")
    def render_error(self, text):
        if text == "":
            return
        if in_notebook():
            if self._format == OutputFormat.Html:
                display(HTML(text))
            elif self._format == OutputFormat.Markdown:
                display(Markdown(text))
            elif self._format == OutputFormat.Raw:
                display(text)
            else:
                raise NotImplementedError(f"inline error rendering with {self._format}")
        else:
            print(text, file=sys.stderr)
    def render(self, text, output):
        if text == "":
            return
        if output is None:
            if in_notebook():
                if self._format == OutputFormat.Html:
                    display(HTML(text))
                elif self._format == OutputFormat.Markdown:
                    display(Markdown(text))
                elif self._format == OutputFormat.Raw:
                    display(text)
                else:
                    raise NotImplementedError(f"inline rendering with {self._format}")
            else:
                print(text)
        else:
            try:
                with open(output, "w") as f:
                    f.write(text)
            except OSError as ose:
                raise ValueError(f"Could not open file {output}: {ose}")
# %% [markdown]
# ### Document
#
# This is the main logic of the script.
# %%
class Document:
    def __init__(
        self,
        server: reven2.RevenServer,
        sort: SortOrder,
        context: Optional[int],
        header: HeaderOption,
        format: OutputFormat,
        output: Optional[str],
        escape_description: bool,
    ):
        self._text = ""
        self._warning = ""
        self._server = server
        if context is None:
            self._context = 0
        else:
            self._context = context
        self._header_opt = header
        self._escape_description = escape_description
        self._output = output
        self._sort = sort
        self._formatter = Formatter(format)
    def add_bookmarks(self):
        if self._sort == SortOrder.Creation:
            for bookmark in sorted(self._server.bookmarks.all(), key=lambda bookmark: bookmark.id):
                self.add_bookmark(bookmark)
        else:
            for bookmark in sorted(self._server.bookmarks.all(), key=lambda bookmark: bookmark.transition):
                self.add_bookmark(bookmark)
    def add_bookmark(self, bookmark: reven2.bookmark.Bookmark):
        self._text += self._formatter.paragraph_begin()
        self.add_bookmark_header(bookmark)
        self.add_location(bookmark.transition)
        if bookmark.transition.id < self._context:
            first_transition = self._server.trace.first_transition
        else:
            first_transition = bookmark.transition - self._context
        self.add_transitions(
            transition for transition in self._server.trace.transitions(first_transition, bookmark.transition)
        )
        self.add_bookmark_transition(bookmark.transition)
        # Catch possible transitions that would be out of the trace due to the value of context
        if bookmark.transition != self._server.trace.last_transition:
            if bookmark.transition.id + self._context > self._server.trace.last_transition.id:
                last_transition = self._server.trace.last_transition
            else:
                last_transition = bookmark.transition + 1 + self._context
            self.add_transitions(
                transition for transition in self._server.trace.transitions(bookmark.transition + 1, last_transition)
            )
        self._text += self._formatter.paragraph_end()
        self._text += self._formatter.horizontal_ruler()
    def add_header(self):
        if self._header_opt == HeaderOption.NoHeader:
            return
        elif self._header_opt == HeaderOption.Simple:
            scenario_name = self._server.scenario_name
            self._text += self._formatter.header(f"Bookmarks for scenario {scenario_name}")
            date = datetime.datetime.now()
            self._text += self._formatter.paragraph(f"Generated on {str(date)}")
            self._text += self._formatter.horizontal_ruler()
    def add_transitions(self, transitions: Iterable[reven2.trace.Transition]):
        for transition in transitions:
            self._text += self._formatter.transition(transition)
            self._text += self._formatter.newline()
    def add_bookmark_transition(self, transition: reven2.trace.Transition):
        tr_format = self._formatter.transition(transition)
        alone = self._context == 0
        self._text += self._formatter.important(tr_format) if not alone else tr_format
        self._text += self._formatter.newline()
    def add_bookmark_header(self, bookmark: reven2.bookmark.Bookmark):
        if self._escape_description:
            bookmark_description = html_escape(bookmark.description)
        else:
            bookmark_description = bookmark.description
        self._text += f"{bookmark_description}"
        self._text += self._formatter.newline()
    def add_location(self, transition: reven2.trace.Transition):
        ossi = transition.context_before().ossi
        try:
            if ossi and ossi.location():
                location = self._formatter.code(html_escape(str(ossi.location())))
                self._text += self._formatter.paragraph(f"Location: {location}")
        except RuntimeError:
            pass
    def add_warnings(self):
        ossi = self._server.trace.first_context.ossi
        try:
            if ossi and ossi.location():
                pass
        except RuntimeError:
            self._warning += self._formatter.warning("OSSI not replayed, locations not available in bookmarks.")
    def render(self):
        self._formatter.render_error(self._warning)
        self._formatter.render(self._text, self._output)
# %% [markdown]
# ### Main function
#
# This function is called with parameters from the [Parameters](#Parameters) cell in the notebook context,
# or with parameters from the command line in the script context.
# %%
def export_bookmarks(
    server: reven2.RevenServer,
    sort: SortOrder,
    context: Optional[int],
    header: HeaderOption,
    format: OutputFormat,
    escape_description: bool,
    suppress_warnings: bool,
    output: Optional[str],
):
    document = Document(
        server,
        sort=sort,
        context=context,
        header=header,
        format=format,
        output=output,
        escape_description=escape_description,
    )
    if not suppress_warnings:
        document.add_warnings()
    document.add_header()
    document.add_bookmarks()
    document.render()
# %% [markdown]
# ### Argument parsing
#
# Argument parsing function for use in the script context.
# %%
def get_sort(sort: str) -> SortOrder:
    if sort.lower() == "transition":
        return SortOrder.Transition
    if sort.lower() in ["creation", "id"]:
        return SortOrder.Creation
    raise ValueError(f"'order' value should be 'transition' or 'creation'. Received '{sort}'.")
def get_header(header: str) -> HeaderOption:
    if header.lower() == "no":
        return HeaderOption.NoHeader
    elif header.lower() == "simple":
        return HeaderOption.Simple
    raise ValueError(f"'header' value should be 'no' or 'simple'. Received '{header}'.")
def get_format(format: str) -> OutputFormat:
    if format.lower() == "html":
        return OutputFormat.Html
    elif format.lower() == "md" or format.lower() == "markdown":
        return OutputFormat.Markdown
    elif format.lower() == "raw" or format.lower() == "text":
        return OutputFormat.Raw
    # f-string so that the received value is actually interpolated in the message
    raise ValueError(f"'format' value should be one of 'html', 'md' or 'raw'. Received '{format}'.")
def script_main():
    parser = argparse.ArgumentParser(description="Export the bookmarks of a scenario to a report.")
    parser.add_argument(
        "--host",
        type=str,
        default="localhost",
        required=False,
        help='Reven host, as a string (default: "localhost")',
    )
    parser.add_argument(
        "-p",
        "--port",
        type=int,
        default=13370,
        required=False,
        help="Reven port, as an int (default: 13370)",
    )
    parser.add_argument(
        "-C",
        "--context",
        type=int,
        required=False,
        help="Print CONTEXT lines of surrounding context around the bookmark's instruction",
    )
    parser.add_argument(
        "--header",
        type=str,
        default="no",
        required=False,
        choices=["no", "simple"],
        help="Whether to prepend the output with a header or not (default: no)",
    )
    parser.add_argument(
        "--format",
        type=str,
        default="html",
        required=False,
        choices=["html", "md", "raw"],
        help="The output format (default: html).",
    )
    parser.add_argument(
        "--order",
        type=str,
        default="transition",
        choices=["transition", "creation"],
        required=False,
        help="The sort order of bookmarks in the report (default: transition).",
    )
    parser.add_argument(
        "--no-escape-description",
        action="store_true",
        default=False,
        required=False,
        help="If present, don't escape the HTML in the bookmark descriptions.",
    )
    parser.add_argument(
        "--suppress-warnings",
        action="store_true",
        default=False,
        required=False,
        help="If present, don't print warnings to the standard error output.",
    )
    parser.add_argument(
        "-o",
        "--output-file",
        type=str,
        required=False,
        help="The target file of the report. If absent, the report will be printed on the standard output.",
    )
    args = parser.parse_args()
    try:
        server = reven2.RevenServer(args.host, args.port)
    except RuntimeError:
        raise RuntimeError(f"Could not connect to the server on {args.host}:{args.port}.")
    sort = get_sort(args.order)
    header = get_header(args.header)
    format = get_format(args.format)
    export_bookmarks(
        server,
        sort,
        args.context,
        header,
        format,
        escape_description=(not args.no_escape_description),
        suppress_warnings=args.suppress_warnings,
        output=args.output_file,
    )
# %% [markdown]
# ## Parameters
#
# These parameters have to be filled out to use in the notebook context.
# %%
# Server connection
#
host = "localhost"
port = 37103
# Output target
#
# If set to a path, writes the report file there
output_file = None # display report inline in the Jupyter Notebook
# output_file = "report.html" # export report to a file named "report.html" in the current directory
# Output control
#
# Sort order of bookmarks
order = SortOrder.Transition # Bookmarks will be displayed in increasing transition number.
# order = SortOrder.Creation # Bookmarks will be displayed in their order of creation.
# Number of transitions to display around the transition of each bookmark
context = 0 # Only display the bookmark transition
# context = 3 # Displays 3 lines above and 3 lines below the bookmark transition
# Whether to prepend a header at the top of the report
header = HeaderOption.Simple # Display a simple header with the scenario name and generation date
# header = HeaderOption.NoHeader # Don't display any header
# The format of the report.
# When the output target is set to a file, this specifies the format of that file.
# When the output target is `None` (report rendered inline), the difference between HTML and Markdown
# mostly influences how the description of the bookmarks is interpreted.
format = OutputFormat.Html # Bookmark description and output file rendered as HTML
# format = OutputFormat.Markdown # Bookmark description and output file rendered as Markdown
# format = OutputFormat.Raw # Everything rendered as raw text
# Whether to escape HTML in the description of bookmarks.
escape_description = False # HTML will not be escaped in description
# escape_description = True # HTML will be escaped in description
# Whether or not to suppress the warnings that can be displayed (e.g. in case of missing OSSI)
suppress_warnings = False # Display warnings at the top of the report
# suppress_warnings = True # Don't display warnings at the top of the report
# %% [markdown]
# ### Execution cell
#
# This cell executes according to the [parameters](#Parameters) when in notebook context, or according to the
# [parsed arguments](#Argument-parsing) when in script context.
#
# When in notebook context, if the `output_file` parameter is `None`, then the report will be displayed in the last
# cell of the notebook.
# %%
if __name__ == "__main__":
    if in_notebook():
        try:
            server = reven2.RevenServer(host, port)
        except RuntimeError:
            raise RuntimeError(f"Could not connect to the server on {host}:{port}.")
        export_bookmarks(server, order, context, header, format, escape_description, suppress_warnings, output_file)
    else:
        script_main()
Thread synchronization
Purpose
Trace calls to Windows Synchronization APIs
- Critical sections: 'InitializeCriticalSection', 'InitializeCriticalSectionEx', 'EnterCriticalSection', 'LeaveCriticalSection', 'SleepConditionVariableCS', 'SleepConditionVariableSRW'
- Slim reader/writer locks: 'RtlInitializeSRWLock', 'RtlAcquireSRWLockShared', 'RtlAcquireSRWLockExclusive', 'RtlReleaseSRWLockShared', 'RtlReleaseSRWLockExclusive'
- Mutex: 'CreateMutexW', 'OpenMutexW', 'WaitForSingleObject', 'ReleaseMutex'
- Condition variables: 'InitializeConditionVariable', 'WakeConditionVariable', 'WakeAllConditionVariable', 'RtlWakeConditionVariable', 'RtlWakeAllConditionVariable'
How to use
usage: threadsync.py [-h] [--host HOST] [-p PORT] --pid PID
[--from_id FROM_ID] [--to_id TO_ID] [--binary BINARY]
[--sync PRIMITIVE]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
--pid PID Process id
--from_id FROM_ID Transition to start searching from
--to_id TO_ID Transition to start searching to
--binary BINARY Process binary
--sync PRIMITIVE Synchronization primitives to check.
Each repeated use of `--sync <primitive>` adds the corresponding primitive to check.
If no `--sync` option is passed, then all supported primitives are checked.
Supported primitives:
- cs: critical section
- cv: condition variable
- mx: mutex
- srw: slim read/write lock
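The way `--sync` accumulates repeated values can be sketched with the `append` action of `argparse`. This is an assumption about how such an option might be declared, not an excerpt from `threadsync.py`; `parse_sync` and `SUPPORTED_PRIMITIVES` are names introduced for the illustration.

```python
import argparse

# Abbreviations listed in the usage text above.
SUPPORTED_PRIMITIVES = ["cs", "cv", "mx", "srw"]


def parse_sync(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--pid", type=int, required=True, help="Process id")
    parser.add_argument(
        "--sync",
        metavar="PRIMITIVE",
        action="append",  # each occurrence of --sync adds one value to the list
        choices=SUPPORTED_PRIMITIVES,
        help="Synchronization primitive to check (repeatable)",
    )
    args = parser.parse_args(argv)
    # With no --sync at all, args.sync is None: fall back to every supported primitive.
    primitives = args.sync if args.sync else list(SUPPORTED_PRIMITIVES)
    return args.pid, primitives


print(parse_sync(["--pid", "1234", "--sync", "cs", "--sync", "mx"]))
print(parse_sync(["--pid", "1234"]))
```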
Known limitations
N/A
Supported versions
Reven 2.12+
Supported perimeter
Any Windows 10/11 on x86-64 scenario.
Dependencies
- OSSI (with symbols `ntdll.dll`, `kernelbase.dll`, `ntoskrnl.exe` resolved) feature replayed.
- Fast Search feature replayed.
Source
import argparse
import reven2
from reven2.address import LogicalAddress
from reven2.arch import x64
from reven2.util import collate
"""
# Thread synchronization
## Purpose
Trace calls to Windows Synchronization APIs
- Critical sections: 'InitializeCriticalSection', 'InitializeCriticalSectionEx',
  'EnterCriticalSection', 'LeaveCriticalSection',
  'SleepConditionVariableCS', 'SleepConditionVariableSRW'
- Slim reader/writer locks: 'RtlInitializeSRWLock',
  'RtlAcquireSRWLockShared', 'RtlAcquireSRWLockExclusive',
  'RtlReleaseSRWLockShared', 'RtlReleaseSRWLockExclusive'
- Mutex: 'CreateMutexW', 'OpenMutexW', 'WaitForSingleObject', 'ReleaseMutex'
- Condition variables: 'InitializeConditionVariable',
  'WakeConditionVariable', 'WakeAllConditionVariable',
  'RtlWakeConditionVariable', 'RtlWakeAllConditionVariable'
## How to use
```bash
usage: threadsync.py [-h] [--host HOST] [-p PORT] --pid PID
[--from_id FROM_ID] [--to_id TO_ID] [--binary BINARY]
[--sync PRIMITIVE]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
--pid PID Process id
--from_id FROM_ID Transition to start searching from
--to_id TO_ID Transition to start searching to
--binary BINARY Process binary
--sync PRIMITIVE Synchronization primitives to check.
Each repeated use of `--sync <primitive>` adds the corresponding primitive to check.
If no `--sync` option is passed, then all supported primitives are checked.
Supported primitives:
- cs: critical section
- cv: condition variable
- mx: mutex
- srw: slim read/write lock
```
## Known limitations
N/A
## Supported versions
Reven 2.12+
## Supported perimeter
Any Windows 10/11 on x86-64 scenario.
## Dependencies
- OSSI (with symbols `ntdll.dll`, `kernelbase.dll`, `ntoskrnl.exe` resolved) feature replayed.
- Fast Search feature replayed.
"""
class SyncOSSI(object):
    def __init__(self, ossi, trace):
        # basic OSSI
        bin_symbol_names = {
            "c:/windows/system32/ntdll.dll": {
                # critical section
                "RtlInitializeCriticalSection",
                "RtlInitializeCriticalSectionEx",
                "RtlEnterCriticalSection",
                "RtlLeaveCriticalSection",
                # slim read/write lock
                "RtlInitializeSRWLock",
                "RtlAcquireSRWLockShared",
                "RtlAcquireSRWLockExclusive",
                "RtlReleaseSRWLockShared",
                "RtlReleaseSRWLockExclusive",
                # condition variable (general)
                "RtlWakeConditionVariable",
                "RtlWakeAllConditionVariable",
            },
            "c:/windows/system32/kernelbase.dll": {
                # mutex
                "CreateMutexW",
                "OpenMutexW",
                "ReleaseMutex",
                "WaitForSingleObject",
                # condition variable (general)
                "InitializeConditionVariable",
                "WakeConditionVariable",
                "WakeAllConditionVariable",
                # condition variable on critical section
                "SleepConditionVariableCS",
                # condition variable on slim read/write lock
                "SleepConditionVariableSRW",
            },
        }
        self.symbols = {}
        for bin, symbol_names in bin_symbol_names.items():
            try:
                exec_bin = next(ossi.executed_binaries(f"^{bin}$"))
            except StopIteration:
                raise RuntimeError(f"{bin} not found")
            for name in symbol_names:
                try:
                    sym = next(exec_bin.symbols(f"^{name}$"))
                    self.symbols[name] = sym
                except StopIteration:
                    print(f"Warning: {name} not found in {bin}")
        self.has_debugger = True
        try:
            trace.first_transition.step_over()
        except RuntimeError:
            print(
                "Warning: the debugger interface is not available, so the script cannot determine \
function return values.\nMake sure the stack events and PC range resources are replayed for this scenario."
            )
            self.has_debugger = False
        self.search_symbol = trace.search.symbol
        self.trace = trace
# tool functions
def context_cr3(ctxt):
    return ctxt.read(x64.cr3)
def is_kernel_mode(ctxt):
    return ctxt.read(x64.cs) & 0x3 == 0
def find_process_ranges(rvn, pid, from_id, to_id):
"""
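Traverse the trace, looking for the ranges during which the process of interest is running
Parameters:
- rvn: RevenServer
- pid: int (process id)
- from_id: int or None (transition id to start from; None means the start of the trace)
- to_id: int or None (transition id to stop at; None means the end of the trace)
Output: yields (first context, last context) ranges; the last context is None when the range extends to the end of the trace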
"""
try:
ntoskrnl = next(rvn.ossi.executed_binaries("^c:/windows/system32/ntoskrnl.exe$"))
except StopIteration:
raise RuntimeError("ntoskrnl.exe not found")
try:
ki_swap_context = next(ntoskrnl.symbols("^KiSwapContext$"))
except StopIteration:
raise RuntimeError("KiSwapContext not found")
if from_id is None:
ctxt_low = rvn.trace.first_context
else:
try:
ctxt_low = rvn.trace.transition(from_id).context_before()
except IndexError:
raise RuntimeError(f"Transition of id {from_id} not found")
if to_id is None:
ctxt_hi = None
else:
try:
ctxt_hi = rvn.trace.transition(to_id).context_before()
except IndexError:
raise RuntimeError(f"Transition of id {to_id} not found")
if ctxt_hi is not None and ctxt_low >= ctxt_hi:
return
for ctxt in rvn.trace.search.symbol(
ki_swap_context,
from_context=ctxt_low,
to_context=None if ctxt_hi is None or ctxt_hi == rvn.trace.last_context else ctxt_hi,
):
if ctxt_low.ossi.process().pid == pid:
yield (ctxt_low, ctxt)
ctxt_low = ctxt
if ctxt_low.ossi.process().pid == pid:
if ctxt_hi is not None and ctxt_low < ctxt_hi:
yield (ctxt_low, ctxt_hi)
else:
yield (ctxt_low, None)
def find_usermode_ranges(ctxt_low, ctxt_high):
if ctxt_high is None:
if not is_kernel_mode(ctxt_low):
yield (ctxt_low, None)
return
ctxt_current = ctxt_low
while ctxt_current < ctxt_high:
ctxt_next = ctxt_current.find_register_change(x64.cs, is_forward=True)
if not is_kernel_mode(ctxt_current):
if ctxt_next is None or ctxt_next > ctxt_high:
yield (ctxt_current, ctxt_high)
break
else:
yield (ctxt_current, ctxt_next - 1)
if ctxt_next is None:
break
ctxt_current = ctxt_next
def get_tid(ctxt):
return ctxt.read(LogicalAddress(0x48, x64.gs), 4)
def caller_context_in_binary(ctxt, binary):
if binary is None:
return True
caller_ctxt = ctxt - 1
if caller_ctxt is None:
return False
caller_binary = caller_ctxt.ossi.location().binary
if caller_binary is None:
return False
return binary in [caller_binary.name, caller_binary.filename, caller_binary.path]
def build_ordered_api_calls(syncOSSI, ctxt_low, ctxt_high, apis):
def gen(api):
return (
(api, ctxt)
for ctxt in syncOSSI.search_symbol(syncOSSI.symbols[api], from_context=ctxt_low, to_context=ctxt_high)
)
api_contexts = (
# Using directly a generator expression appears to give wrong results,
# while using a function works as expected.
gen(api)
for api in apis
if api in syncOSSI.symbols
)
return collate(api_contexts, key=lambda name_ctxt: name_ctxt[1])
def get_return_value(syncOSSI, ctxt):
if not syncOSSI.has_debugger:
# without the debugger interface, the return value cannot be determined
return None
try:
trans_after = ctxt.transition_after()
except IndexError:
return None
if trans_after is None:
return None
trans_ret = trans_after.step_out()
if trans_ret is None:
return None
ctxt_ret = trans_ret.context_after()
return ctxt_ret.read(x64.rax)
# Values of primitive_handle and return_value fall into one of the following:
# - None
# - a numeric value
# - a string literal
# where None is used usually in cases the script cannot reliably get the actual
# value of the object.
def print_csv(csv_file, ctxt, sync_primitive, primitive_handle, return_value):
try:
transition_id = ctxt.transition_before().id + 1
except IndexError:
transition_id = 0
if primitive_handle is None:
formatted_primitive_handle = "unknown"
else:
formatted_primitive_handle = (
primitive_handle if isinstance(primitive_handle, str) else f"{primitive_handle:#x}"
)
if return_value is None:
formatted_ret_value = "unknown"
else:
formatted_ret_value = return_value if isinstance(return_value, str) else f"{return_value:#x}"
output = f"{transition_id}, {sync_primitive}, {formatted_primitive_handle}, {formatted_ret_value}\n"
try:
csv_file.write(output)
except OSError:
print(f"Failed to write to {csv_file}")
def check_critical_section(syncOSSI, ctxt_low, ctxt_high, binary, csv_file):
critical_section_apis = {
"RtlInitializeCriticalSection",
"RtlInitializeCriticalSectionEx",
"RtlEnterCriticalSection",
"RtlLeaveCriticalSection",
"SleepConditionVariableCS",
}
found = False
ordered_api_contexts = build_ordered_api_calls(syncOSSI, ctxt_low, ctxt_high, critical_section_apis)
for api, ctxt in ordered_api_contexts:
if not caller_context_in_binary(ctxt, binary):
continue
found = True
if api in {
"RtlInitializeCriticalSection",
"RtlInitializeCriticalSectionEx",
"RtlEnterCriticalSection",
"RtlLeaveCriticalSection",
}:
# the critical section handle is the first argument
cs_handle = ctxt.read(x64.rcx)
if csv_file is None:
print(f"{ctxt}: {api}, critical section 0x{cs_handle:x}")
else:
# the return value of the APIs in this group is not tracked, so "unused" is passed as the return value
print_csv(csv_file, ctxt, "cs", cs_handle, "unused")
elif api in {"SleepConditionVariableCS"}:
# the condition variable is the first argument
cv_handle = ctxt.read(x64.rcx)
# the critical section is the second argument
cs_handle = ctxt.read(x64.rdx)
if csv_file is None:
print(f"{ctxt}: {api}, critical section 0x{cs_handle:x}, condition variable 0x{cv_handle:x}")
else:
# go to the return point (if possible) to get the return value
ret_val = get_return_value(syncOSSI, ctxt)
print_csv(csv_file, ctxt, "cs", cs_handle, ret_val)
print_csv(csv_file, ctxt, "cv", cv_handle, ret_val)
return found
def check_srw_lock(syncOSSI, ctxt_low, ctxt_high, binary, csv_file):
srw_apis = [
"RtlInitializeSRWLock",
"RtlAcquireSRWLockShared",
"RtlAcquireSRWLockExclusive",
"RtlReleaseSRWLockShared",
"RtlReleaseSRWLockExclusive",
]
found = False
ordered_api_contexts = build_ordered_api_calls(syncOSSI, ctxt_low, ctxt_high, srw_apis)
for api, ctxt in ordered_api_contexts:
if not caller_context_in_binary(ctxt, binary):
continue
found = True
# the srw lock handle is the first argument
srw_handle = ctxt.read(x64.rcx)
if csv_file is None:
print(f"{ctxt}: {api}, lock 0x{srw_handle:x}")
else:
# every API in this group returns void, so "unused" is passed as the return value
print_csv(csv_file, ctxt, "srw", srw_handle, "unused")
return found
def check_mutex(syncOSSI, ctxt_low, ctxt_high, binary, used_handles, csv_file):
mutex_apis = {"CreateMutexW", "OpenMutexW", "WaitForSingleObject", "ReleaseMutex"}
found = False
ordered_api_contexts = build_ordered_api_calls(syncOSSI, ctxt_low, ctxt_high, mutex_apis)
for api, ctxt in ordered_api_contexts:
if not caller_context_in_binary(ctxt, binary):
continue
found = True
if api in {"CreateMutexW", "OpenMutexW"}:
# go to the return point (if possible) to get the mutex handle
mx_handle = get_return_value(syncOSSI, ctxt)
if mx_handle is not None and mx_handle != 0:
used_handles.add(mx_handle)
if csv_file is None:
if mx_handle is None:
print(f"{ctxt}: {api}, mutex handle unknown")
else:
if mx_handle == 0:
print(f"{ctxt}: {api}, failed")
else:
print(f"{ctxt}: {api}, mutex handle 0x{mx_handle:x}")
else:
print_csv(csv_file, ctxt, "mx", mx_handle, mx_handle)
elif api in {"ReleaseMutex"}:
mx_handle = ctxt.read(x64.rcx)
used_handles.add(mx_handle)
if csv_file is None:
print(f"{ctxt}: {api}, mutex handle 0x{mx_handle:x}")
else:
# go to the return point (if possible) to get the return value
ret_val = get_return_value(syncOSSI, ctxt)
print_csv(csv_file, ctxt, "mx", mx_handle, ret_val)
elif api in {"WaitForSingleObject"}:
handle = ctxt.read(x64.rcx)
if handle in used_handles:
if csv_file is None:
print(f"{ctxt}: {api}, mutex handle 0x{handle:x}")
else:
ret_val = get_return_value(syncOSSI, ctxt)
print_csv(csv_file, ctxt, "mx", handle, ret_val)
return found
def check_condition_variable(syncOSSI, ctxt_low, ctxt_high, binary, csv_file):
cond_var_apis = {
"InitializeConditionVariable",
"WakeConditionVariable",
"WakeAllConditionVariable",
"RtlWakeConditionVariable",
"RtlWakeAllConditionVariable",
}
found = False
ordered_api_contexts = build_ordered_api_calls(syncOSSI, ctxt_low, ctxt_high, cond_var_apis)
for api, ctxt in ordered_api_contexts:
if not caller_context_in_binary(ctxt, binary):
continue
found = True
# the condition variable is the first argument
cv_handle = ctxt.read(x64.rcx)
if csv_file is None:
print(f"{ctxt}: {api}, condition variable 0x{cv_handle:x}")
else:
# every API in this group returns void, so "unused" is passed as the return value
print_csv(csv_file, ctxt, "cv", cv_handle, "unused")
return found
def check_lock_unlock(syncOSSI, ctxt_low, ctxt_high, binary, sync_primitives, used_mutex_handles, csv_file):
transition_low = ctxt_low.transition_after().id
transition_high = ctxt_high.transition_after().id if ctxt_high is not None else syncOSSI.trace.last_transition.id
tid = get_tid(ctxt_low)
if csv_file is None:
print(
"\n==== checking the transition range [#{}, #{}] (thread id: {}) ====".format(
transition_low, transition_high, tid
)
)
found = False
if "cs" in sync_primitives:
cs_found = check_critical_section(syncOSSI, ctxt_low, ctxt_high, binary, csv_file)
found = found or cs_found
if "srw" in sync_primitives:
srw_found = check_srw_lock(syncOSSI, ctxt_low, ctxt_high, binary, csv_file)
found = found or srw_found
if "cv" in sync_primitives:
cv_found = check_condition_variable(syncOSSI, ctxt_low, ctxt_high, binary, csv_file)
found = found or cv_found
if "mx" in sync_primitives:
mx_found = check_mutex(syncOSSI, ctxt_low, ctxt_high, binary, used_mutex_handles, csv_file)
found = found or mx_found
if not found and csv_file is None:
print("\tnothing found")
def run(rvn, proc_id, proc_bin, from_id, to_id, sync_primitives, csv_file):
syncOSSI = SyncOSSI(rvn.ossi, rvn.trace)
used_mutex_handles = set()
process_ranges = find_process_ranges(rvn, proc_id, from_id, to_id)
for low, high in process_ranges:
user_mode_ranges = find_usermode_ranges(low, high)
for low_usermode, high_usermode in user_mode_ranges:
check_lock_unlock(
syncOSSI, low_usermode, high_usermode, proc_bin, sync_primitives, used_mutex_handles, csv_file
)
def parse_args():
parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument(
"--host",
type=str,
default="localhost",
help='Reven host, as a string (default: "localhost")',
)
parser.add_argument(
"-p",
"--port",
type=int,
default=13370,
help="Reven port, as an int (default: 13370)",
)
parser.add_argument("--pid", type=int, required=True, help="Process id")
parser.add_argument("--from_id", type=int, default=None, help="Transition to start searching from")
parser.add_argument("--to_id", type=int, default=None, help="Transition to stop searching at")
parser.add_argument(
"--binary",
type=str,
default=None,
help="Process binary",
)
parser.add_argument(
"--sync",
type=str,
metavar="PRIMITIVE",
action="append",
choices=["cs", "cv", "mx", "srw"],
default=None,
help="Synchronization primitives to check.\n"
+ "Each repeated use of `--sync <primitive>` adds the corresponding primitive to check.\n"
+ "If no `--sync` option is passed, then all supported primitives are checked.\n"
+ "Supported primitives:\n"
+ " - cs: critical section\n"
+ " - cv: condition variable\n"
+ " - mx: mutex\n"
+ " - srw: slim read/write lock",
)
parser.add_argument("--raw", type=str, default=None, help="CSV file as output")
return parser.parse_args()
if __name__ == "__main__":
args = parse_args()
if args.sync is None or not args.sync:
args.sync = ["cs", "cv", "mx", "srw"]
if args.raw is None:
csv_file = None
else:
try:
csv_file = open(args.raw, "w")
except OSError:
raise RuntimeError(f"Failed to open {args.raw}")
try:
csv_file.write("transition id, primitive, handle, return value\n")
except OSError:
raise RuntimeError(f"Failed to write to {csv_file}")
rvn = reven2.RevenServer(args.host, args.port)
rvn.ossi.os().expect(
reven2.ossi.Os(
architecture=reven2.ossi.Architecture.X64,
family=reven2.ossi.OsFamily.Windows,
windows_version=reven2.ossi.WindowsVersion.Windows10,
),
reven2.ossi.Os(
architecture=reven2.ossi.Architecture.X64,
family=reven2.ossi.OsFamily.Windows,
windows_version=reven2.ossi.WindowsVersion.Windows11,
),
)
run(rvn, args.pid, args.binary, args.from_id, args.to_id, args.sync, csv_file)
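When `--raw` is given, the script above writes one row per detected call under the header `transition id, primitive, handle, return value`, with fields separated by a comma and a space. Handles and return values are hexadecimal numbers or the literals `unknown` and `unused`. A minimal sketch of loading such a file for post-processing could look as follows; the helper names `parse_field` and `load_sync_csv` and the sample rows are illustrative assumptions, not part of the Reven distribution.

```python
import csv
import io


def parse_field(text):
    """Convert a hexadecimal field to int; keep literals such as 'unknown' or 'unused'."""
    try:
        return int(text, 16)
    except ValueError:
        return text


def load_sync_csv(stream):
    """Read rows in the format written with --raw from a text stream."""
    # skipinitialspace drops the space that follows each comma
    reader = csv.reader(stream, skipinitialspace=True)
    next(reader)  # skip the header line
    rows = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        transition_id, primitive, handle, return_value = row
        rows.append(
            {
                "transition_id": int(transition_id),
                "primitive": primitive,
                "handle": parse_field(handle),
                "return_value": parse_field(return_value),
            }
        )
    return rows


# Hypothetical sample data in the format produced by print_csv
sample = io.StringIO(
    "transition id, primitive, handle, return value\n"
    "1200, cs, 0x7ff6a0001000, unused\n"
    "1350, mx, 0x1a4, 0x0\n"
    "1400, mx, unknown, unknown\n"
)
rows = load_sync_csv(sample)
```

With a real output file, `load_sync_csv(open(path, newline=""))` would replace the in-memory sample.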
Strings statistics
Purpose
Display statistics about string accesses, such as:
- the binaries that read or write a string.
- the number of read/write accesses each binary performs on a string.
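The per-binary counting described above boils down to a dictionary keyed by binary path that holds a read counter and a write counter. Here is a minimal sketch with stand-in data, assuming each access has already been reduced to a `(binary_path, operation)` pair. The real script obtains this information from the Reven API; `tally_accesses` and the sample accesses are hypothetical.

```python
from collections import defaultdict


def tally_accesses(accesses):
    """Count read and write accesses per binary.

    `accesses` is an iterable of (binary_path, operation) pairs, where
    operation is either "read" or "write".
    """
    counts = defaultdict(lambda: {"read": 0, "write": 0})
    for binary_path, operation in accesses:
        counts[binary_path][operation] += 1
    return dict(counts)


# Hypothetical accesses to a single string
accesses = [
    ("c:/windows/system32/ntdll.dll", "read"),
    ("c:/windows/system32/ntdll.dll", "read"),
    ("c:/windows/system32/kernelbase.dll", "write"),
]
stats = tally_accesses(accesses)
for path, count in stats.items():
    print(f"\t- {path} (Read: {count['read']} - Write: {count['write']})")
```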
How to use
usage: strings_stat.py [-h] [--host HOST] [--port PORT] [-v] [pattern]
Display statistics about strings accesses.
positional arguments:
pattern Pattern of the string, looking for "*pattern*", does not
support Regular Expression. If no pattern provided, all
strings will be used.
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
--port PORT Reven port, as an int (default: 13370)
-v, --verbose Increase output verbosity
Known limitations
N/A
Supported versions
Reven 2.2+
Supported perimeter
Any Reven scenario.
Dependencies
The script requires that the target Reven scenario have:
- The Strings feature replayed.
- The OSSI feature replayed.
Source
import argparse
import builtins
import logging
import reven2
"""
# Strings statistics
## Purpose
Display statistics about string accesses, such as:
* the binaries that read or write a string.
* the number of read/write accesses each binary performs on a string.
## How to use
```bash
usage: strings_stat.py [-h] [--host HOST] [--port PORT] [-v] [pattern]
Display statistics about strings accesses.
positional arguments:
pattern Pattern of the string, looking for "*pattern*", does not
support Regular Expression. If no pattern provided, all
strings will be used.
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
--port PORT Reven port, as an int (default: 13370)
-v, --verbose Increase output verbosity
```
## Known limitations
N/A
## Supported versions
Reven 2.2+
## Supported perimeter
Any Reven scenario.
## Dependencies
The script requires that the target Reven scenario have:
* The Strings feature replayed.
* The OSSI feature replayed.
"""
class BinaryStringOperations(object):
    def __init__(self, binary):
        self.binary = binary
        self.read_count = 0
        self.write_count = 0


def strings_stat(reven_server, pattern=""):
    for string in reven_server.trace.strings(pattern):
        binaries = builtins.dict()
        try:
            # iterate over all memory accesses (read/write) to the string and count them per binary
            for memory_access in string.memory_accesses():
                ctx = memory_access.transition.context_before()
                if ctx.ossi.location() is None:
                    continue
                binary = ctx.ossi.location().binary
                try:
                    binary_operations = binaries[binary.path]
                except KeyError:
                    binaries[binary.path] = BinaryStringOperations(binary)
                    binary_operations = binaries[binary.path]
                if memory_access.operation == reven2.memhist.MemoryAccessOperation.Read:
                    binary_operations.read_count += 1
                else:
                    binary_operations.write_count += 1
        except RuntimeError:
            # Limitation of the `memory_accesses` method, which raises a RuntimeError when
            # the service times out.
            pass
        yield (string, binaries)


def parse_args():
    parser = argparse.ArgumentParser(description="Display statistics about strings accesses.")
    parser.add_argument(
        "--host", dest="host", help='Reven host, as a string (default: "localhost")', default="localhost", type=str
    )
    parser.add_argument("--port", dest="port", help="Reven port, as an int (default: 13370)", type=int, default=13370)
    parser.add_argument(
        "-v",
        "--verbose",
        dest="log_level",
        help="Increase output verbosity",
        action="store_const",
        const=logging.DEBUG,
        default=logging.INFO,
    )
    parser.add_argument(
        "pattern",
        nargs="?",
        help="Pattern of the string, looking for "
        '"*pattern*", does not support Regular Expression. If no pattern provided, '
        "all strings will be used.",
        default="",
        type=str,
    )
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    logging.basicConfig(format="%(message)s", level=args.log_level)
    logging.debug('##### Getting stat for all strings containing "{0}" #####\n'.format(args.pattern))
    # Get a server instance
    reven_server = reven2.RevenServer(args.host, args.port)
    # Print strings
    for string, binaries in strings_stat(reven_server, args.pattern):
        logging.info('"{}":'.format(string.data))
        for binary_operations in binaries.values():
            logging.info(
                "\t- {} (Read: {} - Write: {})".format(
                    binary_operations.binary.filename, binary_operations.read_count, binary_operations.write_count
                )
            )
        logging.info("")
    logging.debug("##### Done #####")
Crash Detection
Purpose
Detect and report crashes and exceptions that occur during the trace. This script detects system crashes that occur inside of a trace, as well as exceptions thrown in user space.
How to use
Usage: crash_detection.py [-h] [--host host] [--port port] [--mode mode]
[--header]
Detect and report crashes and exceptions that appear during a Reven scenario.
optional arguments:
-h, --help show this help message and exit
--host host Reven host, as a string (default: "localhost")
--port port Reven port, as an int (default: 13370)
--mode mode Whether to look for "user" crash, "system" crash, or "all"
--header If present, display a header with the meaning of each column
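Each crash is reported on one line whose columns are separated by ` | `; with `--header`, the columns are named Mode, Process, Context, BugCheck, Cause and Exception transition. Below is a minimal sketch of splitting such a line into named fields, for instance to filter the report in another tool. The function `parse_crash_line` and the sample line are illustrative assumptions; the exact rendering of the process, context and transition columns depends on the Reven API.

```python
COLUMNS = ["mode", "process", "context", "bug_check", "cause", "exception_transition"]


def parse_crash_line(line):
    """Split one report line into a dict keyed by column name."""
    # Limit the number of splits so that a separator appearing inside the
    # last column (the exception transition) does not create extra fields.
    fields = [field.strip() for field in line.split(" | ", maxsplit=len(COLUMNS) - 1)]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


# Hypothetical report line; the actual column contents will differ
line = "User | app.exe (1234) | Context before #4200 | N/A | access violation (0xc0000005) | #4150 mov eax, [rcx]"
record = parse_crash_line(line)
print(record["mode"], record["cause"])
```

The header and its separator row printed by `--header` do not follow this data-line layout and would have to be skipped before parsing.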
Known limitations
Because user space processes can catch exceptions, a user exception reported by this script does not necessarily mean that the involved user space process crashed after causing the exception.
Supported versions
Reven 2.12+
Supported perimeter
Any Windows 10/11 x64 Reven scenario.
Dependencies
The script requires that the target Reven scenario have:
- The Fast Search feature replayed.
- The OSSI feature replayed.
- The Backtrace feature replayed.
Source
#!/usr/bin/env python3
import argparse
import reven2
# %% [markdown]
# # Crash Detection
#
# ## Purpose
#
# Detect and report crashes and exceptions that occur during the trace.
#
# This script detects system crashes that occur inside of a trace, as well as exceptions thrown in user space.
#
# ## How to use
#
# ```bash
# Usage: crash_detection.py [-h] [--host host] [--port port] [--mode mode]
# [--header]
#
# Detect and report crashes and exceptions that appear during a Reven scenario.
#
# optional arguments:
# -h, --help show this help message and exit
# --host host Reven host, as a string (default: "localhost")
# --port port Reven port, as an int (default: 13370)
# --mode mode Whether to look for "user" crash, "system" crash, or "all"
# --header If present, display a header with the meaning of each column
# ```
#
# ## Known limitations
#
# Because user space processes can catch exceptions, a user exception reported by this script does not necessarily
# mean that the involved user space process crashed after causing the exception.
#
# ## Supported versions
#
# Reven 2.12+
#
# ## Supported perimeter
#
# Any Windows 10/11 x64 Reven scenario.
#
# ## Dependencies
#
# The script requires that the target Reven scenario have:
# - The Fast Search feature replayed.
# - The OSSI feature replayed.
# - The Backtrace feature replayed.
HIGH_LEVEL_EXCEPTION_CODES = {
0x80000003: "breakpoint",
0x80000004: "single step debug",
0xC000001D: "illegal instruction",
0xC0000094: "integer division by zero",
0xC0000005: "access violation",
0xC0000409: "stack buffer overrun",
}
# Obtained by reversing the transformations performed on high level exception codes
# Some of the crashes find the low-level exception codes rather than the high-level ones
LOW_LEVEL_EXCEPTION_CODES = {
0x80000003: "breakpoint",
0x80000004: "single step debug",
0x10000002: "illegal instruction",
0x10000003: "integer division by zero",
0x10000004: "access violation",
}
class SystemCrash:
# Code values recovered here:
# https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-code-reference2
PF_BUG_CHECK_CODES = [0x50, 0xCC, 0xCD, 0xD5, 0xD6]
EXCEPTION_BUG_CHECK_CODES = [0x1E, 0x7E, 0x8E, 0x135, 0x1000007E, 0x1000008E]
SYSTEM_SERVICE_EXCEPTION = 0x3B
KERNEL_SECURITY_CHECK_FAILURE = 0x139
def __init__(self, trace, dispatcher_ctx):
self._trace = trace
self._dispatch_ctx = dispatcher_ctx
self._bug_check_code = dispatcher_ctx.read(reven2.arch.x64.ecx)
self._error_code = None
self._page_fault_address = None
self._page_fault_operation = None
self._process = dispatcher_ctx.ossi.process()
if self._bug_check_code in SystemCrash.PF_BUG_CHECK_CODES:
# page fault address is the 2nd parameter of KeBugCheckEx call for PAGE_FAULT bug checks.
# operation is 3rd parameter of KeBugCheckEx. See for instance:
# https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0xcc--page-fault-in-freed-special-pool
self._page_fault_address = dispatcher_ctx.read(reven2.arch.x64.rdx)
self._page_fault_operation = dispatcher_ctx.read(reven2.arch.x64.r8)
elif self._bug_check_code in SystemCrash.EXCEPTION_BUG_CHECK_CODES:
# error code is the 2nd parameter of KeBugCheckEx call for EXCEPTION bug checks. See for instance:
# https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x1e--kmode-exception-not-handled
self._error_code = dispatcher_ctx.read(reven2.arch.x64.edx)
elif self._bug_check_code == SystemCrash.KERNEL_SECURITY_CHECK_FAILURE:
# error code can be found as the first member of the exception structure that is 4th parameter of
# KeBugCheckEx call. See:
# https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x139--kernel-security-check-failure
# https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-exception_record
self._error_code = dispatcher_ctx.deref(reven2.arch.x64.r9, reven2.types.Pointer(reven2.types.U32))
elif self._bug_check_code == SystemCrash.SYSTEM_SERVICE_EXCEPTION:
self._error_code = dispatcher_ctx.read(reven2.arch.x64.edx)
# Look for the exception transition in the backtrace, if any is found
self._exception_transition = None
for frame in dispatcher_ctx.transition_before().context_before().stack.frames():
if frame.creation_transition is not None and frame.creation_transition.exception:
self._exception_transition = frame.creation_transition
break
@property
def dispatch_ctx(self):
return self._dispatch_ctx
@property
def exception_transition(self):
return self._exception_transition
@property
def error_code(self):
return self._error_code
@property
def bug_check_code(self):
return self._bug_check_code
@property
def page_fault_address(self):
return self._page_fault_address
@property
def page_fault_operation(self):
return self._page_fault_operation
@property
def process(self):
return self._process
class UserCrash:
def __init__(self, trace, dispatcher_ctx):
self._trace = trace
self._dispatch_ctx = dispatcher_ctx
self._exception_transition = None
self._process = dispatcher_ctx.ossi.process()
try:
frame = next(dispatcher_ctx.transition_before().context_before().stack.frames())
self._exception_transition = frame.first_context.transition_before()
except StopIteration:
pass
self._error_code = None
# heuristic: go back some transitions to get good stack trace
ctx_before_rsp_changed = dispatcher_ctx - 8
frames = ctx_before_rsp_changed.stack.frames()
try:
ki_exception_dispatch_ctx = next(frames).first_context
self._error_code = ki_exception_dispatch_ctx.read(reven2.arch.x64.ecx)
except StopIteration:
pass
@property
def dispatch_ctx(self):
return self._dispatch_ctx
@property
def exception_transition(self):
return self._exception_transition
@property
def error_code(self):
return self._error_code
@property
def process(self):
return self._process
def detect_system_crashes(server):
try:
ntoskrnl = next(server.ossi.executed_binaries("ntoskrnl"))
except StopIteration:
raise RuntimeError("Could not find the ntoskrnl binary. " "Is this a Windows 10 trace with OSSI enabled?")
try:
ke_bug_check_ex = next(ntoskrnl.symbols("KeBugCheckEx"))
except StopIteration:
raise RuntimeError(
"Could not find the KeBugCheckEx symbol in ntoskrnl. " "Is this a Windows 10 trace with OSSI enabled?"
)
for call in server.trace.search.symbol(ke_bug_check_ex):
yield SystemCrash(server.trace, call)
def detect_user_crashes(server):
try:
ntdll = next(server.ossi.executed_binaries("ntdll"))
except StopIteration:
raise RuntimeError("Could not find the ntdll binary. " "Is this a Windows 10 trace with OSSI enabled?")
try:
ki_user_exception_dispatcher = next(ntdll.symbols("KiUserExceptionDispatch"))
except StopIteration:
raise RuntimeError(
"Could not find the KiUserExceptionDispatch symbol in ntdll. "
"Is this a Windows 10 trace with OSSI enabled?"
)
for call in server.trace.search.symbol(ki_user_exception_dispatcher):
yield UserCrash(server.trace, call)
def format_exception_code(error_code):
if error_code is None:
return None
if error_code in HIGH_LEVEL_EXCEPTION_CODES:
return "{} ({:#x})".format(HIGH_LEVEL_EXCEPTION_CODES[error_code], error_code)
elif error_code in LOW_LEVEL_EXCEPTION_CODES:
return "{} ({:#x})".format(LOW_LEVEL_EXCEPTION_CODES[error_code], error_code)
return "unknown or incorrect exception code: {:#x}".format(error_code)
def format_page_fault(page_fault_address, page_fault_operation):
if page_fault_address is None:
return None
# operations changed recently for bug check 0x50. It should work with any version though. See:
# https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x50--page-fault-in-nonpaged-area#page_fault_in_nonpaged_area-parameters
if page_fault_operation == 0:
operation = "reading"
elif page_fault_operation == 1 or page_fault_operation == 2:
operation = "writing"
elif page_fault_operation == 10:
operation = "executing"
else:
return "page fault on address {:#x}".format(page_fault_address)
return "page fault while {} address {:#x}".format(operation, page_fault_address)
def format_cause(error_code=None, page_fault_address=None, page_fault_operation=None):
exception_fmt = format_exception_code(error_code)
if exception_fmt is not None:
return "{}".format(exception_fmt)
page_fault_fmt = format_page_fault(page_fault_address, page_fault_operation)
if page_fault_fmt is not None:
return "{}".format(page_fault_fmt)
return "Unknown"
def detect_crashes(server, has_system, has_user, has_header=False):
if has_header:
print("Mode | Process | Context | BugCheck | Cause | Exception transition")
print("-----|---------|---------|----------|-------|---------------------")
if has_system:
for system_crash in detect_system_crashes(server):
print(
"System | {} | {} | {:#x} | {} | {}".format(
system_crash.process,
system_crash.dispatch_ctx,
system_crash.bug_check_code,
format_cause(
system_crash.error_code, system_crash.page_fault_address, system_crash.page_fault_operation
),
system_crash.exception_transition,
)
)
if has_user:
for user_crash in detect_user_crashes(server):
print(
"User | {} | {} | N/A | {} | {}".format(
user_crash.process,
user_crash.dispatch_ctx,
format_exception_code(user_crash.error_code),
user_crash.exception_transition,
)
)
CRASH_MODE_DICT = {"all": (True, True), "user": (False, True), "system": (True, False)}
def parse_args():
parser = argparse.ArgumentParser(
description="Detect and report crashes and exceptions that appear during a " "Reven scenario.\n",
formatter_class=argparse.RawTextHelpFormatter,
)
parser.add_argument(
"--host",
metavar="host",
dest="host",
help='Reven host, as a string (default: "localhost")',
default="localhost",
type=str,
)
parser.add_argument(
"--port", metavar="port", dest="port", help="Reven port, as an int (default: 13370)", type=int, default=13370
)
parser.add_argument(
"--mode",
metavar="mode",
dest="mode",
help='Whether to look for "user" crash, "system" crash, or "all"',
type=str,
default="all",
)
parser.add_argument(
"--header",
action="store_true",
dest="header",
help="If present, display a header with the meaning of each column",
)
args = parser.parse_args()
return args
if __name__ == "__main__":
args = parse_args()
if args.mode not in CRASH_MODE_DICT:
raise ValueError(
'Wrong "mode" value "{}". Mode must be "all",' ' "user" or "system" (defaults to "all").'.format(args.mode)
)
(has_system, has_user) = CRASH_MODE_DICT[args.mode]
# Get a server instance
reven_server = reven2.RevenServer(args.host, args.port)
reven_server.ossi.os().expect(
reven2.ossi.Os(
architecture=reven2.ossi.Architecture.X64,
family=reven2.ossi.OsFamily.Windows,
windows_version=reven2.ossi.WindowsVersion.Windows10,
),
reven2.ossi.Os(
architecture=reven2.ossi.Architecture.X64,
family=reven2.ossi.OsFamily.Windows,
windows_version=reven2.ossi.WindowsVersion.Windows11,
),
)
detect_crashes(reven_server, has_system, has_user, args.header)
Networking examples
The examples in this section analyze the network activity of a Reven scenario to produce useful information, such as PCAP files that can then be analyzed in Wireshark.
Taint PCAP
Purpose
Display the list of functions that handle each network packet sent/received during a Reven scenario.
How to use
usage: taint_pcap.py [-h] [--host host] [--port port] [--symbol symbols]
[--recv-only]
Taint every pcap from a trace.
optional arguments:
-h, --help show this help message and exit
--host host Reven host, as a string (default: "localhost")
--port port Reven port, as an int (default: 13370)
--symbol symbols Symbol name that the packet may go through, as a
string (default ""). This argument can be repeated.
--recv-only If specified, handle received packets only
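`--symbol` can be given several times, and the script compares each value with symbol names as a case-insensitive substring. Here is a minimal sketch of that behaviour using only `argparse`, with plain strings standing in for Reven symbol objects; the `build_parser` and `matches` helpers and the sample names are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a repeatable --symbol option.")
    # action="append" collects every occurrence of the option into a list
    parser.add_argument("--symbol", dest="symbols", action="append", default=[], type=str)
    return parser


def matches(symbol_name, user_symbols):
    """Return True if any user-provided name is a case-insensitive substring."""
    return any(sym.lower() in symbol_name.lower() for sym in user_symbols)


# Equivalent to passing: --symbol tcpip --symbol Ndis
args = build_parser().parse_args(["--symbol", "tcpip", "--symbol", "Ndis"])
print(args.symbols)
print(matches("NdisSendNetBufferLists", args.symbols))
```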
Known limitations
N/A
Supported versions
Reven 2.12+
Supported perimeter
Any Windows 10/11 x64 scenario.
Dependencies
- The script requires the scapy package.
- The network_packet_tools.py file distributed alongside this example must be provided (e.g. in the same directory).
- The script requires that the target Reven scenario have:
- The Fast Search feature replayed.
- The OSSI feature replayed.
- An access to the binary 'e1g6032e.sys' and its PDB file.
Source
import argparse
import itertools
import network_packet_tools as nw_tools
import reven2
from reven2.preview.taint import TaintedMemories, Tainter
"""
# Taint PCAP
## Purpose
Display the list of functions that handle each network packet sent/received during a Reven scenario.
## How to use
```bash
usage: taint_pcap.py [-h] [--host host] [--port port] [--symbol symbols]
[--recv-only]
Taint every pcap from a trace.
optional arguments:
-h, --help show this help message and exit
--host host Reven host, as a string (default: "localhost")
--port port Reven port, as an int (default: 13370)
--symbol symbols Symbol name that the packet may go through, as a
string (default ""). This argument can be repeated.
--recv-only If specified, handle received packets only
```
## Known limitations
N/A
## Supported versions
Reven 2.12+
## Supported perimeter
Any Windows 10/11 x64 scenario.
## Dependencies
- The script requires the scapy package.
- The `network_packet_tools.py` file distributed alongside this example must be provided (e.g. in the same directory).
- The script requires that the target Reven scenario have:
- The Fast Search feature replayed.
- The OSSI feature replayed.
- An access to the binary 'e1g6032e.sys' and its PDB file.
"""
def parse_args():
parser = argparse.ArgumentParser(description="Taint every pcap from a trace.")
parser.add_argument(
"--host",
metavar="host",
dest="host",
help='Reven host, as a string (default: "localhost")',
default="localhost",
type=str,
)
parser.add_argument(
"--port", metavar="port", dest="port", help="Reven port, as an int (default: 13370)", type=int, default=13370
)
parser.add_argument(
"--symbol",
metavar="symbols",
action="append",
dest="symbols",
help="Symbol name that the packet may "
'go through, as a string (default ""). This argument can be repeated.',
type=str,
default=[],
)
parser.add_argument(
"--recv-only", dest="recv_only", action="store_true", help="If specified, handle received packets only"
)
args = parser.parse_args()
return args
def get_memory_range_of_received_network_packet(ctx):
info = nw_tools.get_memory_address_and_size_of_received_network_packet(ctx)
if info is None:
return None
return TaintedMemories(info[0], info[1])
def get_memory_range_of_sent_network_packet(ctx):
infos = nw_tools.get_memory_addresses_and_sizes_of_sent_network_packet(ctx)
tainted_mems = []
for info in infos:
tainted_mems.append(TaintedMemories(info[0], info[1]))
return tainted_mems
def get_all_send_recv(rvn, recv_only=False):
print("[+] Get all sent/received packets...")
send_queries, recv_queries = nw_tools.get_all_send_recv_packet_context(rvn)
# `reven2.util.collate` makes it possible to iterate over multiple generators in sorted order
if recv_only:
return zip(reven2.util.collate(recv_queries), itertools.repeat("recv"))
send_results = zip(reven2.util.collate(send_queries), itertools.repeat("send"))
recv_results = zip(reven2.util.collate(recv_queries), itertools.repeat("recv"))
# Return a generator of both kinds of results, sorted by context
return reven2.util.collate([send_results, recv_results], lambda ctx_type: ctx_type[0])
def found_symbol(current_symbol, user_symbols):
for sym in user_symbols:
if sym.lower() in current_symbol.name.lower():
return True
return False
def taint_pcap(reven_server, recv_only=False, user_symbols=[]):
# Initialize Tainter
tainter = Tainter(reven_server.trace)
# Get all send and recv from the trace
results = list(get_all_send_recv(reven_server, recv_only))
if len(results) == 0:
print("[+] Finished: no network packets were sent/received in the trace")
return
# Get packets memory range
for ctx, ty in results:
# Just detect if send or recv context
# Taint packet in forward when it is received and in backward when it is sent
is_forward = ty != "send"
mem_range = (
get_memory_range_of_sent_network_packet(ctx)
if ty == "send"
else get_memory_range_of_received_network_packet(ctx)
)
if mem_range is None or isinstance(mem_range, list) and len(mem_range) == 0:
continue
taint = tainter.simple_taint(tag0=mem_range, from_context=ctx, is_forward=is_forward)
print("\n=====================================================================================")
print(
"[+]{} - {} packet at address {}".format(
ctx.transition_before(),
"Received" if is_forward else "Sent",
mem_range if is_forward else ["{}".format(mem) for mem in mem_range],
)
)
last_symbol = None
for change in taint.accesses(changes_only=True).all():
loc = change.transition.context_before().ossi.location()
if loc is None:
continue
symbol = loc.symbol
if symbol is None or last_symbol is not None and symbol == last_symbol:
continue
if len(user_symbols) == 0 or found_symbol(symbol, user_symbols):
# if user_symbols is an empty list then no requested symbols, don't filter output
print("{}: {}".format(change.transition, symbol))
last_symbol = symbol
print("=====================================================================================\n")
print("[+] Finished: tainting all pcap in the trace")
if __name__ == "__main__":
args = parse_args()
print("[+] Start tainting pcap from trace...")
# Get a server instance
rvn = reven2.RevenServer(args.host, args.port)
rvn.ossi.os().expect(
reven2.ossi.Os(
architecture=reven2.ossi.Architecture.X64,
family=reven2.ossi.OsFamily.Windows,
windows_version=reven2.ossi.WindowsVersion.Windows10,
),
reven2.ossi.Os(
architecture=reven2.ossi.Architecture.X64,
family=reven2.ossi.OsFamily.Windows,
windows_version=reven2.ossi.WindowsVersion.Windows11,
),
)
taint_pcap(rvn, args.recv_only, args.symbols)
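The symbol filtering in the listing above can be tried without a Reven server. The following is a minimal sketch, assuming symbols are plain strings (in the script they are objects with a `name` attribute). `filter_symbols` is a hypothetical helper that does not exist in the script, and it assumes `last_symbol` is updated for every known symbol, which the flattened listing does not make explicit.

```python
from typing import Iterable, Iterator, List, Optional


def found_symbol(current_symbol: str, user_symbols: List[str]) -> bool:
    # Case-insensitive substring match, as in the script.
    return any(sym.lower() in current_symbol.lower() for sym in user_symbols)


def filter_symbols(symbols: Iterable[Optional[str]], user_symbols: List[str]) -> Iterator[str]:
    # Hypothetical helper: skip unknown symbols and consecutive repeats,
    # then apply the optional user filter (an empty list disables filtering).
    last_symbol = None
    for symbol in symbols:
        if symbol is None or symbol == last_symbol:
            continue
        if not user_symbols or found_symbol(symbol, user_symbols):
            yield symbol
        last_symbol = symbol
```

Passing `["TCPIP"]` as `user_symbols` keeps any symbol whose name contains `tcpip` in any letter case, which mirrors how repeated `--symbol` arguments are applied.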
This module introduces two functions. The first returns the memory address and size of the buffer used to receive a network packet. The second returns a list of the memory addresses and sizes of the buffers used to send a network packet.
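For sent packets, the module's source walks a chain of MDLs and applies the current offset to the first one only. The sketch below illustrates that traversal on a simplified stand-in: the `Mdl` dataclass and `sent_packet_ranges` are hypothetical names, and plain integers replace Reven addresses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Mdl:
    # Simplified stand-in for a Windows MDL: only the fields the module reads.
    mapped_system_va: int
    byte_count: int
    next: Optional["Mdl"] = None


def sent_packet_ranges(first_mdl: Mdl, first_offset: int) -> List[Tuple[int, int]]:
    # Returns (address, size) pairs, one per MDL in the chain.
    ranges = []
    mdl, offset = first_mdl, first_offset
    while mdl is not None:
        ranges.append((mdl.mapped_system_va + offset, mdl.byte_count - offset))
        # Only the first MDL is shifted by the current offset.
        mdl, offset = mdl.next, 0
    return ranges
```

A packet split over two MDLs therefore yields two ranges, which is why the send-side function returns a list while the receive-side function returns a single range.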
Source
"""
This module introduces two functions. The first returns the memory address and size of the buffer
used to receive a network packet. The second returns a list of the memory addresses and
sizes of the buffers used to send a network packet.
"""
from typing import Iterator as _Iterator, List as _List, Optional as _Optional, Tuple as _Tuple, cast as _cast
import reven2
def get_memory_address_and_size_of_received_network_packet(
ctx: reven2.trace.Context,
) -> _Optional[reven2.MemoryRange[reven2.address._AbstractAddress]]:
"""
This function returns the memory range (address and size) of a received packet.
'ctx' must be a context resulting from searching the "RxPacketAssemble" symbol in the trace.
Information
===========
@param ctx: the context used to retrieve the memory address and size of the received packet buffer
"""
# To get the memory address of received packets, we need to dereference multiple times some pointers in memory.
# The first one is rcx, as argument. It points to a huge structure.
# We don't know its type so we can't use the type API.
# at rcx+0x308 is a pointer to a structure, which contains the size at +8
# at rcx+0x328 is a pointer index that points the right structure to get for the buffer
# at rcx+0x328+8 * index is a pointer to the network buffer.
# at rcx+0x308, then deref +0xc is a byte that is tested at 1 and 2. If 0, then the call to RxPacketAssemble is
# the last one of a series and doesn't contain any buffer to fetch.
# Get a pointer to the huge structure
pHugeStruct = reven2.address.LogicalAddress(ctx.read(reven2.arch.x64.rcx, reven2.types.USize))
# Deref to get a pointer to the structure that contains the size
pSizeStruct: reven2.address.LogicalAddress = reven2.address.LogicalAddress(
ctx.read(pHugeStruct + 0x308, reven2.types.USize)
)
u8Flag: int = ctx.read(pSizeStruct + 0xC, reven2.types.U8)
# Is last packet part?
if (u8Flag & 0x3) == 0:
return None
u32Size: int = ctx.read(pSizeStruct + 0x8, reven2.types.U32)
# Next get the index in the structure
pu32IndexRaw = reven2.address.LogicalAddress(ctx.read(pHugeStruct + 0x328, reven2.types.USize))
# The index is a dword (eax is used)
u32IndexRaw: int = ctx.read(pu32IndexRaw, reven2.types.U32)
# Now, the system performs an operation on this index.
u32Index = (u32IndexRaw + u32IndexRaw * 2) * 2
# Now get a pointer to the buffer
pArray = reven2.address.LogicalAddress(ctx.read(pHugeStruct + 0x328, reven2.types.USize) + 8)
pBuffer: reven2.address.LogicalAddress = reven2.address.LogicalAddress(
ctx.read(pArray + 8 * u32Index + 0x20, reven2.types.USize)
)
return reven2.MemoryRange(pBuffer, u32Size)
def get_memory_addresses_and_sizes_of_sent_network_packet(
reven_server: reven2.RevenServer, ctx: reven2.trace.Context
) -> _List[reven2.MemoryRange[reven2.address._AbstractAddress]]:
"""
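This function returns a list of memory ranges (address and size) of a sent packet.
'ctx' must be a context resulting from searching the "E1000SendNetBufferLists" symbol in the trace.
Information
===========
@param reven_server: the L{reven2.RevenServer} instance used to look up the `_NET_BUFFER_LIST` type in ndis.sys
@param ctx: the context used to retrieve the list of memory addresses and sizes of the sent packet buffers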
"""
ndis = next(reven_server.ossi.executed_binaries("ndis.sys"))
net_buffer_list_type = _cast(reven2.types.Struct, ndis.exact_type("_NET_BUFFER_LIST"))
net_buffer_list: reven2.types.StructInstance = ctx.deref(
reven2.arch.x64.rdx, reven2.types.Pointer(net_buffer_list_type)
)
net_buffer = net_buffer_list.field("FirstNetBuffer").deref_struct()
mdl = net_buffer.field("CurrentMdl").deref_struct()
mdlOffset = net_buffer.field("CurrentMdlOffset").read_int()
packet_memory_addresses = []
packet_memory_addresses.append(_get_network_packet_address_from_mdl(ctx, mdl, mdlOffset=mdlOffset))
pNextMdl = mdl.field("Next").read_ptr().assert_struct()
while pNextMdl.address.offset != 0:
nextMdl = pNextMdl.deref()
packet_memory_addresses.append(_get_network_packet_address_from_mdl(ctx, nextMdl))
pNextMdl = nextMdl.field("Next").read_ptr().assert_struct()
return packet_memory_addresses
def get_all_send_recv_packet_context(
reven_server: reven2.RevenServer,
) -> _Tuple[_List[_Iterator[reven2.trace.Context]], _List[_Iterator[reven2.trace.Context]]]:
"""
This function returns two lists of search results: the contexts used to send network packets and the
contexts used to receive them.
To get these contexts, this function searches the symbol `E1000SendNetBufferLists` for sent network
packets and the symbol `RxPacketAssemble` for received network packets.
This function requires that the trace has the PDB of the `e1g6032e.sys` binary; otherwise no context will be found.
'reven_server' is the L{reven2.RevenServer} instance on which to perform the search
Information
===========
@param reven_server: L{reven2.RevenServer} instance on which to search packets
"""
# Get generators of search results
send_queries = [
reven_server.trace.search.symbol(symbol)
for symbol in reven_server.ossi.symbols(pattern="E1000SendNetBufferLists", binary_hint="e1g6032e.sys")
]
recv_queries = [
reven_server.trace.search.symbol(symbol)
for symbol in reven_server.ossi.symbols(pattern="RxPacketAssemble", binary_hint="e1g6032e.sys")
]
if len(send_queries) == 0 and len(recv_queries) == 0:
print(
"No network packets exist in this trace, make sure that this trace is a network trace,"
" and if it is, make sure that the PDB of `e1g6032e.sys` binary is available in the scenario"
)
return send_queries, recv_queries
def _get_network_packet_address_from_mdl(
ctx: reven2.trace.Context, mdl: reven2.types.StructInstance, mdlOffset: int = 0
) -> reven2.MemoryRange[reven2.address._AbstractAddress]:
pBufferStartVa = mdl.field("MappedSystemVa").read_ptr()
u32Size = mdl.field("ByteCount").read_int()
return reven2.MemoryRange(pBufferStartVa.address + mdlOffset, u32Size - mdlOffset)
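The pointer chasing done for received packets is easier to follow on a toy memory model. This sketch reuses the offsets from the listing above (`0x308`, `0xC`, `0x8`, `0x328`, `0x20`) but replaces the Reven context with a dictionary mapping an address to the integer stored there. `received_packet_range` is a hypothetical name, and the offsets themselves are specific to the driver the module targets.

```python
from typing import Dict, Optional, Tuple


def received_packet_range(memory: Dict[int, int], rcx: int) -> Optional[Tuple[int, int]]:
    # `memory` is a toy model: it maps an address to the integer stored there.
    size_struct = memory[rcx + 0x308]
    flag = memory[size_struct + 0xC]
    if (flag & 0x3) == 0:
        return None  # last call of a series: no buffer to fetch
    size = memory[size_struct + 0x8]
    array_base = memory[rcx + 0x328]
    raw_index = memory[array_base]
    index = raw_index * 6  # same value as (raw + raw * 2) * 2
    buffer_ptr = memory[array_base + 8 + 8 * index + 0x20]
    return buffer_ptr, size
```

With a raw index of 2, the buffer pointer is read at `array_base + 0x88`, that is 8 bytes of header, 12 entries of 8 bytes, and the `0x20` field offset.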
Dump PCAP
Purpose
Generate a PCAP file containing all network packets that were sent/received in a trace.
The timestamp of packets is replaced by the transition id where the packet was sent/received.
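To see where the transition id ends up, here is a minimal sketch of the classic pcap layout written with the standard library only. It is not what `dump_pcap.py` does (the script uses scapy's `wrpcap`); `pcap_bytes` is a hypothetical helper, and it assumes transition ids fit in the 32-bit seconds field.

```python
import struct


def pcap_bytes(packets):
    # `packets` is an iterable of (transition_id, raw_frame_bytes) pairs.
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen,
    # link type 1 (Ethernet).
    out = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    for transition_id, frame in packets:
        # Record header: ts_sec, ts_usec, captured length, original length.
        # The transition id is stored in the seconds field.
        out += struct.pack("<IIII", transition_id, 0, len(frame), len(frame))
        out += frame
    return out
```

Because the id sits in the seconds field, displaying time as seconds since 1970-01-01 in Wireshark shows the transition id directly.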
How to use
usage: dump_pcap.py [-h] [--host host] [--port port] [--filename file_name]
[--fix-checksum]
Dump a PCAP file from a Windows 10 x64 trace. To get the time as transition ID in wireshark, select:
View->Time display format->Seconds since 1970-01-01
optional arguments:
-h, --help show this help message and exit
--host host Reven host, as a string (default: "localhost")
--port port Reven port, as an int (default: 13370)
--filename file_name the output file name (default: "output.pcap"). Will be created if it doesn't exist
--fix-checksum If not specified, the packet checksum won't be fixed and you will have the buffer
that has been dumped from memory, and a lot of ugly packets in Wireshark,
that you can also ignore if needed.
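With `--fix-checksum`, the script deletes the TCP checksum field so that scapy recomputes it when the packet is serialized. As an illustration of what such a recomputation involves, the sketch below implements the generic Internet checksum (RFC 1071). It is not taken from the script, and a real TCP checksum is computed over a pseudo-header plus the TCP segment rather than over arbitrary bytes.

```python
def internet_checksum(data: bytes) -> int:
    # RFC 1071: ones'-complement of the ones'-complement sum of 16-bit words.
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

A buffer whose checksum field is already correct sums to zero, which is a quick way to tell whether a dumped packet needs fixing.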
Known limitations
N/A
Supported versions
Reven 2.12+
Supported perimeter
Any Windows 10/11 x64 scenario.
Dependencies
- The script requires the scapy package.
- The network_packet_tools.py file distributed alongside this example must be provided (e.g. in the same directory).
- The script requires that the target Reven scenario have:
  - The Fast Search feature replayed.
  - The OSSI feature replayed.
  - An access to the binary e1g6032e.sys and its PDB file.
Source
#!/usr/bin/env python3
import argparse
import itertools
import os
from typing import Iterator as _Iterator, List as _List, Optional as _Optional, Tuple as _Tuple
import network_packet_tools as nw_tools
import reven2
from scapy.all import Ether, TCP, wrpcap
"""
# Dump PCAP
## Purpose
Generate a PCAP file containing all network packets that were sent/received in a trace.
The timestamp of packets is replaced by the transition id where the packet was sent/received.
## How to use
```bash
usage: dump_pcap.py [-h] [--host host] [--port port] [--filename file_name]
[--fix-checksum]
Dump a PCAP file from a Windows 10 x64 trace. To get the time as transition ID in wireshark, select:
View->Time display format->Seconds since 1970-01-01
optional arguments:
-h, --help show this help message and exit
--host host Reven host, as a string (default: "localhost")
--port port Reven port, as an int (default: 13370)
--filename file_name the output file name (default: "output.pcap"). Will be created if it doesn't exist
--fix-checksum If not specified, the packet checksum won't be fixed and you will have the buffer
that has been dumped from memory, and a lot of ugly packets in Wireshark,
that you can also ignore if needed.
```
## Known limitations
N/A
## Supported versions
Reven 2.12+
## Supported perimeter
Any Windows 10/11 x64 scenario.
## Dependencies
- The script requires the scapy package
- The `network_packet_tools.py` file distributed alongside this example must be provided (e.g. in the same directory).
- The script requires that the target Reven scenario have:
- The Fast Search feature replayed.
- The OSSI feature replayed.
- An access to the binary `e1g6032e.sys` and its PDB file.
"""
def parse_args():
parser = argparse.ArgumentParser(
description="Dump a PCAP file from a Windows 10 x64 trace. "
"To get the time as transition ID in "
"wireshark, select:\nView->Time display format->Seconds since "
"1970-01-01\n",
formatter_class=argparse.RawTextHelpFormatter,
)
parser.add_argument(
"--host",
metavar="host",
dest="host",
help='Reven host, as a string (default: "localhost")',
default="localhost",
type=str,
)
parser.add_argument(
"--port", metavar="port", dest="port", help="Reven port, as an int (default: 13370)", type=int, default=13370
)
parser.add_argument(
"--filename",
metavar="file_name",
dest="file_name",
help="the output " 'file name (default: "output.pcap"). Will be created if it doesn\'t exist',
default="output.pcap",
)
parser.add_argument(
"--fix-checksum",
dest="fix_checksum",
action="store_true",
help="If not specified, the packet checksum won't be fixed and you will have the buffer that \
has been dumped from memory, and a lot of ugly packets in Wireshark, that you can also ignore \
if needed.",
)
args = parser.parse_args()
return args
def get_network_buffer_recv_RxPacketAssemble(
ctx: reven2.trace.Context,
) -> _Tuple[_List[reven2.MemoryRange[reven2.address._AbstractAddress]], _Optional[bytearray]]:
packet_memory_range = nw_tools.get_memory_address_and_size_of_received_network_packet(ctx)
Buffer: _Optional[bytearray] = None
sources = []
if packet_memory_range is not None:
sources = [packet_memory_range]
# Get the buffer
Buffer = ctx.read(packet_memory_range, raw=True)
return sources, Buffer
def get_network_buffer_send_NdisSendNetBufferLists(
reven_server: reven2.RevenServer, ctx: reven2.trace.Context
) -> _Tuple[_List[reven2.MemoryRange[reven2.address._AbstractAddress]], _Optional[bytearray]]:
packet_memory_ranges = nw_tools.get_memory_addresses_and_sizes_of_sent_network_packet(reven_server, ctx)
Buffer: _Optional[bytearray] = None
sources = []
# read buffer and join them
for memory_range in packet_memory_ranges:
sources.append(memory_range)
if Buffer is None:
Buffer = ctx.read(memory_range, raw=True)
else:
Buffer += ctx.read(memory_range, raw=True)
return sources, Buffer
def get_all_send_recv(reven_server: reven2.RevenServer) -> _Iterator[_Tuple[reven2.trace.Context, str]]:
print("[+] Get all sent/received packets...")
send_queries, recv_queries = nw_tools.get_all_send_recv_packet_context(reven_server)
# `reven2.util.collate` makes it possible to iterate over multiple generators in sorted order
send_results = zip(reven2.util.collate(send_queries), itertools.repeat("send"))
recv_results = zip(reven2.util.collate(recv_queries), itertools.repeat("recv"))
# Return a generator of both kinds of results, sorted by context
return reven2.util.collate([send_results, recv_results], lambda ctx_type: ctx_type[0])
def dump_pcap(reven_server: reven2.RevenServer, output_file: str = "output.pcap", fix_checksum: bool = False) -> None:
if os.path.isfile(output_file):
raise RuntimeError(
'"{}" already exists. Choose another output file or remove it before running the script.'.format(
output_file
)
)
print("[+] Creating pcap from trace...")
# Get all send and recv from the trace
results = list(get_all_send_recv(reven_server))
if len(results) == 0:
print("[+] Finished: no network packets were sent/received in the trace")
return
# Get packets buffers and create the pcap file.
print("[+] Convert packets to pcap format and write to file...")
for ctx, ty in results:
# Just detect if send or recv context
if ty == "send":
sources, buf = get_network_buffer_send_NdisSendNetBufferLists(reven_server, ctx)
else:
sources, buf = get_network_buffer_recv_RxPacketAssemble(ctx)
if buf is not None:
packet = Ether(bytes(buf))
# Here we check whether or not we have to fix the checksum.
if fix_checksum:
if TCP in packet:
del packet[TCP].chksum
# Replace the time in the packet by the transition ID, so that we get
# it in Wireshark in a nice way.
transition = ctx.transition_before().id
packet.time = transition
# Write packet to pcap file
wrpcap(output_file, packet, append=True)
# Print packet information
sources_str = ", ".join(
[
"{size} bytes at {address}".format(size=memory_range.size, address=memory_range.address)
for memory_range in sources
]
)
print("#{transition} [{type}] {sources}".format(transition=transition, type=ty, sources=sources_str))
print("[+] Finished: PCAP file is '{}'.".format(output_file))
if __name__ == "__main__":
args = parse_args()
# Get a server instance
reven_server = reven2.RevenServer(args.host, args.port)
reven_server.ossi.os().expect(
reven2.ossi.Os(
architecture=reven2.ossi.Architecture.X64,
family=reven2.ossi.OsFamily.Windows,
windows_version=reven2.ossi.WindowsVersion.Windows10,
),
reven2.ossi.Os(
architecture=reven2.ossi.Architecture.X64,
family=reven2.ossi.OsFamily.Windows,
windows_version=reven2.ossi.WindowsVersion.Windows11,
),
)
# Generate the PCAP file
dump_pcap(reven_server, output_file=args.file_name, fix_checksum=args.fix_checksum)
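The `get_all_send_recv` function above tags each result with its direction and then merges the two sorted streams. The sketch below shows the same idea with the standard library, using `heapq.merge` as a stand-in for `reven2.util.collate` (an assumption about its behaviour, based on the comments in the listing) and integers in place of contexts. `merge_tagged` is a hypothetical name.

```python
import heapq
import itertools


def merge_tagged(send_ids, recv_ids):
    # Inputs are assumed to be already sorted, like search results.
    send = zip(send_ids, itertools.repeat("send"))
    recv = zip(recv_ids, itertools.repeat("recv"))
    # Merge lazily, ordering on the first tuple element only.
    return heapq.merge(send, recv, key=lambda pair: pair[0])
```

Ordering on the first element only matters: it keeps the comparison on the position in the trace and never compares the `"send"` and `"recv"` tags.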
OSSI
Examples in this section demonstrate OS-Specific Information capabilities, such as browsing processes, binaries and symbols.
Binary Coverage
Purpose
This script is designed to build the coverage of a binary executed in a Reven scenario.
How to use
usage: bin_coverage.py [-h] [--host HOST] [-p PORT] binary
positional arguments:
binary Binary on which to compute coverage
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
Known limitations
N/A
Supported versions
Reven 2.2+
Supported perimeter
Any Reven scenario.
Dependencies
None.
Source
import argparse
import builtins
from collections import defaultdict
import reven2 as reven
# %% [markdown]
# # Binary Coverage
#
# ## Purpose
#
# This script is designed to build the coverage of a binary executed in a Reven scenario.
#
# ## How to use
#
# ```bash
# usage: bin_coverage.py [-h] [--host HOST] [-p PORT] binary
#
# positional arguments:
# binary Binary on which to compute coverage
#
# optional arguments:
# -h, --help show this help message and exit
# --host HOST Reven host, as a string (default: "localhost")
# -p PORT, --port PORT Reven port, as an int (default: 13370)
# ```
#
# ## Known limitations
#
# N/A
#
# ## Supported versions
#
# Reven 2.2+
#
# ## Supported perimeter
#
# Any Reven scenario.
#
# ## Dependencies
#
# None.
def find_and_choose_binary(ossi, requested_binary):
binaries = list(ossi.executed_binaries(requested_binary))
if len(binaries) == 0:
raise RuntimeError('Binary "{}" not executed in the trace.'.format(requested_binary))
if len(binaries) == 1:
return binaries[0]
print('Multiple matches for "{}":'.format(requested_binary))
for index, binary in enumerate(binaries):
print("{}: {}".format(index, binary.path))
answer = builtins.input("Please choose one binary: ")
return binaries[int(answer)]
def compute_binary_coverages(trace, binary):
coverages = {}
for ctx in trace.search.binary(binary):
asid = ctx.read(reven.arch.x64.cr3)
loc = ctx.ossi.location()
symbol = "unknown" if loc.symbol is None else loc.symbol.name
asid_coverage = coverages.setdefault(asid, [loc.base_address, defaultdict(int)])[1]
asid_coverage[symbol] += 1
return coverages
def binary_coverage(reven_server, binary):
binary = find_and_choose_binary(reven_server.ossi, binary)
return compute_binary_coverages(reven_server.trace, binary)
def print_binary_coverages(coverages):
for asid, asid_coverage in coverages.items():
print("***** Coverage for CR3 = {:#x}: base address = {:#x} *****\n".format(asid, asid_coverage[0]))
for symbol, symbol_coverage in asid_coverage[1].items():
print(" {}: {}".format(symbol, symbol_coverage))
print("\n")
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--host", type=str, default="localhost", help='Reven host, as a string (default: "localhost")')
parser.add_argument("-p", "--port", type=int, default=13370, help="Reven port, as an int (default: 13370)")
parser.add_argument("binary", type=str, help="Binary on which to compute coverage")
args = parser.parse_args()
reven_server = reven.RevenServer(args.host, args.port)
coverages = binary_coverage(reven_server, args.binary)
print_binary_coverages(coverages)
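The aggregation performed by `compute_binary_coverages` can be exercised on plain tuples. This is a sketch under the assumption that each sample carries a CR3 value, a base address and an optional symbol name. `aggregate_coverage` is a hypothetical name and the tuples stand in for the contexts returned by the search.

```python
from collections import defaultdict


def aggregate_coverage(samples):
    # `samples` yields (cr3, base_address, symbol_name_or_None) tuples.
    coverages = {}
    for cr3, base_address, symbol in samples:
        name = "unknown" if symbol is None else symbol
        # One entry per address space: [base address, per-symbol hit counts].
        counts = coverages.setdefault(cr3, [base_address, defaultdict(int)])[1]
        counts[name] += 1
    return coverages
```

Keying on CR3 separates the coverage of the same binary loaded in different processes, where its base address may differ.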
List created processes
Purpose
List all processes created in the trace.
How to use
usage: list_created_processes.py [-h] [--host HOST] [-p PORT]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
Known limitations
N/A
Supported versions
Reven 2.12+
Supported perimeter
Any Windows x64 scenario.
Dependencies
The script requires that the target Reven scenario have:
- The Memory History feature replayed.
- The Fast Search feature replayed.
- The OSSI feature replayed.
- An access to the binaries 'kernel32.dll' and 'kernelbase.dll' and their PDB files.
Source
import argparse
import reven2
"""
# List created processes
## Purpose
List all processes created in the trace.
## How to use
```bash
usage: list_created_processes.py [-h] [--host HOST] [-p PORT]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
```
## Known limitations
N/A
## Supported versions
Reven 2.12+
## Supported perimeter
Any Windows x64 scenario.
## Dependencies
The script requires that the target Reven scenario have:
* The Memory History feature replayed.
* The Fast Search feature replayed.
* The OSSI feature replayed.
* An access to the binaries 'kernel32.dll' and 'kernelbase.dll' and their PDB files.
"""
class Process(object):
def __init__(self, name, pid, tid):
self.name = name
self.pid = pid
self.tid = tid
def created_processes(rvn):
"""
Get all processes that were created during the trace (Windows 64 only).
This is based on the calls to the `CreateProcessInternalW` function of `kernelbase.dll` and `kernel32.dll`.
```
BOOL CreateProcessInternalW(
LPCWSTR lpApplicationName, (rdx)
LPWSTR lpCommandLine, (r8)
...,
LPPROCESS_INFORMATION lpProcessInformation (rsp + 0x58)
)
typedef struct _PROCESS_INFORMATION {
HANDLE hProcess; (+0x0)
HANDLE hThread; (+0x8)
DWORD dwProcessId; (+0x10)
DWORD dwThreadId; (+0x14)
} PROCESS_INFORMATION, *PPROCESS_INFORMATION, *LPPROCESS_INFORMATION;
```
Dependencies
============
The script requires that the target Reven scenario have:
* The Memory History feature replayed.
* The Fast Search feature replayed.
* The OSSI feature replayed.
* An access to the binaries 'kernel32.dll' and 'kernelbase.dll' and their PDB files.
"""
queries = [
rvn.trace.search.symbol(symbol)
for symbol in rvn.ossi.symbols(pattern="^CreateProcessInternalW$", binary_hint="kernelbase.dll")
]
queries += [
rvn.trace.search.symbol(symbol)
for symbol in rvn.ossi.symbols(pattern="^CreateProcessInternalW$", binary_hint="kernel32.dll")
]
for match in reven2.util.collate(queries):
call_tr = match.transition_before()
instruction = call_tr.instruction
if instruction is None:
# This case should not happen since the RIP register of the context after an exception transition
# is generally pointing to exception handling code, not `kernel32!CreateProcessInternalW` or
# `kernelbase!CreateProcessInternalW` code.
continue
if instruction.mnemonic != "call":
# Certainly comes from a code page fault on the call instruction.
# Never seen but possible.
continue
# Get process name from arguments lpApplicationName or lpCommandLine
try:
name = match.deref(
reven2.arch.x64.rdx,
reven2.types.Pointer(reven2.types.CString(encoding=reven2.types.Encoding.Utf16, max_size=256)),
)
except RuntimeError:
name = match.deref(
reven2.arch.x64.r8,
reven2.types.Pointer(reven2.types.CString(encoding=reven2.types.Encoding.Utf16, max_size=256)),
)
# Get pointer to PROCESS_INFORMATION struct
stack_pointer = match.read(reven2.arch.x64.rsp, reven2.types.Pointer(reven2.types.USize))
process_info_pointer = match.read(stack_pointer + 0x58, reven2.types.Pointer(reven2.types.USize))
# Go to the end of the function
return_tr = call_tr.find_inverse()
if return_tr is None:
# Create process does not finish before the end of the trace
continue
return_value = return_tr.context_after().read(reven2.arch.x64.rax)
if return_value == 0:
# Create process failed
continue
# Get PID and TID from PROCESS_INFORMATION structure
pid = return_tr.context_before().read(process_info_pointer + 0x10, 4)
tid = return_tr.context_before().read(process_info_pointer + 0x14, 4)
yield (call_tr, Process(name, pid, tid))
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--host", type=str, default="localhost", help='Reven host, as a string (default: "localhost")')
parser.add_argument("-p", "--port", type=int, default=13370, help="Reven port, as an int (default: 13370)")
args = parser.parse_args()
rvn = reven2.RevenServer(args.host, args.port)
rvn.ossi.os().expect(
reven2.ossi.Os(architecture=reven2.ossi.Architecture.X64, family=reven2.ossi.OsFamily.Windows)
)
for transition, process in created_processes(rvn):
print("#{}: name = {}, pid = {}, tid = {}".format(transition.id, process.name, process.pid, process.tid))
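The offsets `0x10` and `0x14` used to read the PID and TID follow from the x64 layout of `PROCESS_INFORMATION` given in the docstring. The sketch below decodes that layout from raw bytes with the standard library. `parse_process_information` is a hypothetical helper, not part of the script.

```python
import struct

# Two 8-byte handles followed by two 4-byte identifiers (x64 layout).
PROCESS_INFORMATION = struct.Struct("<QQII")


def parse_process_information(raw: bytes):
    # Returns (pid, tid); the two handles are ignored, as in the script.
    h_process, h_thread, pid, tid = PROCESS_INFORMATION.unpack_from(raw)
    return pid, tid
```

The structure is `0x18` bytes long, so reading 4 bytes at `+0x10` and 4 bytes at `+0x14` covers exactly its last two fields.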
Thread id
Purpose
Detect the current thread at a point in the trace and find when the thread is created.
How to use
usage: thread_id.py [-h] [--host HOST] [-p PORT] TRANSITION_ID
positional arguments:
TRANSITION_ID Get thread id at transition (before)
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
Known limitations
- Current thread is not detected if the given point is in ring0.
Supported versions
Reven 2.12+
Supported perimeter
Any Windows 10/11 on x86-64 scenario. Given point must be in a 64 bit process.
Dependencies
The script requires that the target Reven scenario have:
- The Fast Search feature replayed.
- The OSSI feature replayed.
- An access to the binary 'ntdll.dll' and its PDB file.
Source
import argparse
import reven2
from reven2.address import LogicalAddress
from reven2.arch import x64
"""
# Thread id
## Purpose
Detect the current thread at a point in the trace
and find when the thread is created.
## How to use
```bash
usage: thread_id.py [-h] [--host HOST] [-p PORT] TRANSITION_ID
positional arguments:
TRANSITION_ID Get thread id at transition (before)
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
```
## Known limitations
- Current thread is not detected if the given point is in ring0.
## Supported versions
Reven 2.12+
## Supported perimeter
Any Windows 10/11 on x86-64 scenario. Given point must be in a 64 bit process.
## Dependencies
The script requires that the target Reven scenario have:
* The Fast Search feature replayed.
* The OSSI feature replayed.
* An access to the binary 'ntdll.dll' and its PDB file.
"""
class ThreadInfo(object):
def __init__(self, ctxt):
self.cr3 = ctxt.read(x64.cr3)
self.pid = ctxt.read(LogicalAddress(0x40, x64.gs), 4)
self.tid = ctxt.read(LogicalAddress(0x48, x64.gs), 4)
def __eq__(self, other):
return (self.cr3, self.pid, self.tid) == (
other.cr3,
other.pid,
other.tid,
)
def __ne__(self, other):
return not self == other
def context_ring(ctxt):
return ctxt.read(x64.cs) & 0x3
def all_start_thread_calls(ossi, trace):
# look for RtlUserThreadStart
ntdll_dll = next(ossi.executed_binaries("c:/windows/system32/ntdll.dll"))
rtl_user_thread_start = next(ntdll_dll.symbols("RtlUserThreadStart"))
return trace.search.symbol(rtl_user_thread_start)
def thread_search_pc(trace, thread_info, pc, from_context=None, to_context=None):
matches = trace.search.pc(pc, from_context=from_context, to_context=to_context)
for ctxt in matches:
# ensure current match is in requested thread
if ThreadInfo(ctxt) == thread_info:
yield ctxt
def find_thread_starting_transition(rvn, thread_info):
for start_thread_ctxt in all_start_thread_calls(rvn.ossi, rvn.trace):
if ThreadInfo(start_thread_ctxt) == thread_info:
# the first argument is the start address of the thread
thread_start_address = start_thread_ctxt.read(x64.rcx)
matches = thread_search_pc(
rvn.trace,
thread_info,
pc=thread_start_address,
from_context=start_thread_ctxt,
)
for match in matches:
return match.transition_after()
return None
def print_thread_info(rvn, tr_id):
ctxt = rvn.trace.transition(tr_id).context_before()
if context_ring(ctxt) == 0:
print("(User) thread may not count in ring 0")
return
# pid, tid at the transition
thread = ThreadInfo(ctxt)
start_transition = find_thread_starting_transition(rvn, thread)
if start_transition is None:
print("TID: {thread.tid} (PID: {thread.pid}) starting transition not found".format(thread=thread))
return
print(
"TID: {thread.tid} (PID: {thread.pid}), starts at: {transition}".format(
thread=thread, transition=start_transition
)
)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
"--host",
type=str,
default="localhost",
help='Reven host, as a string (default: "localhost")',
)
parser.add_argument(
"-p",
"--port",
type=int,
default=13370,
help="Reven port, as an int (default: 13370)",
)
parser.add_argument(
"transition_id",
metavar="TRANSITION_ID",
type=int,
help="Get thread id at transition (before)",
)
return parser.parse_args()
if __name__ == "__main__":
args = parse_args()
rvn = reven2.RevenServer(args.host, args.port)
rvn.ossi.os().expect(
reven2.ossi.Os(
architecture=reven2.ossi.Architecture.X64,
family=reven2.ossi.OsFamily.Windows,
windows_version=reven2.ossi.WindowsVersion.Windows10,
),
reven2.ossi.Os(
architecture=reven2.ossi.Architecture.X64,
family=reven2.ossi.OsFamily.Windows,
windows_version=reven2.ossi.WindowsVersion.Windows11,
),
)
tr_id = args.transition_id
print_thread_info(rvn, tr_id)
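Two ideas from the listing above can be checked in isolation: the ring is taken from the low two bits of the `cs` selector, and a thread is identified by the triple (CR3, PID, TID). The sketch below uses a frozen dataclass, which generates the equality the script writes by hand. `ThreadKey` is a hypothetical name, and the selector values in the usage note are typical Windows x64 values given as an assumption.

```python
from dataclasses import dataclass


def context_ring(cs: int) -> int:
    # The low two bits of the CS selector hold the privilege level.
    return cs & 0x3


@dataclass(frozen=True)
class ThreadKey:
    # Same identity as ThreadInfo: address space, process id, thread id.
    cr3: int
    pid: int
    tid: int
```

For instance, a selector of `0x33` gives ring 3 and `0x10` gives ring 0. Two `ThreadKey` values compare equal only when all three fields match.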
Search
Purpose
Search a whole trace for one of the following points of interest:
- An executed symbol.
- An executed binary.
- An executed virtual address.
How to use
usage: search.py [-h] [--host HOST] [-p PORT] [-s SYMBOL] [-b BINARY] [-a PC]
[--case-sensitive]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
-s SYMBOL, --symbol SYMBOL
symbol pattern
-b BINARY, --binary BINARY
binary pattern
-a PC, --pc PC pc address
--case-sensitive case sensitive symbol search
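The symbol and binary options take patterns, and the script's docstring describes them as regex patterns. The sketch below shows how anchoring and case sensitivity change the result, assuming the patterns behave like Python's `re.search`. That is an assumption about Reven's matching, not something this page confirms, and `match_symbols` is a hypothetical helper.

```python
import re


def match_symbols(names, pattern, case_sensitive=False):
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    # `search` matches anywhere in the name unless the pattern is anchored.
    return [name for name in names if regex.search(name)]
```

Under that assumption, `^CreateProcessW$` selects only the exact name, while a bare `acpi` selects every name containing it, in any letter case unless `case_sensitive` is set.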
Known limitations
N/A
Supported versions
Reven 2.2+
Supported perimeter
Any Reven scenario.
Dependencies
The script requires that the target Reven scenario have:
- The Fast Search feature replayed.
- The OSSI feature replayed.
Source
import argparse
import reven2
"""
# Search
## Purpose
Search a whole trace for one of the following points of interest:
* An executed symbol.
* An executed binary.
* An executed virtual address.
## How to use
```bash
usage: search.py [-h] [--host HOST] [-p PORT] [-s SYMBOL] [-b BINARY] [-a PC]
[--case-sensitive]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
-s SYMBOL, --symbol SYMBOL
symbol pattern
-b BINARY, --binary BINARY
binary pattern
-a PC, --pc PC pc address
--case-sensitive case sensitive symbol search
```
## Known limitations
N/A
## Supported versions
Reven 2.2+
## Supported perimeter
Any Reven scenario.
## Dependencies
The script requires that the target Reven scenario have:
* The Fast Search feature replayed.
* The OSSI feature replayed.
"""
def search(reven_server, symbol=None, binary=None, pc=None, case_sensitive=False):
r"""
This function is a helper to easily search for one of the following points of interest:
* executed symbols
* executed binaries
* an executed virtual address
The matching contexts are returned in ascending order.
Examples
========
>>> # Search for RIP = 0x7fff57263b2f
>>> for ctx in search(reven_server, pc=0x7fff57263b2f):
... print(ctx)
Context before #240135
Context before #281211
Context before #14608067
Context before #14690369
Context before #15756067
Context before #15787089
...
>>> # Search for binary "kernelbase.dll"
>>> for ctx in search(reven_server, binary=r'kernelbase\.dll'):
... print(ctx)
Context before #240135
Context before #240136
Context before #240137
Context before #240138
Context before #240139
Context before #240140
Context before #240141
...
>>> # Search for binaries that contain ".exe"
>>> for ctx in search(reven_server, binary=r'\.exe'):
... print(ctx)
Context before #1537879110
Context before #1537879111
Context before #1537879112
Context before #1537879113
Context before #1537879372
Context before #1537879373
Context before #1537879374
...
>>> # Search for all symbols that contain "acpi"
>>> for ctx in search(reven_server, symbol='acpi'):
... print(ctx)
Context before #1471900961
Context before #1471903808
Context before #1471908093
Context before #1471914935
Context before #1472413834
Context before #1472416173
Context before #1472419063
...
>>> # Search for symbol "CreateProcessW" in binary "kernelbase.dll"
>>> for ctx in search(reven_server, symbol='^CreateProcessW$', binary=r'kernelbase\.dll'):
... print(ctx)
Context before #23886919
Context before #1370448535
Context before #2590849986
Information
===========
@param reven_server: A C{reven2.RevenServer} instance.
@param symbol: A symbol regex pattern.
Can be combined with the `binary` argument.
@param binary: A binary regex pattern.
@param pc: A virtual address integer.
@param case_sensitive: Whether the symbol pattern comparison is case sensitive or not.
@return: A generator of C{reven2.trace.Context} instances.
"""
    search = reven_server.trace.search
    if pc is not None:
        return search.pc(pc)
    if binary is not None:
        if symbol is not None:
            queries = [
                search.symbol(rsymbol)
                for rsymbol in reven_server.ossi.symbols(
                    pattern=symbol, binary_hint=binary, case_sensitive=case_sensitive
                )
            ]
        else:
            queries = [search.binary(rbinary) for rbinary in reven_server.ossi.executed_binaries(pattern=binary)]
        return reven2.util.collate(queries)
    if symbol is not None:
        queries = [
            search.symbol(rsymbol)
            for rsymbol in reven_server.ossi.symbols(pattern=symbol, case_sensitive=case_sensitive)
        ]
        return reven2.util.collate(queries)
    raise ValueError("You must provide something to search")
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="localhost", help='Reven host, as a string (default: "localhost")')
    parser.add_argument("-p", "--port", type=int, default=13370, help="Reven port, as an int (default: 13370)")
    parser.add_argument("-s", "--symbol", type=str, help="symbol pattern")
    parser.add_argument("-b", "--binary", type=str, help="binary pattern")
    parser.add_argument("-a", "--pc", type=lambda a: int(a, 0), help="pc address")
    parser.add_argument("--case-sensitive", action="store_true", help="case sensitive symbol search")
    args = parser.parse_args()
    reven_server = reven2.RevenServer(args.host, args.port)
    for ctx in search(
        reven_server, symbol=args.symbol, binary=args.binary, pc=args.pc, case_sensitive=args.case_sensitive
    ):
        try:
            tr = ctx.transition_after()
            print("#{}: {}".format(tr.id, ctx.ossi.location()))
        except IndexError:
            tr = ctx.transition_before()
            print("#{}: {}".format(tr.id + 1, ctx.ossi.location()))
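The `search` helper builds one query per matching symbol or binary and merges them with `reven2.util.collate`. Assuming each query yields its results in ascending order and that `collate` behaves like a sorted merge (an assumption drawn from the docstring's ordering guarantee, not a description of its implementation), the effect can be sketched with the standard library's `heapq.merge`, using plain integers as stand-ins for transition ids:

```python
import heapq


def merge_sorted_results(queries):
    """Lazily merge several ascending iterables into one ascending stream."""
    return heapq.merge(*queries)


if __name__ == "__main__":
    # Hypothetical result streams, for instance one per matching symbol.
    hits_a = iter([240135, 14608067, 15756067])
    hits_b = iter([281211, 14690369])
    print(list(merge_sorted_results([hits_a, hits_b])))
```

Because the merge is lazy, results can be consumed as they arrive without first exhausting every query.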
Trace symbol coverage
Purpose
Display a list of executed binaries and symbols in a Reven scenario, indicating how many transitions were spent in each symbol/binary.
How to use
usage: trace_coverage.py [-h] [--host HOST] [-p PORT] [-m MAX_TRANSITION]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
-m MAX_TRANSITION, --max-transition MAX_TRANSITION
Maximum number of transitions
Known limitations
N/A
Supported versions
Reven 2.2+
Supported perimeter
Any Reven scenario.
Dependencies
None.
Source
import argparse
import reven2 as reven
"""
# Trace symbol coverage
## Purpose
Display a list of executed binaries and symbols in a Reven scenario,
indicating how many transitions were spent in each symbol/binary.
## How to use
```bash
usage: trace_coverage.py [-h] [--host HOST] [-p PORT] [-m MAX_TRANSITION]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
-m MAX_TRANSITION, --max-transition MAX_TRANSITION
Maximum number of transitions
```
## Known limitations
N/A
## Supported versions
Reven 2.2+
## Supported perimeter
Any Reven scenario.
## Dependencies
None.
"""
def trace_coverage(reven_server, max_transition=None):
    if max_transition is None:
        max_transition = reven_server.trace.transition_count
    else:
        max_transition = min(max_transition, reven_server.trace.transition_count)
    transition_id = 0
    coverages = {}
    while transition_id < max_transition:
        ctx = reven_server.trace.context_before(transition_id)
        transition_id += 1
        asid = ctx.read(reven.arch.x64.cr3)
        loc = ctx.ossi.location()
        unknown = loc is None
        binary = "unknown" if unknown else loc.binary.path
        symbol = "unknown" if unknown or loc.symbol is None else loc.symbol.name
        try:
            asid_coverage = coverages[asid]
        except KeyError:
            coverages[asid] = {}
            asid_coverage = coverages[asid]
        try:
            binary_coverage = asid_coverage[binary]
            binary_coverage[0] += 1
            if binary == "unknown":
                continue
        except KeyError:
            if binary == "unknown":
                asid_coverage[binary] = [1, None]
                continue
            asid_coverage[binary] = [1, {}]
            binary_coverage = asid_coverage[binary]
        try:
            binary_coverage[1][symbol] += 1
        except KeyError:
            binary_coverage[1][symbol] = 1
    return coverages
def print_coverages(coverages):
    for asid, asid_coverage in coverages.items():
        print("***** Coverage for CR3 = {:#x} *****\n".format(asid))
        for binary, binary_coverage in asid_coverage.items():
            print("- {}: {}".format(binary, binary_coverage[0]))
            if binary_coverage[1] is None:
                continue
            for symbol, symbol_coverage in binary_coverage[1].items():
                print(" - {}: {}".format(symbol, symbol_coverage))
        print("\n")
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="localhost", help='Reven host, as a string (default: "localhost")')
    parser.add_argument("-p", "--port", type=int, default=13370, help="Reven port, as an int (default: 13370)")
    parser.add_argument("-m", "--max-transition", type=int, help="Maximum number of transitions")
    args = parser.parse_args()
    reven_server = reven.RevenServer(args.host, args.port)
    coverages = trace_coverage(reven_server, args.max_transition)
    print_coverages(coverages)
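The nested `try`/`except KeyError` bookkeeping in `trace_coverage` can be written more compactly with `collections.defaultdict` and `collections.Counter`. The sketch below is a hypothetical, Reven-free variant: it takes `(asid, binary, symbol)` tuples, which in the script come from reading `cr3` and `ctx.ossi.location()`, and returns per-binary and per-symbol transition counts. The `count_coverage` name and the two-dictionary return shape are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def count_coverage(samples):
    """Count transitions per binary and per symbol, grouped by address space.

    `samples` is an iterable of (asid, binary, symbol) tuples, one per transition.
    """
    binaries = defaultdict(Counter)                      # asid -> binary -> count
    symbols = defaultdict(lambda: defaultdict(Counter))  # asid -> binary -> symbol -> count
    for asid, binary, symbol in samples:
        binaries[asid][binary] += 1
        # As in the script, symbols are not tracked for unknown locations.
        if binary != "unknown":
            symbols[asid][binary][symbol] += 1
    return binaries, symbols


if __name__ == "__main__":
    demo = [
        (0x1AA000, "c:/windows/system32/ntdll.dll", "RtlAllocateHeap"),
        (0x1AA000, "c:/windows/system32/ntdll.dll", "RtlFreeHeap"),
        (0x1AA000, "unknown", "unknown"),
    ]
    per_binary, per_symbol = count_coverage(demo)
    for asid, counts in per_binary.items():
        print("CR3 = {:#x}: {}".format(asid, dict(counts)))
```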
Vulnerability detection
Examples in this section attempt to detect some vulnerabilities present in Reven scenarios.
Use-after-Free vulnerabilities detection
This notebook searches for potential Use-after-Free vulnerabilities in a Reven trace.
Prerequisites
- This notebook should be run in a jupyter notebook server equipped with a Reven python kernel. Reven comes with a jupyter notebook server accessible with the Open Python button in the Analyze page of any scenario.
- This notebook depends on capstone being installed in the Reven python kernel. To install capstone in the current environment, please execute the capstone cell of this notebook.
- This notebook requires the Memory History resource for your target scenario.
Running the notebook
Fill out the parameters in the Parameters cell below, then run all the cells of this notebook.
Note
Although the script is designed to limit false positive results, you may still encounter a few. Please check the results and feel free to report any issue, be it a false positive or a false negative ;-).
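At its core, the detection pairs each allocation of a buffer with the free that follows it, then looks for uses of the buffer after that free. The sketch below illustrates only the pairing step, on a simplified and hypothetical list of `(transition_id, kind)` tuples for a single address; the notebook itself works on Reven transitions and relies on taint analysis and the memory history to find the later uses.

```python
def alloc_free_pairs(events):
    """Yield (alloc_id, free_id) for each alloc directly followed by a free.

    `events` is a list of (transition_id, kind) tuples for one address,
    sorted by transition id, where kind is "alloc" or "free".
    """
    previous = None
    for event in events:
        # A pair only exists when a free comes right after an alloc;
        # a leading free or two frees in a row are skipped.
        if previous is not None and previous[1] == "alloc" and event[1] == "free":
            yield (previous[0], event[0])
        previous = event


if __name__ == "__main__":
    demo = [(5, "free"), (10, "alloc"), (20, "free"), (30, "alloc"), (40, "alloc"), (50, "free")]
    for alloc_id, free_id in alloc_free_pairs(demo):
        print("allocated at #%d, freed at #%d" % (alloc_id, free_id))
```

A free with no preceding allocation (for example, a buffer allocated before the trace started) produces no pair, so it is never analyzed.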
Source
# -*- coding: utf-8 -*-
# ---
# jupyter:
# jupytext:
# formats: ipynb,py:percent
# text_representation:
# extension: .py
# format_name: percent
# kernelspec:
# display_name: reven
# language: python
# name: reven-python3
# ---
# %% [markdown]
# # Use-after-Free vulnerabilities detection
#
# This notebook searches for potential Use-after-Free vulnerabilities in a Reven trace.
#
# ## Prerequisites
#
# - This notebook should be run in a jupyter notebook server equipped with a Reven python kernel.
# Reven comes with a jupyter notebook server accessible with the `Open Python` button in the `Analyze` page of any
# scenario.
#
# - This notebook depends on `capstone` being installed in the Reven python kernel.
# To install capstone in the current environment, please execute the [capstone cell](#Capstone-Installation) of this
# notebook.
#
# - This notebook requires the Memory History resource for your target scenario.
#
#
# ## Running the notebook
#
# Fill out the parameters in the [Parameters cell](#Parameters) below, then run all the cells of this notebook.
#
#
# ## Note
#
# Although the script is designed to limit false positive results, you may still encounter a few. Please check
# the results and feel free to report any issue, be it a false positive or a false negative ;-).
# %% [markdown]
# # Capstone Installation
#
# Check for capstone's presence. If missing, attempt to get it from pip
# %%
try:
    import capstone
    print("capstone already installed")
except ImportError:
    print("Could not find capstone, attempting to install it from pip")
    import sys
    import subprocess
    command = [f"{sys.executable}", "-m", "pip", "install", "capstone"]
    p = subprocess.run(command)
    if int(p.returncode) != 0:
        raise RuntimeError("Error installing capstone")
    import capstone  # noqa
    print("Successfully installed capstone")
# %% [markdown]
# # Parameters
# %%
# Server connection
# Host of the Reven server running the scenario.
# When running this notebook from the Project Manager, '127.0.0.1' should be the correct value.
reven_backend_host = '127.0.0.1'
# Port of the Reven server running the scenario.
# After starting a Reven server on your scenario, you can get its port on the Analyze page of that scenario.
reven_backend_port = 13370
# Range control
# First transition considered for the detection of allocation/deallocation pairs
# If set to None, the detection starts at the first transition of the trace
uaf_from_tr = None
# uaf_from_tr = 3984021008 # ex: vlc scenario
# uaf_from_tr = 8655210 # ex: bksod scenario
# First transition **not** considered for the detection of allocation/deallocation pairs
# If set to None, the detection runs up to the last transition of the trace
uaf_to_tr = None
# uaf_to_tr = 3986760400 # ex: vlc scenario
# uaf_to_tr = 169673257 # ex: bksod scenario
# Filter control
# Beware that, when filtering, if an allocation happens in the specified process and/or binary,
# the script will fail if the deallocation happens in a different process and/or binary.
# This issue should only happen for allocations in the kernel.
# Specify on which PID the allocation/deallocation pairs should be kept.
# Set a value of None to not filter on the PID.
faulty_process_pid = None
# faulty_process_pid = 2620 # ex: vlc scenario
# faulty_process_pid = None # ex: bksod
# We can't filter on process for BKSOD, because some allocs happen in one process and the matching frees in another:
# termdd.sys and the process svchost.exe (pid: 1004) use kernel workers to free some resources,
# so the allocs are done in the process svchost.exe and some of the frees in the "System" process (pid: 4)
# Specify on which process name the allocation/deallocation pairs should be kept.
# Set a value of None to not filter on the process name.
# If both a process PID and a process name are specified, please make sure that they both
# refer to the same process, otherwise all allocations will be filtered and no results
# will be produced.
faulty_process_name = None
# faulty_process_name = "vlc.exe" # ex: vlc scenario
# faulty_process_name = None # ex: bksod scenario
# Specify on which binary name the allocation/deallocation pairs should be kept.
# Set a value of None to not filter on the binary name.
# Only allocation/deallocation taking place in the binaries whose filename,
# path or name contain the specified value are kept.
# If filtering on both a process and a binary, please make sure that there are
# allocations taking place in that binary in the selected process, otherwise all
# allocations will be filtered and no result will be produced.
faulty_binary = None
# faulty_binary = "libmkv_plugin.dll" # ex: vlc scenario
# faulty_binary = "termdd.sys" # ex: bksod scenario
# Address control
# Specify a **physical** address suspected of being faulty here,
# to only test UaF for this specific address, instead of all (filtered) allocations.
# The address should still be returned by an allocation/deallocation pair.
# To get a physical address from a virtual address, find a context where the address
# is mapped, then use `virtual_address.translate(ctx)`.
uaf_faulty_physical_address = None
# uaf_faulty_physical_address = 0x5a36cd20 # ex: vlc scenario
# uaf_faulty_physical_address = 0x7fb65010 # ex: bksod scenario
# Allocator control
# The script can use two allocators to find allocation/deallocation pairs.
# The following booleans enable the search for allocations by these
# allocators for a scenario.
# Generally it is expected to have only a single allocator enabled for a given
# scenario.
# To add your own allocator, please look at how the two provided allocators were
# added.
# Whether or not to look for malloc/free allocation/deallocation pairs.
search_malloc = True
# search_malloc = True # ex: vlc scenario, user space scenario
# search_malloc = False # ex: bksod scenario
# Whether or not to look for ExAllocatePoolWithTag/ExFreePoolWithTag
# allocation/deallocation pairs.
# This allocator is used by the Windows kernel.
search_pool_allocation = False
# search_pool_allocation = False # ex: vlc scenario
# search_pool_allocation = True # ex: bksod scenario, kernel scenario
# Taint control
# Technical parameter: number of accesses in a taint after which the script gives up on
# that particular taint.
# As long taints degrade the performance of the script significantly, it is recommended to
# give up on a taint after it exceeds a certain number of operations.
# If you experience missing results, try increasing the value.
# If you experience a very long runtime for the script, try decreasing the value.
# The default value should be appropriate for most cases.
taint_max_length = 100000
# %%
import itertools # noqa: E402
from collections import OrderedDict # noqa: E402
from typing import Dict, List # noqa: E402
import reven2 # noqa: E402
import reven2.preview.taint # noqa: E402
# %%
# Python script to connect to this scenario:
server = reven2.RevenServer(reven_backend_host, reven_backend_port)
print(server.trace.transition_count)
# %%
class MemoryRange:
page_size = 4096
page_mask = ~(page_size - 1)
def __init__(self, logical_address, size):
self.logical_address = logical_address
self.size = size
self.pages = [{
'logical_address': self.logical_address,
'size': self.size,
'physical_address': None,
'ctx_physical_address_mapped': None,
}]
# Compute the pages
while (((self.pages[-1]['logical_address'] & ~MemoryRange.page_mask) + self.pages[-1]['size'] - 1)
>= MemoryRange.page_size):
# Compute the size of the new page
new_page_size = ((self.pages[-1]['logical_address'] & ~MemoryRange.page_mask) + self.pages[-1]['size']
- MemoryRange.page_size)
# Reduce the size of the previous page and create the new one
self.pages[-1]['size'] -= new_page_size
self.pages.append({
'logical_address': self.pages[-1]['logical_address'] + self.pages[-1]['size'],
'size': new_page_size,
'physical_address': None,
'ctx_physical_address_mapped': None,
})
def try_translate_first_page(self, ctx):
if self.pages[0]['physical_address'] is not None:
return True
physical_address = reven2.address.LogicalAddress(self.pages[0]['logical_address']).translate(ctx)
if physical_address is None:
return False
self.pages[0]['physical_address'] = physical_address.offset
self.pages[0]['ctx_physical_address_mapped'] = ctx
return True
def try_translate_all_pages(self, ctx):
return_value = True
for page in self.pages:
if page['physical_address'] is not None:
continue
physical_address = reven2.address.LogicalAddress(page['logical_address']).translate(ctx)
if physical_address is None:
return_value = False
continue
page['physical_address'] = physical_address.offset
page['ctx_physical_address_mapped'] = ctx
return return_value
def is_physical_address_range_in_translated_pages(self, physical_address, size):
for page in self.pages:
if page['physical_address'] is None:
continue
if (
physical_address >= page['physical_address']
and physical_address + size <= page['physical_address'] + page['size']
):
return True
return False
def __repr__(self):
return "MemoryRange(0x%x, %d)" % (self.logical_address, self.size)
# Utility to find the physical address of a buffer that has just been allocated
# - ctx should be the context where the address is located in `rax`
# - memory_range should be the range of memory of the newly allocated buffer
#
# We use the translate API, but the address is sometimes not mapped yet just after the
# allocation. In that case we taint `rax` forward (slicing) and try to translate the
# address at each access of the slice.
def translate_first_page_of_allocation(ctx, memory_range):
    if memory_range.try_translate_first_page(ctx):
        return
    tainter = reven2.preview.taint.Tainter(server.trace)
    taint = tainter.simple_taint(
        tag0="rax",
        from_context=ctx,
        to_context=None,
        is_forward=True
    )
    for access in taint.accesses(changes_only=False).all():
        if memory_range.try_translate_first_page(access.transition.context_after()):
            taint.cancel()
            return
    raise RuntimeError("Couldn't find the physical address of the first page")
# %%
class AllocEvent:
def __init__(self, memory_range, tr_begin, tr_end):
self.memory_range = memory_range
self.tr_begin = tr_begin
self.tr_end = tr_end
class FreeEvent:
def __init__(self, logical_address, tr_begin, tr_end):
self.logical_address = logical_address
self.tr_begin = tr_begin
self.tr_end = tr_end
def retrieve_events_for_symbol(
alloc_dict,
event_class,
symbol,
retrieve_event_info,
event_filter=None,
):
for ctx in server.trace.search.symbol(
symbol,
from_context=None if uaf_from_tr is None else server.trace.context_before(uaf_from_tr),
to_context=None if uaf_to_tr is None else server.trace.context_before(uaf_to_tr)
):
# We don't want hits on exceptions (code page faults, hardware interrupts, etc.)
if ctx.transition_after().exception is not None:
continue
previous_location = (ctx - 1).ossi.location()
previous_process = (ctx - 1).ossi.process()
# Filter by process pid/process name/binary name
# Filter by process pid
if faulty_process_pid is not None and previous_process.pid != faulty_process_pid:
continue
# Filter by process name
if faulty_process_name is not None and previous_process.name != faulty_process_name:
continue
# Filter by binary name / filename / path
if faulty_binary is not None and faulty_binary not in [
previous_location.binary.name,
previous_location.binary.filename,
previous_location.binary.path
]:
continue
# Filter the event with the argument filter
if event_filter is not None:
if event_filter(ctx.ossi.location(), previous_location):
continue
# Retrieve the call/ret
# The heuristic is that the ret is the end of our function
# - If the call is inlined it should be at the end of the caller function, so the ret is the ret of our
# function
# - If the call isn't inlined, the ret should be the ret of our function
ctx_call = next(ctx.stack.frames()).creation_transition.context_after()
tr_ret = ctx_call.transition_before().find_inverse()
# Finding the inverse operation can fail
# for instance if the end of the allocation function was not recorded in the trace
if tr_ret is None:
continue
ctx_ret = tr_ret.context_before()
# Build the event by reading the needed registers
if event_class == AllocEvent:
current_address, size = retrieve_event_info(ctx, ctx_ret)
# Filter out failed allocations
if current_address == 0x0:
continue
memory_range = MemoryRange(current_address, size)
translate_first_page_of_allocation(ctx_ret, memory_range)
if memory_range.pages[0]['physical_address'] not in alloc_dict:
alloc_dict[memory_range.pages[0]['physical_address']] = []
alloc_dict[memory_range.pages[0]['physical_address']].append(
AllocEvent(
memory_range,
ctx.transition_after(), ctx_ret.transition_after()
)
)
elif event_class == FreeEvent:
current_address = retrieve_event_info(ctx, ctx_ret)
# Filter out frees of NULL
if current_address == 0x0:
continue
current_physical_address = reven2.address.LogicalAddress(current_address).translate(ctx).offset
if current_physical_address not in alloc_dict:
alloc_dict[current_physical_address] = []
alloc_dict[current_physical_address].append(
FreeEvent(
current_address,
ctx.transition_after(), ctx_ret.transition_after()
)
)
else:
raise RuntimeError("Unknown event class: %s" % event_class.__name__)
# %%
# %%time
alloc_dict: Dict = {}
# Basic functions to retrieve the arguments
# They work for the allocation/free functions but won't work for all functions,
# in particular because on x86 we handle neither the size of the arguments
# nor whether they are pushed left to right or right to left
def retrieve_first_argument(ctx):
if ctx.is64b():
return ctx.read(reven2.arch.x64.rcx)
else:
esp = ctx.read(reven2.arch.x64.esp)
return ctx.read(reven2.address.LogicalAddress(esp + 4, reven2.arch.x64.ss), 4)
def retrieve_second_argument(ctx):
if ctx.is64b():
return ctx.read(reven2.arch.x64.rdx)
else:
esp = ctx.read(reven2.arch.x64.esp)
return ctx.read(reven2.address.LogicalAddress(esp + 8, reven2.arch.x64.ss), 4)
def retrieve_return_value(ctx):
if ctx.is64b():
return ctx.read(reven2.arch.x64.rax)
else:
return ctx.read(reven2.arch.x64.eax)
def retrieve_alloc_info_with_size_as_first_argument(ctx_begin, ctx_end):
return (
retrieve_return_value(ctx_end),
retrieve_first_argument(ctx_begin)
)
def retrieve_alloc_info_with_size_as_second_argument(ctx_begin, ctx_end):
return (
retrieve_return_value(ctx_end),
retrieve_second_argument(ctx_begin)
)
def retrieve_alloc_info_for_calloc(ctx_begin, ctx_end):
return (
retrieve_return_value(ctx_end),
retrieve_first_argument(ctx_begin) * retrieve_second_argument(ctx_begin)
)
def retrieve_free_info_with_address_as_first_argument(ctx_begin, ctx_end):
return retrieve_first_argument(ctx_begin)
if search_malloc:
def filter_in_realloc(location, caller_location):
return location.binary == caller_location.binary and caller_location.symbol.name == "realloc"
# Search for allocations with malloc
for symbol in server.ossi.symbols(r'^_?malloc$', binary_hint=r'msvcrt.dll'):
retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_with_size_as_first_argument,
filter_in_realloc)
# Search for allocations with calloc
for symbol in server.ossi.symbols(r'^_?calloc(_crt)?$', binary_hint=r'msvcrt.dll'):
retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_for_calloc)
# Search for deallocations with free
for symbol in server.ossi.symbols(r'^_?free$', binary_hint=r'msvcrt.dll'):
retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol, retrieve_free_info_with_address_as_first_argument,
filter_in_realloc)
# Search for re-allocations with realloc
for symbol in server.ossi.symbols(r'^_?realloc$', binary_hint=r'msvcrt.dll'):
retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_with_size_as_second_argument)
retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol, retrieve_free_info_with_address_as_first_argument)
if search_pool_allocation:
# Search for allocations with ExAllocatePool...
def filter_ex_allocate_pool(location, caller_location):
return location.binary == caller_location.binary and caller_location.symbol.name.startswith("ExAllocatePool")
for symbol in server.ossi.symbols(r'^ExAllocatePool', binary_hint=r'ntoskrnl.exe'):
retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_with_size_as_second_argument,
filter_ex_allocate_pool)
# Search for deallocations with ExFreePool...
def filter_ex_free_pool(location, caller_location):
return location.binary == caller_location.binary and caller_location.symbol.name == "ExFreePool"
for symbol in server.ossi.symbols(r'^ExFreePool', binary_hint=r'ntoskrnl.exe'):
retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol, retrieve_free_info_with_address_as_first_argument,
filter_ex_free_pool)
# Sort the events per address and event type
for physical_address in alloc_dict.keys():
alloc_dict[physical_address] = list(sorted(
alloc_dict[physical_address],
key=lambda event: (event.tr_begin.id, 0 if isinstance(event, FreeEvent) else 1)
))
# Sort the dict by address
alloc_dict = OrderedDict(sorted(alloc_dict.items()))
# %%
def get_alloc_free_pairs(events, errors=None):
    previous_event = None
    for event in events:
        if isinstance(event, AllocEvent):
            if previous_event is None:
                pass
            elif isinstance(previous_event, AllocEvent):
                if errors is not None:
                    errors.append("Two consecutive allocs found")
        elif isinstance(event, FreeEvent):
            if previous_event is None:
                continue
            elif isinstance(previous_event, FreeEvent):
                if errors is not None:
                    errors.append("Two consecutive frees found")
            elif isinstance(previous_event, AllocEvent):
                yield (previous_event, event)
        else:
            assert 0, ("Unknown event type: %s" % type(event))
        previous_event = event
# %%
# %%time
# Basic checks of the events
for physical_address, events in alloc_dict.items():
for event in events:
if not isinstance(event, AllocEvent) and not isinstance(event, FreeEvent):
raise RuntimeError("Unknown event type: %s" % type(event))
errors: List[str] = []
for (alloc_event, free_event) in get_alloc_free_pairs(events, errors):
# Check the uniformity of the logical address between the alloc and the free
if alloc_event.memory_range.logical_address != free_event.logical_address:
errors.append("Phys:0x%x: Alloc #%d - Free #%d with different logical address: 0x%x != 0x%x" % (
physical_address,
alloc_event.tr_begin.id, free_event.tr_begin.id,
alloc_event.memory_range.logical_address, free_event.logical_address))
# Check size of 0x0
if alloc_event.memory_range.size == 0x0 or alloc_event.memory_range.size is None:
errors.append("Phys:0x%x: Alloc #%d - Free #%d with weird size %s" % (
physical_address,
alloc_event.tr_begin.id, free_event.tr_begin.id,
alloc_event.memory_range.size))
if len(errors) > 0:
print("Phys:0x%x: Error(s) detected:" % (physical_address))
for error in errors:
print(" - %s" % error)
# %%
# Print the events
for physical_address, events in alloc_dict.items():
print("Phys:0x%x" % (physical_address))
print(" Events:")
for event in events:
if isinstance(event, AllocEvent):
print(" - Alloc at #%d (0x%x of size 0x%x)" % (event.tr_begin.id,
event.memory_range.logical_address, event.memory_range.size))
elif isinstance(event, FreeEvent):
print(" - Free at #%d (0x%x)" % (event.tr_begin.id, event.logical_address))
print(" Pairs:")
for (alloc_event, free_event) in get_alloc_free_pairs(events):
print(" - Allocated at #%d (0x%x of size 0x%x) and freed at #%d (0x%x)" % (alloc_event.tr_begin.id,
alloc_event.memory_range.logical_address, alloc_event.memory_range.size, free_event.tr_begin.id,
free_event.logical_address))
print()
# %%
# This function is used to ignore the changes from the `free` in the taint of the address,
# as the `free` stores the address in internal structures that are reused to allocate
# other addresses, leading to false positives
def start_alloc_address_taint(server, alloc_event, free_event):
# Setup the first taint [alloc; free]
tainter = reven2.preview.taint.Tainter(server.trace)
taint = tainter.simple_taint(
tag0="rax" if alloc_event.tr_end.mode == reven2.trace.Mode.X86_64 else "eax",
from_context=alloc_event.tr_end.context_before(),
to_context=free_event.tr_begin.context_before() + 1,
is_forward=True
)
# Setup the second taint [free; [
state_before_free = taint.state_at(free_event.tr_begin.context_before())
# In theory, `rcx` is lost during the execution of the free
# `rsp` is not useful even if it is tainted
tag0_regs = filter(
lambda x: x[0].register.name not in ['rcx', 'rsp'],
state_before_free.tainted_registers()
)
# We don't want to keep memory inside the allocated object as accessing them will trigger a UAF anyway
# It is also used to remove some false positives because the memory will be used by the alloc/free functions
# TODO: Handle addresses different than PhysicalAddress (is that even possible?)
tag0_mems = filter(
lambda x: not alloc_event.memory_range.is_physical_address_range_in_translated_pages(
x[0].address.offset, x[0].size),
state_before_free.tainted_memories()
)
# Only keep the slices
tag0 = map(
lambda x: x[0],
itertools.chain(tag0_regs, tag0_mems)
)
tainter = reven2.preview.taint.Tainter(server.trace)
return tainter.simple_taint(
tag0=list(tag0),
from_context=free_event.tr_end.context_before(),
to_context=None,
is_forward=True
)
# %%
# Capstone utilities
def get_reven_register_from_name(name):
for reg in reven2.arch.helpers.x64_registers():
if reg.name == name:
return reg
raise RuntimeError("Unknown register: %s" % name)
def read_reg(tr, reg):
if reg in [reven2.arch.x64.rip, reven2.arch.x64.eip]:
return tr.pc
else:
return tr.context_before().read(reg)
def compute_dereferenced_address(tr, cs_insn, cs_op):
dereferenced_address = 0
if cs_op.value.mem.base != 0:
base_reg = get_reven_register_from_name(cs_insn.reg_name(cs_op.value.mem.base))
dereferenced_address += read_reg(tr, base_reg)
if cs_op.value.mem.index != 0:
index_reg = get_reven_register_from_name(cs_insn.reg_name(cs_op.value.mem.index))
dereferenced_address += (cs_op.value.mem.scale * read_reg(tr, index_reg))
dereferenced_address += cs_op.value.mem.disp
return dereferenced_address & 0xFFFFFFFFFFFFFFFF
# %%
def uaf_analyze_function(physical_address, alloc_events):
uaf_count = 0
# Setup capstone
md_64 = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_64)
md_64.detail = True
md_32 = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_32)
md_32.detail = True
errors = []
for (alloc_event, free_event) in get_alloc_free_pairs(alloc_events, errors):
# Get the memory accesses to the allocated block after the free
# The optimization is disabled if we can't translate all the pages of the memory range
mem_access = None
mem_accesses = None
if alloc_event.memory_range.try_translate_all_pages(free_event.tr_begin.context_before()):
mem_accesses = reven2.util.collate(map(
lambda page: server.trace.memory_accesses(
reven2.address.PhysicalAddress(page['physical_address']),
page['size'],
from_transition=free_event.tr_end,
to_transition=None,
is_forward=True,
operation=None
),
alloc_event.memory_range.pages
), key=lambda access: access.transition)
try:
mem_access = next(mem_accesses)
except StopIteration:
continue
else:
print("Phys:0x%x: Allocated at #%d (0x%x of size 0x%x) and freed at #%d (0x%x)" % (physical_address,
alloc_event.tr_begin.id, alloc_event.memory_range.logical_address, alloc_event.memory_range.size,
free_event.tr_begin.id, free_event.logical_address))
print(" Warning: Memory history optimization disabled because we couldn't "
"translate all the memory pages")
print()
# Setup the slicing
taint = start_alloc_address_taint(server, alloc_event, free_event)
# Iterate on the slice
access_count = 0
for access in taint.accesses(changes_only=False).all():
access_count += 1
access_transition = access.transition
if access_count > taint_max_length:
print("Phys:0x%x: Allocated at #%d (0x%x of size 0x%x) and freed at #%d (0x%x)" % (physical_address,
alloc_event.tr_begin.id, alloc_event.memory_range.logical_address, alloc_event.memory_range.size,
free_event.tr_begin.id, free_event.logical_address))
print(" Warning: Allocation skipped: post-free taint stopped after %d accesses" % access_count)
print()
break
if mem_accesses is not None:
# Check that we have an access on the same transition as the taint access
# if not the memory operand won't be an UAF anyway so we can skip it
while mem_access.transition < access_transition:
try:
mem_access = next(mem_accesses)
except StopIteration:
break
if mem_access.transition != access_transition:
continue
md = md_64 if access_transition.mode == reven2.trace.Mode.X86_64 else md_32
cs_insn = next(md.disasm(access_transition.instruction.raw, access_transition.instruction.size))
# Skip `lea` instructions as they do not really read or write memory, and the taint
# will propagate anyway, so we will see the dereference of the computed value
if cs_insn.mnemonic == "lea":
continue
registers_in_state = {}
for reg_slice, _ in access.state_before().tainted_registers():
registers_in_state[reg_slice.register.name] = reg_slice
for cs_op in cs_insn.operands:
if cs_op.type != capstone.x86.X86_OP_MEM:
continue
uaf_reg = None
if cs_op.value.mem.base != 0:
base_reg_name = cs_insn.reg_name(cs_op.value.mem.base)
if base_reg_name in registers_in_state:
uaf_reg = registers_in_state[base_reg_name]
if uaf_reg is None and cs_op.value.mem.index != 0:
index_reg_name = cs_insn.reg_name(cs_op.value.mem.index)
if index_reg_name in registers_in_state:
uaf_reg = registers_in_state[index_reg_name]
if uaf_reg is None:
continue
dereferenced_address = compute_dereferenced_address(access_transition, cs_insn, cs_op)
if mem_accesses is None:
# As we don't have the memory access optimization we need to check if the dereferenced address
# is in the allocated buffer
# We only check on the translated pages as the taint won't return an access with a pagefault
# so the dereferenced address should be translated
dereferenced_physical_address = reven2.address.LogicalAddress(dereferenced_address).translate(
access_transition.context_before())
if dereferenced_physical_address is None:
continue
if not alloc_event.memory_range.is_physical_address_range_in_translated_pages(
dereferenced_physical_address.offset, 1):
continue
print("Phys:0x%x: Allocated at #%d (0x%x of size 0x%x) and freed at #%d (0x%x)" % (physical_address,
alloc_event.tr_begin.id, alloc_event.memory_range.logical_address, alloc_event.memory_range.size,
free_event.tr_begin.id, free_event.logical_address))
print(" UAF coming from reg %s[%d-%d] leading to dereferenced address = 0x%x" % (
uaf_reg.register.name, uaf_reg.begin, uaf_reg.end, dereferenced_address))
print(" ", end="")
print(access_transition, end=" ")
print(access_transition.context_before().ossi.location())
print(" Accessed %d transitions after the free" % (access_transition.id - free_event.tr_end.id))
print()
uaf_count += 1
if len(errors) > 0:
print("Phys:0x%x: Error(s) detected:" % (physical_address))
for error in errors:
print(" - %s" % error)
return uaf_count
# %%
# %%time
uaf_count = 0
if uaf_faulty_physical_address is None:
for physical_address, alloc_events in alloc_dict.items():
uaf_count += uaf_analyze_function(physical_address, alloc_events)
else:
if uaf_faulty_physical_address not in alloc_dict:
raise KeyError("The passed physical address was not detected during the allocation search")
uaf_count += uaf_analyze_function(uaf_faulty_physical_address, alloc_dict[uaf_faulty_physical_address])
print("---------------------------------------------------------------------------------")
uaf_begin_range = "the beginning of the trace" if uaf_from_tr is None else "#{}".format(uaf_from_tr)
uaf_end_range = "the end of the trace" if uaf_to_tr is None else "#{}".format(uaf_to_tr)
uaf_range = ("on the whole trace" if uaf_from_tr is None and uaf_to_tr is None else
"between {} and {}".format(uaf_begin_range, uaf_end_range))
uaf_range_size = server.trace.transition_count
if uaf_from_tr is not None:
uaf_range_size -= uaf_from_tr
if uaf_to_tr is not None:
uaf_range_size -= server.trace.transition_count - uaf_to_tr
if uaf_faulty_physical_address is None:
searched_memory_addresses = "with {} searched memory addresses".format(len(alloc_dict))
else:
searched_memory_addresses = "on {:#x}".format(uaf_faulty_physical_address)
print("{} UAF(s) found {} ({} transitions) {}".format(
uaf_count, uaf_range, uaf_range_size, searched_memory_addresses
))
print("---------------------------------------------------------------------------------")
# %%
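The main loop of `uaf_analyze_function` walks two streams that are both ordered by transition: the taint accesses and the memory accesses on the freed block. It keeps only the taint accesses that coincide with a memory access. The following is a minimal sketch of that lockstep iteration, using plain integers in place of transitions. `matching_transitions` is a hypothetical helper written for illustration; it is not part of the notebook or of the reven2 API, and unlike the notebook it stops as soon as the memory stream is exhausted.

```python
from typing import Iterable, Iterator


def matching_transitions(taint_ids: Iterable[int], mem_ids: Iterable[int]) -> Iterator[int]:
    """Yield the ids present in both sorted streams, only ever moving forward."""
    mem_iter = iter(mem_ids)
    try:
        current_mem = next(mem_iter)
    except StopIteration:
        return  # no memory access at all: nothing can match
    for taint_id in taint_ids:
        # Let the memory stream catch up with the current taint access
        while current_mem < taint_id:
            try:
                current_mem = next(mem_iter)
            except StopIteration:
                return  # memory stream exhausted: no later match is possible
        if current_mem == taint_id:
            yield taint_id


# Hypothetical transition ids, for illustration only
result = list(matching_transitions([3, 5, 8, 13, 21], [1, 5, 6, 13, 14]))
print(result)
```

Because both streams are sorted, each element is visited at most once, so the cost is linear in the total number of accesses rather than quadratic.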
Detecting critical section deadlocks
This notebook checks for potential deadlocks that may occur when using critical sections as the synchronization primitive.
Prerequisites
- This notebook should be run in a jupyter notebook server equipped with a Reven python kernel. Reven comes with a jupyter notebook server accessible with the `Open Python` button in the `Analyze` page of any scenario.
- The following resources must be replayed for the analyzed scenario:
  - Trace
  - OSSI
  - Fast Search
Limits
- Only `RtlEnterCriticalSection` and `RtlLeaveCriticalSection` are supported as the lock (resp. unlock) primitives.
- Locks and unlocks must not be nested: a critical section can be locked and then unlocked, but it must not be locked again while it is still locked, e.g.
  - lock A, lock B, unlock A, lock A => OK
  - lock A, lock B, lock A => not OK (since A is locked again)
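The nesting rule above can be checked mechanically on a sequence of lock and unlock events. The sketch below is only an illustration on plain tuples for a single thread. `first_relock` and the event encoding are hypothetical and do not appear in the notebook.

```python
from typing import Iterable, Optional, Set, Tuple


def first_relock(events: Iterable[Tuple[str, str]]) -> Optional[int]:
    """Return the index of the first lock taken on a section that is already held, or None."""
    held: Set[str] = set()
    for index, (action, section) in enumerate(events):
        if action == "lock":
            if section in held:
                return index  # the section is locked again before being unlocked
            held.add(section)
        elif action == "unlock":
            held.discard(section)
    return None


ok_sequence = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("lock", "A")]
bad_sequence = [("lock", "A"), ("lock", "B"), ("lock", "A")]
print(first_relock(ok_sequence))
print(first_relock(bad_sequence))
```

The first sequence is accepted because A is released before it is taken again; the second is rejected at its third event.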
Running
Fill out the parameters in the Parameters cell below, then run all the cells of this notebook.
Source
# ---
# jupyter:
# jupytext:
# formats: ipynb,py:percent
# text_representation:
# extension: .py
# format_name: percent
# kernelspec:
# display_name: reven
# language: python
# name: reven-python3
# ---
# %% [markdown]
# # Detecting critical section deadlocks
# This notebook checks for potential deadlocks that may occur when using critical sections as the synchronization
# primitive.
#
# ## Prerequisites
# - This notebook should be run in a jupyter notebook server equipped with a Reven python kernel.
# Reven comes with a jupyter notebook server accessible with the `Open Python` button
# in the `Analyze` page of any scenario.
# - The following resources must be replayed for the analyzed scenario:
# - Trace
# - OSSI
# - Fast Search
#
# ## Limits
# - Only `RtlEnterCriticalSection` and `RtlLeaveCriticalSection` are supported as the lock (resp. unlock) primitives.
# - Locks and unlocks must not be nested: a critical section can be locked and then unlocked, but it must not be
# locked again while it is still locked, e.g.
# - lock A, lock B, unlock A, lock A => OK
# - lock A, lock B, lock A => not OK (since A is locked again)
#
# ## Running
# Fill out the parameters in the [Parameters cell](#Parameters) below, then run all the cells of this notebook.
# %%
# For Python's type annotation
from typing import Dict, Iterator, List, Optional, Set, Tuple
# Reven specific
import reven2
from reven2.address import LogicalAddress
from reven2.arch import x64
from reven2.trace import Context, Trace, Transition
from reven2.util import collate
# %% [markdown]
# # Parameters
# %%
# Host and port of the Reven server running the scenario
host = '127.0.0.1'
port = 35083
# The PID and (or) the name of the binary of interest: if the binary name is given (not None),
# only locks and unlocks called directly from the binary are counted.
pid = 2044
binary = None
# The begin and end transition numbers between which the deadlock detection runs.
begin_trans_id = None # if None, start from the first transition of the trace
end_trans_id = None # if None, stop at the last transition
# %%
# Helper class which wraps Reven's runtime objects and provides methods to get information about calls to
# RtlEnterCriticalSection and RtlLeaveCriticalSection
class RuntimeHelper:
def __init__(self, host: str, port: int):
try:
server = reven2.RevenServer(host, port)
except RuntimeError:
raise RuntimeError(f'Cannot connect to the scenario at {host}:{port}')
self.trace = server.trace
self.ossi = server.ossi
self.search_symbol = self.trace.search.symbol
self.search_binary = self.trace.search.binary
try:
ntdll = next(self.ossi.executed_binaries('^c:/windows/system32/ntdll.dll$'))
except StopIteration:
raise RuntimeError('ntdll.dll not found')
try:
self.__rtl_enter_critical_section = next(ntdll.symbols("^RtlEnterCriticalSection$"))
self.__rtl_leave_critical_section = next(ntdll.symbols("^RtlLeaveCriticalSection$"))
except StopIteration:
raise RuntimeError('Rtl(Enter|Leave)CriticalSection symbol not found')
def get_critical_section_locks(self, from_context: Context, to_context: Context) -> Iterator[Context]:
# For ease, if the from_context is the first context of the trace and also the beginning of a lock,
# then omit it since the caller is unknown
if from_context == self.trace.first_context:
if from_context == self.trace.last_context:
return
else:
from_context = from_context + 1
for ctxt in self.search_symbol(self.__rtl_enter_critical_section, from_context, to_context):
# Each context corresponds to the entry of RtlEnterCriticalSection; since the first context
# of the trace is never searched, decreasing by 1 is safe.
yield ctxt - 1
def get_critical_section_unlocks(self, from_context: Context, to_context: Context) -> Iterator[Context]:
# For ease, if the from_context is the first context of the trace and also the beginning of an unlock,
# then omit it since the caller is unknown
if from_context == self.trace.first_context:
if from_context == self.trace.last_context:
return
else:
from_context = from_context + 1
for ctxt in self.search_symbol(self.__rtl_leave_critical_section, from_context, to_context):
# Each context corresponds to the entry of RtlLeaveCriticalSection; since the first context
# of the trace is never searched, decreasing by 1 is safe.
yield ctxt - 1
@staticmethod
def get_critical_section_handle(ctxt: Context) -> int:
return ctxt.read(x64.rcx)
@staticmethod
def thread_id(ctxt: Context) -> int:
return ctxt.read(LogicalAddress(0x48, x64.gs), 4)
@staticmethod
def is_kernel_mode(ctxt: Context) -> bool:
return ctxt.read(x64.cs) & 0x3 == 0
# %%
# Find the lower/upper bound of the contexts within which the deadlock detection runs
def find_begin_end_context(sco: RuntimeHelper, pid: int, binary: Optional[str],
begin_id: Optional[int], end_id: Optional[int]) -> Tuple[Context, Context]:
begin_ctxt = None
if begin_id is not None:
try:
begin_trans = sco.trace.transition(begin_id)
begin_ctxt = begin_trans.context_after()
except IndexError:
begin_ctxt = None
if begin_ctxt is None:
if binary is not None:
for name in sco.ossi.executed_binaries(binary):
for ctxt in sco.search_binary(name):
ctx_process = ctxt.ossi.process()
assert ctx_process is not None
if ctx_process.pid == pid:
begin_ctxt = ctxt
break
if begin_ctxt is not None:
break
if begin_ctxt is None:
begin_ctxt = sco.trace.first_context
end_ctxt = None
if end_id is not None:
try:
end_trans = sco.trace.transition(end_id)
end_ctxt = end_trans.context_before()
except IndexError:
end_ctxt = None
if end_ctxt is None:
end_ctxt = sco.trace.last_context
if (end_ctxt <= begin_ctxt):
raise RuntimeError("The begin transition must be smaller than the end.")
return (begin_ctxt, end_ctxt)
# Get all execution contexts of a given process
def find_process_ranges(sco: RuntimeHelper, pid: int, first_ctxt: Context, last_context: Context) \
-> Iterator[Tuple[Context, Context]]:
ctxt_low = first_ctxt
ctxt_high: Optional[Context] = None
while True:
current_process = ctxt_low.ossi.process()
assert current_process is not None
current_pid = current_process.pid
ctxt_high = ctxt_low.find_register_change(x64.cr3, is_forward=True)
if ctxt_high is None:
if current_pid == pid:
if ctxt_low < last_context - 1:
yield (ctxt_low, last_context - 1)
break
if ctxt_high >= last_context:
break
if current_pid == pid:
yield (ctxt_low, ctxt_high)
ctxt_low = ctxt_high + 1
# Starting from a transition, return the first transition that is an instruction, or None if there isn't one.
def ignore_non_instructions(trans: Transition, trace: Trace) -> Optional[Transition]:
while trans.instruction is None:
if trans == trace.last_transition:
return None
trans = trans + 1
return trans
# Extract user mode only context ranges from a context range (which may include also kernel mode ranges)
def find_usermode_ranges(sco: RuntimeHelper, ctxt_low: Context, ctxt_high: Context) \
-> Iterator[Tuple[Context, Context]]:
trans = ignore_non_instructions(ctxt_low.transition_after(), sco.trace)
if trans is None:
return
ctxt_current = trans.context_before()
while ctxt_current < ctxt_high:
ctxt_next = ctxt_current.find_register_change(x64.cs, is_forward=True)
if not RuntimeHelper.is_kernel_mode(ctxt_current):
if ctxt_next is None or ctxt_next > ctxt_high:
yield (ctxt_current, ctxt_high)
break
else:
# It's safe to decrease ctxt_next by 1 because it was obtained from a forward find_register_change
yield (ctxt_current, ctxt_next - 1)
if ctxt_next is None:
break
ctxt_current = ctxt_next
# Get user mode only execution contexts of a given process
def find_process_usermode_ranges(trace: RuntimeHelper, pid: int, first_ctxt: Context, last_ctxt: Context) \
-> Iterator[Tuple[Context, Context]]:
for (ctxt_low, ctxt_high) in find_process_ranges(trace, pid, first_ctxt, last_ctxt):
usermode_ranges = find_usermode_ranges(trace, ctxt_low, ctxt_high)
for usermode_range in usermode_ranges:
yield usermode_range
def context_is_in_binary(ctxt: Optional[Context], binary: Optional[str]) -> bool:
if ctxt is None:
return False
if binary is None:
return True
ctxt_loc = ctxt.ossi.location()
if ctxt_loc is None:
return False
ctxt_binary = ctxt_loc.binary
if ctxt_binary is not None:
return (binary in [ctxt_binary.name, ctxt_binary.filename, ctxt_binary.path])
return False
# Get locks (i.e. RtlEnterCriticalSection) called by the binary in a range of context (defined by ctxt_low and
# ctxt_high).
# Note that for a process (of a binary), there are also locks called by libraries loaded by PE loader, such calls
# are considered "uninteresting" if the binary name is given.
def get_in_binary_locks(sco: RuntimeHelper, ctxt_low: Context, ctxt_high: Context, binary: Optional[str]) \
-> Iterator[Context]:
for ctxt in sco.get_critical_section_locks(ctxt_low, ctxt_high):
if context_is_in_binary(ctxt, binary):
yield ctxt
# Get unlocks (i.e. RtlLeaveCriticalSection) called by the binary in a range of context (defined by ctxt_low and
# ctxt_high).
# Note that for a process (of a binary), there are also unlocks called by libraries loaded by PE loader, such calls
# are considered "uninteresting" if the binary name is given.
def get_in_binary_unlocks(sco: RuntimeHelper, ctxt_low: Context, ctxt_high: Context, binary: Optional[str]) \
-> Iterator[Context]:
for ctxt in sco.get_critical_section_unlocks(ctxt_low, ctxt_high):
if context_is_in_binary(ctxt, binary):
yield ctxt
# Sort lock and unlock contexts (called in a range of contexts) in a correct order.
# Return a generator of pairs (bool, Context): True for lock, False for unlock.
def get_in_binary_locks_unlocks(sco: RuntimeHelper, ctxt_low: Context, ctxt_high: Context, binary: Optional[str]) \
-> Iterator[Tuple[bool, Context]]:
def generate_locks():
for ctxt in get_in_binary_locks(sco, ctxt_low, ctxt_high, binary):
yield (True, ctxt)
def generate_unlocks():
for ctxt in get_in_binary_unlocks(sco, ctxt_low, ctxt_high, binary):
yield (False, ctxt)
return collate([generate_locks(), generate_unlocks()], key=lambda bool_context: bool_context[1])
# Generate all locks and unlocks called by the binary
def get_thread_usermode_in_binary_locks_unlocks(sco: RuntimeHelper, ranges: List[Tuple[Context, Context]],
binary: Optional[str]) -> Iterator[Tuple[bool, Context]]:
for (ctxt_low, ctxt_high) in ranges:
for lock_unlock in get_in_binary_locks_unlocks(sco, ctxt_low, ctxt_high, binary):
yield lock_unlock
# %%
# Check for RAII in locks/unlocks (i.e. critical sections are unlocked in reverse order of lock), and warn if RAII is
# violated, as following it is good practice.
#
# In synchronization with critical sections, RAII is an idiomatic technique used to make the lock order consistent,
# which avoids deadlocks when there is a total order of locks between threads. For example, the following threads,
# where the lock/unlock of critical sections A and B follows RAII
# - thread 0: lock A, lock B, unlock B, unlock A, lock A, lock B
# - thread 1: lock A, lock B, unlock B, unlock A, lock A, lock B
# are deadlock free. But
# - thread 0: lock A, lock B, unlock A, lock A, unlock B
# - thread 1: lock A, lock B, unlock A, lock A, unlock B
# can deadlock. Consider the interleaving:
# - lock A, lock B, unlock A (thread 0), lock A (thread 1)
# Now thread 1 tries to lock B (but it cannot since B is already locked by thread 0), and thread 0 cannot unlock B
# either since it needs to lock A first (but it cannot since A is already locked by thread 1).
#
# Note that RAII cannot guarantee deadlock-free synchronization if the condition about a single total order of
# critical sections is not satisfied.
def check_intrathread_lock_unlock_matching(trace: RuntimeHelper, thread_ranges: List[Tuple[Context, Context]],
binary: Optional[str]):
locks_unlocks = get_thread_usermode_in_binary_locks_unlocks(trace, thread_ranges, binary)
# lock/unlock should (though not obliged) follow RAII
corresponding_lock_stack: List[Context] = []
mismatch_lock_unlock_pcs: Set[Tuple[int, int]] = set()
mismatch_unlocks_pcs: Set[int] = set()
ok = True
for (is_lock, ctxt) in locks_unlocks:
if is_lock:
# push lock context
corresponding_lock_stack.append(ctxt)
else:
if corresponding_lock_stack:
last_lock_ctxt = corresponding_lock_stack[-1]
last_lock_handle = RuntimeHelper.get_critical_section_handle(last_lock_ctxt)
current_unlock_handle = RuntimeHelper.get_critical_section_handle(ctxt)
if last_lock_handle == current_unlock_handle:
# lock and unlock on the same critical section
corresponding_lock_stack.pop()
else:
# It's safe to decrease by 1 since the first context of the trace is never counted as a lock
# nor unlock (c.f. RuntimeHelper::get_critical_section_(locks|unlocks)).
in_binary_lock_pc = last_lock_ctxt.read(x64.rip)
in_binary_unlock_pc = ctxt.read(x64.rip)
if (in_binary_lock_pc, in_binary_unlock_pc) in mismatch_lock_unlock_pcs:
continue
mismatch_lock_unlock_pcs.add((in_binary_lock_pc, in_binary_unlock_pc))
print(f'Warning:\n\t#{last_lock_ctxt.transition_after().id}: lock at 0x{in_binary_lock_pc:x} \
(on critical section handle 0x{last_lock_handle:x}) followed by\n\t#{ctxt.transition_after().id}: \
unlock at 0x{in_binary_unlock_pc:x} (on different critical section handle 0x{current_unlock_handle:x})')
ok = False
else:
in_binary_unlock_pc = ctxt.read(x64.rip)
if in_binary_unlock_pc in mismatch_unlocks_pcs:
continue
mismatch_unlocks_pcs.add(in_binary_unlock_pc)
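current_unlock_handle = RuntimeHelper.get_critical_section_handle(ctxt)
print(f'Warning:\n\t#{ctxt.transition_after().id}: unlock at 0x{in_binary_unlock_pc:x} \
(on critical section handle 0x{current_unlock_handle:x}) without any lock')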
ok = False
if ok:
print('OK')
# Build a dependency graph of locks: a lock A is followed by a lock B if A is still locked when B is locked.
# For example:
# - lock A, lock B => A followed by B
# - lock A, unlock A, lock B => A is not followed by B (since A is already unlocked when B is locked)
def build_locks_unlocks_order_graph_next(locks_unlocks: Iterator[Tuple[bool, Context]]) \
-> Tuple[Dict[int, List[int]], Dict[Tuple[int, int], List[Tuple[Context, Context]]]]:
order_graph: Dict[int, List[int]] = {}
order_graph_label: Dict[Tuple[int, int], List[Tuple[Context, Context]]] = {}
lock_stack: List[Context] = []
for (is_lock, ctxt) in locks_unlocks:
if not lock_stack:
if is_lock:
lock_stack.append(ctxt)
continue
current_lock_unlock_handle = RuntimeHelper.get_critical_section_handle(ctxt)
current_lock_unlock_threadid = RuntimeHelper.thread_id(ctxt)
if not is_lock:
# looking for the last lock in the stack
i = len(lock_stack) - 1
while True:
lock_handle_i = RuntimeHelper.get_critical_section_handle(lock_stack[i])
lock_threadid_i = RuntimeHelper.thread_id(lock_stack[i])
if lock_handle_i == current_lock_unlock_handle and lock_threadid_i == current_lock_unlock_threadid:
del lock_stack[i]
break
if i == 0:
break
i -= 1
continue
last_lock_ctxt = lock_stack[-1]
# check if the last lock and the current lock are in the same thread
if RuntimeHelper.thread_id(last_lock_ctxt) == RuntimeHelper.thread_id(ctxt):
# create the edge: last_lock -> current_lock
last_lock_handle = RuntimeHelper.get_critical_section_handle(last_lock_ctxt)
if last_lock_handle not in order_graph:
order_graph[last_lock_handle] = []
order_graph[last_lock_handle].append(current_lock_unlock_handle)
# create (or update) the label of the edge
if (last_lock_handle, current_lock_unlock_handle) not in order_graph_label:
order_graph_label[(last_lock_handle, current_lock_unlock_handle)] = []
order_graph_label[(last_lock_handle, current_lock_unlock_handle)].append((last_lock_ctxt, ctxt))
lock_stack.append(ctxt)
return (order_graph, order_graph_label)
# Check if there are cycles in the lock dependency graph, such a cycle is considered a potential deadlock.
def check_order_graph_cycle(graph: Dict[int, List[int]], labels: Dict[Tuple[int, int], List[Tuple[Context, Context]]]):
def dfs(path: List[Tuple[int, int, int]], starting_node: int, visited_nodes: Set[int]) \
-> Optional[List[List[Tuple[int, int, int]]]]:
if starting_node not in graph:
return None
next_nodes_to_visit = set(graph[starting_node]) - visited_nodes
if not next_nodes_to_visit:
return None
nodes_on_path = set()
tids_on_path = set()
for (hd, tl, tid) in path:
nodes_on_path.add(hd)
nodes_on_path.add(tl)
tids_on_path.add(tid)
# check if we can build a cycle of locks by trying to visit a node
back_nodes = next_nodes_to_visit & nodes_on_path
for node in back_nodes:
back_node = node
tids_starting_back = set()
for (ctxt_hd, _) in labels[(starting_node, back_node)]:
tids_starting_back.add(RuntimeHelper.thread_id(ctxt_hd))
# sub-path starting from the back-node to the starting-node
sub_path = []
sub_path_tids = set()
for (hd, tl, tid) in path:
if hd == node:
sub_path.append((hd, tl, tid))
sub_path_tids.add(tid)
node = tl
if tl == starting_node:
diff_tids = tids_starting_back - sub_path_tids
# there is an edge whose TID is not on the sub-path yet
cycles = []
if diff_tids:
for tid in diff_tids:
for (ctxt_hd, _) in labels[(starting_node, back_node)]:
if RuntimeHelper.thread_id(ctxt_hd) == tid:
sub_path.append((starting_node, back_node, tid))
cycles.append(sub_path)
break
return cycles
else:
return None
for next_node in next_nodes_to_visit:
tids = set()
for (ctxt_hd, _,) in labels[(starting_node, next_node)]:
tid = RuntimeHelper.thread_id(ctxt_hd)
tids.add(tid)
tids_to_visit = tids - tids_on_path
if tids_to_visit:
for tid_to_visit in tids_to_visit:
for (ctxt_hd, _) in labels[(starting_node, next_node)]:
if RuntimeHelper.thread_id(ctxt_hd) == tid_to_visit:
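# Copy the path so that sibling branches do not share the same list, and record the thread being visited
next_path = path + [(starting_node, next_node, tid_to_visit)]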
visited_nodes.add(next_node)
some_cycles = dfs(next_path, next_node, visited_nodes)
if some_cycles is not None:
return some_cycles
return None
def compare_cycles(c0: List[Tuple[int, int, int]], c1: List[Tuple[int, int, int]]) -> bool:
if len(c0) != len(c1):
return False
if len(c0) == 0:
return True
def circular_generator(c: List[Tuple[int, int, int]], e: Tuple[int, int, int]) \
-> Optional[Iterator[Tuple[int, int, int]]]:
clen = len(c)
i = 0
for elem in c:
if elem == e:
while True:
yield c[i]
i = i + 1
if i == clen:
i = 0
i = i + 1
return None
# A cycle that does not contain the first edge of the other one cannot be a rotation of it
if c0[0] not in c1:
return False
c0_gen = circular_generator(c0, c0[0])
c1_gen = circular_generator(c1, c0[0])
i = 0
while i < len(c0):
e0 = next(c0_gen)
e1 = next(c1_gen)
if e0 != e1:
return False
i = i + 1
return True
ok = True
distinct_cycles: List[List[Tuple[int, int, int]]] = []
for node in graph:
cycles = dfs([], node, set())
if cycles is None or not cycles:
continue
for cycle in cycles:
duplicated = False
if not distinct_cycles:
duplicated = False
else:
for ec in distinct_cycles:
if compare_cycles(ec, cycle):
duplicated = True
break
if duplicated:
continue
distinct_cycles.append(cycle)
print('Potential deadlock(s):')
for (node, next_node, tid) in cycle:
label = labels[(node, next_node)]
distinct_labels: Set[Tuple[int, int]] = set()
for (node_ctxt, next_node_ctxt) in label:
in_binary_lock_pc = (node_ctxt - 1).read(x64.rip)
in_binary_next_lock_pc = (next_node_ctxt - 1).read(x64.rip)
if (in_binary_lock_pc, in_binary_next_lock_pc) in distinct_labels:
continue
distinct_labels.add((in_binary_lock_pc, in_binary_next_lock_pc))
print(f'\t#{node_ctxt.transition_after().id}: lock at 0x{in_binary_lock_pc:x} \
(on thread {RuntimeHelper.thread_id(node_ctxt)}, critical section handle \
0x{RuntimeHelper.get_critical_section_handle(node_ctxt):x}) followed by\n\t\
#{next_node_ctxt.transition_after().id}: lock at 0x{in_binary_next_lock_pc:x} \
(on thread {RuntimeHelper.thread_id(next_node_ctxt)}, \
critical section handle 0x{RuntimeHelper.get_critical_section_handle(next_node_ctxt):x})')
ok = False
print(
'\t=============================================================='
)
if ok:
print('Not found.')
return None
# Get user mode locks (and unlocks) called by the binary, build the dependency graph, then check the cycles.
def check_lock_cycle(trace: RuntimeHelper, threads_ranges: List[Tuple[Context, Context]], binary: Optional[str]):
locks_unlocks = get_thread_usermode_in_binary_locks_unlocks(trace, threads_ranges, binary)
(order_graph, order_graph_labels) = build_locks_unlocks_order_graph_next(locks_unlocks)
check_order_graph_cycle(order_graph, order_graph_labels)
# Combination of checking lock/unlock RAII and deadlock
def detect_deadlocks(trace: RuntimeHelper, pid: int, binary: Optional[str],
first_id: Optional[int], last_id: Optional[int]):
(first_context, last_context) = find_begin_end_context(trace, pid, binary, first_id, last_id)
process_ranges = list(find_process_usermode_ranges(trace, pid, first_context, last_context))
thread_ranges: Dict[int, List[Tuple[Context, Context]]] = {}
for (ctxt_low, ctxt_high) in process_ranges:
tid = RuntimeHelper.thread_id(ctxt_low)
if tid not in thread_ranges:
thread_ranges[tid] = []
thread_ranges[tid].append((ctxt_low, ctxt_high))
for (tid, ranges) in thread_ranges.items():
print('\n============ checking lock/unlock matching on thread {} ============'.format(tid))
check_intrathread_lock_unlock_matching(trace, ranges, binary)
print('\n\n============ checking potential deadlocks on process ============')
check_lock_cycle(trace, process_ranges, binary)
# %%
trace = RuntimeHelper(host, port)
detect_deadlocks(trace, pid, binary, begin_trans_id, end_trans_id)
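The deadlock check in the listing above comes down to two steps: build a graph with an edge A -> B whenever a thread locks B while still holding A, then look for a cycle in that graph. The following is a simplified, self-contained sketch on plain tuples rather than Reven contexts. `build_order_graph`, `find_cycle` and the event encoding are hypothetical names chosen for illustration. Unlike the notebook, the sketch links every lock held by the thread (not only the most recent one) to the new lock, and it does not require the edges of a cycle to come from different threads.

```python
from typing import Dict, List, Optional, Set, Tuple

Event = Tuple[int, bool, str]  # (thread id, is_lock, critical section name)


def build_order_graph(events: List[Event]) -> Dict[str, Set[str]]:
    """Add an edge A -> B whenever a thread locks B while it still holds A."""
    held: Dict[int, List[str]] = {}
    graph: Dict[str, Set[str]] = {}
    for tid, is_lock, section in events:
        stack = held.setdefault(tid, [])
        if is_lock:
            for owned in stack:
                graph.setdefault(owned, set()).add(section)
            stack.append(section)
        elif section in stack:
            stack.remove(section)
    return graph


def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    done: Set[str] = set()  # nodes fully explored without finding a cycle

    def visit(node: str, path: List[str]) -> Optional[List[str]]:
        if node in path:
            return path[path.index(node):] + [node]
        if node in done:
            return None
        for successor in sorted(graph.get(node, ())):
            cycle = visit(successor, path + [node])
            if cycle is not None:
                return cycle
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start, [])
        if cycle is not None:
            return cycle
    return None


# Thread 1 takes A then B, thread 2 takes B then A: inconsistent lock order
risky = [(1, True, "A"), (1, True, "B"), (1, False, "B"), (1, False, "A"),
         (2, True, "B"), (2, True, "A"), (2, False, "A"), (2, False, "B")]
# Both threads take A then B: consistent lock order
safe = [(1, True, "A"), (1, True, "B"), (1, False, "B"), (1, False, "A"),
        (2, True, "A"), (2, True, "B"), (2, False, "B"), (2, False, "A")]

risky_cycle = find_cycle(build_order_graph(risky))
safe_cycle = find_cycle(build_order_graph(safe))
print(risky_cycle)
print(safe_cycle)
```

A cycle only signals a potential deadlock: whether it can actually happen depends on how the threads interleave, which is why the notebook reports such cycles as potential deadlocks.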
Searching for Buffer-Overflow vulnerabilities
This notebook searches for potential Buffer-Overflow vulnerabilities in a Reven trace.
Prerequisites
- This notebook should be run in a jupyter notebook server equipped with a Reven python kernel. Reven comes with a jupyter notebook server accessible with the `Open Python` button in the `Analyze` page of any scenario.
- This notebook depends on `capstone` being installed in the Reven python kernel. To install capstone in the current environment, please execute the capstone cell of this notebook.
Running the notebook
Fill out the parameters in the Parameters cell below, then run all the cells of this notebook.
Note
Although the script is designed to limit false positive results, you may still encounter a few. Please check the results and feel free to report any issue, be it a false positive or a false negative ;-).
Source
# -*- coding: utf-8 -*-
# ---
# jupyter:
# jupytext:
# formats: ipynb,py:percent
# text_representation:
# extension: .py
# format_name: percent
# kernelspec:
# display_name: reven
# language: python
# name: reven-python3
# ---
# %% [markdown]
# # Searching for Buffer-Overflow vulnerabilities
#
# This notebook searches for potential Buffer-Overflow vulnerabilities in a Reven trace.
#
# ## Prerequisites
#
# - This notebook should be run in a jupyter notebook server equipped with a Reven python kernel.
# Reven comes with a jupyter notebook server accessible with the `Open Python` button in the `Analyze` page of any
# scenario.
#
# - This notebook depends on `capstone` being installed in the Reven python kernel.
# To install capstone in the current environment, please execute the [capstone cell](#Capstone-Installation) of this
# notebook.
#
#
# ## Running the notebook
#
# Fill out the parameters in the [Parameters cell](#Parameters) below, then run all the cells of this notebook.
#
#
# ## Note
#
# Although the script is designed to limit false positive results, you may still encounter a few. Please check
# the results and feel free to report any issue, be it a false positive or a false negative ;-).
# %% [markdown]
# # Capstone Installation
#
# Check for capstone's presence. If missing, attempt to get it from pip
# %%
try:
import capstone
print("capstone already installed")
except ImportError:
print("Could not find capstone, attempting to install it from pip")
import sys
import subprocess
command = [f"{sys.executable}", "-m", "pip", "install", "capstone"]
p = subprocess.run(command)
if int(p.returncode) != 0:
raise RuntimeError("Error installing capstone")
import capstone # noqa
print("Successfully installed capstone")
# %% [markdown]
# # Parameters
# %%
# Server connection
# Host of the Reven server running the scenario.
# When running this notebook from the Project Manager, '127.0.0.1' should be the correct value.
reven_backend_host = '127.0.0.1'
# Port of the Reven server running the scenario.
# After starting a Reven server on your scenario, you can get its port on the Analyze page of that scenario.
reven_backend_port = 13370
# Range control
# First transition considered for the detection of allocation/deallocation pairs
# If set to None, then the first transition of the trace
bof_from_tr = None
# bof_from_tr = 5300000 # ex: CVE-2020-17087 scenario
# First transition **not** considered for the detection of allocation/deallocation pairs
# If set to None, then the last transition of the trace
bof_to_tr = None
# bof_to_tr = 6100000 # ex: CVE-2020-17087 scenario
# Filter control
# Beware that, when filtering, if an allocation happens in the specified process and/or binary,
# the script will fail if the deallocation happens in a different process and/or binary.
# This issue should only happen for allocations in the kernel.
# Specify on which PID the allocation/deallocation pairs should be kept.
# Set a value of None to not filter on the PID.
faulty_process_pid = None
# faulty_process_pid = 756 # ex: CVE-2020-17087 scenario
# faulty_process_pid = 466 # ex: CVE-2021-3156 scenario
# Specify on which process name the allocation/deallocation pairs should be kept.
# Set a value of None to not filter on the process name.
# If both a process PID and a process name are specified, please make sure that they both
# refer to the same process, otherwise all allocations will be filtered and no results
# will be produced.
faulty_process_name = None
# faulty_process_name = "cve-2020-17087.exe" # ex: CVE-2020-17087 scenario
# faulty_process_name = "sudoedit" # ex: CVE-2021-3156 scenario
# Specify on which binary name the allocation/deallocation pairs should be kept.
# Set a value of None to not filter on the binary name.
# Only allocation/deallocation taking place in the binaries whose filename,
# path or name contain the specified value are kept.
# If filtering on both a process and a binary, please make sure that there are
# allocations taking place in that binary in the selected process, otherwise all
# allocations will be filtered and no result will be produced.
faulty_binary = None
# faulty_binary = "cng.sys" # ex: CVE-2020-17087 scenario
# faulty_binary = "sudoers.so" # ex: CVE-2021-3156 scenario
# Address control
# Specify a **physical** address suspected of being faulty here,
# to only test BoF for this specific address, instead of all (filtered) allocations.
# The address should still be returned by an allocation/deallocation pair.
# To get a physical address from a virtual address, find a context where the address
# is mapped, then use `virtual_address.translate(ctx)`.
bof_faulty_physical_address = None
# bof_faulty_physical_address = 0x7c5a1450 # ex: CVE-2020-17087 scenario
# bof_faulty_physical_address = 0x132498cd0 # ex: CVE-2021-3156 scenario
# Allocator control
# The script can use three allocators to find allocation/deallocation pairs.
# The following booleans enable the search for allocations by each of these
# allocators for a scenario.
# Generally it is expected to have only a single allocator enabled for a given
# scenario.
# To add your own allocator, please look at how the provided allocators were
# added.
# Whether or not to look for windows malloc/free allocation/deallocation pairs.
search_windows_malloc = True
# search_windows_malloc = False # ex: CVE-2020-17087 scenario
# search_windows_malloc = False # ex: CVE-2021-3156 scenario
# Whether or not to look for ExAllocatePoolWithTag/ExFreePoolWithTag
# allocation/deallocation pairs.
# This allocator is used by the Windows kernel.
search_pool_allocation = True
# search_pool_allocation = True # ex: CVE-2020-17087 scenario, kernel scenario
# search_pool_allocation = False # ex: CVE-2021-3156 scenario
# Whether or not to look for linux malloc/free allocation/deallocation pairs.
search_linux_malloc = False
# search_linux_malloc = False # ex: CVE-2020-17087 scenario, kernel scenario
# search_linux_malloc = True # ex: CVE-2021-3156 scenario
# Taint control
# Technical parameter: number of accesses in a taint after which the script gives up on
# that particular taint.
# As long taints degrade the performance of the script significantly, it is recommended to
# give up on a taint after it exceeds a certain number of operations.
# If you experience missing results, try increasing the value.
# If you experience a very long runtime for the script, try decreasing the value.
# The default value should be appropriate for most cases.
taint_max_length = 100000
# Technical parameter: number of bytes that we consider around an allocated buffer to determine if an access is a BoF
# (or underflow).
# Adjust this value to limit the number of false positives.
bof_overflow_limit = 1024
# %%
from collections import OrderedDict # noqa: E402
from typing import Dict, List # noqa: E402
import reven2 # noqa: E402
import reven2.preview.taint # noqa: E402
# %%
# Python script to connect to this scenario:
server = reven2.RevenServer(reven_backend_host, reven_backend_port)
print(server.trace.transition_count)
# %%
class MemoryRange:
    page_size = 4096
    page_mask = ~(page_size - 1)

    def __init__(self, logical_address, size):
        self.logical_address = logical_address
        self.size = size
        self.pages = [{
            'logical_address': self.logical_address,
            'size': self.size,
            'physical_address': None,
            'ctx_physical_address_mapped': None,
        }]
        # Compute the pages
        while (((self.pages[-1]['logical_address'] & ~MemoryRange.page_mask) + self.pages[-1]['size'] - 1)
                >= MemoryRange.page_size):
            # Compute the size of the new page
            new_page_size = ((self.pages[-1]['logical_address'] & ~MemoryRange.page_mask) + self.pages[-1]['size']
                             - MemoryRange.page_size)
            # Reduce the size of the previous page and create the new one
            self.pages[-1]['size'] -= new_page_size
            self.pages.append({
                'logical_address': self.pages[-1]['logical_address'] + self.pages[-1]['size'],
                'size': new_page_size,
                'physical_address': None,
                'ctx_physical_address_mapped': None,
            })

    def try_translate_first_page(self, ctx):
        if self.pages[0]['physical_address'] is not None:
            return True
        physical_address = reven2.address.LogicalAddress(self.pages[0]['logical_address']).translate(ctx)
        if physical_address is None:
            return False
        self.pages[0]['physical_address'] = physical_address.offset
        self.pages[0]['ctx_physical_address_mapped'] = ctx
        return True

    def try_translate_all_pages(self, ctx):
        return_value = True
        for page in self.pages:
            if page['physical_address'] is not None:
                continue
            physical_address = reven2.address.LogicalAddress(page['logical_address']).translate(ctx)
            if physical_address is None:
                return_value = False
                continue
            page['physical_address'] = physical_address.offset
            page['ctx_physical_address_mapped'] = ctx
        return return_value

    def is_physical_address_range_in_translated_pages(self, physical_address, size):
        for page in self.pages:
            if page['physical_address'] is None:
                continue
            if (
                physical_address >= page['physical_address']
                and physical_address + size <= page['physical_address'] + page['size']
            ):
                return True
        return False

    def __repr__(self):
        return "MemoryRange(0x%x, %d)" % (self.logical_address, self.size)
# Utils to translate the physical address of an address allocated just now
# - ctx should be the ctx where the address is located in `rax`
# - memory_range should be the range of memory of the newly allocated buffer
#
# We are using the translate API to translate it but sometimes just after the allocation
# the address isn't mapped yet. For that we are using the slicing and for all slice access
# we are trying to translate the address.
def translate_first_page_of_allocation(ctx, memory_range):
    if memory_range.try_translate_first_page(ctx):
        return
    tainter = reven2.preview.taint.Tainter(server.trace)
    taint = tainter.simple_taint(
        tag0="rax",
        from_context=ctx,
        to_context=None,
        is_forward=True
    )
    for access in taint.accesses(changes_only=False).all():
        if memory_range.try_translate_first_page(access.transition.context_after()):
            taint.cancel()
            return
    raise RuntimeError("Couldn't find the physical address of the first page")
# %%
class AllocEvent:
    def __init__(self, memory_range, tr_begin, tr_end):
        self.memory_range = memory_range
        self.tr_begin = tr_begin
        self.tr_end = tr_end


class FreeEvent:
    def __init__(self, logical_address, tr_begin, tr_end):
        self.logical_address = logical_address
        self.tr_begin = tr_begin
        self.tr_end = tr_end
def retrieve_events_for_symbol(
    alloc_dict,
    event_class,
    symbol,
    retrieve_event_info,
    event_filter=None,
):
    for ctx in server.trace.search.symbol(
        symbol,
        from_context=None if bof_from_tr is None else server.trace.context_before(bof_from_tr),
        to_context=None if bof_to_tr is None else server.trace.context_before(bof_to_tr)
    ):
        # We don't want hits on exceptions (code page faults, hardware interrupts, etc.)
        if ctx.transition_after().exception is not None:
            continue
        previous_location = (ctx - 1).ossi.location()
        previous_process = (ctx - 1).ossi.process()
        # Filter by process pid/process name/binary name
        # Filter by process pid
        if faulty_process_pid is not None and previous_process.pid != faulty_process_pid:
            continue
        # Filter by process name
        if faulty_process_name is not None and previous_process.name != faulty_process_name:
            continue
        # Filter by binary name / filename / path
        if faulty_binary is not None and faulty_binary not in [
            previous_location.binary.name,
            previous_location.binary.filename,
            previous_location.binary.path
        ]:
            continue
        # Filter the event with the argument filter
        if event_filter is not None:
            if event_filter(ctx.ossi.location(), previous_location):
                continue
        # Retrieve the call/ret
        # The heuristic is that the ret is the end of our function
        # - If the call is inlined it should be at the end of the caller function, so the ret is the ret of our
        #   function
        # - If the call isn't inlined, the ret should be the ret of our function
        ctx_call = next(ctx.stack.frames()).creation_transition.context_after()
        ctx_ret = ctx_call.transition_before().find_inverse().context_before()
        # Build the event by reading the needed registers
        if event_class == AllocEvent:
            current_address, size = retrieve_event_info(ctx, ctx_ret)
            # Filter the failed allocations
            if current_address == 0x0:
                continue
            memory_range = MemoryRange(current_address, size)
            try:
                translate_first_page_of_allocation(ctx_ret, memory_range)
            except RuntimeError:
                # If we can't translate the first page we assume that the buffer isn't used because
                # the heuristic to detect the call/ret failed
                continue
            if memory_range.pages[0]['physical_address'] not in alloc_dict:
                alloc_dict[memory_range.pages[0]['physical_address']] = []
            alloc_dict[memory_range.pages[0]['physical_address']].append(
                AllocEvent(
                    memory_range,
                    ctx.transition_after(), ctx_ret.transition_after()
                )
            )
        elif event_class == FreeEvent:
            current_address = retrieve_event_info(ctx, ctx_ret)
            # Filter the free of NULL
            if current_address == 0x0:
                continue
            current_physical_address = reven2.address.LogicalAddress(current_address).translate(ctx).offset
            if current_physical_address not in alloc_dict:
                alloc_dict[current_physical_address] = []
            alloc_dict[current_physical_address].append(
                FreeEvent(
                    current_address,
                    ctx.transition_after(), ctx_ret.transition_after()
                )
            )
        else:
            raise RuntimeError("Unknown event class: %s" % event_class.__name__)
# %%
# %%time
alloc_dict: Dict = {}
# Basic functions to retrieve the arguments
# They are working for the allocations/frees functions but won't work for all functions
# Particularly because on x86 we don't handle the size of the arguments
# nor if they are pushed left to right or right to left
def retrieve_first_argument(ctx):
    if ctx.is64b():
        return ctx.read(reven2.arch.x64.rcx)
    else:
        esp = ctx.read(reven2.arch.x64.esp)
        return ctx.read(reven2.address.LogicalAddress(esp + 4, reven2.arch.x64.ss), 4)


def retrieve_second_argument(ctx):
    if ctx.is64b():
        return ctx.read(reven2.arch.x64.rdx)
    else:
        esp = ctx.read(reven2.arch.x64.esp)
        return ctx.read(reven2.address.LogicalAddress(esp + 8, reven2.arch.x64.ss), 4)


def retrieve_first_argument_linux(ctx):
    if ctx.is64b():
        return ctx.read(reven2.arch.x64.rdi)
    else:
        raise NotImplementedError("Linux 32bits")


def retrieve_second_argument_linux(ctx):
    if ctx.is64b():
        return ctx.read(reven2.arch.x64.rsi)
    else:
        raise NotImplementedError("Linux 32bits")


def retrieve_return_value(ctx):
    if ctx.is64b():
        return ctx.read(reven2.arch.x64.rax)
    else:
        return ctx.read(reven2.arch.x64.eax)


def retrieve_alloc_info_with_size_as_first_argument(ctx_begin, ctx_end):
    return (
        retrieve_return_value(ctx_end),
        retrieve_first_argument(ctx_begin)
    )


def retrieve_alloc_info_with_size_as_first_argument_linux(ctx_begin, ctx_end):
    return (
        retrieve_return_value(ctx_end),
        retrieve_first_argument_linux(ctx_begin)
    )


def retrieve_alloc_info_with_size_as_second_argument(ctx_begin, ctx_end):
    return (
        retrieve_return_value(ctx_end),
        retrieve_second_argument(ctx_begin)
    )


def retrieve_alloc_info_with_size_as_second_argument_linux(ctx_begin, ctx_end):
    return (
        retrieve_return_value(ctx_end),
        retrieve_second_argument_linux(ctx_begin)
    )


def retrieve_alloc_info_for_calloc(ctx_begin, ctx_end):
    return (
        retrieve_return_value(ctx_end),
        retrieve_first_argument(ctx_begin) * retrieve_second_argument(ctx_begin)
    )


def retrieve_alloc_info_for_calloc_linux(ctx_begin, ctx_end):
    return (
        retrieve_return_value(ctx_end),
        retrieve_first_argument_linux(ctx_begin) * retrieve_second_argument_linux(ctx_begin)
    )


def retrieve_free_info_with_address_as_first_argument(ctx_begin, ctx_end):
    return retrieve_first_argument(ctx_begin)


def retrieve_free_info_with_address_as_first_argument_linux(ctx_begin, ctx_end):
    return retrieve_first_argument_linux(ctx_begin)
if search_windows_malloc:
    def filter_in_realloc(location, caller_location):
        return location.binary == caller_location.binary and caller_location.symbol.name == "realloc"

    # Search for allocations with malloc
    for symbol in server.ossi.symbols(r'^_?malloc$', binary_hint=r'msvcrt.dll'):
        retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_with_size_as_first_argument,
                                   filter_in_realloc)
    # Search for allocations with calloc
    for symbol in server.ossi.symbols(r'^_?calloc(_crt)?$', binary_hint=r'msvcrt.dll'):
        retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_for_calloc)
    # Search for deallocations with free
    for symbol in server.ossi.symbols(r'^_?free$', binary_hint=r'msvcrt.dll'):
        retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol, retrieve_free_info_with_address_as_first_argument,
                                   filter_in_realloc)
    # Search for re-allocations with realloc
    for symbol in server.ossi.symbols(r'^_?realloc$', binary_hint=r'msvcrt.dll'):
        retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_with_size_as_second_argument)
        retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol, retrieve_free_info_with_address_as_first_argument)
if search_pool_allocation:
    # Search for allocations with ExAllocatePool...
    def filter_ex_allocate_pool(location, caller_location):
        return location.binary == caller_location.binary and caller_location.symbol.name.startswith("ExAllocatePool")

    for symbol in server.ossi.symbols(r'^ExAllocatePool', binary_hint=r'ntoskrnl.exe'):
        retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_with_size_as_second_argument,
                                   filter_ex_allocate_pool)

    # Search for deallocations with ExFreePool...
    def filter_ex_free_pool(location, caller_location):
        return location.binary == caller_location.binary and caller_location.symbol.name == "ExFreePool"

    for symbol in server.ossi.symbols(r'^ExFreePool', binary_hint=r'ntoskrnl.exe'):
        retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol, retrieve_free_info_with_address_as_first_argument,
                                   filter_ex_free_pool)
if search_linux_malloc:
    def filter_in_realloc(location, caller_location):
        return (location.binary == caller_location.binary
                and caller_location.symbol is not None
                and caller_location.symbol.name in ["realloc", "__GI___libc_realloc"])

    # Search for allocations with malloc
    for symbol in server.ossi.symbols(r'^((__GI___libc_malloc)|(__libc_malloc))$', binary_hint=r'libc-.*.so'):
        retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol,
                                   retrieve_alloc_info_with_size_as_first_argument_linux, filter_in_realloc)
    # Search for allocations with calloc
    for symbol in server.ossi.symbols(r'^((__calloc)|(__libc_calloc))$', binary_hint=r'libc-.*.so'):
        retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_for_calloc_linux)
    # Search for deallocations with free
    for symbol in server.ossi.symbols(r'^((__GI___libc_free)|(cfree))$', binary_hint=r'libc-.*.so'):
        retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol,
                                   retrieve_free_info_with_address_as_first_argument_linux, filter_in_realloc)
    # Search for re-allocations with realloc
    for symbol in server.ossi.symbols(r'^((__GI___libc_realloc)|(realloc))$', binary_hint=r'libc-.*.so'):
        retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol,
                                   retrieve_alloc_info_with_size_as_second_argument_linux)
        retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol,
                                   retrieve_free_info_with_address_as_first_argument_linux)
# Sort the events per address and event type
for physical_address in alloc_dict.keys():
    alloc_dict[physical_address] = list(sorted(
        alloc_dict[physical_address],
        key=lambda event: (event.tr_begin.id, 0 if isinstance(event, FreeEvent) else 1)
    ))
# Sort the dict by address
alloc_dict = OrderedDict(sorted(alloc_dict.items()))
# %%
def get_alloc_free_pairs(events, errors=None):
    previous_event = None
    for event in events:
        if isinstance(event, AllocEvent):
            if previous_event is None:
                pass
            elif isinstance(previous_event, AllocEvent):
                if errors is not None:
                    errors.append("Two consecutive allocs found")
        elif isinstance(event, FreeEvent):
            if previous_event is None:
                continue
            elif isinstance(previous_event, FreeEvent):
                if errors is not None:
                    errors.append("Two consecutive frees found")
            elif isinstance(previous_event, AllocEvent):
                yield (previous_event, event)
        else:
            assert 0, ("Unknown event type: %s" % type(event))
        previous_event = event
    if isinstance(previous_event, AllocEvent):
        yield (previous_event, None)
# %%
# %%time
# Basic checks of the events
for physical_address, events in alloc_dict.items():
    for event in events:
        if not isinstance(event, AllocEvent) and not isinstance(event, FreeEvent):
            raise RuntimeError("Unknown event type: %s" % type(event))
    errors: List[str] = []
    for (alloc_event, free_event) in get_alloc_free_pairs(events, errors):
        # Check the uniformity of the logical address between the alloc and the free
        if free_event is not None and alloc_event.memory_range.logical_address != free_event.logical_address:
            errors.append("Phys:0x%x: Alloc #%d - Free #%d with different logical address: 0x%x != 0x%x" % (
                physical_address,
                alloc_event.tr_begin.id, free_event.tr_begin.id,
                alloc_event.memory_range.logical_address, free_event.logical_address))
        # Check size of 0x0
        if alloc_event.memory_range.size == 0x0 or alloc_event.memory_range.size is None:
            if free_event is None:
                errors.append("Phys:0x%x: Alloc #%d - Free N/A with weird size %s" % (
                    physical_address,
                    alloc_event.tr_begin.id,
                    alloc_event.memory_range.size))
            else:
                errors.append("Phys:0x%x: Alloc #%d - Free #%d with weird size %s" % (
                    physical_address,
                    alloc_event.tr_begin.id, free_event.tr_begin.id,
                    alloc_event.memory_range.size))
    if len(errors) > 0:
        print("Phys:0x%x: Error(s) detected:" % (physical_address))
        for error in errors:
            print(" - %s" % error)
# %%
# Print the events
for physical_address, events in alloc_dict.items():
    print("Phys:0x%x" % (physical_address))
    print(" Events:")
    for event in events:
        if isinstance(event, AllocEvent):
            print(" - Alloc at #%d (0x%x of size 0x%x)" % (event.tr_begin.id,
                  event.memory_range.logical_address, event.memory_range.size))
        elif isinstance(event, FreeEvent):
            print(" - Free at #%d (0x%x)" % (event.tr_begin.id, event.logical_address))
    print(" Pairs:")
    for (alloc_event, free_event) in get_alloc_free_pairs(events):
        if free_event is None:
            print(" - Allocated at #%d (0x%x of size 0x%x) and freed at N/A" % (alloc_event.tr_begin.id,
                  alloc_event.memory_range.logical_address, alloc_event.memory_range.size))
        else:
            print(" - Allocated at #%d (0x%x of size 0x%x) and freed at #%d (0x%x)" % (alloc_event.tr_begin.id,
                  alloc_event.memory_range.logical_address, alloc_event.memory_range.size, free_event.tr_begin.id,
                  free_event.logical_address))
    print()
# %%
# Capstone utilities
def get_reven_register_from_name(name):
    for reg in reven2.arch.helpers.x64_registers():
        if reg.name == name:
            return reg
    raise RuntimeError("Unknown register: %s" % name)


def compute_dereferenced_address(ctx, cs_insn, cs_op):
    dereferenced_address = 0
    if cs_op.value.mem.base != 0:
        dereferenced_address += ctx.read(get_reven_register_from_name(cs_insn.reg_name(cs_op.value.mem.base)))
    if cs_op.value.mem.index != 0:
        dereferenced_address += (cs_op.value.mem.scale
                                 * ctx.read(get_reven_register_from_name(cs_insn.reg_name(cs_op.value.mem.index))))
    dereferenced_address += cs_op.value.mem.disp
    return dereferenced_address & 0xFFFFFFFFFFFFFFFF
# %%
# Function to compute the range of the intersection between two ranges
def range_intersect(r1, r2):
    return range(max(r1.start, r2.start), min(r1.stop, r2.stop)) or None


def bof_analyze_function(physical_address, alloc_events):
    bof_count = 0
    # Setup capstone
    md_64 = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_64)
    md_64.detail = True
    md_32 = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_32)
    md_32.detail = True
    errors = []
    for (alloc_event, free_event) in get_alloc_free_pairs(alloc_events, errors):
        # Setup the first taint [alloc; free]
        tainter = reven2.preview.taint.Tainter(server.trace)
        taint = tainter.simple_taint(
            tag0="rax" if alloc_event.tr_end.context_before().is64b() else "eax",
            from_context=alloc_event.tr_end.context_before(),
            to_context=free_event.tr_begin.context_before() + 1 if free_event is not None else None,
            is_forward=True
        )
        # Iterate on the slice
        access_count = 0
        for access in taint.accesses(changes_only=False).all():
            access_count += 1
            if access_count > taint_max_length:
                if free_event is None:
                    print("Phys:0x%x: Allocated at #%d (0x%x of size 0x%x) and freed at N/A" % (physical_address,
                          alloc_event.tr_begin.id, alloc_event.memory_range.logical_address,
                          alloc_event.memory_range.size))
                else:
                    print("Phys:0x%x: Allocated at #%d (0x%x of size 0x%x) and freed at #%d (0x%x)"
                          % (physical_address, alloc_event.tr_begin.id, alloc_event.memory_range.logical_address,
                             alloc_event.memory_range.size, free_event.tr_begin.id, free_event.logical_address))
                print(" Warning: Allocation skipped: taint stopped after %d accesses" % access_count)
                print()
                break
            ctx = access.transition.context_before()
            md = md_64 if ctx.is64b() else md_32
            cs_insn = next(md.disasm(access.transition.instruction.raw, access.transition.instruction.size))
            # Skip `lea` instructions as they are not really memory reads/writes, and the taint
            # will propagate anyway so that we will see the dereference of the computed value
            if cs_insn.mnemonic == "lea":
                continue
            registers_in_state = {}
            for reg_slice, _ in access.state_before().tainted_registers():
                registers_in_state[reg_slice.register.name] = reg_slice
            for cs_op in cs_insn.operands:
                if cs_op.type != capstone.x86.X86_OP_MEM:
                    continue
                bof_reg = None
                if cs_op.value.mem.base != 0:
                    base_reg_name = cs_insn.reg_name(cs_op.value.mem.base)
                    if base_reg_name in registers_in_state:
                        bof_reg = registers_in_state[base_reg_name]
                if bof_reg is None and cs_op.value.mem.index != 0:
                    index_reg_name = cs_insn.reg_name(cs_op.value.mem.index)
                    if index_reg_name in registers_in_state:
                        bof_reg = registers_in_state[index_reg_name]
                if bof_reg is None:
                    continue
                dereferenced_address = compute_dereferenced_address(ctx, cs_insn, cs_op)
                # We only check on the translated pages as the taint won't return an access with a pagefault
                # so the dereferenced address should be translated
                dereferenced_physical_address = reven2.address.LogicalAddress(dereferenced_address).translate(ctx)
                if dereferenced_physical_address is None:
                    continue
                operand_range = range(dereferenced_address, dereferenced_address + cs_op.size)
                before_buffer_range = range(alloc_event.memory_range.logical_address - bof_overflow_limit,
                                            alloc_event.memory_range.logical_address)
                after_buffer_range = range(alloc_event.memory_range.logical_address + alloc_event.memory_range.size,
                                           alloc_event.memory_range.logical_address + alloc_event.memory_range.size
                                           + bof_overflow_limit)
                if (range_intersect(operand_range, before_buffer_range) is None
                        and range_intersect(operand_range, after_buffer_range) is None):
                    continue
                if free_event is None:
                    print("Phys:0x%x: Allocated at #%d (0x%x of size 0x%x) and freed at N/A" % (physical_address,
                          alloc_event.tr_begin.id, alloc_event.memory_range.logical_address,
                          alloc_event.memory_range.size))
                else:
                    print("Phys:0x%x: Allocated at #%d (0x%x of size 0x%x) and freed at #%d (0x%x)"
                          % (physical_address, alloc_event.tr_begin.id, alloc_event.memory_range.logical_address,
                             alloc_event.memory_range.size, free_event.tr_begin.id, free_event.logical_address))
                print(" BOF coming from reg %s[%d-%d] leading to dereferenced address = 0x%x"
                      % (bof_reg.register.name, bof_reg.begin, bof_reg.end, dereferenced_address))
                print(" ", end="")
                print(access.transition, end=" ")
                print(ctx.ossi.location())
                print()
                bof_count += 1
    if len(errors) > 0:
        print("Phys:0x%x: Error(s) detected:" % (physical_address))
        for error in errors:
            print(" - %s" % error)
    return bof_count
# %%
# %%time
bof_count = 0
if bof_faulty_physical_address is None:
    for physical_address, alloc_events in alloc_dict.items():
        bof_count += bof_analyze_function(physical_address, alloc_events)
else:
    if bof_faulty_physical_address not in alloc_dict:
        raise KeyError("The passed physical address was not detected during the allocation search")
    bof_count += bof_analyze_function(bof_faulty_physical_address, alloc_dict[bof_faulty_physical_address])
print("---------------------------------------------------------------------------------")
bof_begin_range = "the beginning of the trace" if bof_from_tr is None else "#{}".format(bof_from_tr)
bof_end_range = "the end of the trace" if bof_to_tr is None else "#{}".format(bof_to_tr)
bof_range = ("on the whole trace" if bof_from_tr is None and bof_to_tr is None else
             "between {} and {}".format(bof_begin_range, bof_end_range))
bof_range_size = server.trace.transition_count
if bof_from_tr is not None:
    bof_range_size -= bof_from_tr
if bof_to_tr is not None:
    bof_range_size -= server.trace.transition_count - bof_to_tr
if bof_faulty_physical_address is None:
    searched_memory_addresses = "with {} searched memory addresses".format(len(alloc_dict))
else:
    searched_memory_addresses = "on {:#x}".format(bof_faulty_physical_address)
print("{} BOF(s) found {} ({} transitions) {}".format(
    bof_count, bof_range, bof_range_size, searched_memory_addresses
))
print("---------------------------------------------------------------------------------")
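The window check at the heart of the script above can be tried in isolation. The following is a minimal sketch, not part of the distributed notebook: it reuses the script's `range_intersect` helper, while `classify_access` and its `limit` parameter (standing in for `bof_overflow_limit`) are hypothetical names introduced here for illustration. It works on plain integers and does not need a Reven server.

```python
def range_intersect(r1, r2):
    """Return the overlap of two ranges, or None when they do not overlap."""
    return range(max(r1.start, r2.start), min(r1.stop, r2.stop)) or None


def classify_access(buffer_address, buffer_size, access_address, access_size, limit=1024):
    """Hypothetical helper: report whether an access touches the bytes around a buffer.

    Returns "underflow" if the access overlaps the `limit` bytes before the buffer,
    "overflow" if it overlaps the `limit` bytes after it, and None otherwise.
    """
    access = range(access_address, access_address + access_size)
    before = range(buffer_address - limit, buffer_address)
    after = range(buffer_address + buffer_size, buffer_address + buffer_size + limit)
    if range_intersect(access, before) is not None:
        return "underflow"
    if range_intersect(access, after) is not None:
        return "overflow"
    return None


if __name__ == "__main__":
    # A 0x20-byte buffer at 0x1000, probed with four accesses
    print(classify_access(0x1000, 0x20, 0x1010, 8))  # fully inside the buffer
    print(classify_access(0x1000, 0x20, 0x101c, 8))  # crosses the end of the buffer
    print(classify_access(0x1000, 0x20, 0x0ffc, 4))  # just before the buffer
    print(classify_access(0x1000, 0x20, 0x2000, 4))  # beyond the window
```

As in the script, an access that lies further than `limit` bytes from the buffer is not reported, which is why lowering `bof_overflow_limit` reduces the number of false positives.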
Searching for Use-of-Uninitialized-Memory vulnerabilities
This notebook searches for potential Use-of-Uninitialized-Memory vulnerabilities in a Reven trace.
Prerequisites
- This notebook should be run in a jupyter notebook server equipped with a Reven Python kernel. Reven comes with a jupyter notebook server accessible with the Open Python button in the Analyze page of any scenario.
- This notebook depends on capstone being installed in the Reven Python kernel. To install capstone in the current environment, please execute the capstone cell of this notebook.
- This notebook requires the Memory History resource for your target scenario.
Running the notebook
Fill out the parameters in the Parameters cell below, then run all the cells of this notebook.
Source
# -*- coding: utf-8 -*-
# ---
# jupyter:
# jupytext:
# formats: ipynb,py:percent
# text_representation:
# extension: .py
# format_name: percent
# kernelspec:
# display_name: reven
# language: python
# name: reven-python3
# ---
# %% [markdown]
# # Searching for Use-of-Uninitialized-Memory vulnerabilities
#
# This notebook searches for potential Use-of-Uninitialized-Memory vulnerabilities in a Reven trace.
#
# ## Prerequisites
#
# - This notebook should be run in a jupyter notebook server equipped with a Reven Python kernel.
#   Reven comes with a jupyter notebook server accessible with the `Open Python` button
#   in the `Analyze` page of any scenario.
# - This notebook depends on capstone being installed in the Reven Python kernel.
#   To install capstone in the current environment, please execute the capstone cell of this notebook.
# - This notebook requires the Memory History resource for your target scenario.
#
#
# ## Running the notebook
#
# Fill out the parameters in the [Parameters cell](#Parameters) below, then run all the cells of this notebook.
# %% [markdown]
# # Capstone Installation
#
# Check for capstone's presence. If missing, attempt to get it from pip
# %%
try:
    import capstone
    print("capstone already installed")
except ImportError:
    print("Could not find capstone, attempting to install it from pip")
    import sys
    import subprocess
    command = [f"{sys.executable}", "-m", "pip", "install", "capstone"]
    p = subprocess.run(command)
    if int(p.returncode) != 0:
        raise RuntimeError("Error installing capstone")
    import capstone  # noqa
    print("Successfully installed capstone")
# %% [markdown]
# # Parameters
# %%
# Server connection
# Host of the Reven server running the scenario.
# When running this notebook from the Project Manager, '127.0.0.1' should be the correct value.
reven_backend_host = '127.0.0.1'
# Port of the Reven server running the scenario.
# After starting a Reven server on your scenario, you can get its port on the Analyze page of that scenario.
reven_backend_port = 13370
# Range control
# First transition considered for the detection of allocation/deallocation pairs
# If set to None, then the first transition of the trace
from_tr = None
# First transition **not** considered for the detection of allocation/deallocation pairs
# If set to None, then the last transition of the trace
to_tr = None
# Filter control
# Beware that, when filtering, if an allocation happens in the specified process and/or binary,
# the script will fail if the deallocation happens in a different process and/or binary.
# This issue should only happen for allocations in the kernel.
# Specify on which PID the allocation/deallocation pairs should be kept.
# Set a value of None to not filter on the PID.
faulty_process_pid = None
# Specify on which process name the allocation/deallocation pairs should be kept.
# Set a value of None to not filter on the process name.
# If both a process PID and a process name are specified, please make sure that they both
# refer to the same process, otherwise all allocations will be filtered and no results
# will be produced.
faulty_process_name = None
# Specify on which binary name the allocation/deallocation pairs should be kept.
# Set a value of None to not filter on the binary name.
# Only allocation/deallocation taking place in the binaries whose filename,
# path or name contain the specified value are kept.
# If filtering on both a process and a binary, please make sure that there are
# allocations taking place in that binary in the selected process, otherwise all
# allocations will be filtered and no result will be produced.
faulty_binary = None
# Address control
# Specify a **physical** address suspected of being faulty here,
# to only test the script for this specific address, instead of all (filtered) allocations.
# The address should still be returned by an allocation/deallocation pair.
# To get a physical address from a virtual address, find a context where the address
# is mapped, then use `virtual_address.translate(ctx)`.
faulty_physical_address = None
# Allocator control
# The script can use three allocators to find allocation/deallocation pairs.
# The following booleans enable the search for allocations by each of these
# allocators for a scenario.
# Generally it is expected to have only a single allocator enabled for a given
# scenario.
# To add your own allocator, please look at how the provided allocators were
# added.
# Whether or not to look for windows malloc/free allocation/deallocation pairs.
search_windows_malloc = True
# Whether or not to look for ExAllocatePoolWithTag/ExFreePoolWithTag
# allocation/deallocation pairs.
# This allocator is used by the Windows kernel.
search_pool_allocation = False
# Whether or not to look for linux malloc/free allocation/deallocation pairs.
search_linux_malloc = False
# Analysis control
# Whether or not the display should be restricted to UUMs impacting the control flow
only_display_uum_changing_control_flow = False
# %%
import struct # noqa: E402
from collections import OrderedDict # noqa: E402
from typing import Dict, List # noqa: E402
import reven2 # noqa: E402
import reven2.preview.taint # noqa: E402
# %%
# Python script to connect to this scenario:
server = reven2.RevenServer(reven_backend_host, reven_backend_port)
print(server.trace.transition_count)
# %%
class MemoryRange:
    page_size = 4096
    page_mask = ~(page_size - 1)

    def __init__(self, logical_address, size):
        self.logical_address = logical_address
        self.size = size
        self.pages = [{
            'logical_address': self.logical_address,
            'size': self.size,
            'physical_address': None,
            'ctx_physical_address_mapped': None,
        }]
        # Compute the pages
        while (((self.pages[-1]['logical_address'] & ~MemoryRange.page_mask) + self.pages[-1]['size'] - 1)
                >= MemoryRange.page_size):
            # Compute the size of the new page
            new_page_size = ((self.pages[-1]['logical_address'] & ~MemoryRange.page_mask) + self.pages[-1]['size']
                             - MemoryRange.page_size)
            # Reduce the size of the previous page and create the new one
            self.pages[-1]['size'] -= new_page_size
            self.pages.append({
                'logical_address': self.pages[-1]['logical_address'] + self.pages[-1]['size'],
                'size': new_page_size,
                'physical_address': None,
                'ctx_physical_address_mapped': None,
            })

    def try_translate_first_page(self, ctx):
        if self.pages[0]['physical_address'] is not None:
            return True
        physical_address = reven2.address.LogicalAddress(self.pages[0]['logical_address']).translate(ctx)
        if physical_address is None:
            return False
        self.pages[0]['physical_address'] = physical_address.offset
        self.pages[0]['ctx_physical_address_mapped'] = ctx
        return True

    def try_translate_all_pages(self, ctx):
        return_value = True
        for page in self.pages:
            if page['physical_address'] is not None:
                continue
            physical_address = reven2.address.LogicalAddress(page['logical_address']).translate(ctx)
            if physical_address is None:
                return_value = False
                continue
            page['physical_address'] = physical_address.offset
            page['ctx_physical_address_mapped'] = ctx
        return return_value

    def is_physical_address_range_in_translated_pages(self, physical_address, size):
        for page in self.pages:
            if page['physical_address'] is None:
                continue
            if (
                physical_address >= page['physical_address']
                and physical_address + size <= page['physical_address'] + page['size']
            ):
                return True
        return False

    def __repr__(self):
        return "MemoryRange(0x%x, %d)" % (self.logical_address, self.size)
# Utility to find the physical address of the first page of a freshly allocated buffer
# - ctx should be the context where the address is located in `rax`
# - memory_range should be the range of memory of the newly allocated buffer
#
# We use the translate API, but just after the allocation the address is sometimes
# not mapped yet. In that case we follow the address with a forward taint of `rax`
# and retry the translation at each access reported by the taint.
def translate_first_page_of_allocation(ctx, memory_range):
    if memory_range.try_translate_first_page(ctx):
        return
    tainter = reven2.preview.taint.Tainter(server.trace)
    taint = tainter.simple_taint(
        tag0="rax",
        from_context=ctx,
        to_context=None,
        is_forward=True
    )
    for access in taint.accesses(changes_only=False).all():
        if memory_range.try_translate_first_page(access.transition.context_after()):
            taint.cancel()
            return
    raise RuntimeError("Couldn't find the physical address of the first page")
# %%
class AllocEvent:
def __init__(self, memory_range, tr_begin, tr_end):
self.memory_range = memory_range
self.tr_begin = tr_begin
self.tr_end = tr_end
class FreeEvent:
def __init__(self, logical_address, tr_begin, tr_end):
self.logical_address = logical_address
self.tr_begin = tr_begin
self.tr_end = tr_end
def retrieve_events_for_symbol(
alloc_dict,
event_class,
symbol,
retrieve_event_info,
event_filter=None,
):
for ctx in server.trace.search.symbol(
symbol,
from_context=None if from_tr is None else server.trace.context_before(from_tr),
to_context=None if to_tr is None else server.trace.context_before(to_tr)
):
# Skip hits that land on an exception (code page fault, hardware interrupt, etc.)
if ctx.transition_after().exception is not None:
continue
previous_location = (ctx - 1).ossi.location()
previous_process = (ctx - 1).ossi.process()
# Filter by process pid/process name/binary name
# Filter by process pid
if faulty_process_pid is not None and previous_process.pid != faulty_process_pid:
continue
# Filter by process name
if faulty_process_name is not None and previous_process.name != faulty_process_name:
continue
# Filter by binary name / filename / path
if faulty_binary is not None and faulty_binary not in [
previous_location.binary.name,
previous_location.binary.filename,
previous_location.binary.path
]:
continue
# Filter the event with the argument filter
if event_filter is not None:
if event_filter(ctx.ossi.location(), previous_location):
continue
# Retrieve the call/ret
# The heuristic is that the ret is the end of our function
# - If the call is inlined it should be at the end of the caller function, so the ret is the ret of our
# function
# - If the call isn't inlined, the ret should be the ret of our function
ctx_call = next(ctx.stack.frames()).creation_transition.context_after()
ctx_ret = ctx_call.transition_before().find_inverse().context_before()
# Build the event by reading the needed registers
if event_class == AllocEvent:
current_address, size = retrieve_event_info(ctx, ctx_ret)
# Filter the alloc failing
if current_address == 0x0:
continue
memory_range = MemoryRange(current_address, size)
try:
translate_first_page_of_allocation(ctx_ret, memory_range)
except RuntimeError:
# If we can't translate the first page we assume that the buffer isn't used because
# the heuristic to detect the call/ret failed
continue
if memory_range.pages[0]['physical_address'] not in alloc_dict:
alloc_dict[memory_range.pages[0]['physical_address']] = []
alloc_dict[memory_range.pages[0]['physical_address']].append(
AllocEvent(
memory_range,
ctx.transition_after(), ctx_ret.transition_after()
)
)
elif event_class == FreeEvent:
current_address = retrieve_event_info(ctx, ctx_ret)
# Filter the free of NULL
if current_address == 0x0:
continue
current_physical_address = reven2.address.LogicalAddress(current_address).translate(ctx).offset
if current_physical_address not in alloc_dict:
alloc_dict[current_physical_address] = []
alloc_dict[current_physical_address].append(
FreeEvent(
current_address,
ctx.transition_after(), ctx_ret.transition_after()
)
)
else:
raise RuntimeError("Unknown event class: %s" % event_class.__name__)
# %%
# %%time
alloc_dict: Dict = {}
# Basic functions to retrieve the arguments
# They work for the allocation/free functions but won't work for all functions
# In particular, on x86 we handle neither the size of the arguments
# nor whether they are pushed left to right or right to left
def retrieve_first_argument(ctx):
if ctx.is64b():
return ctx.read(reven2.arch.x64.rcx)
else:
esp = ctx.read(reven2.arch.x64.esp)
return ctx.read(reven2.address.LogicalAddress(esp + 4, reven2.arch.x64.ss), 4)
def retrieve_second_argument(ctx):
if ctx.is64b():
return ctx.read(reven2.arch.x64.rdx)
else:
esp = ctx.read(reven2.arch.x64.esp)
return ctx.read(reven2.address.LogicalAddress(esp + 8, reven2.arch.x64.ss), 4)
def retrieve_first_argument_linux(ctx):
if ctx.is64b():
return ctx.read(reven2.arch.x64.rdi)
else:
raise NotImplementedError("Linux 32bits")
def retrieve_second_argument_linux(ctx):
if ctx.is64b():
return ctx.read(reven2.arch.x64.rsi)
else:
raise NotImplementedError("Linux 32bits")
def retrieve_return_value(ctx):
if ctx.is64b():
return ctx.read(reven2.arch.x64.rax)
else:
return ctx.read(reven2.arch.x64.eax)
def retrieve_alloc_info_with_size_as_first_argument(ctx_begin, ctx_end):
return (
retrieve_return_value(ctx_end),
retrieve_first_argument(ctx_begin)
)
def retrieve_alloc_info_with_size_as_first_argument_linux(ctx_begin, ctx_end):
return (
retrieve_return_value(ctx_end),
retrieve_first_argument_linux(ctx_begin)
)
def retrieve_alloc_info_with_size_as_second_argument(ctx_begin, ctx_end):
return (
retrieve_return_value(ctx_end),
retrieve_second_argument(ctx_begin)
)
def retrieve_alloc_info_with_size_as_second_argument_linux(ctx_begin, ctx_end):
return (
retrieve_return_value(ctx_end),
retrieve_second_argument_linux(ctx_begin)
)
def retrieve_alloc_info_for_calloc(ctx_begin, ctx_end):
return (
retrieve_return_value(ctx_end),
retrieve_first_argument(ctx_begin) * retrieve_second_argument(ctx_begin)
)
def retrieve_alloc_info_for_calloc_linux(ctx_begin, ctx_end):
return (
retrieve_return_value(ctx_end),
retrieve_first_argument_linux(ctx_begin) * retrieve_second_argument_linux(ctx_begin)
)
def retrieve_free_info_with_address_as_first_argument(ctx_begin, ctx_end):
return retrieve_first_argument(ctx_begin)
def retrieve_free_info_with_address_as_first_argument_linux(ctx_begin, ctx_end):
return retrieve_first_argument_linux(ctx_begin)
if search_windows_malloc:
def filter_in_realloc(location, caller_location):
return location.binary == caller_location.binary and caller_location.symbol.name == "realloc"
# Search for allocations with malloc
for symbol in server.ossi.symbols(r'^_?malloc$', binary_hint=r'msvcrt.dll'):
retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_with_size_as_first_argument,
filter_in_realloc)
# Search for allocations with calloc
for symbol in server.ossi.symbols(r'^_?calloc(_crt)?$', binary_hint=r'msvcrt.dll'):
retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_for_calloc)
# Search for deallocations with free
for symbol in server.ossi.symbols(r'^_?free$', binary_hint=r'msvcrt.dll'):
retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol, retrieve_free_info_with_address_as_first_argument,
filter_in_realloc)
# Search for re-allocations with realloc
for symbol in server.ossi.symbols(r'^_?realloc$', binary_hint=r'msvcrt.dll'):
retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_with_size_as_second_argument)
retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol, retrieve_free_info_with_address_as_first_argument)
if search_pool_allocation:
# Search for allocations with ExAllocatePool...
def filter_ex_allocate_pool(location, caller_location):
return location.binary == caller_location.binary and caller_location.symbol.name.startswith("ExAllocatePool")
for symbol in server.ossi.symbols(r'^ExAllocatePool', binary_hint=r'ntoskrnl.exe'):
retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_with_size_as_second_argument,
filter_ex_allocate_pool)
# Search for deallocations with ExFreePool...
def filter_ex_free_pool(location, caller_location):
return location.binary == caller_location.binary and caller_location.symbol.name == "ExFreePool"
for symbol in server.ossi.symbols(r'^ExFreePool', binary_hint=r'ntoskrnl.exe'):
retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol, retrieve_free_info_with_address_as_first_argument,
filter_ex_free_pool)
if search_linux_malloc:
def filter_in_realloc(location, caller_location):
return (
location.binary == caller_location.binary
and (
caller_location.symbol is not None
and caller_location.symbol.name in ["realloc", "__GI___libc_realloc"]
)
)
# Search for allocations with malloc
for symbol in server.ossi.symbols(r'^((__GI___libc_malloc)|(__libc_malloc))$', binary_hint=r'libc-.*.so'):
retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol,
retrieve_alloc_info_with_size_as_first_argument_linux, filter_in_realloc)
# Search for allocations with calloc
for symbol in server.ossi.symbols(r'^((__calloc)|(__libc_calloc))$', binary_hint=r'libc-.*.so'):
retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol, retrieve_alloc_info_for_calloc_linux)
# Search for deallocations with free
for symbol in server.ossi.symbols(r'^((__GI___libc_free)|(cfree))$', binary_hint=r'libc-.*.so'):
retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol,
retrieve_free_info_with_address_as_first_argument_linux, filter_in_realloc)
# Search for re-allocations with realloc
for symbol in server.ossi.symbols(r'^((__GI___libc_realloc)|(realloc))$', binary_hint=r'libc-.*.so'):
retrieve_events_for_symbol(alloc_dict, AllocEvent, symbol,
retrieve_alloc_info_with_size_as_second_argument_linux)
retrieve_events_for_symbol(alloc_dict, FreeEvent, symbol,
retrieve_free_info_with_address_as_first_argument_linux)
# Sort the events per address and event type
for physical_address in alloc_dict.keys():
alloc_dict[physical_address] = list(sorted(
alloc_dict[physical_address],
key=lambda event: (event.tr_begin.id, 0 if isinstance(event, FreeEvent) else 1)
))
# Sort the dict by address
alloc_dict = OrderedDict(sorted(alloc_dict.items()))
# %%
def get_alloc_free_pairs(events, errors=None):
    previous_event = None
    for event in events:
        if isinstance(event, AllocEvent):
            if previous_event is None:
                pass
            elif isinstance(previous_event, AllocEvent):
                if errors is not None:
                    errors.append("Two consecutive allocs found")
        elif isinstance(event, FreeEvent):
            if previous_event is None:
                continue
            elif isinstance(previous_event, FreeEvent):
                if errors is not None:
                    errors.append("Two consecutive frees found")
            elif isinstance(previous_event, AllocEvent):
                yield (previous_event, event)
        else:
            assert 0, ("Unknown event type: %s" % type(event))
        previous_event = event
    if isinstance(previous_event, AllocEvent):
        yield (previous_event, None)
# %%
# %%time
# Basic checks of the events
for physical_address, events in alloc_dict.items():
for event in events:
if not isinstance(event, AllocEvent) and not isinstance(event, FreeEvent):
raise RuntimeError("Unknown event type: %s" % type(event))
errors: List[str] = []
for (alloc_event, free_event) in get_alloc_free_pairs(events, errors):
# Check the uniformity of the logical address between the alloc and the free
if free_event is not None and alloc_event.memory_range.logical_address != free_event.logical_address:
errors.append(
"Phys:0x%x: Alloc #%d - Free #%d with different logical address: 0x%x != 0x%x" % (
physical_address,
alloc_event.tr_begin.id,
free_event.tr_begin.id,
alloc_event.memory_range.logical_address,
free_event.logical_address
)
)
# Check size of 0x0
if alloc_event.memory_range.size == 0x0 or alloc_event.memory_range.size is None:
if free_event is None:
errors.append("Phys:0x%x: Alloc #%d - Free N/A with weird size %s" % (
physical_address, alloc_event.tr_begin.id, alloc_event.memory_range.size
))
else:
errors.append("Phys:0x%x: Alloc #%d - Free #%d with weird size %s" % (
physical_address, alloc_event.tr_begin.id, free_event.tr_begin.id, alloc_event.memory_range.size
))
if len(errors) > 0:
print("Phys:0x%x: Error(s) detected:" % (physical_address))
for error in errors:
print(" - %s" % error)
# %%
# Print the events
for physical_address, events in alloc_dict.items():
print("Phys:0x%x" % (physical_address))
print(" Events:")
for event in events:
if isinstance(event, AllocEvent):
print(" - Alloc at #%d (0x%x of size 0x%x)" % (
event.tr_begin.id, event.memory_range.logical_address, event.memory_range.size
))
elif isinstance(event, FreeEvent):
print(" - Free at #%d (0x%x)" % (event.tr_begin.id, event.logical_address))
print(" Pairs:")
for (alloc_event, free_event) in get_alloc_free_pairs(events):
if free_event is None:
print(" - Allocated at #%d (0x%x of size 0x%x) and freed at N/A" % (
alloc_event.tr_begin.id, alloc_event.memory_range.logical_address, alloc_event.memory_range.size
))
else:
print(" - Allocated at #%d (0x%x of size 0x%x) and freed at #%d (0x%x)" % (
alloc_event.tr_begin.id, alloc_event.memory_range.logical_address, alloc_event.memory_range.size,
free_event.tr_begin.id, free_event.logical_address
))
print()
# %%
# Setup capstone
md_64 = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_64)
md_64.detail = True
md_32 = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_32)
md_32.detail = True
# Retrieve a `bytes` array from the capstone operand
def get_mask_from_cs_op(mask_op):
mask_formats = [
None,
'B', # 1
'H', # 2
None,
'I', # 4
None,
None,
None,
'Q', # 8
]
return struct.pack(
mask_formats[mask_op.size],
mask_op.imm if mask_op.imm >= 0 else ((1 << (mask_op.size * 8)) + mask_op.imm)
)
# This function will return an array containing either `True` or `False` for each byte
# of the memory access to know which one should be considered for an UUM
def filter_and_bytes(cs_insn, mem_access):
# `and` instruction could be used to set some bytes to 0 with an immediate mask
# Bytes in the mask tell us what to do
# - with 0x00 we should consider the write and not the read
# - with 0xFF we should consider neither of them
# - with everything else we should consider the reads and the writes
filtered_bytes = [False] * mem_access.size
dest_op = cs_insn.operands[0]
mask_op = cs_insn.operands[1]
if dest_op.type != capstone.x86.X86_OP_MEM or mask_op.type != capstone.x86.X86_OP_IMM:
return filtered_bytes
mask = get_mask_from_cs_op(mask_op)
for i in range(0, mask_op.size):
if mask[i] == 0x00 and mem_access.operation == reven2.memhist.MemoryAccessOperation.Read:
filtered_bytes[i] = True
elif mask[i] == 0xFF:
filtered_bytes[i] = True
return filtered_bytes
# This function will return an array containing either `True` or `False` for each byte
# of the memory access to know which one should be considered for an UUM
def filter_or_bytes(cs_insn, mem_access):
    # `or` instruction could be used to set some bytes to 0xFF with an immediate mask
    # Bytes in the mask tell us what to do
    # - with 0x00 we should consider neither of them
    # - with 0xFF we should consider the write and not the read
    # - with everything else we should consider the reads and the writes
    filtered_bytes = [False] * mem_access.size
    dest_op = cs_insn.operands[0]
    mask_op = cs_insn.operands[1]
    if dest_op.type != capstone.x86.X86_OP_MEM or mask_op.type != capstone.x86.X86_OP_IMM:
        return filtered_bytes
    mask = get_mask_from_cs_op(mask_op)
    for i in range(0, mask_op.size):
        if mask[i] == 0x00:
            filtered_bytes[i] = True
        elif mask[i] == 0xFF and mem_access.operation == reven2.memhist.MemoryAccessOperation.Read:
            filtered_bytes[i] = True
    return filtered_bytes
# This function will return an array containing either `True` or `False` for each byte
# of the memory access to know which one should be considered for an UUM.
# Only bytes whose index returns `False` will be considered for potential UUM
def filter_bts_bytes(cs_insn, mem_access):
    # `bts` instruction with an immediate only accesses one byte in the memory
    # but could be written with a bigger access (e.g. `dword`)
    # We only consider the byte accessed by the `bts` instruction in this case
    filtered_bytes = [False] * mem_access.size
    dest_op = cs_insn.operands[0]
    bit_nb_op = cs_insn.operands[1]
    if dest_op.type != capstone.x86.X86_OP_MEM or bit_nb_op.type != capstone.x86.X86_OP_IMM:
        return filtered_bytes
    filtered_bytes = [True] * mem_access.size
    filtered_bytes[bit_nb_op.imm // 8] = False
    return filtered_bytes
# This function will return an array containing either `True` or `False` for each byte
# of the memory access to know which one should be considered for an UUM
def get_filtered_bytes(cs_insn, mem_access):
if cs_insn.mnemonic in ["and", "lock and"]:
return filter_and_bytes(cs_insn, mem_access)
elif cs_insn.mnemonic in ["or", "lock or"]:
return filter_or_bytes(cs_insn, mem_access)
elif cs_insn.mnemonic in ["bts", "lock bts"]:
return filter_bts_bytes(cs_insn, mem_access)
return [False] * mem_access.size
class UUM:
# This array contains the relation between the capstone flag and
# the reven register to check
test_eflags = {
capstone.x86.X86_EFLAGS_TEST_OF: reven2.arch.x64.of,
capstone.x86.X86_EFLAGS_TEST_SF: reven2.arch.x64.sf,
capstone.x86.X86_EFLAGS_TEST_ZF: reven2.arch.x64.zf,
capstone.x86.X86_EFLAGS_TEST_PF: reven2.arch.x64.pf,
capstone.x86.X86_EFLAGS_TEST_CF: reven2.arch.x64.cf,
capstone.x86.X86_EFLAGS_TEST_NT: reven2.arch.x64.nt,
capstone.x86.X86_EFLAGS_TEST_DF: reven2.arch.x64.df,
capstone.x86.X86_EFLAGS_TEST_RF: reven2.arch.x64.rf,
capstone.x86.X86_EFLAGS_TEST_IF: reven2.arch.x64.if_,
capstone.x86.X86_EFLAGS_TEST_TF: reven2.arch.x64.tf,
capstone.x86.X86_EFLAGS_TEST_AF: reven2.arch.x64.af,
}
def __init__(self, alloc_event, free_event, memaccess, uum_bytes):
self.alloc_event = alloc_event
self.free_event = free_event
self.memaccess = memaccess
self.bytes = uum_bytes
# Store conditionals depending on uninitialized memory
# - 'transition': the transition
# - 'reg': the flag which is uninitialized
self.conditionals = None
@property
def nb_uum_bytes(self):
return len(list(filter(lambda byte: byte, self.bytes)))
def analyze_usage(self):
# Initialize an array of what to taint based on the uninitialized bytes
taint_tags = []
for i in range(0, self.memaccess.size):
if not self.bytes[i]:
continue
taint_tags.append(reven2.preview.taint.TaintedMemories(self.memaccess.physical_address + i, 1))
# Start a taint of just the first instruction (the memory access)
# We don't want to keep the memory tainted as if the memory is accessed later we
# will have another UUM anyway. So we are using the state of this first taint
# and we remove the initial tainted memory to start a new taint from after the first
# instruction to the end of the trace
tainter = reven2.preview.taint.Tainter(server.trace)
taint = tainter.simple_taint(
tag0=taint_tags,
from_context=self.memaccess.transition.context_before(),
to_context=self.memaccess.transition.context_after() + 1,
is_forward=True
)
state_after_first_instruction = taint.state_at(self.memaccess.transition.context_after())
# We assume that we won't have other tainted memories than the uninitialized memories
# after the first instruction, so we can just keep the registers from
# `state_after_first_instruction` and not the memories
# In the future, we should keep the inverse of the intersection of the uninitialized memories
# and the memories in the `state_after_first_instruction`
taint = tainter.simple_taint(
tag0=list(map(
lambda x: x[0],
state_after_first_instruction.tainted_registers()
)),
from_context=self.memaccess.transition.context_after(),
to_context=None,
is_forward=True
)
conditionals = []
for access in taint.accesses(changes_only=False).all():
ctx = access.transition.context_before()
md = md_64 if ctx.is64b() else md_32
cs_insn = next(md.disasm(access.transition.instruction.raw, access.transition.instruction.size))
# Test conditional jump & move
for flag, reg in self.test_eflags.items():
if not cs_insn.eflags & flag:
continue
if not UUM._is_register_tainted_in_taint_state(
taint.state_at(access.transition.context_after()),
reg
):
continue
conditionals.append({
'transition': access.transition,
'flag': reg,
})
self.conditionals = conditionals
@staticmethod
def _is_register_tainted_in_taint_state(taint_state, reg):
for tainted_reg, _ in taint_state.tainted_registers():
if tainted_reg.register == reg:
return True
return False
def __str__(self):
desc = ""
if self.free_event is None:
desc += "Phys:0x%x: Allocated at #%d (0x%x of size 0x%x) and freed at N/A\n" % (
self.alloc_event.memory_range.pages[0]['physical_address'],
self.alloc_event.tr_begin.id, self.alloc_event.memory_range.logical_address,
self.alloc_event.memory_range.size,
)
desc += "\tAlloc in: %s / %s\n\n" % (
(self.alloc_event.tr_begin - 1).context_before().ossi.location(),
(self.alloc_event.tr_begin - 1).context_before().ossi.process(),
)
else:
desc += "Phys:0x%x: Allocated at #%d (0x%x of size 0x%x) and freed at #%d (0x%x)\n" % (
self.alloc_event.memory_range.pages[0]['physical_address'],
self.alloc_event.tr_begin.id, self.alloc_event.memory_range.logical_address,
self.alloc_event.memory_range.size,
self.free_event.tr_begin.id, self.free_event.logical_address,
)
desc += "\tAlloc in: %s / %s\n" % (
(self.alloc_event.tr_begin - 1).context_before().ossi.location(),
(self.alloc_event.tr_begin - 1).context_before().ossi.process(),
)
desc += "\tFree in: %s / %s\n\n" % (
(self.free_event.tr_begin - 1).context_before().ossi.location(),
(self.free_event.tr_begin - 1).context_before().ossi.process(),
)
desc += "\tUUM of %d byte(s) first read at:\n" % self.nb_uum_bytes
desc += "\t\t%s / %s\n" % (
self.memaccess.transition.context_before().ossi.location(),
self.memaccess.transition.context_before().ossi.process(),
)
desc += "\t\t%s" % (self.memaccess.transition)
if self.conditionals is None:
return desc
elif len(self.conditionals) == 0:
desc += "\n\n\tNot impacting the control flow"
return desc
desc += "\n\n\tThe control flow depends on uninitialized value(s):"
conditionals = []
for conditional in self.conditionals:
conditional_str = "\n\t\tFlag '%s' depends on uninitialized memory\n" % conditional['flag'].name
conditional_str += "\t\t%s / %s\n" % (
conditional['transition'].context_before().ossi.location(),
conditional['transition'].context_before().ossi.process()
)
conditional_str += "\t\t%s" % (conditional['transition'])
conditionals.append(conditional_str)
desc += "\n".join(conditionals)
return desc
def analyze_one_memaccess(alloc_event, free_event, pages, pages_bytes_written, memaccess):
if (
memaccess.transition > alloc_event.tr_begin
and memaccess.transition < alloc_event.tr_end
and memaccess.operation == reven2.memhist.MemoryAccessOperation.Read
):
# We assume that read accesses during the allocator are okay
# as the allocator should know what it is doing
return None
ctx = memaccess.transition.context_before()
md = md_64 if ctx.is64b() else md_32
cs_insn = next(md.disasm(memaccess.transition.instruction.raw, memaccess.transition.instruction.size))
filtered_bytes = get_filtered_bytes(cs_insn, memaccess)
uum_bytes = [False] * memaccess.size
for i in range(0, memaccess.size):
if filtered_bytes[i]:
continue
possible_pages = list(filter(
lambda page: (
memaccess.physical_address.offset + i >= page['physical_address']
and memaccess.physical_address.offset + i < page['physical_address'] + page['size']
),
pages
))
if len(possible_pages) > 1:
# Should not be possible to have a byte in multiple pages
raise AssertionError("Single byte access across multiple pages")
elif len(possible_pages) == 0:
# Access partially outside the buffer
continue
phys_addr = possible_pages[0]['physical_address']
byte_offset_in_page = memaccess.physical_address.offset + i - possible_pages[0]['physical_address']
if memaccess.operation == reven2.memhist.MemoryAccessOperation.Read:
byte_written = pages_bytes_written[phys_addr][byte_offset_in_page]
if not byte_written:
uum_bytes[i] = True
elif memaccess.operation == reven2.memhist.MemoryAccessOperation.Write:
pages_bytes_written[phys_addr][byte_offset_in_page] = True
if any(uum_bytes):
return UUM(alloc_event, free_event, memaccess, uum_bytes)
return None
def uum_analyze_function(physical_address, alloc_events):
uum_count = 0
for (alloc_event, free_event) in get_alloc_free_pairs(alloc_events):
# We are trying to translate all the pages and will construct
# an array of translated pages.
# We don't check UUM on pages we couldn't translate
alloc_event.memory_range.try_translate_all_pages(
free_event.tr_begin.context_before()
if free_event is not None else
alloc_event.tr_end.context_before()
)
pages = list(filter(
lambda page: page['physical_address'] is not None,
alloc_event.memory_range.pages
))
# An iterator of all the memory accesses of all the translated pages
# The from is the start of the alloc and not the end of the alloc as in some
# cases we want the accesses in it. For example a `calloc` will write the memory
# during its execution. That's also why we are ignoring the read memory accesses
# during the alloc function.
mem_accesses = reven2.util.collate(map(
lambda page: server.trace.memory_accesses(
reven2.address.PhysicalAddress(page['physical_address']),
page['size'],
from_transition=alloc_event.tr_begin,
to_transition=free_event.tr_begin if free_event is not None else None,
is_forward=True,
operation=None
),
pages
), key=lambda access: access.transition)
# This will contain for each page an array of booleans representing
# whether each byte has been written before or not
pages_bytes_written = {}
for page in pages:
pages_bytes_written[page['physical_address']] = [False] * page['size']
for memaccess in mem_accesses:
if all([all(bytes_written) for bytes_written in pages_bytes_written.values()]):
# All the bytes have been set in the memory
# we no longer need to track the memory accesses
break
# Do we have a UUM on this memory access?
uum = analyze_one_memaccess(alloc_event, free_event, pages, pages_bytes_written, memaccess)
if uum is None:
continue
uum.analyze_usage()
if only_display_uum_changing_control_flow and len(uum.conditionals) == 0:
continue
print(str(uum))
print()
uum_count += 1
return uum_count
# %%
# %%time
count = 0
if faulty_physical_address is None:
    for physical_address, alloc_events in alloc_dict.items():
        count += uum_analyze_function(physical_address, alloc_events)
else:
    if faulty_physical_address not in alloc_dict:
        raise KeyError("The passed physical address was not detected during the allocation search")
    count += uum_analyze_function(faulty_physical_address, alloc_dict[faulty_physical_address])
print("---------------------------------------------------------------------------------")
# Describe the searched range: the lower bound is `from_tr`, the upper bound is `to_tr`
begin_range = "the beginning of the trace" if from_tr is None else "#{}".format(from_tr)
end_range = "the end of the trace" if to_tr is None else "#{}".format(to_tr)
final_range = ("on the whole trace" if from_tr is None and to_tr is None else
               "between {} and {}".format(begin_range, end_range))
range_size = server.trace.transition_count
if from_tr is not None:
    range_size -= from_tr
if to_tr is not None:
    range_size -= server.trace.transition_count - to_tr
if faulty_physical_address is None:
    searched_memory_addresses = "with {} searched memory addresses".format(len(alloc_dict))
else:
    searched_memory_addresses = "on {:#x}".format(faulty_physical_address)
print("{} UUM(s) found {} ({} transitions) {}".format(
    count, final_range, range_size, searched_memory_addresses
))
print("---------------------------------------------------------------------------------")
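The byte-level bookkeeping at the heart of the script above can be tried without a Reven server. The following is a minimal sketch in plain Python, not part of the distributed script: the `Access` tuple and the `find_uninitialized_reads` function are hypothetical names introduced for illustration, and accesses are described by offsets inside a single buffer rather than by physical pages. Unlike the script, it does not stop early once every byte has been written, and it ignores the `and`/`or`/`bts` filtering.

```python
from typing import List, NamedTuple, Tuple


class Access(NamedTuple):
    """Simplified memory access (hypothetical stand-in for a Reven memory access)."""
    transition: int  # transition id, used only for ordering
    offset: int      # offset of the first accessed byte inside the buffer
    size: int        # number of accessed bytes
    is_write: bool   # True for a write, False for a read


def find_uninitialized_reads(buffer_size: int, accesses: List[Access]) -> List[Tuple[int, List[int]]]:
    """Return (transition, offsets) for each read touching bytes never written before."""
    written = [False] * buffer_size
    findings = []
    for access in sorted(accesses, key=lambda a: a.transition):
        # Only keep the part of the access that falls inside the buffer
        start = max(access.offset, 0)
        end = min(access.offset + access.size, buffer_size)
        if access.is_write:
            for offset in range(start, end):
                written[offset] = True
        else:
            missing = [offset for offset in range(start, end) if not written[offset]]
            if missing:
                findings.append((access.transition, missing))
    return findings


accesses = [
    Access(transition=10, offset=0, size=4, is_write=True),
    Access(transition=20, offset=2, size=4, is_write=False),  # bytes 4 and 5 were never written
    Access(transition=30, offset=4, size=4, is_write=True),
    Access(transition=40, offset=0, size=8, is_write=False),  # fully initialized by now
]
print(find_uninitialized_reads(8, accesses))  # [(20, [4, 5])]
```

Tracking one boolean per byte keeps partial initializations visible: a 4-byte read is reported even when only some of its bytes were never written.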
Detect data race
This notebook checks for possible data races that may occur when critical sections are used as the synchronization primitive.
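To give an idea of the kind of check involved, here is a simplified lockset-style sketch in plain Python. It is an illustration of the general technique under stated assumptions, not the notebook's actual algorithm: `ThreadAccess` and `possible_data_races` are hypothetical names, and each access is assumed to already carry the set of critical section handles held by its thread at that point.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ThreadAccess:
    """Hypothetical record of one memory access made by a thread."""
    thread_id: int
    address: int
    is_write: bool
    held_locks: FrozenSet[int]  # handles of the critical sections held during the access


def possible_data_races(accesses: List[ThreadAccess]) -> List[Tuple[ThreadAccess, ThreadAccess]]:
    """Return pairs of accesses that could race with each other."""
    races = []
    for first, second in combinations(accesses, 2):
        if first.address != second.address:
            continue  # different memory, no conflict
        if first.thread_id == second.thread_id:
            continue  # same thread, accesses are ordered
        if not (first.is_write or second.is_write):
            continue  # two reads never conflict
        if first.held_locks & second.held_locks:
            continue  # a common critical section excludes them mutually
        races.append((first, second))
    return races


accesses = [
    ThreadAccess(1, 0x1000, True, frozenset({0xA})),
    ThreadAccess(2, 0x1000, False, frozenset({0xA})),  # synchronized with thread 1
    ThreadAccess(3, 0x1000, False, frozenset()),       # reads without holding any lock
    ThreadAccess(3, 0x2000, True, frozenset()),        # different address
]
for first, second in possible_data_races(accesses):
    print(first.thread_id, second.thread_id)  # 1 3
```

The pairwise comparison is quadratic in the number of accesses, which is acceptable for a small illustration but would need grouping by address on a real trace.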
Prerequisites
- Supported versions:
- Reven 2.9+
- This notebook should be run in a jupyter notebook server equipped with a RevenPython kernel.
Reven comes with a Jupyter notebook server accessible with the Open Python button in the Analyze page of any scenario.
- The following resources are needed for the analyzed scenario:
- Trace
- OSSI
- Memory History
- Backtrace (Stack Events included)
- Fast Search
Perimeter:
- Windows 10 64-bit
Limits
- Only the following critical section APIs are supported as lock/unlock operations:
- RtlEnterCriticalSection, RtlTryEnterCriticalSection
- RtlLeaveCriticalSection
Running
Fill out the parameters in the Parameters cell below, then run all the cells of this notebook.
Source
# ---
# jupyter:
# jupytext:
# formats: ipynb,py:percent
# text_representation:
# extension: .py
# format_name: percent
# kernelspec:
# display_name: reven
# language: python
# name: reven-python3
# ---
# %% [markdown]
# # Detect data race
# This notebook checks for possible data races that may occur when critical sections are used as the synchronization
# primitive.
#
# ## Prerequisites
# - Supported versions:
# - Reven 2.9+
# - This notebook should be run in a jupyter notebook server equipped with a RevenPython kernel.
# Reven comes with a Jupyter notebook server accessible with the `Open Python` button in the `Analyze`
# page of any scenario.
# - The following resources are needed for the analyzed scenario:
# - Trace
# - OSSI
# - Memory History
# - Backtrace (Stack Events included)
# - Fast Search
#
# ## Perimeter:
# - Windows 10 64-bit
#
# ## Limits
# - Only the following critical section APIs are supported as lock/unlock operations:
# - `RtlEnterCriticalSection`, `RtlTryEnterCriticalSection`
# - `RtlLeaveCriticalSection`
#
#
# ## Running
# Fill out the parameters in the [Parameters cell](#Parameters) below, then run all the cells of this notebook.
# %%
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Generic, Iterable, Iterator, List, Optional, Set, Tuple, TypeVar, Union
# Reven specific
import reven2
from reven2.address import LinearAddress, LogicalAddress
from reven2.arch import x64
from reven2.memhist import MemoryAccess, MemoryAccessOperation
from reven2.ossi.ossi import Symbol
from reven2.trace import Context, Trace, Transition
from reven2.util import collate
# %% [markdown]
# # Parameters
# %%
# Host and port of the running scenario.
host = '127.0.0.1'
port = 41309
# The PID and the name of the binary of interest (optional): if the binary name is given (i.e. not None), then only
# locks and unlocks called directly from the binary are counted.
pid = 2460
binary = None
begin_trans_id = None
end_trans_id = None
# Do not show common memory accesses (from different threads) which are synchronized (i.e. mutually excluded) by some
# critical section(s).
hide_synchronized_accesses = True
# Do not show common memory accesses (from different threads) which are dynamically free from critical section(s).
suppress_unknown_primitives = False
# %%
# Helper class which wraps Reven's runtime objects and provides methods to get information about calls to
# RtlEnterCriticalSection and RtlLeaveCriticalSection
class RuntimeHelper:
def __init__(self, host: str, port: int):
try:
server = reven2.RevenServer(host, port)
except RuntimeError:
raise RuntimeError(f'Cannot connect to the scenario at {host}:{port}')
self.trace = server.trace
self.ossi = server.ossi
self.search_symbol = self.trace.search.symbol
self.search_binary = self.trace.search.binary
# basic OSSI
bin_symbol_names = {
'c:/windows/system32/ntoskrnl.exe': {
'KiSwapContext'
},
'c:/windows/system32/ntdll.dll': {
# critical section
'RtlInitializeCriticalSection',
'RtlInitializeCriticalSectionEx',
'RtlInitializeCriticalSectionAndSpinCount',
'RtlEnterCriticalSection',
'RtlTryEnterCriticalSection',
'RtlLeaveCriticalSection',
# initialize/shutdown thread
'LdrpInitializeThread',
'LdrShutdownThread'
},
'c:/windows/system32/basesrv.dll': {
'BaseSrvCreateThread'
},
'c:/windows/system32/csrsrv.dll': {
'CsrCreateThread',
'CsrThreadRefcountZero',
'CsrDereferenceThread'
}
}
self.symbols: Dict[str, Optional[Symbol]] = {}
for (bin, symbol_names) in bin_symbol_names.items():
try:
exec_bin = next(self.ossi.executed_binaries(f'^{bin}$'))
except StopIteration:
if bin == 'c:/windows/system32/ntoskrnl.exe':
raise RuntimeError(f'{bin} not found')
exec_bin = None
for name in symbol_names:
if exec_bin is None:
self.symbols[name] = None
continue
try:
sym = next(exec_bin.symbols(f'^{name}$'))
self.symbols[name] = sym
except StopIteration:
msg = f'{name} not found in {bin}'
if name in {
'KiSwapContext',
'RtlEnterCriticalSection', 'RtlTryEnterCriticalSection', 'RtlLeaveCriticalSection'
}:
raise RuntimeError(msg)
else:
self.symbols[name] = None
print(f'Warning: {msg}')
self.has_debugger = True
try:
self.trace.first_transition.step_over()
except RuntimeError:
print('Warning: the debugger interface is not available, so the script cannot determine \
function return values.\nMake sure the stack events and PC range resources are replayed for this scenario.')
self.has_debugger = False
def get_memory_accesses(self, from_context: Context, to_context: Context) -> Iterator[MemoryAccess]:
try:
from_trans = from_context.transition_after()
to_trans = to_context.transition_before()
except IndexError:
return
accesses = self.trace.memory_accesses(from_transition=from_trans, to_transition=to_trans)
for access in accesses:
# Skip the access without virtual address
if access.virtual_address is None:
continue
# Skip the access without instruction
if access.transition.instruction is None:
continue
# Skip the access of `lock` prefixed instruction
ins_bytes = access.transition.instruction.raw
if ins_bytes[0] == 0xf0:
continue
yield access
@staticmethod
def get_lock_handle(ctxt: Context) -> int:
return ctxt.read(x64.rcx)
@staticmethod
def thread_id(ctxt: Context) -> int:
return ctxt.read(LogicalAddress(0x48, x64.gs), 4)
@staticmethod
def is_kernel_mode(ctxt: Context) -> bool:
return ctxt.read(x64.cs) & 0x3 == 0
# %%
# Look for a possible first execution context of a binary
# Find the lower/upper bounds of the contexts on which the data race detection operates
def find_begin_end_context(sco: RuntimeHelper, pid: int, binary: Optional[str],
begin_id: Optional[int], end_id: Optional[int]) -> Tuple[Context, Context]:
begin_ctxt = None
if begin_id is not None:
try:
begin_trans = sco.trace.transition(begin_id)
begin_ctxt = begin_trans.context_after()
except IndexError:
begin_ctxt = None
if begin_ctxt is None:
if binary is not None:
for name in sco.ossi.executed_binaries(binary):
for ctxt in sco.search_binary(name):
ctx_process = ctxt.ossi.process()
assert ctx_process is not None
if ctx_process.pid == pid:
begin_ctxt = ctxt
break
if begin_ctxt is not None:
break
if begin_ctxt is None:
begin_ctxt = sco.trace.first_context
end_ctxt = None
if end_id is not None:
try:
end_trans = sco.trace.transition(end_id)
end_ctxt = end_trans.context_before()
except IndexError:
end_ctxt = None
if end_ctxt is None:
end_ctxt = sco.trace.last_context
if (end_ctxt <= begin_ctxt):
raise RuntimeError("The begin transition must be smaller than the end.")
return (begin_ctxt, end_ctxt)
# Get all execution contexts of a given process
def find_process_ranges(sco: RuntimeHelper, pid: int, first_context: Context, last_context: Context) \
-> Iterator[Tuple[Context, Optional[Context]]]:
if last_context <= first_context:
return iter(())
ctxt_low = first_context
assert sco.symbols['KiSwapContext'] is not None
ki_swap_context = sco.symbols['KiSwapContext']
for ctxt in sco.search_symbol(ki_swap_context, from_context=first_context,
to_context=None if last_context == sco.trace.last_context else last_context):
ctx_process = ctxt_low.ossi.process()
assert ctx_process is not None
if ctx_process.pid == pid:
yield (ctxt_low, ctxt)
ctxt_low = ctxt
ctx_process = ctxt_low.ossi.process()
assert ctx_process is not None
if ctx_process.pid == pid:
if ctxt_low < last_context:
yield (ctxt_low, last_context)
else:
# So ctxt_low == last_context, using None for the upper bound.
# This happens only when last_context is in the process, and is also the last context of the trace.
yield (ctxt_low, None)
# Start from a transition, return the first transition that is not a non-instruction, or None if there isn't one.
def ignore_non_instructions(trans: Transition, trace: Trace) -> Optional[Transition]:
while trans.instruction is None:
if trans == trace.last_transition:
return None
trans = trans + 1
return trans
# Extract user mode only context ranges from a context range (which may include also kernel mode ranges)
def find_usermode_ranges(sco: RuntimeHelper, ctxt_low: Context, ctxt_high: Optional[Context]) \
-> Iterator[Tuple[Context, Context]]:
if ctxt_high is None:
return
trans = ignore_non_instructions(ctxt_low.transition_after(), sco.trace)
if trans is None:
return
ctxt_current = trans.context_before()
while ctxt_current < ctxt_high:
ctxt_next = ctxt_current.find_register_change(x64.cs, is_forward=True)
if not RuntimeHelper.is_kernel_mode(ctxt_current):
if ctxt_next is None or ctxt_next > ctxt_high:
yield (ctxt_current, ctxt_high)
break
else:
# It's safe to decrease ctxt_next by 1 because it was obtained from a forward find_register_change
yield (ctxt_current, ctxt_next - 1)
if ctxt_next is None:
break
ctxt_current = ctxt_next
# Get user mode only execution contexts of a given process
def find_process_usermode_ranges(trace: RuntimeHelper, pid: int, first_ctxt: Context, last_ctxt: Context) \
-> Iterator[Tuple[Context, Context]]:
for (ctxt_low, ctxt_high) in find_process_ranges(trace, pid, first_ctxt, last_ctxt):
usermode_ranges = find_usermode_ranges(trace, ctxt_low, ctxt_high)
for usermode_range in usermode_ranges:
yield usermode_range
def build_ordered_api_calls(sco: RuntimeHelper, binary: Optional[str],
ctxt_low: Context, ctxt_high: Context, apis: List[str]) -> Iterator[Tuple[str, Context]]:
def gen(api):
return (
(api, ctxt)
for ctxt in sco.search_symbol(sco.symbols[api], from_context=ctxt_low,
to_context=None if ctxt_high == sco.trace.last_context else ctxt_high)
)
api_contexts = (
gen(api)
for api in apis if api in sco.symbols
)
if binary is None:
for api_ctxt in collate(api_contexts, key=lambda name_ctxt: name_ctxt[1]):
yield api_ctxt
else:
for (api, ctxt) in collate(api_contexts, key=lambda name_ctxt: name_ctxt[1]):
try:
caller_ctxt = ctxt - 1
except IndexError:
continue
caller_location = caller_ctxt.ossi.location()
if caller_location is None:
continue
caller_binary = caller_location.binary
if caller_binary is None:
continue
if binary in [caller_binary.name, caller_binary.filename, caller_binary.path]:
yield (api, ctxt)
def get_return_value(sco: RuntimeHelper, ctxt: Context) -> Optional[int]:
if not sco.has_debugger:
return None
try:
trans_after = ctxt.transition_after()
except IndexError:
return None
if trans_after is None:
return None
trans_ret = trans_after.step_out()
if trans_ret is None:
return None
ctxt_ret = trans_ret.context_after()
return ctxt_ret.read(x64.rax)
class SynchronizationAction(Enum):
LOCK = 1
UNLOCK = 0
def get_locks_unlocks(sco: RuntimeHelper, binary: Optional[str], ranges: List[Tuple[Context, Context]]) \
        -> List[Tuple[SynchronizationAction, Context]]:
    lock_unlock_apis = {
        'RtlEnterCriticalSection': SynchronizationAction.LOCK,
        'RtlLeaveCriticalSection': SynchronizationAction.UNLOCK,
        'RtlTryEnterCriticalSection': None,
    }
    critical_section_actions = []
    for ctxt_low, ctxt_high in ranges:
        for name, ctxt in build_ordered_api_calls(sco, binary, ctxt_low, ctxt_high, list(lock_unlock_apis.keys())):
            # either lock, unlock, or None (the action depends on the return value)
            action = lock_unlock_apis[name]
            if action is None:
                # need to check the return value of the API to get the action
                api_ret_val = get_return_value(sco, ctxt)
                if api_ret_val is None:
                    print(f'Warning: failed to get the return value, {name} is omitted')
                    continue
                else:
                    # RtlTryEnterCriticalSection: the return value is nonzero if the thread successfully
                    # entered the critical section
                    if api_ret_val != 0:
                        action = SynchronizationAction.LOCK
                    else:
                        continue
            critical_section_actions.append((action, ctxt))
    return critical_section_actions
# Get the locks which are still held at a transition
def get_live_locks(sco: RuntimeHelper,
                   locks_unlocks: List[Tuple[SynchronizationAction, Context]], trans: Transition) -> List[Context]:
    protection_locks: List[Context] = []
    trans_ctxt = trans.context_before()
    for (action, ctxt) in locks_unlocks:
        if ctxt > trans_ctxt:
            return protection_locks
        if action == SynchronizationAction.LOCK:
            protection_locks.append(ctxt)
            continue
        unlock_handle = RuntimeHelper.get_lock_handle(ctxt)
        # look for the latest corresponding lock
        for (idx, lock_ctxt) in reversed(list(enumerate(protection_locks))):
            lock_handle = RuntimeHelper.get_lock_handle(lock_ctxt)
            if lock_handle == unlock_handle:
                del protection_locks[idx]
                break
    return protection_locks
# %%
AccessType = TypeVar("AccessType")
@dataclass
class MemorySegment:
address: Union[LinearAddress, LogicalAddress]
size: int
@dataclass
class MemorySegmentAccess(Generic[AccessType]):
segment: MemorySegment
accesses: List[AccessType]
# Insert a segment into a current list of segments, start at position; if the position is None then start from the
# head of the list.
# Precondition: `new_segment.segment.size > 0`
# Return the position where the segment is inserted.
# The complexity of the insertion is about O(n).
def insert_memory_segment_access(new_segment: MemorySegmentAccess, segments: List[MemorySegmentAccess],
position: Optional[int]) -> int:
new_address = new_segment.segment.address
new_size = new_segment.segment.size
new_accesses = new_segment.accesses
if not segments:
segments.append(MemorySegmentAccess(MemorySegment(new_address, new_size), new_accesses))
return 0
if position is None:
position = 0
index = position
first_found_index = None
while index < len(segments):
# Loop invariant `new_size > 0`
# - True at the first iteration
# - In the subsequent iterations, either
# - the invariant is always kept through cases (2), (5), (6), (7)
# - the loop returns directly in cases (1), (3), (4)
address = segments[index].segment.address
size = segments[index].segment.size
accesses = segments[index].accesses
if new_address < address:
# Case 1
# |--------------|
# |--------------|
if new_address + new_size <= address:
segments.insert(index,
MemorySegmentAccess(MemorySegment(new_address, new_size), new_accesses))
return first_found_index if first_found_index is not None else index
# Case 2
# |--------------|
# |--------------|
else:
# Insert the different part of the new segment
segments.insert(index,
MemorySegmentAccess(
MemorySegment(new_address, address.offset - new_address.offset), new_accesses))
if first_found_index is None:
first_found_index = index
# The common part will be handled in the next iteration by either case (3), (4), or (5)
#
# Since
# - `new_address + new_size > address` (else condition of case 2), so
# - `new_size > address.offset - new_address.offset`
# then invariant `new_size > 0`
new_size -= (address.offset - new_address.offset)
new_address = address
index += 1
elif new_address == address:
# case 3
# |--------------|
# |--------|
if new_address + new_size < address + size:
segments.insert(index,
MemorySegmentAccess(MemorySegment(new_address, new_size), accesses + new_accesses))
segments[index + 1] = MemorySegmentAccess(
MemorySegment(new_address + new_size, size - new_size), accesses)
return first_found_index if first_found_index is not None else index
# case 4
# |--------------|
# |--------------|
elif new_address + new_size == address + size:
segments[index] = MemorySegmentAccess(MemorySegment(new_address, new_size), accesses + new_accesses)
return first_found_index if first_found_index is not None else index
# case 5
# |--------------|
# |------------------|
# new_address + new_size > address + size
else:
# Current segment's accesses are augmented by new segment's accesses
segments[index] = MemorySegmentAccess(MemorySegment(address, size), accesses + new_accesses)
if first_found_index is None:
first_found_index = index
# The different part of the new segment will be handled in the next iteration
#
# Since:
# - `new_address == address` and `new_address + new_size > address + size`, so
# - `new_size > size`
# then invariant `new_size > 0`
new_address = address + size
new_size = new_size - size
index += 1
# new_address > address
else:
# case 6
# |--------------|
# |-----------|
if new_address >= address + size:
index += 1
# case 7
# |--------------|
# |-----------|
else:
# Split the current segment into:
# - the different part
segments[index] = MemorySegmentAccess(
MemorySegment(address, new_address.offset - address.offset), accesses)
# - the common part
segments.insert(index + 1, MemorySegmentAccess(
MemorySegment(new_address, address.offset + size - new_address.offset), accesses))
# The `new_segment` will be handled in the next iteration by either case (3), (4) or (5)
index += 1
segments.append(MemorySegmentAccess(MemorySegment(new_address, new_size), new_accesses))
index = len(segments) - 1
return first_found_index if first_found_index is not None else index
ThreadId = int
# Get the common memory accesses of two threads of the same process.
# Preconditions on both parameters:
# 1. The list of MemorySegmentAccess is sorted by address and size
# 2. For each MemorySegmentAccess access, `access.segment.size > 0`
# Each thread has a sorted (by address and size) memory segment access list; segments are not empty (i.e `size > 0`).
def get_common_memory_accesses(
first_thread_segment_accesses: Tuple[ThreadId,
List[MemorySegmentAccess[Tuple[Transition, MemoryAccessOperation]]]],
second_thread_segment_accesses: Tuple[ThreadId,
List[MemorySegmentAccess[Tuple[Transition, MemoryAccessOperation]]]]) \
-> List[Tuple[MemorySegment,
Tuple[List[Tuple[Transition, MemoryAccessOperation]],
List[Tuple[Transition, MemoryAccessOperation]]]]]:
(first_thread_id, first_segment_accesses) = first_thread_segment_accesses
(second_thread_id, second_segment_accesses) = second_thread_segment_accesses
# Merge these lists into a new list
merged_accesses: List[MemorySegmentAccess[Tuple[ThreadId, Transition, MemoryAccessOperation]]] = []
i = j = 0
while i < len(first_segment_accesses) and j < len(second_segment_accesses):
(first_segment, first_accesses) = (first_segment_accesses[i].segment, first_segment_accesses[i].accesses)
(second_segment, second_accesses) = (second_segment_accesses[j].segment, second_segment_accesses[j].accesses)
first_threaded_accesses = [
(first_thread_id, trans, mem_operation) for (trans, mem_operation) in first_accesses]
second_threaded_accesses = [
(second_thread_id, trans, mem_operation) for (trans, mem_operation) in second_accesses]
(first_address, first_size) = (first_segment.address, first_segment.size)
(second_address, second_size) = (second_segment.address, second_segment.size)
if (first_address, first_size) < (second_address, second_size):
merged_accesses.append(MemorySegmentAccess(first_segment, first_threaded_accesses))
i += 1
elif (first_address, first_size) > (second_address, second_size):
merged_accesses.append(MemorySegmentAccess(second_segment, second_threaded_accesses))
j += 1
else:
merged_accesses.append(MemorySegmentAccess(
first_segment, first_threaded_accesses + second_threaded_accesses))
i += 1
j += 1
while i < len(first_segment_accesses):
(first_segment, first_accesses) = (first_segment_accesses[i].segment, first_segment_accesses[i].accesses)
first_threaded_accesses = [
(first_thread_id, trans, mem_operation) for (trans, mem_operation) in first_accesses]
merged_accesses.append(MemorySegmentAccess(first_segment, first_threaded_accesses))
i += 1
while j < len(second_segment_accesses):
(second_segment, second_accesses) = (second_segment_accesses[j].segment, second_segment_accesses[j].accesses)
second_threaded_accesses = [
(second_thread_id, trans, mem_operation) for (trans, mem_operation) in second_accesses]
merged_accesses.append(MemorySegmentAccess(second_segment, second_threaded_accesses))
j += 1
# The merged list needs to be segmented again to handle the case of overlapping segments coming from different
# threads.
# We start from an empty list `refined_accesses`, and gradually insert new segment into it.
#
# The list merged_accesses is already sorted, so the position where a segment is inserted is always greater than
# or equal to the insertion position of the previous segment. We can use the insertion position of a segment as
# the starting position when looking for the position to insert the next segment.
#
# Though the complexity of `insert_memory_segment_access` is O(n) in average case, the insertion in the loop
# below happens mostly at the end of the list (with the complexity O(1)). The complexity of the loop is still O(n).
refined_accesses: List[MemorySegmentAccess[Tuple[ThreadId, Transition, MemoryAccessOperation]]] = []
last_inserted_index = None
for refined_access in merged_accesses:
last_inserted_index = insert_memory_segment_access(refined_access, refined_accesses, last_inserted_index)
common_accesses = []
for refined_access in refined_accesses:
first_thread_rws = []
second_thread_rws = []
for (thread_id, transition, operation) in refined_access.accesses:
if thread_id == first_thread_id:
first_thread_rws.append((transition, operation))
else:
second_thread_rws.append((transition, operation))
if first_thread_rws and second_thread_rws:
common_accesses.append((refined_access.segment, (first_thread_rws, second_thread_rws)))
return common_accesses
# Return True if the transition should be excluded by the caller for data race detection.
def transition_excluded_by_heuristics(sco: RuntimeHelper, trans: Transition, blacklist: Set[int]) -> bool:
if trans.id in blacklist:
return True
# Memory accesses of the first transition are considered non-protected
try:
trans_ctxt = trans.context_before()
except IndexError:
return False
trans_location = trans_ctxt.ossi.location()
if trans_location is None:
return False
if trans_location.binary is None:
return False
trans_binary_path = trans_location.binary.path
process_thread_core_dlls = [
'c:/windows/system32/ntdll.dll', 'c:/windows/system32/csrsrv.dll', 'c:/windows/system32/basesrv.dll'
]
if trans_binary_path not in process_thread_core_dlls:
return False
if trans_location.symbol is None:
return False
symbols = sco.symbols
# Accesses of the synchronization APIs themselves are not counted.
thread_sync_apis = [
symbols[name] for name in [
'RtlInitializeCriticalSection', 'RtlInitializeCriticalSectionEx',
'RtlInitializeCriticalSectionAndSpinCount',
'RtlEnterCriticalSection', 'RtlTryEnterCriticalSection', 'RtlLeaveCriticalSection'
]
if symbols[name] is not None
]
if trans_location.symbol in thread_sync_apis:
blacklist.add(trans.id)
return True
trans_frames = trans_ctxt.stack.frames()
# Accesses of thread create/shutdown are not counted
thread_create_apis = [
api for api in [
symbols['LdrpInitializeThread'], symbols['BaseSrvCreateThread'], symbols['CsrCreateThread']
]
if api is not None
]
thread_shutdown_apis = [
api for api in [
symbols['CsrDereferenceThread'], symbols['CsrThreadRefcountZero'], symbols['LdrShutdownThread']
]
if api is not None
]
for frame in trans_frames:
frame_ctxt = frame.first_context
frame_location = frame_ctxt.ossi.location()
if frame_location is None:
continue
if frame_location.binary is None:
continue
if frame_location.binary.path not in process_thread_core_dlls:
continue
if frame_location.symbol is None:
continue
if (
frame_location.symbol in thread_create_apis
or frame_location.symbol in thread_shutdown_apis
or frame_location.symbol in thread_sync_apis
):
blacklist.add(trans.id)
return True
return False
def get_threads_locks_unlocks(trace: RuntimeHelper, binary: Optional[str],
process_ranges: List[Tuple[Context, Context]]) \
-> Dict[ThreadId, List[Tuple[SynchronizationAction, Context]]]:
threads_ranges: Dict[int, List[Tuple[Context, Context]]] = {}
for (ctxt_lo, ctxt_hi) in process_ranges:
tid = RuntimeHelper.thread_id(ctxt_lo)
if tid not in threads_ranges:
threads_ranges[tid] = []
threads_ranges[tid].append((ctxt_lo, ctxt_hi))
threads_locks_unlocks: Dict[ThreadId, List[Tuple[SynchronizationAction, Context]]] = {}
for tid, ranges in threads_ranges.items():
threads_locks_unlocks[tid] = get_locks_unlocks(trace, binary, ranges)
return threads_locks_unlocks
# Get all segmented memory accesses of each thread given a list of context ranges.
# Return a map: thread_id -> list(((mem_addr, mem_size), list((transition, mem_operation))))
def get_threads_segmented_memory_accesses(trace: RuntimeHelper, process_ranges: List[Tuple[Context, Context]]) \
-> Dict[ThreadId, List[MemorySegmentAccess[Tuple[Transition, MemoryAccessOperation]]]]:
sorted_threads_segmented_memory_accesses: Dict[
ThreadId,
List[MemorySegmentAccess[Tuple[Transition, MemoryAccessOperation]]]
] = {}
for (ctxt_lo, ctxt_hi) in process_ranges:
tid = RuntimeHelper.thread_id(ctxt_lo)
if tid not in sorted_threads_segmented_memory_accesses:
sorted_threads_segmented_memory_accesses[tid] = []
for access in trace.get_memory_accesses(ctxt_lo, ctxt_hi):
access_virtual_address = access.virtual_address
if access_virtual_address is None:
# DMA access
continue
sorted_threads_segmented_memory_accesses[tid].append(MemorySegmentAccess(
MemorySegment(access_virtual_address, access.size),
[(access.transition, access.operation)]))
# The memory segment accesses of each thread are sorted by address and size of segments, that will help to improve
# the performance of looking for common accesses of two threads.
# The complexity of sorting is O(n * logn)
# Note that segments can be still overlapped.
for tid in sorted_threads_segmented_memory_accesses.keys():
sorted_threads_segmented_memory_accesses[tid].sort(key=lambda x: (x.segment.address, x.segment.size))
threads_segmented_memory_accesses: Dict[
ThreadId,
List[MemorySegmentAccess[Tuple[Transition, MemoryAccessOperation]]]
] = {}
# The non-overlapped memory segment access list of each thread is built by:
# - start from an empty list
# - gradually insert segment into the list using `insert_memory_segment_access`
# Since the original segment lists are sorted, the complexity of the build is O(n). The total complexity of
# constructing the non-overlapped memory segment accesses of each thread is then O(n * logn).
for tid in sorted_threads_segmented_memory_accesses.keys():
threads_segmented_memory_accesses[tid] = []
last_mem_acc_index = None
for seg_mem_acc in sorted_threads_segmented_memory_accesses[tid]:
last_mem_acc_index = insert_memory_segment_access(
seg_mem_acc, threads_segmented_memory_accesses[tid], last_mem_acc_index
)
return threads_segmented_memory_accesses
TransitionId = int
InstructionPointer = int
LockHandle = int
def detect_data_race(trace: RuntimeHelper, pid: int, binary: Optional[str],
begin_transition_id: Optional[TransitionId], end_transition_id: Optional[TransitionId]):
def handlers_to_string(handlers: Iterable[int]):
msgs = [f'{handle:#x}' for handle in handlers]
message = ', '.join(msgs)
return message
def get_live_lock_handles(
sco: RuntimeHelper, locks_unlocks: List[Tuple[SynchronizationAction, Context]], trans: Transition
) -> Tuple[List[Context], Set[LockHandle]]:
locks = get_live_locks(sco, locks_unlocks, trans)
return (locks, {RuntimeHelper.get_lock_handle(ctxt) for ctxt in locks})
(first_context, last_context) = find_begin_end_context(trace, pid, binary, begin_transition_id, end_transition_id)
process_ranges = list(find_process_usermode_ranges(trace, pid, first_context, last_context))
threads_memory_accesses = get_threads_segmented_memory_accesses(trace, process_ranges)
threads_locks_unlocks = get_threads_locks_unlocks(trace, binary, process_ranges)
thread_ids: Set[ThreadId] = set()
# map from a transition (of a given thread) to a tuple whose
# - the first is a list of contexts of calls (e.g. RtlEnterCriticalSection) to lock
# - the second is a set of handles used by these calls
cached_live_locks_handles: Dict[Tuple[ThreadId, TransitionId], Tuple[List[Context], Set[LockHandle]]] = {}
cached_pcs: Dict[TransitionId, InstructionPointer] = {}
filtered_trans_ids: Set[TransitionId] = set()
for first_tid, first_thread_accesses in threads_memory_accesses.items():
thread_ids.add(first_tid)
for second_tid, second_thread_accesses in threads_memory_accesses.items():
if second_tid in thread_ids:
continue
common_accesses = get_common_memory_accesses(
(first_tid, first_thread_accesses), (second_tid, second_thread_accesses)
)
for (segment, (first_accesses, second_accesses)) in common_accesses:
(segment_address, segment_size) = (segment.address, segment.size)
race_pcs: Set[Tuple[InstructionPointer, InstructionPointer]] = set()
for (first_transition, first_rw) in first_accesses:
if first_transition.id in cached_pcs:
first_pc = cached_pcs[first_transition.id]
else:
first_pc = first_transition.context_before().read(x64.rip)
cached_pcs[first_transition.id] = first_pc
if (first_tid, first_transition.id) not in cached_live_locks_handles:
critical_section_locks_unlocks = threads_locks_unlocks[first_tid]
cached_live_locks_handles[(first_tid, first_transition.id)] = get_live_lock_handles(
trace, critical_section_locks_unlocks, first_transition)
for (second_transition, second_rw) in second_accesses:
if second_transition.id in cached_pcs:
second_pc = cached_pcs[second_transition.id]
else:
second_pc = second_transition.context_before().read(x64.rip)
cached_pcs[second_transition.id] = second_pc
# the execution of an instruction is considered atomic
if first_pc == second_pc:
continue
if (second_tid, second_transition.id) not in cached_live_locks_handles:
critical_section_locks_unlocks = threads_locks_unlocks[second_tid]
cached_live_locks_handles[(second_tid, second_transition.id)] = get_live_lock_handles(
trace, critical_section_locks_unlocks, second_transition)
# Skip if both are the same operation (i.e. read/read or write/write)
if first_rw == second_rw:
continue
first_is_write = first_rw == MemoryAccessOperation.Write
second_is_write = second_rw == MemoryAccessOperation.Write
if first_transition < second_transition:
before_mem_opr_str = 'write' if first_is_write else 'read'
after_mem_opr_str = 'write' if second_is_write else 'read'
before_trans = first_transition
after_trans = second_transition
before_tid = first_tid
after_tid = second_tid
else:
before_mem_opr_str = 'write' if second_is_write else 'read'
after_mem_opr_str = 'write' if first_is_write else 'read'
before_trans = second_transition
after_trans = first_transition
before_tid = second_tid
after_tid = first_tid
# Duplicated
if (first_pc, second_pc) in race_pcs:
continue
race_pcs.add((first_pc, second_pc))
# ===== No data race
message = f'No data race during {after_mem_opr_str} at \
[{segment_address.offset:#x}, {segment_size}] (transition #{after_trans.id}) by thread {after_tid}\n\
with a previous {before_mem_opr_str} (transition #{before_trans.id}) by thread {before_tid}'
# There is a common critical section synchronizing the accesses
live_critical_sections: Dict[ThreadId, Set[LockHandle]] = {
first_tid: cached_live_locks_handles[(first_tid, first_transition.id)][1],
second_tid: cached_live_locks_handles[(second_tid, second_transition.id)][1]
}
common_critical_sections = live_critical_sections[first_tid].intersection(
live_critical_sections[second_tid]
)
if common_critical_sections:
if not hide_synchronized_accesses:
handlers_str = handlers_to_string(common_critical_sections)
print(f'{message}: synchronized by critical section(s) ({handlers_str}).\n')
continue
# Accesses are excluded since they are in thread create/shutdown
are_excluded = transition_excluded_by_heuristics(
trace, first_transition, filtered_trans_ids
) and transition_excluded_by_heuristics(trace, second_transition, filtered_trans_ids)
if are_excluded:
if not hide_synchronized_accesses:
print(f'{message}: shared accesses in create/shutdown threads')
continue
# ===== Data race
# If there is a possible data race, show the unsynchronized accesses in the following order:
# 1. at least one of them is protected by one or several locks
# 2. neither of them is protected by any lock
message = f'Possible data race during {after_mem_opr_str} at \
[{segment_address.offset:#x}, {segment_size}] (transition #{after_trans.id}) by thread {after_tid}\n\
with a previous {before_mem_opr_str} (transition #{before_trans.id}) by thread {before_tid}'
if live_critical_sections[first_tid] or live_critical_sections[second_tid]:
print(f'{message}.')
for tid in {first_tid, second_tid}:
if live_critical_sections[tid]:
handlers_str = handlers_to_string(live_critical_sections[tid])
print(f'\tCritical section handle(s) used by {tid}: {handlers_str}\n')
elif not suppress_unknown_primitives:
print(f'{message}: no critical section used.\n')
# %%
trace = RuntimeHelper(host, port)
detect_data_race(trace, pid, binary, begin_trans_id, end_trans_id)
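The two central ideas of the script above (cutting overlapping accesses into disjoint segments, then comparing the sets of critical sections held by each thread at the time of the access) can be tried without a Reven server. The following is a minimal sketch under loud assumptions: the tuples `LockEvent` and `Access` and the functions `live_lock_handles`, `split_segments` and `unsynchronized_pairs` are hypothetical stand-ins invented for this illustration, not part of the reven2 API, and the sketch ignores the read/write distinction, the same-instruction check and the thread create/shutdown heuristics of the real script.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical plain-data stand-ins for the Reven objects used by the script.
LockEvent = Tuple[str, int, int]    # (action, lock handle, time), sorted by time
Access = Tuple[int, int, int, int]  # (thread id, address, size, time)


def live_lock_handles(events: List[LockEvent], at_time: int) -> Set[int]:
    """Replay lock/unlock events up to `at_time`; return the handles still held."""
    held: List[int] = []
    for action, handle, time in events:
        if time > at_time:
            break
        if action == "lock":
            held.append(handle)
            continue
        # Unlock: drop the most recent matching lock, as get_live_locks does.
        for idx in range(len(held) - 1, -1, -1):
            if held[idx] == handle:
                del held[idx]
                break
    return set(held)


def split_segments(accesses: List[Access]) -> List[Tuple[int, int, List[Access]]]:
    """Cut overlapping accesses into disjoint (address, size, accesses) segments."""
    bounds = sorted({b for (_, addr, size, _) in accesses for b in (addr, addr + size)})
    segments = []
    for low, high in zip(bounds, bounds[1:]):
        covering = [a for a in accesses if a[1] <= low and high <= a[1] + a[2]]
        if covering:
            segments.append((low, high - low, covering))
    return segments


def unsynchronized_pairs(accesses: List[Access],
                         locks: Dict[int, List[LockEvent]]) -> List[Tuple[int, Access, Access]]:
    """Return cross-thread access pairs on a common segment that share no held lock."""
    found = []
    for addr, _size, covering in split_segments(accesses):
        for i, first in enumerate(covering):
            for second in covering[i + 1:]:
                if first[0] == second[0]:
                    continue  # same thread
                first_locks = live_lock_handles(locks[first[0]], first[3])
                second_locks = live_lock_handles(locks[second[0]], second[3])
                if not first_locks & second_locks:
                    found.append((addr, first, second))
    return found


# Thread 1 holds critical section 0x10 during [1, 5], thread 2 during [6, 9].
locks = {1: [("lock", 0x10, 1), ("unlock", 0x10, 5)],
         2: [("lock", 0x10, 6), ("unlock", 0x10, 9)]}
accesses = [(1, 0x1000, 8, 3), (2, 0x1004, 4, 7), (2, 0x1000, 4, 10)]
races = unsynchronized_pairs(accesses, locks)
```

In this toy input, the access of thread 2 at time 7 is made while holding the same critical section as thread 1, so only its unprotected access at time 10 is reported. The boundary sweep in `split_segments` is quadratic and was chosen for brevity; the script's `insert_memory_segment_access` reaches the same kind of disjoint segmentation incrementally on sorted input.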
Migration scripts
Scripts in this directory make it easier to migrate from one version of Reven to another.
Migrate bookmarks from 2.5 to 2.6
Purpose
Reven 2.6 fixes an issue, and this fix changes the transition numbers of QEMU scenarios.
This script is here to help you migrate your bookmarks if they are off after replaying your scenario with Reven 2.6.
How to use
Launch after updating the trace resource for your scenario to Reven 2.6+.
usage: migrate_bookmarks_2.5_to_2.6.py [-h] [--host HOST] [-p PORT]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
Known limitations
- This script does not attempt to determine whether the scenario needs to be upgraded or not. Applying the script when bookmarks don't need to be upgraded will actually put them at the wrong position. Apply only if you notice that the bookmarks have been put at the wrong position after updating.
Supported versions
Reven 2.6+
Supported perimeter
Reven scenarios recorded with QEMU.
Dependencies
None.
Source
#!/usr/bin/env python3
import argparse
import sys
import reven2
"""
# Migrate bookmarks from 2.5 to 2.6
## Purpose
We fixed an issue in Reven 2.6 leading to some changes in the transition number for QEMU scenarios.
This script is here to help you migrate your bookmarks if they are off after replaying your scenario with Reven 2.6.
## How to use
Launch after updating the trace resource for your scenario to Reven 2.6+.
```bash
usage: migrate_bookmarks_2.5_to_2.6.py [-h] [--host HOST] [-p PORT]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
```
## Known limitations
- This script does not attempt to determine whether the scenario needs to be upgraded or not.
Applying the script when bookmarks don't need to be upgraded will actually put them at the wrong position.
Apply only if you notice that the bookmarks have been put at the wrong position after updating.
## Supported versions
Reven 2.6+
## Supported perimeter
Reven scenarios recorded with QEMU.
## Dependencies
None.
"""
def migrate_bookmarks(reven_server):
offset_table = {}
def get_offset(transition_id):
lower_bound = None
for key in offset_table.keys():
if key <= transition_id and (lower_bound is None or lower_bound < key):
lower_bound = key
return offset_table[lower_bound] if lower_bound is not None else 0
print("Generating offset table...")
c = reven_server.trace.context_before(0)
counter = 0
while c is not None:
c = c.find_register_change(reven2.arch.x64.cr2)
if c is None:
continue
t_exception = c.transition_before()
t_before_exception = t_exception - 1
ctx_before = t_exception.context_before()
ctx_after = t_exception.context_after()
# We are looking for code pagefault, so CR2 will contains PC
cr2 = ctx_after.read(reven2.arch.x64.cr2)
if ctx_before.is64b():
if cr2 != ctx_before.read(reven2.arch.x64.rip):
continue
else:
if cr2 != ctx_before.read(reven2.arch.x64.eip):
continue
try:
# The issues occurred when the previous instruction was just doing some reads and not any write
# Warning: This could not work if the trace is desync with the memory
next(t_before_exception.memory_accesses(operation=reven2.memhist.MemoryAccessOperation.Write))
except StopIteration:
counter += 1
offset_table[t_exception.id - counter] = counter
print("Migrating bookmarks...")
for bookmark in list(reven_server.bookmarks.all()):
offset = get_offset(bookmark.transition.id)
print(
" id: %d | %60.60s | Transition #%d => #%d (+%d)"
% (bookmark.id, bookmark.description, bookmark.transition.id, bookmark.transition.id + offset, offset)
)
if offset == 0:
continue
reven_server.bookmarks.add(bookmark.transition + offset, bookmark.description)
reven_server.bookmarks.remove(bookmark)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--host", type=str, default="localhost", help='Reven host, as a string (default: "localhost")')
parser.add_argument("-p", "--port", type=int, default=13370, help="Reven port, as an int (default: 13370)")
args = parser.parse_args()
answer = ""
while answer not in ["y", "n"]:
answer = input(
"This script should be launched on a QEMU scenario with the trace generated with Reven 2.6 with bookmarks"
" that were added in Reven 2.5 or older. Do you want to continue [Y/N]? "
).lower()
if answer == "n":
print("Aborting")
sys.exit(0)
reven_server = reven2.RevenServer(args.host, args.port)
migrate_bookmarks(reven_server)
print("Bookmarks migrated!")
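The offset lookup at the heart of the migration script can be tried in isolation. The following is a minimal sketch, not the script's own code: it reimplements `get_offset` with `bisect`, takes the table as an explicit parameter, and uses made-up table contents chosen only for illustration.

```python
from bisect import bisect_right


def get_offset(offset_table, transition_id):
    """Return the offset stored at the greatest key <= transition_id, or 0 if there is none."""
    keys = sorted(offset_table)
    index = bisect_right(keys, transition_id)
    return offset_table[keys[index - 1]] if index > 0 else 0


# Hypothetical table: each key is a transition id from which the given
# offset applies, each value is the number of transitions to add.
offset_table = {100: 1, 250: 2, 900: 3}

for transition_id in (50, 100, 899, 5000):
    offset = get_offset(offset_table, transition_id)
    print("Transition #%d => #%d (+%d)" % (transition_id, transition_id + offset, offset))
```

A bookmark located before the first entry of the table keeps its position, which matches the `offset == 0` case that the script skips.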
Import classic bookmarks
Purpose
Import classic bookmarks (created with Reven 2.4 or older) from ".rbm" files into the "server-side" bookmarks system (Reven 2.5+).
How to use
usage: import_bookmarks.py [-h] [--host HOST] [-p PORT] [-f FILENAME]
[--prepend-symbol]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
-f FILENAME, --filename FILENAME
Path to the classic bookmark file (*.rbm).
--prepend-symbol If set, prepend the OSSI symbol as stored in the
classic symbol file to the description of the bookmark
Known limitations
N/A
Supported versions
Reven 2.5+
Supported perimeter
Any Reven scenario for which a .rbm is available.
Dependencies
None.
Source
import argparse
import json
import reven2
"""
# Import classic bookmarks
## Purpose
Import classic bookmarks (created with Reven 2.4 or older) from ".rbm" files into the "server-side" bookmarks system
(Reven 2.5+).
## How to use
```bash
usage: import_bookmarks.py [-h] [--host HOST] [-p PORT] [-f FILENAME]
[--prepend-symbol]
optional arguments:
-h, --help show this help message and exit
--host HOST Reven host, as a string (default: "localhost")
-p PORT, --port PORT Reven port, as an int (default: 13370)
-f FILENAME, --filename FILENAME
Path to the classic bookmark file (*.rbm).
--prepend-symbol If set, prepend the OSSI symbol as stored in the
classic symbol file to the description of the bookmark
```
## Known limitations
N/A
## Supported versions
Reven 2.5+
## Supported perimeter
Any Reven scenario for which a .rbm is available.
## Dependencies
None.
"""
def import_bookmarks(reven_server, rbm_path, prepend_symbol=False):
r"""
This function is a helper to import classic bookmarks from ".rbm" files to the new "server-side" bookmarks system.
Examples
========
>>> # Import bookmarks
>>> f = "Reven2/2.5.0-rc2-1-ga1b971b/Scenarios/bksod_ff34e5e1-dfaa-41fe-88b0-fdad14993fe3/UserData/bookmarks.rbm"
>>> import_bookmarks(reven_server, f)
>>> for bookmark in reven_server.bookmarks.all():
... print(bookmark)
#169672818: 'mst120 deallocated by network'
#8655429: 'mst120 allocated by system'
#8627412: 'IcaRawInput looks nice to see decrypted data'
#1231549571: 'ica find channel on this pointer???'
#1141851788: 'Same pointer reallocated to something else'
#1231549773: 'crash'
>>> # Import bookmarks, prepending the known symbol before the description
>>> import_bookmarks(reven_server, f, prepend_symbol=True)
>>> for bookmark in reven_server.bookmarks.all():
... print(bookmark)
#169672818: 'ExFreePoolWithTag+0x0 - ntoskrnl.exe: mst120 deallocated by network'
#8655429: 'ExAllocatePoolWithTag+0x1df - ntoskrnl.exe: mst120 allocated by system'
#8627412: 'IcaRawInput+0x0 - termdd.sys: IcaRawInput looks nice to see decrypted data'
#1231549571: 'IcaFindChannel+0x3d - termdd.sys: ica find channel on this pointer???'
#1141851788: 'ExAllocatePoolWithTag+0x1df - ntoskrnl.exe: Same pointer reallocated to something else'
#1231549773: 'ExpCheckForIoPriorityBoost+0xa7 - ntoskrnl.exe: crash'
Information
===========
@param reven_server: The C{reven2.RevenServer} instance on which you wish to import the bookmarks.
@param rbm_path: Path to the classic bookmark file.
@param prepend_symbol: If C{True}, prepend the OSSI symbol as stored in the classic symbol file to the description
of the bookmark.
"""
with open(rbm_path) as f:
json_bookmarks = json.load(f)
for json_bookmark in json_bookmarks.values():
try:
transition = reven_server.trace.transition(int(json_bookmark["identifier"]))
description_prefix = (json_bookmark["symbol"] + ": ") if prepend_symbol else ""
description = description_prefix + json_bookmark["description"]
reven_server.bookmarks.add(transition, str(description))
except IndexError:
print(
"Skipping import of bookmark at transition {} which is out of range".format(
json_bookmark["identifier"]
)
)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--host", type=str, default="localhost", help='Reven host, as a string (default: "localhost")')
parser.add_argument("-p", "--port", type=int, default=13370, help="Reven port, as an int (default: 13370)")
parser.add_argument("-f", "--filename", type=str, help="Path to the classic bookmark file (*.rbm).")
parser.add_argument(
"--prepend-symbol",
action="store_true",
help="If set, prepend the OSSI symbol as stored in the classic symbol file to the "
"description of the bookmark",
)
args = parser.parse_args()
reven_server = reven2.RevenServer(args.host, args.port)
import_bookmarks(reven_server, args.filename, args.prepend_symbol)
print("Bookmarks imported!")
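As a rough illustration of how a description is built from a classic bookmark entry, the sketch below replays the JSON handling of `import_bookmarks` without a Reven server. The function name `build_descriptions`, the top-level keys and the sample entries are hypothetical. Only the `identifier`, `symbol` and `description` keys come from the script above.

```python
import json


def build_descriptions(rbm_text, prepend_symbol=False):
    """Return (transition id, description) pairs from the text of a classic bookmark file."""
    bookmarks = json.loads(rbm_text)
    results = []
    for entry in bookmarks.values():
        prefix = (entry["symbol"] + ": ") if prepend_symbol else ""
        results.append((int(entry["identifier"]), prefix + entry["description"]))
    return results


# Hypothetical file content: a JSON object whose values are bookmark entries.
SAMPLE_RBM = json.dumps(
    {
        "0": {"identifier": "8627412", "symbol": "IcaRawInput+0x0 - termdd.sys", "description": "decrypted data"},
        "1": {"identifier": 1231549773, "symbol": "ExFreePoolWithTag+0x0 - ntoskrnl.exe", "description": "crash"},
    }
)

for transition_id, description in build_descriptions(SAMPLE_RBM, prepend_symbol=True):
    print("#%d: %r" % (transition_id, description))
```

The `int(...)` conversion mirrors the script and accepts identifiers stored either as strings or as numbers.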
Find symbols that access a specific memory range
Purpose
This notebook and script search a Reven trace for all symbols that accessed a specific memory range. The results can be filtered by processes, ring, included binaries, excluded binaries, excluded symbols, context range and memory access operation.
The script can generate two kinds of results:
- process, binary and symbol information for each memory access.
- for each symbol, all the memory accesses that occurred in that symbol. Note that this option can take a long time to start showing results, especially when there are many nested functions or many functions that don't end in the trace.
Note that:
- accesses will be reported as belonging to the innermost symbol that has not been excluded and whose binary has not been excluded in the configuration.
- we consider that we are "in a symbol" when the corresponding context.location.symbol returns this symbol. Reven returns the closest symbol with an `rva` lower than ours. Note that we are not trying to determine the exact bounds of the function named by that symbol. In particular, when there are missing symbols, this may report a symbol we saw a long time ago rather than <unknown>.
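The attribution rule for excluded symbols and binaries can be sketched without a Reven server. This is a simplified model of the frame walk performed in the script's `query` method, assuming that call stack frames are listed innermost first and reduced to hypothetical `(binary, symbol)` name pairs. `attribute_access` is an illustrative name, not part of the reven2 API.

```python
def attribute_access(frames_innermost_first, excluded_binaries, excluded_symbols):
    """Return the (binary, symbol) pair an access is attributed to, or None if it is skipped."""
    attributed = None
    # Walk from the outermost caller down to the innermost frame.
    for binary, symbol in reversed(frames_innermost_first):
        if binary in excluded_binaries or (symbol is not None and symbol in excluded_symbols):
            # Stop at the first excluded frame: everything below it is
            # reported under the last frame that was accepted.
            break
        if symbol is not None:
            attributed = (binary, symbol)
    return attributed


# Hypothetical call stack at the time of a memory access, innermost frame first.
frames = [
    ("ntoskrnl.exe", "memcpy"),
    ("cng.sys", "AesCbcDecrypt"),
    ("app.exe", "main"),
]

print(attribute_access(frames, set(), set()))
print(attribute_access(frames, set(), {"memcpy"}))
print(attribute_access(frames, {"app.exe"}, set()))
```

With no exclusion, the access belongs to the innermost frame. Excluding `memcpy` moves it to the caller. In this model, excluding the binary of the outermost frame leaves nothing to attribute the access to, so it is skipped.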
How to use
Results can be generated from this notebook or from the command line. The script can also be imported as a module for use from your own script or notebook.
From the notebook
- Upload the `symbols_access_memory_range.ipynb` file in Jupyter.
- Fill out the parameters cell of this notebook according to your scenario and desired output.
- Run the full notebook.
From the command line
- Make sure that you are in an environment that can run Reven scripts.
- Run `python symbols_access_memory_range.py --help` to get a tour of available arguments.
- Run `python symbols_access_memory_range.py --host <your_host> --port <your_port> [<other_option>]` with your arguments of choice.
Imported in your own script or notebook
- Make sure that you are in an environment that can run Reven scripts.
- Make sure that `symbols_access_memory_range.py` is in the same directory as your script or notebook.
- Add `import symbols_access_memory_range` to your script or notebook. You can access the various functions and classes exposed by the module from the `symbols_access_memory_range` namespace.
- Refer to the Argument parsing cell for an example of use in a script, and to the Parameters cell and below for an example of use in a notebook (you just need to prepend `symbols_access_memory_range` in front of the functions and classes from the script).
Known limitations
N/A.
Supported versions
Reven 2.10+
Supported perimeter
Any Reven scenario.
Dependencies
The script requires that the target Reven scenario have:
- The OSSI feature replayed.
- The memory history feature replayed.
The script also requires the pandas Python module.
Source
# ---
# jupyter:
# jupytext:
# formats: ipynb,py:percent
# text_representation:
# extension: .py
# format_name: percent
# kernelspec:
# display_name: reven
# language: python
# name: reven-python3
# ---
# %% [markdown]
# # Find symbols that access a specific memory range
#
# ## Purpose
#
# This notebook and script are designed to find all symbols that access a specific memory range.
#
# This script searches a Reven trace for all symbols that accessed a specific memory range.
# The script can filter the results by processes, ring, included binaries, excluded binaries, excluded
# symbols, context range and memory access operation.
#
# The script can generate two kinds of results:
# - process, binary and symbol information for each memory access.
# - for each symbol, all the memory accesses that occurred in that symbol.
# Note that this option can take a long time to start showing results,
# especially when there are many nested functions or many functions that don't end in the trace.
#
# Note that:
# - accesses will be reported as belonging to the innermost symbol that has not been excluded
# and whose binary has not been excluded in the configuration.
# - we consider that we are "in a symbol" when the corresponding context.location.symbol returns this symbol.
# Reven returns the closest symbol with an `rva` lower than ours. Note that we are not trying to determine the
# exact bounds of the function with that symbol for name. In particular, when there are missing symbols,
# this may report a symbol we saw a long time ago rather than <unknown>
#
#
#
# ## How to use
#
# Results can be generated from this notebook or from the command line.
# The script can also be imported as a module for use from your own script or notebook.
#
#
# ### From the notebook
#
# 1. Upload the `symbols_access_memory_range.ipynb` file in Jupyter.
# 2. Fill out the [parameters](#Parameters) cell of this notebook according to your scenario and desired output.
# 3. Run the full notebook.
#
#
# ### From the command line
#
# 1. Make sure that you are in an environment that can run Reven scripts.
# 2. Run `python symbols_access_memory_range.py --help` to get a tour of available arguments.
# 3. Run `python symbols_access_memory_range.py --host <your_host> --port <your_port> [<other_option>]` with your
# arguments of choice.
#
# ### Imported in your own script or notebook
#
# 1. Make sure that you are in an environment that can run Reven scripts.
# 2. Make sure that `symbols_access_memory_range.py` is in the same directory as your script or notebook.
# 3. Add `import symbols_access_memory_range` to your script or notebook. You can access the various functions and
# classes exposed by the module from the `symbols_access_memory_range` namespace.
# 4. Refer to the [Argument parsing](#Argument-parsing) cell for an example of use in a script, and to the
# [Parameters](#Parameters) cell and below for an example of use in a notebook (you just need to prepend
# `symbols_access_memory_range` in front of the functions and classes from the script).
#
# ## Known limitations
#
# N/A.
#
# ## Supported versions
#
# Reven 2.10+
#
# ## Supported perimeter
#
# Any Reven scenario.
#
# ## Dependencies
#
# The script requires that the target Reven scenario have:
#
# * The OSSI feature replayed.
# * The memory history feature replayed.
#
# The script also requires the `pandas` Python module.
# %% [markdown]
# ### Package imports
# %%
import argparse
from enum import Enum
from typing import Iterable as _Iterable, List
from typing import Optional as _Optional
from typing import cast as _cast
from IPython.core.display import display # type: ignore
import pandas
import reven2.address as _address
import reven2.arch as _arch
from reven2.filter import RingPolicy
from reven2.memhist import MemoryAccess, MemoryAccessOperation
from reven2.memory_range import MemoryRange
from reven2.ossi import Binary, Process, Symbol
from reven2.ossi.thread import Thread
from reven2.prelude import RevenServer
from reven2.stack import Stack
from reven2.trace import Context, Trace
from reven2.util import collate as _collate
# %% [markdown]
# ### Utility functions
# %%
# Detect if we are currently running a Jupyter notebook.
#
# This is used e.g. to display rendered results inline in Jupyter when we are executing in the context of a Jupyter
# notebook, or to display raw results on the standard output when we are executing in the context of a script.
def in_notebook():
try:
from IPython import get_ipython # type: ignore
if get_ipython() is None or ("IPKernelApp" not in get_ipython().config):
return False
except ImportError:
return False
return True
# %% [markdown]
# ### Helper classes for results
# %%
class CallSymbol:
r"""
CallSymbol is a helper class used to represent a symbol with its start and end context
"""
def __init__(self, symbol: _Optional[Symbol], start: Context, end: _Optional[Context] = None) -> None:
self._symbol = symbol
self._start = start
self._end = end
@property
def symbol(self) -> _Optional[Symbol]:
r"""
B{Property:} The symbol of the call symbol. None if the symbol is unknown.
"""
return self._symbol
@property
def start_context(self) -> Context:
r"""
B{Property:} The start context of the call symbol.
"""
return self._start
@property
def end_context(self) -> _Optional[Context]:
r"""
B{Property:} The end excluded context of the call symbol. None if the end context isn't in the trace.
"""
return self._end
def __eq__(self, other: "CallSymbol") -> bool: # type: ignore
return self._symbol == other._symbol and self._start == other._start and self._end == other._end
def __ne__(self, other: "CallSymbol") -> bool: # type: ignore
return not (self == other)
class MemoryRangeSymbolResult:
r"""
MemoryRangeSymbolResult is a helper class that represents one result of the search.
"""
def __init__(
self,
call_symbol: CallSymbol,
memory_access: _Optional[MemoryAccess],
ring: int,
process: _Optional[Process],
thread: _Optional[Thread],
binary: _Optional[Binary],
) -> None:
self._call_symbol = call_symbol
self._memory_accesses = [] if memory_access is None else [memory_access]
self._ring = ring
self._process = process
self._thread = thread
self._binary = binary
def copy(self) -> "MemoryRangeSymbolResult":
r"""
Return a copy of this object.
All attributes are shallow-copied, except the memory accesses, which are copied into a new list.
"""
new_obj = MemoryRangeSymbolResult(
call_symbol=self._call_symbol,
memory_access=None,
ring=self._ring,
process=self._process,
thread=self._thread,
binary=self._binary,
)
if self._memory_accesses is not None:
new_obj._memory_accesses += self._memory_accesses
return new_obj
@property
def call_symbol(self) -> CallSymbol:
r"""
B{Property:} The call symbol of the result.
"""
return self._call_symbol
@property
def memory_accesses(self) -> List[MemoryAccess]:
r"""
B{Property:} The memory accesses of the result.
"""
return self._memory_accesses
@property
def ring(self) -> int:
r"""
B{Property:} The ring of the result.
"""
return self._ring
@property
def process(self) -> _Optional[Process]:
r"""
B{Property:} The process of the result.
"""
return self._process
@property
def binary(self) -> _Optional[Binary]:
r"""
B{Property:} The binary of the result. None if the binary is unknown.
"""
return self._binary
@property
def thread(self) -> _Optional[Thread]:
r"""
B{Property:} The thread of the result.
"""
return self._thread
def __eq__(self, other: "MemoryRangeSymbolResult") -> bool: # type: ignore
return (
self._ring == other._ring
and self._process is not None
and other._process is not None
and self._process.name == other._process.name
and self._process.pid == other._process.pid
and self._process.ppid == other._process.ppid
and self._thread is not None
and other._thread is not None
and self._thread.id == other._thread.id
and self._thread.owner_process_id == other._thread.owner_process_id
and (
(self._binary is None and other._binary is None)
or (self._binary is not None and other._binary is not None and self._binary.path == other._binary.path)
)
and self._call_symbol == other._call_symbol
)
def __ne__(self, other: "MemoryRangeSymbolResult") -> bool: # type: ignore
return not (self == other)
def __str__(self) -> str:
memory_accesses = "\nmemory accesses:"
for m in self._memory_accesses:
memory_accesses += f"\n\t{m}, "
memory_accesses += "\n"
return (
f"ring: {self._ring}, process: {self._process}, "
f"thread: {self._thread}, binary: {self._binary}, "
f"symbol: {self._call_symbol.symbol}[{self._call_symbol.start_context}, "
f"{self._call_symbol.end_context}[ {memory_accesses}"
)
def format_as_html(self):
r"""
This method returns an HTML-formatted string representation of this class instance.
Information
===========
@returns: C{String}
"""
memory_accesses = "<p>memory accesses:</p><ol>"
for m in self._memory_accesses:
memory_accesses += f"<li>{m.format_as_html()}</li>"
memory_accesses += "</ol>"
return (
f"ring: {self._ring}, process: {self._process if self._process is not None else 'unknown'}, "
f" thread: {self._thread if self._thread is not None else 'unknown'}, binary: {self._binary}, "
f"symbol: {self._call_symbol.symbol}[{self._call_symbol.start_context}, "
f"{self._call_symbol.end_context}[ {memory_accesses}"
)
def _repr_html_(self):
r"""
Representation used by Jupyter Notebook when an instance of this class is displayed in a cell.
"""
return "<p>{}</p>".format(self.format_as_html())
# %% [markdown]
# ### MemoryRangeSymbolFinder
#
# This class represents the main logic of this script
# %%
class MemoryRangeSymbolFinder(object):
r"""
This class is a helper class to search for all symbols that access a specific memory range.
Results can be filtered by processes, ring, binaries, excluded binaries, excluded symbols
and a context range.
The symbols that access this memory range are returned.
Examples
========
>>> # Search all symbols that access the memory range [ds:0xfffff8800115e180 ; 128]
>>> # filtered by the process `svchost.exe` at the context #410545055.
>>> processes = server.ossi.executed_processes('svchost.exe')
>>> memory_range = MemoryRange.from_string("[ds:0xfffff8800115e180 ; 128]")
>>> context = server.trace.context_before(410545055)
>>> symbol_mem_finder = MemoryRangeSymbolFinder(
... trace=server.trace, memory_range=memory_range,
... translation_context=context, processes=processes)
>>> for r in symbol_mem_finder.query():
... print(r)
ring: 0, process: svchost.exe (1004), thread: 2256, binary: c:/windows/system32/drivers/cng.sys,
symbol: cng!AesCbcDecrypt[Context before #25208343, Context before #25212457[
memory accesses:
[#25208488 xor r8d, dword ptr ds:[r11+rax*4+0x800]]Read access at
@phy:0x36411e0 (virtual address: lin:0xfffff8800115e1e0) of size 4,
...
"""
def __init__(
self,
trace: Trace,
memory_range: MemoryRange,
translation_context: _Optional[Context] = None,
from_context: _Optional[Context] = None,
to_context: _Optional[Context] = None,
ring_policy: RingPolicy = RingPolicy.All,
processes: _Optional[_Iterable[Process]] = None,
included_binaries: _Optional[_Iterable[Binary]] = None,
excluded_binaries: _Optional[_Iterable[Binary]] = None,
excluded_symbols: _Optional[_Iterable[Symbol]] = None,
operation: _Optional[MemoryAccessOperation] = None,
) -> None:
r"""
Initialize a C{MemoryRangeSymbolFinder}
Information
===========
@param trace: the trace where symbols will be looked for.
@param memory_range: the memory range that is accessed by the returned symbols.
@param translation_context: context used to translate the memory range when it is virtual.
@param from_context: the context where the search will be started.
@param to_context: the context where the search will be ended.
@param ring_policy: ring policy to search for.
@param processes: processes to limit the search to. If None, the search is not limited to specific processes.
@param included_binaries: binaries that must be included in the search.
If None, all binaries will be included.
When a binary is not included, all its symbols and their memory accesses are ignored.
@param excluded_binaries: binaries that must be excluded from the search. If None nothing will be excluded.
Accesses performed in this binary are reported, but using the first caller
binary that is not excluded. Note that inclusion is applied before exclusion.
@param excluded_symbols: symbols that must be excluded from the search. If None nothing will be excluded.
Accesses performed in this symbol are reported, but using the first caller
symbol that is not excluded.
@param operation: limit results to accesses performing the specified operation.
@raises TypeError: if trace is not a C{reven2.trace.Trace}.
@raises ValueError: If provided memory range is virtual and the translation_context is None.
"""
if not isinstance(trace, Trace):
raise TypeError("You must provide a valid trace")
self._trace = trace
if isinstance(memory_range.address, _address.PhysicalAddress):
self._physical_memory_ranges = [_cast(MemoryRange[_address.PhysicalAddress], memory_range)]
elif translation_context is None:
raise ValueError("You must provide a context for the translation if the memory range is virtual")
else:
self._physical_memory_ranges = [mem_range for mem_range in memory_range.translate(translation_context)]
self._from_context = from_context
self._to_context = to_context
self._ring_policy = ring_policy
self._processes = None if processes is None else [process for process in processes]
self._included_binaries = None if included_binaries is None else {binary.name for binary in included_binaries}
self._excluded_binaries = set() if excluded_binaries is None else {binary.name for binary in excluded_binaries}
self._excluded_symbols = set() if excluded_symbols is None else {symbol.name for symbol in excluded_symbols}
self._operation = operation
def filter_by_processes(self, processes: _Iterable[Process]) -> "MemoryRangeSymbolFinder":
r"""
Extend the list of processes to limit the search in, and return the self object.
Information
===========
@param processes: processes to limit the search in.
@returns : self object
"""
if self._processes is None:
self._processes = []
self._processes += [process for process in processes]
return self
def filter_by_ring(self, ring_policy: RingPolicy) -> "MemoryRangeSymbolFinder":
r"""
Update the ring policy to search for and return the `self` object.
Information
===========
@param ring_policy: ring policy to search for.
@returns : self object
"""
self._ring_policy = ring_policy
return self
def from_context(self, context: Context) -> "MemoryRangeSymbolFinder":
r"""
Update the context where the search will be started and return the `self` object.
Information
===========
@param context: context where the search will be started.
@returns : self object
"""
self._from_context = context
return self
def to_context(self, context: Context) -> "MemoryRangeSymbolFinder":
r"""
Update the context where the search will be ended and return the `self` object.
Information
===========
@param context: context where the search will be ended.
@returns : self object
"""
self._to_context = context
return self
def include_bnaries(self, binaries: _Iterable[Binary]) -> "MemoryRangeSymbolFinder":
r"""
Extend the list of binaries that must be included in the search and return the `self` object.
Information
===========
@param binaries: binaries that must be included in the search.
@returns : self object
"""
if self._included_binaries is None:
self._included_binaries = {binary.name for binary in binaries}
else:
self._included_binaries.update([binary.name for binary in binaries])
return self
def exclude_bnaries(self, binaries: _Iterable[Binary]) -> "MemoryRangeSymbolFinder":
r"""
Extend the list of binaries that must be excluded from the search and return the `self` object.
Information
===========
@param binaries: binaries that must be excluded from the search.
@returns : self object
"""
self._excluded_binaries.update([binary.name for binary in binaries])
return self
def exclude_symbols(self, symbols: _Iterable[Symbol]) -> "MemoryRangeSymbolFinder":
r"""
Extend the list of symbols that must be excluded from the search and return the `self` object.
Information
===========
@param symbols: symbols that must be excluded from the search.
@returns : self object
"""
self._excluded_symbols.update([symbol.name for symbol in symbols])
return self
def filter_by_memory_access_operation(
self, operation: _Optional[MemoryAccessOperation] = None
) -> "MemoryRangeSymbolFinder":
r"""
Update the memory access operation to limit results to accesses performing this
operation and return the `self` object.
Information
===========
@param operation: limit results to accesses performing the specified operation.
@returns : self object
"""
self._operation = operation
return self
def _is_the_same_stack(self, stack1: Stack, stack2: Stack) -> bool:
# we assume that two stacks are the same if the first contexts of their first frames are the same
frame1 = next(stack1.frames())
frame2 = next(stack2.frames())
return frame1.first_context == frame2.first_context
def query(self) -> _Iterable[MemoryRangeSymbolResult]:
r"""
Iterate over all filtered contexts and yield symbols.
Note: the same symbol can be yielded several times with different memory accesses.
"""
# Make a copy of the variables that can modify the generated results
operation = self._operation
included_binaries = None if self._included_binaries is None else self._included_binaries.copy()
excluded_binaries = self._excluded_binaries.copy()
excluded_symbols = self._excluded_symbols.copy()
# store last handled stack to use it if we are in the same stack
last_stack: _Optional[Stack] = None
# store last result to use it if we are in the same stack
last_result: _Optional[MemoryRangeSymbolResult] = None
# Iterate over all context range filtered by ring, processes, from_context and to_context
for context_range in self._trace.filter(
processes=self._processes,
ring_policy=self._ring_policy,
from_context=self._from_context,
to_context=self._to_context,
):
from_transition = (
context_range.begin.transition_before()
if context_range.begin == self._trace.last_context
else context_range.begin.transition_after()
)
to_transition = (
context_range.last.transition_before()
if context_range.last == self._trace.last_context
else context_range.last.transition_after()
)
# iterate over physical memory range
iterators = [
self._trace.memory_accesses(
address=memory_range.address,
size=memory_range.size,
from_transition=from_transition,
to_transition=to_transition,
)
for memory_range in self._physical_memory_ranges
]
# iterate over all memory accesses in this range
for memory_access in _collate(iterators, key=lambda x: x.transition.id):
# apply filter by operation here instead of in the query, because currently
# operation-constrained queries are not optimized in the backend
if operation is not None and operation != memory_access.operation:
continue
# get the stack at this transition
current_context: Context = memory_access.transition.context_before()
stack = current_context.stack
if last_result is not None and last_stack is not None and self._is_the_same_stack(last_stack, stack):
# update the memory access of the last result and yield it
last_result._memory_accesses = [memory_access]
yield last_result
continue
last_stack = stack
# exclude symbols and binary
handled_binary = None
handled_symbol = None
handled_symbol_found = False
frames = [frame for frame in stack.frames()]
frames.reverse()
for frame in frames:
loc = frame.first_context.ossi.location()
if loc is not None and (
loc.binary.name in excluded_binaries
or (loc.symbol is not None and loc.symbol.name in excluded_symbols)
or ("unknown" in excluded_symbols)
):
break
if loc is not None:
if loc.binary is not None:
handled_binary = loc.binary
if loc.symbol is not None:
handled_symbol = loc.symbol
handled_symbol_found = True
first_context = frame.first_context
handled_process = frame.first_context.ossi.process()
handled_thread = frame.first_context.ossi.thread()
# ignore symbol if it is in excluded symbols or if its binary in the excluded binaries
if not handled_symbol_found:
continue
# ignore symbol if its binary isn't in the included binaries
if (
included_binaries is not None
and handled_binary is not None
and handled_binary.name not in included_binaries
):
continue
# get the end of symbol
end_transition = (
first_context.transition_after().step_out()
if first_context != self._trace.last_context
else first_context.transition_before().step_out()
)
end_context = None if end_transition is None else end_transition.context_before()
# get the ring of the symbol
handled_ring = first_context.read(_arch.x64.cs) & 0x3
last_result = MemoryRangeSymbolResult(
call_symbol=CallSymbol(handled_symbol, first_context, end_context),
memory_access=memory_access,
ring=handled_ring,
process=handled_process,
thread=handled_thread,
binary=handled_binary,
)
yield last_result
def group_by_symbol_query(self) -> _Iterable[MemoryRangeSymbolResult]:
r"""
Iterate over all filtered contexts and yield symbols.
Note: each symbol will be yielded only once, with a group of all its memory accesses.
"""
# Add symbols to a stack and pop it when it is finished
result_stack = [] # type: List[MemoryRangeSymbolResult]
for result in self.query():
if len(result_stack) > 0:
# firstly, we verify if we can pop the last item from the stack
# Item will be yielded if its end context isn't None and the current result of
# the query has a memory access such that the before context of its transition
# >= the context of the last symbol in the stack
if (
result.call_symbol.end_context is not None
# len(result.memory_accesses) > 0 because the results of `query`
# contain exactly one memory_access by construction.
and result.memory_accesses[0].transition.context_before() >= result.call_symbol.end_context
):
res = result_stack.pop(-1)
yield res
# Here we observe symbols change,
# if the symbol is changed (result_stack[-1] != result) we add the new symbol to the stack.
# (len(result_stack) == 0 is only to handle the case of the first result)
if len(result_stack) == 0 or result_stack[-1] != result:
# store a deep copy of the result
result_stack.append(result.copy())
continue
# the symbol didn't change, so we add the memory access of the current result
# to the last item in the stack
result_stack[-1]._memory_accesses += result.memory_accesses
# yield all symbols with None end context
for result in result_stack:
yield result
# %% [markdown]
#
# ### OutputFormat
# %%
class OutputFormat(Enum):
r"""
Enum describing the various possible output formats of the results
- RAW: The results will be output using its string representation.
- TABLE: The results will be output using pandas table format.
- CSV: The results will be output as csv.
- HTML: The results will be output as html table.
"""
RAW = 0
TABLE = 1
CSV = 2
HTML = 3
# %% [markdown]
# ### Main function
#
# This function is called with parameters from the [Parameters](#Parameters) cell in the notebook context,
# or with parameters from the command line in the script context.
# %%
def symbols_access_memory_range(
server: RevenServer,
memory_range: MemoryRange,
context: _Optional[int],
from_context: _Optional[int] = None,
to_context: _Optional[int] = None,
ring_policy: RingPolicy = RingPolicy.All,
processes: _Optional[_Iterable[str]] = None,
included_binaries: _Optional[_Iterable[str]] = None,
excluded_binaries: _Optional[_Iterable[str]] = None,
excluded_symbols: _Optional[_Iterable[str]] = None,
operation: _Optional[MemoryAccessOperation] = None,
grouped_by_symbol: bool = False,
output_format: OutputFormat = OutputFormat.RAW,
output_file: _Optional[str] = None,
) -> None:
# declare symbol finder.
memory_range_symbols_finder = MemoryRangeSymbolFinder(
trace=server.trace,
memory_range=memory_range,
translation_context=(None if context is None else server.trace.context_before(context)),
from_context=(None if from_context is None else server.trace.context_before(from_context)),
to_context=(None if to_context is None else server.trace.context_before(to_context)),
ring_policy=ring_policy,
operation=operation,
)
# filter by processes
if processes is not None:
for process in processes:
memory_range_symbols_finder.filter_by_processes(server.ossi.executed_processes(process))
# include binaries
if included_binaries is not None:
for binary in included_binaries:
memory_range_symbols_finder.include_bnaries(server.ossi.executed_binaries(binary))
# exclude binaries
if excluded_binaries is not None:
for binary in excluded_binaries:
memory_range_symbols_finder.exclude_bnaries(server.ossi.executed_binaries(binary))
# exclude symbols
if excluded_symbols is not None:
for symbol in excluded_symbols:
memory_range_symbols_finder.exclude_symbols(server.ossi.symbols(symbol))
query = (
memory_range_symbols_finder.group_by_symbol_query()
if grouped_by_symbol
else memory_range_symbols_finder.query()
)
if output_format == OutputFormat.RAW:
print_func = display if in_notebook() else print
if output_file is not None:
file = open(output_file, "w")
def fprint_func(s: MemoryRangeSymbolResult) -> None:
file.write(str(s))
file.write("\n")
print_func = fprint_func
for result in query:
print_func(result)
if output_file is not None:
file.close()
else:
results = { # type: ignore
"Ring": [],
"Process": [],
"Thread": [],
"Binary": [],
"Symbol": [],
"Start context": [],
"Access transition": [],
"Access operation": [],
"Access physical": [],
"Access linear": [],
"Access size": [],
}
for result in query:
for mem_access in result.memory_accesses:
results["Ring"].append(result.ring)
results["Process"].append(str(result.process) if result.process is not None else "unknown")
results["Thread"].append(str(result.thread) if result.thread is not None else "unknown")
results["Binary"].append(result.binary.name if result.binary is not None else "unknown")
results["Symbol"].append(
result.call_symbol.symbol.name if result.call_symbol.symbol is not None else "unknown"
)
results["Start context"].append(str(result.call_symbol.start_context))
results["Access transition"].append(mem_access.transition.id)
results["Access operation"].append(mem_access.operation.name)
results["Access physical"].append(mem_access.physical_address)
results["Access linear"].append(mem_access.virtual_address)
results["Access size"].append(mem_access.size)
# The type stubs installed for the pandas module are a work in progress:
# they don't know the `from_dict` method of the `DataFrame` class,
# so we ignore the type here.
df = pandas.DataFrame.from_dict(results) # type: ignore
if output_format == OutputFormat.TABLE:
if output_file is not None:
with open(output_file, "w") as file:
file.write(str(df))
else:
print(df)
elif output_format == OutputFormat.CSV:
print(df.to_csv()) if output_file is None else df.to_csv(output_file)
elif output_format == OutputFormat.HTML:
print(df.to_html()) if output_file is None else df.to_html(output_file)
# %% [markdown]
# ### Argument parsing
#
# Argument parsing function for use in the script context.
# %%
def get_memory_access_operation(operation: _Optional[str]) -> _Optional[MemoryAccessOperation]:
if operation is None:
return None
if operation.lower() == "read":
return MemoryAccessOperation.Read
if operation.lower() == "write":
return MemoryAccessOperation.Write
raise ValueError(f"'operation' value should be 'read' or 'write'. Received '{operation}'.")
def get_ring_policy(ring: int) -> RingPolicy:
if ring is None:
return RingPolicy.All
if ring == 0:
return RingPolicy.R0Only
if ring == 3:
return RingPolicy.R3Only
raise ValueError(f"'ring' value should be '0' or '3'. Received '{ring}'.")
def get_output_format(format: str) -> OutputFormat:
if format.lower() == "raw":
return OutputFormat.RAW
if format.lower() == "table":
return OutputFormat.TABLE
if format.lower() == "html":
return OutputFormat.HTML
if format.lower() == "csv":
return OutputFormat.CSV
raise ValueError(f"'output format' value should be 'raw', 'table', 'html', or 'csv'. Received '{format}'.")
def script_main():
parser = argparse.ArgumentParser(description="Find all symbols that access a memory range")
parser.add_argument(
"--host",
type=str,
default="localhost",
required=False,
help='Reven host, as a string (default: "localhost")',
)
parser.add_argument(
"-p",
"--port",
type=int,
default=13370,
required=False,
help="Reven port, as an int (default: 13370)",
)
parser.add_argument(
"-m",
"--memory-range",
type=str,
required=True,
help="The memory range whose accesses to look for in symbols (e.g. [ds:0xfff5000; 2])",
)
parser.add_argument(
"-C",
"--context",
type=int,
required=False,
help="The context used to translate the memory range if it is virtual",
)
parser.add_argument(
"--from-context",
type=int,
required=False,
help="The context from where the search starts",
)
parser.add_argument(
"--to-context",
type=int,
required=False,
help="The context (not included) at which the search stops",
)
parser.add_argument(
"--ring",
type=int,
required=False,
help="Show symbols in this ring only, can be (0=ring0, 3=ring3)",
)
parser.add_argument(
"--processes",
required=False,
nargs="*",
help="Show symbols in these processes only",
)
parser.add_argument(
"--include-binaries",
required=False,
nargs="*",
help="Show symbols in these binaries only",
)
parser.add_argument(
"--exclude-binaries",
required=False,
nargs="*",
help="Don't show symbols in these binaries. Accesses that belong to these symbols will be reported with "
"the innermost symbol that is not excluded and whose binary is not excluded",
)
parser.add_argument(
"--exclude-symbols",
required=False,
nargs="*",
help="Don't show these symbols. Accesses that belong to these symbols will be reported with "
"the innermost non-excluded symbol",
)
parser.add_argument(
"--memory-access-operation",
choices=["read", "write"],
required=False,
help="Only show symbols that access the memory range using this operation",
)
parser.add_argument(
"--grouped-by-symbol",
action="store_true",
required=False,
default=False,
help="Group results by symbol",
)
parser.add_argument(
"-o",
"--output-file",
type=str,
required=False,
help="The target file of the results. If absent, the results will be printed on the standard output",
)
parser.add_argument(
"--output-format",
choices=["raw", "table", "csv", "html"],
required=False,
default="raw",
help="Output format of the results",
)
args = parser.parse_args()
try:
server = RevenServer(args.host, args.port)
except RuntimeError:
raise RuntimeError(f"Could not connect to the server on {args.host}:{args.port}.")
symbols_access_memory_range(
server=server,
memory_range=MemoryRange.from_string(args.memory_range),
context=args.context,
from_context=args.from_context,
to_context=args.to_context,
ring_policy=get_ring_policy(args.ring),
processes=args.processes,
included_binaries=args.include_binaries,
excluded_binaries=args.exclude_binaries,
excluded_symbols=args.exclude_symbols,
operation=get_memory_access_operation(args.memory_access_operation),
grouped_by_symbol=args.grouped_by_symbol,
output_format=get_output_format(args.output_format),
output_file=args.output_file,
)
# %% [markdown]
# ## Parameters
#
# These parameters have to be filled out to use in the notebook context.
# %%
# Server connection
#
host = "localhost"
port = 37103
# Input data
memory_range = MemoryRange(address=_address.LogicalAddress(offset=0xFFFFF8800115E180), size=1)
# Or use the MemoryRange.from_string method
# memory_range = MemoryRange.from_string("[ds:0xFFFFF8800115E180; 1]")
context = 100
# context = None # can be None only when the memory range is defined by a physical address
# Output filter
from_context = None
# from_context = 10
to_context = None
# to_context = 10
ring_policy = RingPolicy.All
# ring_policy = RingPolicy.R0Only
# ring_policy = RingPolicy.R3Only
processes = None # display result for all processes in the trace
# processes = ["xxx",]
included_binaries = None
# included_binaries = ["xxx",]
excluded_binaries = None
# excluded_binaries = ["xxx",]
excluded_symbols = None
# excluded_symbols = ["xxx",]
memory_access_operation = None
# memory_access_operation = MemoryAccessOperation.Write
# memory_access_operation = MemoryAccessOperation.Read
# Output target
#
output_file = None # display results inline
# output_file = "res.csv" # write results formatted as `csv` to a file named "res.csv" in the current directory
# Output control
#
# group results by symbol
grouped_by_symbol = False
# pandas output type
output_format: OutputFormat = OutputFormat.RAW
# %% [markdown]
# ### Pandas module
#
# This cell verifies that the pandas module is installed and installs it if needed.
# %%
if in_notebook():
try:
import pandas # noqa
print("pandas already installed")
except ImportError:
print("Could not find pandas, attempting to install it from pip")
import sys
import subprocess
command = [f"{sys.executable}", "-m", "pip", "install", "pandas"]
p = subprocess.run(command)
if int(p.returncode) != 0:
raise RuntimeError("Error installing pandas")
import pandas # noqa
print("Successfully installed pandas")
else:
import pandas # noqa
# %% [markdown]
# ### Execution cell
#
# This cell executes according to the [parameters](#Parameters) when in notebook context, or according to the
# [parsed arguments](#Argument-parsing) when in script context.
#
# When in notebook context, if the `output_file` parameter is `None`, then the report will be displayed in the
# last cell of the notebook.
# %%
if __name__ == "__main__":
if in_notebook():
try:
server = RevenServer(host, port)
except RuntimeError:
raise RuntimeError(f"Could not connect to the server on {host}:{port}.")
symbols_access_memory_range(
server=server,
memory_range=memory_range,
context=context,
from_context=from_context,
to_context=to_context,
ring_policy=ring_policy,
processes=processes,
included_binaries=included_binaries,
excluded_binaries=excluded_binaries,
excluded_symbols=excluded_symbols,
operation=memory_access_operation,
grouped_by_symbol=grouped_by_symbol,
output_format=output_format,
output_file=output_file,
)
else:
script_main()
# %%
Find all memory accesses made by a given symbol
Purpose
This notebook and script search a Reven trace for all memory accesses made by a given symbol. The results can be filtered by processes, threads, ring, context range and memory access operation. The script can generate two kinds of results:
- process, binary and symbol call information and all its memory accesses.
- for this symbol, all its calls with all the memory accesses that occurred in each call. Note that this option can take a long time to start showing results.
Note that:
- this script allows including or excluding the memory accesses that occurred in the children symbol calls of each symbol call.
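To illustrate what including or excluding children symbol calls means, here is a minimal sketch that does not depend on Reven. It models a symbol call and its direct children as integer context intervals; the `own_ranges` helper and the interval representation are assumptions made for this illustration and are not part of the script.

```python
from typing import Iterator, List, Tuple

# Hypothetical model: an inclusive (first_context, last_context) pair.
Interval = Tuple[int, int]


def own_ranges(call: Interval, children: List[Interval]) -> Iterator[Interval]:
    """Yield the parts of `call` that are not covered by any child call."""
    cursor, last = call
    for child_first, child_last in sorted(children):
        if child_first > cursor:
            # Contexts executed by the call itself before entering this child.
            yield (cursor, child_first - 1)
        cursor = max(cursor, child_last + 1)
    if cursor <= last:
        # Contexts executed by the call itself after the last child returned.
        yield (cursor, last)


call = (100, 200)
children = [(120, 140), (160, 170)]

# Excluding children: only the contexts executed by the call itself.
without_children = list(own_ranges(call, children))
# Including children: the whole span of the call.
with_children = [call]

print(without_children)
print(with_children)
```

When children are excluded, only memory accesses whose transition falls in one of the "own" ranges would be reported for the call.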
How to use
Results can be generated from this notebook or from the command line. The script can also be imported as a module for use from your own script or notebook.
From the notebook
- Upload the memory_ranges_accessed_by_a_symbol.ipynb file in Jupyter.
- Fill out the parameters cell of this notebook according to your scenario and desired output.
- Run the full notebook.
From the command line
- Make sure that you are in an environment that can run Reven scripts.
- Run python memory_ranges_accessed_by_a_symbol.py --help to get a tour of available arguments.
- Run python memory_ranges_accessed_by_a_symbol.py --host <your_host> --port <your_port> [<other_option>] with your arguments of choice.
Imported in your own script or notebook
- Make sure that you are in an environment that can run Reven scripts.
- Make sure that memory_ranges_accessed_by_a_symbol.py is in the same directory as your script or notebook.
- Add import memory_ranges_accessed_by_a_symbol to your script or notebook. You can access the various functions and classes exposed by the module from the memory_ranges_accessed_by_a_symbol namespace.
- Refer to the Argument parsing cell for an example of use in a script, and to the Parameters cell and below for an example of use in a notebook (you just need to prepend memory_ranges_accessed_by_a_symbol in front of the functions and classes from the script).
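As a rough sketch of use from your own script, the helper below wraps the calls shown in the docstring example of `SymbolMemoryAccessesFinder` (`server.ossi.symbols(pattern=...)`, the finder constructor and `query()`). The `first_results` name and its parameters are hypothetical; the server and the finder class are passed in as arguments so that the helper itself has no import-time dependency on Reven.

```python
from itertools import islice
from typing import Any, List


def first_results(server: Any, finder_cls: Any, symbol_pattern: str, limit: int = 5) -> List[Any]:
    """Return at most `limit` results for the first symbol matching `symbol_pattern`."""
    # Take the first matching symbol, or None if nothing matches.
    symbol = next(iter(server.ossi.symbols(pattern=symbol_pattern)), None)
    if symbol is None:
        return []
    finder = finder_cls(server.trace, symbol, with_children_symbols=False)
    # `query()` is consumed lazily, so only the requested results are computed.
    return list(islice(finder.query(), limit))
```

With a reachable Reven server, this could be called as `first_results(reven2.RevenServer("localhost", 13370), memory_ranges_accessed_by_a_symbol.SymbolMemoryAccessesFinder, "ExCompareExchangeCallBack")`, where the host, port and symbol name are placeholders to adapt to your scenario.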
Known limitations
When using the "table", "csv" or "html" output format, this script might require a large quantity of RAM due to the data being retained in memory. If you notice high RAM usage, you can try the following:
- Restart with the "raw" format
- Split the results using the from_context and to_context parameters
- Use the provided filters (ring, processes, threads) to reduce the number of results
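One way to split the results is to run the search over successive windows of the trace. The sketch below only computes the window bounds; `context_windows` is a hypothetical helper, and it assumes that the upper bound of each window is treated as exclusive (adjust the arithmetic if it is inclusive in your version).

```python
from typing import Iterator, Tuple


def context_windows(first: int, last: int, window: int) -> Iterator[Tuple[int, int]]:
    """Yield (from_context, to_context) pairs covering [first, last) in steps of `window`."""
    if window <= 0:
        raise ValueError("window must be positive")
    start = first
    while start < last:
        end = min(start + window, last)
        yield (start, end)
        start = end


# Example: split a 2.5M-transition span into windows of at most 1M each.
windows = list(context_windows(0, 2_500_000, 1_000_000))
print(windows)
```

Each pair can then be used as the `from_context` and `to_context` parameters of one run, writing each run to its own output file so that only one window of results is held in memory at a time.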
Supported versions
Reven 2.12+
Supported perimeter
Any Reven scenario.
Dependencies
The script requires that the target Reven scenario have:
- The OSSI feature replayed.
- The memory history feature replayed.
The script also requires the pandas Python module.
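To check the Python-side dependency before running the script, a small standard-library test can be used. This is only a sketch: `has_module` is a hypothetical helper, and it is meant for top-level module names such as `pandas`.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be found without importing it."""
    return importlib.util.find_spec(name) is not None


print("pandas available:", has_module("pandas"))
```

Unlike a `try`/`import` pair, this does not import the module, so it is cheap to run in an environment check.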
Source
# ---
# jupyter:
# jupytext:
# formats: ipynb,py:percent
# text_representation:
# extension: .py
# format_name: percent
# kernelspec:
# display_name: reven
# language: python
# name: reven-python3
# ---
# %% [markdown]
# # Find all memory accesses made by a given symbol
#
# ## Purpose
#
# This notebook and script search a Reven trace for all memory accesses made by a given symbol.
# The results can be filtered by processes, threads, ring, context range and memory access operation.
#
# The script can generate two kinds of results:
# - process, binary and symbol call information and all its memory accesses.
# - for this symbol, all its calls with all the memory accesses that occurred in each call.
# Note that this option can take a long time to start showing results.
#
# Note that:
# - this script allows including or excluding the memory accesses that occurred in the children symbol calls of
# each symbol call.
#
#
#
# ## How to use
#
# Results can be generated from this notebook or from the command line.
# The script can also be imported as a module for use from your own script or notebook.
#
#
# ### From the notebook
#
# 1. Upload the `memory_ranges_accessed_by_a_symbol.ipynb` file in Jupyter.
# 2. Fill out the [parameters](#Parameters) cell of this notebook according to your scenario and desired output.
# 3. Run the full notebook.
#
#
# ### From the command line
#
# 1. Make sure that you are in an environment that can run Reven scripts.
# 2. Run `python memory_ranges_accessed_by_a_symbol.py --help` to get a tour of available arguments.
# 3. Run `python memory_ranges_accessed_by_a_symbol.py --host <your_host> --port <your_port> [<other_option>]`
# with your arguments of choice.
#
# ### Imported in your own script or notebook
#
# 1. Make sure that you are in an environment that can run Reven scripts.
# 2. Make sure that `memory_ranges_accessed_by_a_symbol.py` is in the same directory as your script or notebook.
# 3. Add `import memory_ranges_accessed_by_a_symbol` to your script or notebook. You can access the various functions
# and classes exposed by the module from the `memory_ranges_accessed_by_a_symbol` namespace.
# 4. Refer to the [Argument parsing](#Argument-parsing) cell for an example of use in a script, and to the
# [Parameters](#Parameters) cell and below for an example of use in a notebook (you just need to prepend
# `memory_ranges_accessed_by_a_symbol` in front of the functions and classes from the script).
#
# ## Known limitations
#
# When using the "table", "csv" or "html" output format, this script might require a large quantity of RAM due to
# the data being retained in memory. If you notice high RAM usage, you can try the following:
#
# - Restart with the "raw" format
# - Split the results using the `from_context` and `to_context` parameters
# - Use the provided filters (ring, processes, threads) to reduce the number of results
#
# ## Supported versions
#
# Reven 2.12+
#
# ## Supported perimeter
#
# Any Reven scenario.
#
# ## Dependencies
#
# The script requires that the target Reven scenario have:
#
# * The OSSI feature replayed.
# * The memory history feature replayed.
#
# The script also requires the `pandas` Python module.
# %% [markdown]
# ### Package imports
# %%
import argparse
import re
import sys
from dataclasses import dataclass
from enum import Enum
from typing import Callable as _Callable, Iterable as _Iterable, Iterator, List, Optional as _Optional
from IPython.core.display import display # type: ignore
import reven2.arch as _arch
from reven2.filter import RingPolicy
from reven2.memhist import MemoryAccess, MemoryAccessOperation
from reven2.ossi import Binary, Process, Symbol
from reven2.ossi.thread import Thread
from reven2.prelude import RevenServer
from reven2.stack import FrameEnd, FrameStart, StackEnter, StackFrame, StackLeave
from reven2.trace import Context, ContextRange, Trace
# %% [markdown]
# ### Utility functions
# %%
# Detect if we are currently running a Jupyter notebook.
#
# This is used e.g. to display rendered results inline in Jupyter when we are executing in the context of a Jupyter
# notebook, or to display raw results on the standard output when we are executing in the context of a script.
def in_notebook():
try:
from IPython import get_ipython # type: ignore
if get_ipython() is None or ("IPKernelApp" not in get_ipython().config):
return False
except ImportError:
return False
return True
# %% [markdown]
# ### Helper classes for results
# %%
class CallSymbol:
r"""
CallSymbol is a helper class used to represent a symbol with its start and end context
"""
def __init__(self, symbol: _Optional[Symbol], start: Context, end: _Optional[Context] = None) -> None:
self._symbol = symbol
self._start = start
self._end = end
@property
def symbol(self) -> _Optional[Symbol]:
r"""
B{Property:} The symbol of the call symbol. None if the symbol is unknown.
"""
return self._symbol
@property
def start_context(self) -> Context:
r"""
B{Property:} The start context of the call symbol.
"""
return self._start
@property
def end_context(self) -> _Optional[Context]:
r"""
B{Property:} The end excluded context of the call symbol. None if the end context isn't in the trace.
"""
return self._end
def __eq__(self, other: "CallSymbol") -> bool: # type: ignore
return self._symbol == other._symbol and self._start == other._start and self._end == other._end
def __ne__(self, other: "CallSymbol") -> bool: # type: ignore
return not (self == other)
class HtmlStr:
r"""
Helper class used with notebook special `display` function to
consider the HTML string as HTML.
"""
def __init__(self, html: str) -> None:
self._html = html
def _repr_html_(self):
return self._html
# %%
@dataclass
class ExecutionRange:
path: List[StackFrame]
range: ContextRange
def __str__(self) -> str:
return f"[{len(self.path) - 1}]{self.path[-1]} {self.range}"
def execution_ranges(frame: StackFrame, with_children: bool = True) -> Iterator[ExecutionRange]:
first_context: _Optional[Context] = frame.first_context
frames = [frame]
it = frame.descendant_events()
for event in it:
if isinstance(event, StackLeave):
if first_context is not None:
yield ExecutionRange(frames, ContextRange(first_context, event.transition.context_before()))
first_context = None
elif isinstance(event, StackEnter):
first_context = event.transition.context_after()
elif isinstance(event, FrameStart):
if first_context is not None:
yield ExecutionRange(frames, ContextRange(first_context, event.frame.first_context - 1))
if not with_children:
it.skip_children()
first_context = None
continue
first_context = event.frame.first_context
frames.append(event.frame)
elif isinstance(event, FrameEnd):
if first_context is not None:
yield ExecutionRange(frames, ContextRange(first_context, event.frame.last_context))
first_context = event.frame.last_context + 1
if with_children and len(frames) > 1:
frames.pop()
if first_context is not None:
yield ExecutionRange(frames, ContextRange(first_context, frame.last_context))
class MemoryRangeSymbolResult:
r"""
MemoryRangeSymbolResult is a helper class that represents one result of the search.
"""
def __init__(
self,
call_symbol: CallSymbol,
memory_accesses: _Iterable[MemoryAccess],
ring: int,
process: _Optional[Process],
thread: _Optional[Thread],
binary: _Optional[Binary],
) -> None:
self._call_symbol = call_symbol
self._memory_accesses = memory_accesses
self._ring = ring
self._process = process
self._thread = thread
self._binary = binary
@property
def call_symbol(self) -> CallSymbol:
r"""
B{Property:} The call symbol of the result.
"""
return self._call_symbol
@property
def memory_accesses(self) -> _Iterable[MemoryAccess]:
r"""
B{Property:} The memory accesses of the result.
Calling this property will consume the generator
"""
return self._memory_accesses
@property
def ring(self) -> int:
r"""
B{Property:} The ring of the result.
"""
return self._ring
@property
def process(self) -> _Optional[Process]:
r"""
B{Property:} The process of the result.
"""
return self._process
@property
def binary(self) -> _Optional[Binary]:
r"""
B{Property:} The binary of the result, None if unknown.
"""
return self._binary
@property
def thread(self) -> _Optional[Thread]:
r"""
B{Property:} The thread of the result.
"""
return self._thread
def __eq__(self, other: "MemoryRangeSymbolResult") -> bool: # type: ignore
return (
self._ring == other._ring
and self._process is not None
and other._process is not None
and self._process.name == other._process.name
and self._process.pid == other._process.pid
and self._process.ppid == other._process.ppid
and self._thread is not None
and other._thread is not None
and self._thread.id == other._thread.id
and self._thread.owner_process_id == other._thread.owner_process_id
and (
(self._binary is None and other._binary is None)
or (self._binary is not None and other._binary is not None and self._binary.path == other._binary.path)
)
and self._call_symbol == other._call_symbol
)
def __ne__(self, other: "MemoryRangeSymbolResult") -> bool: # type: ignore
return not (self == other)
def output(self, print_func: _Callable, is_in_notebook: bool):
r"""
Output this result using the `print_func` or call `output_in_notebook` if `is_in_notebook`
argument is true.
"""
if is_in_notebook:
self.output_in_notebook()
else:
print_func(
f"ring: {self._ring}, process: {self._process}, "
f"thread: {self._thread}, binary: {self._binary}, "
f"symbol: {self._call_symbol.symbol}[{self._call_symbol.start_context}, "
f"{self._call_symbol.end_context}["
)
print_func("\nmemory accesses:")
for m in self._memory_accesses:
print_func(f"\n\t{m}, ")
print_func("\n")
def output_in_notebook(self) -> None:
r"""
Output this result using the special notebook `display` function
"""
end_context = self._call_symbol.end_context
display(
HtmlStr(
f"<p>ring: {self._ring}, process: {self._process if self._process is not None else 'unknown'}, "
f"thread: {self._thread if self._thread is not None else 'unknown'}, "
f"binary: {self._binary if self._binary is not None else 'unknown'}, "
f"symbol: {self._call_symbol.symbol}[{self._call_symbol.start_context.format_as_html()}, "
f"{None if end_context is None else end_context.format_as_html()}[</p>"
)
)
display(HtmlStr("<p>memory accesses:</p>"))
for m in self._memory_accesses:
display(HtmlStr(f'<p style="text-indent: 2em;"> ◉ {m.format_as_html()}</p>'))
class GroupedMemoryRangeSymbolResult:
r"""
GroupedMemoryRangeSymbolResult is a helper class that represents results of the search grouped by symbol.
"""
def __init__(
self,
symbol: Symbol,
memory_range_symbol_result: _Iterable[MemoryRangeSymbolResult],
) -> None:
self._symbol = symbol
self._memory_range_symbol_result = memory_range_symbol_result
def output(self, print_func: _Callable, is_in_notebook: bool) -> None:
r"""
Output these results using the `print_func` or call `output_in_notebook` if `is_in_notebook`
argument is true.
"""
if is_in_notebook:
self.output_in_notebook()
else:
print_func(f"{self._symbol}\n\nCalls")
for res in self._memory_range_symbol_result:
print_func(
f"\n\t[{res.call_symbol.start_context}, "
f"{res.call_symbol.end_context}]"
f"\n\tring: {res.ring}, process: {res.process if res.process is not None else 'unknown'}, "
f" thread: {res.thread if res.thread is not None else 'unknown'}, binary: {res.binary}, "
)
print_func("\n\tmemory accesses:")
for m in res.memory_accesses:
print_func(f"\n\t\t{m}, ")
print_func("\n")
def output_in_notebook(self) -> None:
r"""
Output these results using the special notebook `display` function
"""
display(HtmlStr(f'<p>{self._symbol}</p><p style="font-weight: bolder;">Calls</p>'))
for res in self._memory_range_symbol_result:
end_context = res.call_symbol.end_context
display(
HtmlStr(
f'<p style="text-indent: 2em;"> ◎ [{res.call_symbol.start_context.format_as_html()}, '
f"{None if end_context is None else end_context.format_as_html()}]</p>"
f'<p style="text-indent: 3em;">ring: {res.ring}, process: '
f"{res.process if res.process is not None else 'unknown'}, "
f" thread: {res.thread if res.thread is not None else 'unknown'}, binary: {res.binary}</p>"
)
)
display(HtmlStr('<p style="text-indent: 3em; font-weight: bolder;">memory accesses:</p>'))
for m in res.memory_accesses:
display(HtmlStr(f'<p style="text-indent: 4em;"> ◉ {m.format_as_html()}</p>'))
# %% [markdown]
# ### SymbolMemoryAccessesFinder
#
# This class represents the main logic of this script
# %%
class SymbolMemoryAccessesFinder(object):
r"""
This class is a helper class to search for all memory accesses made by a given symbol.
Results can be filtered by processes, ring, threads, memory access operation and a context range.
The memory accesses made by the given symbol are returned.
Examples
========
>>> # search all memory accesses that are accessed by the calls of `ExCompareExchangeCallBack`
>>> # symbol
>>> import reven2
>>> server = reven2.RevenServer('localhost', 46445)
>>> symbol = next(server.ossi.symbols(pattern="ExCompareExchangeCallBack"))
>>> finder = SymbolMemoryAccessesFinder(server.trace, symbol, with_children_symbols=False)
>>> for res in finder.query():
... print(res)
ring: 0, process: System (4), thread: 932, binary: c:/windows/system32/ntoskrnl.exe,
symbol: ntoskrnl!ExCompareExchangeCallBack[Context before #492474770, Context before #492474830[
memory accesses:
[#492474770 mov qword ptr ss:[rsp+0x8], rbx]Write access at
@phy:0x7297a8c0 (virtual address: lin:0xfffff880045ca8c0) of size 8,
[#492474771 mov qword ptr ss:[rsp+0x10], rbp]Write access at
@phy:0x7297a8c8 (virtual address: lin:0xfffff880045ca8c8) of size 8,
[#492474772 mov qword ptr ss:[rsp+0x18], rsi]Write access at
@phy:0x7297a8d0 (virtual address: lin:0xfffff880045ca8d0) of size 8,
...
"""
def __init__(
self,
trace: Trace,
symbol: Symbol,
with_children_symbols: bool = True,
from_context: _Optional[Context] = None,
to_context: _Optional[Context] = None,
ring_policy: RingPolicy = RingPolicy.All,
processes: _Optional[_Iterable[Process]] = None,
threads: _Optional[_Iterable[int]] = None,
memory_access_operation: _Optional[MemoryAccessOperation] = None,
) -> None:
r"""
Initialize a C{SymbolMemoryAccessesFinder}
Information
===========
@param trace: the trace where memory accesses will be looked for.
@param symbol: the symbol whose memory accesses will be returned.
@param with_children_symbols: whether to include the memory accesses that occurred in children symbol calls.
@param from_context: the context where the search will be started.
@param to_context: the context where the search will be ended.
@param ring_policy: ring policy to search for.
@param processes: processes to limit the search to. If None, the search is not limited by process.
@param threads: thread ids to limit the search to. If None, the search is not limited by thread.
@param memory_access_operation: limit results to accesses performing the specified operation.
@raises TypeError: if trace is not a C{reven2.trace.Trace}.
"""
if not isinstance(trace, Trace):
raise TypeError("You must provide a valid trace")
self._trace = trace
self._symbol = symbol
self._with_children_symbols = with_children_symbols
self._from_context = from_context
self._to_context = to_context
self._ring_policy = ring_policy
self._memory_access_operation = memory_access_operation
self._processes = None if processes is None else [process for process in processes]
self._threads = None if threads is None else [thread for thread in threads]
def _query_mem_accesses(
self,
ranges: Iterator[ExecutionRange],
from_context: _Optional[Context],
to_context: _Optional[Context],
memory_access_operation: _Optional[MemoryAccessOperation],
) -> _Iterable[MemoryAccess]:
for range in ranges:
first_context = (
range.range.begin if from_context is None or from_context < range.range.begin else from_context
)
last_context = (
range.range.end
if to_context is None or (range.range.end is not None and to_context > range.range.end)
else to_context
)
first_transition = (
first_context.transition_after()
if first_context != self._trace.last_context
else self._trace.last_transition
)
last_transition = (
None
if last_context is None
else last_context.transition_after()
if last_context != self._trace.last_context
else self._trace.last_transition
)
if last_transition is not None and first_transition > last_transition:
return
for mem_access in self._trace.memory_accesses(
from_transition=first_transition, to_transition=last_transition
):
if memory_access_operation is not None and mem_access.operation != memory_access_operation:
continue
# ignore accesses without linear address
if mem_access.virtual_address is None:
continue
yield mem_access
    def filter_by_threads(self, threads: _Iterable[int]) -> "SymbolMemoryAccessesFinder":
        r"""
        Extend the list of threads to limit the search in, and return the `self` object.

        Information
        ===========

        @param threads: threads to limit the search in.
        @returns : self object
        """
        if self._threads is None:
            self._threads = []
        self._threads += [thread for thread in threads]
        return self

    def filter_by_processes(self, processes: _Iterable[Process]) -> "SymbolMemoryAccessesFinder":
        r"""
        Extend the list of processes to limit the search in, and return the `self` object.

        Information
        ===========

        @param processes: processes to limit the search in.
        @returns : self object
        """
        if self._processes is None:
            self._processes = []
        self._processes += [process for process in processes]
        return self

    def filter_by_ring(self, ring_policy: RingPolicy) -> "SymbolMemoryAccessesFinder":
        r"""
        Update the ring policy to search for and return the `self` object.

        Information
        ===========

        @param ring_policy: ring policy to search for.
        @returns : self object
        """
        self._ring_policy = ring_policy
        return self

    def from_context(self, context: Context) -> "SymbolMemoryAccessesFinder":
        r"""
        Update the context where the search starts and return the `self` object.

        Information
        ===========

        @param context: context where the search starts.
        @returns : self object
        """
        self._from_context = context
        return self

    def to_context(self, context: Context) -> "SymbolMemoryAccessesFinder":
        r"""
        Update the context where the search ends and return the `self` object.

        Information
        ===========

        @param context: context where the search ends.
        @returns : self object
        """
        self._to_context = context
        return self

    def filter_by_memory_access_operation(
        self, operation: _Optional[MemoryAccessOperation] = None
    ) -> "SymbolMemoryAccessesFinder":
        r"""
        Update the memory access operation to limit results to accesses performing this
        operation and return the `self` object.

        Information
        ===========

        @param operation: limit results to accesses performing the specified operation.
        @returns : self object
        """
        # `query` reads `self._memory_access_operation`, so the value must be stored under that name.
        self._memory_access_operation = operation
        return self

    def query(self) -> _Iterable[MemoryRangeSymbolResult]:
        with_children_symbols = self._with_children_symbols
        memory_access_operation = self._memory_access_operation
        thread_ids = None if self._threads is None else self._threads.copy()
        # copy so we can reuse them without them mutating under our feet after a yield point
        from_context = self._from_context
        to_context = self._to_context

        # filter by process
        for context_range in self._trace.filter(
            processes=self._processes,
            ring_policy=self._ring_policy,
            from_context=from_context,
            to_context=to_context,
        ):
            last_context: _Optional[Context] = None

            # search symbol call
            for context in self._trace.search.symbol(self._symbol, context_range.begin, context_range.end):
                # ignore results that aren't in the list of threads
                if thread_ids is not None:
                    thread = context.ossi.thread()
                    if thread is None or thread.id not in thread_ids:
                        continue

                # we also need to ignore symbols that are called recursively.
                if last_context is not None and last_context > context:
                    continue

                # if with_children_symbols is true, that means we need to consider
                # the memory accesses in children symbols.
                # use step_out to go out of the symbol
                frame = context.stack.frame()
                last_context = frame.last_context

                # get the ring of the symbol
                curr_ring = context.read(_arch.x64.cs) & 0x3
                curr_process = context.ossi.process()
                curr_thread = context.ossi.thread()
                curr_location = context.ossi.location()
                curr_binary = None if curr_location is None else curr_location.binary

                yield MemoryRangeSymbolResult(
                    call_symbol=CallSymbol(self._symbol, context, last_context),
                    memory_accesses=self._query_mem_accesses(
                        execution_ranges(context.stack.frame(), with_children_symbols),
                        from_context,
                        to_context,
                        memory_access_operation,
                    ),
                    ring=curr_ring,
                    process=curr_process,
                    thread=curr_thread,
                    binary=curr_binary,
                )
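# %% [markdown]
# The `filter_by_*`, `from_context` and `to_context` methods above all return `self`, which allows calls to be
# chained in a single expression. The cell below is a minimal, self-contained sketch of that pattern.
# `_ChainedFilterSketch`, its attributes and `_chained_filter_sketch_demo` are hypothetical names used for
# illustration only; they are not part of the reven2 API nor of this script's interface.
# %%
from typing import Iterable, List, Optional


class _ChainedFilterSketch:
    """Hypothetical stand-in showing how setters that return `self` compose."""

    def __init__(self) -> None:
        self.threads: Optional[List[int]] = None  # None means "no thread filter"
        self.ring: Optional[int] = None  # None means "any ring"

    def filter_by_threads(self, threads: Iterable[int]) -> "_ChainedFilterSketch":
        # Extend (rather than replace) the list, like the accumulating filters above.
        if self.threads is None:
            self.threads = []
        self.threads.extend(threads)
        return self

    def filter_by_ring(self, ring: int) -> "_ChainedFilterSketch":
        # Replace the previous value, like the "update" style setters above.
        self.ring = ring
        return self


def _chained_filter_sketch_demo() -> _ChainedFilterSketch:
    # Every call returns the same object, so the calls can be written as one expression.
    return _ChainedFilterSketch().filter_by_threads([1, 2]).filter_by_threads([3]).filter_by_ring(3)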
# %% [markdown]
#
# ### OutputFormat
# %%
class OutputFormat(Enum):
    r"""
    Enum describing the various possible output formats of the results

    - RAW: The results will be output using their string representation.
    - TABLE: The results will be output using pandas table format.
    - CSV: The results will be output as csv.
    - HTML: The results will be output as an html table.
    """

    RAW = 0
    TABLE = 1
    CSV = 2
    HTML = 3
# %% [markdown]
# ### Main function
#
# This function is called with parameters from the [Parameters](#Parameters) cell in the notebook context,
# or with parameters from the command line in the script context.
# %%
def memory_ranges_accessed_by_a_symbol(
    server: RevenServer,
    symbol: str,
    binary_hint: _Optional[str] = None,
    with_children_symbols: bool = True,
    from_context: _Optional[int] = None,
    to_context: _Optional[int] = None,
    ring_policy: RingPolicy = RingPolicy.All,
    processes: _Optional[_Iterable[str]] = None,
    threads: _Optional[_Iterable[int]] = None,
    operation: _Optional[MemoryAccessOperation] = None,
    grouped_by_symbol: bool = False,
    output_format: OutputFormat = OutputFormat.RAW,
    output_file: _Optional[str] = None,
) -> None:
    # get the symbol from the ossi server and raise if it doesn't exist
    trace_symbol = None
    symbol_count = 0
    for sym in server.ossi.symbols(pattern=re.escape(symbol), binary_hint=binary_hint):
        if sym.name == symbol:
            symbol_count += 1
            if trace_symbol is None:
                trace_symbol = sym
            # once a second match is found, list every matching symbol (including the first one)
            if symbol_count == 2:
                print(trace_symbol, file=sys.stderr)
            if symbol_count >= 2:
                print(sym, file=sys.stderr)

    if trace_symbol is None:
        raise ValueError(f"The requested symbol '{symbol}' could not be found")

    if symbol_count > 1:
        sys.exit(
            "Several symbols match the provided symbol name. "
            "Please provide the binary name of one of the symbols listed above"
        )

    # declare memory accesses finder.
    symbol_memory_ranges_finder = SymbolMemoryAccessesFinder(
        trace=server.trace,
        symbol=trace_symbol,
        with_children_symbols=with_children_symbols,
        from_context=(None if from_context is None else server.trace.context_before(from_context)),
        to_context=(None if to_context is None else server.trace.context_before(to_context)),
        threads=threads,
        ring_policy=ring_policy,
        memory_access_operation=operation,
    )

    # filter by processes
    if processes is not None:
        for process in processes:
            symbol_memory_ranges_finder.filter_by_processes(server.ossi.executed_processes(process))

    if output_format == OutputFormat.RAW:
        is_in_notebook = output_file is None and in_notebook()

        def std_print_func(s: str) -> None:
            print(s)

        print_func = std_print_func
        if output_file is not None:
            file = open(output_file, "w")

            def fprint_func(s: str) -> None:
                file.write(s)

            print_func = fprint_func

        if grouped_by_symbol:
            grouped_result = GroupedMemoryRangeSymbolResult(trace_symbol, symbol_memory_ranges_finder.query())
            grouped_result.output(print_func, is_in_notebook)
        else:
            for result in symbol_memory_ranges_finder.query():
                result.output(print_func, is_in_notebook)

        if output_file is not None:
            file.close()
    else:
        column_headers = [
            "Ring",
            "Process",
            "Thread",
            "Binary",
            "Symbol",
            "Start context",
            "Access transition",
            "Access operation",
            "Access physical",
            "Access linear",
            "Access size",
        ]

        def data_generator():
            for result in symbol_memory_ranges_finder.query():
                for mem_access in result.memory_accesses:
                    yield (
                        result.ring,
                        str(result.process) if result.process is not None else "unknown",
                        str(result.thread) if result.thread is not None else "unknown",
                        # the binary is None when the location of the symbol call is unknown
                        result.binary.name if result.binary is not None else "unknown",
                        result.call_symbol.symbol.name,
                        str(result.call_symbol.start_context),
                        mem_access.transition.id,
                        mem_access.operation.name,
                        mem_access.physical_address,
                        mem_access.virtual_address,
                        mem_access.size,
                    )

        df = pandas.DataFrame(data=data_generator(), columns=column_headers)

        if output_format == OutputFormat.TABLE:
            if output_file is not None:
                with open(output_file, "w") as file:
                    file.write(str(df))
            else:
                print(df)
        elif output_format == OutputFormat.CSV:
            print(df.to_csv()) if output_file is None else df.to_csv(output_file)
        elif output_format == OutputFormat.HTML:
            print(df.to_html()) if output_file is None else df.to_html(output_file)
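# %% [markdown]
# In the non-RAW branch of the main function, each result is flattened into one row per memory access before being
# handed to pandas. The cell below sketches the same flattening with the standard library `csv` module instead of
# pandas, so that it runs without third-party packages. `_flatten_rows_sketch`, `_rows_to_csv_sketch` and the
# tuple-based sample data layout are hypothetical and only illustrate the shape of the data; they are not used by
# the script.
# %%
import csv
import io
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

# Hypothetical result layout: (binary name or None, [(transition id, operation name, access size), ...])
_SketchResult = Tuple[Optional[str], List[Tuple[int, str, int]]]


def _flatten_rows_sketch(results: Iterable[_SketchResult]) -> Iterator[Tuple[str, int, str, int]]:
    for binary, accesses in results:
        # Missing values are reported as "unknown", as done for the process and thread columns.
        binary_name = binary if binary is not None else "unknown"
        # One output row per memory access, repeating the per-result columns.
        for transition, operation, size in accesses:
            yield (binary_name, transition, operation, size)


def _rows_to_csv_sketch(headers: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    # Write the header and the rows to an in-memory buffer and return the CSV text.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()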
# %% [markdown]
# ### Argument parsing
#
# Argument parsing function for use in the script context.
# %%
def get_memory_access_operation(operation: _Optional[str]) -> _Optional[MemoryAccessOperation]:
    if operation is None:
        return None
    if operation.lower() == "read":
        return MemoryAccessOperation.Read
    if operation.lower() == "write":
        return MemoryAccessOperation.Write
    raise ValueError(f"'operation' value should be 'read' or 'write'. Received '{operation}'.")


def get_ring_policy(ring: _Optional[int]) -> RingPolicy:
    if ring is None:
        return RingPolicy.All
    if ring == 0:
        return RingPolicy.R0Only
    if ring == 3:
        return RingPolicy.R3Only
    raise ValueError(f"'ring' value should be '0' or '3'. Received '{ring}'.")


def get_output_format(format: str) -> OutputFormat:
    if format.lower() == "raw":
        return OutputFormat.RAW
    if format.lower() == "table":
        return OutputFormat.TABLE
    if format.lower() == "html":
        return OutputFormat.HTML
    if format.lower() == "csv":
        return OutputFormat.CSV
    raise ValueError(f"'output format' value should be 'raw', 'table', 'html', or 'csv'. Received '{format}'.")
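

# The helpers above map command line strings to enum values with chains of `if` statements. As a possible
# alternative, the sketch below performs the same kind of case-insensitive lookup from a table.
# `_parse_choice_sketch`, `_SketchFormat` and `_SKETCH_FORMATS` are hypothetical names introduced for illustration
# only; the script itself does not use them.
from enum import Enum
from typing import Mapping, TypeVar

_EnumT = TypeVar("_EnumT", bound=Enum)


def _parse_choice_sketch(value: str, choices: Mapping[str, _EnumT], what: str) -> _EnumT:
    # Case-insensitive lookup; unknown values produce an error listing the accepted spellings.
    try:
        return choices[value.lower()]
    except KeyError:
        expected = ", ".join(f"'{name}'" for name in choices)
        raise ValueError(f"'{what}' value should be one of {expected}. Received '{value}'.") from None


class _SketchFormat(Enum):
    # Hypothetical enum standing in for OutputFormat so that the sketch is self-contained.
    RAW = 0
    CSV = 2


_SKETCH_FORMATS = {"raw": _SketchFormat.RAW, "csv": _SketchFormat.CSV}

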
def script_main():
    parser = argparse.ArgumentParser(description="Find all memory accesses performed by a given symbol")
    parser.add_argument(
        "--host",
        type=str,
        default="localhost",
        required=False,
        help='Reven host, as a string (default: "localhost")',
    )
    parser.add_argument(
        "-p",
        "--port",
        type=int,
        default=13370,
        required=False,
        help="Reven port, as an int (default: 13370)",
    )
    parser.add_argument(
        "-s",
        "--symbol",
        type=str,
        required=True,
        help="The symbol whose accesses are looked for (e.g. WriteFile)",
    )
    parser.add_argument(
        "-b",
        "--binary-hint",
        type=str,
        required=False,
        help="The symbol's binary name hint (e.g. ntoskrnl)",
    )
    parser.add_argument(
        "--with-children-symbols",
        action="store_true",
        required=False,
        default=False,
        help="Show accesses from children calls",
    )
    parser.add_argument(
        "--from-context",
        type=int,
        required=False,
        help="The context from where the search starts",
    )
    parser.add_argument(
        "--to-context",
        type=int,
        required=False,
        help="The context (not included) at which the search stops",
    )
    parser.add_argument(
        "--ring",
        type=int,
        required=False,
        help="Show symbol's accesses if it is in this ring only, can be (0=ring0, 3=ring3)",
    )
    parser.add_argument(
        "--processes",
        required=False,
        nargs="*",
        help="Show symbol's accesses if it is in these processes only",
    )
    parser.add_argument(
        "--threads",
        type=int,
        required=False,
        nargs="*",
        help="Show symbol's accesses if it is in these threads only",
    )
    parser.add_argument(
        "--memory-access-operation",
        choices=["read", "write"],
        required=False,
        help="Only show memory accesses performing this operation",
    )
    parser.add_argument(
        "--grouped-by-symbol",
        action="store_true",
        required=False,
        default=False,
        help="Group results by symbol",
    )
    parser.add_argument(
        "-o",
        "--output-file",
        type=str,
        required=False,
        help="The target file of the results. If absent, the results will be printed on the standard output",
    )
    parser.add_argument(
        "--output-format",
        choices=["raw", "table", "csv", "html"],
        required=False,
        default="raw",
        help="Output format of the results",
    )

    args = parser.parse_args()

    try:
        server = RevenServer(args.host, args.port)
    except RuntimeError:
        raise RuntimeError(f"Could not connect to the server on {args.host}:{args.port}.")

    memory_ranges_accessed_by_a_symbol(
        server=server,
        symbol=args.symbol,
        binary_hint=args.binary_hint,
        with_children_symbols=args.with_children_symbols,
        from_context=args.from_context,
        to_context=args.to_context,
        ring_policy=get_ring_policy(args.ring),
        processes=args.processes,
        threads=args.threads,
        operation=get_memory_access_operation(args.memory_access_operation),
        grouped_by_symbol=args.grouped_by_symbol,
        output_format=get_output_format(args.output_format),
        output_file=args.output_file,
    )
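# %% [markdown]
# The `--processes` and `--threads` options are declared with `nargs="*"`. With argparse, such an option yields
# `None` when it is absent from the command line and an empty list when it is present without values, so the two
# cases can be told apart. The cell below is a minimal sketch of that behaviour on a reduced parser;
# `_parse_filters_sketch` is a hypothetical helper that only mirrors two of the options declared above.
# %%
import argparse
from typing import List


def _parse_filters_sketch(argv: List[str]) -> argparse.Namespace:
    # Reduced parser: one list-valued option and one boolean flag.
    parser = argparse.ArgumentParser(description="Sketch of nargs='*' and store_true options")
    parser.add_argument("--threads", type=int, required=False, nargs="*")
    parser.add_argument("--with-children-symbols", action="store_true", default=False)
    # Parse an explicit argument list instead of sys.argv so the sketch can be run anywhere.
    return parser.parse_args(argv)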
# %% [markdown]
# ## Parameters
#
# These parameters have to be filled out to use in the notebook context.
# %%
# Server connection
#
host = "localhost"
port = 37103
# Input data
symbol = "xxx" # symbol name
binary_hint = None # symbol's binary name hint
with_children_symbols = True
# Output filter
from_context = None
# from_context = 10
to_context = None
# to_context = 10
ring_policy = RingPolicy.All
# ring_policy = RingPolicy.R0Only
# ring_policy = RingPolicy.R3Only
processes = None # display result for all processes in the trace
# processes = ["xxx",]
threads = None # display result for all threads in the trace
# threads = [thread_id,]
memory_access_operation = None
# memory_access_operation = MemoryAccessOperation.Write
# memory_access_operation = MemoryAccessOperation.Read
# Output target
#
output_file = None # display results inline
# output_file = "res.csv" # write results to a file named "res.csv" in the current directory, using `output_format`
# Output control
#
# group results by symbol
grouped_by_symbol = False
# output format of the results
output_format: OutputFormat = OutputFormat.RAW
# %% [markdown]
# ### Pandas module
#
# This cell verifies that the pandas module is installed and installs it if needed.
# %%
if in_notebook():
    try:
        import pandas  # noqa

        print("pandas already installed")
    except ImportError:
        print("Could not find pandas, attempting to install it from pip")
        import subprocess

        command = [f"{sys.executable}", "-m", "pip", "install", "pandas"]
        p = subprocess.run(command)

        if int(p.returncode) != 0:
            raise RuntimeError("Error installing pandas")

        import pandas  # noqa

        print("Successfully installed pandas")
else:
    import pandas  # noqa
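# %% [markdown]
# The cell above detects a missing pandas by attempting the import. As a possible alternative, the standard
# library can report whether a top-level module is importable without importing or installing it. The cell below
# is a minimal sketch of such a check; `_module_available_sketch` is a hypothetical helper that the script does
# not use.
# %%
import importlib.util


def _module_available_sketch(name: str) -> bool:
    # find_spec returns None when a top-level module cannot be found on the import path.
    return importlib.util.find_spec(name) is not None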
# %% [markdown]
# ### Execution cell
#
# This cell executes according to the [parameters](#Parameters) when in notebook context, or according to the
# [parsed arguments](#Argument-parsing) when in script context.
#
# When in notebook context, if the `output_file` parameter is `None`, then the report will be displayed in the last cell of
# the notebook.
# %%
if __name__ == "__main__":
    if in_notebook():
        try:
            server = RevenServer(host, port)
        except RuntimeError:
            raise RuntimeError(f"Could not connect to the server on {host}:{port}.")

        memory_ranges_accessed_by_a_symbol(
            server=server,
            symbol=symbol,
            binary_hint=binary_hint,
            with_children_symbols=with_children_symbols,
            from_context=from_context,
            to_context=to_context,
            ring_policy=ring_policy,
            processes=processes,
            threads=threads,
            # forward the notebook parameter, which would otherwise be silently ignored
            operation=memory_access_operation,
            grouped_by_symbol=grouped_by_symbol,
            output_format=output_format,
            output_file=output_file,
        )
    else:
        script_main()
# %%