miasm

From cea-sec github

Miasm is a free and open source (GPLv2) reverse engineering framework. Miasm aims to analyze / modify / generate binary programs.

It has a bunch of cool features ranging like static binary analysis, IR representation, unpacking, de-obfuscation etc

I usually like to try out a new tool and have a look at it features if they are later needed in a project or CTFs.

A couple of points to note if you’re new to Symbolic Execution

Symbolize - mark memory/register as “symbol” - like a mathematical variable. Symbolized memory doesn’t have a value. The solvers use this variable to calculate constraints and solve to get values.
Concretize - set “actual” value to memory like 0x41 for a byte, -22 for an int, etc.

The general idea of a symbolic execution engine is to let the user select parts of a program that needs to be evaluated, then symbolize parts of the memory and let the engine evaluate expressions for memory to a final state. On this final state, query the constraints solver to solve for selected memory to set condition/s true/false. In this challenge there were some equations that need to be verified for bytes of the flag. The symbolic execution engine should automatically calculate these equations and solve them to get the flag

Problem

A friend participated in a CTF and got a reverse task which was a trivial solve for angr. While solving that task I thought why not give miasm a try.

There was only 1 function of interest - main which looked like

main cfg

Each smaller basic block connected with a red line is a classic “Bad Boy”, final one at the bottom right is the “Good Boy”. flag is read as argv[1] and its length is compared to 9 in the first basic block. Each basic block then had a small equation for consecutive pair of bytes. Solving with angr was trivial.

p = angr.Project('/tmp/ctf-bin')

base = 0x400000
flag_addr = 0x1000
argv_addr = 0x1020
flag_addr_ptr = 0x1028

state = p.factory.blank_state(addr=base+0x11e4)
state.regs.rbp = 0x1200
state.mem[state.regs.rbp - 0x40].qword = argv_addr
state.mem[flag_addr_ptr].qword = flag_addr

for i in range(9):
    state.mem[flag_addr + i].byte = state.solver.BVS('C_%d' %i, 8)

ex = p.factory.simulation_manager(state)
ex.explore(find=base+0x1454)
if ex.found:
    s=""
    for i in range(9):
        s+=chr(ex.found[0].solver.eval(ex.found[0].memory.load(flag_addr + i, 1)))
    print(s)

yielded the flag

Since I already have knowledge on how the challenge works and have a solution I can give miasm a try. The documentation is quite lacking and there are very little number of blogs out including their own. The source is somewhat documented and can be worked with.

My script was heavily based on this

If you want to learn about a tool you should first start by solving small challenges with it.

Sandbox

To execute a binary within miasm’s env a sandbox is needed. It has a jitter that jits the elf that can be then “executed”.

from miasm.analysis.sandbox import Sandbox_Linux_x86_64
parser = Sandbox_Linux_x86_64.parser()
parser.add_argument("--f", default="/tmp/ctf-bin", required=False) # set a default elf path
options = parser.parse_args()

solution = "........."
options.mimic_env = True
options.command_line = [solution] # set argv[1] for the elf
sb = Sandbox_Linux_x86_64(options.f, options, globals())

sb.jitter.init_run(sb.entry_point)

Here we setup the sandbox sb and let it setup the entry_point. I have also setup a dummy solution as it goes into argv[1]

Dynamic Symbolic Execution(DSE) with miasm

Based on the emulation we can set up constraints on the memory by hooking functions or setting breakpoints in the code. If nothing is done, concrete values can be used by miasm

We need a DSEPathConstraint hooked up with our sandbox sb. According to its documentation

DSEPathConstraint is Dynamic Symbolic Execution Engine keeping the path constraint. Possible new “solutions” are produced along the path, by inversing concrete path constraint. Thus, a “solution” is a potential initial context leading to a new path.

While evaluating a path we can set constraints that can be solved by the backend z3 solver to produce new paths. Its similar to angr’s simulation manager but we need to symbolize and concretize the memory. It has 3 evaluating techniques - PRODUCE_SOLUTION_CODE_COV, PRODUCE_SOLUTION_BRANCH_COV and PRODUCE_SOLUTION_PATH_COV For this simple and small CFG all 3 performed same.

So we setup the necessary DSE variables and attach the jitter with the dse instance. A Machine is a generic wrapper over all architectures supported by miasm - x86/64, arm, ppc etc.

from miasm.analysis.dse import DSEPathConstraint
from miasm.analysis.machine import Machine

machine = Machine("x86_64")
dse = DSEPathConstraint(
    machine, produce_solution=DSEPathConstraint.PRODUCE_SOLUTION_BRANCH_COV)

dse.attach(sb.jitter)
dse.update_state_from_concrete()
sb.run()

update_state_from_concrete updates the values from the CPU, so the symbolic execution will be completely concrete from that point.

If the script fails upto running here change dse with an instance of DSEEngine

dse = DSEEngine(
    machine)

Hooking Functions

if we run the script we have now we might have an error like

RuntimeError: Symbolic stub 'b'xxx___libc_start_main_symb'' not found

This means that __libc_start_main which is a part of the libc will have to be implemented in the miasm env. This can be copied from a similar blog All the functions which are dynamically linked have to be implemented in the script as xxx_<function_name>_symb for symbolic execution.

def xxx___libc_start_main_symb(dse):
    # ['RDI', 'RSI', 'RDX', 'RCX', 'R8', 'R9']
    regs = dse.ir_arch.arch.regs
    top_stack = dse.eval_expr(regs.RSP) # read registers to replicate while returning
    main_addr = dse.eval_expr(regs.RDI)
    argc = dse.eval_expr(regs.RSI)
    argv = dse.eval_expr(regs.RDX)
    hlt_addr = ExprInt(sb.CALL_FINISH_ADDR, 64)

    dse.update_state({
        ExprMem(top_stack, 64): hlt_addr,
        regs.RDI: argc,
        regs.RSI: argv,
        dse.ir_arch.IRDst: main_addr,
        dse.ir_arch.pc: main_addr,
    })

This updates the dse as if __libc_start_main has been executed. This needs to be implemented before

dse.add_lib_handler(sb.libs, globals())

We also know that the paths have a “Bad Boy” path which has termination with a call to puts followed by exit. We will hook them too and setup such that we can find what paths to not finish on.

class FinishOn(Exception):

    def __init__(self, string):
        self.string = string
        super(FinishOn, self).__init__()

def xxx_exit_symb(dse):
    raise FinishOn("Fail")

from miasm.os_dep.win_api_x86_32 import get_win_str_a

def xxx_puts_symb(dse):
    string = get_win_str_a(dse.jitter, dse.jitter.cpu.RDI)
    raise FinishOn(string)

Next error that we hit is

RuntimeError: Symbolic stub 'b'xxx_strlen_symb'' not found

strlen is called to verify if the length of the string is 9. This will be an apt location to set the string as symbolic and setup constraints that can be later solved.

from miasm.expression.expression import *

str_ptr = 0
curr = b""
expr_map = dict()

strln = ExprId("strln", 64) # symbolize string length
z3_strln = dse.z3_trans.from_expr(strln)
dse.cur_solver.add(0 < z3_strln)
dse.cur_solver.add(z3_strln < 10)

def xxx_strlen_symb(dse):
    global str_ptr
    regs = dse.ir_arch.arch.regs
    ptr = dse.eval_expr(regs.RDI) # read string pointer
    str_ptr = int(ptr) # save string pointer to inspect later
    ret_addr = ExprInt(dse.jitter.get_stack_arg(0), regs.RIP.size)
    ret_value = strln
    update = {}

    # symbolize the string buffer as bytes
    for i, content in enumerate(curr):
        addr = dse.symb.expr_simp(ptr + ExprInt(i, ptr.size))
        expr_map[i] = ExprId("C_%d" % (i), 8) # save expressions so that we can query the solver for these

        dse.cur_solver.add(0x2f < dse.z3_trans.from_expr(expr_map[i])) # should be printable ascii alphanumeric
        update[ExprMem(addr, 8)] = expr_map[i]
    # update the state and emulate strlen call
    dse.update_state({
        regs.RSP: dse.eval_expr(regs.RSP + ExprInt(8, regs.RSP.size)),
        dse.ir_arch.IRDst: ret_addr,
        regs.RIP: ret_addr,
        regs.RAX: ret_value,
    })
    dse.update_state(update)

Here we do a couple of things

Symbolize the string length as the return values in rax when the function returns
Set limits on the length as we already know
Symbolize the bytes of the string so that constraints can be calculated by the z3 solver in DSEPathConstraint
Add constraints on bytes to be alpha numeric(optional)

This serves as the init logic for our solver. update_state takes a map of expression to expression instead of integer values.

Path exploration

With all the needed functions hooked, we can proceed to evaluate the paths. For quicker emulation we can take a snapshot of the dse and restore it when needed from init. We then solve constraints from the last path, concretize the values from the solution and add it to a queue which can be then precessed.

dse.add_lib_handler(sb.libs, globals())
snapshot = dse.take_snapshot()
found = False
curr = b""
todo = set([b""])

while todo:
    flag = todo.pop() # take 1 input to process
    curr = flag # set curr so that xxx_strlen_symb can iterate over it
    dse.restore_snapshot(snapshot, keep_known_solutions=True) # restore to "init" snapshot
    # Concretize the known bytes from curr
    if str_ptr:
        # read the current bytes
        print("Already {}".format(sb.jitter.vm.get_mem(int(str_ptr), 9)))
        for idx, content in enumerate(curr):
            currb = sb.jitter.vm.get_mem(int(str_ptr) + idx, 1)
            # . was the original value set in argv[1]
            if currb == b'.' and content > 0x30:
                # update if a better solution was found
                sb.jitter.vm.set_mem(int(str_ptr) + idx, bytes([content]))
        # read the updated bytes of the string
        print("Done {}".format(sb.jitter.vm.get_mem(int(str_ptr), 9)))
    try:
        sb.run()
    # puts and exit trigger a FinishOn which also has good boy/bad boy string
    except FinishOn as finish_info:
        print(finish_info.string)
        if "Level Unlocked" in finish_info.string:
            found = True
            break
    # iterate over all solutions and query the expr_map expressions for new candidate string bytes
    for sol_ident, model in viewitems(dse.new_solutions):
        candidate = []
        ln = max(model.eval(dse.z3_trans.from_expr(strln)).as_long(), 9)
        for i in range(ln):
            try:
                candidate.append(int_to_byte(model.eval(
                    dse.z3_trans.from_expr(expr_map[i])).as_long()))
            except (KeyError, AttributeError) as _:
                candidate.append(b"\x00")
        # add the new candidate to queue
        todo.add(b"".join(candidate))

Once a solution is found for the 2 paths from a basic block, constraints on expressions from expr_map - bytes of “flag” string can be solved and concretized. Since the checks are progressive, this will eventually solve all the constraints upto the good boy block - Level Unlocked

Full script can be found here