
OPDIS

Section: Development Tools (1)
Updated: March 2010

 

NAME

opdis - disassemble binary data

 

SYNOPSIS

opdis [-c|--cflow=memspec]
      [-l|--linear[=memspec]]
      [-a|--architecture=name]
      [-s|--syntax=att|intel]
      [-f|--format=fmtspec]
      [-o|--output=filename]
      [-d|--debug]
      [-q|--quiet]
      [-B|--bfd[=target]]
      [-E|--bfd-entry]
      [-N|--bfd-symbol[=bfdname]]
      [-S|--bfd-section[=bfdname]]
      [-m|--map=mapspec]
      [-b|--bytes=string]
      [-O|--disassembler-options[=string]]
      [--list-architectures]
      [--list-disassembler-options]
      [--list-syntaxes]
      [--list-formats]
      [--list-bfd-symbols]
      [--dry-run]
      objfile...

 

DESCRIPTION

opdis disassembles binary object code according to the options specified by the user. The default output is a standard disassembled listing (address + hex bytes + asm), and support for XML, pipe-delimited, and custom output formats is provided.

opdis is a front-end to libopdis, which in turn is a wrapper for libopcodes, part of the GNU binutils distribution. It differs from objdump in the following ways:

* it can disassemble files that libbfd does not support
* the user can specify the addresses to disassemble
* control-flow disassembly can be performed
* the bytes to disassemble can be specified on the command line

 

OPTIONS

-c
--cflow=memspec
Add a control flow disassembly job with entry point memspec. See ADDRESS FORMAT and DISASSEMBLY.

-l
--linear[=memspec]
Add a linear disassembly job for bytes at memspec. If memspec is not supplied, opdis will disassemble all bytes in the first target starting at offset 0. See ADDRESS FORMAT and DISASSEMBLY.

-a
--architecture=name
Set the BFD architecture for the target. The default architecture is "i386". See --list-architectures.

-s
--syntax=att|intel
Set the assembler syntax to use for disassembly. This allows the user to select from the print_insn functions provided by libopcodes for the x86 architecture. This option is ignored for non-x86 targets. The default value is att. See --list-syntaxes.

-f
--format=format_string
Set the output format. The supported formats are:
asm : Print the raw output of libopcodes with the VMA of each instruction included as a trailing comment.
delim : Print all components of each instruction and operand in a pipe-delimited format. The format for each instruction is

offset|vma|bytes|ascii|prefixes|mnemonic|isa|category|flags|op1|op2|...

Note that the number of operand fields is variable, and could be zero. This means that the pipe-delimited format is irregular: one line may have multiple operand fields, while another line will have none.
The format for each operand is
ascii:category:flags:value[:name]
The name field only appears if the operand is named (e.g. TARGET,DEST,SRC). The value field will have one of the following formats, depending on the category field:
{ascii;id;size;flags} for register operands
{segment_reg;offset} for absolute operands
{base_reg;index_reg;scale;shift_op;segment_reg;displacement} for expression operands
All other operand categories display the immediate value.
dump : Print the VMA of each instruction followed by the instruction bytes, prefixes, mnemonic, operand ascii values, and instruction comments.
xml : Print the complete instruction and operand data structures in XML format, with an embedded DTD.
fmt_str : An sprintf-style format string for custom output formats. See FORMAT STRINGS.

The default value is dump. See --list-formats.
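As an illustration of the delim layout described above, the following Python sketch splits one line into its nine fixed instruction fields and a variable-length operand list. It is not part of opdis: the function name parse_delim_line and the sample line are invented for this example, and it assumes that no field contains a literal '|'. Operand fields are left as raw strings, since the ascii sub-field of an operand may itself contain ':'.

```python
# Field names taken from the delim format description; everything else
# in this sketch (function name, sample data) is hypothetical.
INSN_FIELDS = (
    "offset", "vma", "bytes", "ascii", "prefixes",
    "mnemonic", "isa", "category", "flags",
)


def parse_delim_line(line):
    """Split one pipe-delimited line into instruction fields and operands."""
    parts = line.rstrip("\n").split("|")
    if len(parts) < len(INSN_FIELDS):
        raise ValueError("expected at least %d fields" % len(INSN_FIELDS))
    insn = dict(zip(INSN_FIELDS, parts))
    # Zero or more operand fields follow the nine fixed fields.
    insn["operands"] = parts[len(INSN_FIELDS):]
    return insn


# Invented sample lines: one without operands, one with two placeholders.
no_ops = parse_delim_line("0x0|0x1000|c3|ret||ret|general|control flow|return")
two_ops = parse_delim_line("0x1|0x1001|31 c0|xor||xor|general|bitwise|xor|OP1|OP2")
```

Because the operand count varies per line, consumers should index the fixed fields by name and iterate over the operands list rather than assume a column count.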

-o
--output=filename
Set the output file to print the disassembly to. The default is STDOUT.

-d
--debug
Print libopdis debug messages to STDERR.

-q
--quiet
Suppress runtime messages.

-B
--bfd=target
Use libbfd to load and manage the target. All -l and -c jobs will use libopdis BFD routines, and memory maps for the target will be ignored. This option is not necessary if the -E, -N, or -S option is present. See BFD SUPPORT.

-E
--bfd-entry
Add a control flow disassembly job using the BFD entry point of the file as its entry point. This will load the file using the BFD library, and will attempt to detect the target architecture. See BFD SUPPORT.

-N
--bfd-symbol=bfdname
Add a control flow disassembly job using the BFD symbol bfdname as its entry point. This will load the file using the BFD library, and will attempt to detect the target architecture. See BFD SUPPORT.

-S
--bfd-section=bfdname
Add a linear disassembly job for the contents of the BFD section named bfdname. This will load the file using the BFD library, and will attempt to detect the target architecture. See BFD SUPPORT.

-m
--map=memspec
Map bytes at an offset into a target to a VMA. See ADDRESS FORMAT and MEMORY MAPS.

-b
--bytes=string
Specify bytes to disassemble. The bytes must be in a space-delimited string, and can be in octal, decimal, or hexadecimal format. The interpretation of the bytes in the string is performed by running strtoul on each byte. The base can be forced by placing a directive at the start of the string: \b or \B for binary, \o or \O for octal, \d or \D for decimal, and \x or \X for hexadecimal. Any number of -b options can be present.
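The directive handling described for -b can be sketched in Python as follows. This is an illustration only, not the opdis implementation: the function name parse_byte_string is invented, and the sketch assumes that bytes without a directive are hexadecimal (as in the EXAMPLES section) instead of reproducing strtoul's own prefix detection.

```python
# Directive letter -> numeric base, per the -b option description.
BASES = {"b": 2, "o": 8, "d": 10, "x": 16}


def parse_byte_string(text, default_base=16):
    """Convert a space-delimited byte string into a bytes object.

    default_base=16 is an assumption of this sketch, not documented
    opdis behaviour.
    """
    base = default_base
    # An optional directive such as \d or \X may lead the string.
    if len(text) >= 2 and text[0] == "\\" and text[1].lower() in BASES:
        base = BASES[text[1].lower()]
        text = text[2:]
    values = [int(token, base) for token in text.split()]
    for value in values:
        if not 0 <= value <= 255:
            raise ValueError("value does not fit in one byte: %d" % value)
    return bytes(values)
```

For instance, parse_byte_string("\\d 144 144") and parse_byte_string("90 90") describe the same two bytes under these assumptions.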

-O
--disassembler-options[=string]
Set the options string for the libopcodes disassembler. See --list-disassembler-options.

--list-architectures
List the supported BFD architectures. See --architecture.

--list-disassembler-options
List the libopcodes disassembler options for the target architecture. See --disassembler-options and -a.

--list-syntaxes
List the available syntax options. See --syntax.

--list-formats
List the available output formats. See --format.

--list-bfd-symbols
List the symbols found in a BFD target. See BFD SUPPORT.

--dry-run
Print a list of the targets, jobs, and memory maps without actually doing any disassembly.

 

DISASSEMBLY

opdis implements two disassembly algorithms:

linear, short for linear sweep. This disassembles instructions sequentially, in the order they are encountered in the target buffer.
cflow, short for control flow. This attempts to follow the flow of execution in the target buffer, recursing to follow branch (e.g. call and jump) targets and halting disassembly when an unconditional jump or return is encountered.
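The difference between the two algorithms can be sketched in Python over a toy instruction set. Everything here is invented for illustration (the five-opcode encoding, the function names, the sample program); it does not use libopdis or libopcodes, and it only mirrors the behaviour described above.

```python
# Toy ISA (hypothetical): one opcode byte, optionally one absolute target byte.
SIZES = {0x00: 1, 0x01: 2, 0x02: 2, 0x03: 2, 0x04: 1}
NAMES = {0x00: "nop", 0x01: "jmp", 0x02: "jcc", 0x03: "call", 0x04: "ret"}


def decode(buf, addr):
    """Return (mnemonic, size, target) or None if addr cannot be decoded."""
    if addr >= len(buf) or buf[addr] not in SIZES:
        return None
    size = SIZES[buf[addr]]
    if addr + size > len(buf):
        return None
    target = buf[addr + 1] if size == 2 else None
    return NAMES[buf[addr]], size, target


def linear(buf, start=0):
    """Linear sweep: decode sequentially until the end of the buffer."""
    found = {}
    addr = start
    while addr < len(buf):
        insn = decode(buf, addr)
        if insn is None:
            addr += 1  # skip a byte that does not decode
            continue
        found[addr] = insn
        addr += insn[1]
    return found


def cflow(buf, entry):
    """Control flow: follow branch targets, stop a path at jmp or ret."""
    found = {}
    pending = [entry]
    while pending:
        addr = pending.pop()
        while addr not in found:  # never decode an address twice
            insn = decode(buf, addr)
            if insn is None:
                break
            found[addr] = insn
            name, size, target = insn
            if target is not None:
                pending.append(target)  # visit the branch target later
            if name in ("jmp", "ret"):
                break  # execution cannot fall through
            addr += size
    return found


# 0: jmp 4 | 2: nop | 3: nop | 4: call 7 | 6: ret | 7: nop | 8: ret
PROGRAM = bytes([0x01, 0x04, 0x00, 0x00, 0x03, 0x07, 0x04, 0x00, 0x04])
```

On this sample, the linear sweep decodes the two nop bytes at addresses 2 and 3, while the control flow pass never reaches them because the jmp at address 0 skips over them.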

opdis uses jobs to represent user requests for disassembly. Each job is composed of a choice of algorithm (i.e. linear or cflow), a target, and a VMA in the target to use as the starting address for the algorithm. In the case of BFD jobs, the algorithm is implicit, and a symbol is used in place of the VMA. The -c, -l, -E, -N, and -S options are used to request disassembly jobs.

Jobs are executed in the order that they are requested. Any number of jobs may be requested. It is recommended that --dry-run be used to preview jobs before they are performed.

If no jobs are requested by the user, a linear disassembly of all target buffers is performed.

 

DATA MODEL

The data model used by opdis and libopdis uses seven types of objects: offsets, VMAs, instructions, operands, CPU registers, address expressions, and absolute addresses.

An offset is a position in a target buffer.

A Virtual Memory Address or VMA is the load address for an offset.

An instruction is a memory address that has been successfully decoded into an assembly-language instruction. It consists of the following fields:

offset : The offset of the instruction in the target buffer.
vma : The load address of the instruction.
size : The number of bytes in the instruction.
bytes : The undecoded bytes of the instruction.
ascii : The raw ASCII representation of the instruction generated by libopcodes.
prefixes : Mnemonics for any prefix bytes preceding the instruction.
mnemonic : The mnemonic for the instruction opcode.
isa : The instruction set (or subset) that the instruction belongs to, e.g. 'general purpose', 'fpu', 'sse'.
category : The type of instruction, e.g. 'control flow', 'stack', 'bitwise'.
flags : The flags for the instruction, e.g. 'call', 'jump', 'xor'.
operands : The arguments to the instruction.
comment : Comments generated during disassembly.

See the libopcodes API documentation for the possible values of isa, category, and flags. Note that the only fields guaranteed to be filled by the disassembler are offset, vma, size, bytes, and ascii. See NOTES for details.

An operand is an argument to an assembly language instruction. An operand can be a numeric value (also known as an immediate value), a CPU register, an address expression, or a segment:offset address (also known as an absolute address). An operand has the following fields:

ascii : The raw ASCII representation of the operand generated by libopcodes.
category : The type of operand: 'register', 'immediate', 'absolute address', or 'address expression'.
flags : The flags for the operand: any combination of 'read', 'write', 'exec', 'signed', 'address', and 'indirect'.
value : The value of the operand.

A register operand has the following fields:

ascii : The name of the register.
flags : The register flags.
id : The ID of the register. Registers which have different names but the same ID, e.g. %eax and %rax, are aliases of each other.
size : The size of the register in bytes.

An expression operand has the following fields:

base : The CPU register containing the base address.
index : The CPU register containing the index value to be shifted and added to the base.
scale : The scale (shift) factor applied to the index.
shift operation : The method of shifting (e.g. logical/arithmetic, left/right, shift/rotate) used in calculating the expression.
displacement : An offset or VMA added to the rest of the expression. Note: the displacement can be an absolute address.

An absolute operand has the following fields:

segment : The segment register.
offset : An offset or VMA added to the value in the segment register.

 

TARGETS

A target is a buffer for disassembly. Targets may be declared in one of two ways: by specifying the specific bytes to disassemble in an argument to the -b option, or by declaring object file arguments on the command line. Targets are assigned an ID in the order they appear in the command line, with the first target given ID 1. The first target is always the default target for all operations. It is recommended that --dry-run be used to preview targets before performing disassembly.

The disassembler checks for unique addresses while disassembling, and will not disassemble addresses it has already encountered. Thus, while it is possible to combine multiple targets, the VMAs of the targets must not overlap. Note that when multiple -b options are provided, and the user has not specified any memory maps, opdis will map the bytes sequentially into memory starting at VMA 0x0. The options "-b '90 90 90 90' -b 'cc cc cc cc'" will result in a memory map from VMA 0x00-0x03 containing the contents of the first buffer, and a second map from VMA 0x04-0x07 containing the contents of the second buffer.
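The sequential mapping of multiple -b buffers can be illustrated with a short Python sketch. The function name sequential_maps is invented, and the sketch only reproduces the arithmetic of the example above; it says nothing about how opdis stores its maps internally.

```python
def sequential_maps(buffers, base=0):
    """Return (first_vma, last_vma) for each non-empty buffer, packed back to back."""
    maps = []
    vma = base
    for buf in buffers:
        maps.append((vma, vma + len(buf) - 1))
        vma += len(buf)
    return maps


# The two buffers from the -b example in this section.
ranges = sequential_maps([bytes.fromhex("90909090"), bytes.fromhex("cccccccc")])
```

Here ranges holds (0x00, 0x03) for the first buffer and (0x04, 0x07) for the second, matching the memory map described in the text.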

It is not possible to load targets with different architectures in the same invocation of opdis.

 

BFD SUPPORT

opdis, like libopcodes, supports BFD targets through libbfd. Three additional jobs are available for BFD targets:

Disassemble Entry : Perform a control flow disassembly starting at the BFD entry point
Disassemble Symbol : Perform a control flow disassembly starting at a BFD symbol
Disassemble Section : Perform a linear disassembly of a BFD section

A BFD will be created for a target if one of these jobs is requested, or if the -B option is used.

BFD targets do not require the use of the -a flag as libbfd will detect the architecture of the object file. For non-BFD targets, the --list-architectures option is provided to print a list of the target architectures supported by the local libbfd.

The format for specifying a bfdname is

[target:]name

where target is the ID of the target containing the symbol and name is a valid BFD symbol. The target is only required if more than one target is being disassembled. A list of symbols found in a target can be printed to STDOUT by using the --list-bfd-symbols option.
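A minimal Python sketch of splitting a bfdname into its two parts follows. The function name parse_bfdname is invented, and the sketch assumes that a target prefix consists only of decimal digits, so that symbol names containing ':' are not mistaken for a target ID.

```python
def parse_bfdname(spec, default_target=1):
    """Split '[target:]name' into (target_id, name)."""
    head, sep, tail = spec.partition(":")
    if sep and head.isdigit():
        return int(head), tail
    # No numeric prefix: the whole string is the name, on the default target.
    return default_target, spec
```

Under these assumptions, "main" and "1:main" refer to the same symbol, and "2:.text" names the .text section of the second target.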

 

MEMORY MAPS

A memory map associates a VMA with an offset into a target buffer. This is useful when a target buffer must appear to be at a specific load address, either to clarify the output or to combine multiple targets into a single address space. It is recommended that --dry-run be used to preview memory maps before performing a disassembly.

Note that memory maps are only used when creating disassembly jobs. The actual disassembler algorithms rely on offsets from the load address of the target buffer (provided by a memory map), and do not respect the size of memory maps declared by the user (i.e. a linear disassembly of size 0 will continue until the end of the buffer, not the end of the memory map). Memory maps are ignored for BFD targets. See ADDRESS FORMAT.
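The relationship between offsets and VMAs under a memory map can be written as a one-line calculation. The Python helper below is a sketch with an invented name (offset_to_vma); it assumes the map is a simple linear translation, which is what the description above implies but does not spell out.

```python
def offset_to_vma(offset, map_offset=0, map_vma=0):
    """Translate a buffer offset to a VMA using a linear memory map."""
    return map_vma + (offset - map_offset)
```

With a map of offset 0 to VMA 0x1000, the byte at offset 4 would appear at VMA 0x1004.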

 

ADDRESS FORMAT

The format for specifying an address (a memspec) is

       [target]:offset|@vma[+size]

where target is the ID of the target containing the address, offset is the offset of the address in the target buffer, vma is the load address of that offset, and size is the size of the memory region being specified. The target is only required if more than one target is being disassembled. Either offset or vma must be specified, except in defining memory maps when vma is required.

Note that : is used to indicate that the next argument is an offset, @ is used to indicate that the next argument is a vma, and + is used to indicate that the next argument is a size. This means that the arguments can appear in any order, except for target which is undelimited and must appear first. The target, offset, and size arguments all have default values which take effect if they are not specified. The default target is 1, the ID of the first target. The default offset is 0. The default size is 0, which specifies the entirety of the target buffer.
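The memspec grammar can be sketched as a small Python parser. This is not the opdis parser: the function name parse_memspec is invented, the target ID is assumed to be written in decimal, and numbers are assumed to be either decimal or 0x-prefixed hexadecimal. The defaults follow the text above (target 1, offset 0, size 0); an unspecified vma is reported as None.

```python
import re

# One delimited argument: ':' offset, '@' vma, or '+' size.
_ARG = re.compile(r"([:@+])(0[xX][0-9a-fA-F]+|\d+)")
_KEYS = {":": "offset", "@": "vma", "+": "size"}


def _num(text):
    """Parse a decimal or 0x-prefixed hexadecimal number."""
    return int(text, 16) if text.lower().startswith("0x") else int(text, 10)


def parse_memspec(spec):
    """Parse '[target]:offset|@vma[+size]' into a dict of fields."""
    lead = re.match(r"\d*", spec)  # undelimited target must come first
    fields = {
        "target": int(lead.group()) if lead.group() else 1,
        "offset": 0,
        "vma": None,
        "size": 0,
    }
    pos = lead.end()
    while pos < len(spec):  # delimited arguments may appear in any order
        arg = _ARG.match(spec, pos)
        if arg is None:
            raise ValueError("bad memspec near %r" % spec[pos:])
        fields[_KEYS[arg.group(1)]] = _num(arg.group(2))
        pos = arg.end()
    return fields
```

For example, parse_memspec(":0x100+1024") yields offset 0x100 and size 1024 on target 1, and parse_memspec("2@0x1100") yields VMA 0x1100 on target 2.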

 

FORMAT STRINGS

Custom output formats are supported via a printf-style format string that allows the user to determine what information about an instruction or operand is printed.

 

Component Selection

A single character specifying what metadata to print. The %i (instruction) and %o (operand) operators represent complex objects. The metadata for these objects are available as arguments to the operator, allowing the display of categories, flags, etc.

A The ascii field of the object. This is the default, and prints the raw libopcodes representation of the object.

C The category field of the object.

F The flags field of the object. The flags are comma-delimited.

I The isa field of the object. This is only applicable to instruction objects.

 

Base Selection

A single character specifying the base to print addresses or bytes in.

X Print data in hexadecimal format.

D Print data in decimal format.

O Print data in octal format.

C Print the ASCII (character) value for a byte. Only applicable to the %b operator.

 

Operand Selection

A single character specifying which operand to print. If the operand is not present in the instruction, nothing will be printed.

a Print all operands, separated by commas.

t Print the target operand of the instruction.

d Print the destination operand of the instruction.

s Print the source operand of the instruction.

# Print the #th operand of the instruction, where # is a number between 0 and 9.

 

Operators

A single character specifying what data to print.

%i Print an instruction component. Can be followed by a component selection argument. By default, the ascii component is printed. See Component Selection.

%o Print one or all operands. Can be followed by an operand selection argument and/or a component selection argument (NOTE: operand selection must always precede component selection). By default, the ascii component of all operands is printed. See Operand Selection and Component Selection.

%b Print the instruction bytes. Can be followed by a base selection argument. By default, the base is hexadecimal. See Base Selection.

%l Print the length of the instruction in bytes.

%m Print the instruction mnemonic field.

%a Print the address of the instruction. Can be followed by a v argument or an o argument to specify which address to print (vma or offset, respectively). Can be followed by a base selection argument (NOTE: address selection must precede base selection). The default is to print the VMA in hexadecimal format. See Base Selection.

%p Print the instruction prefixes field.

%c Print the instruction comment field.

%? Print a conditional delimiter. The delimiter, specified by the character following ?, is only printed if the next % directive returns a string. The intent of this operator is to allow delimiters to be printed between operands only if the operands appear in the output.

%t Print a conditional tab. See %?.

%s Print a conditional space. See %?.

%n Print a conditional newline. See %?.

%% Print a literal '%'. See %?.
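The conditional-delimiter behaviour can be illustrated with a Python sketch that interprets a small subset of the operators (%l, %b, %m, %p, %c, %?, %t, %s, %n, %%). It is not the opdis formatter: the function name render and the dict-based instruction are invented, selection arguments such as base or component letters are not handled, and the space-separated hexadecimal rendering of %b is an assumption.

```python
# Operators in this subset, each mapped to a function of the instruction.
FIELDS = {
    "l": lambda insn: str(insn.get("size", "")),
    "b": lambda insn: " ".join("%02x" % b for b in insn.get("bytes", b"")),
    "m": lambda insn: insn.get("mnemonic", ""),
    "p": lambda insn: insn.get("prefixes", ""),
    "c": lambda insn: insn.get("comment", ""),
}
CONDITIONAL = {"t": "\t", "s": " ", "n": "\n"}


def render(fmt, insn):
    """Expand a format string for one instruction (subset of operators)."""
    out = []
    pending = ""  # delimiter waiting on the next directive's result
    i = 0
    while i < len(fmt):
        if fmt[i] != "%" or i + 1 >= len(fmt):
            out.append(fmt[i])  # literal character
            i += 1
            continue
        op = fmt[i + 1]
        i += 2
        if op == "%":
            out.append("%")
        elif op == "?":
            if i < len(fmt):
                pending = fmt[i]  # the character after '?' is the delimiter
                i += 1
        elif op in CONDITIONAL:
            pending = CONDITIONAL[op]
        else:
            value = FIELDS.get(op, lambda insn: "")(insn)
            if value:
                out.append(pending + value)  # delimiter only if value printed
            pending = ""
    return "".join(out)
```

With the format string '%l:%b%?:%m', an instruction that has a mnemonic gets a ':' before it, while an instruction without one ends after the bytes.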

 

NOTES

The level of detail available to opdis is determined by the decoders available in libopdis. Currently, libopdis provides three decoders: x86 AT&T syntax, x86 Intel syntax, and generic. The AT&T syntax decoder (used by default or when -s att is supplied) provides the most detail, and generates output that is best suited for analysis. The generic decoder, the only decoder available for non-x86 architectures, only provides the raw libopcodes representation (the ascii field) of the instruction and no operand information. Additional architecture-specific decoders must be added to libopdis to overcome this limitation.

opdis does not emit instructions as they are disassembled. Instead, instructions are stored in a binary tree and printed in order of VMA after all disassembly jobs have completed.

 

EXAMPLES

Disassemble an object file from its entry point:

opdis -E a.out

Disassemble an object file from the symbol 'main':

opdis -N main a.out

Disassemble the .text section of an object file:

opdis -S .text a.out

Disassemble the .text section of an object file and write the XML output to asm.xml:

opdis -S .text -o asm.xml -f xml a.out

Perform a linear disassembly of 1024 bytes at offset 0x100 in an object file:

opdis -l :0x100+1024 a.out

Perform an Intel-syntax control flow disassembly starting at offset 0x200 in an object file:

opdis -s intel -c :0x200 a.out

Disassemble shellcode:

opdis -b '31 c0 bb 08 84 04 08 53 89 e1 31 d2 b0 0b cd 80 00 00'

(see http://www.shell-storm.org/shellcode/files/shellcode-44.php)

Map the specified bytes to VMA 0x1000 and disassemble:

opdis -m :0@0x1000 -b '2e 2e 74 50 90'

Map target 1 to VMA 0x1000, target 2 to VMA 0x1100 and disassemble:

opdis -m 1@0x1000 -m 2@0x1100 -b '2e 2e 74 50 90' -b 'cc cc cc cc'

Print the size, bytes, and mnemonic (if present) of each instruction:

opdis -f '%l:%b%?:%m' a.out

 

SEE ALSO

objdump(1), od(1), readelf(1), and the Info entries for binutils.

 

COPYRIGHT

Copyright (c) 2010 thoughtgang.org.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 3.0 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".


 

