dd86k's blog

Machine code enthusiast

Development Progress 2021 Q3

Author: dd
Published: August 8, 2021
Last modified: May 20, 2023 at 13h44

Hey! Been a while, right? I figured I should post here more often, instead of spamming social media!

ddcpuid

ddcpuid’s latest release is version 0.18.0, from June 24, 2021. That version adds a thread count, Intel AMX support, and DUB support (including use as a library), and fixes compilation with GDC when optimizations are enabled.

The upcoming version, 0.18.1, will feature the --level and -L switches, which make the utility print the x86-64 microarchitecture level (x86-64-v1 through x86-64-v4) that the host processor fits in:

>ddcpuid | findstr /i Brand
Brand       : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
>ddcpuid --level
x86-64-v2
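For reference, these levels come from the x86-64 psABI: v2 adds CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSE4.1, SSE4.2 and SSSE3 on top of the baseline; v3 adds AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE and OSXSAVE; v4 adds the AVX-512 F, BW, CD, DQ and VL subsets. Here is a minimal C sketch of the classification, assuming the feature flags were already read from CPUID. The struct and function names are hypothetical; this is not ddcpuid’s code.

```c
#include <stdbool.h>

/* Hypothetical feature flags; a real tool would fill these from CPUID
 * (leaves 1, 7 and 0x80000001) and XGETBV. Field names are illustrative. */
struct x86_features {
	/* x86-64-v2 requirements */
	bool cx16, lahf_sahf, popcnt, sse3, sse4_1, sse4_2, ssse3;
	/* x86-64-v3 requirements */
	bool avx, avx2, bmi1, bmi2, f16c, fma, lzcnt, movbe, osxsave;
	/* x86-64-v4 requirements */
	bool avx512f, avx512bw, avx512cd, avx512dq, avx512vl;
};

/* Returns the highest psABI level (1 to 4) whose requirements are all met.
 * The baseline (v1) is assumed to hold on any 64-bit x86 processor, and
 * each level also requires every feature of the levels below it. */
static int x86_64_level(const struct x86_features *f)
{
	bool v2 = f->cx16 && f->lahf_sahf && f->popcnt && f->sse3 &&
	          f->sse4_1 && f->sse4_2 && f->ssse3;
	bool v3 = v2 && f->avx && f->avx2 && f->bmi1 && f->bmi2 && f->f16c &&
	          f->fma && f->lzcnt && f->movbe && f->osxsave;
	bool v4 = v3 && f->avx512f && f->avx512bw && f->avx512cd &&
	          f->avx512dq && f->avx512vl;
	return v4 ? 4 : v3 ? 3 : v2 ? 2 : 1;
}
```

An Ivy Bridge part like the i7-3770 above has SSE4.2, POPCNT and even AVX, but no AVX2, BMI or FMA, so a check like this stops at v2.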

I keep going over the manual for typos, so expect the optimization level chapter to be corrected and expanded with more information.

It’s expected to be released later this August. Beyond that, I don’t have much planned for the utility.

alicedbg

Earlier this year, on March 20, 2021, alicedbg saw version 0.0.1. I barely consider that a release, because the project has seen major work since then.

In short: a new error system, a new disassembler engine (v4), new tools, and new knowledge.

Debugger Engine

Right after the 0.0.1 release, I made the disassembler engine a little more debugger-aware by adding a fetch function that lets the disassembler automatically pick the source to work with.
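The idea of a fetch function that picks its own source can be sketched like this, assuming two hypothetical sources: a plain buffer, and a memory-reading callback such as a debugger might provide. The names are illustrative and not alicedbg’s API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source selector for the disassembler's input. */
enum fetch_source { FETCH_BUFFER, FETCH_CALLBACK };

/* Reads one byte at addr into *out; returns 0 on success. */
typedef int (*read_mem_fn)(void *ctx, uint64_t addr, uint8_t *out);

struct disasm_input {
	enum fetch_source source;
	const uint8_t *buffer; /* FETCH_BUFFER: caller-supplied bytes */
	size_t size;           /* FETCH_BUFFER: number of bytes */
	read_mem_fn read;      /* FETCH_CALLBACK: e.g. a debuggee memory reader */
	void *ctx;             /* FETCH_CALLBACK: opaque state for read */
	uint64_t pos;          /* buffer offset or memory address */
};

/* Fetches the next byte from whichever source is configured and advances
 * the position. Returns 0 on success, -1 when no byte can be read. */
static int fetch8(struct disasm_input *in, uint8_t *out)
{
	switch (in->source) {
	case FETCH_BUFFER:
		if (in->pos >= in->size)
			return -1;
		*out = in->buffer[in->pos];
		break;
	case FETCH_CALLBACK:
		if (in->read == NULL || in->read(in->ctx, in->pos, out) != 0)
			return -1;
		break;
	default:
		return -1;
	}
	in->pos++;
	return 0;
}
```

The decoder only ever calls fetch8, so it does not need to know whether it is reading a file, a hex string, or a live process.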

Wait, that’s not really part of the debugger engine. Speaking of which...

Disassembler Engine

When version 0.0.1 was released, I was utterly disappointed in myself. The disassembler was in total disarray. It was absolute garbage: inaccurate, buggy, and hard to work with. I just had to remake it from scratch, both the decoder and the syntax engines.

Restarting work on the engine left me feeling rather depressed, since the previous engine already covered instructions up to the AVX2 extension. After all, that is a tremendous amount of work for one person to take on.

For about the first two months, things were moving very slowly. I didn’t know where to start. Nevertheless, I persisted, and I’m much happier with the current progress on the engine.

The new engine deals with more metadata. One day I was inspired by the best disassemblers out there, Zydis and Capstone. So hey, why not do something similar? The point is that the decoder engine now clearly states the type of each operand and the size of each immediate; there is nothing extra on register types just yet.
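As an illustration of operands that carry their own type information, here is a small C sketch. The enums, the struct and describe_operand are hypothetical; the kind=value output style (register=dl, immediate=i8) is borrowed from the tool’s own output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical operand metadata: every operand states its kind, and
 * immediates also state their width. */
enum operand_kind { OPERAND_REGISTER, OPERAND_IMMEDIATE };
enum imm_width { IMM_I8, IMM_I16, IMM_I32, IMM_I64 };

struct operand {
	enum operand_kind kind;
	union {
		const char *reg;      /* OPERAND_REGISTER: register name */
		enum imm_width width; /* OPERAND_IMMEDIATE: immediate size */
	} u;
};

/* Writes an operand as "kind=value", e.g. "register=dl" or "immediate=i8".
 * Returns snprintf's result, or -1 for an unknown kind. */
static int describe_operand(const struct operand *op, char *out, size_t n)
{
	static const char *const widths[] = { "i8", "i16", "i32", "i64" };
	switch (op->kind) {
	case OPERAND_REGISTER:
		return snprintf(out, n, "register=%s", op->u.reg);
	case OPERAND_IMMEDIATE:
		return snprintf(out, n, "immediate=%s", widths[op->u.width]);
	}
	return -1;
}
```

Because the kind travels with the operand, a syntax engine can format it without re-deriving anything from the raw bytes.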

The new x86 decoder is more aware of the types it’s handling, the legacy prefix count, the maximum opcode size, and so on. The new syntax engine is likewise more aware of the metadata it’s dealing with.

Although I’m still implementing the legacy opcode map, here’s a rough preview of a new operation mode, Analysis:
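Counting legacy prefixes is one of the simpler pieces. A hedged sketch in C for 32-bit code, assuming prefixes are simply skipped up to the architectural 15-byte instruction limit; REX and VEX/EVEX handling are deliberately left out, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Maximum x86 instruction length, in bytes. */
#define X86_MAX_INSN_LEN 15

/* Returns nonzero if b is a legacy prefix (groups 1 to 4). */
static int is_legacy_prefix(uint8_t b)
{
	switch (b) {
	case 0xF0: case 0xF2: case 0xF3:             /* group 1: LOCK, REPNE, REP */
	case 0x26: case 0x2E: case 0x36:
	case 0x3E: case 0x64: case 0x65:             /* group 2: segment overrides */
	case 0x66:                                   /* group 3: operand-size */
	case 0x67:                                   /* group 4: address-size */
		return 1;
	default:
		return 0;
	}
}

/* Counts leading legacy prefixes in code[0..len), never scanning past the
 * 15-byte limit. The opcode starts at the returned index. */
static size_t count_legacy_prefixes(const uint8_t *code, size_t len)
{
	size_t limit = len < X86_MAX_INSN_LEN ? len : X86_MAX_INSN_LEN;
	size_t i = 0;
	while (i < limit && is_legacy_prefix(code[i]))
		i++;
	return i;
}
```

For the bytes 26 66 80 c2 dd, this finds two prefixes (ES and operand-size) and places the opcode, 80, at index 2.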

$ alicedbg -s nasm -m i386 -A 266680c2dd
input      : (5) 26 66 80 c2 dd
output     : (5) 26 66 80 c2 dd
instruction: add dl, 0xdd
prefixes   :
segment    : es
mnemonic   : add
operands   : register=dl immediate=i8
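To show how the remaining bytes 80 c2 dd become add dl, 0xdd: opcode 80 takes a ModR/M byte whose reg field selects the operation (0 is ADD) and whose rm field, with mod = 3, selects an 8-bit register (2 is DL), followed by an 8-bit immediate. Here is a minimal C sketch that handles only this register form; decode_80_reg is a hypothetical name, not part of alicedbg.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Opcode 0x80 is "group 1": the ModR/M reg field selects the operation. */
static const char *const group1[8] = {
	"add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"
};
/* 8-bit registers indexed by the ModR/M rm field (no REX prefix). */
static const char *const reg8[8] = {
	"al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"
};

/* Decodes "80 /digit ib" with a register operand into out. code points at
 * the opcode (prefixes already skipped). Returns the length without
 * prefixes (3), or -1 for anything this sketch does not handle. */
static int decode_80_reg(const uint8_t *code, size_t len, char *out, size_t n)
{
	if (len < 3 || code[0] != 0x80)
		return -1;
	uint8_t modrm = code[1];
	unsigned mod = modrm >> 6;       /* bits 7-6: addressing mode */
	unsigned reg = (modrm >> 3) & 7; /* bits 5-3: opcode extension */
	unsigned rm  = modrm & 7;        /* bits 2-0: register operand */
	if (mod != 3)
		return -1;                   /* memory forms are not handled */
	snprintf(out, n, "%s %s, 0x%02x", group1[reg], reg8[rm], (unsigned)code[2]);
	return 3;
}
```

The same table lookup gives, for example, cmp cl, 0x01 for 80 f9 01.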

With the new engine, and inspired by the ZydisInfo utility, the tool can now accept instruction bytes as a hexadecimal string.
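Turning such a hex string (for example 266680c2dd) into bytes is straightforward. A minimal sketch, assuming an even number of hex digits and no separators; parse_hex is a hypothetical helper, not alicedbg’s parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Parses a string such as "266680c2dd" into bytes. Returns the number of
 * bytes written, or -1 on an odd-length string, a non-hex character, or an
 * output buffer that is too small. */
static long parse_hex(const char *s, uint8_t *out, size_t cap)
{
	size_t n = 0;
	while (s[0] != '\0') {
		int hi = hex_digit(s[0]);
		if (hi < 0)
			return -1;
		int lo = hex_digit(s[1]); /* '\0' here means odd length: rejected */
		if (lo < 0 || n >= cap)
			return -1;
		out[n++] = (uint8_t)(hi << 4 | lo);
		s += 2;
	}
	return (long)n;
}
```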

At the time of writing, here’s what the adbg_disasm_opcode_t structure looks like:

/// The basis of an operation code
struct adbg_disasm_opcode_t {
	// size mode:
	int size; /// Opcode size
	// data mode:
//	enum targetMode;	/// none, near, far
//	ulong targetAddress;	/// CALL/JMP absolute target
//	int targetOffset;	/// CALL/JMP relative target
	// file mode:
	const(char) *segment;	/// Segment register string
	const(char) *mnemonic;	/// Instruction mnemonic
	size_t operandCount;	/// Number of operands
	adbg_disasm_operand_t[ADBG_MAX_OPERANDS] operands;	/// Operands
	size_t prefixCount;	/// Number of prefixes
	const(char)*[ADBG_MAX_PREFIXES] prefixes;	/// Prefixes
	size_t machineCount;	/// Number of disassembler fetches
	adbg_disasm_number_t[ADBG_MAX_MACHINE] machine;	/// Machine bytes
}

The idea is to capture as much information as possible and give the decoder every tool it needs to do its job, since x86 is so incredibly complex. I plan to change the prefixes field, for example by giving it a visibility field, so the decoder can still record each prefix in its group as soon as it sees it, then turn that group on or off further along in the decoding process.
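The planned visibility field could look roughly like this: the decoder records every prefix as it goes, then hides or shows a whole group once it knows more, for example hiding a 66 prefix that turns out to be the mandatory prefix of an SSE instruction. Everything here (names, capacity, and group numbering following the Intel manual’s four legacy prefix groups) is a hypothetical sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PREFIXES 4 /* hypothetical capacity */

/* The four legacy prefix groups from the x86 manuals. */
enum prefix_group { PG_LOCK_REP = 1, PG_SEGMENT, PG_OPSIZE, PG_ADDRSIZE };

/* A recorded prefix; visible decides whether the syntax engine prints it. */
struct prefix {
	const char *name;
	enum prefix_group group;
	bool visible;
};

struct prefix_list {
	size_t count;
	struct prefix items[MAX_PREFIXES];
};

/* Records a prefix as visible; returns false when the list is full. */
static bool prefix_add(struct prefix_list *l, const char *name, enum prefix_group g)
{
	if (l->count >= MAX_PREFIXES)
		return false;
	l->items[l->count++] = (struct prefix){ name, g, true };
	return true;
}

/* Shows or hides every recorded prefix belonging to group g. */
static void prefix_set_visible(struct prefix_list *l, enum prefix_group g, bool on)
{
	for (size_t i = 0; i < l->count; i++)
		if (l->items[i].group == g)
			l->items[i].visible = on;
}

/* Counts the prefixes the syntax engine would print. */
static size_t prefix_visible_count(const struct prefix_list *l)
{
	size_t n = 0;
	for (size_t i = 0; i < l->count; i++)
		n += l->items[i].visible;
	return n;
}
```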

Anyway, this is nothing new; Zydis and Capstone have their own ways to process these kinds of things. This is still a work in progress, after all. However, while both Zydis and Capstone are optimized for block operations, alicedbg has functions for both range and single-instruction disassembly. That said, the range function is just an “easy” wrapper that starts the range buffer and disassembles it for you in one swoop.

When compiled with the trace build type, the tool prints very spammy diagnostic information, so I don’t have to shoot myself in the foot when debugging:
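The split between single-instruction and range disassembly can be sketched as a range wrapper that loops over a single-instruction decoder. The toy decoder below only knows three opcodes and is purely hypothetical; it only shows the shape of such an API.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy single-instruction decoder (hypothetical, i386 only): knows NOP (90),
 * RET (c3) and "80 /digit ib" with a register operand. Returns the
 * instruction length, or 0 if the bytes are not recognized. */
static size_t decode_one(const uint8_t *code, size_t len)
{
	if (len == 0)
		return 0;
	switch (code[0]) {
	case 0x90: /* nop */
	case 0xC3: /* ret */
		return 1;
	case 0x80: /* opcode, ModR/M (register form), imm8 */
		return (len >= 3 && (code[1] >> 6) == 3) ? 3 : 0;
	default:
		return 0;
	}
}

/* Range-style wrapper: calls the single-instruction decoder in a loop and
 * stores each instruction's offset. Stops at the end of the buffer, at an
 * unrecognized byte, or when offsets is full. Returns the count decoded. */
static size_t decode_range(const uint8_t *code, size_t len,
                           size_t *offsets, size_t max)
{
	size_t pos = 0, count = 0;
	while (pos < len && count < max) {
		size_t n = decode_one(code + pos, len - pos);
		if (n == 0)
			break;
		offsets[count++] = pos;
		pos += n;
	}
	return count;
}
```

Keeping the single-instruction function as the core means the range function adds no decoding logic of its own.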

$ alicedbg -s hyde -m i386 -A 266680c2dd
TRACE:adbg.disasm.disasm.adbg_disasm_configure: platform=2
TRACE:adbg.disasm.disasm.adbg_disasm_opt: opt=0 val=5
input      : (5) 26 66 80 c2 dd
TRACE:adbg.disasm.disasm.adbg_disasm_add_segment: segment=es
TRACE:adbg.disasm.arch.x86.adbg_disasm_x86_modrm_legacy_rm: mode=3 rm=2
TRACE:adbg.disasm.arch.x86.adbg_disasm_x86_modrm_legacy_reg: reg=2
TRACE:adbg.disasm.disasm.adbg_disasm_add_mnemonic: mnemonic=add
TRACE:adbg.disasm.disasm.adbg_disasm_add_register: register=dl
TRACE:adbg.disasm.disasm.adbg_disasm_add_immediate: type=0
output     : (5) 26 66 80 c2 dd
TRACE:adbg.disasm.disasm.adbg_disasm_mnemonic: prefixCount=0
TRACE:adbg.disasm.disasm.adbg_disasm_mnemonic: mnemonic=add
TRACE:adbg.disasm.disasm.adbg_disasm_mnemonic: operandCount=2
TRACE:adbg.disasm.disasm.adbg_disasm_mnemonic: i=1
TRACE:adbg.disasm.disasm.adbg_disasm_mnemonic: i=0
instruction: eseg: add (0xdd, dl)
prefixes   :
segment    : es
mnemonic   : add
operands   : register=dl immediate=i8

Unfortunately, it’s not a runtime option, since I plan to use --trace for something similar to strace/Procmon.

Oh, and did you notice something? Yeah, that’s Randall Hyde’s High Level Assembler syntax!

In fact, the syntax engine currently supports most of the Intel, AT&T, Netwide Assembler, Hyde’s HLA, and Borland Turbo Assembler Ideal (enhanced) mode syntaxes. ARM and RISC-V syntaxes are also planned, so that each platform has its own native syntax.
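Because tracing is decided at build time, the calls can vanish entirely from normal builds. A C analogue of that idea using the preprocessor (TRACE, TRACE_ENABLED and configure_platform are hypothetical names; alicedbg itself is written in D and uses a build type for this):

```c
#include <stdio.h>

/* Build with -DTRACE_ENABLED to get "TRACE:function: ..." lines on stderr;
 * otherwise every TRACE call compiles to nothing. */
#ifdef TRACE_ENABLED
#define TRACE(...) do { \
		fprintf(stderr, "TRACE:%s: ", __func__); \
		fprintf(stderr, __VA_ARGS__); \
		fputc('\n', stderr); \
	} while (0)
#else
#define TRACE(...) ((void)0)
#endif

/* Hypothetical configuration step that logs its input when tracing. */
static int configure_platform(int platform)
{
	TRACE("platform=%d", platform);
	return platform;
}
```

Unlike the D output above, C’s __func__ only yields the bare function name, not the module path.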

Here are examples using alicedbg -m i386 -A 26660303, with -s selecting the syntax:

  • Intel: add ax, word ptr es:[ebx]
  • Nasm: add ax, word ptr [es:ebx]
  • Att: add %es:(%ebx), %ax
  • Tasm: add ax, [word es:ebx]
  • Hyde: eseg: add ([type word ebx], ax)
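Most of the differences above come down to operand order and how the memory operand is wrapped. Here is a hedged C sketch that reproduces four of the listed outputs (NASM is left out for brevity) from one simplified instruction description; the struct, enum and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical syntax selector; names mirror some of the -s values above. */
enum syntax { SYN_INTEL, SYN_ATT, SYN_TASM, SYN_HYDE, SYN_UNKNOWN };

static enum syntax syntax_from_name(const char *name)
{
	if (strcmp(name, "intel") == 0) return SYN_INTEL;
	if (strcmp(name, "att") == 0)   return SYN_ATT;
	if (strcmp(name, "tasm") == 0)  return SYN_TASM;
	if (strcmp(name, "hyde") == 0)  return SYN_HYDE;
	return SYN_UNKNOWN;
}

/* Simplified register/memory instruction (hypothetical). */
struct reg_mem_insn {
	const char *mnemonic; /* e.g. "add" */
	const char *reg;      /* register operand, e.g. "ax" */
	const char *segment;  /* segment override, e.g. "es" */
	const char *base;     /* base register, e.g. "ebx" */
	const char *width;    /* memory width keyword, e.g. "word" */
};

/* Formats the instruction in the chosen syntax. Returns snprintf's result,
 * or -1 for an unsupported syntax. */
static int format_insn(enum syntax s, const struct reg_mem_insn *i,
                       char *out, size_t n)
{
	switch (s) {
	case SYN_INTEL: /* destination first, segment outside the brackets */
		return snprintf(out, n, "%s %s, %s ptr %s:[%s]",
		                i->mnemonic, i->reg, i->width, i->segment, i->base);
	case SYN_ATT:   /* source first, % sigils, parentheses for memory */
		return snprintf(out, n, "%s %%%s:(%%%s), %%%s",
		                i->mnemonic, i->segment, i->base, i->reg);
	case SYN_TASM:  /* Ideal mode: width keyword inside the brackets */
		return snprintf(out, n, "%s %s, [%s %s:%s]",
		                i->mnemonic, i->reg, i->width, i->segment, i->base);
	case SYN_HYDE:  /* HLA: "eseg:"-style prefix, source first */
		return snprintf(out, n, "%cseg: %s ([type %s %s], %s)",
		                i->segment[0], i->mnemonic, i->width, i->base, i->reg);
	default:
		return -1;
	}
}
```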

Not bad, right?

I don’t really mind if the disassembler ends up a little slower; it’s in a much better state now than it has ever been. The feature-disasm-rewrite branch is still about 38 commits ahead of master.

Anyway, I have to get back to it. It’s still far, way far from perfect, but I plan to release version 0.1 as early as the end of September this year.

Here’s hoping I reach that goal this time.