
10/20/2012

Comments


Question

Very interesting tutorial.
As you can guess, I've got a few questions.

General registers:
You seem to imply that all the named registers (EAX, EBX, ECX, EDX, ESI, EDI and EBP) are essentially the same.
Is that correct?
One minor difference that can be inferred from your tutorial is that registers A, B, C and D can be accessed "partially" (AL, AH, AX). That does not look possible with ESI, EDI and EBP.
I guess there are probably other subtle differences.
I'm asking because your sample program uses ESI & EDI.


Push instruction:
Is that a single opcode?
Or is it a macro translated into several opcodes?
I'm asking because the last time I learned assembler (quite a long time ago), it was on a CPU where "push on stack" had to be built entirely by hand. But I guess x86 is more optimised.

call & ccall instructions:
I now guess that call is probably an opcode, while ccall is probably a macro consisting of a few pushes followed by a call.
It would be interesting to know a bit more about it, and how it differs from stdcall (which appeared in the previous tutorial).

ret instruction:
If it is equivalent to "pop eip", then previously pushed data is still on the stack and must be cleaned up manually (or used; maybe the function's result is sometimes there).
That makes it all the more interesting to understand what ccall and stdcall are doing, especially if some manual cleanup is required afterwards (though more probably, I guess, it is built into the macro...).

Regards

Stephen Nichols

These are all good questions. Let me answer each one in turn:

You're right to point out that there are differences between the registers. In 32-bit code, ESI, EDI, ESP and EBP do not have byte-sized aliases. They do, however, have word-sized aliases (i.e. SI, DI, SP and BP). I've updated the tutorial text to clarify this.
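The following small C model (not assembly; the helper names are hypothetical) shows how the smaller names map onto the bits of a 32-bit register, using EAX as the example. AX is the low 16 bits, AL is bits 0-7 and AH is bits 8-15.

```c
#include <stdint.h>

/* AX, BX, CX, DX, SI, DI, SP, BP: the low 16 bits of the 32-bit register. */
static uint16_t low_word(uint32_t reg) { return (uint16_t)(reg & 0xFFFFu); }

/* AL, BL, CL, DL: bits 0-7. */
static uint8_t low_byte(uint32_t reg) { return (uint8_t)(reg & 0xFFu); }

/* AH, BH, CH, DH: bits 8-15. ESI, EDI, ESP and EBP have no such
   byte-sized alias in 32-bit code. */
static uint8_t high_byte(uint32_t reg) { return (uint8_t)((reg >> 8) & 0xFFu); }

/* Writing AL (e.g. "mov al, 0x55") changes only bits 0-7;
   the rest of EAX is left untouched. */
static uint32_t write_low_byte(uint32_t reg, uint8_t value) {
    return (reg & 0xFFFFFF00u) | value;
}
```

Take EAX = 0x12345678 as an example. AX reads 0x5678, AH reads 0x56 and AL reads 0x78. Writing 0x55 into AL leaves EAX as 0x12345655.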

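ESI and EDI (the source and destination index registers) are generally used as indexers, and string instructions such as MOVSB use them implicitly, but there's no requirement that they be used that way. Using them in the sample program is just habit on my part.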
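To illustrate that indexing role, here is a rough C model of what `rep movsb` does with ESI, EDI and ECX when the direction flag is clear. Pointers stand in for the registers, and the function name is only for illustration.

```c
#include <stdint.h>

/* Model of "rep movsb" with DF = 0: copy ECX bytes from [ESI] to [EDI],
   advancing both index registers and counting ECX down to zero. */
static void rep_movsb(uint8_t **edi, const uint8_t **esi, uint32_t *ecx) {
    while (*ecx != 0) {
        **edi = **esi;   /* movsb: byte [edi] <- byte [esi] */
        (*edi)++;        /* DF = 0, so both registers move forward */
        (*esi)++;
        (*ecx)--;        /* rep: repeat until ECX reaches zero */
    }
}
```

When it finishes, ECX is 0 and ESI and EDI point just past the source and destination bytes.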

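All of the instructions that I laid out in this tutorial are single instructions (single opcodes), not macros. PUSH, for instance, decrements ESP by the operand size (4 bytes for a 32-bit value) and stores the operand at the new [ESP], all in one instruction.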

You're right about ccall (and stdcall for that matter). Those are macros for the two calling conventions provided by Flatasm. I'll be going into more detail on calling conventions soon. The main differences are:

1. ccall supports a variable number of arguments. Because only the caller knows how many arguments it pushed, the caller is responsible for cleaning up the stack after making a call.

2. stdcall supports a fixed number of arguments. This makes it the responsibility of the callee to clean up the stack at the end of the function, typically with RET n, where n is the number of bytes of arguments. The ReverseString function in this tutorial is an example of stdcall.

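By way of example (note that the arguments are pushed in reverse order, last argument first):

ccall Function,arg1,arg2,...,argN becomes

push argN
...
push arg2
push arg1
call Function
add esp, (size of all args pushed)

stdcall Function,arg1,arg2,...,argN becomes

push argN
...
push arg2
push arg1
call Function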
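C's own variadic functions show why the variable-argument case in item 1 leaves cleanup to the caller. In this sketch (the function is hypothetical), the callee learns the argument count only from its first parameter. So only the call site knows how many bytes it pushed. On 32-bit x86, C compilers give variadic functions the caller-cleans cdecl convention for this reason.

```c
#include <stdarg.h>

/* sum_ints(count, ...): the callee reads exactly `count` int arguments.
   The number of bytes pushed differs from one call site to the next,
   so the caller, not the callee, must remove them afterwards. */
static int sum_ints(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) pushes four arguments and sum_ints(0) pushes one. Each call site cleans up its own amount.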

Whenever you call a function on x86, the return address (the EIP of the instruction following the CALL) is pushed onto the stack. When you return from the function (using RET), that saved EIP is popped back off and execution continues there. As you pointed out, it's critical that you clean up your stack before exiting the function, as your calling convention requires: anything the function pushed must be popped before RET, or RET will pop the wrong value as its return address. If you don't, your code will almost certainly crash.
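Here is a toy C model of that mechanism. It is only a sketch with simplifying assumptions. The stack is a small array of 32-bit slots, the addresses are made up, and the callee follows stdcall with one argument (ret 4). The point it makes is that RET simply pops whatever is on top of the stack, so a leftover push makes it "return" to garbage.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy stack of 64 dwords; esp is an index that counts down on push,
   just as the real ESP moves to lower addresses. */
typedef struct {
    uint32_t mem[64];
    size_t esp;
    uint32_t eip;
} Cpu;

static void push(Cpu *c, uint32_t v) { c->mem[--c->esp] = v; }
static uint32_t pop(Cpu *c) { return c->mem[c->esp++]; }

/* call: push the return address, then jump to the target. */
static void call(Cpu *c, uint32_t return_addr, uint32_t target) {
    push(c, return_addr);
    c->eip = target;
}

/* ret n: pop the saved EIP, then discard n more bytes (stdcall cleanup). */
static void ret(Cpu *c, unsigned n_bytes) {
    c->eip = pop(c);
    c->esp += n_bytes / 4;
}

/* Caller: push arg1; call f. Callee f: push ebx; (pop ebx only if
   `balanced`); ret 4. Returns the EIP that RET lands on. */
static uint32_t eip_after_return(int balanced) {
    Cpu c = { .esp = 64 };
    push(&c, 42);                          /* push arg1 */
    call(&c, 0x00401005u, 0x00402000u);    /* hypothetical addresses */
    push(&c, 0xDEADBEEFu);                 /* push ebx (arbitrary value) */
    if (balanced)
        (void)pop(&c);                     /* pop ebx */
    ret(&c, 4);
    return c.eip;
}
```

With the pop in place, RET lands on 0x00401005 as intended. Without it, RET jumps to 0xDEADBEEF, which on real hardware is the crash described above.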

This is one of the reasons that buffer overruns on the stack can be so troublesome to debug. Imagine a buffer overrun that stomps on the return address for your function. When your function returns in that case, it returns to a bogus pointer. Nasty.
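The same idea at the byte level, as a simplified C model. The frame layout and address are illustrative assumptions, not what any particular compiler emits. An 8-byte local buffer sits just below the saved EBP and the return address. Copying more than 8 bytes without a length check first clobbers the saved EBP and then the return address. The model clamps the copy to its own 16-byte frame so the C code itself stays well defined.

```c
#include <stdint.h>
#include <string.h>

/* Frame layout, lowest address first: 8-byte local buffer,
   saved EBP, then the return address that RET will pop. */
enum { BUF_OFF = 0, BUF_LEN = 8, SAVED_EBP_OFF = 8, RET_OFF = 12, FRAME_LEN = 16 };

static uint32_t read_dword(const uint8_t *frame, size_t off) {
    uint32_t v;
    memcpy(&v, frame + off, sizeof v);
    return v;
}

/* Copies `len` input bytes into the local buffer WITHOUT checking
   BUF_LEN (the bug being modelled), then returns the return address
   RET would use. Only the model's frame size bounds the copy. */
static uint32_t return_addr_after_copy(const uint8_t *input, size_t len) {
    uint8_t frame[FRAME_LEN] = {0};
    uint32_t ret_addr = 0x00401005u;               /* hypothetical */
    memcpy(frame + RET_OFF, &ret_addr, sizeof ret_addr);
    if (len > FRAME_LEN)
        len = FRAME_LEN;                           /* keep the model in bounds */
    memcpy(frame + BUF_OFF, input, len);           /* no check against BUF_LEN */
    return read_dword(frame, RET_OFF);
}
```

Copying 8 or 12 bytes leaves the return address intact. The 12-byte copy still trashes the saved EBP. Copying 16 bytes of 'A' turns it into 0x41414141, the "bogus pointer" the function then returns to.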

Mark Williams

Firstly, can I say: please keep this tutorial going. It's just what I have been searching for. It is GREAT! I have a feeling I am missing the first prerequisite for programming, but hopefully I can muddle through with the others. I am trying to come to terms with the stack and its associated instructions, in particular the "push ebp" before the "mov ebp,esp". Why is [ebp+8] used instead of [esp+24]? Thanks again for writing this tutorial.

Stephen Nichols

Oh, I'll keep it going. Been really busy the past two weeks.

Generally speaking, ebp is called the "base pointer." And for good reason. We use it as the pointer to the base of the stack frame in our functions.

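We do this because each time you push a value onto the stack, esp is modified (decreased by the size of the thing you're pushing). Using esp to find other values on the stack is really annoying, because the offset of the same argument changes with every push and pop. After "push ebp" and "mov ebp, esp", ebp stays put for the rest of the function. [ebp] holds the caller's saved ebp, [ebp+4] holds the return address, and [ebp+8] holds the first argument. [esp+24] would only reach that same argument while esp happens to be exactly 16 bytes below ebp. The only real exception is when you're deliberately doing things relative to the current stack pointer.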
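Here is a small C model that tracks only ESP and EBP through a typical call, to show why the offsets differ. The sequence is: two arguments pushed, the call, then push ebp / mov ebp, esp, followed by some number of extra 4-byte pushes inside the function. The starting address 0x1000 and the helper names are assumptions for illustration.

```c
#include <stdint.h>

/* Only the ESP/EBP arithmetic is tracked, not the stored values.
   The stack grows downward, so each 4-byte push subtracts 4 from ESP. */
typedef struct {
    uint32_t esp;
    uint32_t ebp;
} Cpu;

static void push32(Cpu *cpu) { cpu->esp -= 4; }  /* push reg/imm */
static void call32(Cpu *cpu) { cpu->esp -= 4; }  /* call pushes the return address */

/* Prologue: push ebp / mov ebp, esp */
static void prologue(Cpu *cpu) {
    push32(cpu);
    cpu->ebp = cpu->esp;
}

/* The N in [base+N] that reaches `addr`. */
static uint32_t offset_from(uint32_t base, uint32_t addr) { return addr - base; }

/* push arg2; push arg1; call f; then inside f: prologue plus
   `extra_pushes` further pushes (e.g. saving registers). Reports where
   arg1 sits relative to EBP and to ESP at that point. */
static void locate_first_arg(unsigned extra_pushes,
                             uint32_t *ebp_offset, uint32_t *esp_offset) {
    Cpu cpu = { .esp = 0x1000, .ebp = 0 };
    push32(&cpu);                    /* push arg2 */
    push32(&cpu);                    /* push arg1 */
    uint32_t arg1_addr = cpu.esp;    /* arg1 is now at [esp] */
    call32(&cpu);                    /* return address */
    prologue(&cpu);                  /* saved EBP; EBP = ESP */
    for (unsigned i = 0; i < extra_pushes; i++)
        push32(&cpu);
    *ebp_offset = offset_from(cpu.ebp, arg1_addr);
    *esp_offset = offset_from(cpu.esp, arg1_addr);
}
```

With no extra pushes, the first argument is at both [ebp+8] and [esp+8]. After four extra pushes it is still at [ebp+8], but now at [esp+24].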

I hope that helps!

ClarkTeegan

Which methods do you personally use to look for information for your future posts, and which search websites or techniques do you mostly use?

Albert1

I read you didn't go to college... well, you're better than many professors at explaining this stuff!

Stephen Nichols

Thanks!

Yeah, I'm a high school dropout actually. It really makes telling my kids to stick with school difficult. We homeschool though so it's not too hard to stick with it. :)
