I love coding. I mean, I really really love it. Some might argue that I love it a bit too much. Fuck them. :) This love of coding I have has fueled a certain level of passion about coding that has led me to a high degree of mastery of the art. Simply stated, I'm really good at it.
I'm always taken aback when I hear someone say that coding is hard to learn. From my perspective, it's very simple. Perhaps that's because I've spent so long coding that it's second nature to me. Perhaps... but I'm not buying that. I remember clearly my early learning years of coding. And, it's just not that hard. It's a relatively simple skill that anyone with the proper prerequisites can readily achieve. What are these prerequisites? They are: intelligence, interest and determination.
If you know that you're an intelligent person with an interest in coding and the determination to fail until you succeed, then becoming a master-level coder is within your reach. I thought it'd be a productive use of my time to write some tutorials on coding suitable for those that meet the above criteria. Think you have what it takes? Read on!
What is Coding?
Before you start to code, you need to understand some basics about coding. Most importantly, you need to know what coding is. Here's the basic definition:
Coding is the creation and organization of text files that are compiled by programs called compilers into executable files that can be run by your computer.
What does a coder do all day? They type and organize text files in their language of choice and run those text files through compilers to generate executable code. If you want to be a coder, you better like typing! Because that's the job!
Learning to type text that compilers can understand is the first skill of a coder. So that's what we're going to focus on first.
Spinach for Coders
Yes, that's right, assembly language. If you're an experienced coder then you might balk at the idea of learning assembly. If you're one of those folks, I'd wager your lack of knowledge of this critical language is one of the reasons you suck as a coder. And, if you're a programming noob, then there's no better way to learn the basics of coding than to dive into this easy-to-learn language.
Assembly language is ideal because it's ultimately the target language for every other programming language. Code written in, say, C++ is compiled to machine instructions that map directly to assembly language for execution by your computer. Even interpreted languages run on an interpreter that is itself made of those same instructions. They all run on the foundations of assembly language.
The skills you'll pick up grasping the basics of assembly language will make you a better coder in any language you choose to work with. It's like spinach for coders!
We'll be focusing on x86 assembly language for these initial tutorials. Let's dig right in!
Step 1) Get The Tools.
I'm assuming you're working on a Windows PC. If not, you'll need to get creative in adapting my instructions to your platform. Good luck with that!
The first thing you're going to need is an assembly language compiler, better known as an assembler. Thankfully, there's a super easy free one that we'll be using called "flat assembler." Click here to visit the flat assembler download page and download the proper package for your platform. For my Windows 7 machine the correct package to download was "flat assembler 1.70.03 for Windows."
Once you've downloaded this, you're going to need to unzip the package into a folder on your computer somewhere. I put mine in "c:\fasm." Take a moment to browse the files and folders contained within the package. Please notice the "fasm.pdf" file in the root of the package. That's got a rather complete manual for the assembler suitable for bathroom reading. You should notice two executable files "fasm.exe" and "fasmw.exe." For these tutorials we'll be using "fasmw.exe" to do our work. Open that executable and you should be greeted with a window that looks like this:
As far as programming interfaces go, you can't get much simpler than this one! Take a few moments to familiarize yourself with the interface. Pay particular attention to the "Run" menu, as this is what you'll be using to compile and run your programs.
Step 2) Hello World!
One of the first things any programmer does when encountering a new language is to write a "Hello World!" program. This ubiquitous program simply prints the text "Hello World!" to the screen and exits. I'll start by showing you what this program looks like in assembler and then break it down piece by piece for you. Stick with it until it makes sense:
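Here is the listing as I can reconstruct it from the line-by-line breakdown that follows. The blank lines are placed so the line numbers match the breakdown. The exact spacing and indentation are assumptions on my part:

```asm
format PE console                          ; line 1: output file format
entry main                                 ; line 2: where execution starts

include 'win32a.inc'                       ; line 4: macros such as ccall, stdcall, library, import

section 'data' data readable               ; line 6: start of the data section

HelloWorld db "Hello World!",0             ; line 8: zero-terminated string

section 'code' code readable executable    ; line 10: start of the code section

main:                                      ; line 12: entry point label
        ccall   [printf],HelloWorld        ; line 13: print the string

        ccall   [getchar]                  ; line 15: wait for ENTER
        stdcall [ExitProcess],0            ; line 16: exit with code 0

section 'imports' import data readable     ; line 18: start of the imports section

library kernel,'kernel32.dll',\
        msvcrt,'msvcrt.dll'

import kernel,\
       ExitProcess,'ExitProcess'

import msvcrt,\
       printf,'printf',\
       getchar,'getchar'
```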
There it is in all its glory! Hello World, in assembly language! Take some time to type this program into flat assembler by hand (or download it here). Drink it in. If it looks like Greek to you, that's okay. Once you're done typing the code in, let's dig in line by line:
1: format PE console
This line tells flat assembler what kind of program you're writing. In this case, we're writing a "PE console" program. This means that the output executable file format is "PE" (which is shorthand for Portable Executable). It's also a console program, meaning that it doesn't create any on-screen windows of its own and is expected to be run from the command prompt.
2: entry main
This line tells the compiler that the main entry point of the program is a label called "main." Once Windows loads your program, execution begins at the "main" label. You can see that label defined at line 12 of the program.
4: include 'win32a.inc'
Flat assembler comes with several include files to make calling Windows functions easier. This line tells the compiler to load one of them, 'win32a.inc', to make our lives easier. You can find the source code for 'win32a.inc' in the INCLUDE folder where you installed FASM. The specifics of what's in this file aren't too important yet, but feel free to browse it if you're feeling brave.
6: section 'data' data readable
Here's something you should know about how executable files are organized: they're organized into sections that hold different kinds of data. In assembly language, you need to understand this basic concept of code organization or your program won't work.
Each section is named and has some attributes. Our simple program has three main sections: "code", "data" and "imports." This particular line of code tells the compiler that the section named data starts here. Everything encountered by the compiler from this line until the next section (or end of the program) will be organized under the data section.
Our section called "data" has two attributes: data and readable. This means that the section holds data (not code) and that data can be read by the code in the program. If we didn't mark this section as 'readable' then the program would crash when it tried to read any data in this section.
8: HelloWorld db "Hello World!",0
Since we're in the data section, we're defining static data that are used by our program. This line tells the compiler to emit a collection of data bytes (specified by the 'db') that contain the characters "Hello World!" followed by a zero. The zero is there to tell the code where the "Hello World!" sentence (a.k.a. string) ends. These emitted data bytes are labeled with the name "HelloWorld" for use in our code. Said another way, we just created a variable called "HelloWorld" that points to the string "Hello World!",0 in the data section of our executable.
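To convince yourself that a string really is nothing more than bytes, here's a small sketch of the same program with the text spelled out as ASCII codes instead of a quoted string. The compact one-line imports are my own choice for brevity. Everything else follows the tutorial program, and it should print exactly the same text:

```asm
format PE console
entry main

include 'win32a.inc'

section 'data' data readable

; Same bytes as: HelloWorld db "Hello World!",0
;              H   e   l   l   o  ' '  W   o   r   l   d   !  end
HelloWorld db 72,101,108,108,111, 32, 87,111,114,108,100, 33,  0

section 'code' code readable executable

main:
        ccall   [printf],HelloWorld        ; prints bytes until it hits the 0
        ccall   [getchar]                  ; wait for ENTER
        stdcall [ExitProcess],0

section 'imports' import data readable

library kernel,'kernel32.dll',msvcrt,'msvcrt.dll'
import kernel,ExitProcess,'ExitProcess'
import msvcrt,printf,'printf',getchar,'getchar'
```

The 'db' directive doesn't care whether you hand it quoted text or numbers. It simply emits one byte per character or value.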
10: section 'code' code readable executable
Hey, this is a simple program so our data section is really small. This line tells the compiler that we're starting a new section called "code" that contains code that is readable and executable. Simple!
12: main:
This line creates a new label called "main" in the code. This label is referenced by line 2 of the program to tell Windows where to start running our program once it's loaded. We'll learn more about labels soon when I cover conditionals and looping.
13: ccall [printf],HelloWorld
Let me tell you this... code is nothing but a collection of functions that do stuff to data. Almost every program you'll write will rely on functions provided by the operating system to get stuff done. This simple program is no exception to this rule.
Here's all you need to know about functions right now: they are "called" and can have parameters passed to them. I'll cover function calling and parameter passing in more detail in an upcoming tutorial. Patience!
This line of code calls the function named "printf" with the variable "HelloWorld" we created in the data section at line 8 of the program. "printf" is a standard C function that displays a string of characters to the console. The specific details of that function aren't important right now. Just know that this line of code is what causes "Hello World!" to be printed to the screen.
You might be asking: "What the heck are the brackets around printf for?" Well, that tells the compiler that you're calling a pointer to a function and not the function directly. I'll cover that in more detail in an upcoming tutorial.
15: ccall [getchar]
If we ended the program now, bad things would happen. The most obvious thing would be that the program would display "Hello World!" and immediately terminate (or crash). The console window would show and exit before we could see the text. So, I added this function call to wait for you to press ENTER before continuing. It does this by calling a standard function called "getchar."
16: stdcall [ExitProcess],0
The last thing you do in your program is to exit. This is especially true in assembly language. If you don't tell Windows that your program is done, it won't stop running. In fact, if you removed this line from the program it would crash right after you hit ENTER.
This is because computers are stupid. They only execute what they see. The aforementioned crash is caused by the fact that the code bytes beyond the end of our program are undefined. So, when the computer reads those undefined instructions, bad things happen. The result? Crash!
Programs exit with what's called an "exit code." This is a number that tells other interested programs (usually the one that started your program) something about the status of your program's execution. The exit code that we're using here is 0.
We tell Windows that the program is done by calling the standard function "ExitProcess" with the parameter 0.
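If you want to see an exit code with your own eyes, here's a minimal sketch that does nothing but exit with the code 3. The value 3 and the file name below are arbitrary choices of mine, not part of the tutorial program:

```asm
format PE console
entry main

include 'win32a.inc'

section 'code' code readable executable

main:
        stdcall [ExitProcess],3            ; exit immediately with exit code 3

section 'imports' import data readable

library kernel,'kernel32.dll'
import kernel,ExitProcess,'ExitProcess'
```

Compile it to, say, exitcode.exe, run it from the command prompt and then type: echo %ERRORLEVEL% The command prompt should answer with 3, which is the number we handed to ExitProcess.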
The astute reader will notice that we've used two different methods of calling functions: ccall and stdcall. What's that all about? Well, for now, just know that there are two main conventions when calling functions: the C calling convention and the "standard" calling convention. The particular details of this will be further covered in an upcoming tutorial. Thankfully, flat assembler hides some of the details of this for us.
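For the curious, here's a rough sketch of the same program with the ccall and stdcall macros written out by hand. This is my approximation of what the macros do in 32-bit code, not their literal output, and the instructions used here are covered properly in later tutorials:

```asm
format PE console
entry main

include 'win32a.inc'                       ; still needed for library and import

section 'data' data readable

HelloWorld db "Hello World!",0

section 'code' code readable executable

main:
        ; ccall [printf],HelloWorld
        push    HelloWorld                 ; pass the address of the string
        call    [printf]                   ; brackets: call the address stored at printf
        add     esp,4                      ; C convention: the caller removes the parameter

        ; ccall [getchar]
        call    [getchar]                  ; no parameters, so nothing to clean up

        ; stdcall [ExitProcess],0
        push    0                          ; the exit code
        call    [ExitProcess]              ; standard convention: the function cleans up

section 'imports' import data readable

library kernel,'kernel32.dll',msvcrt,'msvcrt.dll'
import kernel,ExitProcess,'ExitProcess'
import msvcrt,printf,'printf',getchar,'getchar'
```

The only visible difference between the two conventions here is the "add esp,4" after printf. With the C convention the caller tidies up its own parameters, while a "standard" function like ExitProcess does that itself.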
18: section 'imports' import data readable
This line tells the compiler that we're now starting a new section called "imports" that contains import data that is readable.
The code for our program is done, but we still have a little bookkeeping to do before it will compile and run without issues. This is the third and final section of our program. The "imports" section. It tells Windows which functions we're using that are externally defined.
If you recall, we're calling three external functions: printf, getchar and ExitProcess. These functions are contained in standard libraries that we need to import. So, the remainder of this section does just that.
20: library kernel,'kernel32.dll',\
This line tells the compiler to emit a dependency on the standard Windows library "kernel32.dll" into our imports section. This dependency is named "kernel" so we can reference it easier later in the imports section.
We actually rely on two import libraries, but I didn't want to include them both on the same line. I also wanted to introduce you to line continuation. The use of the backslash for line continuation is a common feature of many compilers. It allows you to break an otherwise long line into multiple lines.
Lines 20-21 could be written like this: library kernel,'kernel32.dll',msvcrt,'msvcrt.dll' However, it's nicer (in my opinion) to break this up into multiple lines using the backslash.
Anyway, line 21 adds a dependency named "msvcrt" on the standard library "msvcrt.dll."
23-24: import kernel,ExitProcess,'ExitProcess'
These lines tell the compiler to import the function named 'ExitProcess' from "kernel32.dll" and assign it to a variable called "ExitProcess" for use in our program on line 16.
26-28: import msvcrt,printf,'printf',getchar,'getchar'
These lines tell the compiler to import the functions named 'printf' and 'getchar' from "msvcrt.dll" and assign them to variables called "printf" and "getchar" respectively.
We've touched on a surprisingly large number of subjects in this simple Hello World! program:
- Executable file formats
- Program entry points
- Code and data sections
- Static data
- Function calls with parameters
- Libraries and imports
All of these things are really basic concepts that you need to understand to become a master coder. Read up on them and experiment with the code I've provided here until it makes good sense to you.
Next time, I'll be covering registers and the stack. Until then, enjoy!
You can download the source code for this tutorial here.