Page 1 of 1

I Want To Learn Assembly But I Don't Know Where To Start!

Martyn.Rae
Posted 02 April 2019 - 01:18 PM

Introduction

Let's start at the beginning. Unlike many of the guides to learning assembly (the internet is strewn with them), I am going to take a simpler approach by delving a little deeper into microprocessor architecture. To do this, we are going to write a small program in pseudo code.

loop:       Put address out onto address bus
            Read machine code instruction
            Using the machine code instruction jump into subroutine
            Go back to loop
            ...
LDA:        Put address out onto address bus
            Read memory address
            Put memory address out onto address bus
            Read data into accumulator
            Return from subroutine



OK, the pseudo code is far from complete, as it leaves out the microcode (effectively a large number of on/off switches) required to perform these operations. Microcode words are extremely wide, 200+ bits on all but the simplest machine code architectures (I designed a microcomputer using the AMD bit-slice processors and ended up with 786-bit-wide microcode). It is the microcode that determines which gates are opened or closed, permitting the ones and zeroes to travel (a bit like traffic lights controlling the flow of traffic). That being said, I hope you are getting the idea here: in a microprogrammed processor, a machine code instruction is simply a means of determining which subroutine to call within the processor's microcode. In other words, a program is used to execute each and every machine code instruction (yipes, even assembly is interpreted!). Using our pseudo code above, and assuming each step takes one clock cycle, loading a value into a register would take 8 clock cycles: three to fetch and dispatch the instruction and five in the LDA routine.
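To make the fetch-and-dispatch idea concrete, here is a minimal sketch in C of the pseudo code above, written as an interpreter for a hypothetical 8-bit machine. The machine, its two opcodes (`OP_HALT`, `OP_LDA`) and the `step` function are inventions for illustration, not any real processor, and each "step" is simply one line of the pseudo code rather than a real clock cycle.

```c
#include <stdint.h>

/* Hypothetical 8-bit machine for illustration only: 256 bytes of memory,
   a program counter and an accumulator. */
struct cpu {
    uint8_t mem[256];
    uint8_t pc;    /* wraps at 256, matching the 256-byte memory */
    uint8_t acc;
};

/* Hypothetical opcodes; each one is an index into the dispatch table. */
enum { OP_HALT = 0x00, OP_LDA = 0x01 };

/* Each handler returns the number of pseudo code steps it took (0 = halt). */
typedef unsigned (*handler)(struct cpu *);

static unsigned op_halt(struct cpu *c)
{
    (void)c;
    return 0;
}

/* LDA: the byte after the opcode is the address of the data to load. */
static unsigned op_lda(struct cpu *c)
{
    uint8_t addr = c->mem[c->pc++];  /* address out onto bus, read memory address */
    c->acc = c->mem[addr];           /* memory address out onto bus, read data */
    return 5;                        /* four bus steps plus the return */
}

/* The "subroutine" chosen by each opcode. */
static const handler dispatch[] = {
    [OP_HALT] = op_halt,
    [OP_LDA]  = op_lda,
};

/* Fetch one instruction and "jump into subroutine" via the table.
   Returns the steps taken (fetch/dispatch plus handler), or 0 on halt. */
static unsigned step(struct cpu *c)
{
    uint8_t opcode = c->mem[c->pc++];  /* address out onto bus, read instruction */
    if (opcode >= sizeof dispatch / sizeof dispatch[0])
        return 0;                      /* unknown opcode: treat as a halt */
    unsigned n = dispatch[opcode](c);  /* jump into the instruction's routine */
    return n ? 3 + n : 0;              /* 3 steps to fetch and dispatch */
}
```

For example, with memory holding the bytes `01 10 00` and the value 42 at address 0x10, the first call to `step` loads 42 into `acc` and reports 8 steps (3 + 5); the second call fetches the halt opcode and returns 0. Real microcode works at the level of gates and buses rather than C functions, so treat this only as an analogy.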

Back in the day, most microprocessor programming manuals specified the number of clock cycles each instruction took to complete. This was important information for assembly programmers, especially those working on device drivers, where timing delays were often built by counting the cycles in a loop. However, modern microprocessors have two or three levels of cache and multiple arithmetic/logic and floating point units, so the cycle count of an individual instruction is nigh on impossible to compute and, taken in isolation, largely meaningless.
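To show the kind of arithmetic those manuals made possible, here is a minimal sketch that times a classic Z80 delay loop (`LD B,n` followed by a `DJNZ` back to itself). It assumes the published Z80 timings of 7 T-states for `LD B,n`, and 13 T-states for `DJNZ` when it jumps or 8 when B reaches zero; the function names are hypothetical.

```c
/* T-states for the delay loop
 *         LD   B, n     ; 7 T-states
 * wait:   DJNZ wait     ; 13 T-states while B != 0 after decrement, 8 on exit
 * where iterations is 1..256 (loading B with 0 gives 256 iterations). */
static unsigned long djnz_delay_tstates(unsigned iterations)
{
    return 7UL + 13UL * (iterations - 1) + 8UL;
}

/* Convert T-states to microseconds for a clock of clock_mhz MHz. */
static double tstates_to_us(unsigned long tstates, double clock_mhz)
{
    return (double)tstates / clock_mhz;
}
```

With an assumed 4 MHz clock, a 10-iteration loop therefore takes 7 + 13 × 9 + 8 = 132 T-states, or 33 microseconds. On a modern processor with caches and several execution units there is no such table that gives a dependable answer.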

Theoretically, one could add new instructions to an existing architecture simply by writing the appropriate sequence of microcode instructions and setting the microcode switches (I did this once on a DEC PDP 11/34 minicomputer, thanks to the microcode compiler!). Sadly, microprocessor chips do not give you that ability, as the microcode is effectively 'hardwired' into the microprocessor circuitry (the vendors can ship microcode updates, but you cannot write your own).

Summing up this part, then: an assembly program is effectively a sequence of machine code instructions, modifiers and data that is interpreted by the microcode to achieve some desired result. There are no surprises, as each instruction specifies in exacting terms what the microprocessor must do, and it does it perfectly (unless, that is, we remember the Intel 80386, whose IMUL instruction's microcode had a teeny weeny bug under some obscure conditions).

Modern Day Compilers

The modern day compilers that produce machine code are truly awesome! Their optimisation techniques are amazingly complex, and you end up with code that matches what even the best assembly programmer can write. Unfortunately, they are also extraordinarily dumb, insomuch as they cannot guess the programmer's intent. I am not talking about poor coding techniques here, but rather the fact that whilst the compiler can optimise the code you wrote, a few lines at a time, it cannot step back and recognise what a larger chunk of code is actually trying to achieve. Let me give you an example.

   #include <string.h>

   const char s[] = "ABCD";
   if (strncmp(s, "ABCD", 4) == 0) {   /* compare the first 4 characters */
       ...
   }



That little snippet generates the following 32-bit x86 code (the source lines are interleaved with the disassembly):

	const char s[] = "ABCD";
00EB1010  mov         eax,dword ptr [string "ABCD" (0EB20F8h)]  
00EB1015  mov         dword ptr [s],eax  
00EB1018  mov         al,byte ptr ds:[00EB20FCh]  
	if (strncmp(s,"ABCD", 4) == 0) {
00EB101D  push        4  
00EB101F  mov         byte ptr [ebp-8],al  
00EB1022  lea         eax,[s]  
00EB1025  push        offset string "ABCD" (0EB20F8h)  
00EB102A  push        eax  
00EB102B  call        dword ptr [__imp__strncmp (0EB20B4h)]  
00EB1031  add         esp,0Ch  
00EB1034  test        eax,eax  
00EB1036  jne         main+39h (0EB1039h)  



whereas in assembly (MASM syntax) we could write:

          .data
s         db          'ABCD'
          .code
          mov         eax, dword ptr [s]     ; load all four bytes of s in one go
          cmp         eax, 44434241h         ; 'ABCD' as a little-endian dword
          jne         ...



There is nothing wrong with the high-level code that was written; the compiler can only take the code we have provided and try to optimise what it generates from that. We, on the other hand, as assembly programmers, can see that all the task requires is a move into a register, a compare-immediate instruction and a jump if not equal.
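The same intent can be spelled out in C by comparing the four bytes as a single 32-bit value, which is what the hand-written assembly does. This is a minimal sketch, assuming at least four readable bytes at `p`; the helper name `starts_with_abcd` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns nonzero if the first four bytes at p are 'A','B','C','D'.
   p must point to at least four readable bytes. Both values are built
   with memcpy from byte strings, so the result does not depend on the
   machine's byte order, and p need not be 4-byte aligned. */
static int starts_with_abcd(const char *p)
{
    uint32_t word, key;
    memcpy(&word, p, sizeof word);       /* the 4 bytes at p, in memory order */
    memcpy(&key, "ABCD", sizeof key);    /* the constant with the same byte layout */
    return word == key;                  /* one 32-bit compare, like cmp eax, imm32 */
}
```

With optimisation enabled, compilers often reduce a fixed four-byte `memcpy` to a single load, so the generated code can end up close to the `mov`/`cmp`/`jne` sequence above; check the disassembly rather than assume it.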

The Future

As I have stated, modern day microprocessors are extraordinarily complex, with more than 15 billion transistors, and we are looking at a future with many microprocessor cores (Xeon Phi already has 61). That alone perhaps suggests that learning assembly is a rather pointless exercise, but all good things come to an end. Unless something radical happens (quantum processors, perhaps), most microprocessor engineers see Moore's Law coming to an end in about five to six years. What happens when the demand for faster and more powerful systems cannot be met because the processing power simply is not there? Would the solution not be to write systems in assembly? As they say, what goes around comes around!

Moving Forwards

To learn assembly, write a few lines of code in C, compile it and then, using a debugger, take a look at the disassembly and single-step through each instruction. The Visual Studio IDE is perfect for this, as you can split the window between the disassembly and the registers. This is how I learnt to write my first assembly program on an ICL 1900, although there was no internet and no books, save the Assembler Reference Manual, where I could read what each instruction actually did. From there on, the world is your oyster, so to speak, as mastering the syntax of an assembler (i.e. how to write the machine code mnemonics and the format of instructions) is fairly straightforward. I should know: I have written assembly programs on the ICL 1900, Zilog Z80, DEC PDP 11/34, Univac 1100/61, Varian V77, CR80, MC68000, x86 and x64, all self-taught and without the use of the internet, books or forums.
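As a starting point for that exercise, here is a deliberately tiny C function (the name `sum_to` is just an example). Build it without optimisation first, so that each source line maps to its own handful of instructions, and single-step through it while watching the registers; then rebuild with optimisation and compare. Outside Visual Studio, GCC on x86 can do the same job: `gcc -O0 -S -masm=intel sum.c`, for instance, writes Intel-syntax assembly to `sum.s`.

```c
/* Sums 1 + 2 + ... + n. Small enough to single-step instruction by
   instruction while watching the loop counter and total change. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
}
```

At higher optimisation levels, do not be surprised if the loop changes shape or disappears altogether; seeing what the optimiser does with your code is exactly the point of the exercise.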
