How to use inline ASM using WinAVR

I have been optimizing one of my C programs and needed one function to be as fast as possible, so I decided to write it in inline ASM. Here are a few notes on how that works.

There are a few rules to follow. Each ASM statement is divided by colons into three or four parts:

  1. Assembler instructions part;
  2. A list of output operands (comma separated);
  3. A list of input operands (comma separated);
  4. Clobbered register – usually left empty.

asm(code : output operand list : input operand list [: clobber list]);

The compiler chooses which registers the ASM code will use and, as part of its optimization strategy, may even remove inline ASM code whose outputs appear unused. To prevent removal, it is recommended to use the keyword volatile:

asm volatile(code : output operand list : input operand list [: clobber list]);

Let's go through it with some examples.

Let us say we want to disable or enable global interrupts. A simple inline ASM statement will do this:

asm volatile("cli" ::);

asm volatile("sei" ::);

A no-operation instruction (nop) can be inserted like this:

asm volatile("nop ;this is a comment"               "\n\t"
             "nop ;this ASM inline includes 2 nops" "\n\t"
             ::);

Note: "\n\t" separates the instructions in the generated assembly: the newline ends each instruction and the tab keeps the listing neatly indented.

When inserting inline ASM code into a C program, you can use some special registers by name, without assigning them to any variable:

Symbol        Register
__SREG__      Status register at address 0x3F
__SP_H__      Stack pointer high byte at address 0x3E
__SP_L__      Stack pointer low byte at address 0x3D
__tmp_reg__   Register r0, used for temporary storage
__zero_reg__  Register r1, always zero

Input and output operands are each described by a constraint string followed by a C expression in parentheses:

Constraint  Used for                         Range
a           Simple upper registers           r16 to r23
b           Base pointer register pairs      y, z
d           Upper register                   r16 to r31
e           Pointer register pairs           x, y, z
G           Floating point constant          0.0
I           6-bit positive integer constant  0 to 63
J           6-bit negative integer constant  -63 to 0
K           Integer constant                 2
L           Integer constant                 0
l           Lower registers                  r0 to r15
M           8-bit integer constant           0 to 255
N           Integer constant                 -1
O           Integer constant                 8, 16, 24
P           Integer constant                 1
q           Stack pointer register           SPH:SPL
r           Any register                     r0 to r31
t           Temporary register               r0
w           Special upper register pairs     r24, r26, r28, r30
x           Pointer register pair X          x (r27:r26)
y           Pointer register pair Y          y (r29:r28)
z           Pointer register pair Z          z (r31:r30)

The following table shows all assembler mnemonics which require operands and related constraints.

Mnemonic  Constraints    Mnemonic  Constraints
adc       r,r            add       r,r
adiw      w,I            and       r,r
andi      d,M            asr       r
bclr      I              bld       r,I
brbc      I,label        brbs      I,label
bset      I              bst       r,I
cbi       I,I            cbr       d,I
com       r              cp        r,r
cpc       r,r            cpi       d,M
cpse      r,r            dec       r
elpm      t,z            eor       r,r
in        r,I            inc       r
ld        r,e            ldd       r,b
ldi       d,M            lds       r,label
lpm       t,z            lsl       r
lsr       r              mov       r,r
movw      r,r            mul       r,r
neg       r              or        r,r
ori       d,M            out       I,r
pop       r              push      r
rol       r              ror       r
sbc       r,r            sbci      d,M
sbi       I,I            sbic      I,I
sbiw      w,I            sbr       d,M
sbrc      r,I            sbrs      r,I
ser       d              st        e,r
std       b,r            sts       label,r
sub       r,r            subi      d,M
swap      r

Constraint characters may be preceded by a single constraint modifier. Constraints without a modifier specify read-only operands. The modifiers are:

Modifier  Specifies
=         Write-only operand, usually used for all output operands
+         Read-write operand (listed as unsupported in the older avr-libc documentation this article follows; current GCC releases accept it)
&         Early clobber: the register is used for output only and must not overlap any input register

Note: In the style used in this article, output operands are always write-only.

An input operand does not have to be read-only, for instance when you need the same register for input and output. In that case, use a digit in the constraint string:

asm volatile("swap %0" : "=r" (value) : "0" (value));

Constraint "0" tells the compiler to use the same register as operand 0 (%0).
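As a minimal sketch of the matching-constraint idea, the swap statement can be wrapped in a small function. The name swap_nibbles and the plain-C fallback used when the code is not built with avr-gcc are assumptions added for illustration; they are not part of the original example.

```c
#include <stdint.h>

/* Hypothetical helper: exchanges the high and low nibble of a byte. */
static inline uint8_t swap_nibbles(uint8_t value)
{
#if defined(__AVR__)
    /* "0" ties the input to the same register as output operand %0. */
    __asm__ __volatile__("swap %0" : "=r" (value) : "0" (value));
    return value;
#else
    /* Assumed portable fallback so the sketch can be checked off-target. */
    return (uint8_t)((value << 4) | (value >> 4));
#endif
}
```

Calling swap_nibbles(0xA5) should give 0x5A on either path, since both simply exchange the two nibbles.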
Let's look at another example:

asm volatile("in %0,%1"    "\n\t"
             "out %1, %2"  "\n\t"
             : "=&r" (input) 
             : "I" (_SFR_IO_ADDR(PORTD)), "r" (output)
            );

Let's take a look at the first line, "in %0,%1". The operand %0 is replaced with the register that holds the variable input. That register is write-only (=) and used for output only (the & modifier). The operand %1 is replaced with "I" (_SFR_IO_ADDR(PORTD)), which evaluates to the I/O address of PORTD.

Note: An I/O register address must always be passed as an input operand.

The second line of ASM code is similar, except that operand %2 may be placed in any register (r0 to r31).

What if we need to pass a 32-bit value to inline ASM? In that case a letter between the % sign and the operand number selects one of the 8-bit registers that together hold the value:

uint32_t value=0xffffffff;
asm volatile("mov __tmp_reg__, %A0" "\n\t"
             "mov %A0, %D0"         "\n\t"
             "mov %D0, __tmp_reg__" "\n\t"
             "mov __tmp_reg__, %B0" "\n\t"
             "mov %B0, %C0"         "\n\t"
             "mov %C0, __tmp_reg__" "\n\t"
             : "=r" (value)
             : "0" (value)
            );

%A0 is the lowest byte of the 32-bit value and %D0 is the highest byte; %B0 and %C0 are the bytes in between. All operations are performed on these bytes separately, and the result is returned as a 32-bit output operand by using the operand number as the input constraint ("0" in this example).
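The byte exchange above (A with D, B with C) reverses the byte order of the value. A hedged sketch of the same statement wrapped in a function follows; the name reverse_bytes32 and the plain-C fallback for non-AVR builds are assumptions made so the behaviour can be checked without the target hardware.

```c
#include <stdint.h>

/* Hypothetical helper: reverses the byte order of a 32-bit value. */
static inline uint32_t reverse_bytes32(uint32_t value)
{
#if defined(__AVR__)
    /* %A0..%D0 name the four 8-bit registers of operand 0, low to high. */
    __asm__ __volatile__(
        "mov __tmp_reg__, %A0" "\n\t"
        "mov %A0, %D0"         "\n\t"
        "mov %D0, __tmp_reg__" "\n\t"
        "mov __tmp_reg__, %B0" "\n\t"
        "mov %B0, %C0"         "\n\t"
        "mov %C0, __tmp_reg__" "\n\t"
        : "=r" (value)
        : "0" (value)
    );
    return value;
#else
    /* Assumed portable fallback: the same A<->D and B<->C exchange in C. */
    return (uint32_t)(((value & 0x000000FFUL) << 24) |
                      ((value & 0x0000FF00UL) << 8)  |
                      ((value & 0x00FF0000UL) >> 8)  |
                      ((value & 0xFF000000UL) >> 24));
#endif
}
```

With the initial value 0xffffffff from the example the result looks unchanged, so a value with four distinct bytes, such as 0x11223344, makes the effect easier to see.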

The last thing I would like to cover is pointers. An input operand holding a pointer can be defined as:

: "e" (ptr)

The compiler then selects one of the pointer register pairs. If it selects Z (r31:r30), then:

%A0 refers to r30

%B0 refers to r31

But if you need to access the memory location whose address is stored in the Z register, as in

ld r24, Z

then you need to use the lowercase letter a with the operand:

ld r24, %a0

A few words about clobbers. When your ASM code uses registers that have not been passed as operands, you need to inform the compiler by listing them as clobbers. For instance:

asm volatile(
    "cli"               "\n\t"
    "ld r24, %a0"       "\n\t"
    "inc r24"           "\n\t"
    "st %a0, r24"       "\n\t"
    "sei"               "\n\t"
    :
    : "e" (ptr)
    : "r24"
);

In this example we use register r24 directly, so it is listed as a clobber. The compiler produces the following code fragment in the listing:

    cli
    ld r24, Z
    inc r24
    st Z, r24
    sei

Another possible clobber is "memory", which tells the compiler that the ASM code may modify any memory location. It forces the compiler to write back all cached variables before executing the ASM code and to reload them afterwards. Avoid clobbers where possible, because this gives the compiler more freedom to optimize the code.
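Both kinds of clobber can appear together. The sketch below wraps the increment example in a function and lists "memory" next to "r24", on the assumption that the caller reads the byte again afterwards. The function name increment_byte and the plain-C fallback for non-AVR builds are hypothetical additions for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: increments the byte at ptr with interrupts disabled. */
static inline void increment_byte(volatile uint8_t *ptr)
{
#if defined(__AVR__)
    __asm__ __volatile__(
        "cli"          "\n\t"
        "ld r24, %a0"  "\n\t"
        "inc r24"      "\n\t"
        "st %a0, r24"  "\n\t"
        "sei"          "\n\t"
        :
        : "e" (ptr)
        /* r24 is used as scratch; the byte at ptr is written by the asm. */
        : "r24", "memory"
    );
#else
    /* Assumed portable fallback without interrupt masking. */
    (*ptr)++;
#endif
}
```

Note that, as in the original fragment, sei re-enables interrupts unconditionally, so this sketch only suits code that runs with interrupts enabled.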

If you need to reuse some assembler parts more than once, it is recommended to define macros. AVR Libc contains many of them. To avoid compiler warnings, use __asm__ instead of asm and __volatile__ instead of volatile. Everything else is the same as in regular inline assembler:

#define loop_until_bit_is_clear(port,bit)     \
        __asm__ __volatile__ (                \
        "1: " "sbic %0, %1" "\n\t"            \
                 "rjmp 1b"                    \
                 : /* no outputs */           \
                 : "I" (_SFR_IO_ADDR(port)),  \
                   "I" (bit)                  \
        )

I wrote a stub function (a function that contains nothing but assembler code). Larger routines are better written as stub functions, because a macro inserts its code at every place it is used (rather than calling it), which can bloat the program. My stub function for the AVR DDS generator:

void signalOUT(const uint8_t *signal, uint8_t ad2, uint8_t ad1, uint8_t ad0)
{
    asm volatile(
        "eor r28, r28 ;r28<-0"                  "\n\t"
        "eor r29, r29 ;r29<-0"                  "\n\t"
        "Loop1:"                                "\n\t"
        "add r28, %0 ;1 cycle"                  "\n\t"
        "adc r29, %1 ;1 cycle"                  "\n\t"
        "adc %A0, %2 ;1 cycle"                  "\n\t"
        "lpm __tmp_reg__, %a3+ ;3 cycles"       "\n\t"
        "out %4, __tmp_reg__ ;1 cycle"          "\n\t"
        "rjmp Loop1 ;2 cycles. Total 9 cycles"  "\n\t"
        :
        : "r" (ad0), "r" (ad1), "r" (ad2), "e" (signal), "I" (_SFR_IO_ADDR(PORTD))
        : "r28", "r29"
    );
}

Listing output fragment:

1768            /* #APP */
1769 00f6 CC27  eor r28, r28 ;r28<-0
1770 00f8 DD27  eor r29, r29 ;r29<-0
1771            Loop1:
1772 00fa C20F  add r28, r18 ;1 cycle
1773 00fc D41F  adc r29, r20 ;1 cycle
1774 00fe 261F  adc r18, r22 ;1 cycle
1775 0100 0590  lpm __tmp_reg__, Z+ ;3 cycles
1776 0102 02BA  out 18, __tmp_reg__ ;1 cycle
1777 0104 FACF  rjmp Loop1 ;2 cycles. Total 9 cycles
1778
1779            /* #NOAPP */

Note: The /* #APP */ and /* #NOAPP */ comments are generated by the compiler to mark the statements that it did not generate itself (the inline ASM).

I wanted the loop to be as short as possible and managed to get it down to 9 clock cycles per iteration. The code fragment is from https://www.myplace.nu/avr/minidds/minidds.asm

An added benefit is that signal timings are easier to calculate, because the inline asm is not affected by compiler optimization.
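To make the purpose of the loop easier to follow, here is a plain-C model of one iteration as I read it. It assumes that ad2:ad1:ad0 form a 24-bit phase increment and that the top byte of a 24-bit accumulator selects a sample from a 256-entry table, as in the referenced minidds source. The function name dds_next_sample and its parameters are hypothetical, and the model ignores cycle timing entirely.

```c
#include <stdint.h>

/* Hypothetical model of one loop iteration: advance a 24-bit phase
   accumulator and return the table sample selected by its top byte. */
static inline uint8_t dds_next_sample(uint32_t *phase, uint32_t step,
                                      const uint8_t table[256])
{
    /* Keep only 24 bits, like three chained 8-bit registers. */
    *phase = (*phase + step) & 0x00FFFFFFUL;
    /* The top byte of the accumulator indexes the 256-entry table. */
    return table[(*phase >> 16) & 0xFFU];
}
```

A larger step makes the index sweep through the table faster, which raises the output frequency; at 9 cycles per iteration the sample rate would be the CPU clock divided by 9.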

Read more about inline asm with WinAVR at https://www.nongnu.org/avr-libc/user-manual/inline_asm.html

8 Comments:

  1. why is the line of font so tiny? e.g.
    asm(code : output operand list : input operand list [: clobber list]);
    next line of tiny font
    asm volatile(code : output operand list : input operand list [: clobber list]);

    yet the font on the rest of the page is normal size readable font

  2. I agree about the font mess. This is somehow related to the WYSIWYG editor. I'll try to fix this issue. Thanks 😉

  3. GCC complains when I try to use “ldd”
    instruction. Can you give an example of proper use
    of ldd ?

  4. never mind, it works

  5. What encoding did you use to write this article? I have unreadable characters inside the brackets in the volatile() command. Very interesting article, but it leaves you feeling like a kid who is looking into a candy store through the window. You can see it, but you cannot have it.

    Could you please send me this article in some more compatible format like *.pdf? Please, do not use *.doc. I have had a bad experience not being able to read a text written on another computer.

    Thank you.

    Konstantin

  6. It seems that older articles were corrupted earlier during some major website upgrade. I will try to fix the article. Thanks for letting me know. If you find more, just drop a comment. Thanks

  7. The article has been fixed. Sorry for this inconvenience.

  8. Thanks a lot dude, found your tips very useful in getting around issues caused by the automated optimisation in the GCC toolchain.
