The compiler may store a function's local variables in registers or on the stack. Accessing the stack is slower than accessing registers because it requires a memory access, so registers are typically preferred as a storage location.
Most architectures (AVR included) don't have memory-memory (MM) operand instructions, i.e. they can't operate on two memory locations at once. Typically, you instead have register-memory (RM) operands, register-register (RR) operands, register-immediate (RI) operands, or memory-immediate (MI) operands. The reason is that the processor would have to service two memory accesses (read or write) in a single instruction, and that complicates the design of the processor core.
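A minimal sketch of what this means in practice, using hypothetical globals `src` and `dst`: even a plain memory-to-memory copy must be staged through a register. The assembly in the comments is an assumption about typical avr-gcc output, shown for illustration.

```c
/* Hypothetical example: copying one global into another. Because AVR
 * has no memory-to-memory instructions, the value must pass through
 * a register on its way from one memory location to the other. */
int src = 42;
int dst;

void copy(void) {
    dst = src;
    /* Typical avr-gcc output (sketch, assuming a 16-bit int):
     *   lds r24, src      ; memory -> register (low byte)
     *   lds r25, src+1    ; memory -> register (high byte)
     *   sts dst,   r24    ; register -> memory
     *   sts dst+1, r25
     */
}
```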
The method of variable initialisation can vary. Consider the following code:
int foo = 0;
The compiler could allocate an int-sized zero value in the firmware's data block, and read it back. The compiled assembly would look something like this:
lds r24, 0x0060 ; read from data address 0x60 into r24 register
lds r25, 0x0061 ; read from data address 0x61 into r25 register
This assumes an 8-bit AVR MCU with a 16-bit int type. This causes two memory reads, and you've got to allocate two bytes in the firmware just for storing a zero. The assembly instructions themselves are also four bytes long each. So just for this initialisation we would've used 8 bytes of code, two bytes of data in the firmware blob, and performed two memory accesses.
Instead, the compiler can just make the initialisation part of the code, using a load immediate instruction:
ldi r24, 0x00 ; load zero into r24
ldi r25, 0x00 ; load zero into r25
This uses just two bytes per instruction, requires no extra space to store the initial value, and performs no memory accesses at runtime. The initialisation is performed using immediate values stored in the assembly itself.
However, things change if you initialise a local variable with the value of a mutable (i.e. non-constant) global variable, like so:
int bar = 123;

int test() {
    int foo = bar;
    // ...
    return 0;
}
In this case the compiler can't assume that the value of bar will be 123 when the test() function is called, because the global variable value might have been altered. The initial value of the global variable is stored in the firmware's data section, which is loaded into RAM at boot. The compiler then copies the value from the memory address of bar into the registers that represent foo. This ends up looking exactly like our first example, where lds instructions were used to access memory.
However, if the compiler's optimisation pass can prove that the value of bar never changes throughout the program runtime (e.g. if you never set its value in your code) then it can instead treat it as if it were a constant value, and choose to have the foo variable initialised from immediates instead of memory.
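As a sketch of how you can help the optimiser make that proof, marking the global `static` (so it is invisible outside the translation unit) and never writing to it lets the compiler treat its value as a constant. The function name here is hypothetical, and the assembly comment is an assumption about what an optimising avr-gcc build might emit.

```c
/* If 'bar' is never written anywhere the compiler can see, its value
 * can be propagated as a constant into every use. */
static int bar = 123;   /* never modified in this program */

int get_foo(void) {
    int foo = bar;
    /* With optimisation on, this may compile down to immediates:
     *   ldi r24, 123
     *   ldi r25, 0
     * with no memory access for 'bar' at all. */
    return foo;
}
```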
Now, imagine you've got a program with lots of local variables. Eventually you'll run out of registers to store things in (this is known as register pressure), so you need to store the extra values elsewhere. That's where the stack comes in. You can push register values onto the stack, then pop them back off the stack later when you need them again.
The stack is just a location in memory. As such, you can't (usually) directly pull data from memory into the stack in a single instruction. You must first load the value from memory into a register, then push the value in that register onto the stack.
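The two-step sequence can be sketched as follows. The function and global names are hypothetical, and the assembly comments are assumptions about typical AVR codegen: note that `push` and `pop` operate only on registers, so a value in memory must be loaded first.

```c
/* Sketch: a value in memory cannot be pushed onto the stack directly;
 * the compiler stages it through a register. */
int saved;

void spill_demo(int x) {
    saved = x;
    /* To park 'saved' on the stack, the compiler would emit roughly:
     *   lds  r24, saved   ; step 1: memory -> register
     *   push r24          ; step 2: register -> stack
     * ...and later, to get it back:
     *   pop  r24          ; stack -> register
     */
}
```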
Another complication comes with function calls. Let's say you have a function with a bunch of local variables, and those variables are stored in registers. You then call into another function, which also has a bunch of local variables, which also get stored in registers. What if those register usages collide? Well, if you don't handle this problem, the values that were in your registers before the call end up getting clobbered (i.e. overwritten) during the call, so all your local variable values end up wrong!
To solve this, we specify requirements for register preservation in a calling convention. A calling convention specifies how arguments are passed to functions (e.g. in registers, on the stack, etc.), which registers are used for return values, which registers must be preserved across calls, what cleanup needs to be done at the end of a call, and which of these responsibilities fall to the caller and which to the callee. This lets the compiler know whether or not certain register values need to be preserved across a function call, which might mean pushing some register values onto the stack (in what we call a "register save region") at the start of a call and popping them back off afterwards.
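A sketch of how this plays out under the avr-gcc calling convention (assumed ABI: r18-r27, r30-r31 are call-clobbered; r2-r17, r28-r29 are call-saved; hedged, as the exact split is toolchain-defined). The function names here are hypothetical.

```c
/* 'helper' may freely clobber the call-clobbered registers; anything
 * the caller needs to survive the call must live elsewhere. */
int helper(int x) { return x + 1; }

int caller(void) {
    int live = 10;        /* must survive the call below */
    int r = helper(5);
    /* If 'live' sat in a call-clobbered register, the compiler must
     * either move it to a call-saved register (which helper would have
     * to push/pop if it used it) or spill it to the stack around the
     * call. Either way, 'live' is intact here. */
    return live + r;
}
```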
When it comes to the actual storage and initialisation of global variables that are stored in memory, the details vary depending on the architecture.
8-bit AVR uses something called a modified Harvard architecture, in which code and data occupy separate memory spaces. On AVR, code executes directly from program flash, and the flash image also contains the initial values of initialised global variables; when the device boots, startup code copies those initial values from flash into the data memory (SRAM) before your program runs. This is a very simple system.
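A sketch of where globals end up under this scheme, assuming avr-gcc's conventional section names (`.data` for initialised globals, `.bss` for zero-initialised ones). The variable and function names are hypothetical.

```c
/* Initialised globals get a flash copy of their initial value, which
 * startup code copies into SRAM at boot. Zero-initialised globals go
 * in .bss, which startup code simply fills with zeroes, so no flash
 * space is spent storing their (zero) initial values. */
int initialised = 123;  /* .data: initial value copied flash -> SRAM */
int zeroed;             /* .bss:  zero-filled at boot                */

int startup_state(void) {
    /* By the time any C code runs, the runtime has already performed
     * the .data copy and the .bss clear. */
    return initialised + zeroed;
}
```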
This is in contrast to von Neumann architectures like x86, where all memory is treated the same. Some ARM cores use von Neumann architectures, whereas others use modified Harvard architectures, so things vary a lot on the ARM side depending on which generation of ARM chips you're using. On von Neumann architectures you'll typically find some concept of memory page protection (read/write/execute flags) that allows you to differentiate between readable, writeable, and executable memory (and combinations thereof). Executables on these platforms contain separate sections for things like initialised data, uninitialised data, constant data, and executable code, each of which is loaded into a memory region with the appropriate protection.