Also, note that R0 is 0 by default, as passed by u-boot due to legacy coding.
First, we check whether we are running in SVC mode; if not, we need to switch to SVC. (In our case, we are already in SVC mode, as set up by u-boot.)
Next, disable interrupts if not already done (we already have them disabled).
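The two checks above can be modelled as simple bit tests on the CPSR. This is only a rough Python sketch, not the decompressor's actual code: the mode field and I/F bit positions are the architectural ARM ones, while the helper names and the sample CPSR value 0x000001D3 are made up for illustration.

```python
MODE_MASK = 0x1F     # CPSR[4:0] holds the processor mode
SVC_MODE = 0x13      # encoding of Supervisor (SVC) mode
IRQ_DISABLE = 0x80   # CPSR I bit
FIQ_DISABLE = 0x40   # CPSR F bit


def in_svc_mode(cpsr):
    """True if the mode field says we are in SVC mode."""
    return (cpsr & MODE_MASK) == SVC_MODE


def interrupts_disabled(cpsr):
    """True if both IRQ and FIQ are masked."""
    mask = IRQ_DISABLE | FIQ_DISABLE
    return (cpsr & mask) == mask


def force_svc_irqs_off(cpsr):
    """Return a CPSR value with SVC mode selected and IRQ/FIQ masked."""
    return (cpsr & ~MODE_MASK) | SVC_MODE | IRQ_DISABLE | FIQ_DISABLE


# Hypothetical CPSR value as handed over by the boot loader.
cpsr = 0x000001D3
print(in_svc_mode(cpsr), interrupts_disabled(cpsr))
```

With a value like this one, both checks already hold, which matches the "nothing to do" case described above.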
Next, we check whether we are running from the address we were linked at, by comparing the link-time LC0 address with the current run-time LC0 address (the LC0 table is in the same file).
In our case, LC0 is at 0x80008138 at run time, while its link-time address is 0x138. The offset is therefore 0x80008000, so we add this offset to the entries in the LC0 table. The changes are as follows:
Label            Reg                 Compiled val   New value (with added offset)
LC0              r1                  0x0138         0x0138
__bss_start      r2                  0x0015564c     0x8015d64c
_end             r3                  0x00155668     0x8015d668
zreladdr         r4                  <- Already holds 0x80008000
_start           r5                  0x0            0x80008000
_got_start       r6->r11 (changed)   0x00155618     0x8015d618
_got_end         ip (r12)            0x00155640     0x8015d640
user_stack+4096  sp (r13)            0x00156668     0x8015e668
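The arithmetic behind the table can be reproduced with a few lines of Python. This is just a sketch that recomputes the numbers quoted above; the dictionary layout and the function name relocate are my own, not anything from the kernel source.

```python
LINKED_LC0 = 0x00000138    # link-time address of LC0 (from the text)
RUNTIME_LC0 = 0x80008138   # run-time address of LC0 (from the text)

# Link-time values taken from the LC0 table above.
LINKED = {
    "__bss_start": 0x0015564C,
    "_end": 0x00155668,
    "_start": 0x00000000,
    "_got_start": 0x00155618,
    "_got_end": 0x00155640,
    "user_stack+4096": 0x00156668,
}


def relocate(linked, delta):
    """Add the load offset to every entry, wrapping at 32 bits."""
    return {name: (addr + delta) & 0xFFFFFFFF for name, addr in linked.items()}


delta = (RUNTIME_LC0 - LINKED_LC0) & 0xFFFFFFFF
relocated = relocate(LINKED, delta)

print(f"delta            0x{delta:08x}")
for name, addr in relocated.items():
    print(f"{name:<16} 0x{addr:08x}")
```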
---------------------- malloc space (0x8016e668)
64K Max.
---------------------- user stack (0x8015e668)
---------------------- _end (0x8015d668)
---------------------- __bss_start (0x8015d64c)
---------------------- _got_end (0x8015d640)
---------------------- _got_start(0x8015d618)
---------------------- _start (0x80008000)
Relocate all entries in the GOT (starting at r11) as well.
Clear the BSS, from __bss_start (r2) up to _end (r3).
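These two steps are simple word-by-word loops. Here is a toy Python model of them, assuming memory is a dictionary of word address to value; the GOT contents and the 0xDEADBEEF filler are invented purely so there is something to relocate and clear, and the function names are my own.

```python
DELTA = 0x80008000
GOT_START, GOT_END = 0x8015D618, 0x8015D640   # relocated values from the table
BSS_START, BSS_END = 0x8015D64C, 0x8015D668


def relocate_got(mem, got_start, got_end, delta):
    """Add the load offset to every 32-bit GOT entry in [got_start, got_end)."""
    for addr in range(got_start, got_end, 4):
        mem[addr] = (mem[addr] + delta) & 0xFFFFFFFF


def clear_bss(mem, bss_start, bss_end):
    """Zero every 32-bit word in [bss_start, bss_end)."""
    for addr in range(bss_start, bss_end, 4):
        mem[addr] = 0


# Toy memory image: made-up link-time pointers in the GOT, junk in the BSS.
mem = {addr: 0x00001000 + 4 * i
       for i, addr in enumerate(range(GOT_START, GOT_END, 4))}
mem.update({addr: 0xDEADBEEF for addr in range(BSS_START, BSS_END, 4)})

relocate_got(mem, GOT_START, GOT_END, DELTA)
clear_bss(mem, BSS_START, BSS_END)
print(f"first GOT entry 0x{mem[GOT_START]:08x}")
```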
Now the C environment should be set up.
Turn the cache on, set up some pointers, and start decompressing.
Jump to cache_on (@0x80008160)
cache_on:
Turn on the cache. We need to setup some page tables so that we
can have both the I and D caches on.
We place the page tables 16k down from the kernel execution address,
and we hope that nothing else is using it. If we're using it, we will go pop!
 * On entry,
 *  r4 = kernel execution address
 *  r6 = processor ID
 *  r7 = architecture number
 *  r8 = atags pointer
 *  r9 = run-time address of "start"  (???)
 * On exit,
 *  r1, r2, r3, r9, r10, r12 corrupted
 * This routine must preserve:
 *  r4, r5, r6, r7, r8
cache_on then branches to call_cache_fn():
/*
 * Here follow the relocatable cache support functions for the
 * various processors.  This is a generic hook for locating an
 * entry and jumping to an instruction at the specified offset
 * from the start of the block.  Please note this is all position
 * independent code.
 *
 *  r1  = corrupted
 *  r2  = corrupted
 *  r3  = block offset
 *  r6  = corrupted
 *  r12 = corrupted
 */
(proc_types @0x800083f8 for beagle)
In this function, we read the processor ID from the c0, c0 coprocessor register and
compare it against the proc_types table entries. Each entry's value is compared
with the ID under that entry's mask until a match is found.
For Beagle, this is the matching table entry:
.word 0x000f0000 @ new CPU Id
.word 0x000f0000
W(b) __armv7_mmu_cache_on
W(b) __armv7_mmu_cache_off
W(b) __armv7_mmu_cache_flush
Once a match is found, we jump to the corresponding function. In our case,
we jump to __armv7_mmu_cache_on.
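The value/mask lookup can be sketched in Python as below. Several things here are assumptions on my part: the CPU ID 0x411FC083 is what I would expect a BeagleBoard's Cortex-A8 to report, the all-zero catch-all entry is a stand-in for the end of the table, and the name find_cache_on is invented. Only the 0x000f0000/0x000f0000 entry comes from the text above.

```python
# (value, mask, handler); only the first entry is taken from the text above.
PROC_TYPES = [
    (0x000F0000, 0x000F0000, "__armv7_mmu_cache_on"),
    (0x00000000, 0x00000000, "unrecognised"),   # assumed catch-all terminator
]


def find_cache_on(cpu_id, table=PROC_TYPES):
    """Return the handler of the first entry whose masked value matches."""
    for value, mask, handler in table:
        # An entry matches when the ID and the value agree on every masked bit.
        if ((cpu_id ^ value) & mask) == 0:
            return handler
    raise LookupError("no matching proc_types entry")


BEAGLE_CPU_ID = 0x411FC083   # assumed main ID register value
print(find_cache_on(BEAGLE_CPU_ID))
```

Because the mask only covers bits 19:16, any ID with 0xF in that field selects the ARMv7 entry, regardless of the other bits.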
Here, we first preserve the lr value in r12.
Read ID_MMFR0 (for Beagle, it's 0x31100003) and test it against the VMSA field mask (0xF).
The result is non-zero, meaning VMSA is supported. Hence, jump to __setup_mmu.
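The test on ID_MMFR0 is a bitwise AND followed by a check for a non-zero result, which a couple of lines of Python can illustrate. The function name has_vmsa is mine; the register value is the one quoted above.

```python
ID_MMFR0 = 0x31100003   # value read on Beagle (from the text)
VMSA_MASK = 0xF         # low four bits: VMSA support field


def has_vmsa(id_mmfr0):
    """True if the VMSA field is non-zero, i.e. an MMU is available."""
    return (id_mmfr0 & VMSA_MASK) != 0


print(hex(ID_MMFR0 & VMSA_MASK), has_vmsa(ID_MMFR0))
```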
__setup_mmu:
R3 = R4 - 16384 (R4=0x80008000; the MMU page table will be placed 16K below the
entry point)
We then clear the low bits of R3 (0xff, then 0x3f00) so that it is 16K-aligned.
In our case, R3=0x80004000 is already aligned, so this changes nothing!
Now, start initializing the page tables, turning on cacheable and bufferable bits for
RAM area only.
Shift R3 right by 18 bits (lsr #18) to get 0x2000.
Shift it left again by 18 bits (lsl #18) to get 0x80000000. Save this in R9. This is the start of RAM!
(What we did here is drop the lower 18 bits of the address and keep only the upper 14 bits.)
Add 0x10000000 (256MB) to the above value to get a reasonable RAM size that we can assume.
So, it would be 0x90000000. Save it in R10.
So R9=0x80000000 and R10=0x90000000 give an estimated RAM start and end.
Next, we take the value 0x12 (decimal 18) and OR it with 3<<10, which gives 0x0C12 in
R1. This will be our initial entry: VA 0 in the upper bits, with the section flag bits in the lower bits.
Finally, set R2 = (R3+16k) = 0x80008000, the end of the page table.
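The register values above can be recomputed with a small Python model. It is a sketch under stated assumptions: the function name is mine, and the second alignment mask (0x3f00) is my reading of how the pointer ends up 16K-aligned. With R4=0x80008000 neither mask changes anything.

```python
def setup_mmu_registers(r4):
    """Recompute r3, r9, r10, r1 and r2 from the kernel execution address."""
    r3 = (r4 - 16384) & 0xFFFFFFFF    # page table sits 16K below r4
    r3 &= ~0xFF                       # align the pointer...
    r3 &= ~0x3F00                     # ...to 16K (assumed second mask)
    r9 = (r3 >> 18) << 18             # drop the low 18 bits: start of RAM
    r10 = (r9 + 0x10000000) & 0xFFFFFFFF   # assume 256MB of RAM
    r1 = 0x12 | (3 << 10)             # initial entry
    r2 = (r3 + 16384) & 0xFFFFFFFF    # end of the page table
    return {"r1": r1, "r2": r2, "r3": r3, "r9": r9, "r10": r10}


regs = setup_mmu_registers(0x80008000)
for name, value in regs.items():
    print(f"{name:<3} = 0x{value:08x}")
```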
#VA-PA1
Compare R1 with R9 (VA with PA start):
If VA >= PA start, we set R1 = R1 | 0x0C to make it cacheable and bufferable.
Next, compare R1 with R10 (VA with PA end):
If VA >= PA end, we set R1 = R1 & ~0x0C to clear cacheable and bufferable again.
So, the net effect of the above procedure is that any VA between PA start and PA end
is marked cacheable and bufferable!
Finally, after the above checks, store R1 as the next entry in the page table.
#VA-PA2
Now, update the VA: add 0x100000 (1MB) to the existing value.
This is nothing but creating a first-level page table with 1MB sections.
Repeat the above steps from #VA-PA1 to #VA-PA2 for the entire 16K table (4096 entries of 4 bytes each).
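The whole loop from #VA-PA1 to #VA-PA2 can be simulated in Python to see which entries end up cacheable and bufferable. This is a model of the steps as I understand them, not the kernel code itself: the function name and parameters are mine, and the comparisons are treated as "greater than or equal".

```python
def build_page_table(ram_start, ram_end, first=0x0C12, entries=4096):
    """Build the 1:1 first-level table of 1MB section entries."""
    table = []
    desc = first
    for _ in range(entries):
        if desc >= ram_start:
            desc |= 0x0C           # set cacheable, bufferable
        if desc >= ram_end:
            desc &= ~0x0C          # past the end of RAM: clear them again
        table.append(desc)
        desc = (desc + 0x100000) & 0xFFFFFFFF   # next 1MB section
    return table


table = build_page_table(0x80000000, 0x90000000)

# Print the entries around the RAM boundaries.
for index in (0x000, 0x7FF, 0x800, 0x8FF, 0x900, 0xFFF):
    print(f"entry 0x{index:03x}: 0x{table[index]:08x}")

cached = sum(1 for d in table if (d & 0x0C) == 0x0C)
print(f"{cached} sections marked cacheable and bufferable")
```

Only the 256 sections covering 0x80000000 up to 0x90000000 carry the 0x0C bits; everything else keeps the plain 0x0C12 flags.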
Below is the final dump of the 16K PTEs: