Monday, July 26, 2010

Entry point Debugging

Save registers R1 and R2, which hold the machine ID and the ATAG pointer, respectively.
Also note that R0 is 0, as passed by u-boot; this is a legacy convention of the ARM boot protocol.

First, we make sure we are running in SVC mode; if not, we need to switch to SVC. (In our case, u-boot already hands over control in SVC mode.)

Next, disable interrupts if that has not already been done (in our case they are already disabled).
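Both checks are bit tests on the CPSR: the mode field M[4:0] is 0x13 for SVC mode, and setting bits 7 and 6 (I and F) masks IRQs and FIQs. The sketch below works on a CPSR value passed in as a plain integer, since reading the real register needs an MRS instruction; the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define CPSR_MODE_MASK 0x1fu /* M[4:0], the processor mode field */
#define CPSR_MODE_SVC  0x13u /* supervisor (SVC) mode */
#define CPSR_IRQ_FIQ   0xc0u /* I (bit 7) and F (bit 6) mask bits */

/* True when the mode field of a CPSR value selects SVC mode. */
bool cpsr_in_svc(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) == CPSR_MODE_SVC;
}

/* CPSR value with IRQs and FIQs masked, like an ORR with 0xc0 followed by MSR. */
uint32_t cpsr_irqs_off(uint32_t cpsr)
{
    return cpsr | CPSR_IRQ_FIQ;
}
```

For example, cpsr_in_svc(0x600001d3) is true, and cpsr_irqs_off(0x13) gives 0xd3. Masking is idempotent, so running it when interrupts are already off is harmless.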

Next, we check whether we are running from the address we were linked at, by comparing the link-time address of LC0 with its current run-time address (the LC0 table is in the same file).
In our case, LC0 is at 0x80008138 while its link-time address is 0x138, giving an offset of 0x80008000. This offset is added to the LC0 entries that need relocation. The changes are as follows:

Label             Register              Link-time value   Run-time value (with offset)
LC0               r1                    0x00000138        0x00000138 (unchanged; used to compute the offset)
__bss_start       r2                    0x0015564c        0x8015d64c
_end              r3                    0x00155668        0x8015d668
zreladdr          r4                    0x80008000        0x80008000 (already absolute)
_start            r5                    0x00000000        0x80008000
_got_start        r6 -> r11 (changed)   0x00155618        0x8015d618
_got_end          ip (r12)              0x00155640        0x8015d640
user_stack+4096   sp (r13)              0x00156668        0x8015e668
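The offset arithmetic in the table can be checked with a small C sketch. It models only the link-time values listed above; the helper names (lc0_delta, lc0_relocated) are hypothetical, and zreladdr (r4) is left out because it is already an absolute address. In the decompressor's head.S, a zero offset means we are running where we were linked, and the fix-ups are skipped.

```c
#include <stdint.h>
#include <stddef.h>

/* Indices into the LC0 values modelled here (zreladdr omitted). */
enum { LC0_R1, LC0_BSS_START, LC0_END, LC0_START, LC0_GOT_START, LC0_GOT_END, LC0_SP, LC0_COUNT };

/* Link-time values loaded from LC0, as listed in the post. */
static const uint32_t lc0_linked[LC0_COUNT] = {
    0x00000138u, /* LC0             -> r1 (only used to compute the offset) */
    0x0015564cu, /* __bss_start     -> r2 */
    0x00155668u, /* _end            -> r3 */
    0x00000000u, /* _start          -> r5 */
    0x00155618u, /* _got_start      -> r11 */
    0x00155640u, /* _got_end        -> ip */
    0x00156668u, /* user_stack+4096 -> sp */
};

/* Offset = run-time address of LC0 minus its link-time address. */
uint32_t lc0_delta(uint32_t lc0_runtime)
{
    return lc0_runtime - lc0_linked[LC0_R1];
}

/* Run-time value of one entry; r1 itself is not relocated. */
uint32_t lc0_relocated(size_t idx, uint32_t delta)
{
    if (idx == (size_t)LC0_R1)
        return lc0_linked[idx];
    return lc0_linked[idx] + delta;
}
```

With lc0_delta(0x80008138) = 0x80008000, lc0_relocated(LC0_BSS_START, ...) reproduces 0x8015d64c from the table.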


0x8016e668  ----------------------  end of malloc space
              malloc space (64K max.)
0x8015e668  ----------------------  user stack top (sp)
              user stack (4K)
0x8015d668  ----------------------  _end
              bss
0x8015d64c  ----------------------  __bss_start

0x8015d640  ----------------------  _got_end
              GOT
0x8015d618  ----------------------  _got_start

0x80008000  ----------------------  _start
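The region sizes implied by this map can be worked out directly. A minimal C sketch using the run-time addresses above; the macro and function names are hypothetical, and user_stack is assumed to start at _end, which is what user_stack+4096 = 0x00156668 and _end = 0x00155668 imply.

```c
#include <stdint.h>

/* Run-time boundaries from the memory map (BeagleBoard). */
#define MAP_GOT_START  0x8015d618u /* _got_start */
#define MAP_GOT_END    0x8015d640u /* _got_end */
#define MAP_BSS_START  0x8015d64cu /* __bss_start */
#define MAP_END        0x8015d668u /* _end, assumed bottom of user_stack */
#define MAP_STACK_TOP  0x8015e668u /* sp = user_stack + 4096 */
#define MAP_MALLOC_MAX 0x10000u    /* 64K of malloc space above the stack */

/* Number of 32-bit GOT entries to relocate. */
uint32_t got_entries(void) { return (MAP_GOT_END - MAP_GOT_START) / 4; }

/* Bytes of BSS to clear. */
uint32_t bss_bytes(void) { return MAP_END - MAP_BSS_START; }

/* Size of the user stack below sp. */
uint32_t stack_bytes(void) { return MAP_STACK_TOP - MAP_END; }

/* Upper limit of the malloc area that starts at sp. */
uint32_t malloc_end(void) { return MAP_STACK_TOP + MAP_MALLOC_MAX; }
```

So the GOT holds 10 entries, the BSS is 0x1c bytes, the stack is 4K, and the malloc space ends at 0x8016e668.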


Relocate all entries in the GOT (from _got_start in r11 up to _got_end in ip) by the same offset as well.
Clear the BSS from __bss_start (r2) up to _end (r3).
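Both steps are plain word loops. A minimal C sketch over ordinary arrays standing in for memory, assuming the GOT holds one 32-bit address per word; relocate_got and clear_bss are hypothetical names, not kernel functions.

```c
#include <stdint.h>

/* Add the load offset (delta) to every GOT word in [got, got_end). */
void relocate_got(uint32_t *got, uint32_t *got_end, uint32_t delta)
{
    for (uint32_t *p = got; p < got_end; p++)
        *p += delta;
}

/* Zero every word from bss_start up to, but not including, bss_end. */
void clear_bss(uint32_t *bss_start, uint32_t *bss_end)
{
    for (uint32_t *p = bss_start; p < bss_end; p++)
        *p = 0;
}
```

Applied with delta = 0x80008000, a GOT word holding the link-time address 0x0015564c becomes 0x8015d64c, matching the relocated __bss_start.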


The C environment should now be set up.
Next, turn the cache on, set up some pointers, and start decompressing: jump to cache_on (@0x80008160).
cache_on:
/*
 * Turn on the cache.  We need to setup some page tables so that we
 * can have both the I and D caches on.
 *
 * We place the page tables 16k down from the kernel execution address,
 * and we hope that nothing else is using it.  If we're using it, we
 * will go pop!
 *
 * On entry,
 *  r4 = kernel execution address
 *  r6 = processor ID
 *  r7 = architecture number
 *  r8 = atags pointer
 *  r9 = run-time address of "start"  (???)
 * On exit,
 *  r1, r2, r3, r9, r10, r12 corrupted
 * This routine must preserve:
 *  r4, r5, r6, r7, r8
 */
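cache_on sets r3 = 8 (the offset of the cache-on branch within a proc_types entry) and branches to call_cache_fn, whose header comment reads: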
/*
 * Here follow the relocatable cache support functions for the
 * various processors.  This is a generic hook for locating an
 * entry and jumping to an instruction at the specified offset
 * from the start of the block.  Please note this is all position
 * independent code.
 *
 *  r1  = corrupted
 *  r2  = corrupted
 *  r3  = block offset
 *  r6  = corrupted
 *  r12 = corrupted
 */
 
(proc_types is at 0x800083f8 for Beagle.)
call_cache_fn reads the processor ID (MIDR) from CP15 register c0, c0 and walks the proc_types table. For each entry, the ID is XORed with the entry's value and ANDed with the entry's mask; the first entry for which the result is zero is the match.
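The matching rule, and the jump through the block offset described in the call_cache_fn comment, can be sketched in C. Only the ARMv7 entry (value 0x000f0000, mask 0x000f0000) comes from the post; the mask-0 catch-all entry is an assumption standing in for the rest of the real table, which lists many older CPUs before the ARMv7 entry. Each record is taken to be five words (value, mask, three branches), as in the Beagle entry quoted in the post, and cache_on passes an offset of 8 to select the first branch. The names find_proc_type and cache_fn_addr are hypothetical, and the MIDR value in the usage note is an illustrative Cortex-A8-style value, not one read from a board.

```c
#include <stdint.h>
#include <stddef.h>

#define PROC_ENTRY_SIZE    (4 * 5) /* value, mask, on, off, flush */
#define CACHE_ON_OFFSET    8u      /* W(b) ..._cache_on (r3 = 8) */
#define CACHE_OFF_OFFSET   12u     /* W(b) ..._cache_off */
#define CACHE_FLUSH_OFFSET 16u     /* W(b) ..._cache_flush */

struct proc_type {
    uint32_t value;   /* expected ID bits */
    uint32_t mask;    /* which MIDR bits to compare */
    const char *name; /* stands in for the three branch slots */
};

static const struct proc_type proc_types[] = {
    { 0x000f0000u, 0x000f0000u, "armv7" },   /* entry from the post */
    { 0x00000000u, 0x00000000u, "unknown" }, /* assumed catch-all: mask 0 */
};

/* Index of the first entry with ((midr ^ value) & mask) == 0, or -1. */
int find_proc_type(uint32_t midr)
{
    size_t n = sizeof proc_types / sizeof proc_types[0];
    for (size_t i = 0; i < n; i++)
        if (((midr ^ proc_types[i].value) & proc_types[i].mask) == 0)
            return (int)i;
    return -1;
}

/* Address that "entry start + r3" selects for entry i of a table at table_base. */
uint32_t cache_fn_addr(uint32_t table_base, int i, uint32_t offset)
{
    return table_base + (uint32_t)i * PROC_ENTRY_SIZE + offset;
}
```

For example, find_proc_type(0x410fc083) returns 0 because bits 19:16 of that ID are 0xf, while an ID such as 0x41069265 falls through to the catch-all.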
 
For Beagle, this is the matching table entry:
.word   0x000f0000              @ new CPU Id
.word   0x000f0000
W(b)    __armv7_mmu_cache_on
W(b)    __armv7_mmu_cache_off
W(b)    __armv7_mmu_cache_flush
 
 
Once a match is found, we jump through the entry's cache-on branch; in our case, to __armv7_mmu_cache_on.
 
Here, we first preserve the lr value in r12.
Then we read ID_MMFR0 (0x31100003 on Beagle) and test its low 4 bits, the VMSA support field, with TST #0xF. The result is non-zero (3), meaning a VMSA MMU is present, so we branch to __setup_mmu.
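The test is a single bit mask. A sketch in C; the function name is hypothetical, and any non-zero value in the low 4 bits is treated as "VMSA present", which is all the TST/BLNE pair checks.

```c
#include <stdint.h>
#include <stdbool.h>

/* Mirrors "tst r11, #0xf" followed by a conditional branch:
 * the branch is taken when the VMSA field (bits 3:0) is non-zero. */
bool vmsa_supported(uint32_t id_mmfr0)
{
    return (id_mmfr0 & 0xfu) != 0;
}
```

For Beagle's 0x31100003 the masked value is 3, so the branch to __setup_mmu is taken.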
 
__setup_mmu:
R3 = R4 - 16384 (R4 = 0x80008000; the MMU page table is placed 16K below the entry point).
Next, R3 is aligned down to a 16K boundary by clearing its low 14 bits (two BIC instructions, with 0xff and then 0x3f00).
In our case, R3 = 0x80004000 is already aligned, so nothing changes.
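A 32-bit ARM data-processing immediate is an 8-bit value with an even rotation, so the full alignment mask 0x3fff cannot be encoded in one instruction; that is why the clear is split into 0xff and 0x3f00. A C sketch of the same computation (the function name is hypothetical):

```c
#include <stdint.h>

/* Page-table base: 16K below the kernel execution address (r4),
 * rounded down to a 16K boundary. */
uint32_t page_table_base(uint32_t r4)
{
    uint32_t r3 = r4 - 16384u;
    r3 &= ~0xffu;   /* bic r3, r3, #0xff   */
    r3 &= ~0x3f00u; /* bic r3, r3, #0x3f00 */
    return r3;
}
```

page_table_base(0x80008000) returns 0x80004000 unchanged, while an unaligned input such as 0x80008123 is rounded down to the same value.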
Now we start initializing the page table, turning on the cacheable and bufferable bits for the RAM area only.
Shift R3 right by 18 bits to get 0x2000, then left by 18 bits again to get 0x80000000, and save it in R9. This is taken as the start of RAM.
(In effect, we clear the lower 18 bits and keep only the upper 14 bits, rounding down to a 256K boundary.)
Add 0x10000000 (256MB) to this value as a reasonable assumed RAM size, giving 0x90000000, and save it in R10.
So R9 = 0x80000000 and R10 = 0x90000000 give the estimated RAM start and end addresses.
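The estimate is pure shift arithmetic. A C sketch; ram_start and ram_end are hypothetical names for what R9 and R10 hold.

```c
#include <stdint.h>

/* R9: page-table address with the low 18 bits cleared (256K boundary). */
uint32_t ram_start(uint32_t r3) { return (r3 >> 18) << 18; }

/* R10: assume 256MB (0x10000000) of RAM above the start. */
uint32_t ram_end(uint32_t r3) { return ram_start(r3) + 0x10000000u; }
```

For R3 = 0x80004000 the intermediate value r3 >> 18 is 0x2000, giving 0x80000000 and 0x90000000.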
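Next, we take the magic value 0x12 (18) and OR it with 3 << 10 (the AP bits), which gives 0x0C12 in R1. This is the first section descriptor: its base-address field (bits 31:20) starts at 0, and because the mapping is 1:1, that base is also the VA being mapped.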
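The descriptor can be decoded field by field. This sketch assumes the ARMv7 short-descriptor section format (type in bits 1:0, AP in bits 11:10, section base in bits 31:20); the helper names are hypothetical.

```c
#include <stdint.h>

/* Initial R1: magic 0x12 ORed with AP = 0b11 in bits 11:10. */
uint32_t initial_desc(void) { return 0x12u | (3u << 10); }

/* Section base address, bits 31:20 (1MB granularity). */
uint32_t section_base(uint32_t desc) { return desc & 0xfff00000u; }

/* Access permission bits AP[1:0], bits 11:10. */
uint32_t section_ap(uint32_t desc) { return (desc >> 10) & 0x3u; }

/* Descriptor type, bits 1:0 (0b10 = section). */
uint32_t desc_type(uint32_t desc) { return desc & 0x3u; }
```

initial_desc() evaluates to 0xc12: a section entry with AP = 3 and a base of 0.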
 
Finally, set R2 = R3 + 16K = 0x80008000, the end of the page table.
 
#VA-PA1
Compare R1 with R9 (VA with RAM start):
If VA >= RAM start, set R1 = R1 | 0x0C to make the section cacheable and bufferable.
Next, compare R1 with R10 (VA with RAM end):
If VA >= RAM end, set R1 = R1 & ~0x0C (BIC) to clear the cacheable and bufferable bits.
The net effect is that every 1MB section between RAM start and RAM end is cacheable and bufferable, and every other section is not.
Finally, after the above checks, store the descriptor (R1) in the page table and advance the table pointer by 4 bytes.
#VA-PA2
Now update the VA: add 0x100000 (1MB) to R1.
This builds a first-level page table of 1MB sections: 16K / 4 bytes = 4096 entries, which cover the whole 4GB address space.
Repeat the steps from #VA-PA1 to #VA-PA2 until the table pointer reaches R2, i.e. until the entire 16K table is filled.
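The loop from #VA-PA1 to #VA-PA2 can be mirrored in C. This sketch fills a 4096-entry array the same way: R1 carries its C/B bits from entry to entry, so each section ends up cacheable and bufferable exactly when its base lies in [RAM start, RAM end). The array and function names are hypothetical. Comparing the whole descriptor (flags included) against R9/R10 follows the assembly; with R9 and R10 1MB-aligned, as here, the low flag bits do not change which sections match.

```c
#include <stdint.h>
#include <stddef.h>

#define PT_ENTRIES   4096u     /* 16K table / 4 bytes per entry */
#define SECTION_SIZE 0x100000u /* 1MB per section */
#define CB_BITS      0x0cu     /* C (bit 3) and B (bit 2) */

/* Fill a 1:1 first-level table the way the __setup_mmu loop does. */
void build_page_table(uint32_t pt[PT_ENTRIES], uint32_t ram_start, uint32_t ram_end)
{
    uint32_t r1 = 0x12u | (3u << 10); /* initial descriptor 0xc12 */
    for (size_t i = 0; i < PT_ENTRIES; i++) {
        if (r1 >= ram_start)  /* cmp r1, r9  -> set cacheable, bufferable */
            r1 |= CB_BITS;
        if (r1 >= ram_end)    /* cmp r1, r10 -> clear cacheable, bufferable */
            r1 &= ~CB_BITS;
        pt[i] = r1;           /* store the entry, advance the table pointer */
        r1 += SECTION_SIZE;   /* next 1MB section (wraps after the last) */
    }
}
```

With RAM start 0x80000000 and end 0x90000000, entry 0x800 becomes 0x80000c1e (cacheable, bufferable), while entries 0x7ff and 0x900 keep the plain 0x...c12 form.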
 
Below is the final dump of the 16K PTEs: