Saturday, August 18, 2012

HOW CAN A USER APPLICATION SWITCH TO KERNEL MODE EXPLICITLY?


Yet another question I answered on Stack Overflow involved a user looking for ways to switch from user mode to kernel mode in an application.

Assuming that the user wanted to switch over explicitly during program execution, below was my answer. Hope this will be useful for someone else looking for answers to similar questions:

The only way a user-space application can explicitly initiate a switch to kernel mode during normal operation is by making a system call such as open, read or write.
Whenever a user application calls one of these system call APIs with appropriate parameters, a software interrupt/exception (SWI) is triggered.
As a result of this SWI, control of the code execution jumps from the user application to a predefined location in the Interrupt Vector Table (IVT) provided by the OS.
This IVT entry contains the address of the SWI exception handler routine, which performs all the steps required to switch the user application to kernel mode and start executing kernel instructions on behalf of the user process.

DEPRECATED OPTIONS IN THE INSMOD COMMAND IN LINUX 2.6 AND BEYOND

Some time ago, I answered a question from a user regarding the switches available for the insmod command.

Basically, the user wanted to try the insmod -m option that existed in the days of Linux 2.4.
The man pages for insmod at the time still displayed these options, even though they were removed in Linux 2.6.

Below is a summary of the same discussion, found on Stack Overflow:

I try to use insmod "/my/url/fil.ko" -m to debug what happens, but each time I get error -1 Unknown symbol in module, while in /var/log/messages I can see the error unknown parameter -m



Yes, the init_module function gets called as soon as you load the module into the kernel using insmod. You can just add a printk line and verify that it is printed as soon as you insert the module.
You cannot pass a parameter such as -m to debug the kernel module.
You can only pass parameters that are intended to be handled within the kernel module you have written, declared with the module_param() macro.
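For reference, a minimal sketch of the 2.6+ way of accepting parameters via module_param() - the module and parameter names here (mymod, debug) are hypothetical:

```c
#include <linux/init.h>
#include <linux/module.h>

/* hypothetical parameter; set at load time, e.g.: insmod mymod.ko debug=1 */
static int debug = 0;
module_param(debug, int, 0644);
MODULE_PARM_DESC(debug, "Enable debug printk output");

static int __init mymod_init(void)
{
	if (debug)
		printk(KERN_INFO "mymod: loaded with debug on\n");
	return 0;
}

static void __exit mymod_exit(void)
{
	printk(KERN_INFO "mymod: unloaded\n");
}

module_init(mymod_init);
module_exit(mymod_exit);
MODULE_LICENSE("GPL");
```

So rather than a switch like -m, parameters are passed as name=value pairs on the insmod/modprobe command line, and only the parameters the module itself declares are accepted.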

I believe support for the -m switch on insmod was removed starting from kernel 2.6. You can find more info here: linux.derkeiler.com/Mailing-Lists/Kernel/2003-09/3268.html – Amarnath Revanna Aug 9 at 18:21

UNDERSTANDING LINUX INTERRUPT HANDLER

Key points when thinking about Linux Interrupt handling mechanisms:

  • The Linux kernel is not a process, but a process manager
  • Kernel pages are not swappable, so the kernel's own pages can never cause a page fault due to being swapped out
  • When an interrupt occurs, the time taken by the interrupt handler is accounted against the time slice of the process that was currently executing

The OS can switch to kernel mode in the following scenarios:

  1. Exception handling (page fault/illegal memory access etc.)
  2. HW interrupt. This can happen asynchronously as long as interrupts are enabled (any form of interrupt counts, including the timer interrupts used by the scheduler)
  3. System calls
Kernel control paths:
  • System calls - executed in process context
  • Exception handling - again caused during process instruction execution, hence also process context
  • H/W interrupt - can happen asynchronously and may have nothing to do with the currently running process, hence runs in its own context - interrupt context.

Synchronization:

Kernel Pre-emption:

What is kernel pre-emption?
Kernel pre-emption is a mechanism in which the scheduler is allowed to forcibly evict the currently running process and replace it with another process of the same or higher priority, even if the current process could still continue to run.

With kernel pre-emption, if interrupts are enabled, a higher-priority process can take over execution from the currently executing process. In this way, the first process's kernel control path is left unfinished.

While that control path is suspended, no other process code, other than interrupt or exception handlers, can get executed on a uniprocessor system. On a multiprocessor system, this is not the case though!

Disabling Interrupts:

Another approach to achieving synchronization while executing in a critical region is to disable interrupts. Note that by disabling interrupts, we in effect disable both interrupts and scheduling. The only way some other kernel control path can now execute is through exception handling, which can never be ignored (e.g., a divide-by-zero must be handled no matter what). And although disabling interrupts works well on a uniprocessor machine, the same does not hold on multiprocessor machines, as disabling interrupts on one processor does not stop the critical section from being executed on another CPU.
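As a sketch (a kernel-space fragment, not a standalone program; the function and variable names are hypothetical), the classic interrupt-masking pattern looks like this:

```c
#include <linux/irqflags.h>

static int shared_counter;	/* hypothetical shared state */

static void update_shared_state(void)
{
	unsigned long flags;

	local_irq_save(flags);    /* mask interrupts on this CPU, saving prior state */
	shared_counter++;         /* critical section: no interrupt (and hence no
				     scheduler tick) can preempt us on this CPU */
	local_irq_restore(flags); /* restore the previous interrupt state */
}
```

Note that local_irq_save() masks interrupts on the current CPU only - on SMP another CPU can still enter this function concurrently, which is exactly why a lock is needed there.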


Semaphores:

Since neither of the above two approaches can effectively protect critical resources from improper access, we need a mechanism for a process to lock the critical section against simultaneous access by other processes (on both uniprocessor and multiprocessor machines). There are different forms of locks available in the Linux kernel that serve this purpose under different situations, one among them being the semaphore.

A semaphore is just a counter associated with a data structure.

The following attributes describe a semaphore:
1. It has a counter
2. It has a list of all the tasks sleeping while waiting on this semaphore
3. It has two APIs to handle this lock - up( ) & down( )

Normally a semaphore is created with a default count value of 1.
The down( ) API is called whenever a thread wants to gain access to the data structure and lock the access rights to itself. down( ) atomically decrements the semaphore count by 1 and then checks whether the count has become negative. If not, the thread gets access to the data structure and continues executing. Meanwhile, if another process comes along, calls down( ) on the same semaphore, and finds that the count has turned negative, it is placed on the semaphore's wait list and moved to the sleep state, while a call is made to the scheduler to schedule a new process.

Later, when the first process finishes its access to the data structure, it releases the semaphore by calling up( ). This API increments the count, checks whether there are any pending processes in the wait list, and if so wakes one of them up and reschedules it for execution.

Spin Locks: 

Spin locks are normally used in interrupt handlers and other situations where sleeping is not allowed. With a spin lock, if a process tries to acquire the lock and fails, rather than sleeping it enters a tight loop, continuously checking for the lock to become available. Note that this kind of busy-waiting can hang a uniprocessor system (the spinning task may prevent the lock holder from ever running), and hence it is helpful only on a multiprocessor system.

A spin lock is preferred over a semaphore in situations where we are not allowed to sleep, as well as situations where it is more efficient to simply loop waiting for the lock than to bear the overhead of moving the process onto a wait list, rescheduling, and bringing it back later, as a semaphore does.


UNDERSTANDING THE MACRO __INIT AND __EXIT IN LINUX KERNEL

The __init and __exit macros are a frequent source of confusion as to how they can help with kernel memory management.

Recently, I tried to explain this on a mailing list, where a user wanted to know why these macros are effective only when the code is built as part of the kernel, and how freeing memory for a built-in module is truly important compared to releasing memory in the case of a Loadable Kernel Module (LKM).

Here is an excerpt of the same discussion:


Hi Amar,

On Thu, Aug 16, 2012 at 1:08 PM, Amarnath Revanna
<amarnath.revanna@gmail.com> wrote:

>
> On the other hand, any other kernel module that you load using insmod or
> modprobe comes after this stage, wherein the kernel was already booted, and
> hence, no memory area of __init will ever be freed.
>
Modules are loaded with vmalloc, right?

Could you explain why the kernel can't free those __init symbols
from memory also in this case?

Thanks,
Ezequiel.
Hi Ezequiel,
When we look at the definition of __init & __initdata in http://lxr.free-electrons.com/source/include/linux/init.h#L44, we can notice that the functions marked __init and any data marked __initdata are going to be placed in a separate section of the final kernel binary image (zImage/uImage/vmlinux) by the linker.

This section is going to be called the .init section.

The idea behind forming this separate .init section in the final kernel image is to hold together all those functions and data structures that are required only once, during initialization.
By the time the kernel boots up, it has already used all these resources once during the bootup sequence, and hence they can now be released from memory. As a result, the kernel simply discards the entire .init section from RAM in one go, thereby freeing the memory. The amount of memory freed by removing this section is printed in the line:


" [1.011596] Freeing unused kernel memory: 664k freed "

Now, loadable modules, as you rightly said, are loaded into kernel memory obtained from the heap using vmalloc. The interesting thing is that, since only one module is loaded within a given vmalloc'd area, we can normally expect the size of its __initdata and __init functions to be pretty small, a few bytes. It then becomes rather difficult for the kernel to manage (keep track of and free) these small memory areas coming from every individually loaded module.

Another thing to add: when freeing the entire .init section from RAM, we recover a _contiguous_ physical memory area, the size of the whole .init section, back to the kernel. However, in the case of a Loadable Kernel Module (LKM), if we tried to free the __init memory that was allocated using vmalloc, we might only be freeing up a few bytes of memory that were merely virtually contiguous. This may not be of much significance for kernel operation compared to the overhead involved in keeping track of and freeing all these __init areas in the vmalloc space.

In short, it's a nice trade-off: the __init cleanup is skipped for LKMs, while keeping its significance for all built-in drivers/modules.

Regards,
-Amar
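To make the discussion concrete, here is a minimal module sketch using these macros (a kernel-space fragment; the names demo/bootvals are hypothetical). When built into the kernel, the marked function and data land in the .init.text/.init.data sections that get discarded after boot:

```c
#include <linux/init.h>
#include <linux/module.h>

/* placed in .init.data: discarded after boot when built-in */
static int bootvals[] __initdata = { 1, 2, 3 };

/* placed in .init.text: runs once at boot (or load), then is discarded
   when built-in */
static int __init demo_init(void)
{
	printk(KERN_INFO "demo: sum at init = %d\n",
	       bootvals[0] + bootvals[1] + bootvals[2]);
	return 0;
}

/* placed in .exit.text: for built-in code this can be dropped entirely,
   since a built-in module is never unloaded */
static void __exit demo_exit(void)
{
	printk(KERN_INFO "demo: unloaded\n");
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```

After demo_init() returns, neither it nor bootvals may be referenced again, which is exactly why __init code must never be called from the normal runtime paths of the driver.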