Manav's Tech Notes: Linux Kernel Developer FAQs

1. What is a semaphore?

Semaphores in Linux are sleeping locks. When a task attempts to acquire a semaphore that is already held, the semaphore places the task onto a wait queue and puts the task to sleep. The processor is then free to execute other code. When a task holding the semaphore releases it, one of the tasks on the wait queue is awakened so that it can then acquire the semaphore. The semaphore methods are:

sema_init(struct semaphore *, int)
Initializes the dynamically created semaphore to the given count

init_MUTEX(struct semaphore *)
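Initializes the dynamically created semaphore with a count of one (removed from later kernels; use sema_init(sem, 1) instead)

init_MUTEX_LOCKED(struct semaphore *)
Initializes the dynamically created semaphore with a count of zero, so it is initially locked (also removed from later kernels; use sema_init(sem, 0) instead)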

down_interruptible(struct semaphore *)
Tries to acquire the given semaphore and enter interruptible sleep if it is contended; returns -EINTR if a signal interrupts the sleep

down(struct semaphore *)
Tries to acquire the given semaphore and enter uninterruptible sleep if it is contended

down_trylock(struct semaphore *)
Tries to acquire the given semaphore and immediately return nonzero if it is contended

up(struct semaphore *)
Releases the given semaphore and wakes a waiting task, if any
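The kernel's struct semaphore cannot be exercised outside the kernel, so the following is only a user-space analog, assuming POSIX threads: a counting semaphore built from a pthread mutex and a condition variable. All names (sketch_sem, sketch_down, and so on) are hypothetical. The condition variable plays the role of the wait queue; sketch_down behaves like down(), sketch_down_trylock like down_trylock(), and sketch_up like up(). No interruptible variant is modeled.

```c
#include <pthread.h>
#include <stddef.h>

/* User-space sketch of a counting semaphore (not the kernel's struct semaphore) */
struct sketch_sem {
    pthread_mutex_t lock;   /* protects count */
    pthread_cond_t  wait;   /* stands in for the kernel's wait queue */
    int             count;  /* how many more tasks may acquire it */
};

/* Like sema_init(): set the initial count */
void sketch_sema_init(struct sketch_sem *s, int count)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->wait, NULL);
    s->count = count;
}

/* Like down(): sleep until the count is positive, then take one unit */
void sketch_down(struct sketch_sem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)                       /* re-check after every wakeup */
        pthread_cond_wait(&s->wait, &s->lock);  /* sleep on the "wait queue" */
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Like down_trylock(): return 0 on success, nonzero if contended */
int sketch_down_trylock(struct sketch_sem *s)
{
    int ret = 1;

    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        ret = 0;
    }
    pthread_mutex_unlock(&s->lock);
    return ret;
}

/* Like up(): release one unit and wake one sleeping waiter, if any */
void sketch_up(struct sketch_sem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->wait);
    pthread_mutex_unlock(&s->lock);
}
```

Initializing with a count of 1 gives the binary, mutex-like behavior that init_MUTEX provided, and a count of 0 gives the initially locked behavior of init_MUTEX_LOCKED. The while loop (rather than an if) matters because condition variables permit spurious wakeups.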

2. What is the difference between semaphore and mutex?

A mutex can be released only by the thread that acquired it, whereas a semaphore can be signaled from any other thread (or process). This makes semaphores better suited to some synchronization problems, such as producer-consumer.

A mutex is similar in principle to a binary semaphore, with one significant difference: ownership. Ownership is the simple concept that when a task locks (acquires) a mutex, only that task can unlock (release) it. If a task tries to unlock a mutex it hasn’t locked (and thus doesn’t own), an error condition is raised and, most importantly, the mutex is not unlocked. If the mutual exclusion object doesn't have ownership, then regardless of what it is called, it is not a mutex.
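Ownership can be observed from user space with a POSIX error-checking mutex. In this sketch, assuming pthreads on Linux, a second thread tries to unlock a mutex that the first thread holds. The attempt fails with EPERM and, as described above, the mutex stays locked, which is why the owner's own unlock afterwards succeeds. The helper names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

/* Thread body: try to release a mutex this thread never locked */
static void *unlock_from_non_owner(void *arg)
{
    pthread_mutex_t *m = arg;
    int err = pthread_mutex_unlock(m);   /* EPERM for an error-checking mutex */
    return (void *)(intptr_t)err;
}

/* Create a mutex that reports ownership violations instead of ignoring them */
int init_checked_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);

    if (err)
        return err;
    err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!err)
        err = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return err;
}

/* Return the error code a second thread gets when it tries to unlock m */
int unlock_error_from_other_thread(pthread_mutex_t *m)
{
    pthread_t t;
    void *ret;

    if (pthread_create(&t, NULL, unlock_from_non_owner, m) != 0)
        return -1;
    pthread_join(t, &ret);
    return (int)(intptr_t)ret;
}
```

A semaphore has no such check: sem_post() from any thread simply increments the count, which is exactly the property that makes it usable for signaling between a producer and a consumer.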

3. Discuss the TCP connection and termination sequence.

To establish a connection, TCP uses a three-way handshake. Before a client attempts to connect with a server, the server must first bind to and listen on a port to open it up for connections; this is called a passive open. Once the passive open is established, a client may initiate an active open. The three-way (or 3-step) handshake then proceeds as follows:

SYN: The active open is performed by the client sending a SYN to the server. It sets the segment's sequence number to a random value A.
SYN-ACK: In response, the server replies with a SYN-ACK. The acknowledgment number is set to one more than the received sequence number (A + 1), and the sequence number that the server chooses for the packet is another random number, B.
ACK: Finally, the client sends an ACK back to the server. The sequence number is set to the received acknowledgment value, i.e. A + 1, and the acknowledgment number is set to one more than the received sequence number, i.e. B + 1.

At this point, both the client and server have received an acknowledgment of the connection.
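The sequence and acknowledgment arithmetic can be written out directly. This sketch uses a hypothetical, pared-down segment structure (not the real TCP header layout) and relies on uint32_t arithmetic wrapping modulo 2^32, just as TCP sequence numbers do.

```c
#include <stdint.h>

/* Hypothetical simplified segment: only the fields the handshake touches */
struct seg {
    uint32_t seq;
    uint32_t ack;
    int syn;      /* SYN flag */
    int ackflag;  /* ACK flag */
};

/* Fill out[0..2] with SYN, SYN-ACK and ACK for initial sequence numbers a and b */
void tcp_handshake(uint32_t a, uint32_t b, struct seg out[3])
{
    /* SYN: the client picks its initial sequence number a */
    out[0] = (struct seg){ .seq = a, .ack = 0, .syn = 1, .ackflag = 0 };
    /* SYN-ACK: the server picks b and acknowledges a + 1 (a SYN consumes one number) */
    out[1] = (struct seg){ .seq = b, .ack = a + 1, .syn = 1, .ackflag = 1 };
    /* ACK: the client's next sequence number is a + 1, and it acknowledges b + 1 */
    out[2] = (struct seg){ .seq = a + 1, .ack = b + 1, .syn = 0, .ackflag = 1 };
}
```

Calling tcp_handshake(0xFFFFFFFF, 1000, s) shows the wraparound: the server acknowledges 0 and the client's final ACK carries sequence number 0.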

The connection termination phase uses at most a four-way handshake, with each side of the connection terminating independently. When an endpoint wishes to stop its half of the connection, it transmits a FIN segment, which the other end acknowledges with an ACK. Therefore, a typical teardown requires a pair of FIN and ACK segments from each TCP endpoint; when the second endpoint sends its FIN together with its ACK, only three segments are needed.
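The same bookkeeping applies to the teardown: a FIN consumes one sequence number, so it is acknowledged with its sequence number plus one. The sketch below, again with a hypothetical simplified segment type, lists the four segments when endpoint A closes first; a and b are each side's next sequence number at the moment of the close.

```c
#include <stdint.h>

/* Hypothetical simplified segment for the teardown */
struct fin_seg {
    char from;     /* 'A' or 'B' */
    int fin;       /* FIN flag; every segment here also carries an ACK */
    uint32_t seq;
    uint32_t ack;
};

/* Four-segment close initiated by A */
void tcp_teardown(uint32_t a, uint32_t b, struct fin_seg out[4])
{
    out[0] = (struct fin_seg){ 'A', 1, a,     b     };  /* A: FIN */
    out[1] = (struct fin_seg){ 'B', 0, b,     a + 1 };  /* B: ACK of A's FIN */
    out[2] = (struct fin_seg){ 'B', 1, b,     a + 1 };  /* B: FIN, once B is done sending */
    out[3] = (struct fin_seg){ 'A', 0, a + 1, b + 1 };  /* A: ACK of B's FIN */
}
```

Between out[1] and out[2] the connection is half-closed: B may still send data to A. If B has nothing left to send, out[1] and out[2] can travel as a single FIN+ACK segment.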

4. What is the relationship between RTP, RTCP and RTSP?

RTP is a transport protocol for the delivery of real-time data, including streaming audio and video. RTCP is part of RTP and helps with lip synchronization and QoS management, among other things. RTSP is a control protocol for initiating and directing delivery of streaming multimedia from media servers, the "Internet VCR remote control protocol".

RTSP does not deliver data, though the RTSP connection may be used to tunnel RTP traffic for ease of use with firewalls and other network devices. RTP and RTSP will likely be used together in many systems, but either protocol can be used without the other. The RTSP specification contains a section on the use of RTP with RTSP.

RTSP does control/signalling and is _not_ a transport protocol. The streams must be transported using another protocol, such as RTP or HTTP. RTSP is commonly used to help negotiate an audio/video setup, deciding which transport protocol, bit rate, and so on are to be used.
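Since RTP is the layer that actually carries the media, it helps to see what every RTP packet starts with. The sketch below parses the 12-byte fixed header defined in RFC 3550 (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC). The struct and function names are hypothetical, and header extensions and padding are not processed. The timestamp and SSRC are what RTCP sender reports refer to when they pair an RTP timestamp with wall-clock time, which is the basis for lip synchronization.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the 12-byte RTP fixed header (RFC 3550, section 5.1) */
struct rtp_hdr {
    unsigned version;       /* 2 bits, always 2 */
    unsigned padding;       /* 1 bit */
    unsigned extension;     /* 1 bit */
    unsigned csrc_count;    /* 4 bits: number of CSRC ids after the fixed header */
    unsigned marker;        /* 1 bit, meaning defined by the payload profile */
    unsigned payload_type;  /* 7 bits */
    uint16_t seq;           /* increments by one per packet */
    uint32_t timestamp;     /* sampling instant of the first payload octet */
    uint32_t ssrc;          /* synchronization source identifier */
};

/* Parse the fixed header (network byte order).
   Returns 0 on success, -1 if the buffer is too short or the version is not 2. */
int rtp_parse(const uint8_t *p, size_t len, struct rtp_hdr *h)
{
    if (len < 12)
        return -1;
    h->version      = p[0] >> 6;
    h->padding      = (p[0] >> 5) & 1;
    h->extension    = (p[0] >> 4) & 1;
    h->csrc_count   = p[0] & 0x0F;
    h->marker       = p[1] >> 7;
    h->payload_type = p[1] & 0x7F;
    h->seq          = (uint16_t)((p[2] << 8) | p[3]);
    h->timestamp    = ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
                      ((uint32_t)p[6] << 8) | p[7];
    h->ssrc         = ((uint32_t)p[8] << 24) | ((uint32_t)p[9] << 16) |
                      ((uint32_t)p[10] << 8) | p[11];
    if (h->version != 2 || len < 12 + 4u * h->csrc_count)
        return -1;
    return 0;
}
```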

5. How does a Linux process internally switch from User mode to Kernel mode?

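Nearly all system calls are invoked from C programs by calling a library procedure. The library procedure executes a trap instruction to switch from user mode to kernel mode and begin execution in the kernel. On x86, one mechanism for generating such a software interrupt is the INT instruction: it interrupts the current program execution, saves the state needed to return (at least the instruction pointer, code segment, and flags), and then jumps to a specific interrupt handler. Linux itself has traditionally used int $0x80 on 32-bit x86 and uses the dedicated syscall instruction on x86-64; the walkthrough below follows the example kernel's own convention.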
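The library-procedure layer is easy to see on an actual Linux system. This sketch, assuming Linux with glibc, obtains the process ID twice: once through the ordinary getpid() wrapper and once through glibc's generic syscall() function, which takes the raw system-call number (SYS_getpid) and executes the architecture's trap instruction itself. The two helper names are illustrative.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The usual route: a C library wrapper that performs the trap for us */
long pid_via_wrapper(void)
{
    return (long)getpid();
}

/* The raw route: pass the syscall number explicitly.
   The value of SYS_getpid differs between architectures. */
long pid_via_raw_syscall(void)
{
    return syscall(SYS_getpid);
}
```

Both calls end up in the same kernel entry point; the wrapper merely hides the syscall number and the register setup.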

After the handler has finished, the saved registers are restored and execution of the calling program resumes. The INT instruction thus acts as a kind of alternative calling technique. Unlike ordinary procedure calls, which pass their arguments on the stack, an interrupt takes no arguments of its own, so any needed arguments must be loaded into general registers before the INT instruction is executed.

The register assignments for the syscall handler are as follows:

eax -- syscall #
ecx -- number of args (0-16)
edx -- pointer to buffer containing args from first to last

After these registers have been set, interrupt 99 is called. What is the significance of the value 99? None really -- this is simply the interrupt number selected by the kernel for handling syscalls.

Each syscall has a user-side entry point defined by a small assembly language function, so the syscall interface is an assembly file (syscalls.S) containing a long list of functions, one for each defined syscall. It looks like this:

.globl sys_null
.type sys_null,@function
.align 8
sys_null:
 movl  $0, %eax        # syscall #0
 movl  $0, %ecx        # no args
 lea   4(%esp), %edx   # pointer to arg list (just above the return address)
 int   $99             # invoke syscall handler
 ret                   # return; the result is already in eax/edx
.globl sys_mount
.type sys_mount,@function
.align 8
sys_mount:
 movl  $1, %eax        # syscall #1
 movl  $4, %ecx        # mount takes 4 args
 lea   4(%esp), %edx   # pointer to arg list (just above the return address)
 int   $99             # invoke syscall handler
 ret                   # return; the result is already in eax/edx
. . .

The assignment of system services to syscall numbers is arbitrary. That is, it doesn't really matter which function is syscall #0, syscall #1, syscall #2, etc., so long as everyone agrees on the mapping. This mapping is defined in the syscalls.S assembly listing above and must be matched item-for-item in the C interface header file. For our kernel, the C header is ksyscalls.h, which uses an enum to define tags for each syscall:

enum {
    SYSCALL_NULL = 0,
    SYSCALL_MOUNT,
    SYSCALL_UNMOUNT,
    SYSCALL_SYNC,
    SYSCALL_OPEN,
    . . .
};

Exception Types:

fault - the return address points to the instruction that caused the exception. The exception handler may fix the problem and then restart the program, making it look like nothing has happened (example: a page fault).
trap - the return address points to the instruction after the one that has just completed (example: the INT3 breakpoint; software interrupts raised with INT n, such as the syscall interrupt, return the same way).
abort - the return address is not always reliably supplied. A program which causes an abort is never meant to be continued (examples: double fault, machine check).
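The fault case, where the handler repairs the problem and the faulting instruction is simply re-executed, can be observed from user space on Linux. In this sketch (all names hypothetical), a store to a read-only page raises SIGSEGV; the handler makes the page writable and returns, and because a page fault is a fault-class exception, the CPU retries the same store, which now succeeds. mprotect() is not on POSIX's list of async-signal-safe functions, so treat this as a Linux-specific demonstration rather than portable practice.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *page;                     /* the page we protect and later repair */
static long page_size;
static volatile sig_atomic_t fixups;   /* number of faults the handler repaired */

/* SIGSEGV handler: fix the cause of the fault, then return so it is retried */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t start = (uintptr_t)page;

    (void)sig;
    (void)ctx;
    if (addr < start || addr >= start + (uintptr_t)page_size)
        _exit(1);                      /* a genuine crash elsewhere: give up */
    mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE);
    fixups++;
}

/* Store value into a read-only page and read it back.
   Returns the value read, or -1 if setup fails. */
int write_through_fault(int value)
{
    struct sigaction sa;
    volatile int *p;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    page_size = sysconf(_SC_PAGESIZE);
    page = mmap(NULL, (size_t)page_size, PROT_READ,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    p = (volatile int *)page;
    *p = value;       /* faults once; the handler fixes it; the store is retried */
    result = *p;
    munmap(page, (size_t)page_size);
    return result;
}
```

Had the exception been a trap, returning from the handler would have skipped the store instead of retrying it.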

The 256 exception handlers that are loaded into the IDT are almost identical. After pushing the specific interrupt number, they all implement the same code sequence:

save all registers (including system registers)
call i386_handle_trap
restore all registers previously saved
return

Because of this, the assembly file that defines these handlers, arch_interrupts.S, is also written largely as a collection of #define macros. The function i386_handle_trap() serves as the master exception handler. As such, it handles all system interrupts, not just syscalls. However, we're interested specifically in the section that deals with interrupt 99, the syscall handler.

Syscalls are handled in the case for interrupt number 99. Again, there's no particular significance to the number 99: Intel reserves vectors 0-31 for processor exceptions, and the OS is free to use vectors 32-255 for whatever purpose it likes.
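The "many near-identical stubs funneling into one master handler" structure can be modeled in ordinary C. This user-space sketch is only an analogy, not the example kernel's code: the TRAP_STUB macro stamps out one tiny function per vector that records its vector number and calls a shared handle_trap(), which then branches on the vector. The frame structure, counters, and names are hypothetical; real stubs are assembly that also save and restore every register around the call.

```c
#include <stdint.h>

#define SYSCALL_VECTOR 99   /* the vector this kernel chose; any of 32-255 would do */

/* Hypothetical saved-register frame; real stubs fill it in assembly */
struct frame {
    uint32_t vector;          /* pushed by each per-vector stub */
    uint32_t eax, ecx, edx;   /* syscall number, arg count, arg pointer */
};

/* Counters so the example can show where each vector was routed */
static int exceptions_seen, syscalls_seen, irqs_seen;

/* Master handler: every stub ends up here, then we branch on the vector */
void handle_trap(struct frame *f)
{
    if (f->vector < 32)
        exceptions_seen++;    /* 0-31: processor exceptions */
    else if (f->vector == SYSCALL_VECTOR)
        syscalls_seen++;      /* a real kernel would validate args and dispatch here */
    else
        irqs_seen++;          /* everything else: device interrupts and the like */
}

/* One generated stub per vector: record the number, then share the common path */
#define TRAP_STUB(n)                                                      \
    void trap_stub_##n(struct frame *f)                                   \
    {                                                                     \
        f->vector = (n);  /* "push the specific interrupt number" */      \
        handle_trap(f);   /* real stubs save/restore registers around this */ \
    }

TRAP_STUB(14)   /* page fault */
TRAP_STUB(32)   /* first vector available to the OS */
TRAP_STUB(99)   /* system call */
```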

The highlights of the code are:

thread_atkernel_entry() is called upon entering kernel mode
the number of args (in ecx) and the argv address (in edx) are checked for validity (e.g. a kernel address is bad since syscalls are only intended for user apps)
user_memcpy is called to copy the args from the user stack to kernel memory
if all went well, the syscall dispatcher is called, passing the syscall # (stored in eax)
a 64-bit result (or error code) is returned in the [eax,edx] pair
thread_atkernel_exit() is called as kernel mode is exited
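The validity check on ecx and edx can be sketched as follows, assuming a hypothetical 32-bit layout in which user space lies below KERNEL_BASE (0x80000000 here; the real split depends on the kernel). The function name is illustrative. Note the subtraction-based bound: computing argv + 4 * argc directly could wrap past 2^32 and make a range that reaches into the kernel look harmless.

```c
#include <stdint.h>

#define MAX_SYSCALL_ARGS 16        /* the 0-16 limit on ecx */
#define KERNEL_BASE 0x80000000u    /* hypothetical user/kernel split */

/* Return nonzero if the argument block [argv, argv + 4*argc) lies
   entirely in user space, zero otherwise. */
int syscall_args_ok(uint32_t argc, uint32_t argv)
{
    uint32_t bytes;

    if (argc > MAX_SYSCALL_ARGS)
        return 0;                  /* too many arguments */
    bytes = argc * 4;              /* at most 64, so no overflow */
    if (argv >= KERNEL_BASE)
        return 0;                  /* starts in kernel space */
    if (bytes > KERNEL_BASE - argv)
        return 0;                  /* would run into kernel space */
    return 1;
}
```

Passing this check does not mean the pages are mapped; that is why the copy itself still goes through user_memcpy, which must cope with a fault on a bad user address.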

The routine syscall_dispatcher() is a core kernel function that finally binds the syscall numbers to their corresponding internal implementations. Here is a snippet of the syscalls.c file that contains the dispatcher:

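/* assumed helper macros (not part of the original snippet): each argument
   occupies one 32-bit slot of the buffer copied from the user stack */
#define arg0 (((uint32 *)arg_buffer)[0])
#define arg1 (((uint32 *)arg_buffer)[1])
#define arg2 (((uint32 *)arg_buffer)[2])
#define arg3 (((uint32 *)arg_buffer)[3])

int syscall_dispatcher(unsigned long call_num, void *arg_buffer, uint64 *call_ret)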
{
    switch(call_num) {
        case SYSCALL_NULL:
            *call_ret = 0;
            break;
        case SYSCALL_MOUNT:
            *call_ret = user_mount((const char *)arg0, (const char *)arg1,
                                   (const char *)arg2, (void *)arg3);
            break;
        case SYSCALL_UNMOUNT:
            *call_ret = user_unmount((const char *)arg0);
            break;
        case SYSCALL_SYNC:
            *call_ret = user_sync();
            break;
  
        . . .
  
    }
    return INT_RESCHEDULE;
}
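To see the whole dispatch step run, here is a user-space mock with the same shape as syscall_dispatcher(): it switches on the call number, unpacks arguments from the copied buffer, and writes a 64-bit result. SYSCALL_ADD, SYSCALL_STRLEN, k_add, ERR_BAD_SYSCALL, and mock_dispatcher are hypothetical, and the default case is an assumption about how an unknown number might be rejected.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical syscall numbers; only SYSCALL_NULL appears in the text */
enum { SYSCALL_NULL = 0, SYSCALL_ADD, SYSCALL_STRLEN };

#define ERR_BAD_SYSCALL (-1)   /* hypothetical error value */

/* Hypothetical kernel-side implementation */
static int64_t k_add(int64_t a, int64_t b)
{
    return a + b;
}

/* Number and argument buffer in, 64-bit result out.
   Each argument is one pointer-sized slot, as if copied from the user stack.
   Returns 0, or -1 for an unknown call number. */
int mock_dispatcher(unsigned long call_num, const uintptr_t *args, int64_t *call_ret)
{
    switch (call_num) {
    case SYSCALL_NULL:
        *call_ret = 0;
        break;
    case SYSCALL_ADD:
        *call_ret = k_add((int64_t)(intptr_t)args[0], (int64_t)(intptr_t)args[1]);
        break;
    case SYSCALL_STRLEN:
        *call_ret = (int64_t)strlen((const char *)args[0]);
        break;
    default:
        *call_ret = ERR_BAD_SYSCALL;   /* never trust a number from user space */
        return -1;
    }
    return 0;
}
```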

7. Describe ways to reduce interrupt latency.

Sound programming techniques, coupled with a proper RTOS interrupt architecture, can minimize interrupt response time. The recipe:

1. Keep ISRs simple and short.
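2. Avoid disabling interrupts; keep any unavoidable interrupts-off sections as short as possible.
3. Avoid instructions that increase latency.
4. Avoid improper use of operating system API calls in ISRs (for example, calls that can block).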
5. Properly prioritize interrupts relative to threads.

8. What is top-half and bottom-half processing?

Often a substantial amount of work must be done in response to a device interrupt, but interrupt handlers need to finish up quickly and not keep interrupts blocked for long. Linux (along with many other systems) resolves this problem by splitting the interrupt handler into two halves.

The so-called top half is the routine that actually responds to the interrupt: the one you register with request_irq. The bottom half is a routine that is scheduled by the top half to be executed later, at a safer time.

The big difference between the top-half handler and the bottom half is that all interrupts are enabled during execution of the bottom half; that is why it is said to run at a safer time. In the typical scenario, the top half saves device data to a device-specific buffer, schedules its bottom half, and exits: this operation is very fast.

The bottom half then performs whatever other work is required, such as awakening processes, starting up another I/O operation, and so on. This setup permits the top half to service a new interrupt while the bottom half is still working.
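The "save data to a buffer, let the bottom half process it" split can be sketched with a single-producer, single-consumer ring buffer, since the top half only ever adds entries and the bottom half only ever removes them. This is a user-space sketch using C11 atomics; inside the kernel one would use kernel primitives instead (Linux provides kfifo for this kind of buffer). The structure and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16   /* must be a power of two so index wraparound stays consistent */

/* The top half only writes head and the bottom half only writes tail,
   so the two sides need no lock between them. */
struct ring {
    uint8_t buf[RING_SIZE];
    atomic_uint head;   /* next slot the producer (top half) fills */
    atomic_uint tail;   /* next slot the consumer (bottom half) drains */
};

void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Top half: store one byte of device data; drop it if the buffer is full */
bool top_half_put(struct ring *r, uint8_t byte)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;                      /* full: stay short rather than wait */
    r->buf[head % RING_SIZE] = byte;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Bottom half: drain up to max queued bytes into out; returns how many */
unsigned bottom_half_drain(struct ring *r, uint8_t *out, unsigned max)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    unsigned n = 0;

    while (tail != head && n < max) {
        out[n++] = r->buf[tail % RING_SIZE];
        tail++;
    }
    atomic_store_explicit(&r->tail, tail, memory_order_release);
    return n;
}
```

Because the top half never waits on the bottom half, a new interrupt can be serviced while the bottom half is still draining earlier data.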

9. What are the different mechanisms for implementing bottom halves?

The Linux kernel has two different mechanisms that may be used to implement bottom-half processing. Tasklets are often the preferred mechanism for bottom-half processing; they are very fast, but all tasklet code must be atomic. The alternative to tasklets is workqueues, which may have higher latency but are allowed to sleep.

A tasklet can be declared statically with the DECLARE_TASKLET macro (tasklet_init() initializes one at run time instead):

DECLARE_TASKLET(name, function, data);

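name is the name to be given to the tasklet, function is the function that is called to execute the tasklet (it takes one unsigned long argument and returns void), and data is an unsigned long value to be passed to the tasklet function. This is the classic API; newer kernels changed it so that the callback receives a pointer to its struct tasklet_struct instead of an unsigned long.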

The function tasklet_schedule is used to schedule a tasklet for running.

Workqueues invoke a function at some future time in the context of a special worker process. Since the workqueue function runs in process context, it can sleep if need be. You cannot, however, copy data into user space from a workqueue unless you use more advanced techniques, because the worker process does not have access to any other process's address space.

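A work_struct structure is declared and initialized as follows. Since Linux 2.6.20, INIT_WORK takes two arguments and the work function receives a pointer to the work_struct itself; older kernels used a three-argument form with a void * data pointer. The work function name below is illustrative:

static struct work_struct short_wq;

/* work function: runs later in the worker's process context, so it may sleep */
static void short_do_work(struct work_struct *work);

    /* this line is in short_init() */
    INIT_WORK(&short_wq, short_do_work);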

Call schedule_work to arrange the bottom-half processing.

/* Queue the bh. Don't worry about multiple enqueueing */
    schedule_work(&short_wq);
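The "don't worry about multiple enqueueing" comment relies on schedule_work() returning false, and doing nothing, when the work item is already pending. The toy below models that pending-flag behavior in plain single-threaded C; struct toy_work and its functions are hypothetical stand-ins, not the kernel's implementation, which uses atomic bit operations and real worker threads. tasklet_schedule() has a similar property: a tasklet that is already scheduled but has not yet run will run only once.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct work_struct: a callback plus a "pending" flag */
struct toy_work {
    void (*func)(struct toy_work *);
    bool pending;
    struct toy_work *next;
};

/* Toy stand-in for the worker's queue */
static struct toy_work *queue_head, *queue_tail;

/* Like schedule_work(): returns false if the item is already queued */
bool toy_schedule_work(struct toy_work *w)
{
    if (w->pending)
        return false;               /* already queued: nothing to do */
    w->pending = true;
    w->next = NULL;
    if (queue_tail)
        queue_tail->next = w;
    else
        queue_head = w;
    queue_tail = w;
    return true;
}

/* What the worker does: pop each item, clear pending, then run it.
   Clearing pending first lets the callback (or a later interrupt) requeue it. */
void toy_run_worker(void)
{
    while (queue_head) {
        struct toy_work *w = queue_head;

        queue_head = w->next;
        if (!queue_head)
            queue_tail = NULL;
        w->pending = false;
        w->func(w);
    }
}
```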


Thursday, October 28, 2010
