Thursday, May 27, 2010

How do you find out which CPU a process is running on?

The output of the ps command can be changed to a user-defined format using the -o option.

The following command displays each process along with the processor it is currently assigned to.

# ps -eo pid,args,psr

Output columns:

pid - the process ID of the process.

args - the command with all its arguments, as a string.

psr - the processor the process is currently assigned to.

See the ps(1) man page for more information on the ps command.
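For example, to check a single process rather than listing all of them, something like the following should work (PID 1234 is just a placeholder here):

# ps -o pid,args,psr -p 1234

Keep in mind that the psr value can change between invocations, since the scheduler is free to migrate the process unless its CPU affinity has been restricted.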

What is the logic behind killing processes during an Out of Memory situation?

According to the kernel source code, the OOM-killer logic is as follows:

A function called badness() calculates a score, in points, for each process.

* Points are added for the following processes:

Processes with a large memory footprint.
Niced processes.

* Points are reduced for the following processes:

Processes that have been running for a long time.
Processes started by the superuser.
Processes with direct hardware access.

The process with the highest number of points is killed, unless it is already in the midst of freeing up memory on its own.

The system then waits for some time to see whether enough memory has been freed. If killing one process does not free enough memory, the steps above are repeated.

As per the select_bad_process() function, a process with a score of 0 or less is not eligible to be killed. OOM kills continue until there are no candidate processes left to kill; if the system cannot find any candidate process, it panics. The badness() function, from mm/oom_kill.c in the 2.6 kernel, is shown below.

static unsigned long badness(struct task_struct *p, unsigned long uptime)
{
        unsigned long points, cpu_time, run_time, s;

        if (!p->mm)
                return 0;

        if (p->flags & PF_MEMDIE)
                return 0;

        /*
         * The memory size of the process is the basis for the badness.
         */
        points = p->mm->total_vm;

        /*
         * CPU time is in tens of seconds and run time is in thousands
         * of seconds. There is no particular reason for this other than
         * that it turned out to work very well in practice.
         */
        cpu_time = (p->utime + p->stime) >> (SHIFT_HZ + 3);

        if (uptime >= p->start_time.tv_sec)
                run_time = (uptime - p->start_time.tv_sec) >> 10;
        else
                run_time = 0;

        s = int_sqrt(cpu_time);
        if (s)
                points /= s;
        s = int_sqrt(int_sqrt(run_time));
        if (s)
                points /= s;

        /*
         * Niced processes are most likely less important, so double
         * their badness points.
         */
        if (task_nice(p) > 0)
                points *= 2;

        /*
         * Superuser processes are usually more important, so we make it
         * less likely that we kill those.
         */
        if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_ADMIN) ||
            p->uid == 0 || p->euid == 0)
                points /= 4;

        /*
         * We don't want to kill a process with direct hardware access.
         * Not only could that mess up the hardware, but usually users
         * tend to only have this flag set on applications they think
         * of as important.
         */
        if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO))
                points /= 4;
#ifdef DEBUG
        printk(KERN_DEBUG "OOMkill: task %d (%s) got %d points\n",
               p->pid, p->comm, points);
#endif
        return points;
}
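On reasonably recent 2.6 kernels (RHEL 5, for example), the result of roughly this calculation is exposed through /proc, so you can see how the OOM killer currently rates a process. For example (PID 1234 is again just a placeholder):

# cat /proc/1234/oom_score

The score can also be biased per process by writing to /proc/<pid>/oom_adj on those kernels.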

What is kipmi? Why is it taking too much CPU in my Red Hat Enterprise Linux system?

If your IPMI hardware interface does not support interrupts and is a KCS or SMIC interface, the IPMI driver will start a kernel thread for the interface to help speed things up. This is a low-priority kernel thread that constantly polls the IPMI driver while an IPMI operation is in progress.

kipmi is that low-priority kernel thread. If the system performs a large number of IPMI operations, kipmi can end up consuming a noticeable amount of CPU time.
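You can confirm that this thread is what is eating the CPU, for example with a one-shot top (the thread usually shows up as kipmid in the process list):

# top -b -n 1 | grep -i kipmi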

The kipmi thread is a workaround for a hardware limitation, so there is no real fix; occasionally a firmware update resolves the issue.

As a workaround, you can disable kipmi using the following steps. Note that this may slow down IPMI operations.
* Edit /etc/modprobe.conf and add the following entry.

options ipmi_si force_kipmid=0

* Restart the ipmi service using the following command.

# service ipmi restart
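Once the driver has been reloaded with the new option, you can check that the polling thread is no longer running, for example:

# ps -ef | grep -i kipmi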