Monday, December 23, 2013

Exploiting the /proc filesystem

It has been quite a while since I wrote my last article. After pondering what to write for a couple of days, I decided to share a few /proc filesystem tricks. Here are some real-life scenarios that illustrate how you can exploit this filesystem:

Environment Variables

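Let's start with a simple example. You have a Java process that is misbehaving and you would like to make sure that the JAVA_HOME environment variable in the process context points to the correct JDK installed on the system. Just cat the /proc/[pid]/environ file to view the environment variables for the process. The entries are separated by NUL bytes rather than newlines, so pipe the output through "tr '\0' '\n'" to get one variable per line. Keep in mind that the file shows the environment the process was started with, not changes the process has made to it since.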
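If you would rather script this check, here is a minimal Python sketch. It assumes only that the environ file is a run of NAME=value entries separated by NUL bytes; the function names parse_environ and read_environ are my own, chosen for illustration.

```python
from pathlib import Path
from typing import Dict


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Split the NUL-separated contents of an environ file into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty chunk after the trailing NUL
        # Split on the first '=' only; values may contain '=' themselves.
        name, _, value = entry.decode("utf-8", errors="replace").partition("=")
        env[name] = value
    return env


def read_environ(pid) -> Dict[str, str]:
    """Read the start-up environment of a process (needs permission to read it)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())
```

Calling read_environ(pid).get("JAVA_HOME") then answers the question above; it returns None when the variable was not set at start-up.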

File Descriptors

To get the list of file descriptors opened by a process, run "ls -l" on /proc/[pid]/fd. This directory lists the file descriptors as symbolic links that point either to files or to sockets identified by their inode numbers. This is a particularly useful trick when you are wading through a long strace output and want to know what the file descriptor in a system call actually refers to. If the file descriptor is a socket, you can gather even more information by grepping for the inode number in /proc/[pid]/net/tcp, /proc/[pid]/net/udp, or /proc/[pid]/net/unix, depending on the socket type.
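The lookup described above can be sketched in Python as follows. The helper names are hypothetical, and the parser assumes the usual column layout of the tcp and udp tables, where the inode is the tenth whitespace-separated field and ports are written in hexadecimal. The unix table uses a different layout and is not handled here.

```python
import os
import re
from typing import Dict, Optional

SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def list_fds(pid, proc_root: str = "/proc") -> Dict[int, str]:
    """Map each open descriptor number to the target of its symbolic link."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    fds = {}
    for name in os.listdir(fd_dir):
        try:
            fds[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were looking
    return fds


def socket_inode(link_target: str) -> Optional[int]:
    """Return the inode from a 'socket:[12345]' link target, else None."""
    match = SOCKET_LINK.match(link_target)
    return int(match.group(1)) if match else None


def find_socket(table_text: str, inode: int) -> Optional[Dict[str, object]]:
    """Find the row for an inode in the text of a net/tcp or net/udp table."""
    for line in table_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) > 9 and fields[9] == str(inode):
            return {
                "local_port": int(fields[1].rsplit(":", 1)[1], 16),
                "remote_port": int(fields[2].rsplit(":", 1)[1], 16),
                "state": fields[3],  # hex code, e.g. 0A means LISTEN for TCP
            }
    return None
```

A typical use is to take the descriptor number from the strace line, look it up in list_fds(pid), and, if socket_inode returns a number, pass the contents of /proc/[pid]/net/tcp to find_socket.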

Swap Space

You are dealing with a server where a process has started consuming large amounts of memory and the kernel has swapped some data out of memory. This is not a big deal for Linux as long as the swapping is not continuous; however, it is a problem from a monitoring perspective because Linux is not going to move that data back into memory until something accesses it, so the swap usage stays high. One trick is to use the swapoff command to force the data back into memory, followed by a swapon to re-enable the swap space. If there is not enough free memory to hold the swapped data, on the other hand, you first have to drop the page cache by running "echo 1 > /proc/sys/vm/drop_caches" (ideally after a sync), and then do the swapoff and the subsequent swapon to free up the swap space.
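Before running swapoff it is worth checking whether the swapped data will actually fit. Here is a rough Python sketch that reads the figures from /proc/meminfo. The function names are mine, and the fallback of MemFree plus Cached for kernels without a MemAvailable field (it appeared in 3.14) is only a crude estimate, so treat the answer as a hint rather than a guarantee.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo field name to its numeric value (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue
        values[name.strip()] = int(parts[0])
    return values


def swap_fits_in_memory(meminfo: Dict[str, int]) -> bool:
    """Estimate whether the data currently in swap would fit back into RAM."""
    swap_used = meminfo["SwapTotal"] - meminfo["SwapFree"]
    # Assumption: without MemAvailable, free memory plus page cache is used
    # as a rough stand-in for what could be made available.
    fallback = meminfo["MemFree"] + meminfo.get("Cached", 0)
    available = meminfo.get("MemAvailable", fallback)
    return swap_used <= available


def check_host() -> bool:
    """Run the estimate against the live /proc/meminfo of this machine."""
    with open("/proc/meminfo") as fh:
        return swap_fits_in_memory(parse_meminfo(fh.read()))
```

If check_host() returns False, dropping the page cache first, as described above, is the safer route.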

Connection Tracking

If you ever run into "nf_conntrack: table full, dropping packet" errors, you are using iptables connection tracking and the kernel is hitting its limit on the number of connections that can be tracked. This limit is controlled by the net.netfilter.nf_conntrack_max sysctl, which is 65536 on many systems (the actual default is derived from the amount of memory). You can either increase this value or cat the /proc/net/nf_conntrack file to figure out what type of connection makes up the majority of tracked connections and update your iptables rules accordingly to not track them - see the NOTRACK target in the raw table in the iptables manual.
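With tens of thousands of entries, eyeballing the conntrack file is not practical, so here is a small Python sketch that tallies entries by protocol and destination port. It assumes the common line layout, where the third field is the protocol name and the first dport= token belongs to the original direction of the connection. The function name is hypothetical.

```python
from collections import Counter


def tally_conntrack(text: str) -> Counter:
    """Count tracked connections per (protocol, original destination port)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        protocol = fields[2]  # e.g. tcp, udp, icmp
        # Entries without ports (such as icmp) are grouped under "-".
        dport = next(
            (f.split("=", 1)[1] for f in fields if f.startswith("dport=")),
            "-",
        )
        counts[(protocol, dport)] += 1
    return counts
```

Feeding it the contents of /proc/net/nf_conntrack (reading the file usually requires root) and printing counts.most_common(5) shows which traffic is the best candidate for a NOTRACK rule.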

Child Threads

You have a MySQL instance running with a very large connection limit and you have a slave I/O thread that is misbehaving. The MySQL instance is lagging behind, yet there seems to be no apparent problem. The next idea you come up with is to strace the I/O thread; however, the "show processlist" output shows connection ids, not Linux thread ids. Luckily, there is a solution. If you have persistent database connections, stop the I/O thread and start it again. Then run "ls -al" in /proc/[pid]/task, which will show you one directory per thread, named after the thread id, along with a date. Because you have restarted your I/O thread, the most recently created entry should be the I/O thread. You can also verify this by comparing the task date to the "Time" column for the I/O thread in the "show processlist" output. If you don't have persistent connections, identifying the correct thread is a bit trickier, because new connection threads keep appearing. Simply restart the I/O and SQL threads at different times, wait for a while, and then identify the correct thread by comparing the "Time" column from the "show processlist" output again.
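Here is a Python sketch of the same hunt for the newest thread. Note that it swaps in a different signal from the directory dates shown by ls: it reads the starttime field (field 22, in clock ticks since boot) from each /proc/[pid]/task/[tid]/stat file, which records when the thread was actually started. The function names are my own.

```python
import os
from typing import List, Tuple


def parse_starttime(stat_text: str) -> int:
    """Return field 22 (starttime, clock ticks since boot) of a stat line."""
    # Field 2 is the command name in parentheses and may contain spaces or
    # parentheses, so split on the last ')' before splitting on whitespace.
    after_comm = stat_text.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 22 sits at index 19.
    return int(after_comm[19])


def tasks_newest_first(pid, proc_root: str = "/proc") -> List[Tuple[int, int]]:
    """Return (starttime, thread id) pairs, most recently started first."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    tasks = []
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as fh:
                tasks.append((parse_starttime(fh.read()), int(tid)))
        except (OSError, ValueError, IndexError):
            continue  # thread exited or the line was not in the usual shape
    return sorted(tasks, reverse=True)
```

After restarting the I/O thread, the first thread id returned by tasks_newest_first(pid) for the mysqld pid is the one to hand to "strace -p".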

Process Limits

Every process has to run within certain limits enforced by the kernel. One of these limits is the maximum number of processes, which on Linux also counts threads, that the user owning the process may run. For example, when we started running into MySQL connection errors, the underlying problem turned out to be the connection limit (each MySQL connection is served by a thread) being higher than what the kernel allowed for. These limits can be different for each process and can be viewed by catting the /proc/[pid]/limits file. And the best part? On kernels that make this file writable (some distribution kernels do; on mainline kernels it is read-only), you can change the limits dynamically by echoing to this file instead of restarting that production MySQL instance. Elsewhere, the prlimit command from util-linux can adjust the limits of a running process.
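To compare a limit against your MySQL settings from a script, the limits file can be parsed with a few lines of Python. This sketch assumes the usual layout, a header line followed by rows whose columns are separated by two or more spaces. The function name is hypothetical.

```python
import re
from typing import Dict, Tuple


def parse_limits(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each limit name to its (soft, hard) values as strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Limit names contain single spaces, so split on runs of 2+ spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


def read_limits(pid) -> Dict[str, Tuple[str, str]]:
    """Read and parse the limits of a running process."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_limits(fh.read())
```

For the scenario above, read_limits(pid)["Max processes"] gives the soft and hard values to compare with the MySQL connection limit. The values are kept as strings because a limit may read "unlimited".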
