strace

January 15, 2009
Tags: ,

I’m getting lot of posts in tonight so that I won’t feel so bad when I wait till the weekend to write more. :-) Besides, this topic is important enough that I just want to get it out there.

Strace is one of the single most important troubleshooting tools you will ever use. And I say that without a trace of hyperbole. A few years ago I had a client who had been doing UNIX Administration for a very long time – longer, I think, than I had. However, it seemed like every time he called me with a problem that he couldn’t solve, I’d have it fixed in a few minutes. Finally, with amazement in his voice, he asked me what my secret was. And I showed him how to use strace.

Strace is, simply put, a way to “spy” on programs to see how they are interacting with the kernel. They do this by attaching to the process using a special debug interface, and watching which system calls are called (and with which arguments), and what the return codes are. In many cases, if you know which system call is returning an error, and which arguments it is passed, figuring out what needs to be done to fix the problem becomes almost trivial.

Let’s do an example.

Here’s an example of an strace.

-bash-3.2$ touch blah/blah
touch: cannot touch `blah/blah’: No such file or directory
-bash-3.2$ strace touch blah/blah
execve(“/bin/touch”, ["touch", "blah/blah"], [/* 21 vars */]) = 0
brk(0) = 0×9f3a000
access(“/etc/ld.so.preload”, R_OK) = -1 ENOENT (No such file or directory)
open(“/etc/ld.so.cache”, O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=44525, …}) = 0
mmap2(NULL, 44525, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f0f000
close(3) = 0
open(“/lib/librt.so.1″, O_RDONLY) = 3
read(3, “\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\2008\252\0004\0\0\0″…, 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=44060, …}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f0e000
mmap2(0xaa2000, 33324, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xaa2000
mmap2(0xaa9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0×6) = 0xaa9000
close(3) = 0
open(“/lib/libc.so.6″, O_RDONLY) = 3
read(3, “\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000?\213\0004\0\0\0″…, 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1602320, …}) = 0
mmap2(0×89e000, 1320356, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0×89e000
mmap2(0×9db000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0×13d) = 0×9db000
mmap2(0×9de000, 9636, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0×9de000
close(3) = 0
open(“/lib/libpthread.so.0″, O_RDONLY) = 3
read(3, “\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\360g\241\0004\0\0\0″…, 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=125744, …}) = 0
mmap2(0xa12000, 90592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xa12000
mmap2(0xa25000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0×12) = 0xa25000
mmap2(0xa27000, 4576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xa27000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f0d000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7f0d6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xaa9000, 4096, PROT_READ) = 0
mprotect(0×9db000, 8192, PROT_READ) = 0
mprotect(0xa25000, 4096, PROT_READ) = 0
mprotect(0×895000, 4096, PROT_READ) = 0
munmap(0xb7f0f000, 44525) = 0
set_tid_address(0xb7f0d708) = 31920
set_robust_list(0xb7f0d710, 0xc) = 0
rt_sigaction(SIGRTMIN, {0xa163d0, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0xa162e0, [], SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
uname({sys=”Linux”, node=”katesama.duskglow.com”, …}) = 0
brk(0) = 0×9f3a000
brk(0×9f5b000) = 0×9f5b000
open(“/usr/lib/locale/locale-archive”, O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=56471120, …}) = 0
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7d0d000
close(3) = 0
close(0) = 0
open(“blah/blah”, O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK|O_LARGEFILE, 0666) = -1 ENOENT (No such file or directory)
futimesat(AT_FDCWD, “blah/blah”, NULL) = -1 ENOENT (No such file or directory)
open(“/usr/share/locale/locale.alias”, O_RDONLY) = 0
fstat64(0, {st_mode=S_IFREG|0644, st_size=2528, …}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f19000
read(0, “# Locale name alias data base.\n#”…, 4096) = 2528
read(0, “”, 4096) = 0
close(0) = 0
munmap(0xb7f19000, 4096) = 0
open(“/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
open(“/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
open(“/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
open(“/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
open(“/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
open(“/usr/share/locale/en/LC_MESSAGES/coreutils.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, “touch: “, 7touch: ) = 7
write(2, “cannot touch `blah/blah\’”, 24cannot touch `blah/blah’) = 24
open(“/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
open(“/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
open(“/usr/share/locale/en_US/LC_MESSAGES/libc.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
open(“/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
open(“/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
open(“/usr/share/locale/en/LC_MESSAGES/libc.mo”, O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, “: No such file or directory”, 27: No such file or directory) = 27
write(2, “\n”, 1
) = 1
close(1) = 0
exit_group(1) = ?
-bash-3.2$

I’ve bolded the important part. Note the open() system call – where it tried to open blah/blah (the directory blah doesn’t exist, so it fails) and the return code – ENOENT. ENOENT is a system error code (you can find a nearly comprehensive list of them in /usr/include/asm-generic/errno-base.h and /usr/include/asm-generic/errno.h (or you can do a man errno if you don’t care what the actual numbers are)). It means, literally, “No such file or directory” (that code gets translated into something human readable in the “touch” program.)

But you are literally seeing what you’ve asked the kernel to do, how it responded, and why it said it couldn’t do it.

Now for my example, that’s pretty simple. But what about programs that don’t log well, or at all? What about programs that spit out generic errors that you don’t understand and mean nothing to you? That’s where strace becomes valuable.

Strace has a few useful options.

-f tells strace to follow forks. Many processes will spawn children – by default strace won’t attach to them. It will with these options.

-s <num> tells strace to print out more (or less, but you probably won’t use it that way) information than it does normally.

-e <system call> allows you to specify one or more system calls that you want strace to only print, ignoring the rest. You can see how verbose strace is, so this will help you in some cases to pare through the garbage to find what you need.

Strace is not a panacea, though. There are two circumstances where it’s not advised. The first is when you try to run an suid program. Strace, for security reasons, doesn’t like it when you do that. The other, is when the program you are running is time critical. Strace allows most programs to run at an acceptable speed, but it does noticably slow them down.

Also, sometimes when you quit strace, a program will continue to think it’s in attached state and halt entirely. You can fix this by killing the program with the -CONT signal.

This is one tool you simply cannot afford to leave out of your bag of tricks. Strace. Don’t have root without it.

  • Share/Bookmark

One Response to “strace”

  1. Thanks. Nice intro for me.

Leave a Reply