strugee.net

Revisiting my Tor relay

(Okay, so I miserably failed my blog-every-day thing. Shut up. Maybe next time I'll try every week or something... anyway.)

A couple of days ago I logged into the Tor relay I run to show someone the ARM graphs. I had a fair amount of traffic, so the graphs were fairly impressive, but I'm also in the habit of running apt-get update; apt-get upgrade every time I log into a server, so I did that too. To my surprise, I got a message telling me that there was a dependency problem with my kernel! So like the great sysadmin I am, I looked at such a fundamental system problem, shrugged my shoulders, and said, "oh, I should probably fix that". And then logged out.

Well, I did end up fixing it today. And boy, was it an adventure. My first step was to ignore the APT problems and edit my torrc to reflect a) the fact that I'm not eligible for the AWS Free Tier anymore (so I needed to throttle bandwidth), b) my new email, and c) my new GPG key. With that done, I knew that I could easily have the system fix dependency problems by doing a simple apt-get install -f. Easy!

Well, no. That tried to install some Linux kernel headers, which seemed all well and good, until I got this:

Unpacking linux-headers-3.2.0-90 (from .../linux-headers-3.2.0-90_3.2.0-90.128_all.deb) ...
dpkg: error processing /var/cache/apt/archives/linux-headers-3.2.0-90_3.2.0-90.128_all.deb (--unpack):
unable to create `/usr/src/linux-headers-3.2.0-90/arch/arm/plat-pxa/include/plat/dma.h.dpkg-new' (while processing `./usr/src/linux-headers-3.2.0-90/arch/arm/plat-pxa/include/plat/dma.h'): No space left on device
No apport report written because the error message indicates a disk full error
dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)

Um, what? How am I out of free space? Okay, whatever. I knew that there were probably a lot of packages cached in /var/cache/apt/, including old, vulnerable packages that had been replaced by the unattended upgrades system. I did an ls, and found only about five .deb files - something must have been automatically cleaning that directory. I was getting a little worried now, but I nuked the files anyway and reran apt-get install -f. Same thing. Well, okay, maybe I didn't get rid of enough stuff? How much did I need?

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      4.0G  2.2G  1.6G  59% /

At this point I was in full-on "something-is-seriously-wrong-and-I-need-to-recover" mode. How was it possible that I had only used 59% of the filesystem, but dpkg was saying my disk was full? A little internet searching later, I found the culprit:

$ df -i
Filesystem     Inodes  IUsed IFree IUse% Mounted on
/dev/xvda1     262144 257479  4665   99% /
udev            74758    377 74381    1% /dev
tmpfs           76179    259 75920    1% /run
none            76179      3 76176    1% /run/lock
none            76179      1 76178    1% /run/shm

I hadn't run out of disk space. But I had run out of inodes. (Isn't this supposed to happen to other people?)
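Once df -i points the finger at inodes, the next question is where they all went. Here is a minimal sketch of one way to find out, assuming a POSIX shell with find, sort, and tr available. The function name inode_usage is made up for illustration. It counts filesystem entries under each immediate subdirectory, since every file, directory, and symlink costs one inode. Hard-linked files get counted once per name and hidden directories are skipped, so treat the numbers as a rough guide.

```shell
#!/bin/sh
# inode_usage (hypothetical helper): print an entry count for each immediate
# subdirectory of $1 (default /), largest first. Each entry uses one inode,
# so the biggest numbers show where the inodes are going.
inode_usage() {
    dir=${1:-/}
    dir=${dir%/}                      # "/" becomes "", so the glob is /*/
    for sub in "$dir"/*/; do
        [ -d "$sub" ] || continue     # glob matched nothing, or not a directory
        sub=${sub%/}
        # -xdev keeps find from wandering onto other mounted filesystems
        count=$(find "$sub" -xdev | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$count" "$sub"
    done | sort -rn
}
```

Running something like inode_usage /usr and then drilling into whichever directory tops the list narrows things down quickly. On a box full of kernel header packages, /usr/src is a likely suspect, since each of those packages unpacks thousands of tiny files.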

I tried removing some stuff via APT, but that refused to do anything due to the dependency problems. My next thought was that there were probably a bunch of old processes running that were essentially holding a bunch of inodes hostage. I couldn't install debian-goodies, so I couldn't use checkrestart, but I improvised with a for loop that restarted every running service.
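The "processes holding inodes hostage" theory can be checked directly without installing anything. What follows is a rough sketch, assuming a Linux-style /proc where each /proc/PID/fd/N is a symlink and the kernel appends " (deleted)" to the target of a file that has been unlinked but is still open. The name held_deleted and the optional directory argument are invented for this example.

```shell
#!/bin/sh
# held_deleted (hypothetical helper): list PIDs that still hold deleted files
# open. $1 is the proc directory to scan (default /proc); being able to point
# it elsewhere is only there to make the function easy to try out.
held_deleted() {
    proc=${1:-/proc}
    for fd in "$proc"/[0-9]*/fd/*; do
        # Skip descriptors we cannot read (other users' processes, races)
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#"$proc"/}    # strip the leading proc directory
                pid=${pid%%/*}        # keep only the PID component
                printf '%s\t%s\n' "$pid" "$target"
                ;;
        esac
    done
}
```

Run as root, held_deleted prints one line per held file. A short list means restarting services won't free many inodes, which matches what happened here.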

Still nothing.

I'm not proud of what I did next. But I was backed into a corner, so I did something only dpkg is supposed to do. I ran rm -r on a couple of directories in /usr/src. And boy, it was like magic. Suddenly apt-get install -f worked like a charm. It started to upgrade a couple of packages, rebuilding some GRUB configuration files... and then came to a screeching halt.
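For contrast, here is a sketch of the more polite route, for when the package manager is still cooperating: work out which versioned linux-headers packages are no longer wanted and hand that list to the package manager instead of rm. The function name stale_headers is hypothetical, and the set of kernel releases to keep is something you have to supply yourself. On a machine with a long uptime the running kernel and the newest installed kernel are different, and you want to keep both.

```shell
#!/bin/sh
# stale_headers (hypothetical helper): read package names on stdin and print
# the versioned linux-headers packages that match none of the kernel releases
# given as arguments (e.g. "$(uname -r)" plus the newest installed release).
stale_headers() {
    while read -r pkg; do
        case $pkg in
            linux-headers-[0-9]*) ;;  # versioned header package: check it
            *) continue ;;            # metapackage or unrelated: never list
        esac
        keep=no
        for rel in "$@"; do
            base=${rel%-*}            # 3.2.0-90-virtual -> 3.2.0-90
            if [ "$pkg" = "linux-headers-$rel" ] ||
               [ "$pkg" = "linux-headers-$base" ]; then
                keep=yes
            fi
        done
        [ "$keep" = yes ] || printf '%s\n' "$pkg"
    done
}

# Possible usage (prints candidates only; read the list before purging):
#   dpkg-query -W -f='${Status} ${Package}\n' 'linux-headers-*' |
#       sed -n 's/^install ok installed //p' |
#       stale_headers "$(uname -r)" 3.2.0-90-virtual
```

The resulting names could then be passed to apt-get purge, or to dpkg --purge if APT is sulking about dependencies. The function itself deletes nothing, which is the whole point.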

Setting up linux-headers-3.2.0-90-virtual (3.2.0-90.128) ...
dpkg: dependency problems prevent configuration of linux-headers-virtual:
linux-headers-virtual depends on linux-headers-3.2.0-68-virtual; however:
Package linux-headers-3.2.0-68-virtual is not installed.
dpkg: error processing linux-headers-virtual (--configure):
dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
dpkg: dependency problems prevent configuration of linux-virtual:
linux-virtual depends on linux-headers-virtual (= 3.2.0.68.81); however:
Package linux-headers-virtual is not configured yet.
dpkg: error processing linux-virtual (--configure):
dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
Errors were encountered while processing:
linux-headers-virtual
linux-virtual
E: Sub-process /usr/bin/dpkg returned an error code (1)

Are you kidding?? More errors?

Turns out that APT is essentially the only thing on this system that makes large changes to the filesystem. So the probability that APT would be the program to trigger the inode limit was pretty high. It started an upgrade run, then got interrupted in the middle by the "no space left on device" error, leaving the dependency tree in a state that we in the tech community call "100% totally screwed". (This is the technical term.)

I'll spare you the gory details, but I ended up chasing down packages in the Ubuntu archive, running ubuntu-support-status (I was wondering whether the packages I wanted were missing from the archive because they were unsupported), switching from apt-get to aptitude (because aptitude's dependency resolver tends to be better), etc. In the end, the solution turned out to be running dpkg --install on the exact right .debs in the exact right order. That finally satisfied APT's dependency woes, allowed apt-get install -f to fix the configuration problems, and let the hundreds of packages that had been waiting for an upgrade finally install. Whew!

Anyway, I need to upgrade the version of Ubuntu the system is on (currently it's 12.04.5 LTS), because Tor is out of date (among other reasons). However, since that will involve taking the system down for a reboot, I wanted to memorialize the following:

$ uptime
00:01:47 up 392 days, 17:15,  1 user,  load average: 0.05, 0.04, 0.05

Holy moly. This system is bordering on 400 days of uptime. That's over a year of continuous run time! Astonishing.

Wish me luck with this upgrade...

tl;dr: inode limits are killer.

