strugee.net

Posts from March 2017

Getting on board with configuration management

For a long while I've really disliked configuration management. This mostly stemmed from my experience managing Apache via Puppet, which I found indirect and unnecessary - the only reason I did this was basically to get version control. In fact, I even started a project called bindslash which I literally described as "not configuration management".

However, last Thursday, steevie (my primary server) crashed again. So I logged into the fallback DigitalOcean VM I'd set up the last time this happened and updated everything. I presented my LibrePlanet slides from that VM. And eventually I bit the bullet and set up a secondary email server which, to my great surprise, has not received a flood of spam yet (though I'm sure it will at some point).

The whole ordeal really made me understand the benefit of configuration management. I would've spent less time and been less stressed if I could have just plugged in a config management system and gotten a working failover. So as of today, I'm on board with configuration management, and bindslash is dead.

I still kinda hate Puppet, so I think I'll try out Ansible and maybe Chef. Ansible's agentless model in particular probably makes a lot of sense for my needs. It also makes me sad to kill bindslash, since I still think it would be a useful project and there's definitely a place for it in the world. But I no longer have any reason to work on it, so I'm just going to stop pretending I'll ever finish it. If anyone is interested in that approach, talk to me and I'll happily give you the name, the repo, my thoughts on its design, etc.
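To make this concrete, here's roughly the kind of thing I mean - a minimal Ansible playbook (the host group, package, and paths are made up for illustration) that declares what a mail server should look like and applies it over plain SSH, with no agent on the server required:

  # playbook.yml - apply with: ansible-playbook -i inventory playbook.yml
  # Ansible pushes changes over plain SSH; nothing extra runs on the server.
  - hosts: mailservers
    become: true
    tasks:
      - name: Install Postfix
        apt:
          name: postfix
          state: present
      - name: Deploy main.cf from a template
        template:
          src: templates/main.cf.j2
          dest: /etc/postfix/main.cf
        notify: restart postfix
    handlers:
      - name: restart postfix
        service:
          name: postfix
          state: restarted

Run that against a fresh VM and you get the same mail server every time - which is exactly the failover story I was missing last Thursday.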

Anyway. Now to set up outbound mail on the failover VM.

*big sigh*


RC week 11

This is week 11 of being at the Recurse Center.

Day 7

Arrived ~13:30, departed ~22:45, total time at RC 9h15m.

Brought in cookies and spent a lot of time discussing things on Zulip, honestly. Also figured out when I'm teaching things for Web Dev 101. Spent most of the day, however, writing about default-secure systems.

Day 8

Arrived ~16:15, departed ~21:00, total time at RC 4h45m.

Arrived super late today because I woke up in the morning, saw the blizzard outside my window, and said "hell no." When I did go in, the blizzard was still going, so I put on snow pants, snow boots, a thick sweater and my winter jacket, and my hat and gloves. Which took a while.

Spent some time discussing 10 different reasons not to put arsenic in the milk with Jackie and Andrew, which Jackie later wrote a blog post about here. Spent the rest of the day doing maintenance on some Stratic modules. Also spent a little time on the phone with my dad diagnosing my server's disk problems. I had him reseat the SATA cables, but with no effect, unfortunately.

Day 9

Arrived ~12:15, departed ~23:25, total time at RC 11h10m.

Didn't do a lot of new stuff, but took care of some miscellaneous business. In particular I did some work on polishing hubot-seen, including putting out a 1.0.0 release (and then a 1.0.1 release when that turned out to have broken the world). In the evening, finally found the bug preventing us from upgrading pump.io to Express 4.x, which was unbelievably satisfying. Turned out that the bug was in some test code, not the application itself, and so I had been misreading the stack trace for months. (The top item was a test file, but I incorrectly assumed that was just some test code invoking a bunch of internal stuff. Nope.) Also had a meeting with some of the other Winter 1s about feelings and the end of the batch and stuff.

Last but not least, caught up on weekly blogging. (I've been really bad about that for a while now...)

Day 10

Arrived ~11:45, departed ~00:00, total time at RC 12h15m.

Did a ton of pump.io work. I merged the Express 4.x branch, then went in and did a bunch of other more minor dependency upgrades. Also went to a soldering workshop run by Claire immediately before presenting HTTPS Part 2 in the afternoon.

In the evening, went to presentations and then game night, where I spent the entire time playing poker. I put in $3 and got back $7.95 (read: $8). Sick.

Friday

Arrived ~13:30, departed ~21:45, total time at RC 8h15m. As always, Friday doesn't count as a day because RC is technically not in session.

Did some work on improving pump.io's dependency situation. The result is a system that is very, very close to being 100% up-to-date, which, to be honest, I'm super proud of. That also let me turn on dependency security monitoring through Node Security Platform, with plans to enable Greenkeeper as well. Had the monthly meeting in the middle of all that, which went really well and was very productive.

In the evening, went to Bottle Share, then put pump.io on the Linux Foundation's best practices badge app, the result of which you can find here.

Executive summary

Well, I got a lot done on pump.io (and Stratic) this week. So it was relatively productive. But I did basically nothing that was personally productive (i.e. that pushed me forward as a programmer). Hopefully next week will go better.

Total time at RC 45 hours 40 minutes; cumulative time 510 hours 30 minutes.


Express 4.x in pump.io core

So I thought I'd take a moment to announce that the upgrade from Express 2.x to Express 4.x is finally complete! I fixed up the last couple test failures last Wednesday, and the branch got merged on Thursday.

A long time coming

Believe it or not, the work to do this upgrade started almost an entire year ago. Express 2.x has been outdated and unmaintained for a long time now, so upgrading has been a high priority. However, it wasn't as simple as adjusting a version number - there were a staggering number of changes that needed to be made due to Express deprecating, removing, and changing things around. One of the most significant problems was the fact that the old template system that we used, utml, was not compatible with Express 3.x and above. That meant that we had to rewrite every single template into a modern language - an effort that resulted in over a thousand lines changed!

However, the time for Express 4.x has finally arrived. With that and some other trivial version bumps, I'm proud to announce that pump.io is fully up-to-date in terms of dependencies, with only three non-critical exceptions. Whooooo!

Immediate benefits

There are a lot of reasons this is immediately awesome:

  1. Express 4.x fixes significant performance problems that existed in Express 3.x
  2. Relatedly, Express 4.x fixes some security problems present in 3.x
  3. The fact that our dependencies are finally up-to-date means that we can (and do!) now make use of Greenkeeper and the Node Security Platform to automatically track dependencies to make sure they're up-to-date and not introducing security vulnerabilities

That last one is particularly significant. Greenkeeper and NSP will continuously monitor the project's dependencies and automate away a lot of the pain that's associated with keeping pump.io up-to-date. Everyone will get a more secure and stable codebase because of this setup.
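(If you want to try NSP yourself, it can also be run by hand against a checkout - this assumes a global install, and is the standard CLI usage as far as I know:)

  # one-off audit of the dependency tree against the NSP advisory database
  $ npm install -g nsp
  $ nsp check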

Looking forward

The Express 4.x upgrade is a big change, and it's definitely possible that stuff has broken. We want to make sure that breakage doesn't make it into production. This change went into pump.io 4.0, which will go through our normal release cycle, meaning it'll be in beta for a month before being released. As a part of that, Jason Self - who's kind enough to administer Datamost - has agreed to have a test day where Datamost upgrades to the beta for a day, then downgrades again. This test day will give us much wider exposure than we would've gotten otherwise, and the feedback will be incredibly valuable in the effort to identify and fix regressions. We haven't set a date yet, but if you'd like to join Jason in helping us find bugs, please get in touch with the community. We'd love your help.

Beyond the immediate release, though, there are still things to look forward to. Express 4.x gives us a better way to structure routing code, and a refactor to use this structure is planned. There's a lot of room for improvement. But really, the most important benefit is this: technical debt is a far less pressing issue than before. That means we can shift focus and spend more time fixing user-facing bugs, adding useful features, and generally improving the experience for our users. I couldn't be more excited.


RC week 10

This is week 10 of being at the Recurse Center.

Day 35

Arrived ~13:00, departed ~21:30, total time at RC 8h30m.

Decided (in the morning) to sleep in because honestly, I was just really behind on sleep.

Didn't do a whole lot today. Took care of the monthly pump.io release, but spent most of the day writing Driftless at 1,000 mph. In the evening (i.e. after the Iron Blogger meeting) I spent some time fixing ejabberd's configuration to use the new access control syntax (I'd rewritten the config a while back, but hadn't deployed it yet because it broke logins).
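(For reference, the new-style syntax looks roughly like this - a simplified sketch, not my actual config; the admin JID is made up:)

  # ejabberd.yml - new-style ACLs and access rules (sketch)
  acl:
    admin:
      user:
        - "admin@example.com"
  access_rules:
    c2s:
      - deny: blocked
      - allow
    configure:
      - allow: admin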

Day 36

Arrived ~10:40, departed ~23:40, total time at RC 13h0m.

Spent basically all day working on realtime.recurse.com. I (mostly) finished up the bits that watch the filesystem and dispatch events (including the "periodic" submission logic), then started in on an automatic update mechanism. I'm pretty pleased with how it's turning out - I think it's pretty elegant. And, it's secure - updates are required to be cryptographically signed by yours truly. Went out to Black Burger with a bunch of people before going to Fat Cat in the evening. Then came back, worked on the updater a little more, and went home.

Day 37

Arrived ~14:40, departed ~23:20, total time at RC 8h40m.

Overslept by accident this morning. Spent a bit of time in the afternoon dealing with email, then focused on realtime.recurse.com - basically I was just working on the autoupdater I started yesterday. My Python is definitely improving!

I'm actually really pleased with the updater. It's pretty elegant, I think. Basically whenever the server sees a request coming from the client, it checks the User-Agent header to see if the client's out of date and, if so, sends back an X-Requires-Upgrade header. Upon receiving this header the client will go fetch version information, which it'll use to download and verify an update bundle cryptographically signed by me. Yay for secure updates, and yay for simplicity! (Note that this design basically just reuses the connections the client is already making to the server, so it doesn't have to poll for updates all the time.)
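In rough Python, the client side of that design looks something like this. To be clear, this is an illustrative sketch, not the actual realtime.recurse.com code - the endpoints, bundle format, and helper functions are all made up; only the X-Requires-Upgrade header and the User-Agent check come from the design above:

  # Sketch of the client-side update check - endpoints/helpers hypothetical.
  import requests

  VERSION = "1.2.0"
  SERVER = "https://realtime.recurse.com"

  def submit(payload):
      # Every normal request doubles as an update check: the server
      # compares the version in our User-Agent against the latest release.
      resp = requests.post(
          SERVER + "/events",
          json=payload,
          headers={"User-Agent": "realtime-client/" + VERSION},
      )
      if "X-Requires-Upgrade" in resp.headers:
          upgrade()
      return resp

  def upgrade():
      # Fetch version metadata, then download the bundle and check its
      # signature against a key pinned in the client before installing.
      meta = requests.get(SERVER + "/client/latest.json").json()
      bundle = requests.get(meta["bundle_url"]).content
      if not verify_signature(bundle, meta["signature"]):
          raise RuntimeError("bad signature on update bundle")
      install(bundle)

  def verify_signature(bundle, signature):
      # Placeholder - the real design verifies a signature made by me.
      raise NotImplementedError

  def install(bundle):
      raise NotImplementedError

The nice property is visible right in submit(): there's no polling loop anywhere, because the update check piggybacks on traffic the client is already sending.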

I also spent a couple hours talking with Mikhail, discussing a lot of things - ranging from how Node.js's event loop works, to the is keyword in Python, to how static site generator architecture compares to dynamic site architecture.

Day 38

Arrived ~9:50, departed ~23:00, total time at RC 13h10m.

Woke up, completely naturally, around 7 AM despite going to sleep at 3 AM. This was so surprising - and this is a true story - that I thought I had woken up at 7 PM and missed the entire day, including Security Club, Abstract Salad Factory, and Thursday night presentations. I was really mad, honestly. But then I looked at my watch, which is on 24-hour time, and it didn't say "19:00"; my alarms were still in the future; my phone (also on 24-hour time) agreed with my watch; and Anja on Zulip said "?" when I said I'd slept through Security Club. Even with all that evidence, though, I still had a weird feeling that it was 7 PM. So that's the story of how, for a good 5 or 10 minutes, I genuinely believed I'd slept through the entire day.

Once I got to RC, I spent the morning finally(!) merging a bunch of upstream ejabberd.yml config changes into steevie's ejabberd configuration, which got me closer to fixing the awful spam problem I have. Then I went to Abstract Salad Factory, followed by Security Club. Then in the afternoon (and after presentations) I started reading FreeBSD documentation, since that's what I'm running my new Tor relay on - as I discovered a couple days ago, my old relay apparently hung during boot and was consuming 100% CPU, due to the kernel image being corrupted or something. I chose FreeBSD because a) it's a rock-solid system, b) it's a good opportunity to gain experience administering a BSD, and c) it increases the diversity of the Tor network. Also, had a conference call in the afternoon with the EFF and Paul from TA3M Seattle about TA3M Seattle joining the EFF's Electronic Frontier Alliance, which was exciting for everyone.
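(Speaking of the relay: Tor relay configuration is pleasantly small. A minimal non-exit relay torrc looks something like this - the nickname and contact are made up; on FreeBSD the file lives at /usr/local/etc/tor/torrc:)

  # minimal non-exit Tor relay
  Nickname examplerelay
  ContactInfo tor-admin@example.com
  ORPort 9001
  ExitRelay 0
  SocksPort 0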

Friday

Arrived ~13:00, departed ~22:30, total time at RC 9h30m. As always, Friday doesn't count as a day because RC is technically not in session.

Didn't do a whole lot of coding. Spent a while helping Jason debug Datamost's 3.0.0 upgrade (which apparently broke uploads). Attended presentations for the RC Game Jam, then fixed the documentation that caused Jason's problems. Spent a little bit of time polishing the website and README, too.

In the evening, fixed people being banned from ejabberd MUCs, then proceeded to fix my spam problem. Whoooooo! Then I kept working on my Tor relay.

Executive summary

Like any week, this had moments where I wasn't very productive. But overall I think it was pretty good - I made a lot of progress on realtime.recurse.com (and improved my Python in the process), and I made a lot of progress on setting up my Tor relay again (and learned a bunch about FreeBSD in the process). Also, I fixed my ejabberd spam problem. I learned nothing from that, but thank god I did it because the spam problem was honestly awful. The one issue was that I just didn't do a very good job getting up and making it to checkins.

Total time at RC 52 hours 50 minutes; cumulative time 464 hours 50 minutes.


Default-secure systems

So recently I presented on operational security and then started in on the nightmare that is HTTPS deployment. And like I did with language-level security (I still need to write part 2 of that post), I thought to myself: this is so difficult. Why isn't there something that will do this for me? That something is what my latest project aims to be.

Here's the tl;dr:

  type(app)
  => Django/Express.js/etc. app
  secure_system(app)
  => Docker image

Or in other words, you'll be able to take an existing web app that you've written, run it through this system, and it will spit out a complete, reasonably-secure system image.

Let's step back.

The status quo

Currently, when a developer wants to run a web app, they can either use something like Heroku, which is fully managed, or a VM from DigitalOcean or Amazon EC2 or something. There are a variety of reasons you might not want to use Heroku, but the only other option is a VM - and with a VM, you get a bare system where you have to set up everything from scratch. Lots of developers just don't have the operational experience to do this properly or securely, but it's not like they can go and get an operations team to do it for them. So they end up with systems that may have active security problems as well as little to no defense-in-depth mitigations for when security inevitably fails. Security is just another operational concern the developer has no time and no expertise to deal with, so it just doesn't happen. The developer spins up a VM, gets it to where it "works" and then moves on. This is not good enough.

I don't want to create a false dichotomy, though: this is not the developer's fault. Everyone has conflicting priorities, and it's unreasonable to expect the developer to spend lots of time learning to administer systems so that they can then spend even more time, you know, administering systems. The problem is that there just aren't enough options available - we have to provide something better.

A middle ground

This is what my project is about: creating a middle ground between fully-managed deployment platforms and barebones, setup-from-scratch VMs.

This project rests on the idea that operational security (at least, in a single-server, single-admin context) flows from consistency, least privilege, and proactive, defense-in-depth security measures. Here are the core design goals:

  1. Meet developers where they are. Configuration management like Puppet is a great way to enforce consistency, but it adds a level of indirection and is just another thing that people running hobbyist projects don't have time to learn.
  2. Tight integration with apps - this excludes more obscure types of web applications, but gives us a better footing to set up a solid deployment environment. It also may let us integrate more tightly with things like Content Security Policy in the future.
  3. Support virtual hosting. The ability to run multiple apps while paying for a single VM is a compelling reason people go with VMs over e.g. Heroku - we won't be helping anyone if we leave this out.
  4. Upgrades are optional. Any system image created by this project will present a system that is organized and can be maintained and modified by hand without breaking everything.
  5. Upgrades are possible. Tight app framework integration will aid with putting data into well-known places that can be backed up and migrated to a new image generated by a newer version of this system.
  6. Not designed for "real" production environments. Any project that has a dedicated operations person should not be using this; they should be rolling their own custom environments with something like Puppet. Accordingly, there won't be compromises in security in favor of flexibility - it's designed to cover 75% of cases "pretty well", which is still better than the status quo for smaller projects (almost 100% of cases don't have any security at all).

I'd also like to highlight one really important decision: the output is complete system images. Probably at first this will mean Docker containers, but it could easily be extended to VM images. This is a critical part of the design, because it allows us to make broad, sweeping changes - for example, preferring system components written in memory-safe languages, replacing OpenSSL with LibreSSL, or creating systemd unit files that lock down service runtime environments to reduce the impact of a compromise. These improvements aren't possible unless we control the whole system. And because upgrades are optional but possible, the developer can get security improvements by "just" upgrading to a newer image, in the same way that they'd upgrade a library or something, as opposed to security being a continuous process they have to worry about. Again, obviously not perfect - but much better than the status quo.
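To give a flavor of the systemd lockdown I have in mind, a generated unit might look something like this - the app name and paths are hypothetical, but the hardening directives are real systemd options:

  # myapp.service - hypothetical hardened unit for a generated image
  [Unit]
  Description=Example web app

  [Service]
  ExecStart=/usr/bin/node /srv/myapp/server.js
  User=myapp
  # No privilege escalation, private /tmp, read-only OS, no $HOME access.
  NoNewPrivileges=true
  PrivateTmp=true
  ProtectSystem=full
  ProtectHome=true

  [Install]
  WantedBy=multi-user.target

None of this is exotic, but it's exactly the kind of thing nobody sets up by hand on a bare VM.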

I hope to have an MVP out Real Soon Now™. But in the meantime, if you have thoughts, feel free to reach out.

