strugee.net

Posts categorized as "pump.io"

Zero-downtime restarts have landed

I'm thrilled to announce that zero-downtime restarts, which I've been hacking on for the past week or two, have just landed in pump.io master!

Zero-downtime restarts require at least two cluster workers and MongoDB as a Databank driver (we'll eventually relax the latter requirement as we continue to test the feature). Here's how it works:

  1. An administrator sends SIGUSR2 to the master pump.io process (note that SIGUSR1 is reserved by Node.js)
  2. The master process builds a queue of worker processes that need to be restarted
  3. The master process picks a random worker from the queue and sends it a signal asking it to gracefully shut down
  4. The worker process shuts down its HTTP server, which causes it to stop accepting new connections - it will do the same for the bounce server, if applicable
  5. The worker shuts down its database connection once the HTTP server is completely shut down, meaning that it's done servicing in-flight requests
  6. The worker closes its connection with the master process and Node.js automatically terminates because nothing is left to keep the event loop alive
  7. The master recognizes the death of the worker process, replaces it, waits for the new worker to signal that it's listening for connections, and repeats from step 3 until the queue is empty

This works because only one worker is shut down at a time, allowing the other workers to continue servicing requests while the one worker is restarted. We wait until the new worker actually signals it's ready to process requests before beginning the process for another worker.
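To make the master's side of that loop concrete, here's a minimal sketch of the queue logic in steps 2, 3, and 7. It's an illustration only, not pump.io's actual implementation: the `RollingRestart` class, its method names, and the injectable `pickIndex` function are all hypothetical, and the real signal handling, process spawning, and IPC are left out so the logic can run on its own.

```javascript
// Hypothetical sketch of the master's restart queue; not pump.io's real code.
class RollingRestart {
  // `pickIndex` is an assumption made for testability: it chooses which
  // queued worker goes next, and defaults to a random choice.
  constructor(workerIds, pickIndex = (n) => Math.floor(Math.random() * n)) {
    this.queue = [...workerIds]; // step 2: workers still running old code
    this.pickIndex = pickIndex;
    this.current = null;         // the one worker being restarted right now
  }

  // Step 3: take a worker off the queue and return its id,
  // or null once the queue is empty.
  next() {
    if (this.current !== null) {
      throw new Error("a worker is already being restarted");
    }
    if (this.queue.length === 0) {
      return null;
    }
    const [id] = this.queue.splice(this.pickIndex(this.queue.length), 1);
    this.current = id;
    return id;
  }

  // Step 7: call this when the replacement worker reports that it's
  // listening; only then does the next worker get picked.
  replacementListening() {
    this.current = null;
    return this.next();
  }

  get done() {
    return this.current === null && this.queue.length === 0;
  }
}

// Usage with a deterministic picker so the order is predictable.
const restart = new RollingRestart(["w1", "w2", "w3"], () => 0);
let workerId = restart.next();
while (workerId !== null) {
  // A real master would signal `workerId` here and wait for its replacement.
  workerId = restart.replacementListening();
}
console.log(restart.done); // true
```

The important property is that `next()` refuses to hand out a second worker while one is still being replaced, which is what keeps the rest of the cluster serving requests.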

Such a feature requires careful error handling, so there are a lot of built-in checks to prevent administrators from shooting themselves in the foot:

  • If there's a restart already in progress, SIGUSR2 is ignored
  • If there's only 1 cluster worker, the restart request is refused (because there would be downtime and you should just restart the master)
  • The master process will load a magic number from the new code and compare it with the old magic number loaded when the master process started - if they don't match, SIGUSR2 will be refused. This number will be incremented for things that would make zero-downtime restarts cause problems, for example:
    • The logic in the master process itself changing
    • Cross-process logic changing, such that a new worker communicating with old workers would cause problems
    • Database changes
  • If a worker process doesn't shut itself down within 30 seconds, it will be killed
  • If a zero-downtime restart fails for any reason, the master process will refuse SIGUSR2 and will not respawn any more cluster workers, even if they crash - this is because something must have gone seriously wrong, either with the master, the workers, or the new code, and it's better to just restart everything. Currently this condition occurs when:
    • A new worker dies directly after being spawned (e.g. from invalid JSON in pump.io.json)
    • A new worker signals that it can't bind to the appropriate ports
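
The checks the master runs before accepting SIGUSR2 can be sketched as a single function. This is a hedged illustration rather than the real code: the function name, the shape of the `state` object, the returned messages, and the order in which the checks run are all assumptions. It returns `null` when the restart may proceed and a reason string when it's refused or ignored.

```javascript
// Hypothetical sketch of the pre-restart checks; not pump.io's real code.
// All field names on `state` are made up for this example.
function checkRestartRequest(state) {
  if (state.restartFailed) {
    // An earlier zero-downtime restart failed, so everything should be
    // restarted from scratch instead.
    return "previous restart failed";
  }
  if (state.restartInProgress) {
    return "restart already in progress";
  }
  if (state.workerCount < 2) {
    // With a single worker there would be downtime anyway.
    return "not enough workers";
  }
  if (state.newMagic !== state.runningMagic) {
    // The new code changed something that makes mixing old and new
    // processes unsafe.
    return "magic number mismatch";
  }
  return null;
}

// Usage: a healthy two-worker cluster running compatible code.
const healthy = {
  restartFailed: false,
  restartInProgress: false,
  workerCount: 2,
  runningMagic: 1,
  newMagic: 1,
};
console.log(checkRestartRequest(healthy)); // null
console.log(checkRestartRequest({ ...healthy, newMagic: 2 })); // "magic number mismatch"
```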

While these checks do a lot to catch problems, they're not a silver bullet, and we strongly recommend that administrators watch their logs as they trigger restarts. However, this is still a huge win for the admin experience - the most exciting part of this for me is that it's the first step we need to take towards having fully automatic updates, which has been a dream of mine for a long while now.

Admins running from git master can start experimenting with this feature today, and it will be released during the next release cycle - i.e. with the 5.1 beta and stable, not the current 5.0 beta. Since this is highly experimental, we want this to have as much time for testing as possible. You can also check out the official documentation on this feature.

I hope people enjoy this! And as always, feel free to report any bugs.


pump.io 5.0 beta released

I'm excited to announce that pump.io 5.0.0 is now officially in beta!

This is another big release and makes a wide variety of improvements. Here are some highlights from the changelog:

  • More complete documentation
  • Small improvements to the administrator experience
  • A better web UI, including some user experience polishing as well as an upgrade to more performant and better-licensed libraries
  • A fix for crashes related to "login with remote account" (although this one was backported in 4.1.1)
  • Significant security improvements in the systemd service shipped with the package
  • Lots of internal refactoring and simplification made possible by dropping Node 0.10/0.12 support

Many of these changes - particularly the systemd changes and the fact that (as previously announced) Node 0.10 and 0.12 are no longer supported - will require administrator intervention. Be sure to read our upgrade guide for details on how to deal with these changes.

All of these features add up to make pump.io 5.0 beta the most stable and secure release yet. As always, it will go through our beta period for about a month before being released as a fully stable version. If you try it out, the community would love to hear about it - and be sure to report any bugs you encounter!


How I accidentally started maintaining a social network with thousands of users

As some of my readers (particularly Recursers) know, a couple of weeks ago I became an Invited Expert at the Social Working Group at the W3C (World Wide Web Consortium). The W3C is a standards body. That means it's responsible for defining how things work on the web, such as how web pages are styled using CSS and how web developers can protect their apps from security vulnerabilities using Content Security Policy.

My first thought when I got the email that my application had been accepted was, "WHOOOOOOOOO!" It was probably one of the most thrilling moments of my whole life. My second thought was, "how in the world did I get here!?" The truth is, it was almost an accident.

It started when I got involved in the pump.io project. pump.io, for those who haven't heard me talk about this endlessly (e.g. at RC), is a decentralized social network. That means that there can be multiple servers run by different people that are part of the social network, but the users on those servers can interact with each other in just the same way they could if it was just one big centralized server[1]. I first got involved in the pump.io project in August 2015. I was experimenting with different social networking software and decided to deploy pump.io on my server. When I did I realized that pump... well, it didn't work very well. The web UI was kinda basic[2], everything was pretty buggy, and there were a lot of problems with the overall user experience. In fact, I know the exact day I set up pump.io (August 12th) because all throughout the experience I was filing bugs on things needing improvement. It was a shame, I thought, because this software seemed really neat. I thought it had a lot of potential.

After about two weeks it became clear that there was no activity in the upstream pump.io project. So after some deliberation, I ended up forking it (briefly). You can watch this talk around 16:00 to hear me talk about this a bit, though to be honest it's kind of just a footnote in the project's history. In the end Evan Prodromou, pump.io's author, ended up handing off some commit rights to community members.

Well, I thought, that was the end of that. Everything's smooth sailing from here on out! There were some big problems, though: the people who now had commit rights all were involved in other things and, more importantly, none of them knew JavaScript or Node.js! This makes me chuckle to this day, honestly.

So I started triaging issues. When people sent Pull Requests, I'd review them since it seemed like no one else was going to do it. #1114 was, as far as I can tell (or remember), the very first of these "unofficial" PR reviews. I kept going; I even reviewed Menno Vossen's epic PR which fixed all the tests (fixing the tests being a feat which, having tried to start that work myself, I am to this day in awe of and incredibly thankful for). For that last one in particular, you'll note that I merged it, not Chris Webber. At some point in January(?), he asked me in #pump.io on IRC if I'd like write access to the repository, to which I said (paraphrased) "heck yes!" So he made it happen.

I never really intended for that to happen. However, I was the one doing almost all of the work. After a while it just made sense. This is what, among other things, I find so incredible about freedom-respecting software: you can just do things. I didn't ask anyone for permission to do those reviews. I just saw the need for a reviewer, and decided I'd help out.

Fast-forward to today, and I'm now an owner of the pump.io organization on GitHub. I make technical decisions about what to prioritize and what should go into pump.io core. I do a lot of the day-to-day work running the project, too, and setting up technical and policy infrastructure (with a lot of help from the community, of course, plus input from Evan). That, too, just made sense, as did my becoming an Invited Expert - I was pretty deeply engaged with the SocialWG's ActivityPub specification already since it's based on the pump.io protocol, and I was really excited about said protocol being standardized. So I was participating pretty heavily and I think it just made sense to people in the Working Group for me to join. In fact, that also kinda happened by accident. I couldn't get edit access to the W3C wiki so we were speculating in #social on the W3C IRC server that it might be because I wasn't a "W3C member" or something. So some people at W3C were pinging the sysops team, etc., trying to mark me as a "trusted" user when someone - Sandro Hawke, I believe - said, "the other option is for you to just join the Working Group." To which I said, "well, but I'd have to join as an Invited Expert, and I don't think I qualify as an expert." Chris Webber's response? "You're just as much of an expert as me when I joined!"

tl;dr how in the world did I get here? I tried some software and got annoyed at it, so I just kind of "did some stuff" that led to me doing code reviews. That led to me getting involved in the decentralized social web which led to me "doing some more stuff" that got me involved in standards. Then because of that, I tried to edit a wiki and ended up being invited to apply as a W3C Invited Expert.

I mean, what the hell? Honestly. I can't emphasize enough that I didn't plan ANY of this. It just sort of... happened. And that, I think, is what's so cool about the free software community. It isn't about who you are, where you come from, or what your goals are. It's only about, do you show up? Do you show up and do awesome stuff?

I showed up, kind of by accident, and I now run a decentralized social network with thousands of users called pump.io.

What will happen if you show up?

Thanks so much to Anja and Julia for providing feedback on a draft version of this post.

[1]: I really hope this explanation makes sense and if it doesn't, I apologize - I use diagrams to explain this in real life.

[2]: Still is, but that should improve now that the technical debt work I've been focusing on for the past year is basically done!


pump.io 4.0 in beta

pump.io 4.0.0 is officially in beta! Whooo!

Highlights

This is a positively huge release, and I'm so excited to share it with the community. Some highlights:

  • Express 4.x - I wrote about the significance of this change here, but suffice to say that this significantly improves security, performance, and future maintainability
  • Performance and correctness improvements to the web UI's JavaScript
  • Better administrative experience, including the ability to specify configuration via environment variables
  • Better interoperability with the IndieWeb

Upgrading

The upgrade to Express 4.x and the improvements to configuration loading have the potential to break some existing pump.io installations, although 95% of installs should be completely unaffected. If you want to help test this beta, please set aside extra time as necessary to perform this upgrade - full documentation can be found on ReadTheDocs.

As always, this release will follow our normal release cycle, which means that the stable 4.0.0 release will go out in about a month.

Test days

Due to the complexity of this upgrade, we've decided to have some test days during the beta where we upgrade prominent nodes for a day, then downgrade them again. This will help expose problems earlier and make the upgrade smoother for everyone. So far Jason Self, who runs Datamost, has volunteered for this - if you're interested in joining him, please get in touch!

Happy hacking!


Express 4.x in pump.io core

So I thought I'd take a moment to announce that the upgrade from Express 2.x to Express 4.x is finally complete! I fixed up the last couple test failures last Wednesday, and the branch got merged on Thursday.

A long time coming

Believe it or not, the work to do this upgrade started almost an entire year ago. Express 2.x has been outdated and unmaintained for a long time now, so upgrading has been a high priority. However, it wasn't as simple as adjusting a version number - there were a staggering number of changes that needed to be made due to Express deprecating, removing, and changing things around. One of the most significant problems was the fact that the old template system that we used, utml, was not compatible with Express 3.x and above. That meant that we had to rewrite every single template into a modern language - an effort that resulted in over a thousand lines changed!

However, the time for Express 4.x has finally arrived. With that and some other trivial version bumps, I'm proud to announce that pump.io is fully up-to-date in terms of dependencies, with only three non-critical exceptions. Whooooo!

Immediate benefits

There are a lot of reasons this is immediately awesome:

  1. Express 4.x fixes significant performance problems that existed in Express 3.x
  2. Relatedly, Express 4.x fixes some security problems present in 3.x
  3. The fact that our dependencies are finally up-to-date means that we can (and do!) now make use of Greenkeeper and the Node Security Platform to automatically track dependencies to make sure they're up-to-date and not introducing security vulnerabilities

That last one is particularly significant. Greenkeeper and NSP will continuously monitor the project's dependencies and automate away a lot of the pain that's associated with keeping pump.io up-to-date. Everyone will get a more secure and stable codebase because of this setup.

Looking forward

The Express 4.x upgrade is a big change, and it's definitely possible that stuff has broken. We want to make sure that breakage doesn't make it into production. This change went into pump.io 4.0, which will go through our normal release cycle. That means it'll be in beta for a month before being released. As a part of that, Jason Self - who's kind enough to administer Datamost - has agreed to have a test day where Datamost upgrades to the beta for a day, then downgrades again. This test day will give us much wider exposure than we would've gotten otherwise, which will provide incredibly valuable feedback in the effort to identify and fix regressions. We haven't set a date yet, but if you'd like to join Jason in helping us find bugs, please get in touch with the community. We'd love your help.

Beyond the immediate release, though, there are still things to look forward to. Express 4.x gives us a better way to structure routing code, and a refactor to use this structure is planned. There's a lot of room for improvement. But really, the most important benefit is this: technical debt is a far less pressing issue than before. That means that we can shift focus and spend more time fixing user-facing bugs, adding useful features, and generally improving the experience for our users. I couldn't be more excited.

