Chaos in the system

Why simple solutions don't work when they should

About me

AKA "who am I this is ridiculous"

AJ, he/they

Security person

filter-other-days author

Linux user (via Qubes OS) but constantly eyeing the BSDs and illumos

Unix philosophy fan

I do not have any answers

About you

Shout it out

This talk in three sentences

Here is this program

It was way too hard to write this program

This program should not exist

What is filter-other-days?

Reliable logfile date filtering

Never incorrectly drops data under any circumstances

Suitable for security and reliability systems like Artificial Ignorance

Core shell script requires only POSIX

The problem

How to find all logs from the current day?

grep for the current day

But every program decides for itself how to format dates
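To make the problem concrete, here is a minimal sketch of the naive grep approach. The three log lines and the helper names `sample_logs` and `naive_filter` are invented for illustration; all three lines describe the same day, but only one is written in the format being searched for.

```shell
#!/bin/sh
# Hypothetical sample: three programs, three date formats, one day (2017-01-01).
sample_logs() {
    printf '%s\n' \
        '2017-01-01T08:00:00 app-a started' \
        'Jan  1 08:00:01 host app-b[42]: started' \
        '01/Jan/2017:08:00:02 app-c started'
}

# Naive approach: keep only lines containing the date string given as $1.
naive_filter() {
    grep -F -e "$1"
}

# Only the app-a line survives; the other two lines from the same day
# are silently lost because they spell the date differently.
sample_logs | naive_filter '2017-01-01'
```

Adding more patterns helps, but any format that was not anticipated still disappears without warning.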

Artificial Ignorance

Want to notice unusual things in your system

Can't enumerate everything unusual or interesting

So, throw out things we know are uninteresting

Example
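A minimal sketch of the idea, assuming a made-up list of "known boring" messages. The patterns, the sample log lines, and the function name `artificial_ignorance` are all illustrative; a real deployment would keep a much longer pattern list.

```shell
#!/bin/sh
# Print every input line that matches none of the known-uninteresting patterns.
# The two patterns here are hypothetical examples.
artificial_ignorance() {
    grep -v \
        -e 'session opened for user' \
        -e 'session closed for user'
}

# Whatever is left over is, by definition, something nobody planned for.
printf '%s\n' \
    'sshd: session opened for user aj' \
    'kernel: Out of memory: Killed process 1234' \
    'sshd: session closed for user aj' |
    artificial_ignorance
```

Only the kernel line is printed. Nobody had to predict it in advance; it shows up because nothing said to ignore it.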

filter-other-days' approach

Find all dates we don't care about and throw them out

Looks a lot like Artificial Ignorance

Never silently drops information

grep -v
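The inversion can be sketched roughly as follows. This is not the real filter-other-days: it is a deliberately crude illustration that only knows the `-MM-DD` part of ISO 8601 dates, ignores years, and can be fooled by other text that happens to look like `-MM-DD`. The function names and sample lines are hypothetical.

```shell
#!/bin/sh
# Print one fixed-string pattern per line for every month-day except $1 (MM-DD).
other_day_patterns() {
    keep=$1
    m=1
    while [ "$m" -le 12 ]; do
        d=1
        while [ "$d" -le 31 ]; do
            md=$(printf '%02d-%02d' "$m" "$d")
            [ "$md" = "$keep" ] || printf '%s\n' "-$md"
            d=$((d + 1))
        done
        m=$((m + 1))
    done
}

# Drop lines that carry a date from some other day; keep everything else,
# including lines with no recognizable date at all.
drop_other_days() {
    grep -v -F -e "$(other_day_patterns "$1")"
}

printf '%s\n' \
    '2017-01-01 kept: today' \
    '2016-12-31 dropped: another day' \
    'no date at all, kept' |
    drop_other_days 01-01
```

The first and third lines are printed. The point of the design is the failure mode: a date format the filter does not understand is kept rather than dropped.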

Examples

echo '2017-01-01' | filter-other-days
cat daemon.log syslog.log | filter-other-days
filter-other-days < logfile

Other features

Can work on any day on most systems (-d)

Supports multiple locales on most systems

Extremely portable

Well-documented

Single file that can be copied around and used standalone

Portability

We value portability

See systemd

It was way too hard to write filter-other-days

Context: localization support

GitHub bug #17

Make filter-other-days work for non-English languages

Problem: where do those strings come from? locale -k
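As a rough sketch of what that looks like in practice: POSIX specifies that `locale -k` prints string keywords as `name="value"`, and keywords like `abmon` hold a semicolon-separated list. The helper name `split_locale_value` is made up, and the sample line is shortened so the result is predictable; real output depends on the locale and the operating system.

```shell
#!/bin/sh
# Turn one `locale -k` line such as abmon="Jan;Feb;...;Dec"
# into one name per line.
split_locale_value() {
    sed -e 's/^[^=]*=//' -e 's/^"//' -e 's/"$//' | tr ';' '\n'
}

# On a live system (results vary by locale and OS):
#   LC_ALL=C locale -k abmon | split_locale_value
# Demonstrated on a fixed, shortened sample line:
printf '%s\n' 'abmon="Jan;Feb;Mar"' | split_locale_value
```

This prints `Jan`, `Feb`, and `Mar` on separate lines, which is the raw material for building per-locale date patterns.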

Problem #1: how to actually test the thing?

Can't just download a system that's pure POSIX

Note: Heirloom

This isn't enough though...

Problem #2: operating system bugs

FreeBSD bug #237752: abmon vs. abmon_1, abmon_2, etc.

FreeBSD bug #241906: locale -k nonexistant exit code

NetBSD bug #54693: abmon vs. abmon_1, abmon_2, etc.

NetBSD bug #54692: locale -k nonexistant exit code

Problem #3: POSIX support straight up missing

OpenBSD does not have locale -k even though you'd think it would

filter-other-days 1.1.0 and 2.0.0 being released simultaneously

Problem #4: POSIX just isn't enough

filter-other-days -d cannot be implemented with POSIX date alone

Need date -d (GNU) or date -r (FreeBSD, NetBSD, illumos, etc.)
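A hedged sketch of what working around this can look like, not necessarily how filter-other-days does it. The function name `format_epoch` is hypothetical. It tries the GNU spelling first because GNU `date -r` means something different (it reads a file's modification time), and it assumes the non-GNU `date` rejects `-d "@..."` cleanly, which should be checked on each target system.

```shell
#!/bin/sh
# Format a Unix timestamp ($1) as YYYY-MM-DD in UTC.
# Try GNU `date -d @SECONDS` first, then fall back to BSD-style `date -r SECONDS`.
format_epoch() {
    date -u -d "@$1" +%Y-%m-%d 2>/dev/null ||
        date -u -r "$1" +%Y-%m-%d 2>/dev/null
}

# 1483228800 is midnight UTC on 2017-01-01.
format_epoch 1483228800
```

Neither branch is POSIX, so the only way to know the fallback works is to run it on every system that matters.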

Conclusion:

A pure-POSIX system wouldn't even be enough

So you need to test on systems you care about anyway

This is super annoying

Not helped by the fact that shell scripts fail at runtime

Yet we value portability.

Disconnect

Values vs. difficulty in reality

This program should not exist

filter-other-days' purpose is very strange

Extremely difficult to explain to someone not used to programming/ops

Root cause analysis

Why did I have this problem in the first place?

I do not understand my computer

This despite the fact that I have root

Root cause analysis cont.

Systems are too complex

Hard to know what's going on at any time

Even harder to know what's going on in the future

Combinatorial explosion

In an ideal world...

filter-other-days would not exist

I would be able to just grep

So, what are our options?

Option 1: do nothing

Perfectly valid

Maybe we don't need these hard guarantees

Tradeoffs

Option 2: reduce expressiveness

Example: HTML

Not Turing-complete

Highly optimizable by the browser

And, useful beyond browsers

Tim Berners-Lee's Principle of Least Power

Option 3: constrain the environment

Example: FreeBSD CloudABI

Stop programs from accessing resources not explicitly granted by the administrator

Capabilities

Makes explicit input and output points

Option 4: improve observability

Maybe I just need better tools to introspect my system

E.g. "tell me if some program I don't know about writes to /var/log"

Option 5: rely on some central organizer

Example: Fedora and journald, ish

They take care of this problem for me

Holistic view I don't have

Or run your own: ELK

Option 6: impose order from above

Example: Qubes OS

Security operating system that runs all apps in virtual machines

Compromise in one doesn't spread. Damage is limited

Organizes complexity

Most of these are about managing complexity

Constrain the environment (CloudABI): limit bounds for one application

Rely on some central organizer (Fedora): find someone in a position to make holistic change

Impose order from above (Qubes): limit bounds of different groups of complexity

Discussion

Talk to me!

strugee.net

alex@strugee.net (email & XMPP)

alex@pump.strugee.net

@strugee2 (if you must)

strugee on GitHub

Access this presentation again

https://strugee.net/presentation-chaos-in-the-system