The three pillars of observability are bullshit.
There, I said it.
Normally I'm a lot more nuanced about things.
Normally I don't swear.
But today?
I've got a bit more time for these things.
The idea that you'll be able to understand your app if you implement “the three pillars”?
Ridiculous.
It's also a fiction that plays well into the hands of certain vendors.
Three pillars?
What about three chances to charge your company for the same data?
Ever seen an invoice from an observability vendor and had that sinking feeling in your stomach?
“What? How much?”
You look down the itemised bill…
Logs… OK… expensive but… I guess…
Traces… OK… per host costs… I see… hmm…
Metrics… holy cow… OK…
Then you add it all up.
It comes to an eye-popping number.
And you’re spending it month after month.
Not one time. Every. Single. Month.
Five figures. Six figures.
That's not even the worst of it.
There aren't even three pillars any more.
What about the whole plethora of other "pillars" that never get mentioned?
RUM? More money.
CI? More money.
Incident management? More money.
Error reporting? More money.
And yes, these things can all link to one another.
But honestly, having used these links? They're not that smart.
They require you to add some obscure tag into every log and every metric and every trace and every RUM event and...
...JUST STOP.
What the three pillars are...
...is an expensive set of SILOS.
And that makes sense. The clue is in the name. It's a pillar.
The three pillars don't really communicate with each other.
Not pillars. Data types.
These are three data types.
They're the raw ingredients of observability.
And that's not all - I'd argue they can all be derived from a single data type.
And that's traces.
What can we do instead?
Well...
For any observability data to be useful, it has to have the following attributes:
Event
Wide
Context aware
Here’s more information:
1. Event
An event is great.
You can aggregate events in any way you would like.
Events can have a name.
Each event can have many attributes.
You can count the uniques of any attribute of an event.
You can group by attributes.
Events are awesome.
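Here's a rough sketch of what "aggregate events any way you like" means in practice. It's plain Python, not any vendor's query language, and the event names and attributes (invoice.paid, customer_id, amount) are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical events: each has a name plus arbitrary attributes.
events = [
    {"name": "invoice.paid", "customer_id": "c1", "amount": 120},
    {"name": "invoice.paid", "customer_id": "c2", "amount": 80},
    {"name": "invoice.failed", "customer_id": "c1", "amount": 40},
    {"name": "invoice.paid", "customer_id": "c1", "amount": 60},
]


def count_by(events, attr):
    """Count events grouped by the value of one attribute."""
    return Counter(e[attr] for e in events if attr in e)


def unique_count(events, attr):
    """Count the distinct values of one attribute."""
    return len({e[attr] for e in events if attr in e})


def sum_by(events, group_attr, value_attr):
    """Sum a numeric attribute, grouped by another attribute."""
    totals = defaultdict(int)
    for e in events:
        if group_attr in e and value_attr in e:
            totals[e[group_attr]] += e[value_attr]
    return dict(totals)


by_name = count_by(events, "name")
unique_customers = unique_count(events, "customer_id")
amount_by_customer = sum_by(events, "customer_id", "amount")
```

None of these questions had to be decided up front. The same list of events answers all three.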
2. Wide
The event must have a ton of attributes. The more the better.
Because ultimately at 3am when your site is down and you have to get it back up...
...when it looks like a hacker has broken into your system and you need to assess the damage...
...when you've been on a bug for 2 hours and you still can't figure it out...
...you couldn't have anticipated what you needed 4 months ago.
The most useful attributes? They're "high cardinality".
Huh? High cardinality attributes are IDs.
Git commit SHAs.
Host IDs.
Invoice IDs, Customer IDs, Event IDs. Request IDs. User IDs.
You get the idea.
Anything that will allow you to uniquely search across events for events related to a specific object.
But also we need more than just the IDs.
We need the relevant data at the time the event was issued.
Sending a business event? Let's include all the attributes from that event, right? Right.
Is there an error you want to create an event for that details something's gone wrong with the invoice?
Is the status of the invoice worth including in the event? Yup, you betcha.
Throw it all in there.
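As a sketch, a wide event for that invoice error might look something like this. Every field name here is an assumption chosen for illustration (as is the build_invoice_error_event helper); the point is the shape: searchable IDs plus the state of the object at the moment the event was issued.

```python
import json
import time
import uuid


def build_invoice_error_event(invoice, request_id, git_sha, host_id, error):
    """Build one wide event: IDs for searching plus the state at the time."""
    return {
        "name": "invoice.error",
        "timestamp": time.time(),
        # High-cardinality IDs: these let you find every related event later.
        "event_id": str(uuid.uuid4()),
        "request_id": request_id,
        "git_sha": git_sha,
        "host_id": host_id,
        "invoice_id": invoice["id"],
        "customer_id": invoice["customer_id"],
        # The relevant data as it was when the event was issued.
        "invoice_status": invoice["status"],
        "invoice_total_cents": invoice["total_cents"],
        "error_type": type(error).__name__,
        "error_message": str(error),
    }


# Hypothetical invoice and identifiers, purely for the example.
invoice = {
    "id": "inv_123",
    "customer_id": "cus_456",
    "status": "past_due",
    "total_cents": 4200,
}
event = build_invoice_error_event(
    invoice, "req_789", "3f2a9c1", "host-17", ValueError("card declined")
)
line = json.dumps(event, sort_keys=True)  # one structured line per event
```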
3. Context aware
Unless you've spent as long as I have spelunking through logs, it's easy to underestimate the importance of context.
What was going on when this event was triggered?
If you want to understand contexts?
Think about a stack trace.
A stack trace is context.
It’s very basic context - just an array of line numbers.
And that’s why a stack trace is almost never enough to go off.
In fact, stack traces nowadays look positively ancient.
They're an observability tool from a much simpler time when codebases were much smaller and distributed apps weren't a thing.
What we really want is nested contexts.
Nested Contexts
Take an example.
First, an HTTP endpoint was hit.
Then we went out to an API and made a request.
That came back within 0.1 seconds.
Then we enqueued 4 background jobs.
The first one took 13 seconds to complete.
Whilst that one was running, the second was also triggered.
And the error happened within that second background job.
This allows us to join the dots - to see what the system was doing whilst this error occurred.
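To make nested contexts concrete, here's a hand-rolled sketch in plain Python. This is not OpenTelemetry's API. The span helper, finished_spans list and path_to function are hypothetical names, and the background jobs run inline rather than on a real queue, to keep the example small.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# The span that is currently open (None at the top level).
_current_span = contextvars.ContextVar("current_span", default=None)
finished_spans = []


@contextmanager
def span(name, **attributes):
    """Open a context nested under whichever span is currently open."""
    parent = _current_span.get()
    record = {
        "span_id": uuid.uuid4().hex[:8],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
        "attributes": dict(attributes),
        "start": time.monotonic(),
        "error": None,
    }
    token = _current_span.set(record)
    try:
        yield record
    except Exception as exc:
        record["error"] = repr(exc)
        raise
    finally:
        record["duration_s"] = time.monotonic() - record["start"]
        _current_span.reset(token)
        finished_spans.append(record)


def path_to(span_record):
    """Walk parent links to see what the system was doing around a span."""
    by_id = {s["span_id"]: s for s in finished_spans}
    names = []
    while span_record:
        names.append(span_record["name"])
        span_record = by_id.get(span_record["parent_id"])
    return list(reversed(names))


try:
    with span("GET /invoices", request_id="req_789"):
        with span("external_api.request"):
            pass
        with span("background_job", job_number=1):
            pass
        with span("background_job", job_number=2):
            raise RuntimeError("boom")
except RuntimeError:
    pass

failed = next(
    s for s in finished_spans if s["error"] and s["name"] == "background_job"
)
error_path = path_to(failed)
```

Each span remembers its parent, so an error in the second job can be traced back up to the HTTP request that caused it.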
Now let's see which data types give us all those qualities.
Metrics? They give us none of these things.
They just say "hey, this number was X at this time".
Logs? They give us an event. Cool. They give us wide events. Yup.
However… context? They're pretty weak.
Building in context is possible and that's what we do at BiggerPockets - we have attributes that store which request was going on when the event was triggered.
Problem is, the context nesting is very limited.
Since each log line is a single event, you either have to create multiple events (which duplicates logs) or settle for one layer of nesting (since deeper layers would overwrite each other).
Traces? Now we're talking. These bad boys have everything above.
And best of all? You can derive metrics from traces. You can derive logs from traces.
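Here's roughly what "derive" means, as a sketch. The span dictionaries are a simplified, made-up shape (not a real tracing library's data model), and derive_metrics and derive_logs are hypothetical helpers.

```python
from collections import defaultdict

# Simplified, hypothetical finished spans.
spans = [
    {"name": "GET /invoices", "duration_s": 0.40, "error": None,
     "attributes": {"request_id": "r1"}},
    {"name": "GET /invoices", "duration_s": 0.60, "error": "RuntimeError('boom')",
     "attributes": {"request_id": "r2"}},
    {"name": "external_api.request", "duration_s": 0.10, "error": None,
     "attributes": {"request_id": "r1"}},
]


def derive_metrics(spans):
    """Roll spans up into per-name counts, error counts and total duration."""
    metrics = defaultdict(
        lambda: {"count": 0, "errors": 0, "total_duration_s": 0.0}
    )
    for s in spans:
        m = metrics[s["name"]]
        m["count"] += 1
        m["errors"] += 1 if s["error"] else 0
        m["total_duration_s"] += s["duration_s"]
    return dict(metrics)


def derive_logs(spans):
    """Flatten each span into one log line."""
    lines = []
    for s in spans:
        level = "ERROR" if s["error"] else "INFO"
        attrs = " ".join(f"{k}={v}" for k, v in sorted(s["attributes"].items()))
        lines.append(f"{level} {s['name']} duration_s={s['duration_s']:.2f} {attrs}")
    return lines


metrics = derive_metrics(spans)
logs = derive_logs(spans)
```

Going the other way doesn't work: you can't rebuild the spans from the counters or from the flattened lines.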
The weakness of traces? They're much bigger than logs. They're more costly to store.
It's also harder to get your head around tracing than logging.
It's messier - you need to make contexts which often means passing blocks around.
The code becomes pretty convoluted fast and it obscures the meaning.
But? Traces are the ultimate observability data type.
Common Questions
So are you saying metrics aren't useful?
No! I'm saying they should be derived from more granular sources.
So are you saying that structured logging isn't valuable?
No! I'm saying you need to use it to send wide events.
But it's still got its limitations - no context.
Are you saying that traces don't have value?
No! I'm saying that traces can be appropriated to implement the "single wide event" ethos.
Let's forget the three pillars.
Let's favour traces - the most useful of the three.
Let’s instrument our apps with OpenTelemetry.
Let’s vote with our feet.
Down with the three pillars.
Context aware, wide events forever.
Agree with this?
Sign the manifesto at kill3pill.com!