The Impact Of Feedback Loops
Poor feedback loops lead to bad software. And bad software has real consequences.
How many software engineering teams work
Many teams I've worked on work from tickets on a Jira board.
The goal is framed as "get all the tickets for this sprint done".
When that's done, they celebrate with a Slack post and everyone responds with an emoji.
The metrics reflect that they're killing it.
Many sprints later and all the tickets that comprised a project have been deployed.
They're a little behind the deadline but hey - it's software.
They high-five each other and have a little Zoom meeting with some self-congratulatory back-slapping.
Why?
Oh because "it's important to celebrate the wins".
Give me a f***ing break.
Meanwhile in the real world
Software is fundamentally unpredictable.
So if there's a plan that's more than 1 week long...
...and the plan has been executed...
...and it all looks like it's gone to plan?
The reality is way messier.
The reality is way more toxic.
The reality is this.
In order to meet the arbitrary deadline, the engineers skipped some unit testing.
In the same vein, they have code that's very difficult to understand "because we took on some technical debt and that's ok".
The designers chucked pretty pictures over the wall and the engineers implemented them.
Turns out there was no time in the plan for user testing.
The API they were calling was a bit flaky and they forgot to set timeouts and implement retries.
But that's just a detail.
Testing was done at the end and a ton of issues came out of it.
After every deploy the engineers count that ticket as done.
After all, it's deployed! That's the same thing as done.
There's even a column in Jira - they renamed it from "Done" to "Deployed".
So much missing work.
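Take the "detail" about timeouts and retries. Here's a minimal sketch of what was skipped, in plain Ruby with only the standard library. The names `with_retries`, `fetch` and `RETRYABLE` are made up for illustration, and the timeout values, attempt count and backoff delays are assumptions, not recommendations.

```ruby
require "net/http"
require "uri"

# Errors we assume are worth retrying (an illustrative list, not exhaustive).
RETRYABLE = [Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNRESET].freeze

# Hypothetical helper: run the block, retrying a bounded number of times
# and waiting a little longer after each failure (exponential backoff).
def with_retries(attempts: 3, base_delay: 0.2, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *RETRYABLE
    raise if tries >= attempts # give up loudly instead of hanging forever
    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

# Hypothetical call to a flaky API; the caller supplies the URL.
def fetch(url)
  uri = URI(url)
  with_retries do
    Net::HTTP.start(uri.host, uri.port,
                    use_ssl: uri.scheme == "https",
                    open_timeout: 2,  # seconds allowed to open the connection
                    read_timeout: 5) do |http| # seconds allowed per read
      http.get(uri.request_uri)
    end
  end
end
```

The point isn't this exact helper. The point is that the request either succeeds or fails within a known amount of time, so the user gets an answer instead of an endless spinner.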
For all the missing work, the same lame excuses are wheeled out time and time again.
"We'll put that on the backlog and do it later"
"We can add that as a fast follow"
"We don't have time for that"
"That's not a priority"
After the project has been released and announced at the company town hall, in a 16-slide presentation that sent everyone (including the engineers who built the damn thing) to sleep...
Turns out they're onto the next project.
The tickets stay on the backlog.
They age.
The "backlog" is over a thousand tickets strong now.
This team has been very busy releasing projects just like this one over the last year.
But so what?
Here's so what.
Meanwhile in the real, non-techie world
A man is visiting his mum in a care home.
It's a stressful period of his life.
He's trying to juggle the relationship with his partner, meet his own needs, make the most of his time away from home, cope with the emotional impact of his mum in slow decline, tidy her house, make sure she's well looked after, do research for his business...
He's travelling from Preston to Leyland by train.
He's in a rush.
He opens the Trainline app.
He adds a favourite journey from Preston to Leyland.
He looks at the next train time.
Oh crap! It's in 2 minutes! Platform 1!
He bombs up the stairs over to the platform.
Navigates other passengers moving at a snail's pace.
The train is waiting at the platform.
He jumps on.
"Oh man... I'm in such a rush. I'd better slow down and check this is the right train"
He leans out the carriage.
"Blackpool North... 11:49... yup that's right."
That's what the app said.
He sinks into the seat, relieved. He made it!
Now to buy the ticket on the app.
Press buy...
Put his CVC code in...
Bit of a pain - but at least he can remember it.
Press the buy button...
...this is taking ages...
...come on hurry up...
...yes! Finally.
There's an annoying modal that comes up every time.
He hates it. "You're going to <destination>"
There's no way of cancelling it or switching it off and pressing outside the modal does nothing.
And here it comes. With the usual stupid zoomy animation.
"You're going to Kirkham and Wesham!"
That's ... wait what...? He wants to go to Leyland...
He feels a creeping sense of dread.
Beep beep beep!
The doors close.
Too late. He realises what's just happened.
When he added the favourite journey he assumed that the app would then show him that journey.
It didn't.
Adding a new favourite journey, it turns out, sent him back to...
...the first favourite journey.
Which is in the exact opposite direction to where he wants to go.
With a deep sigh he tries to find the next train that's actually correct.
It's another £8.20. Total waste of money.
Oh... better not miss the next stop.
It'd be great if the app would give him notifications when he's approaching his stop.
It does give notifications - notifications about train strikes in Birmingham.
And how is that useful to him?
Oh. That's right. It isn't.
His phone regularly pings him about some garbage Tweet from someone he followed 10 years ago...
...but it can't show him what he needs right now.
Thankfully he's not so busy booking the correct train that he misses this stop too.
He jumps off.
Now to pay...
...enter the CVC AGAIN...
...press buy...
...come on...
...and now the app just hangs.
A spinner overlays the purchase button.
He waits. And waits. And waits.
Nothing. No error message. No update.
How long exactly does he have to wait until the app gives him feedback?
It's anyone's guess.
The seconds tick by. He closes the app.
Sigh.
Fine. He'll go to the ticket office.
No-one there.
Fine. The ticket machine then.
He presses "Quick buy".
Nothing happens.
He presses again.
Still nothing.
Finally he figures out you need to press and hold your finger for nearly two solid seconds.
We all carry magical devices in our pockets that are so responsive we can play FarmVille at 3am, distracted from our shallow lives by incredible graphics and animations...
...yet the same technology somehow hasn't translated into touch screens at train stations that actually, like, WORK.
The "common journeys" screen that comes up next has destinations like "Liverpool Lime Street".
What?! How is that a common destination? From freaking Kirkham?
Sigh. He needs Leyland.
Presses L.
Waits for 5 seconds.
E.
5 seconds...
Y.
5 seconds.
It would be easier to forge the train ticket at this point.
Presses buy.
Presses again.
Swipes his card.
After a 20 second delay... it's done!
He's going to...
...erm, his destination.
It’s been a stressful experience.
I know it was stressful, because this man was me.
What’s the link with observability?
Almost every dysfunction above is a failure of feedback loops.
Unit testing is a feedback loop to make sure the software is working.
User testing is a feedback loop to make sure the humans understand how to use the software.
Defects in testing are a feedback loop. Feedback that the design is wrong.
Observability is another feedback loop.
The difference?
Observability is a feedback loop in the only environment that matters - production.
If you have poor observability, you’re working in the dark.
How many users are having problems? No idea.
Why, exactly, are they having these issues? We’ve some theories.
How is the app being used? We think we know.
Are we leaking revenue? Probably not.
Wait… does observability replace testing?
Of course not.
Many practices like unit testing and user testing catch a ton of issues and mistakes with both the design and functionality early.
Fixing problems early on is essential to moving fast.
We can catch a ton of obvious mistakes early. And we should.
But no matter how much we try to predict how our systems will behave from our chair…
…it’s a fool’s errand for anything but the simplest of apps.
We can only have real, actionable insights when the app is being used by real users in real situations.
I’ve deployed pieces of work with and without watching my observability tool.
I’ve seen the difference and it’s stark.
In almost every case with observability, I learned something new, whether intentionally or by accident.
And it’s actually, kinda, (whispers) fun.
So do yourself a favour.
Find a way of adding some basic observability to your app.
Look at it often.
And you’ll start to see patterns you can act on to improve.
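What might "basic" look like? Here's a minimal sketch, assuming nothing more than Ruby's standard library. It wraps a piece of work and writes one structured log line saying what happened, how long it took and whether it blew up. The `observe` helper, the event name and the fields are all hypothetical. A real app would more likely lean on whatever instrumentation its framework or observability tool already provides.

```ruby
require "json"
require "logger"

# Hypothetical helper: wrap a unit of work and emit one JSON event
# recording its name, outcome, error class (if any) and duration.
def observe(name, logger:, **context)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  outcome = "ok"
  error = nil
  yield
rescue StandardError => e
  outcome = "error"
  error = e.class.name
  raise # still surface the failure; we only record it here
ensure
  elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round(1)
  logger.info(JSON.generate(context.merge(event: name, outcome: outcome,
                                          error: error, duration_ms: elapsed_ms)))
end

# Example usage: one JSON object per line, easy to search and count later.
logger = Logger.new($stdout)
logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }

observe("ticket.purchase", logger: logger, journey: "Preston to Leyland") do
  # ... the real work (for example, calling the payment provider) goes here ...
end
```

With events like these you can start answering the earlier questions with numbers instead of theories: how many purchases failed today, which errors, and how long the slow ones took.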
Want help?
I’m on a mission to teach Rails engineers how to observe their Rails apps in production.
Even if you’re not using Rails, many of the principles can be adapted for your technology.
Check out the resources below.