Jason and Vee
Reliability
Problems
Lack of Control
Downstream dependencies
example
Locations API
Infrastructure stability
Legacy systems
Inhibit progress
Risky to change
Lack of knowledge in teams
Reluctance to work on them
Fear
Unpredictable
Deployments
AdCentre
40 minutes
Error Prone
Rollbacks are a poor customer experience
Data
Incidents
97% uptime based on data
Reporting
Good data
Number of customers affected
Cost in dollars of failure
Cost in effort to recover
Health Metrics
Incidents
Customers should not perceive failure or degradation
example
"Houston we have a problem" style page takeovers
we know they happen
But some products should never go offline!
But failure should be handled better
Asking to repeat an entire business process is bad
Example
Having to fill in an entire form again
Good examples
Talent Search
Rollbacks were quick
Dashboards showed real-time state of system health
Compensation actions were built into the system
Dev Teams
Reliability is always a concern
Which is great!
But in old systems health data can be difficult to obtain and knowledge is scarce due to complexity
Work Areas
CAJA
Guaranteed Hire
Team structures
Everyone should be at the same level
Tech Lead
Product
UX
Challenges
Perception of hierarchy
Not helpful and the perception can be taken advantage of
Tech Leads need to know they are at the same level
Product and Delivery are seen as calling the shots
Not technical people
Makes Technology influence difficult
Platform reliability invariably suffers
Can get caught in a trap of adding more features to a brittle system
OKRs
Can sometimes apply the brakes on platform improvements
Can get too focused on metrics
But are also good for prioritising getting Product built
Delivery Management
Not very tech-focused
Online Tech
Tech Leads
Must know the customer journey
Therefore need some business acumen
Data should drive decisions
Not sentiment
Or hierarchy
Need to be experts in tech
Or know how to hire/cultivate people who are
Take ownership of Tech Health metrics
Tracking
Reporting
Communications
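
Worked example for the "97% uptime based on data" and "Good data" nodes: a minimal sketch of what the figure means in hours, and of how uptime could be derived from incident records. The `Incident` fields mirror the reporting items listed above (customers affected, cost in dollars, effort to recover). The field and function names, and the `downtime_hours` field, are assumptions made for illustration, not an existing schema.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


@dataclass
class Incident:
    # Hypothetical record; fields follow the "Good data" nodes.
    customers_affected: int
    cost_dollars: float
    recovery_effort_hours: float
    downtime_hours: float  # assumed: how long customers were impacted


def downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime implied by an uptime percentage over a period."""
    return period_hours * (100.0 - uptime_percent) / 100.0


def uptime_percent(incidents: list[Incident], period_hours: float) -> float:
    """Uptime percentage over a period, derived from recorded incident downtime."""
    down = sum(i.downtime_hours for i in incidents)
    return 100.0 * (period_hours - down) / period_hours


if __name__ == "__main__":
    # 97% uptime over a year leaves roughly 262.8 hours (about 11 days) of downtime.
    print(round(downtime_hours(97.0), 1))
```

Under these assumptions, 97% uptime allows about 11 days of downtime per year, which is why the figure is worth reporting alongside customers affected and cost rather than on its own.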