The now (in)famous Joel Test asks 12 questions to determine the maturity of an organisation. Nearly 2 decades later the development ecosystem has evolved and many practices and techniques have become commonplace.
The DevOps Ladder is an attempt to document the best practices in the form of an information radiator that provides a guided path for continuous improvement.
The vertical (Capability) ladder includes 10 steps that are required throughout a project or product’s lifecycle. The horizontal (Maturity) ladder has 4 levels or steps. The goal of any project should be to implement all capabilities and then move them to Level 1 ASAP.
Adding a capability or improving the maturity level is called climbing the ladder. There are many styles of climbing:
With a combination of styles being common:
Many DevOps and “agile” anti-patterns are highlighted by the outliers.
|Capability||Underwater||Level 1||Level 2||Level 3|
|Leadership||Command and Control||Servant||Transformational|
|Architecture||Ivory Tower||- Just Enough
- Last responsible moment
|Source Code / Version Control||Shared Drive / Zip||
versioning strategy e.g. semver
|Branching Strategies e.g.
|Build||Manual / IDE||Snowflake||Phoenix||Reproducible Builds|
|Continuous Integration||Test Data Management|
|Test Automation||None||Nightly||Per Commit / PR||- Matrix
- Ephemeral testing instance per PR
|- Fuzzy Testing
|Code Review||None||Pair Programming OR
|Static Analysis||- Security scanning
- Dependency scanning
- Architecture compliance
|Deliver||Using a checklist||1-step to production||Manual release strategies e.g. canary or blue/green
|Automatic rollout strategies based on business metrics e.g. using Spinnaker|
|Deploy||Heavyweight change control||Lightweight change control||Business driven|
|Monitor||System Metrics||App Metrics||Business Metrics|
|Docs||Getting Started / README||Architecture Decision Records||- Playbook
- Cultural Manifesto
Failure and Safety
DevOps has its roots in the Lean and Agile movements, where the concept of failure takes on new meaning:
Rather than treating failure as the enemy, the Agile view is that failure is a vital and necessary part of learning and experimentation - if you are not failing then you are not trying hard enough.
Embracing failure entails accepting that all systems inherently fail, and building systems and processes that are more resilient to change and uncertainty - focusing more on mean time to recover (MTTR) than on mean time between failures (MTBF).
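One way to see why MTTR deserves the attention is the common steady-state availability approximation, MTBF / (MTBF + MTTR). The sketch below is illustrative only: it assumes MTBF is measured as uptime between failures, and the function name and the hour figures are hypothetical, not taken from any real system.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a system is up, using MTBF / (MTBF + MTTR).

    Assumes MTBF is the mean uptime between failures and MTTR is the
    mean time needed to recover from one failure.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: a failure every 100 hours, 4 hours to recover.
baseline = availability(mtbf_hours=100, mttr_hours=4)

# Option A: fail half as often, recover at the same speed.
fewer_failures = availability(mtbf_hours=200, mttr_hours=4)

# Option B: fail just as often, but recover in 1 hour instead of 4.
faster_recovery = availability(mtbf_hours=100, mttr_hours=1)

for label, value in [
    ("baseline", baseline),
    ("fewer failures", fewer_failures),
    ("faster recovery", faster_recovery),
]:
    print(f"{label}: {value:.2%}")
```

With these made-up numbers, cutting recovery time improves availability more than halving the failure rate does, which is the intuition behind optimising for recovery.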
How people react (and more importantly how they respond to other people) during and after failure is critical. Failure is a learning opportunity.
The Modern Agile framework includes this concept of failure in 3 of its 4 pillars:
- Learning and Experimentation
- Safety as a prerequisite
- Make people awesome
Build & SCM
Inverted Test Pyramid
Continuous Integration (CI)
CI is not about build and test automation; both are core components of CI, but they are not CI.
CI is about integrating the different parts of a system and testing them together as early as possible (ideally daily) to ensure compatibility.
Continuous Delivery (CD)
CD is not about continuously deploying to production, but rather about being in a state that allows deployment into production at any time - this is possible because the software is always delivered in a stable and tested state.
Recovery vs Repair
stateless > stateful (recovery > replication > clustering / repair)
Make the right thing, the easy thing
“If something is hard – do it more often and you will get better!” – Mary Poppendieck
Find and eliminate snowflakes
Ticket-driven request queues are snowflake makers. Immutable infrastructure can help in eliminating snowflakes.
Find and eliminate information silos
Use documentation as code and design driven development. Git is the ideal place to store documentation as it facilitates collaboration and ensures the environment and documents are always in
Every team and project’s ladder should be unique - which levels you are targeting will reflect the architectural decisions and trade-offs being made.
Mature ladders will have most if not all capabilities at Level 2 and a few at Level 3 - but never everything at Level 3. If everything is at Level 3 you are not stretching yourself - reconsider what L3 means for you. For example, if you are already conducting code reviews, then maybe stretch to having the reviewers automatically selected by git history or an OWNERS file. There is always something more you can do.
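As a rough illustration of the OWNERS idea, the sketch below picks reviewers from the closest owning directory of each changed file. It assumes the OWNERS files have already been parsed into a plain mapping from directory to reviewer names; the `select_reviewers` function, the mapping shape, and the names are all hypothetical rather than the format of any particular tool.

```python
from pathlib import PurePosixPath


def select_reviewers(changed_files: list[str], owners: dict[str, list[str]]) -> list[str]:
    """Return the reviewers owning the closest directory of each changed file.

    `owners` maps a repository-relative directory ("." for the root) to the
    reviewers listed in that directory's OWNERS file (assumed pre-parsed).
    """
    reviewers: set[str] = set()
    for name in changed_files:
        # Walk from the file's own directory up towards the repository root
        # and stop at the first directory that declares owners.
        for directory in PurePosixPath(name).parents:
            key = str(directory)
            if key in owners:
                reviewers.update(owners[key])
                break
    return sorted(reviewers)


# Hypothetical repository layout and owners.
owners = {
    ".": ["alice"],
    "services/api": ["bob", "carol"],
}

print(select_reviewers(["services/api/handler.py", "README.md"], owners))
print(select_reviewers(["services/web/index.ts"], owners))
```

Files under `services/api` go to that directory's owners, while anything without a closer match falls back to the root owners.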
One or two L1 capabilities may also be OK on a mature ladder. For example, open source projects rarely have planning above issue tracking - there is nothing inherently wrong with that. Likewise, your deployment target may be very static, so an L1 CI system is sufficient.
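The rules of thumb above can be turned into a small self-check. This is only a sketch: the `Level` enum, the `review_ladder` function, and the exact thresholds (nothing underwater, at most two capabilities at Level 1, not everything at Level 3) are one hypothetical reading of the guidance, and each team should adjust them.

```python
from collections import Counter
from enum import IntEnum


class Level(IntEnum):
    UNDERWATER = 0
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3


def review_ladder(ladder: dict[str, Level]) -> list[str]:
    """Return findings for a ladder; an empty list means no concerns."""
    counts = Counter(ladder.values())
    findings = []
    if counts[Level.UNDERWATER]:
        names = sorted(c for c, lvl in ladder.items() if lvl == Level.UNDERWATER)
        findings.append("still underwater: " + ", ".join(names))
    # Assumed threshold: "one or two" Level 1 capabilities are acceptable.
    if counts[Level.LEVEL_1] > 2:
        findings.append("more than two capabilities at Level 1")
    if ladder and counts[Level.LEVEL_3] == len(ladder):
        findings.append("everything at Level 3: reconsider what Level 3 means")
    return findings


# Hypothetical ladder for a small project.
ladder = {
    "Build": Level.LEVEL_2,
    "Test Automation": Level.LEVEL_3,
    "Deploy": Level.LEVEL_1,
    "Docs": Level.UNDERWATER,
}
print(review_ladder(ladder))
```

A check like this could run alongside the information radiator, so that outliers are flagged whenever the ladder is updated.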