The DevOps Ladder

The now (in)famous Joel Test asks 12 questions to determine the maturity of an organisation. Nearly two decades later, the development ecosystem has evolved and many practices and techniques have become commonplace.

The DevOps Ladder is an attempt to document these best practices in the form of an information radiator that provides a guided path for continuous improvement.

The vertical (Capability) ladder includes 10 steps that are required throughout a project's or product's lifecycle. The horizontal (Maturity) ladder has 4 levels: Underwater, Level 1, Level 2 and Level 3. The goal of any project should be to implement all capabilities and then move each of them to at least Level 1 as soon as possible.

Climbing Styles

Adding a capability or improving its maturity level is called climbing the ladder. There are many styles of climbing:

A combination of styles is also common:

Anti-Patterns

Many DevOps and “agile” anti-patterns show up as outliers on the ladder, i.e. capabilities sitting far above or below the rest.

The Ladder

| Capability | Underwater | Level 1 | Level 2 | Level 3 |
| Leadership | Command and Control | | Servant | Transformational |
| Teams | Dysfunctional | Functional | Cross-functional | Empowered |
| Safety | | Job | Psychological | |
| Failure | Feared | Embraced | Celebrated | |
| Architecture | Ivory Tower | Just enough; last responsible moment | Loosely coupled | |
| Plan/SDLC/Process | Ad hoc / Email / Excel | Jira; GitHub Issues | Kanban; SAFe; Scrum | |
| Source Code / Version Control | Shared drive / Zip | SVN; artifact repository | Git; versioning strategy, e.g. semver | Branching strategy, e.g. Git Flow OR trunk-based |
| Build | Manual / IDE | Snowflake | Phoenix | Reproducible builds |
| Continuous Integration | | | Test data management | |
| Test Automation | None | Nightly | Per commit / PR | Matrix; ephemeral testing instance per PR |
| Testing | None | Integration OR unit testing | Integration AND unit testing | Fuzz testing; matrix; downstream |
| Code Review | None | Pair programming OR code review | Static analysis | Security scanning; dependency scanning; architecture compliance |
| Deliver | Using a checklist | 1-step to production | Manual release strategies, e.g. canary or blue/green; feature toggles | Automatic rollout strategies based on business metrics, e.g. using Spinnaker |
| Deploy | Heavyweight change control | Lightweight change control | Business driven | |
| Run | | Snowflake | Phoenix | Run offline |
| Monitor | Twitter | System metrics | App metrics | Business metrics |
| Docs | | Getting Started / README | Architecture Decision Records | Playbook; cultural manifesto |
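
The Source Code / Version Control row lists a versioning strategy such as semver. As a minimal sketch of what such a strategy automates, the Python function below applies the core Semantic Versioning bump rules to a plain MAJOR.MINOR.PATCH string. The `bump` name and the "major"/"minor"/"patch" change labels are illustrative assumptions, and pre-release and build metadata are deliberately left out.

```python
import re

# Core SemVer MAJOR.MINOR.PATCH: non-negative integers without leading zeros.
# Pre-release and build metadata (e.g. "1.2.3-rc.1+build.5") are not handled.
CORE_VERSION = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor' or 'patch' change."""
    match = CORE_VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if change == "major":  # incompatible API change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if change == "minor":  # backwards-compatible feature: reset PATCH
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # backwards-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


if __name__ == "__main__":
    print(bump("1.4.2", "minor"))  # 1.5.0
    print(bump("1.4.2", "major"))  # 2.0.0
```

In a pipeline the change type could come from commit messages or pull request labels; how it is derived is outside this sketch.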

Culture and Leadership

Leadership

Team

Failure and Safety

DevOps has its roots in the Lean and Agile movements, where the concept of failure takes on a new meaning:

Rather than viewing failure as the enemy, the agile view is that failure is a vital and necessary part of learning and experimentation - if you are not failing, then you are not trying hard enough.

Embracing failure means accepting that all systems will eventually fail, and building systems and processes that are more resilient to change and uncertainty - focusing more on mean time to recovery (MTTR) than on mean time between failures (MTBF).
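
To make the MTTR/MTBF distinction concrete, here is a small Python sketch over a hypothetical 30-day incident log. It assumes MTTR is the average outage duration and MTBF is total uptime divided by the number of failures, and approximates availability as MTBF / (MTBF + MTTR); the incident data and function names are made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (failure detected, service recovered).
INCIDENTS = [
    (datetime(2024, 1, 3, 10, 0), datetime(2024, 1, 3, 10, 30)),   # 30 min
    (datetime(2024, 1, 10, 22, 0), datetime(2024, 1, 10, 23, 0)),  # 60 min
    (datetime(2024, 1, 20, 8, 0), datetime(2024, 1, 20, 8, 15)),   # 15 min
]
WINDOW = timedelta(days=30)  # observation period containing all incidents


def total_downtime(incidents):
    """Sum of all outage durations."""
    return sum((end - start for start, end in incidents), timedelta())


def mttr(incidents):
    """Mean time to recovery: average duration of an outage."""
    return total_downtime(incidents) / len(incidents)


def mtbf(incidents, window):
    """Mean time between failures: total uptime divided by the number of failures."""
    return (window - total_downtime(incidents)) / len(incidents)


def availability(incidents, window):
    """Steady-state availability, MTBF / (MTBF + MTTR)."""
    up, down = mtbf(incidents, window), mttr(incidents)
    return up / (up + down)


if __name__ == "__main__":
    print(mttr(INCIDENTS))                           # 0:35:00
    print(mtbf(INCIDENTS, WINDOW))                   # 9 days, 23:25:00
    print(f"{availability(INCIDENTS, WINDOW):.4f}")  # 0.9976
```

With this availability formula, halving MTTR raises availability by exactly as much as doubling MTBF, which is one way to see why investing in recovery pays off.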

How people react (and, more importantly, how they respond to other people) during and after failure is critical. Failure is a learning opportunity.

The Modern Agile framework includes this concept of failure in 3 of its 4 guiding principles:
- Experiment & Learn Rapidly
- Make Safety a Prerequisite
- Make People Awesome

CI/CD

Build & SCM

Test

Test Automation

Inverted Test Pyramid

Continuous Integration (CI)

CI is not the same as build and test automation. Both are core components of CI, but on their own they are not CI.

CI is about integrating and testing the different parts of a system together to confirm their compatibility as early as possible (ideally daily).
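
One way to make "as early as possible (ideally daily)" measurable is to flag work that has lived outside the mainline for too long. The Python sketch below assumes a 24-hour limit and a hypothetical `branches` mapping from branch name to the time of its oldest unmerged commit; collecting that data (for example from `git log main..<branch>`) is left out.

```python
from datetime import datetime, timedelta

# Assumption: "ideally daily" is read as a 24-hour limit on how long work may
# live outside the mainline before it is integrated and tested with everything else.
MAX_UNINTEGRATED = timedelta(hours=24)


def stale_branches(branches: dict, now: datetime) -> list:
    """Names of branches holding work that has gone unintegrated for too long.

    `branches` maps a branch name to the commit time of its oldest commit not yet
    merged into the mainline, or None if the branch is fully integrated.
    """
    return sorted(
        name
        for name, oldest_unmerged in branches.items()
        if oldest_unmerged is not None and now - oldest_unmerged > MAX_UNINTEGRATED
    )


if __name__ == "__main__":
    now = datetime(2024, 5, 2, 12, 0)
    branches = {  # hypothetical data
        "feature/login": datetime(2024, 5, 2, 9, 0),    # 3 hours old
        "feature/search": datetime(2024, 4, 29, 9, 0),  # 3 days old
        "fix/typo": None,                               # already merged
    }
    print(stale_branches(branches, now))  # ['feature/search']
```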

Continuous Delivery (CD)

CD is not about continuously deploying to production; rather, it is about being in a state that allows deployment into production at any time. This is possible because the software is always delivered in a stable, tested state.
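
A minimal way to model this "deployable at any time" state is to treat a commit as releasable only when every pipeline stage has passed for it. In the Python sketch below, the stage names, the `PipelineRun` type and the commit ids are hypothetical; in these terms, CD means keeping the newest commit on the mainline releasable rather than having to fall back to an older one.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stages a commit must pass before it counts as "delivered".
REQUIRED_STAGES = ("build", "unit-tests", "integration-tests", "acceptance-tests")


@dataclass
class PipelineRun:
    commit: str
    results: dict[str, bool]  # stage name -> passed?


def is_releasable(run: PipelineRun) -> bool:
    """Releasable only if every required stage ran and passed."""
    return all(run.results.get(stage, False) for stage in REQUIRED_STAGES)


def latest_releasable(history: list[PipelineRun]) -> str | None:
    """Newest releasable commit in `history` (ordered oldest to newest), or None."""
    for run in reversed(history):
        if is_releasable(run):
            return run.commit
    return None


if __name__ == "__main__":
    passed = {stage: True for stage in REQUIRED_STAGES}
    history = [
        PipelineRun("a1b2c3", passed),
        PipelineRun("d4e5f6", {**passed, "integration-tests": False}),
    ]
    print(is_releasable(history[-1]))  # False: the mainline head is not deliverable
    print(latest_releasable(history))  # a1b2c3
```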

Run & Operate

Immutable Infrastructure

Recovery vs Repair

stateless > stateful (recovery > replication > clustering / repair)
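
The preference for recovery over repair can be sketched as a reconcile loop for stateless instances: failed or out-of-date instances are discarded and replaced from an immutable image instead of being fixed in place. The Python below only simulates a fleet; `launch`, `reconcile` and the image tag are hypothetical, and the approach relies on the instances holding no state (stateful components would need one of the lower-ranked options, such as replication).

```python
from __future__ import annotations

import itertools

_next_id = itertools.count(1)  # simulated instance-id generator


def launch(image: str) -> dict:
    """Simulate starting a fresh, stateless instance from an immutable image."""
    return {"id": f"i-{next(_next_id)}", "image": image, "healthy": True}


def reconcile(fleet: list[dict], image: str, desired: int) -> list[dict]:
    """Recover by replacement rather than repair.

    Unhealthy instances, and instances built from any other image, are simply
    dropped; fresh instances are launched until the desired count is reached.
    Nothing is ever patched in place.
    """
    keep = [inst for inst in fleet if inst["healthy"] and inst["image"] == image]
    keep = keep[:desired]  # scale down if there are too many
    while len(keep) < desired:
        keep.append(launch(image))
    return keep


if __name__ == "__main__":
    fleet = [launch("app:1.0") for _ in range(3)]  # i-1, i-2, i-3
    fleet[1]["healthy"] = False                     # i-2 fails
    fleet = reconcile(fleet, "app:1.0", desired=3)
    print([inst["id"] for inst in fleet])           # ['i-1', 'i-3', 'i-4']
```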

Getting Started

Make the right thing the easy thing

“If something is hard – do it more often and you will get better!” – Mary Poppendieck

Find and eliminate snowflakes

Ticket-driven request queues are snowflake makers, since each request tends to be fulfilled by a one-off manual change. Immutable infrastructure can help eliminate snowflakes.
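
Snowflakes can be found by comparing what is actually running against the configuration every server is supposed to be built from. This Python sketch assumes a hypothetical `BASELINE` settings dictionary and a per-server inventory gathered by some other means; it reports every server that has drifted.

```python
# Hypothetical baseline: the settings every server image is built from.
BASELINE = {"nginx": "1.24.0", "openssl": "3.0.13", "timezone": "UTC"}


def drift(actual: dict, baseline: dict = BASELINE) -> dict:
    """Return {setting: (expected, actual)} for each setting that differs.

    A missing setting shows up as None on the side where it is absent.
    """
    return {
        key: (baseline.get(key), actual.get(key))
        for key in sorted(baseline.keys() | actual.keys())
        if baseline.get(key) != actual.get(key)
    }


def find_snowflakes(servers: dict) -> list:
    """Names of servers whose live configuration has drifted from the baseline."""
    return sorted(name for name, config in servers.items() if drift(config))


if __name__ == "__main__":
    servers = {  # hypothetical inventory gathered from each host
        "web-1": dict(BASELINE),
        "web-2": {**BASELINE, "openssl": "3.0.2"},  # patched by hand
        "web-3": {**BASELINE, "debug": "on"},       # one-off tweak from a ticket
    }
    print(find_snowflakes(servers))  # ['web-2', 'web-3']
    print(drift(servers["web-3"]))   # {'debug': (None, 'on')}
```

With immutable infrastructure, the fix for a reported snowflake is to replace the server from the baseline image rather than edit it by hand to match.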

Find and eliminate information silos

Use documentation as code and design-driven development. Git is the ideal place to store documentation, as it facilitates collaboration and ensures the environment and documents are always in sync.

Maturity

Every team's and project's ladder should be unique - the levels you target will reflect the architectural decisions and trade-offs being made.

Mature ladders will have most, if not all, capabilities at Level 2 and a few at Level 3, but never everything at Level 3. If everything is at Level 3, you are not stretching yourself; reconsider what L3 means for you. For example, if you are already conducting code reviews, then maybe stretch to having the reviewers automatically selected from git history or an OWNERS file. There is always something more you can do.
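
Automatic reviewer selection from an OWNERS file might look like the Python sketch below. The file format here is a simplified, hypothetical one (a path prefix followed by owner names, with "/" for the whole repository) rather than the syntax of any particular tool; each changed file is assigned the owners of its most specific matching prefix, and the author is excluded. Selection from git history is not covered.

```python
from __future__ import annotations

# Hypothetical, simplified OWNERS format (not the exact syntax of any real tool):
#   <path prefix> <owner> [<owner> ...]   with "/" meaning the whole repository
OWNERS = """
/          alice
src/api    bob carol
docs       dave        # documentation owners
"""


def parse_owners(text: str) -> dict[str, list[str]]:
    """Map each path prefix (without surrounding slashes) to its owners."""
    rules = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        prefix, *owners = line.split()
        rules[prefix.strip("/")] = owners  # "/" becomes "" (the repository root)
    return rules


def owners_for(path: str, rules: dict[str, list[str]]) -> list[str]:
    """Owners of the most specific (longest) prefix that contains `path`."""
    matches = [
        prefix for prefix in rules
        if prefix == "" or path == prefix or path.startswith(prefix + "/")
    ]
    return rules[max(matches, key=len)] if matches else []


def select_reviewers(changed: list[str], rules: dict[str, list[str]], author: str) -> list[str]:
    """Union of the owners of every changed file, minus the change's author."""
    reviewers = set()
    for path in changed:
        reviewers.update(owners_for(path, rules))
    reviewers.discard(author)
    return sorted(reviewers)


if __name__ == "__main__":
    rules = parse_owners(OWNERS)
    changed = ["src/api/users.py", "docs/index.md", "README.md"]
    print(select_reviewers(changed, rules, author="bob"))  # ['alice', 'carol', 'dave']
```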

One or two L1 capabilities may also be OK on a mature ladder. For example, Open Source projects rarely have planning above issue tracking, and there is nothing inherently wrong with that. Likewise, your deployment target may be very static, in which case an L1 CI system is sufficient.

More Reading