The DevOps Ladder

The now (in)famous Joel Test asks 12 questions to determine the maturity of an organisation. Nearly two decades later, the development ecosystem has evolved and many practices and techniques have become commonplace.

The DevOps Ladder is an attempt to document these best practices in the form of an information radiator that provides a guided path for continuous improvement.

The vertical (Capability) ladder includes 10 steps that are required throughout a project's or product's lifecycle. The horizontal (Maturity) ladder has 4 levels. The goal of any project should be to implement all capabilities and then move them out of Level 1 ASAP.

Climbing Styles

Adding a capability or improving a maturity level is called climbing the ladder. There are many styles of climbing:

A combination of styles is common:

Anti Patterns

Outliers on the ladder highlight many DevOps and “agile” anti-patterns.

The Ladder

Capability	Underwater	Level 1	Level 2	Level 3
Leadership	Command and Control		Servant	Transformational
Teams	Dysfunctional	Functional	Cross Functional	Empowered
Safety	Physical	Job	Psychological
Failure	Feared	Embraced	Celebrated
Westrum	Pathological	Bureaucratic	Generative
Architecture	Ivory Tower	- Just Enough - Last Responsible Moment	Loosely Coupled
Source Code / Version Control	Shared Drive / Zip	- CVS - Artifact Repository	Versioning Strategy e.g. SemVer	Branching Strategies e.g. Git Flow OR Trunk Based
Build	Manual / IDE	Snowflake	Phoenix	Reproducible Builds
Continuous Integration	Long-lived branches	Trunk Based	Test Data Management
Test Automation	None	Nightly	Per Commit / PR	- Matrix - Ephemeral testing instance per PR
Testing	None	Integration OR Unit Testing	Integration AND Unit Testing	- Fuzz Testing - Matrix - Downstream
Code Review	None	Pair Programming OR Code Review	Static Analysis	- Security scanning - Dependency scanning - Architecture compliance
Deliver	Using a checklist	1-step to production	Manual release strategies e.g. canary, blue/green or feature toggles	Automatic rollout strategies based on business metrics e.g. using Spinnaker
Deploy	Heavyweight change control	Lightweight change control	Business driven
Run		Snowflake	Phoenix	Run offline
Monitor	Twitter	System Metrics	App Metrics	Business Metrics
Docs		Getting Started / README	Architecture Decision Records	- Playbook - Cultural Manifesto
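The feature toggles and canary releases in the Deliver row can be sketched as a percentage-based flag. This is a minimal illustration, not the API of any particular feature-flag product: the function names `bucket` and `is_enabled`, the `new-checkout` flag and the SHA-256 bucketing scheme are assumptions chosen for the example. Hashing the flag name together with the user ID keeps each user's experience stable as the percentage is raised.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Deterministically map a (flag, user) pair to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if the user falls inside the current rollout percentage.

    Because the bucket is stable, a user who sees the feature at 10% keeps
    seeing it when the rollout is raised to 50%.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return bucket(flag, user_id) < rollout_percent


if __name__ == "__main__":
    # Hypothetical canary: route roughly 10% of users to a new checkout flow.
    users = [f"user-{n}" for n in range(1000)]
    canary = [u for u in users if is_enabled("new-checkout", u, 10)]
    print(f"{len(canary)} of {len(users)} users get the new checkout")
```

Raising `rollout_percent` by hand corresponds roughly to a Level 2 manual release strategy; at Level 3 a pipeline would adjust it automatically based on business metrics.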

Culture and Leadership

Accountability and Responsibility

Failure and Safety

DevOps has its roots in the Lean and Agile movements, where the concept of failure takes on new meaning:

Rather than viewing failure as the enemy, the agile view is that failure is a vital and necessary part of learning and experimentation: if you are not failing, then you are not trying hard enough.

Embracing failure entails accepting that all systems will eventually fail, and building systems and processes that are more resilient to change and uncertainty. In practice this means focusing more on mean time to recover (MTTR) than on mean time between failures (MTBF).
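A small worked calculation shows why MTTR deserves the attention. It assumes the common steady-state approximation availability = MTBF / (MTBF + MTTR); the `Incident` type and the incident data are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    failed_at: float     # hours since the start of the observation window
    recovered_at: float  # hours since the start of the observation window


def mttr(incidents: List[Incident]) -> float:
    """Mean time to recover: the average length of an outage."""
    return sum(i.recovered_at - i.failed_at for i in incidents) / len(incidents)


def mtbf(incidents: List[Incident], window_hours: float) -> float:
    """Mean time between failures: total uptime divided by the number of failures."""
    downtime = sum(i.recovered_at - i.failed_at for i in incidents)
    return (window_hours - downtime) / len(incidents)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability approximation."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical 30-day window (720 hours) with three outages of 2h, 1h and 3h.
    log = [Incident(10, 12), Incident(100, 101), Incident(500, 503)]
    up, down = mtbf(log, 720), mttr(log)  # 238.0 hours, 2.0 hours
    print(f"baseline:    {availability(up, down):.5f}")
    print(f"double MTBF: {availability(2 * up, down):.5f}")
    print(f"halve MTTR:  {availability(up, down / 2):.5f}")
```

Under this approximation, halving MTTR gives exactly the same availability as doubling MTBF, since 2M / (2M + R) = M / (M + R/2), and recovery time is often the easier of the two to shorten.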

How people react during and after a failure (and, more importantly, how they respond to other people) is critical. Failure is an unplanned investment; the only thing you can control is the ROI.

The Modern Agile framework includes this concept of failure in 3 of its 4 guiding principles:
- Experiment and Learn Rapidly
- Make Safety a Prerequisite
- Make People Awesome

CI/CD

Continuous Integration (CI)

CI is not about build and test automation. Both are core components of CI, but they are not CI.

CI is about testing the different parts of a system together to confirm they are compatible as early as possible (ideally daily).

“Write tests. Not too many. Mostly integration.” – Guillermo Rauch
- The forgotten middle layer
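To illustrate the forgotten middle layer, here is a minimal integration test in pytest style: rather than mocking the database, it runs a service against a real in-memory SQLite database so the two components are checked together. `InventoryRepository`, `OrderService`, `make_system` and the schema are hypothetical.

```python
import sqlite3


class InventoryRepository:
    """Data-access component backed by SQLite."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)"
        )

    def set_qty(self, sku: str, qty: int) -> None:
        self.conn.execute("INSERT OR REPLACE INTO stock (sku, qty) VALUES (?, ?)", (sku, qty))

    def get_qty(self, sku: str) -> int:
        row = self.conn.execute("SELECT qty FROM stock WHERE sku = ?", (sku,)).fetchone()
        return row[0] if row else 0


class OrderService:
    """Business component that depends on the repository."""

    def __init__(self, repo: InventoryRepository) -> None:
        self.repo = repo

    def place_order(self, sku: str, qty: int) -> bool:
        available = self.repo.get_qty(sku)
        if qty > available:
            return False
        self.repo.set_qty(sku, available - qty)
        return True


def make_system():
    """Wire the real components together over a fresh in-memory database."""
    repo = InventoryRepository(sqlite3.connect(":memory:"))
    return repo, OrderService(repo)


def test_order_decrements_stock():
    repo, service = make_system()
    repo.set_qty("widget", 5)
    assert service.place_order("widget", 3)
    assert repo.get_qty("widget") == 2


def test_order_rejected_when_stock_insufficient():
    repo, service = make_system()
    repo.set_qty("widget", 1)
    assert not service.place_order("widget", 2)
    assert repo.get_qty("widget") == 1
```

Running `pytest` on this file picks up both `test_` functions. A pure unit test would typically replace `InventoryRepository` with a stub, which would not catch a broken SQL statement or a schema mismatch.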

Continuous Delivery (CD)

CD is not about continuously deploying to production; rather, it is about being in a state that allows deployment to production at any time. This is possible because the software is always delivered in a stable and tested state.

Run & Operate

Focus on rapid recoverability / deployment

When building and running systems, always focus first on the ability to recover, even at the cost of almost everything else.

Style	Database	Session Management	Deployment
Stateless	Event Sourcing	JWT	Immutable Infrastructure
Active / Active (Replicated State)	Data Guard / Streaming Replication	Database / InMemory session replication	GitOps
Active / Passive (Shared State)	Oracle RAC / SQL Server Clustering	Database / InMemory session clustering	Configuration Management
Active (Snowflakes)		Session Persistence	Manual
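The Stateless / Event Sourcing entry can be illustrated with a toy example: if current state is only ever derived by replaying an append-only event log, a replacement node can recover simply by replaying the log. The `Event`, `EventStore` and account-balance domain are assumptions for this sketch; a real event store would persist the log durably.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Event:
    account: str
    amount: int  # positive for a deposit, negative for a withdrawal


class EventStore:
    """Append-only event log (in memory here; durable storage in practice)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def replay(self) -> Iterator[Event]:
        return iter(list(self._events))


def rebuild_balances(events: Iterator[Event]) -> Dict[str, int]:
    """Derive current state purely by folding over the event history."""
    balances: Dict[str, int] = {}
    for event in events:
        balances[event.account] = balances.get(event.account, 0) + event.amount
    return balances


if __name__ == "__main__":
    store = EventStore()
    for e in (Event("alice", 100), Event("bob", 50), Event("alice", -30)):
        store.append(e)

    node_a = rebuild_balances(store.replay())
    # Node A is terminated; a fresh node B recovers by replaying the same log.
    node_b = rebuild_balances(store.replay())
    print(node_a == node_b, node_b)  # True {'alice': 70, 'bob': 50}
```

Because no node holds state that cannot be rebuilt, terminating and replacing a node becomes a routine operation rather than a recovery exercise.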

Failure design

Getting Started

Make the right thing the easy thing

“If something is hard – do it more often and you will get better!” – Mary Poppendieck

Make terminating and replacing nodes a common and painless experience
Training, copy and paste, policies and non-automated procedures are almost never easy

Find and eliminate snowflakes

Ticket driven request queues are snowflake makers
Immutable infrastructure can help in eliminating snowflakes
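One way to find snowflakes is to compare each node's actual configuration against the declared desired state and flag any drift. The sketch below assumes configuration can be collected as flat key/value pairs per node; the node names, keys, `drift` and `find_snowflakes` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Config = Dict[str, str]
Drift = Dict[str, Tuple[Optional[str], Optional[str]]]


def drift(desired: Config, actual: Config) -> Drift:
    """Return {key: (desired_value, actual_value)} for every key that differs.

    A key missing on one side shows up as None on that side.
    """
    return {
        key: (desired.get(key), actual.get(key))
        for key in sorted(desired.keys() | actual.keys())
        if desired.get(key) != actual.get(key)
    }


def find_snowflakes(desired: Config, fleet: Dict[str, Config]) -> Dict[str, Drift]:
    """Map each drifted node to its differences; conforming nodes are omitted."""
    report: Dict[str, Drift] = {}
    for node, actual in fleet.items():
        differences = drift(desired, actual)
        if differences:
            report[node] = differences
    return report


if __name__ == "__main__":
    desired = {"nginx": "1.24", "tls": "on"}
    fleet = {
        "web-1": {"nginx": "1.24", "tls": "on"},
        "web-2": {"nginx": "1.22", "tls": "on"},                   # hand-patched
        "web-3": {"nginx": "1.24", "tls": "on", "debug": "true"},  # ad-hoc change
    }
    for node, differences in find_snowflakes(desired, fleet).items():
        print(node, differences)
```

With immutable infrastructure, the remedy for a flagged node is to terminate it and replace it from the baseline image rather than patch it by hand, which would only create another snowflake.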

Find and eliminate information silos

Use documentation as code and design-driven development
Git is the ideal place to store documentation, as it facilitates collaboration and ensures the environment and documents are always in sync

Maturity

Every team's and project's ladder should be unique: the levels you target will reflect the architectural decisions and trade-offs being made.

Mature ladders will have most, if not all, capabilities at Level 2 and a few at Level 3, but never everything at Level 3. If everything is at Level 3, you are not stretching yourself; reconsider what L3 means for you. For example, if you are already conducting code reviews, then maybe stretch to having reviewers selected automatically from git history or an OWNERS file. There is always something more you can do.
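Automatic reviewer selection from an OWNERS file can be sketched as follows: for each changed file, walk up the directory tree to the nearest directory with listed owners. Real OWNERS and CODEOWNERS formats differ between tools, so here the owners are assumed to already be parsed into a dict keyed by directory, and `nearest_owners` and `select_reviewers` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Set

Owners = Dict[str, List[str]]  # directory ("." for repo root) -> owner handles


def nearest_owners(path: str, owners: Owners) -> List[str]:
    """Walk up from the file's directory to the closest directory with owners."""
    for directory in PurePosixPath(path).parents:
        if str(directory) in owners:
            return owners[str(directory)]
    return []


def select_reviewers(changed_files: Iterable[str], owners: Owners, author: str) -> Set[str]:
    """Union of the nearest owners of every changed file, excluding the author."""
    reviewers: Set[str] = set()
    for path in changed_files:
        reviewers.update(nearest_owners(path, owners))
    reviewers.discard(author)  # nobody should be asked to review their own change
    return reviewers


if __name__ == "__main__":
    owners = {".": ["lead"], "src/api": ["alice", "bob"], "docs": ["carol"]}
    changed = ["src/api/handlers.py", "docs/guide.md"]
    print(sorted(select_reviewers(changed, owners, author="alice")))  # ['bob', 'carol']
```

A git-history variant would rank recent committers to the changed paths instead of reading a static owners map; the static map is used here only because it keeps the sketch self-contained.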

One or two L1 capabilities may also be OK on a mature ladder. For example, Open Source projects rarely have planning beyond issue tracking, and there is nothing inherently wrong with that. Likewise, your deployment target may be very static, in which case an L1 CI system is sufficient.