As you may have already read, at our last Monthly Gathering we spent some time discussing the state of DevOps at Whitesmith.
We already follow many of its practices, but we feel we can apply them more consistently across all our projects and use them to find new ways to improve.
To guide teams through retrospectives and decisions on the road to improvement, we tried to clarify what our definition of awesome for DevOps would be. This naturally carries some bias toward the things we feel we most need to improve, but we thought it might be useful to someone else out there, so here it goes.
Our definition of awesome for DevOps
Team members are autonomous and productive
Anyone can start coding on a new project in 1h.
Any team member can easily deploy, roll back and access logs for any of their project's components. Processes and automation exist to mitigate the risk of compromising the state of a system when a rollback is necessary.
The team uses its ability to deploy and roll back diligently and wisely: deploys carry risk, but delaying action frequently increases that risk.
Automation is used heavily and smartly
No manual operations on servers. We use scripting and automation for all tasks, ensuring that servers are stable and can be easily and consistently re-provisioned or re-configured, which also reduces the bus-factor risk on the team.
All projects have sensible test coverage and use Continuous Integration (CI) that runs tests after a new branch is pushed to GitHub. Sensible means the test suite covers all the core functionality, is easy to maintain and runs fast (< 10min). Pull requests with broken builds are never merged until they are back to green.
Informed visibility of the process
We track indicators of speed and defects (bugs) of the process automatically:
- User story cycle time, or the time it takes for a feature to go from idea to live;
- MTTR (Mean Time To Resolution) of defects, or the time we take between knowing about a bug and fixing it;
- Number of deploys over time;
- Number of bugs over time.
These indicators, tracked by project and company-wide, are used to have better-informed discussions and make better-informed decisions during retrospectives. They help us understand and find the bottlenecks that matter and confirm that any changes result in a better overall process.
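As a rough illustration of how the indicators listed above could be computed, here is a minimal Python sketch. It assumes we can export, for each user story and each bug, a pair of timestamps (start and end), plus a list of deploy timestamps. The function names and the sample data are hypothetical and only meant to show the arithmetic, not any specific tool we use.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(intervals):
    """Average length of (start, end) datetime pairs, or None if there are none."""
    durations = [(end - start).total_seconds() for start, end in intervals]
    if not durations:
        return None
    return timedelta(seconds=mean(durations))


def count_per_week(timestamps):
    """Count events per ISO (year, week) so trends over time become visible."""
    counts = {}
    for ts in timestamps:
        iso = ts.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical sample data: (idea created, feature live) per user story.
stories = [
    (datetime(2024, 1, 1), datetime(2024, 1, 5)),
    (datetime(2024, 1, 2), datetime(2024, 1, 10)),
]
# Hypothetical sample data: (bug reported, bug fixed) per defect.
bugs = [
    (datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 15)),
    (datetime(2024, 1, 8, 9), datetime(2024, 1, 8, 11)),
]
# Hypothetical deploy timestamps.
deploys = [datetime(2024, 1, 2), datetime(2024, 1, 4), datetime(2024, 1, 9)]

cycle_time = mean_duration(stories)          # user story cycle time
mttr = mean_duration(bugs)                   # mean time to resolution
deploys_per_week = count_per_week(deploys)   # number of deploys over time

print("Cycle time:", cycle_time)
print("MTTR:", mttr)
print("Deploys per week:", deploys_per_week)
```

The same `count_per_week` helper could be fed bug report timestamps to get the number of bugs over time. Running it per project and again on the combined data would give both the project and the company-wide views.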
Freedom and responsibility when picking tools and technologies
We favor cross-pollination over executive prescription as the way to define company-standard tools.
Each team has a high degree of freedom in choosing its tools, as long as the entire team is comfortable using them.
This means, though, that when picking tools we should optimize for team UX over cool factor. It also means teams are free to pick and change tools whenever there is a consensus that X is better than Y and that the change will have a net positive effect on the overall flow.
Sensible deployment and infrastructure decisions
Applications and services that are part of our consulting work, or that are critical to company or product operations, run on standard, battle-tested solutions that the majority of the team can operate easily.
All Investment Time and hackathon projects are free to deploy however they feel best, as long as the project team is empowered by the solution. This is used as an opportunity to explore new solutions and share knowledge of the most promising ones internally.
Retrospectives and decisions with the whole in mind
All teams run retrospectives at least monthly and make an effort to visualize the pipeline and process from an idea to having it running in production, informed by the metrics we collect and our past experience. From there we focus on finding:
- The steps which impacted the overall process speed (bottlenecks), keeping in mind that improving any part that isn't the bottleneck will not improve our speed;
- The steps and actions which allowed bugs to be introduced into the system, weighing their impact and the cost of fixing them;
- The steps, processes or practices in place which limited team autonomy, since autonomy has a direct effect on overall process speed.
Each of those problems is then discussed until a solution is found that addresses it without compromising the other goals. For example, if we have a way to make the process faster, we need to be confident that it won’t lead to more bugs or reduce the team's autonomy. Otherwise, it would simply lead to an increase in unplanned work, which in practice will delay the project in the long run.
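The point about bottlenecks in the list above can be made concrete with a deliberately simplified model. The sketch below assumes a strictly sequential pipeline where each stage can handle a fixed number of work items per week. The stage names and numbers are invented for illustration, and real pipelines are messier than this.

```python
def throughput(stage_capacity):
    """Items per week the whole pipeline can deliver: capped by its slowest stage."""
    return min(stage_capacity.values())


def bottleneck(stage_capacity):
    """Name of the stage with the lowest capacity."""
    return min(stage_capacity, key=stage_capacity.get)


# Hypothetical capacities, in work items per week.
stages = {"design": 10, "development": 6, "review": 3, "deploy": 8}

# Doubling development capacity does not help, because review is the bottleneck.
faster_development = {**stages, "development": 12}

# Improving review raises throughput until the next-slowest stage takes over.
faster_review = {**stages, "review": 5}

print(bottleneck(stages), throughput(stages))
print(throughput(faster_development))
print(throughput(faster_review))
```

In this toy model, only the change to the review stage moves the overall number, which is why we look for the bottleneck first before deciding where to invest.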
How can we go from the current state towards that definition of awesome?
Obviously, checking off all the points is hard. It's a long list with some idealistic points, and we should always be sensitive to the specific needs and compromises of each project. At the same time, in the areas where we clearly agree we need to improve, we need to ensure we improve consistently and avoid the self-defeating effort of changing everything at once.
The road, then, is to make a regular effort on each project to find the one or two points above that will bring the greatest return on investment.
To achieve this, we are now going through a DevOps-focused retrospective with all teams to find those points. Naturally, we expect this DevOps perspective to be continuously present in all team retrospectives.
To support this effort and coach the teams as they feel the need to learn new processes or technologies, we formed an informal group of volunteers who are available to help. To prevent dependency on them, however, which would be quite ironic when applying DevOps, the volunteers should never touch the code of the team they are helping, but instead guide and teach the team so that it is empowered across its full project scope.
Are there any key factors you see missing from our definition of awesome, or perhaps tactics and practices that were really useful to you in this process? Let us know what you think!