Andy Balaam – February 2, 2021
by Tomasz Ptak
Last week I had a socially-locked-down celebration in my home office: our team’s Continuous Integration infrastructure has become a bottleneck. Read on to see why I consider this a success.
A long, long time ago in an OpenMarket far, far away
What would be your worst nightmare when it comes to building projects? Can you think of just one, or would twenty not be enough? A few years ago I joined a young team of talented engineers who were maintaining much of the company’s heritage and building the foundations for its future. What could possibly go wrong?
One of my key strategies for developing software is not to trust my own work. Thankfully, the engineering world has spoiled us with tools and wisdom on how to verify that outcomes match intentions: test-driven development, testing frameworks, CI/CD pipelines, automation, static code analysis, vulnerability scanners, practices, conferences, ideas, standards, you name it. We just needed to make sure they were put to use, and given the state of some of the many projects we owned, we needed to do that rather quickly.
What really matters?
I like to know the direction in which my work is heading. The effect I need to achieve is always a moving target, but the value it needs to bring, if clearly communicated and thought through, will point me the right way. In practice it feels like using a compass in a maze: you need to be able to point at the exit, but heading straight for it will usually get you lost. The maze represents the projects you maintain, and the more technical debt you face, the harder it is to get through. Without technical debt, each project you own is a sprint lane.
That’s why, if I want to deliver value to the customer, I need to tackle technical debt.
With a clear direction, I can focus on where I’m heading. I can take the time to get the feedback I need to confirm what I got right and find what I got wrong. I can stop and think, look for better solutions, learn, improve, and propagate what I learn back to the other projects I am responsible for.
All of that works if the same rules apply at the organisational level: I am challenged to deliver value and empowered to understand it, buy into it and find my own ways to deliver it.
What gets in the way?
I like the way Dave Rupert frames technical debt in his blog post: technical debt is a lack of understanding. While a lot of debt comes from cutting corners to deliver something sooner, intending to get it right later and usually never getting back to it, that is only part of the story.
It’s not just corner cutting that leads to debt. If engineers lack the resources to maintain projects that are no longer actively developed, improvements don’t get propagated across all projects; this leads to fragmentation, and fragmentation leads to loss of understanding. If we keep trying the latest hot technologies that are so much fun to write, and each one ends up in a single application among many, this also leads to loss of understanding. If we don’t set team or organisational standards for how we work, this also contributes to the problem. And if we don’t treat our development tools and staging environments (be it cloud, Docker-based tests or twin servers in your data centre) as the most important project we deliver, and leave them untouched because they kind of work, they will add to the debt and we will start tripping over our own feet.
As a team, we suffer from the tooling fragmentation that our heritage has brought. To make things worse, many of the attempts to escape it have only added to it.
To give you an idea: we were building with four versions of Java, two of Maven, a few of Gradle, plus Buck and a few shell scripts, and deploying with five different solutions, three of which had been built in-house in attempts to unify everything.
Our first step towards fixing this was to identify the outliers and make them similar to the majority of projects. This did not solve all our problems, but it helped us take step two: declaring the direction.
We met, wrote down the tools and solutions we use for the various aspects of our work, and settled on one for each. If we find an area we missed, we add it. If we want to make a change, we have to follow the “one in – one out” strategy, but for now we strongly prefer the “one out – another one out” approach.
Having a list is not enough, though (we have plenty of wiki pages to prove it), so we went a step further and introduced gamification to our projects.
We maintain the state of our project ownership as code. We list each project, the team that owns it, where its code is located and which commands build and release it. Then we define a set of checks to run against each project: does it have a README? Is it built with Maven or Gradle? And so on. If a project fails a criterion, it loses points. Finally, a scoreboard shows the best projects and the worst offenders, and we look at it every Friday before the standup. Friday is our fix-it day. If we change our choices, we change our checks and re-evaluate the projects.
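The post doesn’t show its ownership-as-code setup, so here is a minimal Python sketch of the idea under loudly stated assumptions: the `Project` fields, the two checks, the 10-point penalty and the 100-point maximum are all hypothetical, and the Gradle check assumes Gradle is the agreed build tool. Each check is a plain function, the score is the maximum minus a penalty per failed check, and the scoreboard sorts projects from best to worst.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


# Hypothetical record of one project's ownership and build metadata.
@dataclass
class Project:
    name: str
    owner: str        # team that owns the project
    repo: str         # where the code is located
    build_cmd: str    # command that builds it
    release_cmd: str  # command that releases it
    files: set[str] = field(default_factory=set)  # top-level files in the repo


# A check returns True when the project satisfies the criterion.
Check = Callable[[Project], bool]


def has_readme(p: Project) -> bool:
    # Any top-level file whose name starts with "readme" counts.
    return any(f.lower().startswith("readme") for f in p.files)


def uses_gradle(p: Project) -> bool:
    # Assumption: Gradle is the agreed build tool; change this when the choice changes.
    parts = p.build_cmd.split()
    return bool(parts) and parts[0] in ("gradle", "./gradlew")


# Hypothetical check list and scoring rules; edit these to "change our checks".
CHECKS: dict[str, Check] = {
    "has README": has_readme,
    "builds with Gradle": uses_gradle,
}
PENALTY = 10
MAX_SCORE = 100


def score(p: Project) -> tuple[int, list[str]]:
    """Return the project's points and the names of the checks it failed."""
    failed = [name for name, check in CHECKS.items() if not check(p)]
    return MAX_SCORE - PENALTY * len(failed), failed


def scoreboard(projects: list[Project]) -> list[tuple[str, int, list[str]]]:
    """Rows of (name, points, failed checks), best projects first."""
    rows = [(p.name, *score(p)) for p in projects]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    projects = [
        Project("billing", "team-a", "git@example.com:billing.git",
                "./gradlew build", "./gradlew release", {"README.md", "build.gradle"}),
        Project("legacy-sms", "team-a", "git@example.com:legacy-sms.git",
                "mvn package", "./release.sh", {"pom.xml"}),
    ]
    for name, points, failed in scoreboard(projects):
        print(f"{name}: {points} {failed}")
```

Keeping the checks in a dictionary means re-evaluating every project after a change of direction is just a matter of editing `CHECKS` and rerunning the scoreboard.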
The journey so far
When I joined the company, my team used a very old “communal” build server for a few projects and had also set up its own Jenkins. That Jenkins built only some of the projects, and only when it worked. Since then, we have unblocked it, added new agents and brought all our projects onto it. We removed outlying build solutions; added static code analysis, vulnerability scanners, and Docker-based system and integration tests; and parallelized some projects’ tests to speed up feedback. We simplified the structure of the most actively developed projects: a main branch with no develop branch, Gradle to build them, and tag-based versioning. We decommissioned a few services and merged others to reduce the release burden. Finally, we introduced Docker-based Jenkins agents so that builds no longer rely on the state of the Jenkins machines.
This has let us speed up our delivery cycle and given us a better starting point for migrating our services to the cloud, while we keep delivering value to our customers with enough confidence that we’re not breaking anything else.
It looks like we’ve now hit the limit of how many agents we can provision. What enabled us to deliver at scale and with confidence is now getting in our way. It seems we’re starting a new cycle of improvement.
It’s easy to fall into the trap of chronic not-yet-done-ness when you invest time, effort and emotion in making your own life and everybody else’s easier. That feeling is technically accurate, since the work is never finished, but it is important to understand that improving ways of working is like breathing: we need the air to be of good quality, the lungs to be healthy and free of faults, and we need to keep pumping.
With support from the organisation, enough self-care within the team and the resources to do the work, we can keep going while focusing on delivering what matters to our customers: quality, empathetic interactions.