4 Key Metrics For Continuous Delivery

2022-09-01 13:29:10
ZenTao ALM

"If you can't measure it, you won't be able to improve it." -- Peter Drucker

The above quote from Peter Drucker resonates intuitively and shows why metrics matter in every field. Without data to quantify where we are and where we are aiming, how can we know whether we are moving toward the right goals? This post discusses the importance of metrics in the continuous delivery process.

I. How can metrics play a role in the continuous delivery process?

What tells you that you have reached a good state of continuous delivery? To answer this question, the GoCD team interviewed experts, programmers, operations staff, and partners at various points in the DevOps chain. They also interviewed business stakeholders, because successful continuous delivery lets technology serve the business better and drive business benefits, so that the underlying work of implementing continuous integration is recognized as valuable.


Through a comprehensive analysis, we have identified four valuable metrics:

  • Number of packages that can be released
  • Cycle time
  • Average time between failures
  • Average recovery time from failures

II. How many distributable packages do you have?

To achieve successful continuous delivery, you must commit code consistently, especially to master. If you commit only to individual branches, those commits add no value to the code that will be released to production.


Maintaining a high frequency of releasable packages depends on the team having completed trustworthy testing of each package. A common anti-pattern is test and verification activities that last for hours or even a full day. In many cases these tests are not trustworthy, which means that when they finish, you may be no more confident in the quality of the delivery than you were when they started. This is costly because it makes everyone cautious about releases, and deploying software starts to feel like a game of Russian roulette.
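As a rough illustration of how this metric could be counted, the sketch below treats a build as releasable when it comes from the release branch and its tests passed. The `Build` record, its fields, and the `count_releasable` helper are hypothetical names invented for this example; they are not part of GoCD or any other tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """Hypothetical build record; a real CI server exposes richer data."""
    branch: str
    tests_passed: bool


def count_releasable(builds, release_branch="master"):
    """Count builds from the release branch whose tests all passed."""
    return sum(
        1 for b in builds
        if b.branch == release_branch and b.tests_passed
    )


builds = [
    Build("master", True),
    Build("master", False),      # failed tests: not releasable
    Build("feature/x", True),    # not on the release branch
]
print(count_releasable(builds))  # 1
```

Tracking this count per day or per week gives a simple trend line for the metric.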


This metric emphasizes the importance of collaboration between the product and R&D engineering teams. Cross-functional teams must be able to build a roadmap that breaks user stories down far enough for the team to release them incrementally, with each release delivering value to users. If the product team doesn't work in this collaborative way, the R&D team is left to develop and deliver large features all at once, perhaps not realizing until the last minute that a large feature brings no value.


As a complement to rigorous roadmap planning, R&D teams should take advantage of techniques such as feature toggles, which can be configured to turn features on or off (controlling whether they are shown), so that features can be released to the production environment while customers remain unaware of them (in scenarios such as blue-green releases).
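A minimal sketch of a feature toggle follows, assuming an in-memory store. The `FeatureToggles` class, the `new_checkout` toggle name, and `render_checkout` are hypothetical and exist only for illustration; a real system would typically load toggle state from configuration or a toggle service.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FeatureToggles:
    """In-memory toggle store (an assumption made for this sketch)."""
    enabled: Dict[str, bool] = field(default_factory=dict)

    def is_on(self, name: str) -> bool:
        # Unknown toggles default to off, so unfinished code stays hidden.
        return self.enabled.get(name, False)


def render_checkout(toggles: FeatureToggles) -> str:
    # The new flow is deployed to production but only shown when toggled on.
    if toggles.is_on("new_checkout"):
        return "new checkout"
    return "classic checkout"


toggles = FeatureToggles({"new_checkout": False})
print(render_checkout(toggles))          # classic checkout
toggles.enabled["new_checkout"] = True   # flip the toggle, no redeploy
print(render_checkout(toggles))          # new checkout
```

Defaulting unknown toggles to off is a deliberate choice here: a missing configuration entry then hides a feature rather than exposing it.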

III. How long is your cycle time?

One of the most common pain points we hear from developers is long cycle time: the span from committing code through testing, verification, and deployment, which is a long and anxious wait for developers. XKCD's cartoon about fencing while waiting for code to compile is a gentle and realistic reflection of this process.

Not only does this waiting time slow the delivery of value to your customers, it can also cause work to lose focus. Improving your team's cycle time relies on efficient testing and closing the feedback loop quickly.
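To make the metric concrete, here is a small sketch that computes cycle time per change as the interval from commit to deployment, then summarizes it. The `cycle_times` helper and the sample timestamps are made up for illustration; in practice the timestamps would come from your version control and deployment records.

```python
from datetime import datetime, timedelta
from statistics import median


def cycle_times(changes):
    """changes: iterable of (committed_at, deployed_at) datetime pairs."""
    return [deployed - committed for committed, deployed in changes]


# Illustrative data only.
changes = [
    (datetime(2022, 9, 1, 9, 0), datetime(2022, 9, 1, 10, 30)),
    (datetime(2022, 9, 1, 11, 0), datetime(2022, 9, 1, 11, 45)),
    (datetime(2022, 9, 2, 14, 0), datetime(2022, 9, 2, 17, 0)),
]

durations = cycle_times(changes)
average = sum(durations, timedelta()) / len(durations)
print(median(durations))  # 1:30:00
print(average)            # 1:45:00
```

The median is often the more useful summary, because one unusually slow change can distort the average.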


Here are some practices to help you shorten the cycle time:

  1. Run unit tests (UT) early, before the pipeline stages and complex automated tests that take a long time to run. This gives you basic feedback early and saves time.
  2. Pass dependencies between pipeline stages to avoid unnecessary duplicate builds.
  3. Run the build in parallel as much as possible, which also saves time.
  4. Finally, make sure you have enough build resources, so that whatever build you need to run, there are plenty of agents to run the various tasks.
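The first and third practices above can be sketched together: run the fast unit tests first, and fan out the slow tasks in parallel only if they pass. The `run_pipeline` function and the task names are hypothetical, and the thread pool stands in for build agents; a real pipeline would express this in the CI server's own configuration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(unit_tests, slow_tasks, max_workers=4):
    """Run fast unit tests first, then slow tasks in parallel.

    unit_tests: callable returning True on success.
    slow_tasks: dict mapping a task name to a callable returning a bool.
    """
    if not unit_tests():
        # Fail fast: skip the expensive stages entirely.
        return {"unit_tests": False}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(task) for name, task in slow_tasks.items()}
        results = {name: future.result() for name, future in futures.items()}
    results["unit_tests"] = True
    return results


results = run_pipeline(
    unit_tests=lambda: True,
    slow_tasks={"integration": lambda: True, "ui": lambda: True},
)
print(results["unit_tests"], results["integration"], results["ui"])
```

Here `max_workers` plays the role of the available agents from the fourth practice: with too few, the parallel tasks queue up and the cycle time grows again.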

IV. How long are your average time between failures and your recovery time?

The average time between failures and the average recovery time are often linked, and the balance between the two is very important. Tracking the average time between failures pushes the team to keep releases as stable as possible and to avoid failures. However, focusing solely on the average time between failures and on reducing failures can make teams overly cautious and reluctant to release new versions. The core of software development is continuously delivering new value to users and meeting their needs. Therefore, the average time to recover from failure is a key checkpoint, a metric that demonstrates the team's ability to correct errors.
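Both metrics can be derived from a simple incident log, as in the sketch below. It assumes each incident is a (failed_at, recovered_at) pair and measures time between failures as the uptime from one recovery to the next failure; other definitions exist, and the function names and sample data are invented for this example.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average downtime per incident; incidents are (failed_at, recovered_at)."""
    downtimes = [recovered - failed for failed, recovered in incidents]
    return sum(downtimes, timedelta()) / len(downtimes)


def mean_time_between_failures(incidents):
    """Average uptime from one recovery to the next failure.

    Needs at least two incidents, since one incident has no gap to measure.
    """
    ordered = sorted(incidents)
    uptimes = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return sum(uptimes, timedelta()) / len(uptimes)


# Illustrative data only.
incidents = [
    (datetime(2022, 9, 1, 2, 0), datetime(2022, 9, 1, 2, 30)),
    (datetime(2022, 9, 3, 2, 30), datetime(2022, 9, 3, 3, 30)),
    (datetime(2022, 9, 6, 3, 30), datetime(2022, 9, 6, 4, 0)),
]
print(mean_time_to_recovery(incidents))       # 0:40:00
print(mean_time_between_failures(incidents))  # 2 days, 12:00:00
```

Reporting the two numbers side by side makes the balance visible: a long time between failures paired with a long recovery time suggests a team that releases rarely and recovers slowly.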


Achieving the desired average time between failures relies on closing the loop with earlier feedback and on meticulous verification in a test environment whose data and configuration are identical to production. A strong local build capability is also a must. Failures are inevitable, so keeping recovery times as short as possible becomes especially important. When a pipeline or a release fails, how long do you need to roll back to a stable version? Continuous and stable monitoring of the production environment is an essential capability: teams should learn of failures from monitoring and alerts, not from customer complaints.


Failure recovery exercises such as rollback drills can reduce the average recovery time. An automated rollback mechanism restores service quickly and buys the team time to perform an in-depth analysis of the root cause. Locating problems quickly relies on an informative logging system: with valuable log information, developers can pinpoint a failure that occurred at 2 am.
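The idea of an automated rollback can be sketched as follows: deploy the new version, run a health check, and redeploy the last stable version if the check fails. The `deploy_with_rollback` function and its `deploy` and `is_healthy` callables are hypothetical placeholders for whatever your deployment tooling and monitoring provide.

```python
def deploy_with_rollback(new_version, stable_version, deploy, is_healthy):
    """Deploy new_version; roll back to stable_version if unhealthy.

    deploy: callable taking a version identifier (hypothetical hook).
    is_healthy: callable returning True when the service looks healthy.
    Returns the version that is live afterwards.
    """
    deploy(new_version)
    if is_healthy():
        return new_version
    # Automated rollback: restore service first, analyze the cause later.
    deploy(stable_version)
    return stable_version


deployed = []
live = deploy_with_rollback("v2", "v1", deployed.append, lambda: False)
print(live)      # v1
print(deployed)  # ['v2', 'v1']
```

Rehearsing this path regularly, as in the recovery exercises mentioned above, is what keeps it trustworthy when a real failure occurs.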


Conclusion

Back to Peter Drucker's famous quote. To improve something, we first need to find ways to measure it and visualize it. That is why creating data dashboards and visualizing metrics is valuable: it makes teams more accountable and connected. On the other hand, I don't want to suggest that metrics are a panacea. There are certainly meaningless metrics and vanity metrics out there. What you want, ultimately, is to motivate employees to focus on the tough problems and, by solving them, to add value to the team and the organization.

