We all know the scout rule: to gradually improve the source code by checking in a module cleaner than it was when you checked it out. But what happens when you apply similar tactics to reducing the bug count? During the summer, the Candy Crush team decided to try this out by setting a bug ceiling and succeeded in reducing their bug count to almost zero.
The Candy Crush team consists of about 60 people divided into four cross-functional sub-teams, 15 of whom work on mobile development. Per Malmén of King’s Tech Academy sat down with Candy Crush team members Robin Eklund (Senior Developer), Alexandre Genoud (Development Manager), and Johannes Christenson (Developer) to talk about how it started, what it is all about, what their experiences have been so far, and what advice they would give to other people considering trying this out.
Robin, Alexandre, and Johannes work with the mobile game that is run on an in-house engine written in C++. They develop on Windows, Mac OS, and Linux, and the game is published on all major platforms, such as iOS, Android, and Windows Phone. The same work has been done on the canvas game, with results similar to those mentioned in this article.
Per: How did it start?
Robin: We had a long list of 80 major bugs. We had tried various things to address the problem, but it always ended up in too much overhead.
Alexandre: What we used to do was to create a list with the bugs to fix. Then we would sit down for one or two days and fix them. It was very efficient and we felt awesome afterwards. But as time went by the bug count would go up again. So it was not very efficient in the long run.
Robin: Even creating the list was a big hassle. So we wanted to get to zero bugs and turned to our Agile coach, Mattias Karlsson, for some advice. Our first idea was to take one day each week and just fix bugs. The Agile coaches advised us against that, since it does not change the behaviour of handling bugs. It just adds a day for fixing things, and then the next day nobody cares again.
Mattias’ idea was to set a ceiling for the bug count and to aim at not going above this number. We first set the ceiling to the current bug count, around 80. If the bug count went over the ceiling, the number of bugs would be mentioned at the daily stand-up for all the teams. If people fixed more bugs and the bug count went below 80, we would set the bug ceiling to that lower number, like 78. Then we continued to do this, with a gradual decline through changing behaviour rather than addressing it all in one block. Today we are down to 7 major bugs and that actually exceeds our goals for Q3.
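The ceiling rule Robin describes can be sketched in a few lines of Python. This is only an illustration of the rule as told in the interview, not the team's actual tooling: the class name `BugCeiling`, the method `record`, and the daily counts are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class BugCeiling:
    """Hypothetical ratcheting ceiling: it can move down but never up."""
    ceiling: int

    def record(self, open_bugs: int) -> bool:
        """Record today's count; return True if it should be raised at stand-up."""
        if open_bugs < self.ceiling:
            # The count dropped below the ceiling, so the ceiling follows it down.
            self.ceiling = open_bugs
        # Only a count strictly above the ceiling gets mentioned.
        return open_bugs > self.ceiling


tracker = BugCeiling(ceiling=80)      # start at the current bug count
daily_counts = [80, 82, 78, 79, 75]   # invented numbers, for illustration only
flags = [tracker.record(count) for count in daily_counts]
```

In this sketch the ceiling never rises: a day at 79 after the ceiling has dropped to 78 is flagged, even though 79 is below the original 80.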
Per: When you go over the ceiling, is it a stop-the-line thing where everyone works on bugs?
Robin: Yes and no. I mean, we have 15 mobile developers in the Candy Crush team, so not everyone stops what they are doing to fix bugs. But usually at least someone from each team, maybe two people, will do the job. It is sort of a balance thing, because obviously it is a context-switch to drop everything and start working on bugs.
Alexandre: I was actually surprised that the bug count went down, because of how it works. There is no one forcing you to pick stuff up. It is a team responsibility, so if anyone has time left over they will solve some bugs. I also think that is why it works: it is a team goal, but we do not enforce it. Switching context is OK when you feel like it, but not when you are forced to.
Johannes: I also think that one of the reasons it has been going down without anyone trying to push it, is that we have several teams and each team works on making sure that we are not above the ceiling. So if each team gets rid of one or two bugs, that means that we go down six bugs, when we were trying to fix two.
Robin: Yeah, we overshoot constantly. Which is why the ceiling is going down. I also think the scrum masters have a lot to do with it. Just bringing awareness, and visualisation. Each team has a manually plotted graph next to their scrum board and it is usually the first item of the stand-up. Where are we on bugs? Oh, we are one above the ceiling. Then you know the person who does not have anything urgent to do that day will sit down and fix something.
Per: How often are you above the ceiling and what is the feeling of the team at that point?
Robin: We have done some cursory analysis on the amount of time spent above the active ceiling. Usually we go down quite fast. When we go up, it is usually the result of a QA round or check. But then we go down the same day. We had one period recently when we were above the ceiling for a week or two. We have not really come to any conclusion as to why.
Johannes: In a way, it is gamified since you can follow the bug count and see the trend go down. If it goes up again it feels like a failure even if you did not create that bug.
Per: And you never increase the ceiling?
Robin: No, that would be like saying that we allow bugs.
Per: What were the feelings of the team at the beginning?
Alexandre: There was a bunch of people that needed to be convinced by the scrum masters and Agile coaches. Myself included, I am always sceptical. I do not buy into stuff immediately. I want to be convinced. I want to see it work. I was also sceptical since we started before summer when everyone goes on vacation. But then I saw the numbers decrease every day and I thought, this is actually working.
Robin: I had all the whys when the method was explained to me. I was very sceptical because I did not know if it would work. I am more drastic, like let's just stop all development and fix all the bugs and then we are good. That was my idea.
Johannes: I think it helped that the goal was not really outspoken. No one said, now we are going to focus on solving all the bugs, or we are going down to 20 bugs. We just started measuring the numbers and said we are not going to go above that ceiling. The thought was, at least we won't be any worse. People started fixing a few bugs and the numbers started to go down.
Robin: I also think that might have helped. Because if someone tells me, we are going to zero and this is how we are going to do it, I am going to start questioning that. Is this really the best method? But if they just present it as, we are going to try this, without actually telling me why then maybe it is easier to adjust the behaviour. So the funny thing is that without really thinking about it, our goal was reached.
Johannes: I think that triggers many. To have something that is easy to improve on and that is measurable. You can lower the number even if it is only by one. Similarly, it is not a big accomplishment to finish one level in Candy Crush. But to finish all of them!? No one sits down and thinks, I will play all levels of Candy Crush in one go. That would be too hard!
Per: What kind of support did you get from the producers?
Robin: We communicated that we were trying this and they had no real objections. The producers have ownership of the product too, so they also want to lower the number of bugs. They were just afraid of the cost in comparison to the production time. By providing them with a solution where we go down gradually, they did not need to think about the pile of 80 bugs and that eased their minds.
Alexandre: Something that is much better now is that it feels OK to take time to fix bugs. Before, you could feel bad if you worked on a bug for a long time. These days you are encouraged by your scrum master and therefore you get motivated to do the work.
Per: Is there any difference in the uptake in the different teams?
Robin: I do not know, stuff just gets fixed. You can probably do an analysis on the amount of closed bugs and what team did it, but nobody seems to care about that. All teams own the product.
Alexandre: We have not really talked about that. We get reminded every day of the bug count and then someone goes and fixes bugs. But it is surprising really, because we do not decide who needs to do what, yet the numbers go down.
Per: What kind of bugs are remaining now? Is it the bugs that have been there for a really long time and are hard to fix?
Robin: The nice thing about the entire idea is that developers are good at fixing bugs, if they are given the space to do so. We do not need to prioritise things like we did before. We just let the developers figure out what to fix. This also eliminates the process of gathering five developers in a room to make these kinds of prioritisation decisions.
I thought there would be an initial surge and that it would sort of stabilise after that. And it is true, in a sense: there was a big surge in the beginning where it got down to 50. But then it has kept on decreasing. I also thought that the developers would fix the easiest bugs first and that the difficult ones would be left to the end. But it is not that simple, because a bug that is difficult for one person might be really easy to fix for someone else. So the distribution of what gets fixed is really widespread. Today, the list is a mix of new and old bugs and the only thing stopping us from getting to zero is time.
Per: Do you notice any difference in working with the code base or the rate that you produce new bugs?
Robin: We have seen some positive changes as the bug count has been going down. While this is not only a result of the bug ceiling, as we are doing a lot of other stuff in parallel, the synergies are undeniable.
We have seen that the delay between the sprint end and the release has decreased from 10+ days to one day. We also do fewer Release Candidate builds per release. These two factors have led to less time spent on QA.
Johannes: We also have fewer bugs reported during QA. This is probably due to more focus on testing before finishing a story. Developers know now that introducing a bug will mean fixing it, so more effort is put into finding them before they go into a release candidate.
It is also better for our QA guys, because the bugs they report get fixed. It also makes it easier for them to report bugs because it is easier for them to check if they are reporting a duplicate bug.
Per: Any tips for other teams that want to try the same approach?
Robin: Focus on changing the behaviour and the way of working. Also, have a little more faith than I did.
Alexandre: It is easy to try, because it basically costs nothing to start doing. The scrum master just checks the bug count in the morning and tells people at the board. One thing it does require is that the team takes ownership of the product, since no one is telling you that you have to fix bugs.
Robin: That is true. This works because our developers have wanted to fix these bugs for a really long time. All we were lacking was the tools. We needed someone to say it is okay to do that and it is actually prioritised to do so. So if the developers of the team do not care, then it will not work.
Per: What is next?
Robin: Going forward, we will continue to work in the same way. When we reach zero major bugs, we will probably also start attacking minor bugs, while trying to leverage the approach for addressing crash rates and maybe even performance issues.