So recently, at Grafana Labs, we had a re-org of the R&D department! People might look at that negatively, but I think it's a great idea and was very well executed. It was done to fix structural problems and to set us up for the next phase of growth.

Context

Grafana Labs has grown really fast. I started at the company with ~30 people in March 2018 and now, in August 2020, we're 180+ people. That is 6x growth in 2.5 years, and while that is fast in itself, one thing to keep in mind is that we didn't grow that much in 2018 itself. A lot of the growth actually started in late 2018 and only really accelerated in 2019.

This came in many different forms. We've added a sales org; we were running with maybe 3 people doing sales when I joined. We've added a solid marketing department and a customer support department, but most importantly we've added a lot of products. When I joined, I helped launch the hosted Prometheus service, and soon after we launched Loki. Then, about 6 months ago, we started to see a lot of growth, and we decided to add a synthetic monitoring product and focus on building "enterprise" versions of our OSS projects to have feature differentiation. All of this involved a lot of growth in numbers and teams, and this is only the backend platform. The Grafana team had similar but much larger growth.

Now, a lot of this growth came organically, and therein lies the problem.

Issues

The organic growth meant existing teams got bigger and new teams were added. And as we added more products that integrated with each other, we noticed more and more communication and priority mismatches, which were slowing us down and causing frustration. One thing we noticed was that if a project was self-contained within a team, we executed really well, but the moment dependencies on other teams were introduced, our productivity suffered.

Let me give an example. I was involved with a project that spanned 3 different teams, one of them being the API platform team that managed the APIs, UIs and admin panels for grafana.com. They were working on a refactoring to allow for future growth in our platform and to unify things across our different products. But on top of that, each product team had feature requests for them. How would they prioritise between 3 different product teams vying for their attention and their own refactoring and platform work?

On top of this, the different product teams were under different managers and different leadership, which made aligning on priorities even harder. The platform team was doing a great job, but because of the missing alignment, some of the product teams were blocked on them, and this was leading to slower execution. This was only going to get worse as we massively increased our headcount in the future, and we needed to fix it quickly!

Solution

Now, imo, this is to be expected in a fast-growing organisation. We were adding new teams and products quite quickly, and it was difficult to foresee things. The best way to architect an org is to make sure it works well today, and that was what we did. But as we grew, the teams grew further apart, which didn't help. To fix this, the re-org put all the related teams under a single leader. Finally, a lot of care was taken to reduce the disruption to the ICs. It was mainly a re-org in the senior leadership team (SLT) that fixed which manager reported to whom in the SLT and what each SLT member had ownership of. Nothing much changed for the ICs :)

Today, everything Grafana Cloud related is under Tom Wilkie, while before it was Tom and 2 other people holding the pieces. We've similarly split other teams (enterprise and Grafana, for example) to be nearer to who they interact with, and I think this is going to help a ton! Now, is this better than the structure before? I think so. Does it fix all the problems of before, and will execution be faster? I hope so, but I'll write a follow-up in 6 months :)

Execution

More than the solution itself, which seems obvious in hindsight, I wanted to share more about how it was executed. As with most things inside Grafana, it started out as a design doc! The design doc laid out clear reasoning and the approach we were taking. It even had a nice FAQ section!

After the design doc was written and the stakeholders were happy with the state of it, it was presented to the leadership team for their buy-in. Once they decided to execute it, they decided to do it pretty quickly, in a week!

The managers were notified first about their new responsibilities (they didn't change much, mainly the leadership changed) and how this would affect each of their reports. Once the managers were notified, they made sure to talk to all their reports, starting on the Monday of the week the re-org was executed. This was to make sure that anyone who was impacted heard about it from their manager first, rather than through a side channel like Slack. I got a nice message from Dee, my manager, and in the call he outlined what was changing, why it was changing, and how I was affected (I wasn't). He made it absolutely clear that this was to allow us to grow and did not involve any layoffs! Also, this is when the ICs, including me, first knew about this. Everything until then was happening in the background.

Once everyone had heard about this from their managers, on the Friday of that week, we had an R&D all-hands where the leadership (including the CEO) explained the reasoning behind it and put any questions and concerns to rest. And starting the next Monday, the reporting lines and teams were changed, and I doubt anyone will still be discussing the re-org after the following week.

I liked that it was quick, maybe a little too quick, and that the R&D all-hands was not a surprise to anyone. We talked to our managers first, had most of our questions and concerns addressed well before the all-hands, and had time to digest it. I loved the all-hands because the leadership team was there to answer the questions, and it was clear that this decision was not taken lightly.

Overall, this was my first re-org, and I think it went well. I still need to see if it's effective, but only time will tell :)