How We Enabled a Better Code Search Experience on Top of Gerrit

Photo by Annie Spratt from Unsplash

Gerrit is a great code review tool. I find it much better than GitHub, at least for a single repository. Many people dislike Gerrit because they have only ever used it with AOSP, where the difficulty comes from the repo tool and not necessarily from Gerrit itself. Gerrit has a great, well-documented API surface, which has enabled a large ecosystem of plugins and plenty of third-party integrations. Unfortunately, the repository browser that comes with Gerrit (Gitiles) leaves a lot to be desired.

The biggest advantage of Gitiles is that it follows the ACLs set in Gerrit exactly. So, if you have something complex set up there, there aren’t many alternatives. However, if your setup is simple, Google Cloud Source Repositories can be very useful.

I recently came across an excellent post by Alex Saveau about using Google Code Search to improve on the search provided by GitHub. This seemed like an excellent tool for us at Zendrive, especially since Gitiles provides no search at all. I quickly got approval to experiment with this from our security expert, Chandrakanth. Code is sensitive and I didn’t want to take any chances of leaking ours.

Gitiles (left) vs Google Code Search (right)

To set it up, I started by creating a new repository in Google Cloud Source Repositories. There is an option to link to GitHub or Bitbucket, but since we didn’t want that, I chose “Create new repository”. I gave it a name (zendrive) and chose to create a new GCloud project (cs-experiment) as well. Since this was just an experiment, I didn’t want to mess with any existing projects. You will need a credit card, even though the free quotas are very generous. Once billing is set up, the repository is created and ready to be used.

To add the code, I chose to push it directly from my machine instead of configuring Gerrit to mirror to it automatically. Now, I had to figure out a way to push all commits that exist on Gerrit, but not my local unsubmitted changes. So, I did the following:

# Replace the email id, project id and repo name with your own
git remote add google ssh://
git push google --tags 'refs/remotes/origin/*:refs/heads/*'
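Here is a minimal sandbox sketch of why this refspec leaves local work behind: it pushes the remote-tracking refs (what origin already has) rather than the local branches. Everything in it is hypothetical scaffolding and not part of the original setup. Two throwaway bare repositories stand in for Gerrit ("origin") and the new remote ("google"), and the identity and commit messages are made up.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; nothing here touches a real Gerrit or GCloud remote.
work=$(mktemp -d)
git init -q --bare "$work/origin.git"   # stands in for Gerrit
git init -q --bare "$work/google.git"   # stands in for the new remote

git clone -q "$work/origin.git" "$work/clone" 2>/dev/null
cd "$work/clone"
git config user.email "dev@example.com"
git config user.name "Dev"

# One change that has reached origin, and one that exists only locally.
git commit -q --allow-empty -m "submitted change"
git push -q origin HEAD:refs/heads/master
git commit -q --allow-empty -m "local, unsubmitted change"

# Push what origin has (the remote-tracking refs), not the local branches.
git remote add google "$work/google.git"
git push -q google 'refs/remotes/origin/*:refs/heads/*'

# Subject of the newest commit on the new remote's master branch.
pushed=$(git --git-dir="$work/google.git" log -1 --format=%s master)
echo "$pushed"
```

The local-only commit never leaves the clone, because no remote-tracking ref points at it.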

It took a few hours to index everything, but then things were up and running. I went to settings and added a couple of people as “Source Repository Reader”. They tried it out and all looked good. The next step was to add mirroring via Gerrit.

Gerrit has an excellent replication plugin. We already use it to mirror various repositories to our GitHub page. I logged onto the machine and added the following to <site>/etc/replication.config:

[remote "cs-experiment"]
	url = <repository URL>
	push = +refs/heads/*:refs/heads/*
	push = +refs/tags/*:refs/tags/*
	projects = zendrive
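Since replication.config uses the Git config file format, one way to catch typos before reloading the plugin is to read the file back with git config. This is only a sketch: it writes a copy of the section above to a temporary file, and the URL in it is a made-up placeholder, not the real repository URL.

```shell
#!/bin/sh
set -eu

# Temporary copy of the section; the url value is a hypothetical placeholder.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[remote "cs-experiment"]
	url = ssh://mirror.example.com/placeholder.git
	push = +refs/heads/*:refs/heads/*
	push = +refs/tags/*:refs/tags/*
	projects = zendrive
EOF

# --get-all returns every value of a multi-valued key, one per line.
push_specs=$(git config --file "$cfg" --get-all remote.cs-experiment.push)
projects=$(git config --file "$cfg" --get remote.cs-experiment.projects)

echo "$push_specs"
echo "projects: $projects"
```

If the file has a syntax error, git config exits with a non-zero status and reports the offending line, which is a cheap check to run on the Gerrit machine against the real file.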

Before Gerrit can replicate, two more things need to be done. First, I took Gerrit’s public SSH key and added it to my own account on GCloud, which lets Gerrit authenticate to GCloud as me. Since there was no other information there to expose, it wasn’t a big deal. Next, I had to add GCloud’s host key to ~/.ssh/known_hosts on the Gerrit machine. For that, I ran:

ssh -p2022

This asks for a confirmation. Say yes and it’s done. I then reloaded the replication plugin on Gerrit so that it would pick up the above configuration, by running the following from my local system:

ssh <gerrit site url> gerrit plugin reload replication

This used my authorization and told Gerrit to read the replication config again. And sure enough, seconds later, newer changes started showing up on Code Search.

Now that everything was set up, there was just one issue. Pushing everything from my local machine ended up creating a remote branch named HEAD. HEAD is supposed to be a special pointer to a branch, indicating the default branch to use in operations like git clone. This is the same pointer that needs to be updated if you want to rename the default branch from master. On my local system, refs/remotes/origin/HEAD pointed to origin/master. But when doing the git push with the above refspec, Git pushed this HEAD to refs/heads/HEAD on Google Cloud, which created a regular branch there. This confused the site about what the default branch of the repository is supposed to be. Once I deleted it via git push google --delete HEAD, everything worked beautifully.
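The stray branch can be reproduced in a sandbox. The sketch below is hypothetical scaffolding, not the original setup: two throwaway bare repositories stand in for Gerrit and the new remote, and list_heads is a helper name made up for this example. It uses git remote set-head to create origin/HEAD explicitly, which a normal clone of a non-empty repository does on its own, and it deletes the branch by its full ref name to avoid any ambiguity.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; nothing here touches a real Gerrit or GCloud remote.
work=$(mktemp -d)
git init -q --bare "$work/origin.git"   # stands in for Gerrit
git init -q --bare "$work/google.git"   # stands in for the new remote

git clone -q "$work/origin.git" "$work/clone" 2>/dev/null
cd "$work/clone"
git config user.email "dev@example.com"
git config user.name "Dev"
git commit -q --allow-empty -m "initial"
git push -q origin HEAD:refs/heads/master

# Create refs/remotes/origin/HEAD -> origin/master, as in a regular clone.
git remote set-head origin master

git remote add google "$work/google.git"

# Hypothetical helper: branch names on the new remote, space-separated.
list_heads() {
    git ls-remote --heads google | awk '{print $2}' | sort | xargs
}

# The wildcard also matches origin/HEAD, so it lands as refs/heads/HEAD.
git push -q google 'refs/remotes/origin/*:refs/heads/*'
before=$(list_heads)

# Delete the stray branch by its full ref name.
git push -q google --delete refs/heads/HEAD
after=$(list_heads)

echo "before: $before"
echo "after:  $after"
```

Running git ls-remote --heads against the real remote before and after the cleanup is a quick way to confirm that only genuine branches remain.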

Productionizing the setup

I was able to demo the above to a few others in the company. After validating that this would indeed be useful, it was time to productionize it and roll it out to everyone in the company. I deleted the above repo so that I could create it in the right place. We already have a GCloud project that is used for OAuth login in Gerrit, so I created a new repository in that project. I also deleted Gerrit’s SSH key from my account. I reused the billing project from earlier and removed the “experiment” from it.

The GCloud project is owned by an account shared by the engineering team. This way, if any individual leaves the company, nobody has to struggle to figure out all the resources that person owned and hand them over. I added Gerrit’s SSH key to the shared developer account. Everything was almost done. The last thing I needed was to update the replication config again.

[remote "codesearch"]
	url = <updated repository URL>
	push = +refs/heads/*:refs/heads/*
	push = +refs/tags/*:refs/tags/*
	projects = zendrive
	mirror = true
	authGroup = <group>

A few things of note here:

  1. I added the mirror = true attribute to indicate that branch deletions should be replicated.
  2. I also added an authGroup attribute. This restricts mirroring to the branches that everyone in that group can see. Hence, protected and private branches are not mirrored.
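To see what replicating deletions means in plain Git terms, here is a rough analogy using git push --prune in a sandbox. It is not how the replication plugin is implemented, only a sketch of the same effect. The repositories, identity and the branch name feature are hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: "src" plays the Gerrit-side repository, "mirror" the copy.
work=$(mktemp -d)
git init -q --bare "$work/mirror.git"
git init -q "$work/src"
cd "$work/src"
git config user.email "dev@example.com"
git config user.name "Dev"
git commit -q --allow-empty -m "initial"
git branch feature
git remote add mirror "$work/mirror.git"

# First sync: every local branch is copied to the mirror.
git push -q mirror '+refs/heads/*:refs/heads/*'

# Delete the branch at the source, then sync again without pruning.
git branch -q -D feature
git push -q mirror '+refs/heads/*:refs/heads/*'
without_prune=$(git ls-remote --heads mirror | grep -c 'refs/heads/feature$' || true)

# With --prune, branches that no longer exist at the source are removed too.
git push -q --prune mirror '+refs/heads/*:refs/heads/*'
with_prune=$(git ls-remote --heads mirror | grep -c 'refs/heads/feature$' || true)

echo "feature branches on mirror without prune: $without_prune"
echo "feature branches on mirror with prune:    $with_prune"
```

Without pruning, deleted branches linger on the mirror and keep showing up in search results, which is why the deletion behaviour matters for a code search setup.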

And voila, we have an amazing code search tool built on top of Gerrit now.

Announcement of the launch

Lastly, thanks to Vaibhav Gupta and Surya Shekhar Chakraborty for proofreading this.