How We Enabled a Better Code Search Experience on Top of Gerrit
Gerrit is a great code review tool. I find it much better than GitHub, at least for a single repository. Many don’t like Gerrit because they’ve only ever experienced it with AOSP and find it hard to work with the repo tool, not necessarily Gerrit itself. Gerrit has a great, well-documented API surface, which has enabled a large ecosystem of plugins and plenty of third-party integrations. Unfortunately, the repository browser that comes with Gerrit (gitiles) leaves a lot to be desired.
The biggest advantage that gitiles has is that it follows the ACLs set in Gerrit exactly. So, if you have something complex set up there, there aren’t many alternatives out there. However, Google Cloud Source Repositories can be very useful if you have a simple setup.
I recently came across an excellent post by Alex Saveau about using Google Code Search to improve the search provided by GitHub. This seemed like an excellent tool for us at Zendrive, especially since gitiles provides no search. I quickly got approval to experiment with this from our security expert Chandrakanth. Code is sensitive and I didn’t want to take any chances of leaking ours.
Setting up Google Code Search
To set it up, I started with https://source.cloud.google.com/ and created a new repository. This has an option to link to GitHub or Bitbucket, but since we didn’t want that, I chose “Create new repository”. I gave it a name (zendrive) and chose to create a new GCloud project (cs-experiment) as well. Since this was just an experiment, I didn’t want to mess with any existing projects. You will need to have a credit card, despite the free quotas being very generous. Once the billing is set up, the repository is created and ready to be used.
To add the code, I chose to push it directly from my machine instead of configuring Gerrit to mirror to it automatically. Now, I had to figure out a way to push all commits that exist on Gerrit, but not my local unsubmitted changes. So, I did the following:
# Replace the email id, project id and repo name with your own
git remote add google ssh://deepanshu@zendrive.com@source.developers.google.com:2022/p/cs-experiment-282017/r/zendrive
git push google --tags refs/remotes/origin/*:refs/heads/*
It took a few hours to index everything. But things were up and running. I went to settings and added a couple of people as “Source Repository Reader”. They tried it out and all looked good. Next step was to add mirroring via Gerrit.
Gerrit has an excellent replication plugin. We already use it to mirror various repositories to our GitHub page. I logged onto the machine and added the following to <site>/etc/replication.config:
[remote "cs-experiment"]
url = <repository URL>
push = +refs/heads/*:refs/heads/*
push = +refs/tags/*:refs/tags/*
projects = zendrive
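Because replication.config uses Git's config file syntax, one way to sanity-check an edit before reloading the plugin is to read it back with git config. The following is only a sketch under assumptions: it writes a copy of the stanza above to a temporary file (the scratch path is hypothetical, and the URL is the same placeholder used in this post) and prints the values Git parses from it.

```shell
# Hypothetical scratch copy; the real file lives at <site>/etc/replication.config.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[remote "cs-experiment"]
  url = <repository URL>
  push = +refs/heads/*:refs/heads/*
  push = +refs/tags/*:refs/tags/*
  projects = zendrive
EOF

# --get-all prints one line per value of a multi-valued key such as push.
push_specs=$(git config --file "$cfg" --get-all remote.cs-experiment.push)
projects=$(git config --file "$cfg" --get remote.cs-experiment.projects)

echo "push refspecs:"
echo "$push_specs"
echo "projects: $projects"
```

A typo in a section header or key name shows up here as an error or an empty value, which is cheaper to catch than a replication that silently does nothing.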
Before Gerrit is able to replicate, two more things need to be done. First, I got Gerrit’s public key and added it to my account on GCloud. This lets Gerrit authorize itself to GCloud as myself, but since there isn’t any other info present, it wasn’t a big deal. Next, I had to add GCloud’s host key to ~/.ssh/known_hosts on the Gerrit machine. For that, I ran:
ssh -p2022 deepanshu@zendrive.com@source.developers.google.com
This asks for confirmation. Say yes and it’s done. I then reloaded the replication plugin on Gerrit, so that it picks up the above configuration, by running the following from my local system:
ssh <gerrit site url> gerrit plugin reload replication
This used my authorization and told Gerrit to read the replication config again. And sure enough, seconds later, newer changes started showing up on Code Search.
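If replication does not start, one thing worth ruling out is the known_hosts step from earlier. The sketch below is an assumption on my part rather than something from the original setup: has_host_key is a hypothetical helper that asks ssh-keygen whether a known_hosts file already contains an entry for a host on a non-default port. It is meant to be run on the Gerrit machine as the user that Gerrit runs as.

```shell
# has_host_key FILE HOST PORT
# Succeeds if FILE already holds a key for HOST on a non-default PORT.
# OpenSSH records such hosts as "[host]:port", which is the form -F expects.
has_host_key() {
  ssh-keygen -F "[$2]:$3" -f "$1" >/dev/null 2>&1
}

if has_host_key "$HOME/.ssh/known_hosts" source.developers.google.com 2022; then
  echo "host key present"
else
  echo "host key missing: run the ssh command above once and accept the prompt"
fi
```

The helper only covers non-default ports; for port 22, OpenSSH stores the bare hostname instead of the bracketed form.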
Now that all was set up, there was just one issue. Pushing everything from my local machine ended up creating a remote branch named HEAD. HEAD is supposed to be a special pointer to a branch, which indicates the default branch to use in operations like git clone. This is the same pointer that needs to be updated if you want to rename the default branch from master. On my local system refs/remotes/origin/HEAD pointed to origin/master. But when doing the git push according to the above refspec, it pushed this HEAD to refs/heads/HEAD on Google Cloud. This ended up creating a regular branch there, which confused the site about what the default branch for the repository is supposed to be. Once I deleted it via git push google --delete HEAD, everything worked beautifully.
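The stray HEAD branch can be reproduced without touching any real remote. The sketch below is a hypothetical sandbox, not the actual setup: two local bare repositories stand in for Gerrit and for the Cloud Source repository, and all paths and names are made up. It pushes with the same wildcard refspec, lists the branches that appear on the mirror, and then deletes the extra one.

```shell
set -e

# Throwaway sandbox; every path and repository name here is made up.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/gerrit.git"   # stands in for Gerrit
git init -q --bare "$sandbox/mirror.git"   # stands in for the Cloud Source repository

# Seed "Gerrit" with one commit on master and make master its default branch.
git init -q "$sandbox/seed"
git -C "$sandbox/seed" -c user.name=demo -c user.email=demo@example.com \
  -c commit.gpgsign=false commit -q --allow-empty -m "initial commit"
git -C "$sandbox/seed" push -q "$sandbox/gerrit.git" HEAD:refs/heads/master
git -C "$sandbox/gerrit.git" symbolic-ref HEAD refs/heads/master

# Cloning creates refs/remotes/origin/HEAD as a pointer to origin/master.
git clone -q "$sandbox/gerrit.git" "$sandbox/work"
git -C "$sandbox/work" remote add google "$sandbox/mirror.git"

# The wildcard also matches origin/HEAD, so it lands at refs/heads/HEAD.
git -C "$sandbox/work" push -q google 'refs/remotes/origin/*:refs/heads/*'
before=$(git -C "$sandbox/mirror.git" for-each-ref --format='%(refname)' refs/heads)

# Delete the stray branch, spelling out the full ref to avoid ambiguity.
git -C "$sandbox/work" push -q google --delete refs/heads/HEAD
after=$(git -C "$sandbox/mirror.git" for-each-ref --format='%(refname)' refs/heads)

printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
```

A possible way to avoid the cleanup altogether is to run git remote set-head origin --delete in the local clone before pushing, so that the wildcard has no origin/HEAD to match. That alternative is untested against Cloud Source Repositories.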
Productionizing the setup
I was able to demo the above to a few others in the company. After validating that this will indeed be useful, it was time to productionize it and roll it out to everyone in the company. I deleted the above repo so that I could create it at the right place. We already have a GCloud project that is used for OAuth login in Gerrit. I created a new repository in the same project. I also deleted Gerrit’s ssh key from my account. I reused the billing project from earlier and removed the “experiment” from it.
The GCloud project is owned by an account shared by the engineering team. This is done so that if any individual leaves the company, they don’t have to struggle to figure out all the resources they own and pass them along. I added Gerrit’s ssh key id_rsa.pub to the shared developer account. Everything was almost done. The last thing I needed was to update the replication config again.
[remote "codesearch"]
url = <updated repository URL>
push = +refs/heads/*:refs/heads/*
push = +refs/tags/*:refs/tags/*
projects = zendrive
mirror = true
authGroup = <group>
A few things of note here:
- I added the mirror = true attribute to indicate that branch deletions should be replicated.
- I also added an authGroup attribute. This restricts the branches mirrored to be the ones that everyone in that group could see. Hence, any protected and private branches were not mirrored.
And voila, we have an amazing code search tool built on top of Gerrit now.
Lastly, thanks to Vaibhav Gupta and Surya Shekhar Chakraborty for proofreading this.