Last week I had an interesting experience at work. An application that my team has been working on showed a weird bug in the latest build. The source of the bug was very difficult to locate, and there was a plethora of commits between the last working build and the latest one (it’s actually terrible to let this happen, but terrible things happen all the time anyway). Finding the offending commit in this case would shrink the area that we need to look at. Luckily for us, git provides a very useful command for exactly this purpose: git bisect. In this post we’ll examine how the command works, and some of its common operations.
We will use a hypothetical git repository with the following commits to illustrate the ideas behind git bisect.
3acd382 (HEAD -> master) [FEATURE] Implement integration tests
4942b27 [FEATURE] Add datadog tracing
67bf061 [FEATURE] Update search algorithm
0512b9f [FEATURE] Add new indices to Users table
335961f [FEATURE] Migrate to DynamoDB
43195f0 [BUG-FIX] Hide PII from customer public info end-point
823c73d [FEATURE] New end-point to get customer public info
5c99d28 [BUG-FIX] Fix an issue with facebook login failure
9f1dbd3 [FEATURE] Twitter authentication
ef257bc [FEATURE] Facebook authentication
6e85ef8 [FEATURE] New login UI
96b3379 [FEATURE] Update register function
Assume that the last commit we know to be working is 6e85ef8 [FEATURE] New login UI, and that the build on the latest commit 3acd382 (HEAD -> master) [FEATURE] Implement integration tests is buggy. There are around 10 commits between these two, so how do we find the offending commit with git bisect? First, we need to start a bisect session.
git bisect start
We know the latest commit is broken, so we mark it as bad.
git bisect bad
We also mark 6e85ef8 as good because we know things are fine with that commit.
git bisect good 6e85ef8
The magic happens here: git tells us what revision to examine next. Note that git also automatically checks out the commit.
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[43195f05cb990694d6399afeab809ec3098d4650] [BUG-FIX] Hide PII from customer public info end-point
At this step we do whatever it takes (e.g. run our unit tests, or test manually to reproduce the bug) to check whether things work as expected in this commit.
Assuming that 43195f0 still works fine, we can mark it as good.
git bisect good 43195f0
Now we are presented with another message from git that’s similar to the last one.
Bisecting: 2 revisions left to test after this (roughly 1 step)
[0512b9f4da2658edb5f7dc6a781f397be34eb213] [FEATURE] Add new indices to Users table
We know what to do: we need to check if [0512b9f4da2658edb5f7dc6a781f397be34eb213] [FEATURE] Add new indices to Users table works fine. Assuming that our tests fail with 0512b9f, we mark it as bad.
git bisect bad 0512b9f
Again, git tells us which revision to examine next.
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[335961f0b5b70f8b9703f85772e02b71b4e10b4f] [FEATURE] Migrate to DynamoDB
And again, we have to test if things work in [335961f0b5b70f8b9703f85772e02b71b4e10b4f] [FEATURE] Migrate to DynamoDB. Assuming that our tests fail, we mark it as bad.
git bisect bad 335961f
Now git has enough information to tell us what commit is the culprit here.
335961f0b5b70f8b9703f85772e02b71b4e10b4f is the first bad commit
How can we be certain that this is the commit that caused the bug to appear in the first place? If we look at our revision history along with the good and bad markings, we will learn more about the strategy git bisect uses.
3acd382 (HEAD -> master) [FEATURE] Implement integration tests // 1. bad
4942b27 [FEATURE] Add datadog tracing
67bf061 [FEATURE] Update search algorithm
0512b9f [FEATURE] Add new indices to Users table // 4. bad
335961f [FEATURE] Migrate to DynamoDB // 5. bad
43195f0 [BUG-FIX] Hide PII from customer public info end-point // 3. good
823c73d [FEATURE] New end-point to get customer public info
5c99d28 [BUG-FIX] Fix an issue with facebook login failure
9f1dbd3 [FEATURE] Twitter authentication
ef257bc [FEATURE] Facebook authentication
6e85ef8 [FEATURE] New login UI // 2. good
96b3379 [FEATURE] Update register function
It appears that git bisect works like a binary search. By asking us to check a commit near the midpoint between the newest known good commit and the oldest known bad commit, it narrows down the range where the first bad commit can be. Every time a commit is marked as good or bad, git shows us the next commit to test, and it reports the first bad commit once there is nothing left to test between a good commit and the bad commit that directly follows it (step 5 above). Therefore, as long as our good and bad markings are correct, we can trust the result of git bisect. There are only about 10 commits in our example, so it may seem easy to do this manually. However, in a large code base with hundreds of commits, having a tool to do the bookkeeping is extremely useful: each test roughly halves the remaining range, so even 500 commits need only about 9 tests.
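To make the strategy concrete, here is a minimal shell sketch of the same search over our example history. It is only an illustration under stated assumptions: is_bad is a hypothetical stand-in for building and testing a commit, nth and first_bad are helper names made up for this post, and git's real midpoint selection is more involved, so it will not necessarily pick the same commits on other histories.

```shell
#!/bin/sh
# Example history, oldest first, from the known-good commit to the known-bad HEAD.
COMMITS="6e85ef8 ef257bc 9f1dbd3 5c99d28 823c73d 43195f0 335961f 0512b9f 67bf061 4942b27 3acd382"

# Hypothetical stand-in for "build and test this commit": we pretend the bug
# was introduced in 335961f, so that commit and everything after it is bad.
is_bad() {
    case "$1" in
        335961f|0512b9f|67bf061|4942b27|3acd382) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the n-th commit (1-based) of the list.
nth() {
    echo "$COMMITS" | cut -d' ' -f"$1"
}

# Narrow the range between the newest known-good and the oldest known-bad commit.
first_bad() {
    good=1     # position of 6e85ef8, marked good
    bad=11     # position of 3acd382, marked bad
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if is_bad "$(nth "$mid")"; then
            bad=$mid     # the first bad commit is at mid or earlier
        else
            good=$mid    # the first bad commit is after mid
        fi
    done
    nth "$bad"
}

first_bad
```

With our example the sketch tests 43195f0, 0512b9f and 335961f in that order, the same commits git asked us to check, and prints 335961f.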
What if we make a mistake in marking commits? Unfortunately there’s no command to undo a marking at the time this post is written, but there’s a way to rewrite the bisect history. Git lets us view the history of the current bisect session with git bisect log. We can export it to a temp file.
git bisect log > bisect_temp.txt
The output bisect_temp.txt will look like this in our example.
git bisect start
# status: waiting for both good and bad commits
# bad: [3acd38216962ac4cecfe21e489b39c295cef7318] [FEATURE] Implement integration tests
git bisect bad 3acd38216962ac4cecfe21e489b39c295cef7318
# status: waiting for good commit(s), bad commit known
# good: [6e85ef8292982af7bcadc84c7487bd9fb2471b4b] [FEATURE] New login UI
git bisect good 6e85ef8292982af7bcadc84c7487bd9fb2471b4b
# good: [43195f05cb990694d6399afeab809ec3098d4650] [BUG-FIX] Hide PII from customer public info end-point
git bisect good 43195f05cb990694d6399afeab809ec3098d4650
# bad: [0512b9f4da2658edb5f7dc6a781f397be34eb213] [FEATURE] Add new indices to Users table
git bisect bad 0512b9f4da2658edb5f7dc6a781f397be34eb213
# bad: [335961f0b5b70f8b9703f85772e02b71b4e10b4f] [FEATURE] Migrate to DynamoDB
git bisect bad 335961f0b5b70f8b9703f85772e02b71b4e10b4f
# first bad commit: [335961f0b5b70f8b9703f85772e02b71b4e10b4f] [FEATURE] Migrate to DynamoDB
Let’s say that we made a mistake with [43195f05cb990694d6399afeab809ec3098d4650] [BUG-FIX] Hide PII from customer public info end-point: it should have been marked as bad instead of good. We can edit bisect_temp.txt to mark it as bad manually, and then remove all lines after that. The content of bisect_temp.txt after our modification looks like this.
git bisect start
# status: waiting for both good and bad commits
# bad: [3acd38216962ac4cecfe21e489b39c295cef7318] [FEATURE] Implement integration tests
git bisect bad 3acd38216962ac4cecfe21e489b39c295cef7318
# status: waiting for good commit(s), bad commit known
# good: [6e85ef8292982af7bcadc84c7487bd9fb2471b4b] [FEATURE] New login UI
git bisect good 6e85ef8292982af7bcadc84c7487bd9fb2471b4b
# bad: [43195f05cb990694d6399afeab809ec3098d4650] [BUG-FIX] Hide PII from customer public info end-point
git bisect bad 43195f05cb990694d6399afeab809ec3098d4650
Next, we save the file, and then run git bisect reset && git bisect replay bisect_temp.txt. Now we can continue the process of testing our commits, and marking them as good or bad. We will eventually come to the real first bad commit this time!
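The whole correction can be tried safely in a throwaway repository. The script below is a sketch, not part of our real project: the repository, its eight commits, the status.txt and version.txt files and the position of the bug are all made up for the demo, and sed stands in for the manual edit of the log file.

```shell
#!/bin/sh
set -e

# Throwaway repository, so the demo never touches real work.
repo=$(mktemp -d)
log_file=$(mktemp)
cd "$repo"
git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.com"

# Eight commits; the made-up bug is introduced in commit 3.
i=1
while [ "$i" -le 8 ]; do
    if [ "$i" -ge 3 ]; then echo broken > status.txt; else echo ok > status.txt; fi
    echo "$i" > version.txt
    git add status.txt version.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Start bisecting between the first commit (good) and HEAD (bad).
git bisect start > /dev/null
git bisect bad > /dev/null
git bisect good "$(git rev-list --max-parents=0 HEAD)" > /dev/null

# Mistake: the commit git checked out is broken, but we mark it as good.
wrong=$(git rev-parse HEAD)
git bisect good > /dev/null

# Export the log and flip the wrong mark. It is the last entry here, so
# there are no later lines to delete. Then reset and replay the fixed log.
git bisect log > "$log_file"
sed "s/^git bisect good $wrong/git bisect bad $wrong/" "$log_file" > "$log_file.fixed"
git bisect reset > /dev/null 2>&1
git bisect replay "$log_file.fixed" > /dev/null

# The session now records the commit as bad, and bisecting could continue.
replayed=$(git bisect log)
echo "$replayed" | grep "^git bisect"
git bisect reset > /dev/null 2>&1
```

The last grep prints the commands recorded after the replay, where the wrongly marked commit now appears under git bisect bad.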
Conclusion
We learned in this article how to use git bisect to identify the first offending commit when it’s difficult to locate a bug in our code base. git bisect is a very useful tool that every developer should know, but beware of overusing it because it’s not always the best way to debug. Note also that at any point during a bisect session we can stop it with git bisect reset, which takes us back to the commit we were on before the session started. Finally, git bisect provides some other useful functions, which are described in the git bisect documentation.