Last week I had an interesting experience at work. An application that my team has been working on showed a weird bug in the latest build. The source of the bug was very difficult to locate, and there were a plethora of commits between the last working build and the latest one (it’s actually terrible to let this happen, but terrible things happen all the time anyway). Finding the offending commit would narrow down the code we needed to look at. Luckily, git provides a very useful command for exactly this purpose: git bisect. In this post we’ll examine how the command works and some of its common operations.

We will use a hypothetical git repository with the following commits (newest first, as git log --oneline lists them) to illustrate the ideas behind git bisect.

3acd382 (HEAD -> master) [FEATURE] Implement integration tests
4942b27 [FEATURE] Add datadog tracing
67bf061 [FEATURE] Update search algorithm
0512b9f [FEATURE] Add new indices to Users table
335961f [FEATURE] Migrate to DynamoDB
43195f0 [BUG-FIX] Hide PII from customer public info end-point
823c73d [FEATURE] New end-point to get customer public info
5c99d28 [BUG-FIX] Fix an issue with facebook login failure
9f1dbd3 [FEATURE] Twitter authentication
ef257bc [FEATURE] Facebook authentication
6e85ef8 [FEATURE] New login UI
96b3379 [FEATURE] Update register function

Assume that the last commit we know to be working is 6e85ef8 [FEATURE] New login UI, and that the build from the latest commit, 3acd382 (HEAD -> master) [FEATURE] Implement integration tests, is buggy. There are nine commits between these two, so how do we find the offending one with git bisect? First, we need to start a bisect session.

git bisect start

We know the latest commit is broken, so we mark it as bad. Without an argument, git bisect bad marks the currently checked-out commit, which here is HEAD.

git bisect bad

We also mark 6e85ef8 as good because we know things are fine with that commit.

git bisect good 6e85ef8

This is where the magic happens: git tells us which revision to examine next, and it checks that commit out automatically.

Bisecting: 4 revisions left to test after this (roughly 2 steps)
[43195f05cb990694d6399afeab809ec3098d4650] [BUG-FIX] Hide PII from customer public info end-point

At this step, we need to do whatever it takes (e.g. run our unit tests, or test manually to reproduce the bug) to check whether things work as expected in this commit. Assuming 43195f0 still works fine, we mark it as good.

git bisect good 43195f0

Now we are presented with another message from git that’s similar to the last one.

Bisecting: 2 revisions left to test after this (roughly 1 step)
[0512b9f4da2658edb5f7dc6a781f397be34eb213] [FEATURE] Add new indices to Users table

We know what to do: check whether [0512b9f4da2658edb5f7dc6a781f397be34eb213] [FEATURE] Add new indices to Users table works fine. Assuming our tests fail on 0512b9f, we mark it as bad.

git bisect bad 0512b9f

Again, git tells us which revision to examine next.

Bisecting: 0 revisions left to test after this (roughly 0 steps)
[335961f0b5b70f8b9703f85772e02b71b4e10b4f] [FEATURE] Migrate to DynamoDB

Once again, we test whether things work in [335961f0b5b70f8b9703f85772e02b71b4e10b4f] [FEATURE] Migrate to DynamoDB. Assuming our tests fail, we mark it as bad.

git bisect bad 335961f

Now git has enough information to tell us which commit is the culprit.

335961f0b5b70f8b9703f85772e02b71b4e10b4f is the first bad commit
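To try the walkthrough without a real project, the sketch below builds a throwaway repository whose commit messages mirror the example history. The file app.txt and the way the "bug" shows up in it are made up for illustration, and the hashes will differ from the ones above because they depend on author and timestamp. Instead of marking commits by hand, it hands the check to git bisect run, which runs a command on each commit git checks out and treats exit status 0 as good and 1 as bad.

```shell
#!/bin/sh
# Sketch: rebuild a throwaway repository whose commit messages mirror the
# example, then let `git bisect run` do the good/bad marking.
# Assumption: app.txt and its "broken" content are made up for illustration.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Bisect Demo"
git config user.email "demo@example.com"

n=0
while IFS= read -r msg; do
  n=$((n + 1))
  # Commits 1-7 are healthy; commit 8 ("Migrate to DynamoDB") introduces the bug.
  if [ "$n" -ge 8 ]; then echo broken > app.txt; else echo ok > app.txt; fi
  echo "$msg" > changelog.txt   # guarantees every commit changes something
  git add -A
  git commit -q -m "$msg"
done <<'EOF'
[FEATURE] Update register function
[FEATURE] New login UI
[FEATURE] Facebook authentication
[FEATURE] Twitter authentication
[BUG-FIX] Fix an issue with facebook login failure
[FEATURE] New end-point to get customer public info
[BUG-FIX] Hide PII from customer public info end-point
[FEATURE] Migrate to DynamoDB
[FEATURE] Add new indices to Users table
[FEATURE] Update search algorithm
[FEATURE] Add datadog tracing
[FEATURE] Implement integration tests
EOF

# HEAD is known bad; HEAD~10 is "[FEATURE] New login UI", the last known good commit.
git bisect start HEAD HEAD~10 > /dev/null
# The check exits 0 (good) if app.txt says "ok", and 1 (bad) otherwise.
git bisect run sh -c 'grep -qx ok app.txt' > /dev/null
# When bisect finishes, refs/bisect/bad points at the first bad commit.
first_bad_subject=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "$first_bad_subject"
```

Running it should print [FEATURE] Migrate to DynamoDB, the same culprit we found by hand. With git bisect run, any exit status from 1 to 127 except 125 marks a commit as bad, while 125 tells git to skip a commit that cannot be tested.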

How can we be certain that this is the commit that introduced the bug in the first place? Looking at our revision history annotated with the good and bad marks, numbered in the order we made them, reveals the strategy behind git bisect.

3acd382 (HEAD -> master) [FEATURE] Implement integration tests              // 1. bad
4942b27 [FEATURE] Add datadog tracing
67bf061 [FEATURE] Update search algorithm
0512b9f [FEATURE] Add new indices to Users table                            // 4. bad
335961f [FEATURE] Migrate to DynamoDB                                       // 5. bad
43195f0 [BUG-FIX] Hide PII from customer public info end-point              // 3. good
823c73d [FEATURE] New end-point to get customer public info
5c99d28 [BUG-FIX] Fix an issue with facebook login failure
9f1dbd3 [FEATURE] Twitter authentication
ef257bc [FEATURE] Facebook authentication
6e85ef8 [FEATURE] New login UI                                              // 2. good
96b3379 [FEATURE] Update register function

It turns out that git bisect works like a binary search. By asking us to check a commit roughly halfway between the latest known good commit and the earliest known bad commit, it halves the range in which the first bad commit can lie at every step. Every time a commit is marked as good or bad, git shows us the next point of interest, and it reaches a conclusion once the only remaining candidate is a bad commit whose parent is known to be good: 335961f is bad (step 5) and its parent 43195f0 is good (step 3). Therefore, as long as we mark good and bad commits correctly, we can trust the result of git bisect. There are only about 10 commits in our example, so it may seem easy to do this by hand. However, in a large code base with hundreds of commits, a tool that automates the search can be extremely useful.
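The strategy can be sketched in a few lines of shell. This is not git's implementation: git picks midpoints on the commit graph, whereas the sketch below assumes a straight-line history like ours, listed oldest to newest, and uses a hypothetical is_good function in place of running the real tests. With the bug placed in 335961f, it checks the same three commits we checked by hand (43195f0, 0512b9f, 335961f).

```shell
#!/bin/sh
# Minimal sketch of the search strategy behind git bisect, for a straight-line
# history. Assumptions: commits are listed oldest -> newest, the first one is
# known good, the last one is known bad, and is_good is a made-up check.

COMMITS="6e85ef8 ef257bc 9f1dbd3 5c99d28 823c73d 43195f0 335961f 0512b9f 67bf061 4942b27 3acd382"

# Print the commit at 0-based position $1 of COMMITS.
nth() { echo "$COMMITS" | cut -d' ' -f"$(( $1 + 1 ))"; }

# Hypothetical check standing in for "run the tests": the bug arrived in 335961f.
is_good() {
  case "$1" in
    335961f|0512b9f|67bf061|4942b27|3acd382) return 1 ;;
    *) return 0 ;;
  esac
}

first_bad() {
  set -- $COMMITS        # load the list into $1..$N so $# gives its length
  good=0                 # index of the latest known good commit
  bad=$(( $# - 1 ))      # index of the earliest known bad commit
  steps=0
  # Invariant: commit $good is good and commit $bad is bad.
  while [ $(( bad - good )) -gt 1 ]; do
    mid=$(( (good + bad) / 2 ))   # test the commit halfway between them
    steps=$(( steps + 1 ))
    if is_good "$(nth "$mid")"; then
      good=$mid
    else
      bad=$mid
    fi
  done
  echo "$(nth "$bad") is the first bad commit (after $steps checks)"
}

first_bad
```

Because each check halves the remaining range, the number of checks grows only logarithmically: about 3 or 4 for the 10 commits here, and about 10 for 1,000 commits.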

What if we make a mistake while marking commits? Unfortunately, at the time of writing there’s no command to undo a mark, but there’s a way to rewrite the bisect history. git bisect log shows the history of the current bisect session, and we can export it to a temporary file.

git bisect log > bisect_temp.txt

In our example, bisect_temp.txt will look like this.

git bisect start
# status: waiting for both good and bad commits
# bad: [3acd38216962ac4cecfe21e489b39c295cef7318] [FEATURE] Implement integration tests
git bisect bad 3acd38216962ac4cecfe21e489b39c295cef7318
# status: waiting for good commit(s), bad commit known
# good: [6e85ef8292982af7bcadc84c7487bd9fb2471b4b] [FEATURE] New login UI
git bisect good 6e85ef8292982af7bcadc84c7487bd9fb2471b4b
# good: [43195f05cb990694d6399afeab809ec3098d4650] [BUG-FIX] Hide PII from customer public info end-point
git bisect good 43195f05cb990694d6399afeab809ec3098d4650
# bad: [0512b9f4da2658edb5f7dc6a781f397be34eb213] [FEATURE] Add new indices to Users table
git bisect bad 0512b9f4da2658edb5f7dc6a781f397be34eb213
# bad: [335961f0b5b70f8b9703f85772e02b71b4e10b4f] [FEATURE] Migrate to DynamoDB
git bisect bad 335961f0b5b70f8b9703f85772e02b71b4e10b4f
# first bad commit: [335961f0b5b70f8b9703f85772e02b71b4e10b4f] [FEATURE] Migrate to DynamoDB

Let’s say that we made a mistake with [43195f05cb990694d6399afeab809ec3098d4650] [BUG-FIX] Hide PII from customer public info end-point: it should have been marked as bad instead of good. We can edit bisect_temp.txt to mark it as bad manually, and then remove all lines after it, because every later step was chosen based on the wrong mark. After our modification, bisect_temp.txt looks like this.

git bisect start
# status: waiting for both good and bad commits
# bad: [3acd38216962ac4cecfe21e489b39c295cef7318] [FEATURE] Implement integration tests
git bisect bad 3acd38216962ac4cecfe21e489b39c295cef7318
# status: waiting for good commit(s), bad commit known
# good: [6e85ef8292982af7bcadc84c7487bd9fb2471b4b] [FEATURE] New login UI
git bisect good 6e85ef8292982af7bcadc84c7487bd9fb2471b4b
# bad: [43195f05cb990694d6399afeab809ec3098d4650] [BUG-FIX] Hide PII from customer public info end-point
git bisect bad 43195f05cb990694d6399afeab809ec3098d4650
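Editing the file by hand is fine for a short session. For longer ones, the same change can be scripted. fix_bisect_log below is a hypothetical helper, not part of git: it relabels one mistaken good mark as bad and drops everything recorded after it. It expects the full commit hash exactly as it appears in the log.

```shell
#!/bin/sh
# Hypothetical helper (not a git command): flip one mistaken "good" mark to
# "bad" in a saved bisect log and drop every line recorded after it.
# Usage: fix_bisect_log <log-file> <full-commit-hash>  (writes to stdout)
fix_bisect_log() {
  log=$1
  sha=$2
  awk -v sha="$sha" '
    # The comment describing the wrong mark: relabel it and keep going.
    index($0, "# good: [" sha) == 1 { sub(/^# good:/, "# bad:"); print; next }
    # The command itself: relabel it, print it, and stop reading the file.
    $0 == "git bisect good " sha    { print "git bisect bad " sha; exit }
    # Everything before the mistake is copied unchanged.
    { print }
  ' "$log"
}
```

For our example, that would be fix_bisect_log bisect_temp.txt 43195f05cb990694d6399afeab809ec3098d4650 > bisect_fixed.txt, followed by git bisect reset && git bisect replay bisect_fixed.txt. Writing to a new file avoids truncating the log while awk is still reading it.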

Next, we save the file and run git bisect reset && git bisect replay bisect_temp.txt. The replay re-applies the corrected marks and checks out the next commit to test, so we can continue testing commits and marking them as good or bad. This time we will eventually arrive at the real first bad commit!

Conclusion

In this article we learned how to use git bisect to identify the first offending commit when a bug is difficult to locate in our code base. git bisect is a very useful tool that every developer should know, but don’t overuse it: it’s not always the best debugging tool. Note also that at any point during a bisect session we can stop it with git bisect reset, which returns us to the commit that was checked out before git bisect start. Finally, git bisect provides several other useful functions, which are covered in the official git bisect documentation.