
Random startup failures on a Gerrit instance in the cloud

Lately, I’ve been testing the main components of OpenStack Infra, namely the Puppet manifests.

I ran into an interesting problem last week: Gerrit would start on the first try, then fail on later attempts.

The interesting thing is that if I increased the timeout of the Gerrit init upstart script to a ludicrously high value (900 seconds), Gerrit would eventually start.

I thought it could be due to upstream using a forked Gerrit version, but the git diff showed the differences were minimal.

Since I had been running this Gerrit test on an HP Cloud instance, I tried running it in a Vagrant VM on my rusty but still working home server.

It turned out that there Gerrit would start and stop immediately without any problems, so the problem clearly had something to do with running on a cloud instance.

I shared the problem with my colleagues and one of them said ‘hey, this could be something about entropy’.

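Suddenly, something clicked in my mind and I remembered that the upstream Nodepool images have the haveged package baked in, so I did an apt-get install haveged and voila, Gerrit would start and stop without ANY problems. The likely explanation is that a cloud VM has few sources of entropy, so anything reading from /dev/random blocks until the kernel's pool refills, and haveged keeps that pool topped up.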
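If you want to check whether entropy is the culprit before installing anything, a minimal sketch like the one below reads the kernel's entropy estimate from /proc/sys/kernel/random/entropy_avail. This is not something I ran at the time. The 200-bit threshold and the helper names are my own assumptions for illustration, and newer kernels report this value differently, so treat the cut-off as a rough hint only.

```python
from pathlib import Path

# Standard Linux procfs file holding the kernel's entropy estimate, in bits.
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")

# Hypothetical cut-off for "starved"; chosen for illustration, not a kernel constant.
LOW_WATERMARK_BITS = 200


def parse_entropy(text):
    """Convert the file's contents (a single integer plus newline) to an int."""
    return int(text.strip())


def is_starved(bits, threshold=LOW_WATERMARK_BITS):
    """Return True when the estimate is below the chosen threshold."""
    return bits < threshold


if __name__ == "__main__":
    if ENTROPY_PATH.exists():
        bits = parse_entropy(ENTROPY_PATH.read_text())
        state = "low, haveged may help" if is_starved(bits) else "looks fine"
        print(f"entropy_avail = {bits} bits ({state})")
    else:
        print(f"{ENTROPY_PATH} not found; is this a Linux host?")
```

Running it once before and once after installing haveged on the affected instance should show whether the estimate actually moves.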

P.S. Thanks to my colleague Nicola Heald for putting me on the right track with this problem. I spent a whole morning doing all sorts of testing and didn’t think about entropy!
