It turns out that the functionality that uses this is deprecated as of early
2014, so this supposedly isn’t an issue with newer puppet installs. However, if
you’re using an older puppet (3.0 or older), you might run into this problem.
The problem lies in the database schema for the puppet console. Basically,
every time a node checks in, it inserts a row into the database. The database
has some tables with columns that auto-increment (0, 1, 2, 3, etc). If you have
a lot of nodes reporting back frequently, this number will likely increase a
lot over time. In our case, we have 333 nodes reporting at least every 30
minutes (we do development and thus we often manually run puppet agent with the
-t switch). At that rate, hitting 37,000 would have taken a little over 2 days
((24*60/30)*333 = 15,984 check-ins per day).
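
As a quick sanity check on that estimate, here is a minimal sketch of the arithmetic. It assumes every node reports exactly once per 30-minute interval, so it gives a lower bound (manual `puppet agent -t` runs only add to it). The function and variable names are made up for illustration.

```python
# Rough check-in arithmetic, assuming each node reports exactly once per interval.
NODES = 333
INTERVAL_MINUTES = 30


def reports_per_day(nodes, interval_minutes):
    """Return how many reports arrive per day for the given fleet size."""
    runs_per_node = (24 * 60) // interval_minutes  # 48 runs per node per day
    return runs_per_node * nodes


per_day = reports_per_day(NODES, INTERVAL_MINUTES)
days_to_37000 = 37_000 / per_day

print(per_day)                   # 15984
print(round(days_to_37000, 2))   # 2.31
```
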
The columns that autoincrement use the int datatype. This datatype, as seen
in the PostgreSQL documentation, uses 4 bytes. In case anyone doesn’t remember,
there are 8 bits in a byte, which means that 4 * 8 = 32 bits. The type is
signed, so one of those bits is spent on the sign. That means that the maximum
number that will fit in any column with the int data type is 2^(32-1) - 1,
which equals 2,147,483,647. That means 2 billion puppet reports. It seems like
a number not easy to achieve, but it is quite possible - we did it.
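
The limit can be reproduced with a few lines of Python. This is only a sketch of the arithmetic for a signed, two's-complement integer of a given width; `signed_max` is a hypothetical helper name, not something from puppet or PostgreSQL.

```python
def signed_max(num_bytes):
    """Largest value a signed two's-complement integer of this width can hold."""
    bits = num_bytes * 8
    # One bit is used for the sign, leaving bits - 1 for the magnitude.
    return 2 ** (bits - 1) - 1


INT_MAX = signed_max(4)  # the 4-byte int type
print(f"{INT_MAX:,}")    # 2,147,483,647
```

The same helper works for other column widths by passing a different byte count.
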
The solution here is to change the data type on the affected columns to
bigint rather than int. Again, as documented by the postgres folks, a bigint
is 8 bytes, which is a 64-bit number. That means the largest value it can hold
is 9,223,372,036,854,775,807 (about 9 quintillion). That said, let’s get to it.