Tuesday, November 20, 2012

Better tracing of Npgsql connection pool usage in the works

Sometimes, Npgsql users receive errors when working with connection pooling. The problem appears when they try to open a new connection and receive the following message: 

"Timeout while getting a connection from pool."

This is caused when an attempt is made to get a connection from a pool which  has all its connections used. Most of the time this problem is a difficult one to track because it generally happens when the system is in production and not in the development phase. Of course. :)

Some time ago, I received a report from Miłosz Kubański about such a problem. I told him I would work in a way to get more information so we could check what was happening.

In order to help us find the problem, I added a little "hack" to Npgsql: Whenever an error while getting a connection from the pool occurs, Npgsql would log a stacktrace of the allocation of all the connections which were in the pool. Theoretically, those connections which were allocated should not be retained and by taking a "snapshot" of those allocations, we could get some tips about which code allocated the connection and check for potential missing releases.

Last week I sent Milosz this modified version of Npgsql so he could give it a try and send me back the log. Yesterday he sent me the log. Inside it, I chose two stacktraces and asked him if would be possible there was any missing close.

Milosz replied saying there was indeed a missing close call. It was an exception inside a very complex system. 

Based on this feedback, I think this "hack" has shown to be very useful in future cases and I'll work to get this change in a "non-hack" status and add it to Npgsql code. I hope this helps Npgsql users to find possible causes for connection pooling problems.

When this feature is available I'll let you know.