Application hangs when a replica set node goes down

NatalliaF

I have a replica set of 2 nodes and an arbiter (all on different machines).
The Node.js (Express) app exposes a number of APIs that read, update, and otherwise work with information both in the MongoDB database and in a database managed by another DBMS.
If I forcefully terminate one of the nodes, the Mongo Shell shows that the other node is running and is primary, and I can work with data in the shell without problems. But when I call any API of the application, even one that does not use MongoDB, the request seems to go nowhere: no response ever comes back, and the request just hangs. It feels as if the application itself has frozen. As soon as the node is restored, the application “wakes up” and returns a response.
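The Mongo Shell observation is consistent with how replica set elections work: a primary needs votes from a strict majority of the voting members. A small worked check of that arithmetic for the topology described above, assuming all three members keep the default of one vote each (the member names are placeholders):

```javascript
// Votes needed to elect a primary: a strict majority of all voting members.
function votesNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Two data-bearing nodes plus an arbiter, each assumed to have one vote.
const members = [
  { name: 'node1', votes: 1, up: false }, // the forcibly terminated node
  { name: 'node2', votes: 1, up: true },
  { name: 'arbiter', votes: 1, up: true },
];

const totalVotes = members.reduce((sum, m) => sum + m.votes, 0);
const reachableVotes = members
  .filter((m) => m.up)
  .reduce((sum, m) => sum + m.votes, 0);
const canElectPrimary = reachableVotes >= votesNeeded(totalVotes);

console.log({ totalVotes, needed: votesNeeded(totalVotes), reachableVotes, canElectPrimary });
```

With the arbiter, two of three votes remain, so node2 can be elected primary. Without the arbiter, a two-member set would be left with one of two votes and no primary. Since the database side recovers on its own, the hang points at something in the application.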

const express = require('express');
const router = express.Router();

router.get('/apiName', async (req, res) => {
  // Execution never even reaches this line
  // ...
});
What could be the reason?
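Because even routes that never touch MongoDB hang and the handler is never entered, the stall is probably in something that runs before the router on every request. One way to narrow it down is a tracing middleware registered first, before any other app.use call. A minimal sketch; requestTracer is a hypothetical helper, not an Express API:

```javascript
// Hypothetical helper: logs when a request enters the middleware chain
// and when its response finishes. Register it before any other middleware.
function requestTracer(log = console.log) {
  let nextId = 0;
  return function traceRequests(req, res, next) {
    const id = ++nextId;
    const start = Date.now();
    // originalUrl is set by Express; fall back to url for plain objects.
    log(`[${id}] -> ${req.method} ${req.originalUrl || req.url}`);
    // 'finish' fires once the response has been handed off to the OS.
    res.on('finish', () => {
      log(`[${id}] <- ${res.statusCode} after ${Date.now() - start}ms`);
    });
    next();
  };
}
```

Mounted with app.use(requestTracer()) at the top of the stack, a request that logs "->" but never "<-" and never reaches the handler is stuck in middleware registered between the tracer and the router. Adding a second tracer further down the stack narrows it to a single middleware.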

mongoose 7.4.1
node 14.17.1
express 4.17.1

I tried the following connection strings:

mongoURI = "mongodb://[login]:[password]@[node1_ip]:27017,[node2_ip]:27017/[db_name]?authSource=admin&replicaSet=[rs_name]&readPreference=primaryPreferred&directConnection=false"

mongoURI = "mongodb://[login]:[password]@[node1_ip]:27017,[node2_ip]:27017/[db_name]?authSource=admin&readPreference=primaryPreferred&directConnection=false"

mongoURI = "mongodb://[login]:[password]@[node1_ip]:27017,[node2_ip]:27017/[db_name]?connectTimeoutMS=30000&authSource=admin&replicaSet=[rs_name]&readPreference=primaryPreferred&directConnection=false"

Database connection in my code:
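await mongoose.connect(mongoURI, {
  // useNewUrlParser and useUnifiedTopology have no effect since Mongoose 6
  useNewUrlParser: true,
  useUnifiedTopology: true,
  maxPoolSize: 10,
  // How long an operation waits for a suitable server before failing
  serverSelectionTimeoutMS: 30000,
});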

The only thing that helps in this situation is removing the inactive node from the replica set, without touching the database connection string. But I need the application to recover automatically.

sean

NatalliaF, when the application is hanging, you can inspect its processes to see what they might be doing. First, run the following command to get the IDs of the Node processes:

pgrep -f node

Once you have the process IDs, run the following command, replacing XXX and YYY with those IDs:

strace -p XXX -p YYY

You'll then see the system calls the app is making, which may give you some indication of what the problem is.
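strace is only available on Linux. As a complementary in-process check (an addition, not something suggested in the thread), Node's built-in perf_hooks.monitorEventLoopDelay can tell two kinds of hang apart: an event loop blocked by synchronous work versus an idle loop whose requests are awaiting something that never settles. A minimal sketch; startLoopMonitor and its options are hypothetical names:

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Hypothetical helper: periodically logs event-loop delay statistics.
// Returns a function that stops the monitor.
function startLoopMonitor({ resolution = 20, intervalMs = 5000, log = console.log } = {}) {
  // Samples the loop every `resolution` ms; values are recorded in nanoseconds.
  const histogram = monitorEventLoopDelay({ resolution });
  histogram.enable();

  // Convert nanoseconds to a readable string; mean is NaN before any sample.
  const format = (ns) => {
    const ms = ns / 1e6;
    return Number.isFinite(ms) ? `${ms.toFixed(1)}ms` : 'n/a';
  };

  const timer = setInterval(() => {
    log(`event loop delay: mean=${format(histogram.mean)} max=${format(histogram.max)}`);
    histogram.reset(); // start a fresh window for the next interval
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for monitoring

  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

Call startLoopMonitor() once at startup. If these lines stop appearing while the app hangs, the loop is blocked by synchronous work; if they keep appearing with small delays, the process is idle and the stuck requests are most likely waiting on a callback or promise that never completes.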

NatalliaF

Thank you. Process Monitor shows that when all nodes of the replica set are alive, the app's operations are "TCP TCPCopy", "TCP Send", and "TCP Receive", and all of them result in "SUCCESS". When one node is down, one more operation appears in addition to these: "TCP Reconnect". All "TCP Reconnect" operations also result in "SUCCESS". Unfortunately, this gives me no indication of what the problem might be (maybe my knowledge isn't enough to analyze this information).

sean

NatalliaF, please email Opalstack support with your Opalstack account and app details, and we'll hop on your server and take a look.

NatalliaF

It turned out to be the «connect-mongo» library.
Has anyone else run into this?
I’ll dig further.
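connect-mongo is a session store for express-session, which is typically mounted with app.use in front of every route, so a session lookup that never completes would explain why even non-MongoDB endpoints hung. Whatever the root cause, a request should fail fast rather than hang. A minimal sketch of a deadline wrapper for any Express-style middleware; withDeadline is a hypothetical helper, and the 503 status is an arbitrary choice:

```javascript
// Hypothetical wrapper: if `middleware` has not called next() within `ms`,
// pass a timeout error down the chain instead of letting the request hang.
function withDeadline(middleware, ms) {
  return function deadlineGuard(req, res, next) {
    let settled = false;

    const timer = setTimeout(() => {
      // Skip if the wrapped middleware already answered the request itself.
      if (settled || res.headersSent) return;
      settled = true;
      const err = new Error(`middleware did not finish within ${ms}ms`);
      err.status = 503;
      next(err);
    }, ms);

    const done = (err) => {
      if (settled) return; // the deadline already fired; ignore the late callback
      settled = true;
      clearTimeout(timer);
      next(err);
    };

    try {
      middleware(req, res, done);
    } catch (err) {
      done(err);
    }
  };
}
```

For example, app.use(withDeadline(session(sessionOptions), 5000)) followed by an Express error handler that responds with err.status (sessionOptions is a placeholder). This only turns a hang into a fast error. It may also be worth checking (an assumption, since the session setup isn't shown) whether connect-mongo opens its own connection via mongoUrl with a connection string that differs from mongoURI; its clientPromise option lets it reuse the Mongoose client instead, e.g. mongoose.connect(mongoURI).then((m) => m.connection.getClient()).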