Error: read ECONNRESET

I am suddenly getting this error and there is no way I can trace the issue. My server keeps crashing again and again, and I haven't changed anything in months.

events.js:288
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:205:27)
Emitted 'error' event on Socket instance at:
    at emitErrorNT (internal/streams/destroy.js:92:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
    at processTicksAndRejections (internal/process/task_queues.js:84:21) {
  errno: 'ECONNRESET',
  code: 'ECONNRESET',
  syscall: 'read'
}

Is anyone else facing this issue, or does anyone have a solution for the crash?

Did you solve this yet by any chance?

I get this too and I have no clue why yet.

2022-09-07 19:45:00-04:00 events.js:377
2022-09-07 19:45:00-04:00     throw er; // Unhandled 'error' event
2022-09-07 19:45:00-04:00     ^
2022-09-07 19:45:00-04:00
2022-09-07 19:45:00-04:00 Error: read ECONNRESET
2022-09-07 19:45:00-04:00     at TCP.onStreamRead (internal/stream_base_commons.js:209:20)
2022-09-07 19:45:00-04:00 Emitted 'error' event on Client instance at:
2022-09-07 19:45:00-04:00     at Client._handleErrorEvent (/app/bundle/programs/server/npm/node_modules/pg/lib/client.js:319:10)
2022-09-07 19:45:00-04:00     at Connection.emit (events.js:400:28)
2022-09-07 19:45:00-04:00     at Connection.emit (domain.js:475:12)

It seems like a connection issue with the DB. Are you using Mongo or Postgres? I see pg/lib/client.js in that stack trace.

This is a really good clue! Looking at my server logs, I see a lot of DB access in the same second:

2022-09-07 19:44:00-04:00 SyncedCron: Finished "cron_job_to_let_users_know_they_have_been_approved_and_need_to_onboard_with_payroll".
2022-09-07 19:44:00-04:00 Executing (default): UPDATE "cron_job_runs" SET "finishedAt"=$1,"result"=$2,"updatedAt"=$3 WHERE "id" = $4 RETURNING "id","createdAt","updatedAt","error","finishedAt","intendedAt","name","result","startedAt"
2022-09-07 19:45:00-04:00 Executing (default): INSERT INTO "cron_job_runs" ("createdAt","updatedAt","intendedAt","name","startedAt") VALUES ($1,$2,$3,$4,$5) RETURNING "id","createdAt","updatedAt","error","finishedAt","intendedAt","name","result","startedAt";
2022-09-07 19:45:00-04:00 Executing (default): INSERT INTO "cron_job_runs" ("createdAt","updatedAt","intendedAt","name","startedAt") VALUES ($1,$2,$3,$4,$5) RETURNING "id","createdAt","updatedAt","error","finishedAt","intendedAt","name","result","startedAt";
2022-09-07 19:45:00-04:00 SyncedCron: Starting "cron_job_to_find_users_who_have_gone_offline".
2022-09-07 19:45:00-04:00 Executing (default): INSERT INTO "cron_job_runs" ("createdAt","updatedAt","intendedAt","name","startedAt") VALUES ($1,$2,$3,$4,$5) RETURNING "id","createdAt","updatedAt","error","finishedAt","intendedAt","name","result","startedAt";
2022-09-07 19:45:00-04:00 SyncedCron: Starting "cron_job_to_retrieve_data_from_api".
2022-09-07 19:45:00-04:00 Executing (default): UPDATE "userData" SET "user_presence"=$1,"updatedAt"=$2 WHERE (("user_presence" = $3 OR "user_presence" = $4) AND "user_presence_time_of_last_update" < '2022-09-07 23:25:00.180 +00:00') RETURNING "id","epUserNumber","createdAt","updatedAt","admin_status","braintree_customer_id","credit_card_info_has_been_accepted","deviceData","email","gender","home_city","home_postcode","home_state","home_street","name_first","name_last","name_title","nonce","phone","picture_large","picture_medium","picture_thumbnail","rating","time_of_nonce","ucare_uuid_main_photo","user_presence","user_presence_time_of_last_update","user_presence_timezone","accepted_user_terms_and_privacy_terms","date_accepted_user_terms_and_privacy_terms"
2022-09-07 19:45:00-04:00 Executing (default): SELECT "id", "epAdvisorNumber", "createdAt", "updatedAt", "image_name", "street_address", "city", "home_state", "zip", "gender", "education", "userId", "name_first", "name_last", "ucare_uuid_main_photo", "ucare_uuid_drivers_license", "ucare_uuid_current_Paystub_1", "ucare_uuid_current_Paystub_2", "yearsNursing", "license_number", "license_number_michigan", "license_number_pennsylvania", "license_type", "licenses_all", "bio", "careerHighlights", "myGoalsForYourCare", "name_mi", "phone_1", "specialties", "nursing_certification", "this_nurse_is_cgi", "has_signed_fcra_auth", "has_signed_fcra_disclosure", "date_signed_the_two_fcra_docs", "nursys_response_status", "nursys_response_manage_nurses_full_xml", "nursys_submit_lookup_transaction_id", "nursys_response_retrieve_lookup_full_xml", "ssn_all_digits", "birth_year", "retired", "approval_status", "approval_status_concerns", "hourly_rate_deduced_from_paystub", "hourly_rate_ep_is_paying_this_nurse", "availability_status", "let_existing_patients_im_me_when_on_vacation", "vacation_start_date", "date_returning_from_vacation", "accepted_nurse_terms", "date_accepted_nurse_terms" FROM "advisors" AS "advisors" WHERE ("advisors"."nursys_response_status" >= 100 AND "advisors"."nursys_response_status" < 300);
2022-09-07 19:45:00-04:00 SyncedCron: Starting "cron_job_to_let_users_know_they_have_been_approved_and_need_to_onboard_with_payroll".
2022-09-07 19:45:00-04:00 Executing (default): SELECT "advisors"."id", "advisors"."userId", "advisors"."epAdvisorNumber", "advisors"."approval_status", "advisors"."approval_status_concerns", "advisors"."name_first", "advisors"."name_last", "advisors"."hourly_rate_ep_is_paying_this_nurse", "advisors"."hourly_rate_deduced_from_paystub", "userData"."id" AS "userData.id", "userData"."epUserNumber" AS "userData.epUserNumber", "userData"."createdAt" AS "userData.createdAt", "userData"."updatedAt" AS "userData.updatedAt", "userData"."admin_status" AS "userData.admin_status", "userData"."braintree_customer_id" AS "userData.braintree_customer_id", "userData"."credit_card_info_has_been_accepted" AS "userData.credit_card_info_has_been_accepted", "userData"."deviceData" AS "userData.deviceData", "userData"."email" AS "userData.email", "userData"."gender" AS "userData.gender", "userData"."home_city" AS "userData.home_city", "userData"."home_postcode" AS "userData.home_postcode", "userData"."home_state" AS "userData.home_state", "userData"."home_street" AS "userData.home_street", "userData"."name_first" AS "userData.name_first", "userData"."name_last" AS "userData.name_last", "userData"."name_title" AS "userData.name_title", "userData"."nonce" AS "userData.nonce", "userData"."phone" AS "userData.phone", "userData"."picture_large" AS "userData.picture_large", "userData"."picture_medium" AS "userData.picture_medium", "userData"."picture_thumbnail" AS "userData.picture_thumbnail", "userData"."rating" AS "userData.rating", "userData"."time_of_nonce" AS "userData.time_of_nonce", "userData"."ucare_uuid_main_photo" AS "userData.ucare_uuid_main_photo", "userData"."user_presence" AS "userData.user_presence", "userData"."user_presence_time_of_last_update" AS "userData.user_presence_time_of_last_update", "userData"."user_presence_timezone" AS "userData.user_presence_timezone", "userData"."accepted_user_terms_and_privacy_terms" AS "userData.accepted_user_terms_and_privacy_terms", "userData"."date_accepted_user_terms_and_privacy_terms" AS "userData.date_accepted_user_terms_and_privacy_terms" FROM "advisors" AS "advisors" LEFT OUTER JOIN "userData" AS "userData" ON "advisors"."userId" = "userData"."id" WHERE "advisors"."approval_status" = 400 ORDER BY "advisors"."id" ASC;
2022-09-07 19:45:00-04:00 SyncedCron: Finished "cron_job_to_retrieve_data_from_api".
2022-09-07 19:45:00-04:00 Executing (default): UPDATE "cron_job_runs" SET "finishedAt"=$1,"result"=$2,"updatedAt"=$3 WHERE "id" = $4 RETURNING "id","createdAt","updatedAt","error","finishedAt","intendedAt","name","result","startedAt"
2022-09-07 19:45:00-04:00 usersWhoWereUpdated.length: 2
2022-09-07 19:45:00-04:00 SyncedCron: Finished "cron_job_to_find_users_who_have_gone_offline".
2022-09-07 19:45:00-04:00 Executing (default): UPDATE "cron_job_runs" SET "finishedAt"=$1,"result"=$2,"updatedAt"=$3 WHERE "id" = $4 RETURNING "id","createdAt","updatedAt","error","finishedAt","intendedAt","name","result","startedAt"
2022-09-07 19:45:00-04:00 events.js:377
2022-09-07 19:45:00-04:00     throw er; // Unhandled 'error' event
2022-09-07 19:45:00-04:00     ^
2022-09-07 19:45:00-04:00
2022-09-07 19:45:00-04:00 Error: read ECONNRESET
2022-09-07 19:45:00-04:00     at TCP.onStreamRead (internal/stream_base_commons.js:209:20)
2022-09-07 19:45:00-04:00 Emitted 'error' event on Client instance at:
2022-09-07 19:45:00-04:00     at Client._handleErrorEvent (/app/bundle/programs/server/npm/node_modules/pg/lib/client.js:319:10)

These are all running as async/await background tasks. Could I have exceeded some sort of limit on simultaneous Postgres connections?

Also, to answer your question: I'm running Mongo (hosted on mongodb.com) for Meteor Accounts, and Postgres on AWS for everything else, accessed via Apollo GraphQL. My app is hosted on a Galaxy professional plan with almost no clients using it, as I'm in soft beta and haven't launched yet.

Galaxy support agrees that it seems like a DB connection issue. I have a support ticket in to AWS to see what they can tell me. I’ll post what I learn here.

I got a fantastic response from AWS tech support about this. If I were giving out awards, this would win for best tech support response of the year. Note: I legit mean it - I’m not being snarky.

It’s so helpful that I’m including the whole thing here:

Dear Customer,

Warm Greetings from AWS Premium Support. Hope you are safe and doing well!!

This is Vidit here from the AWS premium support team and I will be assisting you with your case today.

I understand that the connection to your RDS PostgreSQL instance was dropped unexpectedly and you want to know why the issue occurred. Please correct my understanding if required.

So, to assist you with your query, I looked into the logs you provided, as well as the database metrics and network connectivity to your instance. Everything looked fine, and the issue was not due to a limit on max connections.

To further analyze the root cause, I reached out to the internal team and found that the issue was: “could not receive data from client: Connection reset by peer”.

No issue was observed on the RDS infrastructure end at the given timestamp that could be pointed to as the root cause. Also, regarding the error:

i.e., “could not receive data from client: Connection reset by peer” → This is usually not a database issue, rather because your client application dropped the connection to the database and postgres logged it.

This might also happen when a client connection is not terminated gracefully.

I request you to kindly look for connections which are ‘idle in transaction’ and if you see that there are several such connections, I suggest you to tweak your timeout parameters on the client side, to make sure the connections are gracefully closed if they’re idle for a long time.

You can check for ‘idle in transaction’ connections via the below query:

select * from pg_stat_activity where (state = 'idle in transaction') and xact_start is not null;

Below is a link which clearly states that in order to troubleshoot the issue, the settings from the application side should be modified and if possible check the closing of the connection:

[+] postgresql - Postgres could not receive data from client: Connection timed out - Database Administrators Stack Exchange

The same is discussed in the third-party forum link below for community PostgreSQL:

Why am I seeing "could not receive data from client: Connection reset by peer" in my logs? - Heroku Help

You can also make use of the ‘idle_in_transaction_session_timeout’ parameter in RDS PostgreSQL, which is set to 1 day by default. This parameter helps to terminate any session with an open transaction that has been idle for longer than the specified duration in milliseconds.

More suggestions on the message: to resolve the "connection reset by peer" issue, you can look into tuning your application-side connection timeout settings and/or set the parameters below to keep TCP connections alive at the RDS end.

tcp_keepalives_idle (number of seconds of inactivity after which TCP should send a keepalive message to the client.)

tcp_keepalives_interval (number of seconds after which a TCP keepalive message that is not acknowledged by the client should be retransmitted.)

tcp_keepalives_count (number of TCP keepalives that can be lost before the server’s connection to the client is considered dead.)

Once you've set up the parameters, you can run the query below as the master user to verify the changes are applied.

select name, setting from pg_settings where name like '%tcp%';

I will also recommend adding the log_connections and log_disconnections parameters, to track successful and failed connections.

I hope that this information is helpful, please feel free to reach back to me if you require further clarification or request on this case and I will be more than happy to assist you further.

Have a Great Day Ahead !
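A note on the idle_in_transaction_session_timeout parameter mentioned above: on RDS it's changed through the parameter group, but on a plain Postgres session it can be inspected and set directly (the value is in milliseconds). A sketch:

```sql
-- Show the current value (0 means the timeout is disabled)
SHOW idle_in_transaction_session_timeout;

-- Terminate sessions idle inside an open transaction for more than 60 s
SET idle_in_transaction_session_timeout = '60000';
```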

I’m using Sequelize, and found their page for tweaking these params here.

I’ll be studying them. Has anyone here tweaked these params before, e.g. pool.idle?

I'm starting out with a Galaxy Compact Pro (512MB) container, and looking for the best Sequelize settings for accessing Postgres. The defaults are:

pool: {
    max: 5,
    min: 0,
    acquire: 30000,
    idle: 10000
}

With the above I’m getting ECONNRESET errors. So I’m thinking about something like this:

pool: {
    max: 50,
    min: 0,
    acquire: 30000,
    idle: 5000
}

Researching this, I just found this:

As stated in the official documentation, Sequelize sets up a connection pool on initialization that can be configured through the constructor’s options parameter.

By default, this pool is limited to 5 connections. This means that your application will use no more than five connections at a time, no matter how many requests it gets. As explained before, if six users concurrently hit any of your APIs that need to connect to the database, one of them will be queued and have to wait.

It’s difficult to estimate the number of connections that will be needed prior to launch. I guess I just have to watch what happens and adjust as I go – is that correct?

I’d been thinking that all this kind of thing would be handled by auto scaling, but I guess this is something I have to tweak independently, since autoscaling can’t change my Sequelize settings.

Observations

  • The connection pool settings are, at least in Sequelize, provided by the app, so they are outside the control of auto-scaling.
  • Auto-scaling the database server and/or the app host therefore can't fix this.
  • The maximum number of connections apparently has to be tuned over time by the developer, by watching how many connections the server needs and updating the pool's max setting to keep up.

Questions

  • Is there a Sequelize competitor that automatically tunes and updates the max number of connections?
  • Is there a big memory hit if I set the Sequelize max number of connections to something super-high, like 5000?
  • Do other kinds of databases, e.g. Mongo, have ways of avoiding the need for a max number of connections?

I think I got the answer to this one. I increased from 50 to 5000 max connections with no change in memory usage as reported by Galaxy.

Have you confirmed that you have actually reached the 5000 max connections? If not, then you haven't yet confirmed whether memory limits might be hit before reaching 5000 connections.

Great point! Yes, I may well run out of memory as those connections get allocated.

With 5000 max connections on my local dev system I started getting some errors that I’m not yet familiar with:

remaining connection slots are reserved for non-replication superuser connections

…so I dialed it back down to 50 until I know more.
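For what it's worth, that message is Postgres itself refusing new connections: the server-wide max_connections limit (minus the slots reserved for superusers) was exhausted, so the client pool really was asking for more connections than the server allows. The server-side limits can be checked with:

```sql
SHOW max_connections;                 -- server-wide connection limit
SHOW superuser_reserved_connections;  -- slots held back for superusers
```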

Hmmmm… I updated from the default of:

pool: {
    max: 5,
    min: 0,
    acquire: 30000,
    idle: 10000
}

…to:

pool: {
    max: 50,
    min: 0,
    acquire: 30000,
    idle: 5000
}

…and I still got an ECONNRESET error, on my Galaxy production server.

Maybe I have two situations to deal with – making sure I have:

  • Enough max connections
  • Enough idle time, which is the time they can be idle before being terminated.

But I don’t think I have any functions that keep a connection open and idle for more than 5 seconds.

Well, as a test, I just updated to:

  "pool": {
    "max": 50,
    "min": 0,
    "acquire": 30000,
    "idle": 30000
  },

This is a major hassle. :slight_smile: If anyone has any thoughts/speculations/suggestions, please let me know.

Do you use IP Whitelisting on Galaxy?

Yes, I do have IP Whitelisting on Galaxy. Database access works usually but every few days or weeks I get this ECONNRESET error and then I get the dreaded “app unavailable” email from Galaxy.

We have recently changed the whitelisted IPs. Did you make the change? The new IPs are listed under your app's Settings tab.

Maybe this is what is causing the issue

Thanks for asking! Yes, I updated the IP addresses and confirmed with Galaxy tech support that everything was in order.

I’m still studying and learning about this.

My understanding of pool.max was not complete. The developer quoted above said:

This means that your application will use no more than five connections at a time, no matter how many requests it gets. As explained before, if six users concurrently hit any of your APIs that need to connect to the database, one of them will be queued and have to wait.

…but per the Postgres wiki, pool.max is also related to the number of processors you have:

How to Find the Optimal Database Connection Pool Size

A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((core_count * 2) + effective_spindle_count)

This explains why my first plan of just putting a big number into pool.max didn’t work.

I heard back again from AWS tech support:

So, to help you with your query, I would like to mention that increasing max_connections would not be much help for your specific use case: as I found previously, the issue was not due to the connection limit being reached, but due to your client application dropping the connection, which can happen when the connection has been idle for a long time and gets timed out.

So I need to focus more on pool.idle. I haven’t seen an ECONNRESET error since I increased that from 5000 to 30000, so that’s encouraging, but not dispositive since it can be a very intermittent error.

Per AWS tech support, I’ve also updated some settings on the AWS RDS side:

tcp_keepalives_idle (number of seconds of inactivity after which TCP should send a keepalive message to the client.)

tcp_keepalives_interval (number of seconds after which a TCP keepalive message that is not acknowledged by the client should be re transmitted.)

I’ve set tcp_keepalives_idle to 10 and tcp_keepalives_interval to 5.

I’ll update this thread when I know more.

I just got my first ECONNRESET error in 21 days. So the new settings appear to have made the error less likely, but not solved it yet. I’ll update settings for sequelize and for Amazon RDS and try again. Once I pick the new settings I’ll post them here.

Trying this next:

let sequelize = new Sequelize(dbName, dbUsername, dbPassword, {
    host: dbHost,
    ...
    retry: {
        match: [
            /ETIMEDOUT/,
            /ESOCKETTIMEDOUT/,
            /EHOSTUNREACH/,
            /ECONNRESET/,
            /ECONNREFUSED/,
            /EPIPE/,
            /EAI_AGAIN/,
            /SequelizeConnectionError/,
            /SequelizeConnectionRefusedError/,
            /SequelizeHostNotFoundError/,
            /SequelizeHostNotReachableError/,
            /SequelizeInvalidConnectionError/,
            /SequelizeConnectionTimedOutError/
        ],
        max: 5
    }
});

The server is most probably crashing because it ran out of RAM. This is very common. Mongo uses all the available RAM on the machine, so you should also set up a swap. AWS is an interesting setup that will drain your pocket, because they charge excessively for their resources. If you don't want to be in this pinch, get your own server, put 8GB or 16GB of RAM in it, add a 2GB swap, and you'll be good to go.
