How many sockjs connections can one client create?

Currently, it looks like Googlebot is hammering our app. Out of 4.5m sockjs connections during the last 12 days, 2.5m came from Googlebot.

We also see in our logs one real user whose client created up to 200 connections to our servers (the majority of them sockjs) and who, therefore, must be shown a captcha.

  1. So my question is, how many sockjs connections can we expect in a normal client?

  2. We heavily used dynamic imports to cut our bundle size by 50%. One hunch we had was that the dynamic imports add much stress to our servers, which we can move to our CDN if we drop dynamic imports in exchange for doubling our initial bundle size. How many sockjs connections do dynamic imports create? If there are ten files in a dynamically imported route, how many sockjs connections will the client create to fetch those ten files?

  3. Does dynamic import add much stress to the servers, considering that caching is per client and not per file?
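For context on question 2: a dynamic import only defers *when* a module is loaded; the transport it rides on is a separate concern. A minimal sketch of the deferred-loading pattern itself (using a Node built-in as a stand-in for an app chunk; `lazyJoin` is a hypothetical helper):

```javascript
// Sketch: dynamic import defers loading until first use. The module is
// fetched the first time this function runs, then served from the
// module cache on subsequent calls - it is not re-fetched per call.
async function lazyJoin(...parts) {
  const path = await import('node:path'); // stand-in for an app chunk
  return path.join(...parts);
}

lazyJoin('a', 'b').then((p) => console.log(p));
```

Whether each fetch opens a new sockjs connection is exactly what this thread is trying to pin down.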

Our quick/temporary solution is to separate the instance group handling bot traffic from the group handling regular traffic. At the very least, the user app should not be affected.

We are still looking to understand the pros and cons of dynamic imports. We hope to learn whether the cons (the additional load of serving the dynamically imported files through DDP for every user who accesses those pages) far outweigh the pros of including everything in one large bundle that can be cached at the CDN level.

Sharing my 2 cents: dynamic imports are delivered via POST requests so I don’t know why you are concerned about receiving files via DDP.


Thanks for correcting this, @filipenevola. We are currently out of ideas and are stuck on what might be causing these sockjs connections.

Is this a new behavior? Could it be some malicious user pretending to be Google?

Or have you added new pages recently to your sitemap.xml?

We have confirmed that these are Google bots, e.g. Googlebot and Google-Pagerenderer. And their traffic has indeed been gradually increasing over the last 30 days.

We have hundreds of thousands of public pages which we allow to be crawled for SEO purposes. What's different compared to before is the amount of sockjs connections.

We previously took this as a sign that Googlebot had started running our app on some pages. It seems this has now spread to most pages.

Another problem we noticed is that as we increase our server resources, Googlebot just probes the new limits and increases its crawl rate accordingly.

Our backup option, if this becomes too expensive in the end, is to serve Googlebot the cached SSR output without the JS bundle attached.
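A minimal sketch of the bot check that decision would hinge on, assuming a user-agent test in front of the SSR handler (the regex only covers the crawlers mentioned in this thread; extend it for your traffic):

```javascript
// Hedged sketch: decide whether a request comes from a known crawler,
// so the server can return cached SSR HTML without the client bundle.
const BOT_UA = /Googlebot|Google-Pagerenderer|bingbot|AdsBot/i;

function isBot(userAgent) {
  // Treat a missing user agent as a regular user, not a bot.
  return BOT_UA.test(userAgent || '');
}
```

In an Express/Connect-style handler you would branch on `isBot(req.headers['user-agent'])` and skip injecting the main script tag for bots.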

@filipenevola, do you know if dynamic imports are cached on the server? Or are the responses generated on the fly when requested?

Are you using Prerender or something that could cache your pages for a while?

We have self-hosted prerender running for some clients, and it’s excellent for avoiding overloading the server with renders that will ultimately result in the same HTML.

If I remember correctly you are not running at Galaxy, correct?

You could also set up Nginx to rate-limit requests with a Googlebot user agent.
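A sketch of that rate limit, assuming an Nginx reverse proxy in front of the app (the zone size, rate, and upstream name are assumptions to tune for your traffic):

```nginx
# Map Googlebot user agents to a non-empty key; requests with an empty
# key are not rate-limited, so regular users pass through untouched.
map $http_user_agent $is_googlebot {
    default        "";
    ~*Googlebot    $binary_remote_addr;
}

limit_req_zone $is_googlebot zone=googlebot:10m rate=5r/s;

server {
    location / {
        limit_req zone=googlebot burst=10 nodelay;
        proxy_pass http://meteor_app;  # hypothetical upstream
    }
}
```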

I would bet they are not cached, but, similar to my previous reply, if you have control over your proxy you could cache them there, as the responses are deterministic for each version of your bundle.

BTW, did you check that a bunch of POSTs are reaching your server and causing the load? Otherwise, you should ignore dynamic imports as the root cause.
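One quick way to check, assuming an Nginx-style access log (the endpoint `/__meteor__/dynamic-import/fetch` is Meteor's dynamic-import fetch URL; verify it against your own logs, and swap in your real log path for the sample file generated here):

```shell
# Generate a small sample log standing in for a real access log, then
# count how many dynamic-import POSTs it contains.
cat > /tmp/access.log.sample <<'EOF'
1.2.3.4 - - [01/Jan/2024:00:00:01 +0000] "POST /__meteor__/dynamic-import/fetch HTTP/1.1" 200 512
5.6.7.8 - - [01/Jan/2024:00:00:02 +0000] "GET /sockjs/info HTTP/1.1" 200 128
9.9.9.9 - - [01/Jan/2024:00:00:03 +0000] "POST /__meteor__/dynamic-import/fetch HTTP/1.1" 200 512
EOF

grep -c 'POST /__meteor__/dynamic-import/fetch' /tmp/access.log.sample
```

If that count is small relative to your sockjs traffic, dynamic imports are unlikely to be the root cause.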

Also, you could run a different group of containers where you only have static imports (if that is the cause).


Something similar. We cache the output of our SSR, which we serve to both bots and real users. We are now thinking of another SSR copy without the main JS script, just for bots.

Correct, we are not on Galaxy; we host on AWS. Limiting requests was an option, but it was a dilemma for us since we also get substantial high-value SEO traffic to this app.

As of now, we are trying to split the traffic so we have more control while ensuring that real-user traffic is not affected.
