Googlebot crawling sockjs endpoints

Recently, we reviewed Googlebot's crawl stats in Search Console and discovered that 16% of our crawled URLs return 404s. When we checked those URLs, all of them were SockJS endpoints.

Sample URL:
https://domain.com/sockjs/539/i2a1mual/xhr_send

We are looking to disallow crawling for all URLs under /sockjs/*.
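Concretely, the rule we have in mind looks something like this (a minimal robots.txt sketch; Google treats Disallow values as prefix matches, so the trailing * is optional):

# Keep Googlebot out of the SockJS transport endpoints
User-agent: Googlebot
Disallow: /sockjs/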

Posting here in case anybody has experience with this, and to ask whether blocking Googlebot from these URLs will have an adverse effect on how it accesses our app (SEO traffic).


I wonder if you get XHR errors in your logs as well, and if those are actually coming from said Googlebot.

We do get them sporadically (39 errors in the last 24h, 1050 in the last month), but it's still annoying. I posted about them in another thread, and no one could identify the root cause of those XHR errors from GET/POST requests on sockjs:

Example:

XHR error GET https://subdomain.domain/sockjs/info?cb=fb0utplyux

{
	"id": "AQAAAXi2t9cteb9cowAAAABBWGkydUJkb0FBQktoY19ZOC1hSE53QUI",
	"content": {
		"timestamp": "2021-04-09T13:00:04.781Z",
		"tags": [
			"sdk_version:2.6.0",
			"source:browser"
		],
		"message": "XHR error GET https://subdomain.domain/sockjs/info?cb=fb0utplyux",
		"attributes": {
			"http": {
				"status_code": 0,
				"url_details": {
					"path": "/sockjs/info",
					"scheme": "https",
					"host": "subdomain.domain",
					"queryString": {
						"cb": "fb0utplyux"
					}
				},
				"method": "GET",
				"useragent_details": {
					"os": {
						"family": "Windows",
						"major": "10"
					},
					"browser": {
						"family": "Electron",
						"patch": "3",
						"major": "10",
						"minor": "1"
					},
					"device": {
						"family": "Other",
						"category": "Desktop"
					}
				},
				"useragent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) yourDNA.family/3.25.8 Chrome/85.0.4183.121 Electron/10.1.3 Safari/537.36",
				"url": "https://subdomain.domain/sockjs/info?cb=fb0utplyux"
			},
			"network": {
				"client": {
					"geoip": {
						"subdivision": {
							"name": "Florida",
							"iso_code": "FL"
						},
						"continent": {
							"code": "NA",
							"name": "North America"
						},
						"country": {
							"name": "United States",
							"iso_code": "US"
						},
						"city": {
							"name": "St. Petersburg"
						},
						"ipAddress": "65.32.120.21"
					},
					"ip": "65.32.120.21"
				}
			},
			"date": 1617973204781,
			"view": {
				"referrer": "",
				"url_details": {
					"path": "/sign-in",
					"scheme": "meteor",
					"host": "desktop"
				},
				"referrer_details": {
					"path": ""
				},
				"url": "meteor://desktop/sign-in"
			},
			"error": {
				"origin": "network",
				"stack": "Failed to load"
			},
			"status": "error"
		}
	}
}

But then again, it rather looks like these are coming from our users, not from Googlebot (see the location info).

Not currently seeing these errors.

Are you using anything to prerender results for the bots? For example, https://prerender.io/

We have SSR implemented.

Just an update… our 404 URLs dropped as soon as we blocked the sockjs URLs.


So it was indeed Googlebot, and you added the rule to robots.txt?

Yes, through robots.txt.
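For anyone hitting the same thing: in a Meteor app, a static robots.txt dropped into the public/ directory is served from the site root. You can sanity-check your rule before and after deploying with Python's built-in robotparser (a quick sketch reusing the sample URL from the first post; swap in your real host):

from urllib import robotparser

# Load the live robots.txt (domain.com is the placeholder host from the first post)
rp = robotparser.RobotFileParser()
rp.set_url("https://domain.com/robots.txt")
rp.read()

# Should print False once the Disallow: /sockjs/ rule is live
print(rp.can_fetch("Googlebot", "https://domain.com/sockjs/539/i2a1mual/xhr_send"))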