Prerender.io and index.html (where is this index.html coming from?)


#1

Hi,

Random query, but it’s something I’m having a spot of bother with.

I’m using Meteor, Galaxy, Blaze and Prerender (standalone, not the Galaxy one).

Google is indexing my pages, which is great, but it’s hitting prerender.io with URLs suffixed with index.html, which don’t exist. Prerender then renders and caches these as 404 pages, because my application routes to a 404 page (I’m using FlowRouter) for any route that doesn’t exist.

Can anyone help me understand why and where these index.html pages are coming from? I’d understand if it were an Apache/Nginx box defaulting to index.html etc., but not in this setup.

So this URL:

https://www.mysite.com/myurl/structure/is/this

is being cached by Prerender and then indexed by Google as:

https://www.mysite.com/myurl/structure/is/this/index.html
https://www.mysite.com/myurl/structure/is/index.html
https://www.mysite.com/myurl/structure/index.html
https://www.mysite.com/myurl/index.html
https://www.mysite.com/index.html

Is this some sort of Galaxy/Prerender thing … or is the Google search bot just making things up? What can I do to stop these index.html pages?!

Thank you for any replies.
Flange


#2

Hey @flange, I have the same project setup as you: Meteor, Galaxy, Blaze, FlowRouter, Prerender (not the Galaxy prebuilt one). We also experienced this when we launched our website (we also got a lot of URLs with /privacy-policy appended to the end). We found out that this wasn’t a Prerender problem but a Google one. What we did was add a robots.txt file that excludes these types of URLs (I’m not sure how the Google Search Console side works, but if I’m not wrong you have to submit your robots.txt file there).

I hope this helps you a little, but this is for sure not a Prerender problem.
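As a rough illustration of that kind of exclusion, a robots.txt along these lines might work. It is only a sketch: the patterns are guessed from the index.html URLs in the first post, and it relies on Googlebot's support for the `*` wildcard and the `$` end-of-URL anchor. In a Meteor app, a file placed in the `public/` folder is served from the site root, so `public/robots.txt` ends up at `/robots.txt`.

```text
# Sketch only: patterns are assumed from the URLs in this thread.
User-agent: *
# Block /index.html at the site root
Disallow: /index.html$
# Block .../index.html at any depth
Disallow: /*/index.html$
```

Note that a Disallow rule stops crawling; URLs that are already indexed may take a while to drop out.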


#3

Heya.

Thanks for the reply. Good to have a second opinion that Google is actually the culprit. I’ve got a good direction now to try and sort this out.

This must be a bit of a problem for everyone using a similar setup, no?

I guess from a bot’s point of view, if you go seeking out default Apache/Nginx/other index.htm/index.html pages at the root of each directory, you’re going to come up with something to ‘index’. It puts Google in the realm of a bad bot, though, where this kind of setup (client-side router, no SSR for the 404) is concerned.
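One way to keep the 404 template from being cached as a normal page might be to have the client-side 404 handler spot the index.html suffix and emit Prerender's status-code meta tags, so the cached result is a 301 to the real URL. The sketch below is an assumption, not something tested on this stack: `stripIndexSuffix` and `prerenderRedirectTags` are hypothetical names, and the `prerender-status-code` / `prerender-header` meta tags should be checked against the Prerender docs before relying on them.

```javascript
// Hypothetical helpers for illustration; none of these names come from
// Meteor, FlowRouter, or Prerender.

// Matches a trailing "/index.html" or "/index.htm" path segment.
const INDEX_SUFFIX = /\/index\.html?$/i;

// Returns the path without the trailing index file, or the path unchanged.
function stripIndexSuffix(path) {
  if (!INDEX_SUFFIX.test(path)) return path;
  const stripped = path.replace(INDEX_SUFFIX, '');
  // The site root must stay "/" rather than become an empty string.
  return stripped === '' ? '/' : stripped;
}

// Builds the meta tags asking Prerender to cache a 301 to the real page.
// Returns an empty array when the path has no index suffix.
function prerenderRedirectTags(origin, path) {
  const canonical = stripIndexSuffix(path);
  if (canonical === path) return [];
  return [
    '<meta name="prerender-status-code" content="301">',
    `<meta name="prerender-header" content="Location: ${origin}${canonical}">`,
  ];
}
```

In the 404 route these tags would be added to the document head before Prerender takes its snapshot; for paths without the suffix, a plain `prerender-status-code` of 404 would be the equivalent.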