What do you miss in your spiderable package?


#1

I am working on a revamp of the spiderable package, so I wanted to ask the community for quick feedback on the question in the title.

Currently I am working on:
Replacing PhantomJS with Zombie

Also planned are new settings that can be defined in Meteor.settings:

  • verbose
  • port
  • host (name or ip)
  • allowed bot patterns
  • caching, tmp lifetime
  • precaching (a recurring process caches URLs on a schedule)
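
A minimal sketch of how such settings might be read and applied. Every key name and default below (`spiderable`, `botPatterns`, `cacheLifetimeMs`, `precacheUrls`) is an assumption made for illustration, not the package's actual API:

```javascript
// Hypothetical defaults; all key names are assumptions for illustration only.
const DEFAULTS = {
  verbose: false,
  port: 3000,
  host: 'localhost',                     // name or IP
  botPatterns: ['googlebot', 'bingbot'], // allowed bot patterns
  cacheLifetimeMs: 60 * 60 * 1000,       // tmp file lifetime: one hour
  precacheUrls: [],                      // URLs to cache on a schedule
};

// Merge user-supplied settings over the defaults.
// `settings` is expected to look like { spiderable: { ... } }.
function resolveSpiderableSettings(settings) {
  const user = (settings && settings.spiderable) || {};
  return Object.assign({}, DEFAULTS, user);
}

// Check a User-Agent string against the configured bot patterns.
function isBot(userAgent, config) {
  const ua = String(userAgent || '').toLowerCase();
  return config.botPatterns.some((p) => ua.includes(p.toLowerCase()));
}
```

In a Meteor app this could be called as `resolveSpiderableSettings(Meteor.settings)`, so that any key left out of the settings file falls back to its default.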

#2

A seeded or (if possible) automatic prerun and configurable caching would be two great features.


#3

What exactly do you mean by “seeded”?

I think I would do something like:
Google opens the site - check whether a tmp file is in the cache and not too old - output the tmp file or newly rendered content
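
Roughly, that flow could be sketched as follows. This is only an illustration under assumptions: the cache is a plain in-memory `Map` rather than tmp files, and `render` is a hypothetical function that produces the page HTML (for example via a headless browser):

```javascript
// Return cached HTML for `url` if it is fresh enough, otherwise render it anew.
// `cache` is a Map of url -> { html, storedAt }.
// `render` is a hypothetical function (url) => html.
function getSnapshot(url, cache, maxAgeMs, render, now = Date.now()) {
  const entry = cache.get(url);
  if (entry && now - entry.storedAt <= maxAgeMs) {
    return entry.html; // cache hit that is not too old
  }
  const html = render(url); // cache miss or stale entry
  cache.set(url, { html, storedAt: now });
  return html;
}
```

Passing `now` as a parameter is just a convenience so the freshness check can be tested without waiting for real time to pass.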


#4

Something like this? https://atmospherejs.com/chfritz/spiderable


#5

Why don’t you just use the package you mentioned, if it already has everything you are missing? xD :wink:


#6

Oh, by seeded I mean providing a list of URLs to fetch and cache.

As opposed to automatic (crawling the whole site, beginning at /).
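
To make the difference concrete, here is a rough sketch of both approaches. The function names are made up for this example, and `getLinks` is a hypothetical helper that returns the same-site paths linked from a given page:

```javascript
// Seeded: the URLs to precache are given up front (duplicates removed).
function seededUrls(seeds) {
  return Array.from(new Set(seeds));
}

// Automatic: breadth-first crawl of the site, beginning at "/".
// `getLinks` is a hypothetical function (path) => array of linked paths.
// `limit` caps the number of collected URLs so the crawl always terminates.
function crawlUrls(getLinks, start = '/', limit = 1000) {
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0 && seen.size < limit) {
    const path = queue.shift();
    for (const link of getLinks(path)) {
      if (!seen.has(link) && seen.size < limit) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return Array.from(seen);
}
```

Either function yields a list of URLs that a prerun could then fetch and cache.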


#7

Oh yeah, that’s a cool idea and I see the use case.


#8

Just for the sake of completeness: the new package is out and you can try it out at:
https://atmospherejs.com/lufrai/spiderable2


#9

Has anyone been able to use this together with Meteor Up (mup) and meteorhacks:cluster?
I’m having difficulties debugging, and I don’t get why you have to set a port.


#10

webcomponents.js (Polymer) support