Use CollectionFS cfs:filesystem with multiple instances?

I am running the (deprecated) CollectionFS cfs:standard-packages with cfs:filesystem on multiple instances that share the same code:

  • 1 instance handles only the upload and the processing / conversion
  • 2+ instances handle only image serving
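For context, the store itself is a plain cfs:filesystem store; the setup is roughly like this (names and paths are examples, not my real config):

// Rough sketch of my setup -- both instance types run this same code,
// but only instance 1 actually receives uploads.
var imageStore = new FS.Store.FileSystem("images", {
  path: "/data/cfs/images"   // the stored files live on the shared file system
});

Images = new FS.Collection("images", {
  stores: [imageStore]
});

// Note: the tempstore chunks end up on a local, per-instance path
// (that is the .meteor/local/cfs/.../_tempstore path in the error below).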

When generating lots of files with instance 1, the other (serving) instances crash, which is pretty bad because meteor --restart unless-stopped does not restart the app and my images suddenly become unavailable.

W20160629-18:15:00.618(0)? (STDERR) stream.js:94
W20160629-18:15:00.618(0)? (STDERR)       throw er; // Unhandled stream error in pipe.
W20160629-18:15:00.619(0)? (STDERR)             ^
W20160629-18:15:00.619(0)? (STDERR) Error: ENOENT, open '/app/.meteor/local/cfs/files/_tempstore/files-fhftx3PHM7cRwKHBa-0.chunk'
=> Exited with code: 8
=> Your application is crashing. Waiting for file change.

I read through some posts; it seems CollectionFS is not designed for multiple instances sharing the same database and file system because of the current worker design. I tried to create a local copy of cfs:worker to keep the worker from starting the processing job on the serving instances, but that leads to other problems.

Two questions from my side:

  1. Can I simply disable the workers on the file-serving instances?
  2. Can I just wrap the issue in try/catch? Has anyone done this before?

I had a similar problem with multiple instances when using S3 for storage and the filesystem as the tempstore. I switched the tempstore from cfs:filesystem to cfs:gridfs to fix it.

You can try setting FS.debug = true; and see when the problem occurs.
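That just turns on cfs's verbose logging, e.g.:

// enable verbose CollectionFS logging while debugging (remove in production)
FS.debug = true;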

I think that the problem is here:


…when another instance tries to delete a temp file.

You can fork cfs and grab & stop the file-worker observers on the other instances.

https://gist.github.com/Siphion/669e23920bd6217bb1098043f17fa656 (not so sure about this)
Look for “// HERE HERE HERE” in the code

And then add this to your code:

if (notUploadInstance) {
  FS.FileWorker.observers.removed.stop();
  FS.FileWorker.observers.added.stop();
}
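notUploadInstance is not something cfs provides; you have to define it yourself, for example from an environment variable (the variable name here is just an example):

// server only -- decide which role this instance plays.
// UPLOAD_INSTANCE is a hypothetical env var you set when starting the process.
var notUploadInstance = process.env.UPLOAD_INSTANCE !== 'true';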

(By the way, I'm not using cfs anymore, I'm using Slingshot; I suggest you think about migrating if you can :confounded: )


Thanks!

I cloned fileWorker and fsFile and surrounded removeFile() and pipe() with try/catch.
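Roughly this pattern (a sketch of my clone, not the actual cfs internals):

// inside the cloned fileWorker/fsFile code -- swallow the missing-chunk error
// instead of letting it take the whole instance down
try {
  removeFile(fileObj);   // the cloned cfs call that hits the missing temp chunk
} catch (error) {
  if (error.code === 'ENOENT') {
    console.warn('Temp chunk already gone, ignoring:', error.message);
  } else {
    throw error;
  }
}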

Next I will look into your suggestion to disable the worker/observer completely.

And I will have a look at Slingshot.


Having this issue now too. :frowning: I'm using S3 and FileSystem as my tempstore. So @siphion, you're saying just switching to a GridFS tempstore will fix this? Or do I also have to make the updates you mentioned?

@bluepuma could you elaborate on whether your changes solved the problem? Did you ever find a solution to this?

Yes, a GridFS tempstore will fix this. If you read my post again: the error occurs when another instance tries to delete a temp file at a local path, but that file does not exist because it was created on a different instance.

You can proceed in several ways:

  1. use GridFS (the tempstore will live in the database)
  2. fix the code as I suggested or as bluepuma did
  3. migrate away from cfs

I recommend 3), and using Slingshot if you are on S3.


Thanks. Does using GridFS cause a bottleneck in the database/app/servers if multiple users are all uploading large files? I think this was my original concern with it. Once a file is sent up to S3, it's removed from GridFS/the database, correct? (Sorry, I have a novice understanding of cfs and its moving parts.)

I like Slingshot, but I was hesitating because there doesn't seem to be a clear solution for editing/cropping/resizing files in the browser. Know any good solutions for this? I do cropping and resizing with GraphicsMagick and everything has worked out well (after many hours/days of research), so I'm hesitant to start over on Slingshot.

Right now I allow users to upload up to a max file size using CFS and then crop/resize on the server before storing to S3. From what I've read, Slingshot doesn't let you do GraphicsMagick-type transforms.
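For reference, my server-side resize is the usual cfs transformWrite pattern, roughly like this (store name, bucket and sizes are just examples; gm comes from the cfs:graphicsmagick package):

// Example only -- resize while streaming into the S3 store
var imageStore = new FS.Store.S3("images", {
  bucket: "my-bucket",   // plus your key/secret or IAM configuration
  transformWrite: function (fileObj, readStream, writeStream) {
    gm(readStream, fileObj.name())
      .resize(600, 600)              // fit within 600x600
      .stream()
      .pipe(writeStream);
  }
});

Images = new FS.Collection("images", {
  stores: [imageStore]
});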

Actually, I think there's a bug in the GridFS tempstore: the temp grid collection always contained old data, so I think the purge action did not work correctly. But that is something you can handle with a job that deletes items in that collection older than XXX.
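Something like this, for example (the collection names are an assumption, check what cfs:gridfs actually created in your DB, and the interval/age are arbitrary):

// Hypothetical cleanup job for stale tempstore data left behind in GridFS.
// Collection names are a guess (look for something like cfs_gridfs._tempstore.*).
var MAX_AGE_MS = 24 * 60 * 60 * 1000;   // delete temp entries older than one day

Meteor.setInterval(function () {
  var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db;
  var cutoff = new Date(Date.now() - MAX_AGE_MS);

  db.collection('cfs_gridfs._tempstore.files')
    .find({ uploadDate: { $lt: cutoff } })
    .toArray(function (err, files) {
      if (err || !files) return;
      files.forEach(function (file) {
        db.collection('cfs_gridfs._tempstore.chunks').remove({ files_id: file._id }, function () {});
        db.collection('cfs_gridfs._tempstore.files').remove({ _id: file._id }, function () {});
      });
    });
}, 60 * 60 * 1000);   // run once an hour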

If you just need cropping and resizing, you can do it on the client. :slight_smile:
Look at this post where I contributed about Slingshot and ImageMagick / GraphicsMagick.


We use S3 and GridFS for the tempstore. We run 2 servers on Galaxy and crash… a lot more than we used to. I don't have a good answer as to why the crash frequency has suddenly gone up; file uploads have not.

If anyone has solved this issue, I'd love to know. We rely on users uploading their files and having a nice library of files. If we switch to Slingshot (as suggested), we will need to manage the CollectionFS-style entries ourselves so we can map users to files and delete S3 files when a user is done with them.