File Upload (FilesCollection v2.0)

Hello everyone.

This topic is relevant only to those who use, or plan to use, the FilesCollection package. I would like to ask the community to help set the right path for Meteor-Files 2.0. Any feature requests, suggestions, and ideas are highly appreciated.

Lately this lib has gained attention; thank you to everyone who has had a chance to use it. Lots of contributors have made this lib much better.

Now it’s time for v2.0, time to move from CoffeeScript to ES6, time to remove all obsolete code.

Here are some of my thoughts and questions for you:

  1. As HTTP upload has shown itself to be faster, more stable, and more efficient than DDP (WebSockets), should we drop support for DDP and put more focus on efficient, asynchronous uploads via HTTP? I’ve been experimenting with uploads directly from Web Workers: no more UI freezes, a 1.5x speed increase, and POST uploads from 3rd parties. To find out more about upload transports, take a look at this article.
  2. Drop the dependency on MongoDB. A lot of questions and issues concern metadata, additional fields, etc. This looks like a bottleneck that ties developers’ hands, forcing them to rely on MongoDB and to create records with a strict schema. How about putting all control into the developer’s hands, letting you decide where to store a file and how to describe its metadata? On the Client you call the upload method, and on the Server you receive everything you need in the afterUpload hook. This may also help to move this lib to NPM. Another good option is to split this lib in two: one part that manages files as collections, and another that handles uploads.
  3. Protected files and uploads are also a popular topic here. We need to standardise the authentication method. For uploads, I’m leaning towards our old friend the Authorization header, carrying the current Meteor login token (see the sketch after this list). For file requests (download, display, play) it’s still unclear. As discussed here, we cannot rely on cookies, as they are not accessible under various circumstances, especially in production with mobile apps. Authentication headers are not under our control when displaying a file in HTML. GET query strings are not safe; HTTPS (SSL) only partly secures GET query parameters.
  4. Do any developers use public files, e.g. when served by a proxy server such as nginx, Apache, etc.?
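
For item 3, roughly what I have in mind on the server side. A minimal sketch: it leans on undocumented (but long-stable) Accounts internals, and the Bearer parsing is my assumption, not an existing package API:

```js
import { Meteor } from 'meteor/meteor';
import { Accounts } from 'meteor/accounts-base';

// Resolve a Meteor user from an "Authorization: Bearer <loginToken>" header.
// Meteor stores hashed resume tokens, so we hash the presented token and
// look it up; Accounts._hashLoginToken is an internal, not a public API.
function userFromRequest(req) {
  const header = req.headers.authorization || '';
  const token = header.replace(/^Bearer\s+/i, '');
  if (!token) return null;

  return Meteor.users.findOne({
    'services.resume.loginTokens.hashedToken': Accounts._hashLoginToken(token)
  });
}
```

On the Client the current token is available via `Accounts._storedLoginToken()`, so it can be attached to every upload request.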

As always, the main idea is to keep this project as flexible as possible by providing a thin API for working with files and uploads, without overhead.

See discussion at GitHub.

5 Likes

Hey man, thanks for putting this package together. I think Meteor really needs a solid file upload solution, and this looks to be on the right track.

  1. Yes, I think HTTP is the way to go.
  2. This I do not know; I think I would want a record of every file uploaded to my system. The thought of someone being able to just push files to my storage bothers me. Maybe the package could have “logging” functionality, which would be up to the developer to turn on or off.
  3. Yes, protected files are really important, even for basic things like chat apps.
  4. Still using public files, but I would not want to mix my uploads in there.

In terms of features that I think are important:

  • The ability to delete files. Most packages focus on the upload, but for safety purposes, it’s important to be able to delete things.
  • File processing and compression. For example, if a user uploads a photo of themselves as an avatar, it may be too big and heavy for regular use. It would be awesome if there was a way to resize the photo into multiple sizes and to compress them. Even if this was optional through a third party API - I think it would be a huge move forward.

Also, for whatever reason, I would much prefer if this were a server-side package that I could build my own Methods around. That way, I can fully control what is being uploaded, by whom, and where to.

Ideally, I would pass the file into a method, run it through a function from the package, and get back an object with the upload id and file name. That object would be stored as a log entry, along with the upload date, status, etc. Then the developer could use that however they see fit.
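
Something like this rough sketch, where `storeFile` and the `UploadLog` collection are hypothetical placeholders for whatever the package would expose, not a real API:

```js
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';
import { check } from 'meteor/check';

const UploadLog = new Mongo.Collection('uploadLog');

Meteor.methods({
  'files.upload'(fileData, fileName) {
    check(fileName, String);

    // storeFile is a hypothetical package function that persists the file
    // and returns an identifier for it.
    const { uploadId } = storeFile(fileData, fileName);

    // Keep a log entry the developer can use however they see fit.
    UploadLog.insert({
      uploadId,
      fileName,
      userId: this.userId,
      uploadedAt: new Date(),
      status: 'complete'
    });

    return { uploadId, fileName };
  }
});
```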

1 Like

Hi @msavin,

Thank you for the feedback.

File processing and compression. For example, if a user uploads a photo of themselves as an avatar, it may be too big and heavy for regular use. It would be awesome if there was a way to resize the photo into multiple sizes and to compress them. Even if this was optional through a third party API - I think it would be a huge move forward.

To do so, there are various ways to pre-process a file on the Client, e.g. via canvas for images, or zip compression via the .pipe() method.
On the Server, all post-processing can be done in the onAfterUpload hook.
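
For the avatar case, a minimal client-side sketch of the canvas approach; the `Avatars` FilesCollection instance, the 512px cap, and the 0.8 JPEG quality are example assumptions:

```js
// Downscale an image on the Client before uploading it.
function resizeAndUpload(file) {
  const img = new Image();
  img.onload = () => {
    const scale = Math.min(1, 512 / Math.max(img.width, img.height));
    const canvas = document.createElement('canvas');
    canvas.width = Math.round(img.width * scale);
    canvas.height = Math.round(img.height * scale);
    canvas.getContext('2d').drawImage(img, 0, 0, canvas.width, canvas.height);

    canvas.toBlob((blob) => {
      URL.revokeObjectURL(img.src);
      // fileName is passed explicitly because a Blob has no name of its own.
      Avatars.insert({ file: blob, fileName: file.name });
    }, 'image/jpeg', 0.8);
  };
  img.src = URL.createObjectURL(file);
}
```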

Or are you speaking more about examples and “to dos”?

Ideally, I would pass the file into a method, run it through a function from the package, and get back an object with the upload id and file name. That object would be stored as a log entry, along with the upload date, status, etc. Then the developer could use that however they see fit.

What about remaining time, speed, etc.? Do you need such statistics during the upload?

Ah, client-side compression is interesting. Is that how Facebook/Google/etc. do it?

When it comes to upload percent… I am not sure how important it is. I imagine it would have to start a new subscription and stress the oplog, so it doesn’t vibe well with me as a scalable solution. Plus, pub/sub is not always that fast, requires optimization, etc. My guess is that if someone were to implement it correctly, they would build a solution around something like Redis?

However, on this note, I was thinking that there should be support for uploading multiple files at once. For that, it could give back a progress report of sorts (i.e. “Uploaded 3/5 files”).

Uh, guys: Upload progress is done through events on the upload function.

That’s not something you should plug in anywhere else, and every browser has this capability. Yes, even Internet Explorer. Why the hell would you want to publish that?
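
For reference, a bare-bones sketch of those built-in browser events; the `/upload` endpoint is a placeholder:

```js
// Plain XMLHttpRequest upload with native progress reporting.
function uploadWithProgress(file) {
  const xhr = new XMLHttpRequest();
  xhr.upload.addEventListener('progress', (e) => {
    if (e.lengthComputable) {
      console.log(`Uploaded ${Math.round((e.loaded / e.total) * 100)}%`);
    }
  });
  xhr.open('POST', '/upload');
  xhr.send(file);
}
```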

And trying to zip a single JPEG will only result in wasted CPU cycles.

I imagine it would have to start a new subscription and stress the oplog, so it doesn’t vibe well with me as a scalable solution. Plus, pub/sub is not always that fast, requires optimization, etc.

This lib doesn’t use any MongoDB-, publication-, or subscription-related technique to measure upload speed and progress.
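
Progress is reported on the Client by the upload instance itself. A minimal sketch with the current 1.x API (`fileInput` is a placeholder, and event signatures may differ slightly between releases):

```js
import { FilesCollection } from 'meteor/ostrio:files';

const Images = new FilesCollection({ collectionName: 'Images' });

// Passing `false` returns the upload instance without auto-starting it.
const upload = Images.insert({
  file: fileInput.files[0],
  chunkSize: 'dynamic'
}, false);

upload.on('progress', (progress) => {
  // Driven by client-side events, not publications or the oplog.
  console.log(`Upload progress: ${progress}%`);
});

upload.on('end', (error, fileObj) => {
  if (!error) console.log(`Uploaded as ${fileObj._id}`);
});

upload.start();
```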

However, on this note, I was thinking that there should be support for uploading multiple files at once.

Have you seen https://files.veliov.com? It supports multiple file uploads, as well as summarised speed/progress for all files in a queue. Its source is available on GitHub.

Ah, client-side compression is interesting. Is that how Facebook/Google/etc. do it?

I don’t know about Facebook/Google/etc. doing compression on the client side; could you please point me to a reference? This functionality was requested by a community member earlier, and he reported that it solved his compression task.

That’s not something you should plug in anywhere else, and every browser has this capability. Yes, even Internet Explorer. Why the hell would you want to publish that?

Didn’t get this one. If it’s about events: we use the EventEmitter library by Oliver Caldwell, which is bulletproof (or at least should be).

And trying to zip a single JPEG will only result in wasted CPU cycles.

It differs from app to app; for example, a file may be meant to arrive at the server already compressed. You can always detach heavy work from the main thread with the help of Web Workers.
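
A rough sketch of that idea; it assumes the pako deflate library is bundled and served at the placeholder path below:

```js
// worker.js — compress off the main thread, then hand the result back.
importScripts('/pako.min.js'); // assumption: pako is served from this path

self.onmessage = async (event) => {
  const buffer = await event.data.arrayBuffer(); // event.data is a File/Blob
  const compressed = pako.deflate(new Uint8Array(buffer));
  self.postMessage(compressed, [compressed.buffer]); // transfer, don't copy
};

// On the main thread (sketch):
// const worker = new Worker('/worker.js');
// worker.onmessage = (event) => upload(event.data); // `upload` is a placeholder
// worker.postMessage(fileInput.files[0]);
```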

Unless you’re doing something very weird (and probably wrong), you’ll be using the FileReader browser API, which has events for progress reporting baked in. There’s absolutely no need to reinvent the wheel.
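
A bare-bones example of those baked-in FileReader progress events:

```js
function readWithProgress(file) {
  const reader = new FileReader();
  reader.onprogress = (e) => {
    if (e.lengthComputable) {
      console.log(`Read ${Math.round((e.loaded / e.total) * 100)}%`);
    }
  };
  reader.onload = () => {
    // reader.result is an ArrayBuffer, ready for chunking and upload.
  };
  reader.readAsArrayBuffer(file);
}
```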

No. Pictures are already compressed efficiently unless you’re using BMPs or something. You’ll get next to no gain by trying to compress a PNG, GIF, or JPEG.

Because those file formats already have compression built in, you’ll get a compression factor of maybe 1% if you’re really lucky. That is a waste of time, because you’ve just introduced a new point of failure for no reasonable gain.

The only thing you can do is either shrink or crop the image.

Unless you’re doing something very weird (and probably wrong), you’ll be using the FileReader browser API, which has events for progress reporting baked in. There’s absolutely no need to reinvent the wheel.

Agreed, and we use those events too; there is more than just the read events. If you don’t like events, or they don’t work in a browser you’re targeting, this lib also has a callback/hooks API.
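
A short sketch of the callback style; option names follow the current 1.x docs and may change in 2.0:

```js
Images.insert({
  file: fileInput.files[0], // `fileInput` is a placeholder
  onProgress(progress) {
    console.log(`Upload progress: ${progress}%`);
  },
  onUploaded(error, fileObj) {
    if (!error) console.log(`Done: ${fileObj.name}`);
  },
  onError(error) {
    console.error(error);
  }
});
```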

No. Pictures are already compressed efficiently unless you’re using BMPs or something. You’ll get next to no gain by trying to compress a PNG, GIF, or JPEG.

Do you only upload pictures? Agreed that you don’t have to compress images, but you can if you want.

Videos won’t compress very well either. And documents are rarely so big that compression is actually needed. I’m simply not seeing the use case for compressing file uploads during upload (unless you’re doing something exotic and regularly need to upload multi-megabyte CSVs or something).

Ah, I thought you guys meant upload progress from the server to the storage.

regularly need to upload multi-megabyte CSV

Not exotic at all; actually, you match 100/100 the case this library was born from. In day-to-day business tasks, multi-megabyte CSV/XLS/DOC and other documents are exactly what people upload to servers.

What I’m seeing is that we have a lot of different file types with a lot of different needs.

  • Images… we want to edit file format, resolution, compress, etc.
  • Audio and video… same same as images, but different.
  • Documents (text, PDF, DOC, XLS, CSV, etc.), archives (either uploaded as archives, or downloaded as archives), etc.

Maybe we need a bunch of different packages that address each format specifically?

My thesis which was about 150 pages long (and contained multiple pictures) was about 10 MB. I’m somewhat doubting that you are actually uploading such huge documents on a regular basis.

I also just tried to zip my thesis - the Word document was reduced from 10.1 MB to 10.0 MB. Yeah, that’s really worth the additional headaches…

Nooo, it’s better to implement one with a very open and thin API, so every developer can adapt its behaviour to meet unique requirements.

I got your thesis point, but my last comment wasn’t about compression.
I said that our enterprise users upload thousands of huge files daily.

.pipe()ing is not limited to compression; you can change the file type, resize, crop, or use any other 3rd-party library.
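
Roughly like this; note the exact payload handed to `.pipe()` callbacks depends on the release, and `transform` is a hypothetical helper:

```js
// Obtain the upload instance without auto-starting it.
const upload = Images.insert({ file: fileInput.files[0] }, false);

// Each pipe callback receives the outgoing data and must return the
// (possibly modified) data to pass along.
upload.pipe((data) => transform(data));

upload.start();
```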

You do know that there’s a reason why protocols like SFTP, FTPS or SCP exist, right?

Users need a friendly interface in their pocket, which is usually a web app, a native app, or Cordova.
Sure, we could implement upload via FTP and the like in a native app.

If you have Enterprise users then they’re probably rather more interested in reliable results.

Fancy UI is nice and all. But (S)FTP is a tried and true workhorse much better suited to file uploads measured in Mega/Gigabytes. For example, what do you do if the connection is lost during file upload? With FTP you can simply issue a resume command.

This lib was designed with poor connections in mind: the upload continues after the connection is restored, and all failed chunks are re-sent, since most of the first users were in poor GSM/3G coverage zones. There is no need to tinker with it; all of this happens silently for the user.

2 Likes