MDG plan for a tracking package

Hi,

In a recent GitHub Pull Request, a new Meteor package named census was developed to

Track anonymous info to help us (MDG) improve Meteor

Given that originally no rationale was given in the PR, I was very surprised by this package and decided to open a discussion on it. MDG employee @zoltan kindly and quickly responded to share some of the motivations behind this development.

I’ve expressed some concerns about this built-in, enabled-by-default tracking package sending data to MDG, especially with regard to Meteor’s reputation in the broader JavaScript communities. As I don’t want to paraphrase anyone, I strongly encourage you to take a look at the actual PR and at the beginning of the discussion on GitHub:

I’m genuinely interested in reading others’ opinions, so let’s continue the discussion on this forum, which is arguably a better debating platform than GitHub issues, as the many gigantic threads on the view layer discussion have shown.

To be clear, I don’t want to blame MDG for anything here, on the contrary I’m thankful that this development happened in the open so that interested persons can see and discuss it.

13 Likes

We were just about to post a thread asking for feedback about this here!

3 Likes

I suggest making it a (y/n) opt-in when using the CLI generator.

It’s how desktop apps handle anon reporting. I almost always gladly say, “YES. Make this better.” I’m sure most of us would do the same. But others don’t want to, and that ought to be respected. Also, MDG should strive for transparency; I think it would only be fair to explicitly disclose what information would be reported. Google Play’s app permission center is a perfect example: everything is very open; no surprises.

@mquandalle

First, unless I’m wrong reading the code, I wouldn’t qualify an IP address nor the application URL as some “anonymous info”.

Yeah, I wouldn’t call that anonymous reporting. Reporting my server’s IP & URL would be roughly equivalent to having WebStorm send JetBrains my name and street address. Personally, I would not mind reporting the URL/IP; I’m not working on anything which I want to keep secret. I do have some things private, but I couldn’t care less if MDG sees the domain it’s under next to the thousands of others in some database. If you’re concerned about secrecy, you should have access locked down with a whitelist anyway :closed_lock_with_key: but I digress.

Another suggestion:

That data is pretty basic. You can learn quite a lot from version numbers and IPs, but that’s nothing compared to what’s possible with just a few more metrics. If you’re collecting data, wouldn’t it be a good idea to collect information about the app itself? For example, it could report that in the last week my instance crashed 23 times because of Exception X. That can actually help you find out what’s not working. (Sending every stack trace would be way overkill, but keeping some sort of tally of crashes, then reporting the ones with high severity and/or recurrence, makes perfect sense.) There’s so much good that could be done by this.
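To make the tally idea a bit more concrete, here is a rough sketch of what I mean. Everything in it is made up for illustration: `CrashTally`, the 1 to 5 severity scale and the thresholds are assumptions, not anything that exists in Meteor or in the census package.

```javascript
// Hypothetical sketch: none of these names exist in Meteor or the census package.
class CrashTally {
  constructor() {
    // exception name -> { count, maxSeverity }
    this.counts = new Map();
  }

  // Record one crash. Severity uses an assumed scale of 1 (minor) to 5 (fatal).
  record(exceptionName, severity) {
    const entry = this.counts.get(exceptionName) || { count: 0, maxSeverity: 0 };
    entry.count += 1;
    entry.maxSeverity = Math.max(entry.maxSeverity, severity);
    this.counts.set(exceptionName, entry);
  }

  // Return only the entries worth reporting: frequent or severe ones.
  report({ minCount = 10, minSeverity = 4 } = {}) {
    const out = [];
    for (const [name, { count, maxSeverity }] of this.counts) {
      if (count >= minCount || maxSeverity >= minSeverity) {
        out.push({ name, count, maxSeverity });
      }
    }
    return out;
  }
}

const tally = new CrashTally();
for (let i = 0; i < 23; i++) tally.record("ExceptionX", 2);
tally.record("RareGlitch", 1);
console.log(tally.report()); // only ExceptionX passes the thresholds
```

The point is that only aggregated counts would leave the server, never individual stack traces.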

Imagine these made up scenarios:

  • 65% apps running on windows are getting Exception X, but only 0.03% of linux apps have run into this

    • obviously there’s an issue here. What’s going on?
  • 82% of reports of E_NOMEM on boxes with less than 1GB of ram fail somewhere within the mongo adapter.

    • is there an issue with a npm pkg version?
    • Should we make it known that stable instances should have X and Y resources available as a requirement?
    • Is it possible to create a lite-er version of the adapter or have some way to turn on debounce for at-risk systems?
  • Wow there’s 10,000 locally hosted apps from Uzbekistan

    • can we get translations of documentation?
    • can MDG process payments for Galaxy from Uzbekistan?
    • how can that community be supported?
  • 12% of 1.3 users still have iron router installed

    • Should we revitalize IronRouter? (the answer to that question is a resounding no btw)
    • Why the hell are 7% of newly created apps installing it? Are FlowRouter or ReactRouter missing some functionality, not meeting some use case? Is there a lack of communication of best practices?
    • How many existing projects depend on it? how can we ease the transition?

I can go on and on and on…


Perhaps even make it configurable? I’ve seen OAuth apps where, on the “Authorize App Y to use your Account” page, they allow you to optionally limit the scope of what they can access. MDG can potentially use their data mining to huge (financial) potential. If I opt to make the reports very permissive, MDG stands to benefit. Might it be possible to give developers who opt into detailed telemetry a free month of Galaxy, offer a special discounted rate, etc.? That could be an easy way to get developers to try Galaxy. Getting them signed up is like 90% of the battle, I would imagine. It’s a win/win as I see it.

11 Likes

I agree with this statement

First, unless I’m wrong reading the code, I wouldn’t qualify an IP address nor the application URL as some “anonymous info”.

Why is this needed? AFAIK Rails doesn’t have this, Django doesn’t have this, Phoenix doesn’t have this, nor does any other framework that I’m aware of (please correct me if I’m wrong).

Ultimately, we’re doing this to benefit everyone in the community by gathering objective data that informs where to spend our resources in the most impactful way. The difficult decision lies in where to strike the balance between the package being too intrusive but useful vs being friendly but useless.

Facebook tracks my info and it’s not out of the kindness of their hearts, even if it does “benefit everyone in the community”. This is to assume that GitHub issues aren’t good enough to track developer issues, the forums aren’t good enough to track developer feedback, Stack Overflow isn’t good enough to track developer pain points, Facebook, Twitter, and Google+ aren’t fast enough feedback, and the 280 Meteor meetups worldwide don’t get enough information about the developers using the platform.

3 Likes

I think the reality is that all of these sources only inform us about the most vocal and active members of the community. I think there is a silent majority of people who would never consider posting on the forums, don’t post GitHub issues, don’t ask Stack Overflow questions, and don’t go to meetups. I know I was this kind of developer before I started working at Meteor.

Perhaps the problems those people run into are not important, and we should measure how important problems are by how often people speak out about them. But it’s certainly true from what I’ve seen that it is impossible to get a good sampling of what is happening from the sources you mentioned. For example, people who are more active in the community might be a lot more interested in cutting-edge technology like React than people who are just chugging along getting work done.

I’m not espousing any particular opinion and I’m interested to hear what everyone here thinks about getting more data about real Meteor usage patterns in the wild; I just wanted to chime in and say, no, those sources of data are not very reliable.

11 Likes

@khamoud I don’t think it’s based on that assumption. The sense I get is MDG wants to know who is using meteor and for what kinds of apps. That way they can gauge their audience. And that makes perfect sense to me from their point of view. It’s not necessarily altruistic; it’s pragmatic. But that does not mean the community can’t stand to benefit! Far from it. If MDG can get better insight into those two things, they can adjust their aim accordingly. Moreover, I would argue that the assumption is not totally incorrect. You just listed 7 different mediums for feedback, but don’t you think that having a standardized aggregate of issues or demographics has some merit? Those mediums give the qualitative insight into the developer experience. But quantitative measurements are pretty useful too :stuck_out_tongue: Those mediums and the community members are great, but that doesn’t mean MDG can’t aim to do better.

2 Likes

I’d like to know how this benefits MDG as a company. A few days ago MDG announced it was shutting down free hosting (which I support). I assume that announcement was to benefit MDG and not for “the benefit of everyone in the community” because the community seemed pretty upset about it.

Now MDG is going to opt-in developers to share their application data “anonymously” while keeping the IP address and application url.

I would be a lot less worried if the data were open source. The current data server is https://activity.meteor.com which has SSL and doesn’t display anything. Looking at MDG’s repos I couldn’t find an Activity repo nor could I find any mention of open sourcing the data in the PR. All this leads me to believe that the data won’t be open source.

If the point of the data is to help the community then the community should have access to the data that is being collected (if the point is to catch exceptions, performance, etc. then members of the community can help the same way they always have, by submitting PR’s).

My question is: will the data be open source? If yes, then no problem. If no, why not, how does this benefit MDG as a business, and how will it benefit the community?

2 Likes
  1. where’s the source code?
  2. is it devOnly?

Either way it should be optional. It will be a self inflicted wound (PR wise) if Meteor collects application data by default.

1 Like

This describes most enterprise developers. The ones who are using meteor aren’t really talking about it. Because of my company’s trade secret policies, I can’t even say what public facing applications I wrote, and yet they are heavy lifting applications for which I would have loved to have certain hosting solutions in place long ago (talking about Galaxy and its limitations for me, but that’s for another conversation). I am definitely a bit of an exception because I answer Stack Overflow questions (at least a few of them), I make bug reports when I run into bugs, and as of this last weekend, I’m active here. There are two meteor developers in my company who are not (one doesn’t even have an account here). So within my own sample size, I’m a 1/3 minority.

Either way it should be optional. It will be a self inflicted wound (PR wise) if Meteor collects application data by default.

Yeah, especially with regards to companies who have various compliance requirements (PCI/HIPAA).

If the point of the data is to help the community then the community should have access to the data that is being collected (if the point is to catch exceptions, performance, etc. then members of the community can help the same way they always have, by submitting PR’s).

It’s entirely possible that the data is to help the community that isn’t on these forums. But I could be grasping at straws here as there hasn’t been a formal announcement.

3 Likes

I did not read the whole thread, but auto enabled opt in to track IP and/or URL is a very very big no-go for Europe’s data protection laws and even bigger deal if you wish to have any relationship to the enterprise segment here.

Perspective: Google Analytics in Germany is only allowed to provide the first three segments of an IP address, even if you have a mile of legal documents attached to the page. Don’t underestimate the huge sensitivity for data protection outside of the US.
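For illustration, keeping only the first three segments of an IPv4 address could look roughly like this. It is a sketch only: `maskIpv4` is a made-up helper name, not something taken from Google Analytics or from the census package, and it ignores IPv6 entirely.

```javascript
// Illustrative helper (hypothetical name): zero the last octet of an IPv4 address.
function maskIpv4(ip) {
  const parts = ip.split(".");
  // Accept exactly four decimal segments, each in the range 0-255.
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`Not an IPv4 address: ${ip}`);
  }
  parts[3] = "0";
  return parts.join(".");
}

console.log(maskIpv4("203.0.113.42")); // 203.0.113.0
```

The masking would have to happen before the address is stored or sent anywhere; truncating it later in a database does not help.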

If you wish to track, make it opt in on first run with appropriate data protection disclaimers / non-nagging.

9 Likes

I’m reposting my comment from the PR in case folks haven’t read it, then I want to address some of the points in this thread so far.

I wanted to add some clarity around why we plan to include the census package. Recently, we were looking at Meteor usage statistics and realized that given our existing telemetry we can only identify about 20% of apps using Meteor. That means a large fraction of our usage comes from apps that we know absolutely nothing about.

This is less than ideal as knowing who our users are and how they’re using our platform helps us make informed decisions towards creating a great Meteor roadmap. The insight gained from better telemetry results in a roadmap that increases worldwide Meteor adoption (which helps everyone in the community - more contributors, more add-ons/services, more talent, more jobs). For instance, knowing what breakdown of apps are built by enterprises vs startups vs individuals allows us to tailor our learning materials, tutorials, features and commercial offerings accordingly. Knowing what scale folks use Meteor at (by collecting the maxSessions stat - https://github.com/meteor/meteor/pull/6469/files#diff-a7c6c405e95f7fbaeac0c4645b12c77eR27) informs us on how much engineering resources to allocate towards scaling/performance.

Ultimately, we’re doing this to benefit everyone in the community by gathering objective data that informs where to spend our resources in the most impactful way. The difficult decision lies in where to strike the balance between the package being too intrusive but useful vs being friendly but useless.

@laosb I like your suggestion of explicitly notifying users that we’ve added the package.

First off, I want to say I think we put the cart before the horse here by writing the code and opening the PR before engaging the community first. We’re an engineering driven company and sometimes code is easier to write than prose. We won’t make that mistake again. The immediate point of this package is perfectly summed up by @rozzzly when they said “The sense I get is MDG wants to know who is using meteor and for what kinds of apps. That way they can gauge their audience.” To that end, the most important pieces of data we’d like to gather are rootUrl, maxSessions and version (Meteor’s).

The challenge that I alluded to in my comment on the PR is to have enough people be comfortable with providing this data for it to be useful. If 80% of apps remain in the shadows we haven’t really achieved anything.

Having said that, we absolutely do not want users to be unaware that we’re collecting this data. Further, it should be trivial for users who do not want to give us the data to opt-out. It’s the users in the middle who don’t much care either way whom we would really want to encourage to opt-in - any ideas on the best way to do this would be appreciated.

Now to specific questions/concerns:

You should make the package opt-in via a prompted Y/N

We figured providing a clear message when running meteor create that the package had been added and where to read more about it as well as instructions for removing it (meteor remove census) would suffice. The reasons we thought this was better was:

a) It’s less friction than having to pause, think and press a key every time you create an app.
b) meteor create can still be run from scripts without the need to add further complexity by introducing additional command line flags.
c) This way those users who are indifferent towards providing the census data are less likely to opt-out as it would mean typing another command.

What information is being sent?

The data we’re proposing to send is here. We’ll add more documentation on exactly what this stuff means and why we’re sending it.

You shouldn’t be sending my server’s ip.

This data was included unintentionally, we’re going to remove it.

You should open-source the data.

Actually, this was our intention all along. We plan on building infographics and writing about the findings we glean from the data. We hadn’t intended to open up public access to the database, but this is an interesting idea. Would people be comfortable with it? It’s kind of like the actual census data. We’re all comfortable with providing it, the results are public and in fact very useful to everyone, but it might not be a great idea if people could look up the data based on street address.

Where is the source code?

The source code currently lives in this PR. Once merged it will live in the mainline Meteor repository.

Thanks to everyone for participating in this thread. I appreciate your feedback and want to stress that we’re not out to ship something that’s going to alienate the community.

12 Likes

Excellent response @zoltan. I think the y/n being too demanding is a taaaaaaaad bit facetious hahaha I doubt many people are running the cli generator every day


Damn. That’s ridiculous. I’m all for privacy, but that protection is just silly. “oh no I only can see 3/4 of the ip… there’s only 256 other possibilities…” are you kidding me? Even better, I can post an image here via markdown whose URL points to a PHP script that serves a 1x1 pixel transparent .gif with the correct MIME type and a little bit of binary data, and that logs information about you… here, watch

Expand to view my rant about how that's ineffective and does more harm than good (collapsed because it's off-topic)

Took me 30 seconds to find [this StackOverflow](https://stackoverflow.com/questions/4665960/most-efficient-way-to-display-a-1x1-gif-tracking-pixel-web-beacon) post:

<?php
// Serve a 1x1 GIF so the request looks like an ordinary image load.
header('Content-Type: image/gif');
echo "\x47\x49\x46\x38\x37\x61\x1\x0\x1\x0\x80\x0\x0\xfc\x6a\x6c\x0\x0\x0\x2c\x0\x0\x0\x0\x1\x0\x1\x0\x0\x2\x2\x44\x1\x0\x3b";

Then I just read the incoming request object and store its values. That means I could grab all of your IPs, user agents, referrers, etc. If that embedded image wasn’t limited to one page, I could, under the right conditions, set cookies and watch you jump from page to page. If I have this img on a few sites, I can track you across domains too. You really don’t even need cookies, they just make the results neater.

And all I did was post an image that you really can’t prevent, or even notice, unless you’re super intent on remaining anonymous, in which case you should be using a VPN & filters so none of that matters >.>

This is just how the internet works. Google might be big enough that they can’t risk bad PR and thus comply, but 99% of the internet doesn’t give a shit and they’re going to track you anyways however they can. Those laws should focus more on going after tangible corporations who abuse that data (and I’m sure they do that too). But to call that protection is absurd. At least DNT makes it obvious that it’s just a non-binding request saying please don’t (that no server admin cares about).

I’m sorry, that’s not “protection.” If anything, privacy policies like this just lull people into a false sense of security. “ope you cant see my ip, im safe and secure now!” How about actually addressing the issues that matter, which we can do something about. End-to-end encryption everywhere, no excuses.

disclaimer; there are definitely ways to prevent tracking that are not insignificant. But, they are involved. Whether you’re browsing, or an admin attempting to buffer the users from external resources on your site, there’s a fair amount of work for something that goes unnoticed by way too many developers, let alone end users.

/rant

Wow are you some kind of forum wizard?!?!

5 Likes

As a package developer, I’d be super excited for a lot of these stats. Especially the breakdown of Meteor use by version. e.g. when there’s a vast majority on 1.3+, I can say “use package version x for Meteor < 1.3” and can start using ecmascript/modules.

Likewise for installed packages (npm too). What percentage of Meteor installs are using react / angular? It’s motivation to add support in my package. I guess even, would be interesting to know what percentage of react users are running Meteor :>

Other stuff:

  • I’m ok with a notification on meteor create or meteor upgrade (on existing project) and ability to remove after (I do meteor create pretty often and agree this would be annoying; maybe better though, ask once and store answer in .meteorsession or somewhere).

  • ROOT_URL is not anonymous - clearly. I think you’d need another package for this, or not claim that it’s anonymous stats (which is all I think most people would be comfortable with). Anonymous would just use the URL to send an isProduction, etc.

  • IP address: Can’t speak for all but I’d be ok with the first 3 octets to get country / data center, which could be very useful.

  • @rozzzly ideas from post 3 are awesome :slight_smile:

And yeah, lol, I also had to google “discourse expandable post” after @rozzzly’s last post :>

4 Likes

No, I just spend too much time on the internet. Apparently this works with most md parsers, and is supported in modern browsers. It works on GitHub too!!

6 Likes

After some days’ thinking, I’ve changed my mind a little bit. This package is important for Meteor’s development, but for privacy reasons we shouldn’t add it by default in 1.3. An optional opt-in question in the create process is a good idea, but I also want to make sure that this question won’t be shown on every app creation, which can be very annoying. Also, for apps updating to 1.3, don’t add it by default.

I kind of doubt that people would be OK with the world knowing the maxSessions of their app as a general rule. Then again, perhaps those people wouldn’t be OK with MDG knowing it either? I’d be interested to hear what others say on this too – my sense is that the customers I’ve worked with would be ok with the second but not the first.

Speak for yourself! I could be a small sample size though.

2 Likes

What about something like a .meteor/census-config.json file to set, which data are allowed to be sent to MDG? People could then decide on their own, what they’re okay with, and the .meteor/.id could be used to always identify the app.
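Roughly what I have in mind, as a sketch only. Neither the config file nor `filterReport` exists anywhere; the field names are simply the ones mentioned earlier in this thread (rootUrl, maxSessions, version), and the config object below stands in for whatever the JSON file would contain.

```javascript
// Hypothetical: neither census-config.json nor these helpers are part of Meteor.
// This object stands in for the parsed contents of .meteor/census-config.json.
const exampleConfig = { rootUrl: false, maxSessions: true, version: true };

// Keep only the fields the developer has explicitly allowed.
// Anything missing from the config is treated as "do not send".
function filterReport(report, config) {
  const allowed = {};
  for (const [key, value] of Object.entries(report)) {
    if (config[key] === true) {
      allowed[key] = value;
    }
  }
  return allowed;
}

const report = { rootUrl: "https://example.com", maxSessions: 120, version: "1.3" };
console.log(filterReport(report, exampleConfig));
// { maxSessions: 120, version: '1.3' }
```

Defaulting unknown fields to "do not send" means a newly added metric would never go out without the developer allowing it first.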

1 Like

Make it an opt-in (NOT opt-out) package that you have to meteor add separately and everything is fine. Some, like us, just don’t want to send you any metrics in the first place. MDG ought to tread very carefully here, this is ground where a lot of companies have been burnt.

Looking at the code, do not add it to 1.3 as it is. Others already mentioned the problems storing IP addresses, and all enterprise people will balk at another thing to whitelist.

Thinking about it pragmatically, if it’s an opt-in (or opt-out, either way - opt-out is just more annoying!) package, all serious devs will leave it out, so I feel your metrics will be limited to the countless test apps people create locally before shedding autopublish, insecure and other extra fluff that the people who do things the “right way” from the beginning never use in the first place. Hence I think this one needs to go back to the drawing board.

A few alternate ideas on gathering feedback for your use:

  1. Offer free Galaxy hosting time for doing developer questionnaires. Especially with free meteor.com hosting going down this would be a no brainer for you to get serious statistics and information on a large scale, and get developers to try out Galaxy even if it would be on a 512M instance for a month or two.

  2. Hire developer community outreach people who are proactive in contacting people who do anything serious with Meteor from all around the world. Build your contact list internally and treat it (and the people who are in contact with the developers) as gold. Conduct quarterly research questionnaires and such on these few hundred key people. Have some of these people be in an invite-only IRC/Slack/Telegram/Whatever chat where you discuss new features first with the people who are using your product the most in places that you do not have presence in before presenting a more public draft. In this case, a draft is not an intrusive PR to core.

  3. If the forum and GitHub aren’t sufficient, instead of building an internal tool, ask the community to help you build something that works for both.

4 Likes

I very much applaud that you consider the “silent majority”, but as they are not active you cannot make any sound assumptions about their intentions, wants or needs - you are only dressing your own thoughts as those of a group of people. The vocal ones should be the ones driving the development - it’s not really that hard to register on a forum or GitHub. I doubt all the metrics you’d be getting from the census package as it is now would be much more efficient without major time investment on analytics on your part.

1 Like