MDG plan for a tracking package

I think the reality is that all of these sources only inform us about the most vocal and active members of the community. I think there is a silent majority of people who would never consider posting on the forums, don’t post GitHub issues, don’t ask Stack Overflow questions, and don’t go to meetups. I know I was this kind of developer before I started working at Meteor.

Perhaps the problems those people run into are not important, and we should measure how important problems are by how often people speak out about them. But it’s certainly true from what I’ve seen that it is impossible to get a good sampling of what is happening from the sources you mentioned. For example, people who are more active in the community might be a lot more interested in cutting-edge technology like React than people who are just chugging along getting work done.

I’m not espousing any particular opinion and I’m interested to hear what everyone here thinks about getting more data about real Meteor usage patterns in the wild; I just wanted to chime in and say, no, those sources of data are not very reliable.

11 Likes

@khamoud I don’t think it’s based on that assumption. The sense I get is that MDG wants to know who is using Meteor and for what kinds of apps. That way they can gauge their audience. And that makes perfect sense to me from their point of view. It’s not necessarily altruistic; it’s pragmatic. But that does not mean the community can’t stand to benefit! Far from it. If MDG can get better insight into those two things, they can adjust their aim accordingly. Moreover, I would argue that the assumption is not totally incorrect. You just listed 7 different mediums for feedback, but don’t you think that having a standardized aggregate of issues or demographics has some merit? Those mediums give qualitative insight into the developer experience. But quantitative measurements are pretty useful too :stuck_out_tongue: Those mediums and the community members are great, but that doesn’t mean MDG can’t aim to do better.

2 Likes

I’d like to know how this benefits MDG as a company. A few days ago MDG announced it was shutting down free hosting (which I support). I assume that announcement was to benefit MDG and not for “the benefit of everyone in the community” because the community seemed pretty upset about it.

Now MDG is going to opt-in developers to share their application data “anonymously” while keeping the IP address and application url.

I would be a lot less worried if the data were open source. The current data server is https://activity.meteor.com which has SSL and doesn’t display anything. Looking at MDG’s repos I couldn’t find an Activity repo nor could I find any mention of open sourcing the data in the PR. All this leads me to believe that the data won’t be open source.

If the point of the data is to help the community then the community should have access to the data that is being collected (if the point is to catch exceptions, performance, etc. then members of the community can help the same way they always have, by submitting PR’s).

My question is: will the data be open source? If yes, then no problem. If no, why not, how does this benefit MDG as a business, and how will it benefit the community?

2 Likes
  1. where’s the source code?
  2. is it devOnly?

Either way it should be optional. It will be a self inflicted wound (PR wise) if Meteor collects application data by default.

1 Like

This describes most enterprise developers. The ones who are using Meteor aren’t really talking about it. Because of my company’s trade secret policies, I can’t even say which public-facing applications I wrote, and yet they are heavy-lifting applications for which I would have loved to have certain hosting solutions in place long ago (I’m talking about Galaxy and its limitations for me, but that’s for another conversation). I am definitely a bit of an exception because I answer Stack Overflow questions (at least a few of them), I file bug reports when I run into bugs, and as of this last weekend, I’m active here. There are two Meteor developers in my company who are not (one doesn’t even have an account here). So within my own sample size, I’m a 1/3 minority.

[quote="manuel, post:8, topic:19471"]
Either way it should be optional. It will be a self inflicted wound (PR wise) if Meteor collects application data by default.
[/quote]

Yeah, especially with regard to companies that have various compliance requirements (PCI/HIPAA).

[quote="khamoud, post:7, topic:19471"]
If the point of the data is to help the community then the community should have access to the data that is being collected (if the point is to catch exceptions, performance, etc. then members of the community can help the same way they always have, by submitting PR’s).
[/quote]

It’s entirely possible that the data is to help the community that isn’t on these forums. But I could be grasping at straws here as there hasn’t been a formal announcement.

3 Likes

I did not read the whole thread, but an auto-enabled opt-in that tracks IP and/or URL is a very, very big no-go under Europe’s data protection laws, and an even bigger deal if you wish to have any relationship with the enterprise segment here.

Perspective: Google Analytics in Germany is only allowed to provide the first three segments of an IP address, even if you have a mile of legal documents attached to the page. Don’t underestimate the huge sensitivity for data protection outside of the US.
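For illustration, truncating an IPv4 address to its first three segments might look like the sketch below. `truncateIPv4` is a hypothetical helper written for this example; it is not part of Meteor, Google Analytics, or the census package.

```javascript
// Illustrative sketch: zero out the last segment of an IPv4 address.
// truncateIPv4 is a hypothetical helper, not an existing Meteor API.
function truncateIPv4(ip) {
  const parts = ip.split(".");
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`Not an IPv4 address: ${ip}`);
  }
  parts[3] = "0"; // drop the host-specific segment
  return parts.join(".");
}

console.log(truncateIPv4("203.0.113.42")); // 203.0.113.0
```

The truncated value still identifies a block of 256 addresses, so it narrows things down to a network rather than a single machine.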

If you wish to track, make it opt in on first run with appropriate data protection disclaimers / non-nagging.

9 Likes

I’m reposting my comment from the PR in case folks haven’t read it, then I want to address some of the points in this thread so far.

I wanted to add some clarity around why we plan to include the census package. Recently, we were looking at Meteor usage statistics and realized that given our existing telemetry we can only identify about 20% of apps using Meteor. That means a large fraction of our usage comes from apps that we know absolutely nothing about.

This is less than ideal as knowing who our users are and how they’re using our platform helps us make informed decisions towards creating a great Meteor roadmap. The insight gained from better telemetry results in a roadmap that increases worldwide Meteor adoption (which helps everyone in the community - more contributors, more add-ons/services, more talent, more jobs). For instance, knowing what breakdown of apps are built by enterprises vs startups vs individuals allows us to tailor our learning materials, tutorials, features and commercial offerings accordingly. Knowing what scale folks use Meteor at (by collecting the maxSessions stat - https://github.com/meteor/meteor/pull/6469/files#diff-a7c6c405e95f7fbaeac0c4645b12c77eR27) informs us on how much engineering resources to allocate towards scaling/performance.

Ultimately, we’re doing this to benefit everyone in the community by gathering objective data that informs where to spend our resources in the most impactful way. The difficult decision lies in where to strike the balance between the package being too intrusive but useful vs being friendly but useless.

@laosb I like your suggestion of explicitly notifying users that we’ve added the package.

First off, I want to say I think we put the cart before the horse here by writing the code and opening the PR before engaging the community. We’re an engineering-driven company and sometimes code is easier to write than prose. We won’t make that mistake again. The immediate point of this package is perfectly summed up by @rozzzly when they said “The sense I get is MDG wants to know who is using meteor and for what kinds of apps. That way they can gauge their audience.” To that end, the most important pieces of data we’d like to gather are rootUrl, maxSessions and version (Meteor’s).

The challenge that I alluded to in my comment on the PR is to have enough people be comfortable with providing this data for it to be useful. If 80% of apps remain in the shadows we haven’t really achieved anything.

Having said that, we absolutely do not want users to be unaware that we’re collecting this data. Further, it should be trivial for users who do not want to give us the data to opt-out. It’s the users in the middle who don’t much care either way whom we would really want to encourage to opt-in - any ideas on the best way to do this would be appreciated.

Now to specific questions/concerns:

You should make the package opt-in via a prompted Y/N

We figured that providing a clear message when running meteor create, saying that the package had been added, where to read more about it, and how to remove it (meteor remove census), would suffice. The reasons we thought this was better were:

a) It’s less friction than having to pause, think and press a key every time you create an app.
b) meteor create can still be run from scripts without the need to add further complexity by introducing additional command line flags.
c) This way those users who are indifferent towards providing the census data are less likely to opt-out as it would mean typing another command.

What information is being sent?

The data we’re proposing to send is here. We’ll add more documentation on exactly what this stuff means and why we’re sending it.

You shouldn’t be sending my server’s ip.

This data was included unintentionally, we’re going to remove it.

You should open-source the data.

Actually, this was our intention all along. We plan on building infographics and writing about the findings we glean from the data. We hadn’t intended to open up public access to the database, though this is an interesting idea. Would people be comfortable with it? It’s kind of like the actual census data. We’re all comfortable with providing it, the results are public and in fact very useful to everyone, but it might not be a great idea if people could look up the data based on street address.

Where is the source code?

The source code currently lives in this PR. Once merged it will live in the mainline Meteor repository.

Thanks to everyone for participating in this thread. I appreciate your feedback and want to stress that we’re not out to ship something that’s going to alienate the community.

12 Likes

Excellent response @zoltan. I think the y/n being too demanding is a taaaaaaaad bit facetious hahaha I doubt many people are running the cli generator every day


Damn. That’s ridiculous. I’m all for privacy, but that protection is just silly. “oh no I only can see 3/4 of the ip… there’s only 256 other possibilities…” are you kidding me? Even better, I can post an image here via markdown whose URL points to a PHP script that serves a 1x1 pixel transparent .gif with the correct MIME type and a little bit of binary data, and that logs information about you… here watch

Expand to view my rant about how that's ineffective and does more harm than good (collapsed because its offtopic)

Took me 30 seconds to find [this StackOverflow](https://stackoverflow.com/questions/4665960/most-efficient-way-to-display-a-1x1-gif-tracking-pixel-web-beacon) post:

header('Content-Type: image/gif');
echo "\x47\x49\x46\x38\x37\x61\x1\x0\x1\x0\x80\x0\x0\xfc\x6a\x6c\x0\x0\x0\x2c\x0\x0\x0\x0\x1\x0\x1\x0\x0\x2\x2\x44\x1\x0\x3b";

Then I just read the incoming request object and store its values. That means I could grab all of your IPs, user agents, referrers, etc. If that embedded image wasn’t limited to one page, I could, under the right conditions, set cookies and watch you jump from page to page. If I have this img on a few sites, I can track you across domains too. You really don’t even need cookies; they just make the results neater.

And all I did was post an image that you really can’t prevent, or even notice, unless you’re super intent on remaining anonymous, in which case you should be using a VPN & filters so none of that matters >.>

This is just how the internet works. Google might be big enough that they can’t risk bad PR and thus comply, but 99% of the internet doesn’t give a shit and they’re going to track you anyway however they can. Those laws should focus more on going after tangible corporations who abuse that data (and I’m sure they do that too). But to call that protection is absurd. At least DNT makes it obvious that it’s just a non-binding request saying please don’t (that no server admin cares about).

I’m sorry, that’s not “protection.” If anything, privacy policies like that just lull people into a false sense of security. “ope you cant see my ip, im safe and secure now!” How about actually addressing the issues that matter, which we can do something about? End-to-end encryption everywhere, no excuses.

disclaimer; there are definitely ways to prevent tracking that are not insignificant. But, they are involved. Whether you’re browsing, or an admin attempting to buffer the users from external resources on your site, there’s a fair amount of work for something that goes unnoticed by way too many developers, let alone end users.

/rant

Wow are you some kind of forum wizard?!?!

5 Likes

As a package developer, I’d be super excited for a lot of these stats. Especially the breakdown of Meteor use by version. e.g. when there’s a vast majority on 1.3+, I can say “use package version x for Meteor < 1.3” and can start using ecmascript/modules.

Likewise for installed packages (npm too). What percentage of Meteor installs are using react / angular? It’s motivation to add support in my package. I guess it would even be interesting to know what percentage of react users are running Meteor :>

Other stuff:

  • I’m ok with a notification on meteor create or meteor upgrade (on existing project) and ability to remove after (I do meteor create pretty often and agree this would be annoying; maybe better though, ask once and store answer in .meteorsession or somewhere).

  • ROOT_URL is not anonymous - clearly. I think you’d need another package for this, or not claim that it’s anonymous stats (which is all I think most people would be comfortable with). Anonymous would just use the URL to send an isProduction, etc.

  • IP address: Can’t speak for all but I’d be ok with the first 3 octets to get country / data center, which could be very useful.

  • @rozzzly ideas from post 3 are awesome :slight_smile:
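As a rough sketch of the "send an isProduction flag instead of the URL" idea in the list above: the helper below reduces a ROOT_URL to a single boolean. The function name and the heuristics for what counts as "local" are assumptions made for this example, not anything in the census PR.

```javascript
// Hypothetical helper: reduce a ROOT_URL to one boolean so the URL itself
// never has to be sent. The "local" heuristics below are assumptions.
function isProductionUrl(rootUrl) {
  const { hostname } = new URL(rootUrl);
  const localHosts = ["localhost", "127.0.0.1", "[::1]"];
  const looksLocal =
    localHosts.includes(hostname) ||
    hostname.endsWith(".local") ||
    hostname.endsWith(".test");
  return !looksLocal;
}

const report = { isProduction: isProductionUrl("http://localhost:3000/") };
console.log(report); // { isProduction: false }
```

The report then carries no hostname at all, which is closer to what "anonymous" would normally mean.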

And yeah, lol, I also had to google “discourse expandable post” after @rozzzly’s last post :>

4 Likes

No, I just spend too much time on the internet. Apparently this works with most md parsers, and is supported in modern browsers. It works on github too!!

6 Likes

After some days’ thinking, I’ve changed my mind a little bit. This package is important for Meteor’s development, but for privacy reasons we shouldn’t add it by default in 1.3. An optional opt-in question in the create process is a good idea, but I also want to make sure that this question won’t be shown on every app creation, which can be very annoying. Also, for apps that update to 1.3, don’t add it by default.
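A minimal sketch of "ask once, then remember the answer" could look like this. Everything here is hypothetical: the function name, the `censusOptIn` key and the in-memory `Map` standing in for a file on disk are assumptions, not part of the meteor CLI.

```javascript
// Sketch of "ask once, remember the answer". The store is a Map here; a real
// tool would persist it to disk. All names are hypothetical.
function resolveCensusConsent(store, askUser) {
  if (store.has("censusOptIn")) {
    return store.get("censusOptIn"); // already answered; do not prompt again
  }
  const answer = askUser() === true; // anything but an explicit yes is "no"
  store.set("censusOptIn", answer);
  return answer;
}

const store = new Map();
let prompts = 0;
const ask = () => {
  prompts += 1;
  return false; // the user declines
};

resolveCensusConsent(store, ask); // first app: prompts the user
resolveCensusConsent(store, ask); // second app: reuses the stored "no"
console.log(prompts); // 1
```

Treating anything other than an explicit yes as a refusal keeps scripted, non-interactive runs on the opt-out side.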

I kind of doubt that people would be OK with the world knowing the maxSessions of their app as a general rule. Then again, perhaps those people wouldn’t be OK with MDG knowing it either? I’d be interested to hear what others say on this too – my sense is that the customers I’ve worked with would be ok with the second but not the first.

Speak for yourself! I could be a small sample size though.

2 Likes

What about something like a .meteor/census-config.json file that sets which data are allowed to be sent to MDG? People could then decide on their own what they’re okay with, and the .meteor/.id could be used to always identify the app.
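To make the suggestion concrete, here is one way such a config could gate what gets sent. The shape of the config (an `allow` map) and the function name are assumptions for illustration only; the field names rootUrl, maxSessions and version are the ones mentioned earlier in the thread.

```javascript
// Hypothetical contents of .meteor/census-config.json; the file format and
// its keys are assumptions, not an existing Meteor feature.
const censusConfig = {
  allow: { version: true, maxSessions: true, rootUrl: false },
};

// Keep only the fields the developer explicitly allowed.
function filterCensusPayload(payload, config) {
  const allowed = (config && config.allow) || {};
  return Object.fromEntries(
    Object.entries(payload).filter(([key]) => allowed[key] === true)
  );
}

const payload = { version: "1.3", maxSessions: 42, rootUrl: "https://example.com" };
console.log(filterCensusPayload(payload, censusConfig));
// { version: '1.3', maxSessions: 42 }
```

With a missing or empty config nothing is sent, so the default stays on the opt-in side.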

1 Like

Make it an opt-in (NOT opt-out) package that you have to meteor add separately and everything is fine. Some, like us, just don’t want to send you any metrics in the first place. MDG ought to tread very carefully here, this is ground where a lot of companies have been burnt.

Looking at the code, do not add it to 1.3 as it is. Others already mentioned the problems storing IP addresses, and all enterprise people will balk at another thing to whitelist.

Thinking about it pragmatically, if it’s an opt-in (or opt-out, either way; opt-out is just more annoying!) package, all serious devs will leave it out, so I feel your metrics will be limited to the countless test apps people create locally before shedding autopublish, insecure and other extra fluff that people who do things the “right way” from the beginning never use in the first place. Hence I think this one needs to go back to the drawing board.

A few alternate ideas on gathering feedback for your use:

  1. Offer free Galaxy hosting time for doing developer questionnaires. Especially with free meteor.com hosting going down this would be a no brainer for you to get serious statistics and information on a large scale, and get developers to try out Galaxy even if it would be on a 512M instance for a month or two.

  2. Hire developer community outreach people who are proactive in contacting people who do anything serious with Meteor from all around the world. Build your contact list internally and treat it (and the people who are in contact with the developers) as gold. Conduct quarterly research questionnaires and such on these few hundred key people. Have some of these people be in an invite-only IRC/Slack/Telegram/Whatever chat where you discuss new features first with the people who are using your product the most in places that you do not have presence in before presenting a more public draft. In this case, a draft is not an intrusive PR to core.

  3. If the forum and GitHub aren’t sufficient, instead of building an internal tool, ask the community to help you build something that works for both.

4 Likes

I very much applaud that you consider the “silent majority”, but as they are not active you cannot make any sound assumptions about their intentions, wants or needs - you are only dressing your own thoughts as those of a group of people. The vocal ones should be the ones driving the development - it’s not really that hard to register on a forum or GitHub. I doubt all the metrics you’d be getting from the census package as it is now would be much more efficient without major time investment on analytics on your part.

1 Like

At this moment, I think of all the times I had to tell people that they really should remove the Autopublish package and no, it’s not “b-b-but it’s a core package built in by default so it is meant to be there or something will break (and it does break when I remove this package)”, even if your messages about the need to remove this package are very clear.

6 Likes

@zoltan

First off, I want to say I think we put the cart before the horse here by writing the code and opening the PR before engaging the community first. We’re an engineering driven company and sometimes code is easier to write than prose. We won’t make that mistake again.

Thank you dearly for recognizing this. I think this has been becoming a “theme” lately and usually is the main grounds for the disappointment and occasional turmoil.

@zoltan

If 80% of apps remain in the shadows we haven’t really achieved anything.

I don’t agree with this. Statistics is an interesting science. You’ll be amazed to see what you can achieve with the data you’ll get from 0.0001% of the apps.
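To put a rough number on the point about small samples: under the textbook assumption of a simple random sample (which opt-in telemetry is not, since the people who opt in select themselves), the 95% margin of error for an estimated proportion depends on the sample size rather than on the fraction of the population sampled. A quick sketch:

```javascript
// Margin of error for an estimated proportion at ~95% confidence, using the
// normal approximation. Assumes a simple random sample, which self-selected
// telemetry is not, so treat the numbers as a best case.
function marginOfError(p, n, z = 1.96) {
  return z * Math.sqrt((p * (1 - p)) / n);
}

// Worst case (p = 0.5) for a few hypothetical sample sizes.
for (const n of [100, 1000, 10000]) {
  console.log(n, (marginOfError(0.5, n) * 100).toFixed(1) + "%");
}
```

A thousand randomly chosen apps would already pin a share such as "percentage on Meteor 1.3+" down to about ±3 points; the harder problem is bias in who reports, not the raw count.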

@tmeasday

I kind of doubt that people would be OK with the world knowing the maxSessions of their app as a general rule. Then again, perhaps those people wouldn’t be OK with MDG knowing it either?

This may be debatable in either case, but I am sure I can get some of my clients to share their data both for you to see and to publish publicly, while for others, I’m more than sure that even hinting at the possibility of such data disclosure outside their company network would get me crucified. So that’s why I wholeheartedly agree with:

@juho

Make it an opt-in (NOT opt-out) package that you have to meteor add separately and everything is fine.

and let you know that I will opt-in for some of my apps and that

@fvg

What about something like a .meteor/census-config.json file to set, which data are allowed to be sent to MDG?

is an even better suggestion because a blanket-rule about opting-in and out is just too broad. For people like me, who would like to actually help the platform grow and improve, it would be a nice way to contribute back to you and the community.

And @brajt’s comment about how scary it may sound for a lot of newcomers to remove a default package is spot on.

@rozzzly

Damn. Thats ridiculous. I’m all for privacy, but that protection is just silly. “oh no I only can see 3/4 of the ip… there’s only 256 other possibilities…” are you kidding me?

Come on! Laws are there for a reason, and while you are more than welcome to break them as long as you are ready to face the consequences, you would do better to respect them for what they are: a general consensus by a community of people who have decided that it is important to lay down some rules about the issue. By your reasoning, murder should not be illegal because it is perfectly easy for anyone to grab a knife and stab someone at random.

3 Likes

Ok, here’s a more optimistic comment (but I do stand by the one above):

This gets me thinking - and I know limited resources, priorities, free stuff are hard to maintain yadi yada - what if MDG offered a free app-analytics package, nothing too fancy, just some basic functionality, in exchange for the rights for MDG to collectively analyze that data for their own development plans?

Most of us do trust @arunoda with our data on Kadira, don’t we? Why not put the same trust in MDG?

And when open sourced (both or either one of the data or analytics code), the community could contribute to do all kinds of interesting stuff that could benefit everyone.

3 Likes

Cool trick with the expandables!

I know you viewed this from a technical side, but this is about handing over identifiable information to 3rd parties without proper consent. And, as I guess you illustrated, there is ignorance of the fact that about 1.5 times the population of the US values privacy much, much more highly than the average American does. It took a while for Google to learn that too.

Injecting a tracking package into an open source distribution that sends data to a business (not a foundation or NGO), is exactly the kind of move that makes it impossible for me to suggest meteor as technology while fencing off questions regarding the reliability of the vendor.

Meteor is a glorified build tool. Imagine if webpack sent tracking data to webpack Inc. You guys need to learn and practice old-fashioned market research. With a 20% sample, the statistical confidence should be more than high enough to draw conclusions.

Then expecting to track enterprise usage? This is exactly why companies fear letting their engineers even experiment with this kind of technology: they might accidentally send data about some secret project over the wire. A great excuse to block the adoption of Meteor.

@zoltan Use statistics and market research to validate your server telemetry. Tracking by a web hosting company, on the client side, has no place in open source technology that requires trust, and more trust, from a broad range of stakeholders, some of whom also look for reasons not to allow the use of the Meteor open source project.

Understand this Enterprise Example:

Imagine I sit down with the CTO and his senior dev to evaluate some technology stack. I talk about Meteor, an open source project, so there is no risk of a dying vendor or copyrighted material. Then they ask me “How do they make money?”. I say “They are trying to build an infrastructure hosting company around this. But the software is and stays open source.”
Then they want to see it. I go on my machine, or sometimes the CTO’s, download Meteor, create a new project in front of their eyes and run it.

I know exactly what will happen next
A) No tracking message
The CTO plays around for the afternoon and, while he reads into the docs, he figures out he has been tracked. In some cases this even violates his own foreign infrastructure policies, because the code sends data out through the firewall.

B) A notice is shown, that tracking can be removed with "remove census"
The CTO will ask immediately: “You said it’s independent. What else do they track and how do we remove it? Is this a “closed” open source project? We cannot have anything tracking our users/clients/project infrastructure without knowing exactly what is sent around.” The demo ends.

C) Some message for opt in tracking by MDG
CTO: “We can not allow any tracking of our infrastructure. Please choose NO. Is that the only part we need to be worried about data leaks? What is the role of this MDG really and is it likely they just turn around on technology aspects if they can inject their own tracking package into an open source project?”

Data is the currency of the 21st century. If you want to be taken seriously, value the data of others as if you were about to pick their pocket, and think about the trust relationship you will have with that someone/victim later.

This really decides whether Meteor is a genuine open source project with no immediate profit goals or a marketing tool of MDG. The latter destroys more value than you could ever create by analyzing all the data points of Meteor apps.

12 Likes