I can do a quick summary of what's being proposed and why. I work in the JS team at Mozilla and deal directly with the problems caused by insufficient data. Please note that I'm speaking for myself here, and not on behalf of Mozilla as a whole.
Tracking down regressions, crashes, and perf issues without good telemetry about how often they're happening and in what context is a real struggle. Issues that might have taken a few days to resolve with good information become multi-week efforts to reproduce the problem with little to go on.
It simply boils down to the fact that we can't build a better browser without good information on how it's behaving in the wild.
That's the pain point anyway. Mozilla's general mission, however, makes it very difficult to collect detailed data - user privacy is paramount. So we have two major issues that conflict: the need to get better information about how the product is serving users, and the need for users to be secure in their browsing habits.
We also know from history that benevolent intent is not that significant. Organizations change, and intents change, and data that's collected now with good intent can be used with bad intent in the future. So we need to be careful about whatever compromise we choose, to ensure that a change of intent in the future doesn't compromise our original guarantees to the user.
This is a proposed compromise that is being floated. Don't collect URLs, but only top-level+1 domains (e.g. images.google.com), and associate information with that. That lets us know broadly what sites we are seeing problems on, hopefully without compromising the user's privacy too much. Also, the information associated with the site is performance data: the time spent by the longest garbage-collection, paint janks.
This is a difficult compromise to make, which is why I assume it took so long for Mozilla to come around to proposing it. These public outreaches are almost always the last stage of a lengthy internal discussion on whether a proposal fits within our mission or not.
I'm not directly involved in this proposal, but I personally think it's necessary, and strikes a reasonable balance between the privacy-for-users and actionable-information-for-developers requirements.
> Tracking down regressions, crashes, and perf issues without good telemetry about how often they're happening and in what context
If that's what you're aiming at, collect the data but keep it local. Install some sort of responsiveness/"problem" monitoring. Ask the user to send data relevant to the problem if a problem occurs. IMHO there is no need to systematically collect user data for that.
Or get the data from a random sample of users. You don't need data from everyone.
> Or get the data from a random sample of users. You don't need data from everyone.
To my amateur ear, that actually sounds like a good compromise to lessen the blow somewhat more. You should suggest it to Mozilla :)
That's what's proposed here. I guess no one actually read the post...?
There is no mention that, once validated, the fully deployed RAPPOR-based metrics would be taken from a random sample of the population. Only the initial study of the system will be run on a random population.
FTA:
"What we plan to do now is run an opt-out SHIELD study [6] to validate our implementation of RAPPOR. This study will collect the value for users' home page (eTLD+1) for a randomly selected group of our release population. We are hoping to launch this in mid-September."
Notably:
"this study will collect ... for a randomly selected group"
[6] - https://wiki.mozilla.org/Firefox/Shield/Shield_Studies
I added the second part about the random sample to the comment later, after I had already pushed the proposal out of my short-term memory. I hope they use the data from their initial study to test whether the opt-out group actually differs from the group they already get data from.
I'm not sure how that would help. If I opt out of data collection, I don't think I'd be particularly pleased if I get randomly selected to be one of the users in this "random sample" and the stats get sent anyway.
And if I opt in to data collection, why would it matter to me whether the stats I'm sending are a result of me being selected as part of a random sample or not? Might as well just _always_ send those stats; it doesn't matter to me.
This. Firefox prompts for feedback semi-regularly. I seem to recall it even bends over backwards to make it end-user friendly by having "Firefox made me happy / Firefox made me sad" options. It seems like it would not be difficult to tie that screen to a secondary prompt that says, "Can you briefly turn on some additional telemetry for us so that we can try to fix the problem?" Let the users make that choice to temporarily lower their shields so that you can get some useful data out of their machines, in exchange for the implicit pledge that you will use this data for troubleshooting this one explicit issue (i.e. whatever prompted them to click "Firefox made me sad").
That seems like a reasonable compromise to me. I'm happy to send logs if my browser crashes whenever I visit a certain page, and if I know I'm gonna be monitored for that period, I'll restrict my browsing to only that page. I do not consent, however, to sending everything--even anonymized--on the off chance that Mozilla will see the crash events and use them to flag that domain and maybe fix the issue on that particular page.
Awesome idea... why not introduce a "reproduce bug" mode which monitors everything in detail? If people are annoyed enough to send bug reports, there's a good chance they'll use it if it's properly presented. If people aren't filing bug reports, you don't really have a case there... If you generally need more of this telemetry, advertise it and use it yourself...
That sounds way more reasonable to me.
"This is a difficult compromise to make"
Then don't make the compromise.
As others have expressed here, the reason few people opt in to data collection may be that they have chosen to use a Web browser that does not mandate the collection of data.
I'm assuming there will always be an opt out which I shall add to my list of things I have to do when installing Firefox.
> I'm assuming there will always be an opt out which I shall add to my list of things I have to do when installing Firefox.
There will be. Sorry for the hassle :(
How can I recommend Firefox to my friends when I know they won't remember to opt out?
The way I see it is that if Firefox's userbase dwindles because of this, either we get our Firefox with opt-out telemetry or... Firefox dies. And now we have a Chrome monopoly.
I'm not sure I like that gamble.
On Linux it may be that the various distributions decide to repackage Firefox with the default setting flipped. Not sure about the various policies on that one.
The ESR track will presumably have the default flipped, because corporations get funny about data transfers to remote servers - mind you, Microsoft seems to be getting away with it for businesses that don't have a full-on Enterprise setup.
> I'm not directly involved in this proposal, but I personally think it's necessary, and strikes a reasonable balance between the privacy-for-users and actionable-information-for-developers requirements.
I use Firefox and always opt into any telemetry that sends data back to Mozilla. You could say I am a fanboy. I think it is a HORRIBLE idea and Mozilla should scrap it yesterday and never bring it up again. If people bring it up again, send them to the roof team (if it doesn't exist, create one). If they come downstairs, fire them. You already have people like me who are willing to opt-in to every single thing you can try. For example, Firefox nightly on Android has consistently crashed for me about every five minutes or so since the last weekend and yet I keep using it. Don't throw away this goodwill.
The problem here is that, for certain types of data, statistics obtained exclusively from users who opt-in to data collection aren't very useful because they're heavily biased in favor of the type of user likely to opt-in (which often isn't very well representative of the average user).
Bias can be corrected statistically.
Statistics are not a substitute for the long-tail effect.
Lack of reporting from non-technical people who aren't aware they can opt-in cannot be corrected statistically, as the two categories of people (technical, non-technical) use the browser very differently.
For a made-up example: if you type "Yahoo" into the search bar, then type "Search" into the field, and then type your search into the third page, you'll be acting as many normal users do, and you may uncover crashes on page #2 at Yahoo that a technical user would never encounter - simply because they wouldn't type the word "Search" into the search field at Yahoo and trigger a JS bug where "Search" or "Yahoo" gets used one too many places and ends up crashing the CSS parser because it race-conditions with a repaint.
If that problem affects 0.01% of the Firefox population, that's a lot of people who don't think technically, and do feel regret when we crash and can't help them because we can't see where it crashed.
(Yes, employed. No, I didn't talk to anyone else before I posted here. My own thoughts, I am not a number^Wcorporation, etc.)
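The "cannot be corrected statistically" point above can be made concrete with post-stratification, the standard reweighting technique. A toy sketch with made-up strata and numbers: reweighting fixes an over-represented stratum, but no weighting scheme can recover a stratum with zero respondents.

```python
def post_stratified_mean(samples, pop_weights):
    """samples: stratum -> list of observed values from opt-in users.
    pop_weights: stratum -> known share of the full population.
    Reweight each stratum's sample mean by its population share."""
    total = 0.0
    for stratum, weight in pop_weights.items():
        values = samples.get(stratum, [])
        if not values:
            raise ValueError(f"no data for stratum {stratum!r}; cannot correct")
        total += weight * (sum(values) / len(values))
    return total

# Hypothetical population: 20% technical users, 80% non-technical.
pop = {"technical": 0.2, "non_technical": 0.8}

# The opt-in sample is 90% technical, but reweighting still recovers
# the population mean (0.2*1.0 + 0.8*5.0 = 4.2):
biased_sample = {"technical": [1.0] * 90, "non_technical": [5.0] * 10}
print(post_stratified_mean(biased_sample, pop))

# ...until a stratum is entirely absent; then no correction is possible.
missing = {"technical": [1.0] * 100}
try:
    post_stratified_mean(missing, pop)
except ValueError as e:
    print("cannot correct:", e)
```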
This is a horrible development. If Mozilla starts collecting this sort of data on an opt-out basis, it will put many users at risk. Seriously, WTF?
> This is a proposed compromise that is being floated. Don't collect URLs, but only top-level+1 domains (e.g. images.google.com), and associate information with that. That lets us know broadly what sites we are seeing problems on, hopefully without compromising the user's privacy too much.
Sure, there's no problem with images.google.com because it's generically innocuous. But what about pornhub.com for users in Saudi Arabia? Or some Japanese site that's essentially child porn for users in the US? The top-level+1 domain in many cases is totally incriminating.
> Also, the information associated with the site is performance data: the time spent by the longest garbage-collection, paint janks.
Maybe so. But it's collection of the top-level+1 domain that's the problem.
> I'm not directly involved in this proposal, but I personally think it's necessary, and strikes a reasonable balance between the privacy-for-users and actionable-information-for-developers requirements.
Fine. But then, make it opt-in, to protect users.
Many problems here:
1. You're proposing a mechanism for collecting data, and a strategy for extracting more data than you currently do. You have not figured out the type of data that you will finally need, only a set of things that you currently envision. Naturally, the data that you will collect in the future will be more than what you currently envision. There is built-in mission creep that is dangerous.
2. What you currently envision is not fleshed out as especially useful. You only believe it is useful. The pain point of biased data is a red herring; your real concern is not having enough data.
3. You have found a technology which you believe will allow you to collect a lot of data anonymously. But none of you seem to understand the technology very well. It seems like a shiny toy that you are eager to go to town with. I am not sure this is the right attitude.
4. You're proposing to use your users in lieu of proper testers, or to save time. There are many ways to properly test software and to save time. Have they been explored? There used to be a time when beta software was a thing. Prompt the users to become testers for your beta software. If users don't want to be testers then don't collect data from them. How much data do you actually need anyway? Have you fully utilized your existing data?
Over all, I see this as a nice-to-have luxury, not some life-and-death situation, and subverting the goodwill of users is not worth it, IMHO.
Differential privacy is relatively battle-tested. I wouldn't be too worried about it standing up to scrutiny.
The problem with differential privacy is I have to trust the person aggregating the data to actually do it.
Do you? Excuse my ignorance, but I thought there was a way to locally mangle the data before submitting. Is that not what apple is doing?
>Is that not what apple is doing?
I don't know, is it? How would I check, if I consider apple an untrusted actor?
For the case of RAPPOR (and for what Apple is doing), you do not need to trust the aggregator with your data. These algorithms operate in the "local" model of differential privacy, where all privatization occurs on the users' local machines before being sent to the aggregator.
This is incorrect, at least in theory. RAPPOR is designed to protect the user's data even if an attacker can see all of their individual responses over time. Of course, there could be implementation issues...
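To make the "local model" above concrete, here is a minimal sketch of basic randomized response, the core mechanism that RAPPOR builds on (the full algorithm adds Bloom filters and a permanent randomized response to resist the longitudinal attacks mentioned above). All noise is added on the client before anything is sent; the aggregator only ever inverts the noise statistically:

```python
import random

def privatize(bit, f, rng):
    """Client-side randomized response: with probability f, replace the
    true bit with a fair coin flip; otherwise report it truthfully."""
    if rng.random() < f:
        return rng.random() < 0.5
    return bit

def estimate_true_rate(reports, f):
    """Aggregator-side correction: observed_rate = true_rate*(1-f) + f/2,
    so invert that relation to recover the population rate."""
    observed = sum(reports) / len(reports)
    return (observed - f / 2) / (1 - f)

# Simulate 200k clients, 30% of whom truly have the property being measured.
rng = random.Random(42)
true_rate, f = 0.30, 0.5
reports = [privatize(rng.random() < true_rate, f, rng) for _ in range(200_000)]
print(round(estimate_true_rate(reports, f), 2))  # close to 0.3
```

Note that any single report is plausibly deniable (it had a 25% chance of being a pure coin flip either way), yet the aggregate estimate is accurate.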
> There used to be a time when beta software was a thing. Prompt the users to become testers for your beta software.
Firefox already has opt-in telemetry, and Firefox already has a beta channel. It's unclear to me how it would help to tie telemetry to the beta channel; that would just make the existing problems (not enough data, and biased data) even worse, since there are probably far more users willing to share telemetry data than to use beta software.
In context, that might mean if there has to be some opt-out situation then opt-out for the beta channel might be slightly more acceptable.
If it's so harmless, let users opt-in. Adding data collection via an opt-out is shameful, it shows that you know people would not want this and yet you'd prefer to get more data anyway.
Thanks for your input. Glad to hear someone from the Mozilla team on this thread.
It's an interesting compromise... because without improved performance and features, we'll lose Firefox entirely, along with all of the relative privacy/security gains that entails. This is a good example where "perfect" privacy that reaches only a few people is the enemy of "good" privacy that reaches more.
Firefox must continue to exist if we are to have any browser without an economic incentive to be user-hostile. If they need performance traces from websites, and they have an open, clear discussion of how to preserve as much user privacy as possible, they should collect them.
The data collection MAKES it user-hostile. If they start collecting data, then there's no point for Firefox to exist - they're just a crappier version of Chrome.
If user privacy is paramount, then there are multiple ways to lower the privacy incursion that is caused by the data collection.
Only collect top-level domains within the Alexa top 1k. Knowing that users are on a highway is less sensitive than knowing the specific street where only five homes exist, and it reassures users that private domain names won't be leaked.
Send the data through Tor. That way you only get the data about the browser <-> site interaction, not user<->browser<->site interaction.
And make it opt-in and notify users of the purpose of the data collection. A good model to follow here is Debian installer and popcon. Follow the good practices of data collection in the free software world and do not use dark patterns.
This is a reasonable compromise, but it does bias the sample towards popular sites. Granted, many of these sites are the sites that Firefox struggles with, but browsing habits are a heavy-tailed distribution. That said, smaller sites do open the door for problems, so it's probably a workable compromise.
EDIT: It should also be completely disabled in Private Browsing mode -- otherwise the optics are even worse than they are now.
> there are multiple ways to lower the privacy incursion that is caused by the data collection
The OP actually discusses a very interesting method for doing exactly that using differential privacy techniques. I personally think that's a very good compromise for this use-case.
From the OP, one suggestion is to collect "top-level+1 domains". This doesn't solve the issue of a person going to "starting_a_union_inside_company_x.com", which would be a top-level domain. Niche domains don't have a large number of users, and as such their users can be trivially deanonymized. It is also rather common for domain name servers to have a private and a public side. Firefox could easily become a vector for leaking from the private side, possibly revealing sensitive information such as unannounced products.
From the OP we can also see that they don't intend to store IP addresses, but it will always be possible. By using an anonymity network they can reassure the user and at the same time eliminate the risk that a malicious actor in the future silently starts tracking which websites users go to. An additional benefit is that Mozilla also won't become a target for governments, a risk that no organization can ever be safe from once it starts gathering information about users.
It is not enough to strike a reasonable balance between privacy for users and actionable information for developers. You also need to find a balance between risk management and time spent on reducing risks. What I propose, primarily, is that they spend a bit more time on reducing risks, as that would benefit everyone.
Even Alexa 1k could be quite sensitive, for example there are many porn sites in that list.
As an organization, we are very aware that some of the sites people visit using our browser would humiliate them if someone could draw a link between who they are and where they visited. This isn't restricted to porn, but that's certainly the most widely known category of site that falls under this heading. We consider this carefully every time we do anything with any user data ever, whether a crash report or the TLD+1 proposal described above.
EDIT: Don't forget that the DNS resolution for porn sites can be deanonymized and resold by your internet provider - there's nothing we can do to protect you from DNS being a cleartext, sniffable, mitm'able protocol.
Mozilla's crash reporter already has the option of submitting the URL.
There are a couple different reasons crash reports aren't sufficient:
1. Crash reports only report crashes. We also want to see perf issues like GC and paint jank, etc.
2. Crash reports don't sample the general population, so statistically the information is less useful. If we get a perf issue, it's very important to know whether that issue is suffered by 10% of the users in general pop, or 0.5% of users in general pop. You want to prioritize the stuff that has the greatest impact on the general user population.
Lastly, crash reports are sort of a boolean filter - you only get the people that crash. The things I'd like to know to help in my development are things like "what is the histogram of max GC pause times on docs.google.com". Getting that info requires a good random sampling of the population, not just those who exhibit problems.
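A toy simulation of that last point, using made-up lognormally distributed per-session GC pause times: a random sample tracks the population's tail behavior, while conditioning on "only sessions that had a problem" (here, pauses over 100 ms) gives a wildly skewed picture.

```python
import random

rng = random.Random(7)
# Hypothetical per-session max GC pause times in ms (not real Firefox data).
population = [rng.lognormvariate(3.0, 0.8) for _ in range(100_000)]

def p95(values):
    """95th-percentile of a list of values."""
    return sorted(values)[int(0.95 * len(values))]

# A modest random sample estimates the population percentile well...
sample = rng.sample(population, 2_000)

# ...while the crash-report-style view (only sessions that exhibited a
# problem) sees nothing but the extreme tail.
problem_sessions = [v for v in population if v > 100]

print(f"population p95: {p95(population):.1f} ms")
print(f"random sample p95: {p95(sample):.1f} ms")
print(f"problem-only p95: {p95(problem_sessions):.1f} ms")
```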
> Lastly, crash reports are sort of a boolean filter - you only get the people that crash. The things I'd like to know to help in my development are things like "what is the histogram of max GC pause times on docs.google.com". Getting that info requires a good random sampling of the population, not just those who exhibit problems.
PLEASE do not go down this road. Look where "optimizing" video card drivers has led the video game industry: game engine developers and game developers are lazier than ever. It is not up to you to make sure docs.google.com runs well on your browser. It is up to you to provide a browser that adheres to (and defines, if it must) standards. It is up to the web developers at docs dot google dot com to make their application work on Mozilla Firefox.
This is getting off-topic, but it's interesting. I think I have the exact opposite take on things from you :)
A program written by a developer and used by a user is a relationship between that developer and the user. I just work on the platform that allows that relationship to exist. I feel it's overstepping our boundaries as platform providers to say "we're not going to make this platform faster for you because we think developers are writing bad code using that performance as a crutch".
It feels like I'd be setting myself up as a self-appointed clergy over moral matters in software development. It's not a hat I'm comfortable with.
Why not think about the program you are working on as a program that is built to support the open standards that enable people to communicate and concentrate on performance within these standards? If someone wrote a bad performing non standard compliant code the program should throw an error.
Making bad code run faster is overstepping the boundaries.
But we're not making "bad code" run faster. We're making code run faster. The original counterpoint was that we shouldn't be, because improving the performance just gives leeway for bad programmers to use it as a crutch.
We don't prioritize bad code for optimization. See usage of 'with' in Javascript. We don't actively try to make it worse, but whenever a decision is presented which regresses 'with' performance for gains somewhere else, it'll probably be taken because we don't care about 'with' running fast.
But consider the example I mentioned: histograms of max GC pause times on a particular website. Or particularly bad janks, or long amounts of time spent in JS which might be the result of poor JS execution.
None of these optimize "bad code". They're just standard platform performance optimizations that help all programs. That will include "bad" programs as well.
THAT is your use case? And this just CAN NOT be done from opt-in? Makes no sense.
If mozilla can't see how utterly insane this is then there is no hope left.
>A program written by a developer and used by a user is a relationship between that developer and the user.
What about the relationship between you/Mozilla and the firefox users? This thread is evidence that at least some of the users are not happy that you are (in their eyes) sacrificing their privacy for future performance gains.
Making optimizations based on telemetry from real-world sites doesn't mean you're optimizing for that one site only, like a video driver including hacks for a specific game. For example, shifting an array in Firefox used to be O(n) vs. O(1) in the competition [1]. Improving these sorts of code paths benefits the entire web, even if the performance issue is discovered and profiled on docs.google.com.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1348772
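The O(n) vs. O(1) shift difference is easy to feel in any language. A hedged illustration (conceptually analogous to the bug above, not Firefox's actual implementation) using Python's array-backed `list` versus `collections.deque`:

```python
import timeit
from collections import deque

N = 50_000

def drain_list():
    xs = list(range(N))
    while xs:
        xs.pop(0)  # O(n): every pop shifts all remaining elements left

def drain_deque():
    xs = deque(range(N))
    while xs:
        xs.popleft()  # O(1): the deque never shifts its contents

t_list = timeit.timeit(drain_list, number=1)
t_deque = timeit.timeit(drain_deque, number=1)
print(f"list.pop(0) x{N}: {t_list:.3f}s   deque.popleft() x{N}: {t_deque:.3f}s")
```

Fixing the engine so that the common `Array.prototype.shift` pattern behaves like the deque case speeds up every site that uses it, not just the one where it was profiled.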
> Making optimizations based on telemetry from real-world sites doesn't mean you're optimizing for that one site only, like a video driver including hacks for a specific game. For example, shifting an array in Firefox used to be O(n) vs. O(1) in the competition [1]. Improving these sorts of code paths benefits the entire web, even if the performance issue is discovered and profiled on docs.google.com.
Again, the question should be if something "benefits the entire web", how can we discover it without an opt-out anti-feature? If the answer is we can't, then we don't want it. It is as simple as that.
>You only get the people who crash
Uh - these are the most important people. The. Most. Important. The people you just pissed off by taking a header in the middle of whatever it was they are doing. Your performance noodling is irrelevant if you aren't addressing those issues.
I'm sorry, but you make the team sound incredibly out of touch with statements like this. To offset the other platforms advantages in marketing visibility, Mozilla has to be better across the board to survive, so unless you guys aren't crashing at all now, I'd say that this should be job #1.
Yes, Firefox developers can only work on addressing crashes or performance issues, but not both.
Sounds good. I'll just work on making things crash faster then.
1. Then why not add a "perf reporter" and a "paint jank" reporter?
"Hi! It seems that this page is loading unusually slowly, would you mind submitting more details to help Mozilla diagnose the issue?
Click `More Details` to see exactly what information is being reported."
You even already have a good entry point for one of these - the "unresponsive script" dialog.
Personally, I'm far more likely to send you this data (after having looked it over) than in even the opt-out case. If I have to opt out of all data collection to be sure I don't accidentally report www.really-illegal-pornography.com to Mozilla, I'll opt out and you'll never see any information from me at all. If I can avoid sending reports for www.really-illegal-pornography.com but still report lots-of-annoying-javascript.google.com, then you'd get more out of me.
2. If the issue is reported 10x more often on docs.google.com than on obscure.yahoo.com only because docs.google.com is far more common (even though the problem happens only on 0.00001% of visits to docs.google.com but on 10% of visits to obscure.yahoo.com) it does indicate that the issue in docs.google.com is more important. Sure it is rarer per visit, but a user is still 10x more likely to encounter it.
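The arithmetic behind that argument, with hypothetical visit counts chosen to match the 10x-more-reports scenario above (the site names are from the thread; all numbers are made up):

```python
# A bug hitting 0.00001% of visits to a huge site can still affect more
# users than a bug hitting 10% of visits to an obscure one.
visits = {"docs.google.com": 100_000_000_000, "obscure.yahoo.com": 10_000}
failure_rate = {
    "docs.google.com": 0.0000001,   # 0.00001% of visits
    "obscure.yahoo.com": 0.10,      # 10% of visits
}

expected_incidents = {site: visits[site] * failure_rate[site] for site in visits}
print(expected_incidents)  # docs.google.com dominates despite the tiny per-visit rate
```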
>You even already have a good entry point for one of these - the "unresponsive script" dialog.
Thanks for bringing that up.
Top-level domains are still betraying the user's privacy. Does it bug me that PornoTube is significantly laggier on Firefox than YouTube? Sure. Do I want Mozilla to know that I'm visiting it? Hell no.
They wouldn't know that you are visiting it, just that someone is visiting it.
How can they not know that I'm visiting it? I mean, the data is coming from my IP address. Sure, they may be dropping that data before storage. But what if it's intercepted?
Connections to the Mozilla Telemetry server are done over HTTPS, so all an interceptor would know is that you are sending Telemetry and not what that Telemetry is.
OK, fair enough. Then an adversary inside Mozilla that can intercept the data. I mean, the NSA is inside Mozilla, right? It'd be foolish, in my opinion, to assume that they're not. Such a juicy target, and all.
No compromise, I switched to FF on Android to avoid this crap from Chrome and now you'll do it as well.
I look forward to the fork.
They already said that anything they would do would have an opt-out available.
Regardless, collection should not be on by default.
They say collection has to be enabled by default because otherwise users will not enable this type of data collection. That, in my opinion, should be the #1 indicator that users DO NOT want this collection happening in the first place. They enable it by default because they know most people won't opt out, either because they forget, aren't aware that it is happening, or for various other reasons.
I'm okay with them collecting any data they want to so long as it is opt-in (because I never will). Mozilla is slowly eroding their original, core values.
Pale Moon on Android works wonderfully.
Take a list of sites - for example the Alexa top 10,000 - and make an automatic script that browses these sites and collects whatever information you need. Have a bunch of devices - phones, laptops, PCs from different brands - doing this. It will not cost much, and you don't have to spy on your users.
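The skeleton of such a lab harness is simple. A hedged sketch (the `fetch` step is injected so the measurement loop itself is testable; a real setup would drive an actual browser, e.g. via WebDriver, instead of the stub used here):

```python
import time
from statistics import mean

def measure_sites(sites, fetch, repeats=3):
    """Visit each site `repeats` times and record page-load wall time.

    `fetch(url)` is injected: in a real lab it would drive a browser;
    here it can be any callable, which keeps the loop network-free."""
    results = {}
    for url in sites:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fetch(url)
            timings.append(time.perf_counter() - start)
        results[url] = {"mean_s": mean(timings), "worst_s": max(timings)}
    return results

# Stub fetch so the sketch runs without a network or browser.
report = measure_sites(["https://example.com"], fetch=lambda url: None)
print(report)
```

Run across a device farm, the same loop aggregates per-site perf data with zero user involvement.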
> Don't collect URLs, but only top-level+1 domains (e.g. images.google.com), and associate information with that.
Using "images.google.com" as an example is too convenient.
That would be great if you could also add whatever TLD+1 most people would rather keep private as another example right after "images.google.com".
>This is a proposed compromise that is being floated. Don't collect URLs, but only top-level+1 domains (e.g. images.google.com), and associate information with that.
Until sites start programmatically generating a unique subdomain for each [Firefox] user.
> Don't collect URLs, but only top-level+1 domains (e.g. images.google.com)
Do you consider images.google.com to be eTLD+1? The eTLD would be .com; so, eTLD+1 would be google.com; and hence, images.google.com would be eTLD+2?
eTLD: https://en.wikipedia.org/wiki/Public_Suffix_List
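A toy illustration of the distinction, using a tiny hard-coded suffix set (the real rules come from the Public Suffix List linked above, which real implementations must consult; this is only a sketch of the "suffix plus one label" idea):

```python
# Tiny, hard-coded stand-in for the Public Suffix List.
PUBLIC_SUFFIXES = {"com", "org", "co.uk"}

def etld_plus_one(hostname):
    """Return the registrable domain: the public suffix plus one label."""
    labels = hostname.lower().split(".")
    # Find the matching public suffix, then keep one more label to its left.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return hostname  # no known suffix; return as-is

print(etld_plus_one("images.google.com"))  # google.com
print(etld_plus_one("www.example.co.uk"))  # example.co.uk
```

So under this definition images.google.com collapses to google.com, which is exactly the correction made downthread.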
This is clearly an example of the infamous off-by-one error.
Yeah, you're right. Thanks for the correction. It's eTLD+1, I just erroneously used images.google.com as an example.
>>This is a difficult compromise to make,
Sorry, I do not accept this compromise. Mozilla seems to have lost its way of late. It's sad to see a company that was at the forefront of privacy and security abandon that in the name of market share and performance.
I would rather sacrifice performance for privacy, not the other way around.
From EME, to the adoption of Browser Extensions as the only customization option, and now this... Mozilla and FF are changing in ways that are harmful to the open, secure, and private web. Following the trends and policies of MS and Google is not the correct path.
I think the core disagreement here is not ideological per se, but on premises. I agree with the motivation of not collecting any data.
That said, I don't feel that we have a choice but to compromise. If we don't build a better browser, then the other browsers will win by default, which means you lose all those privacy and security motivations anyway.
This is not some gleeful romp down the yellow brick road of data collection. It's a hard-searched, difficult compromise to a question that there are no good answers to, and LOTS of disagreement about.
What are we attempting to "win"? Again, I go back to my statement about compromising principles in the name of market share.
I have used FF since version 1.0 for a few reasons, the top ones being that it is open source, that it has always been the most privacy- and security-focused browser, and that Mozilla was a strong advocate of open standards that were interoperable on ALL platforms without vendor lock-in.
FF is still open source... the rest of that seems to be in flux now.
> What are we attempting to "win"? Again, I go back to my statement about compromising principles in the name of market share
I don't see it as an either-or, but rather a balance to strike. A perfectly private browser with no marketshare doesn't help users. A completely compromised browser with 100% marketshare doesn't help users either.
It would not be no market share. It would be the market share you have today: people that respect the principles FF once stood for.
Mozilla is not happy with us current users, though; they would much rather trade us for Edge and Chrome users.
Mozilla has made it clear it does not value the users that desire privacy, customization, and power in the hands of the user. Mozilla has dreams of "beating Chrome", a pursuit I have no interest in and place no value on.
But the marketshare "today" is tanking. It's at 10% and falling. There's no reason it won't get so small that it can't support development any longer.
Then it might be more appropriate to say "has attempted to trade us for Edge and Chrome Users." I think it's inevitable that if they continue on the path they've been on since they started losing marketshare they will disappear, and sadly with so many years of audience-alienating development behind them that no one will pick it up.
The only hope is that one of the forks of earlier versions manages to get enough developers and an institution behind it that they can bring it back to popularity, but before that happens we might be calling the internet "Chromenet" and google won't allow you to visit their sites unless they have been signed with a valid Chrome developer key.
Nah, we'll just switch. If you are trying to become Chrome, we'll just use Chrome. (Firefox+data collection < Chrome not logged in).
Edit: I've been with you guys since the beginning, but the line is drawn here.
If you become everything people dislike about the other browser, nobody is going to care what happens to you.
Why is the choice between opt-in vs opt-out of automatic behavior?
If Mozilla wants perf data, collect it and then prompt the user "crash reporting" style.
I would totally opt-in to prompts. Give it a threshold and ask, "This page seems to frequently perform less well on your computer, would you like to send us a report?"
Random sampling, basically. The value of random sampling is hard to overstate - it gives you a real picture of what's going on. A non-random sampling gives you a picture, but you have no way of confirming that the picture is a reflection of reality.
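To make the point concrete, here's a toy simulation (all numbers are made up and are not based on any real Firefox data): a random sample recovers the population's true average, while a self-selected sample of "users who noticed a problem" is badly skewed.

```python
import random

random.seed(0)

# Toy population: 100k page-load times in ms. Most users are fast,
# a small minority are very slow (the invented numbers are arbitrary).
population = [random.gauss(200, 30) for _ in range(95_000)] + \
             [random.gauss(900, 100) for _ in range(5_000)]

def mean(xs):
    return sum(xs) / len(xs)

true_mean = mean(population)

# Random sample: every user is equally likely to be measured.
random_sample = random.sample(population, 2_000)

# Self-selected sample: suppose only users who noticed a problem
# report (here, load time above 500 ms) -- a non-random sample.
self_selected = [t for t in population if t > 500][:2_000]

print(f"true mean:     {true_mean:.0f} ms")
print(f"random sample: {mean(random_sample):.0f} ms")  # close to true mean
print(f"self-selected: {mean(self_selected):.0f} ms")  # far above it
```

The random sample lands near the true average; the self-selected one is several times too high, and nothing in the data itself tells you that it's skewed.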
Random sampling and privacy run into conflicts not just in the browser space, but everywhere else. For example, the Canadian government recently went through a period where it allowed census respondents to optionally answer some questions that were previously mandatory (using privacy arguments). The result was several years of poor census information. The current government reinstated the mandatory census questions.
The browser is just one arena where this everpresent conflict between knowledge and privacy plays out.
I've used Netscape then switched to Firefox when Netscape became way too bloated, then enjoyed years and years of Firefox getting better, supporting new JS and HTML5 features all WITHOUT telemetry and with the Crash Reporting window where I can see the data that is being submitted and submit it if I want to submit it.
What has changed so much in the last 5 years or so that now you have to get all this data? What is wrong with just building a standards-compliant browser that runs JS fast and has easy-to-understand settings (where I don't have to go to about:config to disable WebRTC/telemetry/Pocket etc.)?
> What has changed so much in the last 5 years or so that now you have to get all this data?
To be honest, a lot. Once again, this is my personal take on the matter, not Mozilla's view.
First off, browsers were a LOT simpler back then. The sophistication and complexity in a browser has grown significantly in the last decade or so.
Secondly, browsers have matured. Remember that this software category has only been around for 20 years or so. Compared to the code quality in browsers today, browsers of 10 years ago were crude and simple. As a software category matures, the low-hanging fruit dries up, so it's harder and harder to improve your product.
Lastly, competition. Firefox had the luxury of being released when its biggest competitor (Microsoft) wasn't putting real effort into its browser product. Google will not make that same mistake with Chrome.
Basically, we needed less information back then, because the problems were much more obvious, because the whole industry was still pretty young. Now browsers are much more mature, the ecosystem is much more complex and has a much wider user base, and the problems are becoming harder and harder to pin down.
I think it's good to annotate this with your other comment as to why Firefox has to join this competition, rather than just do its thing and disregard its marketshare:
> A perfectly private browser with no marketshare doesn't help users. A completely compromised browser with 100% marketshare doesn't help users either.
That makes sense for top sites and Flash.
But for things like perf and regression? Really?
You might miss out on issues if users don't submit, but each submission is an indication of a problem (because it's Firefox that decides a problem is bad enough). And you can still prioritize based on how common that problem is.
Under the "collect and prompt" scheme, you are still sampling randomly.
A random sample of users experiences perf issues, a random sample of users opts-in to the collection, you get a random-sample of data. (If you suggest they opt-in to continued collection, you might even get a continuous stream of samples from the same user.)
Yes, that data won't cover the people who don't have issues, but do you need to optimize for them? It also won't cover people who have issues but still don't opt-in, but do you think that is somehow correlated to the severity of the issue? Otherwise the data will be mostly unbiased. The variance will be higher than if you made it opt-out, but if you are doing sound statistics, you will have to handle that anyway.
The thing is, I think, that the users that opt-in to the collection aren't "a random sample", but rather "a sample of users biased towards certain profiles".
And how did they find that out?
In part, AFAIA (but I've only limited statistical training) this is a well-known feature of voluntary datasets. But IIRC they've also done user studies and compared the results.
Here's what's going to happen.
You people are going ahead with this idiotic plan - because that is what Mozilla does: asks for feedback and then proceeds to ignore it - and you will lose another 2% market share.
The reason is painfully obvious: You betrayed one of the core principles of Firefox, which is privacy. You pissed off a lot of people which will NEVER come back because you stabbed them in the face and spat in the wound.
You also gave Microsoft and Google a freebie. Now they have something else to throw back at you: your supposedly "more private" Firefox phones home with your users' browsing history (not strictly true, but people don't dig that deep into the minutiae).
How's that stopping them from winning by default? You basically just disqualified yourself...
Make this thing optional, otherwise you are dead meat. If you can't "win" without betraying your principles, it's time to either throw in the towel and give up, or just be upfront and admit that you are going to go all in, users be damned.
That last option would actually probably gain you a few users.
Edit: spelling...
I guess this is offtopic, but what do browser extensions have to do with openness, security, or privacy?
It is a factor in openness: the Browser Extension API, as being developed by FF, MS, and the W3C, is very limiting, far more limiting than the old XUL-based model.
It can be a factor in security, both positive and negative: XUL was very powerful and could be abused, but it was also used by some projects to enhance the security of FF or provide other security-related functionality that is now no longer possible unless FF allows it or builds it into the browser directly. The same goes for privacy.
It's limiting at the moment because it's not finished yet. And Firefox in particular has far outpaced any third-party or industry-wide standards in adding new APIs. They've been proactive and responsive in getting feedback from addon developers while designing the APIs. I would say security addons are one of the top priorities. For example here's a blog post by the NoScript developer: https://blog.mozilla.org/addons/2017/08/01/noscripts-migrati... "I feel that Mozilla has the most flexible and dynamically growing browser extensions platform"
I will openly admit that I am skeptical of any initiative that has the backing of the W3C, Microsoft, or Google, all of which have proven they are more than willing to sacrifice user privacy and security.
So since Web Extensions / Browser Extensions was started by all three of those entities, with FF adopting them, I am very cautious of them.
In the Google Analytics issue last month, the "legacy" uBlock could block the connection while the WebExtension version couldn't. It can't because it is not allowed to (openness); as a result, your privacy is compromised.
".. we can't build a better browser without good information on how it's behaving in the wild."
Who decides what is a "better" browser?
1. Is it the authors? Do they write the software for themselves and agree to share it for free with anyone who may want to use it?
2. Is it the users? Do the authors solicit feedback from users to determine what users want? If users demanded a browser with no default telemetry, would the authors comply?
3. Is it third parties who have an interest in the behavior of users? For example, domain name industry, ad-supported businesses, their employees or advertisers themselves. Are the authors on salary, compensated indirectly from advertising revenue? Or does it come from somewhere else?
4. Is it all of the above? If we follow the money where does it lead? Whose decision of what is "better" is the most important?
Mozilla is descended from a defunct 1990s company that aimed to license a web browser to corporations for a fee. It would have been very clear in that case who the browser was being written for. But today, it is not so clear who Mozilla is serving. It resembles some sort of "multi-stakeholder" project.
It would be nice to have a browser that fits description 1 or 2. I believe there are plenty of folks, including some developers, who would appreciate a browser with no default telemetry. By virtue of the total absence of data collection, they might consider it "better" than alternative browsers that "need telemetry" for whatever reason.
There are many very, very political people inside Mozilla. Some of them may even want to commit political violence. Political violence seems to be a problem that just grows and grows, so how can we be sure that it's not supported inside Mozilla? These would be a very small minority of Mozilla, of course, but the problem is that you don't know who they are. And it only takes a single extremist to betray your users, to get your users injured or even killed.
The same concern will of course apply to any other data harvesters, but that's for another thread.
Ok, I get your point. You need the extra debugging information.
Now, here's my concern. I DO NOT want compromises. I DO NOT want to balance anything. I DO NOT want this telemetry crud on my browser spewing out my browsing history to anyone, no matter how anonymous you people claim it will be.
I just want a decent web browser.
What are my options? "Mozilla's way or the highway"? Redirect evil.telemetry.things.mozilla.org to /dev/null? Go back to elinks?
Or will there be a "disable this piece of crap utterly and completely" button somewhere not hidden under an URL? Or even better, a compile flag?
Edit: spelling...
There'll be an opt-out, just as there always has been. That's not what's being discussed here. The question is whether to allow these stats to be collected opt-out vs opt-in.
Before you send any telemetry, get informed consent by presenting the user a dialog enumerating everything that you are sending, and I'd be fine with opt-out. Otherwise this is a dark pattern and you are getting on my shit list.
If you present the user a dialog, that is effectively opt-in, which is what already exists.
You wouldn't say anything else, so your statements don't change anything: Any company which wants to collect more data would justify it in the same way.
The main reason to collect data is monetization. People don't like to think they're being sold, so it's justified on other grounds. That's universal. Since the way data is monetized is to track and segment users, claims that it can be done in a privacy-respecting fashion are, therefore, specious.
There is one conclusion to be drawn here, and it isn't that Mozilla is going to respect my privacy.
Are you honestly suggesting that the only possible use for aggregate user statistics is for ads? Not for A/B testing or tracking performance regressions?
I'm saying that the behavior of someone who was collecting aggregate statistics for ads and the behavior of someone who was collecting aggregate statistics for another reason would be identical in this forum, so we must assume the worst.