It was my understanding that people were finding numerous examples of leaked data that included Authorization headers, which don't always leak passwords but which still leak session keys. Maybe if espadrine from this linked comment reads this, he could explain more?
https://news.ycombinator.com/item?id=13719455
FWIW, as an example, Grindr uses CloudFlare: do a Google search for "authorization: grindr3" and you will find a URL which (no longer cached, but you can still get snippets) contained an authenticated Grindr request, which would have been enough to gain temporary access to that person's account.
https://costumla.com/wild-west-costumes-for-men.html
"... U Edge Certificate Authority1 0 U San Francisco1 0 U California�1p ���U� N��U � � �f"�0! %���U@ @GET /v3/profiles/[REDACTED] HTTP/1.1 CF-RAY: 33282514b8d957d7 FL-Server: 15f76 Host: grindr.mobi X-Real-IP: [REDACTED] Accept-Encoding: gzip Client-Accept-Encoding: gzip X-Forwarded-Proto: https Connect-Via-Https : on Connect-Via-Port: 443 Connect-Via-IP: 104.16.85.62 Connect-Via-Host: grindr.mobi CF-Visitor: {"scheme":"https"} CF-Host-Origin-IP: [REDACTED] Zone-ID: 22252132 Owner-ID: 2607399 CF-Int-Brand-ID: 100 Zone-Name: grindr.mobi Connection: Keep-Alive X-SSL-Protocol: TLSv1.2 X-SSL-Cipher: ECDHE-RSA-AES128-GCM-SHA256 X-SSL-Server-Name: grindr.mobi X-SSL-Session-Reused: . SSL-Server-IP: 104.16.85.62 X-SSL-Connection-ID: 15d1b8de6025864d-DFW X-SPDY-Protocol: 3.1 authorization: Grindr3 ... accept: application/json user-agent: grindr3/3.0.13.16790;16790;Free;Android 6.0 CF-Use-OB: 0 Set-Expires-TTL: 14400 CF-Cache-Max-File-Size: 512m Set-SSL-Name: grindr.mobi CF-Cache-Level: byc CF-Unbuffered-Upload: 0 Set-SSL-Client-Cert: 0 Set-Limit-Conn-Cache-Host: 50000 CF-WAN-RG5: 0 CF-Brand-Name: cloudflare CF-Age-Header-Enabled: 0 CF-Respect-Strong-Etag: 0 Set-Proxy-Read-Timeout: 100 Set-Proxy-Send-Timeout: 30 CF-Connecting-IP: [REDACTED] Set-Proxy-Connect-Timeout: 90 Set-Cache-Bypass: 0 Set-SSL-Verify: 0 CF-Force-Miss-TS: 0 Set-Buffering: 0 CF-Pref-OB: 1 Set-Keepalive: 1 CF-Pref-Geoloc: 1 CF-Use-BYC: 0 CF-IPCountry: [REDACTED] CF-IPType ..."
edit: I have spent the last fifteen minutes pulling the search snippet (edit: and now an hour, but all on my phone, so this is harder than it should be, and I am also distracted by other stuff). The way you do this is by searching for the parts you can already see to reveal the text around them. (I do this a lot to recover content that has been purged from Google's cache, was never accessible to it, or has simply been updated since.)
In so doing, while I haven't been able to recover the session key (likely too long and unique for a snippet), I have pulled the X-Real-IP address of this Grindr user and a profile identifier they were checking out (both of which I redacted above, but you could now trivially get them yourself using that context).
CloudFlare: if you think there isn't private data that was leaked, OR EVEN PRIVATE DATA STILL ACCESSIBLE, you are a bunch of fucking idiots. 1) Clearing the cache isn't sufficient: for anything made of short, plain-text words we can still pull the snippet. 2) IP addresses and session keys count as "private data". 3) GET requests actually are often sensitive information :/.
I am the second responder in that thread.
I personally saw an OAuth bearer token for a Fitbit user. It was there clear as day.
Maybe I saw the only one that leaked, but that seems...unlikely.
OAuth2 does indeed send a secret bearer token in an "Authorization: Bearer" header, but OAuth1 does not.
The FitBit stuff I recall seeing the other day was OAuth1. It would look like oauth_token=XXX and oauth_consumer_key=YYY, along with oauth_signature=ZZZ. If that's what you're seeing, it's not a bearer token and that data isn't secret. Those values are essentially record locators.
OAuth1 has an undisclosed "consumer secret" (embedded in the app) and "token secret" (acquired at login) that are used to sign the request. So these requests don't leak any usable tokens. If you also see oauth_nonce and oauth_timestamp values, then the signature is also protected from a replay attack.
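For illustration, here is a minimal Python sketch of OAuth 1.0a HMAC-SHA1 signing along the lines of RFC 5849, assuming simplified inputs: a plain dict of parameters with no repeated keys, and a URL already normalized to scheme, host and path. The function names and all parameter values are hypothetical. It shows why a leaked oauth_signature is not enough on its own: the signing key is built from the consumer secret and the token secret, and neither travels in the request.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value: str) -> str:
    """Percent-encode per RFC 5849: only unreserved characters stay as-is."""
    return quote(value, safe="-._~")


def oauth1_signature(method: str, url: str, params: dict[str, str],
                     consumer_secret: str, token_secret: str) -> str:
    """Return an HMAC-SHA1 oauth_signature (simplified sketch).

    Assumes `url` is already the base string URI (lowercase scheme and host,
    no query string) and that `params` has no repeated keys.
    """
    # Encode every key and value, then sort by key and then by value.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    # Signature base string: METHOD&encoded-URL&encoded-parameters.
    base_string = "&".join([method.upper(), pct(url), pct(normalized)])
    # The key is built from the two secrets, which never appear in the request.
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Hypothetical values; everything here except the two secrets
# would be visible in a leaked request.
params = {
    "oauth_consumer_key": "YYY",
    "oauth_token": "XXX",
    "oauth_nonce": "a1b2c3",
    "oauth_timestamp": "1487900000",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0",
}
sig = oauth1_signature("GET", "https://api.example.com/1/user/profile.json",
                       params, "consumer-secret", "token-secret")
print(sig)
```

Because the nonce and timestamp are part of the signed parameters, a server that rejects repeated nonces or stale timestamps can refuse a replayed request even if an attacker saw the whole thing.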
The response body of the request that originally acquired the token/secret would leak these secrets.
I'm sure there are bearer tokens out there for other services, but if Fitbit is using OAuth1, you'd have to snag the response to that original token acquisition to call the API as that user.
Edit - I hope this doesn't come off as argumentative. I just want to point out the good design choices made by the guys that weren't actually compromised by this. OAuth2 is really easy to implement, but it kinda feels like a step backwards security-wise.
> The FitBit stuff I recall seeing other day was OAuth1.
Fitbit uses OAuth 2
https://dev.fitbit.com/docs/oauth2/
"OAuth2 is really easy to implement, but it kinda feels like a step backwards security-wise."
No, no it isn't. There are about three proper ways to implement it. Almost none of the sites I've come across that use OAuth2 have it properly implemented.
Token revocation and authentication is hard, folks.
I don't see anywhere in the post where they claim private data wasn't leaked. In fact, they explicitly state that authorization cookies were leaked. What they said they didn't find was passwords, CC info, health records, SSNs, and customer encryption keys.
They even have a paragraph talking specifically about cookies in GET requests:
> This is not to downplay the seriousness of the bug. For instance, depending on how a Cloudflare customer’s systems are implemented, cookie data, which would be present in GET requests, could be used to impersonate another user’s session. We’ve seen approximately 150 Cloudflare customers’ data in the more than 80,000 cached pages we’ve purged from search engine caches. When data for a customer is present, we’ve reached out to the customer proactively to share the data that we’ve discovered and help them work to mitigate any impact. Generally, if customer data was exposed, invalidating session cookies and rolling any internal authorization tokens is the best advice to mitigate the largest potential risk based on our investigation so far.
It's still weird to enumerate things not found, but no mention of examples like dating site messages that were found. They're obscuring that by focusing on internal headers, etc. Why does the table with "0 health records" not have an entry for "X private correspondence"?
My biggest issue is that they didn't release their data set. With something this major, it's standard to either have a third party investigate or publicly release your data so it can be validated.
For all we know they cherry picked the responses they tested from a single site that doesn't handle anything sensitive.
> For all we know they cherry picked the responses they tested from a single site that doesn't handle anything sensitive.
I don't think you understand how Cloudbleed works. It doesn't matter what site they picked; every single vulnerable site can leak the exact same info. It's literally impossible to cherry-pick that data.
I don't think you understand how the internet works. Some websites only serve static content and don't deal with any sensitive information. Without seeing Cloudflare's data set there is no way to verify that the responses they picked are a representative sample.
Ok you really don't know how Cloudbleed works. Go read up on it. Every single vulnerable site can and did leak the same information. The only way to "cherry-pick" it would be to literally throw away the responses that you saw and didn't like, or in other words, by lying.
I think he was trying to explain to you that, for this particular leak, what was leaked was the private memory of the CloudFlare servers. That memory doesn't contain data from just a single site. It doesn't matter which site triggered the data to be output; the data that was output can come from any CloudFlare customer, even one that had no pages with the condition that triggered the issue.
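To illustrate why the leaked bytes can belong to any customer, here is a toy Python model, not Cloudflare's code: responses for different customers sit next to each other in one shared buffer, and a hypothetical rewriter stops only when its cursor lands exactly on the end of its own response. Cloudflare's incident report described a similar pattern in its HTML parser, where an equality check on the end pointer let a cursor that jumped past the end keep reading. The names alloc and render, and the trailing-'<' quirk that triggers the jump, are invented for this example.

```python
heap = bytearray()  # toy stand-in for memory shared by many customers' requests


def alloc(data: bytes) -> tuple[int, int]:
    """Append data to the shared buffer and return its [start, end) bounds."""
    start = len(heap)
    heap.extend(data)
    return start, len(heap)


def render(start: int, end: int, buggy: bool) -> bytes:
    """Copy one response out of the shared buffer, like a toy HTML rewriter."""
    out = bytearray()
    p = start
    while p < len(heap):  # toy guard: never read past the whole buffer
        out.append(heap[p])
        # Invented quirk: an unterminated '<' as the final byte makes the
        # cursor advance by two, stepping over `end` instead of onto it.
        if heap[p] == ord("<") and p + 1 == end:
            p += 2
        else:
            p += 1
        # The bug is an equality test; the fix is to stop at or past the end.
        done = p == end if buggy else p >= end
        if done:
            break
    return bytes(out)


a_start, a_end = alloc(b"<p>hello</p><")  # customer A: malformed page
alloc(b"Cookie: session=SECRET")          # customer B: unrelated request
print(render(a_start, a_end, buggy=True))   # A's page plus most of B's cookie
print(render(a_start, a_end, buggy=False))  # b'<p>hello</p><'
```

Customer A's page triggers the overrun, but the leaked bytes belong to customer B, which is the point made above: which site triggered the bug says nothing about whose data comes out.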
There are 3 classes of information:
1. Information where improper disclosure is illegal. For example, health records. Or credit card details, which would presumably be a violation of PCI DSS.
2. Information that can be actively exploited, but can also be fixed so the previous disclosure is harmless. This means passwords, authentication tokens, etc.
3. Information that is merely private in nature.
Cloudflare is focusing on the first two items. The third one is hard to quantify; what one person may consider private, another person might not care about. And there's not much that can be done about this kind of disclosure (beyond scrubbing caches, which they're doing anyway). Also, it's difficult to automatically identify this type of content (whereas cookies, passwords, credit card numbers, etc. are pretty easy to detect), and Cloudflare probably doesn't want their employees spending their time reading through all of the cached data they can find looking for private info: partly because it's a lot of work and a huge waste of time, since it won't change anything, and partly because it's private info. Chances are nobody is going to see it normally, and having employees read your private messages doesn't help anyone and is itself a violation of your privacy.
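As a rough illustration of why cookies and card numbers are the easy part, here is a minimal Python sketch: a regex for header names that usually carry credentials, plus the Luhn checksum that payment card numbers satisfy. The header list, the sample text and the function names are assumptions chosen for the example, not Cloudflare's actual classifier.

```python
import re

# Header names that usually carry credentials (illustrative, not exhaustive).
SENSITIVE_HEADER = re.compile(r"\b(authorization|cookie|set-cookie)\s*:", re.IGNORECASE)
# 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def scan(text: str) -> dict[str, list[str]]:
    """Report credential-bearing header names and Luhn-valid card numbers."""
    headers = [m.group(1).lower() for m in SENSITIVE_HEADER.finditer(text)]
    cards = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            cards.append(digits)
    return {"headers": headers, "cards": cards}


sample = "Host: example.com authorization: Bearer abc Cookie: s=1 card 4111 1111 1111 1111"
print(scan(sample))
```

Checks like these are cheap to run over every cached page, which is exactly what makes them attractive for a post-incident report; nothing comparable exists for "this message is embarrassing".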
Plus a 4th class: public information.
I don't know how they could determine how many health records were leaked without looking at and classifying the data, so they've already gone ahead and done that. They presumably know how many steamy Grindr messages were in there.
There are also laws about messages. Email laws generally don't apply to SMTP only.
Health records probably have some sort of health record ID, and there's only so many formats that can take. Detecting strings that match certain ID patterns is easy.
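A minimal Python sketch of that kind of pattern matching. The SSN pattern is the familiar US ###-##-#### shape; the "MRN" medical-record-number format is invented for illustration, since real record IDs vary by provider, and count_id_hits is a hypothetical helper name.

```python
import re
from collections import Counter

ID_PATTERNS = {
    # US Social Security number shape: 3-2-4 digits separated by dashes.
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Hypothetical medical record number: "MRN" then 7 to 10 digits.
    "mrn": re.compile(r"\bMRN[:#-]?\s*\d{7,10}\b", re.IGNORECASE),
}


def count_id_hits(text: str) -> Counter:
    """Count how many strings in `text` match each ID pattern."""
    counts = Counter()
    for label, pattern in ID_PATTERNS.items():
        counts[label] += len(pattern.findall(text))
    return counts


print(count_id_hits("MRN: 12345678, ssn 123-45-6789"))
print(count_id_hits("name: J. Doe, note: hurts when I pee"))
```

The second fragment scores zero even though it is plainly medical: a pattern scan only counts text that happens to include a recognizable identifier.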
That's one hell of a busted test. So a page fragment could leak my name and "hurts when I pee", but if the universal standard ID was cut off, it's not a health record?
> 2. Information that can be actively exploited, but can also be fixed so the previous disclosure is harmless. This means passwords, authentication tokens, etc.
I wouldn't call the disclosure harmless. It's unknown if anyone made use of the leaked information before Cloudflare knew, so accounts should be treated as compromised unless it's shown otherwise.
Also, leaking user credentials for any system that handles payments or health info would breach PCI DSS/HIPAA too. That widens the set of systems whose exposure effectively breaks the law.
Another thing to keep in mind is that many (most?) token-based authentication systems don't invalidate tokens. So any tokens captured will be valid until they expire, and they can't be "changed" without invalidating every outstanding token (by changing the server key).
No I mean after it's fixed, the previously-disclosed information becomes harmless. Obviously anyone who exploited it before you reset your password/tokens may have caused you harm.
> Another thing to keep in mind is that many (most?) token-based authentication systems don't invalidate tokens.
In my experience, changing your password generally invalidates all outstanding tokens. And yes, this does mean invalidating all of them instead of just the leaked one, but that's not usually a big deal.
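To make the password-change approach concrete: one common pattern is to embed a per-user counter in each signed token and bump it when the password changes, so one user's tokens die without rotating the server key. The Python sketch below assumes HMAC-signed tokens and a server-side counter table; all names are hypothetical, and expiry, which real tokens would also carry, is left out to keep it short.

```python
import base64
import hashlib
import hmac
from typing import Optional

SERVER_KEY = b"hypothetical-server-key"  # rotating this would kill every token
token_generation: dict[str, int] = {}    # hypothetical per-user counter table


def _sign(payload: str) -> str:
    mac = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac).decode()


def issue_token(user: str) -> str:
    generation = token_generation.setdefault(user, 0)
    payload = f"{user}:{generation}"
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str) -> Optional[str]:
    payload, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return None  # forged or corrupted
    user, _, generation = payload.rpartition(":")
    if token_generation.get(user) != int(generation):
        return None  # issued before the user's last password change
    return user


def change_password(user: str) -> None:
    # Bumping the counter invalidates every token this user holds,
    # without touching SERVER_KEY or anyone else's tokens.
    token_generation[user] = token_generation.get(user, 0) + 1


leaked = issue_token("alice")
other = issue_token("bob")
change_password("alice")
print(verify_token(leaked))                # None: the leaked token is dead
print(verify_token(other))                 # bob: unaffected
print(verify_token(issue_token("alice")))  # alice: a fresh token works
```

The trade-off is one server-side lookup per request, which gives up part of the "stateless" appeal of signed tokens in exchange for being able to revoke them.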
> 3. Information that is merely private in nature. > Cloudflare is focusing on the first two items. The third one is hard to quantify; ...there's not much that can be done about this kind of disclosure (beyond scrubbing caches which they're doing anyway). Also, it's difficult to automatically identify this type of content...
I have some pretty fundamental moral issues with this position. Cloudflare is measuring the risk that you were affected by something they can fix, not the risk that you were affected by something you consider important. They are absolutely downplaying the importance of all of this private information: saying "this isn't to downplay" is like adding "we have no affiliation with" at the bottom of a massive trademark violation; it means nothing. That something is difficult to measure and might be impossible to fix does not somehow make it unimportant.
Let's translate this: we find that some company has been dumping a bunch of random chemicals in our water supply. They respond with an attempt to make us not be concerned, but they concentrate their analysis on a handful of toxins that can be "corrected" with an antidote or a chelation or some other form of direct mitigation in the water itself, while giving lip service to something which merely increases your cancer risk and carefully not mentioning the stuff that will just make you violently ill for a couple days, as it is difficult to know what will cause that, it is a dosage mediated effect, and they can't fix it.
Meanwhile, people keep reporting that they are measuring the water coming out of their tap and keep finding junk in it, and some of the stuff they are finding would make people sick if they drank it. Are you seriously telling me that you think this kind of statement is legitimate?
I am going to maintain: these Cloudflare headers are themselves private information. Even if you ignore Authorization headers, just knowing what URLs people are browsing is something that we would normally consider to be a serious problem... I mean, all BEAST leaked was the size of the files you were downloading, and people take that seriously: the fact that there are even worse potential problems shouldn't distract us from the base issue.
> 3. Information that is merely private in nature.
Whether information is merely private is a judgment call that we may be ill equipped to make since we lack important context.
Being outed as gay can range from merely inconvenient, for my coworker whose friends and family already know and whose colleagues have plenty of reason to assume so, to life-threatening, for a person living in a state where LGBT* people are actively prosecuted and live under the threat of the death penalty merely for living their lives. Anything in between is possible.
Cloudflare mentioned they only went through about 2000 leaked responses out of over a million. That, combined with random people on Hacker News STILL finding private data after the purge, suggests this leak is a lot worse than they're letting on.
Thankfully CloudFlare spent a week cleaning up the leak in search engines and caches before publicly announcing the issue, so a lot of the evidence is gone.