neiman 5 years ago

In 2020, the only way for netizens to get what they naturally deserve is by hacking.

  • bzb6 5 years ago

    How do you “naturally deserve” to access the contents of the Twitter website?

    • morsch 5 years ago

      I agree, that's just cruel. No one deserves being subjected to Twitter.

    • devmor 5 years ago

      Since it became a dissemination service for public officials. The moment it became illegal for the US President to block people on Twitter, it should have become illegal for Twitter to restrict the public's access to that information, for the same reason.

  • cooper12 5 years ago

    Setting your user agent would only be considered hacking by the same people who think the Internet is a series of pipes. The browsers themselves copy each other's user agents for interoperability, so it's far past the point that changing it to look like another agent would be considered devious.

    • nextaccountic 5 years ago

      Yeah, but from the POV of whoever runs the network, circumventing such blocks is "abuse"

  • 1vuio0pswjnm7 5 years ago

    The original web browser, NCSA Mosaic, encouraged users to change their User-Agent string, so-called "spoofing" or "masquerading".

    https://raw.githubusercontent.com/alandipert/ncsa-mosaic/mas...

    The User-Agent header is not mandatory and was never intended to be used by tech companies for denying access or fingerprinting. It was supposed to be used, at the user's discretion, to help with interoperability problems. RFC7231 specifically refers to user-agent masquerading by the user as a useful practice. It explicitly discourages using this header as a means of supposed user identification, e.g., fingerprinting.

    https://tools.ietf.org/html/rfc7231#section-5.5.3
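
    Any HTTP client lets the user pick whatever value they like, or omit the header entirely. A rough sketch in Python (the URL and the UA value here are only placeholders):

      import urllib.request

      # The User-Agent header is optional; the user decides what, if anything, to send.
      req = urllib.request.Request(
          "https://example.com/",
          headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"},
      )
      with urllib.request.urlopen(req) as resp:
          print(resp.status)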

antpls 5 years ago

Given that this trick is now spreading across several sites, it won't last long. Google could, for example, generate secret unique user agents for the biggest players. The biggest players would then only allow requests from that secret unique UA.

  • txdv 5 years ago

    so much for google embracing the open web

  • smarx007 5 years ago

    I think Google shares IP range blocks so you could implement a check like "if(isGooglebot(user_agent) && isGooglebotIp(ip_addr))" in your system.

    Edit: ah no, they stopped https://developers.google.com/search/docs/advanced/crawling/.... I don't think 2 DNS lookups are an acceptable cost for blocking a GET request, but it can be done out of band, i.e. the isGooglebotIp function can fire off a Redis query and, if nothing is found, put the IP into a DNS verification queue. A few requests later, the fake bot gets banned thanks to the new record in Redis.
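
    Roughly, the out-of-band version could look like this (a sketch only: the helper names mirror the pseudocode above rather than any real library, it assumes a local Redis instance, and it uses Google's documented reverse-then-forward DNS check):

      import socket
      import redis

      r = redis.Redis()

      def is_googlebot(user_agent):
          # Cheap first-pass check on the claimed identity.
          return "Googlebot" in user_agent

      def verify_googlebot_ip(ip_addr):
          # The two DNS lookups: reverse-resolve the IP, require a Google hostname,
          # then forward-resolve that hostname and make sure it matches the IP.
          try:
              host, _, _ = socket.gethostbyaddr(ip_addr)
              if not host.endswith((".googlebot.com", ".google.com")):
                  return False
              return socket.gethostbyname(host) == ip_addr
          except OSError:
              return False

      def is_googlebot_ip(ip_addr):
          # Answer from the cache; unknown IPs are queued for verification
          # instead of paying for two DNS lookups on the request path.
          cached = r.get("googlebot:" + ip_addr)
          if cached is not None:
              return cached == b"1"
          r.rpush("dns-verify-queue", ip_addr)
          return True  # benefit of the doubt for the first few requests

      def verifier_worker():
          # Runs out of band: drains the queue and records verdicts in Redis,
          # so a faker gets banned a few requests later.
          while True:
              _, ip_addr = r.blpop("dns-verify-queue")
              ip_addr = ip_addr.decode()
              verdict = b"1" if verify_googlebot_ip(ip_addr) else b"0"
              r.set("googlebot:" + ip_addr, verdict)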

    • 1vuio0pswjnm7 5 years ago

      No need to use a Googlebot UA string. Others will work. Such as

      Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0

  • smileysteve 5 years ago

    Things that would be inadvisable while most of these companies are actively facing antitrust suits.

  • kobalsky 5 years ago

    it already happens, try to create your own search engine and index Amazon. captchas everywhere, their robots.txt is just for show.

  • Too 5 years ago

    This trick is almost older than the internet, so if someone cared they would have blocked it already. Sending google.com as the Referrer is another variant of it. Before the stackoverflow days this was very useful for getting past the paywall on expertsexchange, for example.

    I was under the impression that serving other content to google would greatly punish your pagerank and even pull you off the search results completely.

1vuio0pswjnm7 5 years ago

When Twitter announced they were going to stop supporting browsers not on their approved list, I figured their attempts to block would involve something more than just checking the value of the user-agent header.

They should just announce that users must use a particular user-agent header value and provide a list of approved values. If no one else compiles a list of acceptable user-agent header values for Twitter, I might have to do it.

Every user should just use the same user-agent header value. That would negate any utility of the user-agent header.

sethaurus 5 years ago

It’s been received wisdom until now that Google penalizes websites which behave differently when scraped by the Googlebot. Is that no longer the case?

  • SXX 5 years ago

    Pinterest has proven, by spamming SERPs for years, that if you're big enough Google will turn a blind eye to it.

    • syshum 5 years ago

      That applies to more than just google

      If you are big enough, there are separate rules for you (or no rules)

  • smarx007 5 years ago

    They will serve the same content to users with JS enabled and to legit Googlebots, while blocking clients w/o JS and other bots. I don't think it violates Google's rules but ofc it is of questionable decency.

67868018 5 years ago

You just need the word "Bot" in your user agent. It's required for fetching Twitter cards for link previews too. This changed earlier this year.
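
For what it's worth, something like this seems to be enough (the UA value is made up, it just has to contain "Bot", and the URL is only an example):

  import urllib.request

  # Any User-Agent containing "Bot" reportedly gets the JS-free HTML version.
  req = urllib.request.Request(
      "https://twitter.com/jack",
      headers={"User-Agent": "MyLinkPreviewBot/1.0"},
  )
  with urllib.request.urlopen(req) as resp:
      print(resp.status, len(resp.read()))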