Incident Report: Railway Blocked by Google Cloud [resolved]

status.railway.com

546 points by aarondf 22 hours ago

Subsequent thread: Incident Report: May 19, 2026 – GCP Account Suspension - https://news.ycombinator.com/item?id=48204770

r721 14 hours ago

>We have resolved this incident and a post mortem is available here.

>https://blog.railway.com/p/incident-report-may-19-2026-gcp-a...

>May 20, 07:57 UTC

https://status.railway.com/incident/I23M92U0

sschueller 12 hours ago

It should be possible to sue Google for damages in such cases. This isn't a network outage or service failure which I would consider part of ToS.
- VladVladikoff 9 hours ago
  
  What if the reason for their stuff being shut down was a payment issue like an expired credit card or maxed credit account? Unless I missed it skim reading their post I don’t see any information anywhere about their communications with Google.
  
  bastawhiz 7 hours ago
  
  If you have an account manager and a contract, there's zero excuse for automated suspension. That's literally the whole point of having a dedicated person. From the report:
  > May 19, 22:22 UTC - P0 ticket filed with Google Cloud. Railway's GCP account manager engaged directly.
- Cthulhu_ 8 hours ago
  
  It's always possible to sue, but Google has good terms of service and lawyers - I'm 99% confident that a lawsuit would end up nowhere.
  
  bastawhiz 7 hours ago
  
  They have every right to sue, and if they did sue they almost certainly would win. This is clear breach of contract. The only argument Google could make is "they did something to violate our agreement" but they'd have to prove that, and then have a damn good explanation for why they were in the right to suspend the account without any outreach. Unless Railway did something egregious, Google clearly made an error.
  But that's not what will happen. Google will offer an apology (perhaps even a public one), a giant pile of account credit, and a pinky promise not to do it again. Railway will accept it and hem and haw internally about whether to decrease their reliance on GCP, and then when they calculate the cost of going in on other clouds more heavily (or their own metal), they'll just think harder about weird failure modes.
  
  danudey 4 hours ago
  
  One company I worked at is highly reliant on Google Cloud, but at one point we moved some services to Azure.
  Azure noticed, and immediately hit us with a discount offer in the hopes of getting more of our business.
  Google noticed, and immediately hit us with a discount offer in the hopes of keeping more of our business.
  This is just a reminder that your multi-cloud strategy doesn't have to be 'deploy everything across multiple clouds'; it can even just be 'make it obvious that you have leverage'.
  
  array_key_first 4 hours ago
  
  I'm sure their contract explicitly states that their account can be suspended or terminated immediately without prior notice upon violating some TOS. And, most TOS are incredibly wide and vague, it wouldn't necessarily be hard to find something they violated.
  This is sort of the problem with these new-age internet companies. The contracts are incredibly hostile. Most TOS you see amount to "you have no rights and we can fuck you up the ass"
  Google is a B2C company so I'm sure some of that culture transfers over to B2B relations, but I'm speculating. Maybe the contracts are more normal for B2B.
- redwood 8 hours ago
  
  I can assure you that Google will be giving them significant commercial incentives as an apology for this behind the scenes.
quentindanjou 12 hours ago

Railway say the incident is resolved but many are still down (returning 502): on our side, we had to manually trigger a redeploy to fix it but I believe it should have been triggered automatically by Railway and I can't understand how they can mark this as resolved while many are still down.
In total, down for >11 hours on our side.
gcr 10 hours ago

> Railway owns our vendor choices, and we ultimately own this one. Your customers don't care whether the failure was Google or Railway; they see your product. Your uptime is our responsibility, and we'll keep delivering on it.
This is an excellent closing statement.
dang 5 hours ago

Thanks! The post-mortem is currently on the frontpage here:
Incident Report: May 19, 2026 – GCP Account Suspension - https://news.ycombinator.com/item?id=48204770

dangoodmanUT 21 hours ago

It has been 0 days since GCP has taken down a startup (again).

You see this at least once a year. Never heard of this from AWS or Azure.

In all seriousness, this is why we don't use them. They have the most ergonomic cloud of the big three, then absolutely murder it by having this kind of reputation.

tjpnz 21 hours ago

AWS normally contacts you first.
- cherioo 20 hours ago
  
  They better do. What is google doing?
  
  Gigachad 20 hours ago
  
  It's all AI powered
- kevin_nisbet 20 hours ago
  
  Do they?
  The only anecdotal thing I've seen is we hired a vendor to do a pentest a few years ago, and they set up some stuff in an AWS account and that account got totally yeeted out of existence by AWS if memory serves.
  
  alchemism 20 hours ago
  
  I’m fairly certain you are supposed to contact any vendor before attempting to penetrate hosts with authorization, not the other way around.
  
  coredog64 20 hours ago
  
  Having done this for both Azure and AWS, there's a specific ticket that needs to be filed with each provider that documents the scope of your pen test, where you're coming from, and a time frame over which you're doing it (which ISTR was "not more than 24 hours")
  
  mixdup 20 hours ago
  
  Responding to an unknown security tester like that is a selling point, not a cautionary tale
  
  kevin_nisbet 19 hours ago
  
  Yup, I thought it was great. Although one concern I always had in the back of my mind was where the line is drawn. Such as if an adversary gains access to one of my org's accounts and does something similar, do we get 100% taken out?
  
  dannyw 20 hours ago
  
  You should not be conducting unauthorized penetration tests against third party infrastructure providers without permission. They have processes and systems and usually just want a heads-up about what you plan to test and the duration / timestamps.
  Cuz otherwise you look like a threat actor.
  That’s assuming your vendor was pentesting AWS systems. If you meant you hired a vendor to pentest your own systems on AWS, that’s of course a totally different matter.
  
  kevin_nisbet 20 hours ago
  
  >That’s assuming your vendor was pentesting AWS systems. If you meant you hired a vendor to pentest your own systems on AWS, that’s of course a totally different matter.
  Sorry for being unclear, the vendor was attacking our organization only, and any other company was expressly forbidden in the contract. As I recall it was a fake SSO sign-in page to collect credentials that they would try and social engineer our employees with.
  
  Shank 18 hours ago
  
  At a minimum you should contact AWS before you launch a phishing page as a test that targets AWS customers.
  
  Lukas_Skywalker 17 hours ago
  
  I understood it as a phishing page imitating their own system, targeting their own employees. Nothing related to AWS, except for being hosted there.
  
  raverbashing 12 hours ago
  
  If a vendor doesn't know the basics about pentesting open infra and can't be bothered to look up terms of use sounds like they know ssh-it about fsck
abrookewood 20 hours ago

Yep, agree 100%. Such a stupid move on their behalf.
jameson 20 hours ago

What was the reason GCP took down a startup previously?
- __s 20 hours ago
  
  hn.algolia.com gcp blocked
  https://news.ycombinator.com/item?id=46731498 https://news.ycombinator.com/item?id=33360416
  Then I recall https://news.ycombinator.com/item?id=45798827
  https://news.ycombinator.com/item?id=33737577
  
  jameson 16 hours ago
  
  Wow... Just wow...
somewhatgoated 20 hours ago

On the other hand i can’t remember when there was a serious outage on GCP, unlike AWS/Azure who seem to go down catastrophically a couple of times per year.
- corpoposter 20 hours ago
  
  IIRC the Paris datacenter flood took down a whole “region” and some data was permanently unrecoverable.
- JoRyGu 20 hours ago
  
  AWS goes down catastrophically but are back up in minutes/hours most of the time (as long as they aren't down because Iran blew up their data center). That's obviously REALLY bad for certain industries, but I suspect for the vast majority of their customers it's not a big deal. We've been able to isolate the damage almost every time just by having AZ failover in place and avoiding us-east-1 where we can.
  
  ajross 17 hours ago
  
  > AWS goes down catastrophically but are back up in minutes/hours most of the time
  The outage in the linked article appears to have been resolved in 4-5 hours.
  
  graemep 14 hours ago
  
  Failover is supposed to protect you every time, unless something really exceptional happens.
  While it's possible to isolate the effects, judging by how many things stop working when there is an AWS failure, a lot of people fail to do that. I think the shift of responsibility to AWS removes the incentive to put effort into resilience against AWS failure.
- pixl97 19 hours ago
  
  GCP never goes down because they banned all their customers.
  
  somewhatgoated 8 hours ago
  
  A funny meme but just untrue
- plandis 19 hours ago
  
  GCP has had outages. From a quick search it looks like they had a global outage less than a year ago:
  https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1S...
- devmor 19 hours ago
  
  There was a pretty bad one last summer - their IAM system got a bad update and it broke almost all GCP services for an hour or so, since every authenticated API call reaches out to IAM.
  It had lasting effects for us for a little over 3 hours.
- abofh 19 hours ago
  
  I've been in AWS for almost twenty years at this point. It's been a long time since I've seen a global outage of the data plane on anything. The control plane, especially the US-east-1 services? Yes - but if you're off of east-1, your outages are measured in missile strikes, not botched deployments.
  
  andreareina 19 hours ago
  
  Didn't the latest outage affect people not on us-east-1 because internal aws services depend on us-east-1?
  
  erikerikson 18 hours ago
  
  The impacts are usually partial. For example, scaling is impacted but everything already deployed contributes to work up to capacity. Or, you can't change configuration but the previous configuration works as configured. Often surprisingly not so impactful even if there can be limited work stoppage.
  
  hasyimibhar 18 hours ago
  
  The problem with the us-east-1 outage is that a lot of big companies are there, so even if you try your best not to depend on us-east-1, your third party providers are most likely there. In my previous company, we were completely down during us-east-1 outage because of other dependencies that are beyond our control.
  
  erikerikson 18 hours ago
  
  Entirely fair. I have thus far avoided that problem. Not always engineering's choice.
  
  HighGoldstein 14 hours ago
  
  Considering how many AWS and non-AWS services go down at least partially when us-east-1 fails, this reads somewhat like "Don't worry that the steering wheel and pedals aren't working, your engine is still running on cruise control".
  
  happymellon 18 hours ago
  
  Work for a major bank who isn't solely in US East 1.
  No it didn't impact us.
  
  shrikant 10 hours ago
  
  I can easily remember a few multi-hour AWS incidents from the last few years, since I've had to handle the resulting fires at my various employers at those times. Not sure how you missed these, or do they not count as "global outages" for some reason?
  December 2021: https://www.cloudcomputing-news.net/news/aws-outage-takes-do...
  June 2023: https://newsletter.pragmaticengineer.com/p/the-scoop-52
  October 2025: https://www.cnbc.com/2025/10/20/amazon-web-services-outage-t...
  Each of these were massive outages impacting very large services across the web.
- blobbers 19 hours ago
  
  Unfortunately, if everyone goes down people are understanding. If just _you_ go down, then it's oddly less forgivable.
- danesparza 19 hours ago
  
  You can read the parent post, right?
- Izikiel43 19 hours ago
  
  I still remember the one where they nuked all the storage of, I think, an Australian insurance company. Luckily the IT department had done a multi-cloud setup for backups.
  
  Barbing 17 hours ago
  
  Google Cloud accidentally deletes $125 billion Australian pension fund - May 2024
  https://www.business-standard.com/world-news/google-cloud-ac...
- adamtaylor_13 19 hours ago
  
  Perhaps you don't notice GCP outages because so few companies rely on them?
  
  fragmede 18 hours ago
  
  GCP has a lot of customers. But you wouldn't know the companies that do, unless you worked there and wanted to leak it, or it publicly comes out. Eg it's been publicly acknowledged that Apple uses GCP for iCloud, https://www.cnbc.com/amp/2018/02/26/apple-confirms-it-uses-g... , and Home Depot is another that's used as a case study, https://cloud.google.com/customers/the-home-depot but most customers don't want to make a big deal about being on GCP as it's none of our business who's hosting them.
  
  shye 18 hours ago
  
  Apple also uses AWS, and I won't be surprised if they also use Azure. Big companies are multicloud, and not because it's a good idea (it rarely is), but because they inherited multiple environments on different CSPs, and maintaining those where they are is often cheaper than migrating them to a different CSP.
  
  Alive-in-2025 18 hours ago
  
  I wonder if big companies can get a special contract with something like you can't delete my service automatically (unless it's an emergency)
  
  jedberg 16 hours ago
  
  If you're big enough you don't need a contract for that, that's just their default method of operation.
  
  GoblinSlayer 15 hours ago
  
  #48203226
  
  Imustaskforhelp 18 hours ago
  
  upvoted & favourited because you taught me a really interesting fact which I feel makes up for an amazing discussion (regarding icloud using GCP).
  also, I can't help but imagine if instead of render, it was Apple's account which could've been auto-banned (Render is almost a billion dollar company or series-B, I am not sure)
  I haven't read the articles and I admit that but can you please elaborate to me on why Apple uses GCP themselves for idrive, I would love to know the technical decisions behind it on a genuinely curious level.
  From my (let's face it) limited understanding of GCP, it isn't particularly good or price performant and one of the wonders is that Google sells it directly with Google Photos too and a competitive lineup on Android.
  So in some sense if Apple is using gcp's for icloud then aren't they just reselling google storage themselves and google can always beat them in pricing while also wanting to chew away at the percentage of iphones themselves too?
  I mean, I can still try to understand the google search pays apple 10 billion dollars (right?) deal but I don't quite understand why apple would pick GCP when the hosting market is one of the more competitive ones with lots of companies.
  I would love to get some explanations or theories as to why exactly that is the case.
  (Also given its HN, if anyone from apple is reading or knows the answer, I would love that too!)
  
  morpheuskafka 17 hours ago
  
  > So in some sense if Apple is using gcp's for icloud then aren't they just reselling google storage themselves and google can always beat them in pricing while also wanting to chew away at the percentage of iphones themselves too?
  Apple uses Samsung displays and Sony camera sensors, iirc, both of which are flagship Android phone makers. That doesn't really seem to be a concern in their procurement thinking. iCloud and Google Photos are not that direct competitors because which one is native depends on which phone you already bought. Google Photos definitely does have some market share on iOS due to having 3x the free storage and a handy compression mode (which used to be entirely unmetered at launch but now still uses storage, just less of it). But it will never be a full competitor because it is a separate app you have to install and it can't magically fetch cloud-only photos from the camera roll and photo picker UI like iCloud can.
  The pricing of Google One and Apple One/iCloud+ isn't really dictated by underlying storage costs. At the higher tiers like 2TB, many don't come close to using it all, while the laughable 5GB iCloud free tier clearly costs almost nothing in raw storage, even on NVMe SSD, if you compare it to S3/Backblaze or even raw disk pricing on the cloud.
  
  necovek 16 hours ago
  
  Let's also not ignore enterprise realities: in your example, Samsung Displays is likely giving a great price to Apple for displays based on long-term commitment of large quantities: it allows them to optimize production and possibly give a better price than maybe Samsung Mobile for smaller-runs of phones.
  Each division also cross-charges, so Samsung Mobile would be paying Samsung Displays for the screens, and possibly at a small, guaranteed and non-negotiable margin.
  Without a global strategy not to do so, divisions within an enterprise optimize for their own bottom line and have internal discussions on build-vs-buy even if they have an internal factory.
  
  barkingcat 17 hours ago
  
  Firstly, apple doesn’t compete on price. Even if icloud is priced more than google people would always buy apple just for the ecosystem integration. It’s not even a competition to be honest.
  Look up “buy or build” which is the industry term for this kind of evaluation: buy product and use it/resell it or build your own.
  Apple has gone for different strategies in various areas:
  Build own Apple silicon chips, do not buy off the shelf chips from intel or nvidia or amd.
  Buy and resell google storage but don’t want to build their own distributed data store for end users.
  It’s about what matters more for the company and the core products. Apple’s laptops, cell phones are considered core products. Icloud is a value add.
  This is also why apple is making their own cell phone broadband chips. For most companies, this is a “buy from Qualcomm” but apple needs to build their own for independence for their number 1 core product: the iphone.
  
  anurag 7 hours ago
  
  > also, I can't help but imagine if instead of render, it was Apple's account which could've been auto-banned (Render is almost a billion dollar company or series-B, I am not sure)
  I believe you mean Railway.
  Render (a $1.5B company) has been hosting customers on GCP since 2018, and has never been banned.
  
  Imustaskforhelp 4 hours ago
  
  Yes sorry I meant railway rather than render so sorry about that, it was more of an honest mistake!
  > Render (a $1.5B company) has been hosting customers on GCP since 2018, and has never been banned.
  speaking of which, I have a question for render but how does render prevent something like what has happened with railway (ie. the account getting banned), I would love to know more about what the team at render thinks of such and also I would love to get some thoughts on why Render is using GCP, I would love to know some architectural decisions behind it as I am curious about it!
  Once again, thanks for responding to me and waiting for your response and have a nice day anurag!
  
  VirusNewbie 17 hours ago
  
  Spotify, Ebay, Paypal, Apple, Walmart, Uber are huge users. Lots of other big named companies are big users that I don't think are public.
  Then there's Anthropic...huge user.
  
  buildbot 16 hours ago
  
  Snap, for like 15 years at this point: https://cloud.google.com/blog/products/gcp/snap-maintains-up...
  
  locknitpicker 16 hours ago
  
  > Perhaps you don't notice GCP outages because so few companies rely on them?
  GCP is the world's third largest cloud provider, and has around half of AWS' market share. Claiming no one uses it reads like Yogi Berra's "no one goes there anymore, it's too crowded".
  
  SahAssar 11 hours ago
  
  Isn't that including things like google workspace and similar? Both Azure and GCP have sometimes included things that most people think of as unrelated SaaS (office 365, gsuite/workspace) to make themselves look bigger in the cloud sector.
  
  locknitpicker 9 hours ago
  
  > Isn't that including things like google workspace and similar?
  AWS also includes Amazon WorkSpaces. Moreover, AWS includes all of Amazon's cloud infrastructure for things like Amazon music, Ring, Amazon Prime Video, etc.
  
  SahAssar 4 hours ago
  
  But as a percentage of revenue I'd assume those are a lot smaller than Office365 is for microsoft and Workspace is for google.
  Last I checked I don't think AWS included things like Amazon Prime Video either, AWS is primarily their business/platform offerings, not consumer things like Twitch/Prime/Music/etc.
  
  koito17 16 hours ago
  
  There is a mobile game I know of that had an outage as a result of a GCP service outage. That is the only time I've noticed GCP outages.
  With that said, I would not say few companies rely on GCP. Search for "GCP" in this month's HN hiring thread. There are 23 hits, more than Azure's 21. AWS has 90 hits, which I guess shows its sheer dominance in the startup space. But these figures more or less agree with my intuition of the major clouds being AWS/GCP/Azure.
  
  somewhatgoated 8 hours ago
  
  We rely on them so I would have definitely noticed. Even a couple of minutes and our customers would freak out…
- manyatoms 19 hours ago
  
  How is blackhole-ing a customer not considered an outage?
- nemothekid 19 hours ago
  
  >On the other hand i can’t remember when there was a serious outage on GCP
  They had a really bad global outage a year ago. At least with AWS outages are contained to a single region.
- onion2k 16 hours ago
  
  You can't have 100% uptime. It's unfeasible, especially for a startup. You should be telling your customers that downtime might happen, sometimes for reasons beyond your control, and that if it does then you'll do your best to recover and to compensate them for the inconvenience. You should cultivate a relationship with your early customers that makes them feel bad for you when there's an outage rather than angry about how it impacts them. Maybe even go as far as firing the customers who give you a hard time over it. That way if your cloud provider falls over it's really annoying but not a big deal.
  Your cloud provider blocking your business from running is far worse.
- mlhpdx 15 hours ago
  
  None of the AWS “outages” have impacted us. They have either been regional, in which case we stand down the region (we run multiple hot regions), or didn’t involve things we need to maintain operation.
  I can’t imagine AWS ever doing such a cascading delete. I mean, they have made deletion protection a difficult thing to ignore even for individual resources.
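  As a rough illustration of "standing down" a region when running multiple hot regions, here is a minimal sketch. It is not a description of any particular company's setup: the function names, the health map, and the CRC-based routing are all assumptions made for the example.

```python
import zlib
from typing import Dict, List, Set


def active_regions(health: Dict[str, bool], stood_down: Set[str]) -> List[str]:
    """Return regions that are healthy and have not been manually stood down.

    `health` is a hypothetical map of region name -> latest health-check result.
    """
    return sorted(r for r, ok in health.items() if ok and r not in stood_down)


def route(request_key: str, regions: List[str]) -> str:
    """Pick a region for a request using a stable hash of its key."""
    if not regions:
        raise RuntimeError("no active regions left to serve traffic")
    # crc32 is stable across processes, unlike Python's built-in hash().
    return regions[zlib.crc32(request_key.encode("utf-8")) % len(regions)]


# Example: eu-west-1 fails its health check and is dropped from rotation.
health = {"us-west-2": True, "eu-west-1": False, "ap-southeast-2": True}
serving = active_regions(health, stood_down=set())
chosen = route("customer-42", serving)
```

  The point of the sketch is only that traffic keeps flowing as long as at least one region stays in the active list. It does nothing for the case in this thread, where the whole provider account is suspended.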
rozap 20 hours ago

Yep, we also don't touch them for this same reason.
overfeed 20 hours ago

> Never heard of this from AWS or Azure.
AWS does it more efficiently; it takes down many startups at a time when us-east-1 goes down.
- stingraycharles 19 hours ago
  
  That’s an entirely different type of problem, and avoidable by just using us-east-2 (I still don’t understand why people default to us-east-1 unless they require some highly specific services).
  
  MattGaiser 19 hours ago
  
  Sympathy. Railway is going to have numerous people blaming them for this outage. When us-east-1 fails, it is headline news, so you are not to blame.
  
  aloha2436 19 hours ago
  
  Is it that easily avoidable? A lot of AWS's control plane seems to have dependencies on us-east-1, or at least that's what it's looked like as a non-us-east-1 user during recent outages.
  
  happymellon 17 hours ago
  
  I don't know how much it's improved, but a bunch of URLs they use unnecessarily have region specific details in them.
  I remember a Workspaces outage about 5 or 6 years ago, and the problem for us was that the redirect link in the console had US East 1 in it.
  The workspaces themselves weren't in US East 1 and nothing relied on US East 1.
  Emailing users who needed it an alternative link with a different region in the URL for the login redirect fixed it for us.
- mgfist 19 hours ago
  
  And we all celebrate it since we can't do any work
- yandie 18 hours ago
  
  During the 5 years of my startup, we had only 1 outage due to AWS because we picked us-west-2 as the primary region. If anyone starting a company picks us-east-1 as the primary region, they should be fired. There's absolutely no reason to be in that region.
  
  tempest_ 18 hours ago
  
  Why do people want to be in that region? Is it the default or something?
  I know some workloads help to be colocated but all these places are connected by fiber and every cloud has a worldwide CDN it seems.
  
  necovek 16 hours ago
  
  At some point it used to be significantly cheaper than any other AWS region in the world. Not sure if that's still true.
  
  locknitpicker 16 hours ago
  
  > Why do people want to be in that region? Is it the default or something?
  It's one of the oldest and largest regions. It hosts the most services, both low-level platform stuff and higher level managed services (which run on the low-level platform stuff), so services tend to be more performant.
  Geographic location is also good.
  Also, due to scale their pricing ends up being cheaper.
  Let's say that it's the region people use by default, unless they have a compelling reason to have a presence in any other particular region.
- xavdid 18 hours ago
  
  If my cloud provider brings my startup down, it's my problem. If they bring all the startups down, that's their problem.
busterarm 19 hours ago

Hetzner and OVH also do this all the time.
It's AWS and Azure that are the outliers and tend not to care too much what their customers do with their infrastructure. AWS is perfectly fine with allowing me to run copies of 15 year old vulnerable AMIs copied from AMIs they've long since deprecated and removed. Even for removed features like NAT AMIs.
Spooky23 18 hours ago

https://en.wikipedia.org/wiki/Timeline_of_Amazon_Web_Service...
Azure nerfed the front door of all Azure and O365 services last year.
All of these companies are great at what they do, and occasionally fuck up.
OsrsNeedsf2P 16 hours ago

AWS has throttled our service so badly that we couldn't operate. I was thinking of writing a blog post about how they stalled our growth for a month but it seems moot

tardwrangler 17 hours ago

Everyone is eager to point a finger at Google, but I've been a user of Railway for a while now, and I've seen enough nonsense to want to hear what GCP has to say about this before drawing any conclusions. Let's just say Railway has had problems like this before, and the way their team handles them does not inspire any confidence.

Regardless of how it happened, for me, this is the straw that broke the camel's back.

prathamtharwani 17 hours ago

Could you point us to any specific past instances? I'd be interested to read about them.
- jeffreyq 17 hours ago
  
  https://blog.railway.com/p/incident-report-february-11-2026
  
  x0x0 14 hours ago
  
  "we did not have the monitoring or controls to prevent our anti-fraud from hard killing 3% of workloads, including many instances of pg"
  Oof.
  
  rzmmm 11 hours ago
  
  Needs an anti-anti-fraud service which terminates malfunctioning anti-fraud services.
  
  x0x0 6 hours ago
  
  When I've written similar services, there was a (low) hard cap on how many fraud decisions they could action before they quit and paged. If we were getting hit with a wave of something, a human had to temporarily bump that limit.
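  A minimal sketch of the kind of hard cap described above, assuming a simple in-process counter. The class name, the `page` callback, and `bump_limit` are hypothetical names for illustration, not anything from Railway's or the commenter's actual systems.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class ActionCapExceeded(Exception):
    """Raised when the service has used up its action budget."""


@dataclass
class CappedFraudActioner:
    max_actions: int                    # low hard cap, raised only by a human
    page: Callable[[str], None]         # hypothetical pager hook
    actions_taken: int = 0
    halted: bool = False
    actioned: List[str] = field(default_factory=list)

    def action(self, workload_id: str) -> None:
        """Action one fraud decision, or halt and page once the cap is hit."""
        if self.halted:
            raise ActionCapExceeded("halted; waiting for a human to bump the limit")
        if self.actions_taken >= self.max_actions:
            self.halted = True
            self.page(f"fraud action cap of {self.max_actions} reached")
            raise ActionCapExceeded("cap reached; paged on-call")
        self.actions_taken += 1
        self.actioned.append(workload_id)  # stand-in for the real enforcement step

    def bump_limit(self, new_max: int) -> None:
        """Manual override: a human raises the cap and resumes the service."""
        if new_max <= self.max_actions:
            raise ValueError("new limit must be higher than the current one")
        self.max_actions = new_max
        self.halted = False
```

  The design choice is that the automated path can only stop itself. Resuming always goes through `bump_limit`, so a runaway classifier can take out at most `max_actions` workloads before a person looks at it.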
locknitpicker 16 hours ago

> Let's just say Railway has had problems like this before, and the way their team handles them does not inspire any confidence.
This. It's very odd that in other threads we see a bunch of accounts heavily invested in criticizing a cloud provider, but what's conspicuously absent from this wave of indignation is any curiosity in the root cause, or even any interest in exploring what it might have been. Quite odd.
- gizzlon 16 hours ago
  
  Agreed, I'm very curious as to how this could happen.
  But TheRegister did reach out to Google and they have not replied yet: https://www.theregister.com/off-prem/2026/05/20/google-cloud...
  
  locknitpicker 11 hours ago
  
  > But TheRegister did reach out to Google and they have not replied yet
  That is exactly what GCP should do: not comment on a customer's issues. Even when it's due to abuse from a customer, which might even be the case.
  
  acdha 9 hours ago
  
  Railway are alleging that this affected many GCP accounts, which would make at least a confirmation or denial of scope appropriate.
  
  locknitpicker 7 hours ago
  
  > Railway are alleging that this affected many GCP accounts, (...)
  From what I've read in other comments, the root cause seems to have been automatic account suspension as an anti-abuse measure.
  It's also telling that Railway describes the root cause simply as "Google Cloud Platform has suspended Railway's production account." It then mentions this
  > At 22:20 UTC on May 19, Google Cloud placed Railway’s production account into a suspended status incorrectly, as part of an automated action. This action extended to many accounts within Google Cloud. As this was a platform-wide action, there was no proactive outreach to individual customers prior to the restriction.
  The why is conspicuously absent, but this sort of sweep is indeed consistent with an anti-abuse measure.
  If this is the case I would be cautious in accusing a cloud provider of wrongdoing. Many things need to go awfully wrong to trigger this sort of alarm, and I'm not talking about GCP's anti-abuse system. In fact, it's telling that no reputable, well established business is reporting any impact. The whole point of any anti-abuse system is to suspend accounts that are caught engaging in some sort of abuse.
puppymaster 16 hours ago

another ditto from me, albeit anecdotal again. Railway dev teams play fast and loose with sprinkles of vibe coding everywhere on top. There's 'oops yea bear with us we are still a startup' and then there's railway.
- swyx 15 hours ago
  
  i mean even google and aws are not without sin on this one. maybe wait for an RCA before punching someone who is currently down. theres a reason classy people do "hugops" when a competitor goes down, regardless of reputation.
  
  1dom 15 hours ago
  
  Personally, I don't see this as people punching someone who's down. This is the sort of real life experience and necessary context from actual technical users that I come to HN comments for.
  Someone is just asking to get Google's side and explaining why they want that, which seems reasonable since we're in a post where Google is being punched/blamed for this, and it sounds like it isn't Railway's first questionable outage.
rs_rs_rs_rs_rs 15 hours ago

>I've seen enough nonsense to want to hear what GCP has to say about this before drawing any conclusions
Sure but not even a warning before shutting down their account?
- egorfine 14 hours ago
  
  First time?
  It's google, come on.
- DannyBee 8 hours ago
  
  According to the timeline the account was suspended for 18 minutes total. That is fast enough that it could have simply been a bug in a Google rollout or something that made something think it was suspended when it wasn’t really.
  If it was actually suspended then yeah it’s weird not to get an email.
maipen 14 hours ago

Two years ago I needed their support and they were so toxic that I just moved to vercel and told them to f off. But I wanted something similar for other services and then I found coolify. There’s absolutely no reason to use railway when you can use coolify.
- pbronez 10 hours ago
  
  Yeah I’m planning a similar transition for my personal infrastructure. Railway is super easy to get started with, dashboard and logs features are nice, but I’ve just lost confidence in it.
theobr 13 hours ago

I got details I shouldn't have. I can confidently say this one's 100% on Google, and I will be disappointed if Railway is unable to share more. There is literally nothing they could have done to prevent this aside from avoiding GCP entirely.
- blensor 10 hours ago
  
  I saw your youtube video, and while I am generally a fan of the small guy against big corps, this was a bit much with all the "I am so afraid to say something but I have to" talk.
  So I will hold my judgement until this has been dissected a bit more.
  
  pratio 10 hours ago
  
  For those who come after me, curious about the video, here it is.
  https://x.com/theo/status/2056946993407369300
  https://xcancel.com/theo/status/2056946993407369300
  Couldn't find it on yt.
  Either way, I agree with blensor here, there's no new info on the railway incident itself but mostly about google's direction towards antigravity.
  About the author of the video mentioning that he's scared, unfortunately, that has always been the case with Journalism/columnists etc, speaking ill of the platform which you use to sell your wares tends to backfire. Wish him all the luck
  
  blensor 9 hours ago
  
  Sorry for forgetting to post that, and thank you for adding it.

valgaze 19 hours ago

May 2024 UniSuper incident: https://cloud.google.com/blog/products/infrastructure/detail...

https://www.unisuper.com.au/about-us/media-centre/2024/a-joi...

A joint statement from UniSuper CEO Peter Chun and Google Cloud CEO Thomas Kurian

8 May 2024

UniSuper and Google Cloud understand the disruption to services experienced by members has been extremely frustrating and disappointing. We extend our sincere apologies to all members.

While supporting UniSuper to bring its systems back online, Google Cloud has been conducting a root cause analysis.

Thomas Kurian has confirmed that the disruption arose from an unprecedented sequence of events, where an inadvertent misconfiguration during provisioning of UniSuper’s Private Cloud services ultimately resulted in the deletion of UniSuper’s Private Cloud subscription.

This is described as an isolated, “one-of-a-kind occurrence” that has never before occurred with any Google Cloud client globally. This should not have happened. Google Cloud has identified the sequence of events and taken measures to ensure it does not happen again.

Why did the outage last so long?

UniSuper had duplication across two geographies as protection against outages and data loss. However, the deletion of the Private Cloud subscription triggered deletion across both geographies.

Restoring the Private Cloud required significant coordination and effort between UniSuper and Google Cloud, including recovery of hundreds of virtual machines, databases, and applications.

kvakvs 18 hours ago

The instant cascading worldwide deletion upon closing or deleting a subscription sounds like a recipe for disaster. Why not mark it for deletion and delete say... a day or a week later?
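A minimal sketch of that mark-then-purge idea, assuming a hypothetical in-memory `Subscription` record and a 7-day grace period. The names and the grace period are made up for illustration and say nothing about how Google Cloud actually implements deletion:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(days=7)  # assumed value, purely illustrative


@dataclass
class Subscription:
    name: str
    active: bool = True
    delete_requested_at: Optional[datetime] = None
    purged: bool = False


def request_delete(sub: Subscription, now: datetime) -> None:
    """Disable the subscription immediately but keep its data."""
    sub.active = False
    sub.delete_requested_at = now


def restore(sub: Subscription) -> bool:
    """Undo a pending delete; fails once the data has been purged."""
    if sub.purged or sub.delete_requested_at is None:
        return False
    sub.active = True
    sub.delete_requested_at = None
    return True


def purge_if_due(sub: Subscription, now: datetime) -> bool:
    """Irreversibly delete only after the grace period has elapsed."""
    if sub.purged or sub.delete_requested_at is None:
        return False
    if now - sub.delete_requested_at < GRACE_PERIOD:
        return False
    sub.purged = True
    return True
```

The service still goes dark the moment `request_delete` runs, so the outage is not avoided; what changes is that `restore` keeps working until `purge_if_due` finally fires.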
- modernpacifist 18 hours ago
  
  Either mark-for-delete has the same impact as deleting in terms of shooting all the Cloud resources associated with the subscription, at which point the outage still happens but maybe the recovery is smoother or you've just delayed the inevitable by a week because no one will look at it unless there is actual impact.
  
  jeremyjh 17 hours ago
  
  You just turn it all off. So yes, the disruption is the same but restoral is much smoother. Much easier said than done - that has to be baked into every service and there would certainly be a cost from it that would have to be passed along to everyone.
- shye 18 hours ago
  
  From personal experience, as a customer who once did something stupid: Google Cloud does soft deletes. But you need to reach out to support fast enough. And really, if you deleted something important and discovered it only the next day, and not within minutes, you're having a bigger issue that a soft delete won't solve.
  
  chillfox 14 hours ago
  
  What kind of shitty soft delete can’t be undone a few weeks after?
  Weekends and public holidays are a thing, plus it’s quite common for companies to shut down for 2 weeks over Christmas.
  There’s a lot of opportunity for mistakes or malicious actions to happen at times that won’t be discovered for a while.
- manapause 18 hours ago
  
  It’s a good question. That said, unless there are compliance or fallback concerns I would prefer a service that burns my data on departure.
  
  raverbashing 17 hours ago
  
  No, that's the naive view
  Because in case of a compromise/unauthorized access that's exactly what you don't want to happen
  
  locknitpicker 16 hours ago
  
  > No, that's the naive view
  No, not really. That's pretty basic stuff. You would do well in reading up on the shared responsibility model. Customers are responsible for setting up their own infrastructure, and platform/service providers are only responsible for the services they manage. Even then, stuff like persisted data is still recoverable by design.
  But you are absolutely responsible for the service you put together. This has been a basic principle for around two decades. Infrastructure as code tools have been pervasive and ubiquitous for over a decade.
  
  raverbashing 16 hours ago
  
  Oh "reading about it"?
  Try experiencing it in person
  Again this is the naive view
  Again if someone compromises your accounts and everything is deleted instantly you'll be the one looking like a fool
- locknitpicker 16 hours ago
  
  > The instant cascading worldwide deletion upon closing or deleting a subscription sounds like a recipe for disaster.
  I don't agree. What do you expect to happen when you explicitly delete your user account? Do you expect your systems to remain in operation for a week? That itself would be a major risk and liability, as your whole infrastructure would still be up even though you cut your access to it.
  Also, isn't your whole infrastructure expected to be automatically deployed with IaC? The notable exception is data, which is already soft deleted and recoverable through customer support.
  All in all, where do you expect the customer's responsibility to end and the cloud provider's to start? The shared responsibility model is covered by any intro course in no uncertain terms.
dantiberian 18 hours ago

I wrote about the UniSuper issue at the time: https://danielcompton.net/google-cloud-unisuper. It was a pretty nasty bug where their VMware environment was created with a one-year expiry date, but was one "resource" from the perspective of Google Cloud.
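A hypothetical sketch of that class of bug and one guardrail against it. The function names, the `term_days` parameter, and the 365-day default are all invented for illustration; this is not Google's actual provisioning tooling:

```python
from datetime import date, timedelta
from typing import Optional

DEFAULT_TERM_DAYS = 365  # assumed default, for illustration only


def expiry_unsafe(created: date, term_days: Optional[int]) -> Optional[date]:
    """Footgun: a missing term silently becomes a fixed one-year expiry."""
    days = term_days if term_days is not None else DEFAULT_TERM_DAYS
    return created + timedelta(days=days)


def expiry_guarded(
    created: date, term_days: Optional[int], never_expires: bool = False
) -> Optional[date]:
    """Guardrail: the caller must state a term or explicitly opt out of expiry."""
    if never_expires:
        return None
    if term_days is None:
        raise ValueError("term_days is required unless never_expires=True")
    return created + timedelta(days=term_days)
```

With the guarded version, a blank term fails loudly at creation time instead of turning into a deletion date that nobody asked for.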
- suttontom 16 hours ago
  
  "UniSuper’s production Google Cloud VMware Engine (GCVE) private cloud was automatically deleted one year after it’s creation due to a misconfiguration in how it was created. When it was created, there was a bug in the creation script which passed a null value."
  That's pretty amazing. Not due to a cascading failure from someone changing a config deep inside of a system that caused a bunch of unintended effects, just someone who messed up writing a shell script?
  
  IshKebab 16 hours ago
  
  This is why you never use shell scripts for non-interactive tasks.
  
  GoblinSlayer 15 hours ago
  
  Probably javascript. Shell scripts don't have null values.
- raverbashing 16 hours ago
  
  Creating stuff with 1yr (implicit) expiry by default is just a delayed footgun tbh
  
  onion2k 16 hours ago
  
  That's one footgun, but then pushing that into production and actually deleting things rather than queuing them to be deleted later after a sanity check until the system is stable, and not informing users that the 1 year policy existed, (probably) not documenting that the expiry exists, not testing 'what happens if we pass in null?', etc are a whole series of mistakes.
  This was less "Oh look, a rare edge case that was easy to miss!" and more "We don't bother putting guardrails into critical systems. Oops!"
karlkloss 16 hours ago

"deletion of the Private Cloud subscription triggered deletion across both geographies"
It's called single point of failure, and it's the nightmare of everyone who was ever in charge of safety.

binarycleric 20 hours ago

How the heck do these things happen, especially with companies with huge monthly spend? At my last job we had some suspicious workloads running on AWS and our TAM reached out to us before taking any action. Who wants to bet this was some AI automation gone wrong and because GCP seems to be allergic to actually contacting a human to get a response, this just sits in some support queue that outsourced workers look at after a few hours just to give a canned response?

garciasn 20 hours ago

Nothing surprises me with anything related to support on GCP. While we absolutely do not need them, I have been through no less than 12 different Account Executives over the last 6y and they're all ENTIRELY and COMPLETELY useless.
They all introduce themselves, beg me to set up a meeting w/them and some sort of engineering resource(s), and they come to a meeting with a canned slide deck that is so absurdly unrelated to us that I just laugh, and then the next time I hear from them it's because we have a new AE.
This is my most recent reply (right after Next '26):
> I really appreciate you reaching out; however, we have met with, I dunno at this point, more than a dozen GCP Account reps, execs, technical teams, etc over the years and there's little to no value for us or you, now or in the future. Please do feel free to invest your time on your other clients. We're good; truly.
I love GCP and its services; we have been very pleased with it over the years, but the human side of it? Fucking sucks and I just don't see why they even bother.
- OptionOfT 20 hours ago
  
  It's because they're measured on something, unsure which metric, but it's definitely not how helpful they are to you.
  
  YuriNiyazov 20 hours ago
  
  Don't know about GCP, but our AE on AWS was also continuously rotating, and as best I can tell, their job was to figure out what we are planning to build, and to ensure that we should always use <INSERT AWS SERVICE DU JOUR> for that, rather than a competitor product or build it ourselves.
  
  Rodeoclash 20 hours ago
  
  Exactly the same experience for us as well. I just don't bother with them.
  
  garciasn 19 hours ago
  
  Before I just cut them off entirely, I used to tell them my primary concern was cost savings and that I wanted them to recommend ways I could cut 25% off my bill every month and watch the glorified salespeople fumble over trying to avoid that conversation.
  It’s ok though, Claude helped us cut >45% of our monthly costs. I’m surprised they haven’t been beating down my door after we made that level-shift. Probably in AE transition. ¯\_(ツ)_/¯
  
  realityking 15 hours ago
  
  My experience with a large-ish ($5m/year) AWS account was quite different. They were happy to support us with cost optimizations, discounts, and one time credits for certain activities (co-innovation and achieving certain milestones in their partner program).
  Their primary concern seemed to have been to keep as much of our workload inside AWS as possible and to win workload from 3rd party services we used (e.g. CDNs). The actual revenue appeared secondary.
  
  darkwater 14 hours ago
  
  This is my experience as well with AWS accounts in the 2-4M/year range, the biggest upsell they always try to make is Enterprise Support, but for the rest they are usually happy to help you cut cost in the short term - as long as you stay with them in the long term.
  
  captn3m0 13 hours ago
  
  The incentives are nicer here, from what I’ve heard: AWS TAMs are not reviewed on revenue at all. And cost savings for customers actually count as a win for them.
- idontwantthis 20 hours ago
  
  It doesn’t worry you enough that someday you could have a serious problem and they wouldn’t be able to help you?
  
  garciasn 20 hours ago
  
  On the list of things that worry me the most about our company's stuff, an issue I cannot solve w/o help from a human at GCP is around #900000042.
- dylanpyle 19 hours ago
  
  For what it's worth - I'm not sure what the criteria is (I assume we're "medium sized / not a big upsell opportunity"?) - our GCP rep quickly pushed us to switching to using a GCP reseller. They took over our billing so that we can pay via ACH, and provide both free first-line support/escalation and paid engagements for bigger projects; they don't charge a premium on top, apparently Google pays them for supporting us. Hasn't made much of a difference in how we operate, but at least we have a direct-ish line for issues when they come up.
- shye 18 hours ago
  
  That's exactly why I'm less pleased with GCP: to trust a CSP (or any service), I need to be assured that when (not if) things go wrong, I could escalate to a team that would have my back.
- throwaway041207 18 hours ago
  
  This is actually kind of validating. I work for a company that spends almost 1mm a year on GCP. We've never had an actual support contract with them because the numbers work out to, at a minimum, being 10% of our spend. We've yet to encounter a situation where we actually needed GCP support, so we've held off. In the moments where we'd like to get some support (mostly around datastore behavior) we've managed to work around it or figure it out ourselves. So it's good to know we haven't missed out on much. Beyond the offensive aspect of GCP offering no support if we aren't willing to cough up a non-trivial percentage of our spend, I'm pretty happy with it.
guluarte 20 hours ago

It's Google. They let you use their services, but the moment you don't fit the norm, they suspend you.
- rajeshvar 19 hours ago
  
  What does blocked mean? Is there a different post that I am missing? There is shared infrastructure in GCP for networking (ex-googler here) and if only railway is affected, then it is not clear if it is only GCP or if there is something from Railway's perspective that needs to be addressed.
  
  Kwpolska 15 hours ago
  
  https://station.railway.com/community/what-we-know-so-far-ma...
  > Around 22:20 UTC, our Google Cloud account was placed into a "restricted" status hence removing all of our cloud overflow VMs, our CloudSQL instance, and our API.
  
  tardedmeme 12 hours ago
  
  It means they ban you. Cancel your account. No takesie backsies. Say bye bye to your revenue. This can happen at any time for any reason or no reason and it's amazing nobody's learned from all the other times it happened.
ndneighbor 19 hours ago

huh- I guess there are two HN submissions with meaningful replies...
I said this in the other thread, we got access to our account back, but even with an Account Rep. and a CSM on our account- it still took them a while to figure out what was going on.
I'm sure it could have been worse if we didn't have a rep on our account.

BitWiseVibe 20 hours ago

As someone who runs some public APIs, the amount of spam from Railway IPs is insane. They have horrible abuse prevention. Hopefully this encourages them to improve their operations.

nikcub 19 hours ago

This is the conflict at the center of running a hosting company - make it easy to sign up and you get a lot of new users but also a lot of abuse.
Implement anti-abuse measures and you will hit some loud false positives (this may be the case with GCP here).
I don't envy anybody running a hosting co - the internet is a really ugly place under the surface.
edit: to add - AWS are really good here. Must be the ~30 years of retail fraud and abuse experience.
- edelbitter 17 hours ago
  
  I continue to receive phishing via AWS pretending to be Amazon. And not even the Unicode-lookalike shenanigans that my spam filter refuses for excessive mixed scripts, no; literally claiming to be Amazon as in: the company that operates the relay.
- swyx 15 hours ago
  
  i wonder if DID or World (various ways of Proof of Human) can help solve this issue.
  
  nikcub 11 hours ago
  
  This just incentivizes market for bio-mules, which already exists with world[0] - where prices stay low because it was rolled out to low-income countries.
  Then there's the platform game theory. If you adopt you add friction which reduces signups, and there will always be a competitor who would risk the 10x fraud increase in order to capture 100x the market. Railway has seen hyper-growth because it's so easy to run from, and is recommended by, coding agents[1].
  The solutions are here already just not well implemented or understood - probabilistic fraud detection, resource limits, service and automation limits, standard gov identity verification as a signal, enterprise sales channels with human relationships, etc.
  There are tradeoffs with each platform choice that just aren't well understood. Most users shop on price and DX and don't see the abuse infra or problem until it hits them.
  Google and GCP have a problem where they completely cook users who get flagged in their automated fraud net (this isn't news - or shouldn't be)
  [0] https://www.coindesk.com/policy/2023/05/24/black-market-for-...
  [1] and the problems that come with providing that simple interface, like sometimes dropping prod
- bootsmann 15 hours ago
  
  Is it really a false positive if railway lets people run abusive services on GCP and then GCP consequently shuts them down?
  
  iloveplants 8 hours ago
  
  the services are running on railways own servers, not gcp
- duckmysick 15 hours ago
  
  Hetzner is famously aggressive with their KYC (Know Your Customer) requirements, often locking new sign-ups and asking for photos of ID.
  Damned if you do, damned if you don't.

fjni 21 hours ago

Wait… railway runs on GCP? Didn’t they make a whole thing about not “building a cloud on top of another cloud?”

Or did they just mean that they’re not renting VPSs but only metal from the cloud provider?

In my mind I was so excited that there was another provider not just paying one of the hyperscalers but at a minimum colocating and owning more of their stack. https://blog.railway.com/p/heroku-walked-railway-run

eoswald 21 hours ago

Yep, and this is why I'm pissed. They lied. They're completely dependent on GCP. So, I gotta do some research, I need something a little more stable (and less dependent on one company's whims) than this. This is bad for them, because it really strikes at the heart of their 'big claim,' peaceful software deployments. This is chaos.
- ndneighbor 21 hours ago
  
  Yea, I mean, that's the whole MO of our platform and we failed at that. So yea, that's disappointing and more so for our customers.
  I can provide an explanation about the GCP dependency. Yes, we have host workloads off GCP, and we have been able to build a good business by performing a cloud exit. However, we were worried that we would have a circular dependency on our own cloud. I don't think we expected to get auto-modded out of our own account, hence we left our DB on CloudSQL.
  It was never our intent to deceive people that we didn't own our own destiny with our business. The last GCP issue, we were assured that this scenario wouldn't happen (when we got auto-ratelimited, which was bad, but survivable) - but it seems like we have further work to do. Apologies.
  
  fontain 21 hours ago
  
  I’m very sympathetic and understand that decisions are easy to criticize in hindsight but leaving your database in GCP while moving everything else to your own data centres seems so backwards I can’t even begin to imagine how that could happen. Was this really an intentional design decision?
  
  ndneighbor 21 hours ago
  
  > decisions are easy to criticize in hindsight
  I mean, the pain we have caused our customers ultimately proves you correct. That said, we made our decisions with the information and constraints that we knew in that moment in time. Railway has hosts in AWS/GCP/and co-los, so coordinating those workloads in a fully distributed manner would be ideal but end of the day, we didn't foresee that we would just have our project get deleted just like that.
  (Even if we did get assurances from them in 2024, that it wouldn't happen again, although we just got auto-rate limited the last time.)
  
  r_lee 20 hours ago
  
  could you clarify, did an automated process by Google delete a GCP project/account/resource(s)? like, what exactly were you seeing when trying to get access or see what happened?
  
  ndneighbor 20 hours ago
  
  They deleted our GCP proj. sans warning. Still working the details, but that's how this whole thing began.
  
  csw-001 20 hours ago
  
  Thanks for getting things back up (genuinely mean that, btw). Upon logging back in I was prompted to promise I'm not deploying naughty things (I'm not). Was this in response to GCP detecting illegal (prohibited) behavior from something deployed via railway?
  
  ndneighbor 20 hours ago
  
  Actually, when I made the TOS check, I put that in Redis. That + the feature flags got reset.
  
  arjie 21 hours ago
  
  I have exactly the same architecture. You can easily administer a postgres/mysql on your own infrastructure, but it's also the one thing where backups and availability are super strict. I can easily support multi-region in Google Cloud or AWS and that's way harder to do on-prem, and it's also hard to handle the replication story as safely as with Google Cloud. The hope is that GCP et al. give you safety and availability for the control plane stuff and you can run your data plane on-prem.
  At $2m/mo spend, this kind of thing is insane. GCP has never been the most reliable of clouds but this is pretty awful. I would never have expected this.
  
  ahofmann 17 hours ago
  
  I have kind of the same architecture. I host multiple dedicated servers and vps instances in the Hetzner "cloud", but all of these connect to a few hosted databases by Hetzner's web hosting packages for like 20 bucks a month. It sounds insane, but the one thing that absolutely needs to stay online, is the database, so not hosting this myself makes sense. And since Hetzner has apparently tuned their dirt cheap databases pretty well, we can hammer them pretty hard without any problems.
  
  yen223 20 hours ago
  
  this is easily explained by "database migrations are incredibly difficult and very risky"
  
  purduemike 19 hours ago
  
  Why CloudSQL? why not AlloyDB for stability?
miniman1337 21 hours ago

from the blog linked via Wayback Machine. "From Day 1, we had this notion at the forefront.
The other notion that we have intuited is that you can’t build a cloud on another cloud. We have devoted years of practice running our own metal (and playing well with other clouds) to make sure that Railway’s business, which invariably becomes your customer’s business, is as rock solid as possible."
- MrDarcy 20 hours ago
  
  That’s strange, when I interviewed with the founder a few years ago he told me they were on AWS wanting to move to firecracker.
- dlcarrier 18 hours ago
  
  I'm not familiar with Railway, so this might not make any sense, but it's possible they were using their own hardware but managing it with Google accounts. It's not uncommon for a company's offsite human-to-human communications to fail when there's a Google outage or ban, so it's not unexpected to have the same interference with human-to-machine or machine-to-machine communications.

chatmasta 19 hours ago

I thought Railway was building their own data centers? [0]

> The fact of the matter is, you simply cannot build a cloud on someone else’s cloud.

Indeed…

[0] https://blog.railway.com/p/launch-week-02-welcome

QuinnyPig 18 hours ago

Vercel seems to be pulling it off. So does PlanetScale, albeit for databases only. But everything’s a database.

ksajadi 16 hours ago

When you sign up for Railway, they have an uncommon way of making sure you have read and understood their T&C regarding abuse of their systems, including crypto mining, etc.

My guess is that many are abusing their free tier, causing them trouble with their service providers.

I take no joy in seeing Railway take a hit like this, even as a competitor, but free compute attracts all sorts of strange users. We've been there and decided early on to avoid free compute even if it costs us our top of the funnel.

eoswald 21 hours ago

Sorry, I have a hard time blaming Google for this, when Railway seems to be having increasing trouble keeping the platform stable. Something like this should NOT take down an ENTIRE service. There should be a backup when literally your business is about being the reliable backend. This just seems like poor planning to me.

cactusplant7374 21 hours ago

Disaster recovery is pretty expensive, right? Especially for their size.
ryanisnan 21 hours ago

I don't quite know what you mean. Do you really expect Railway to use a multi-cloud architecture to host all of their clients' projects? I suspect that would lead to a lower availability, all things considered.
- impulser_ 21 hours ago
  
  They literally own their own data centers. That's what's surprising about this. They are lying to their customers when they say they operate their own data center because obviously they don't if everyone's apps are down with GCP blocking their account.
  
  ryanisnan 21 hours ago
  
  Oh, I see what you mean. Eh, it's possibly the same reason that AWS essentially goes down when us-east-1 goes down.
  
  brookst 21 hours ago
  
  Is it not possible that they own their own data center and have an unfortunate Google dependency?
  Obviously a fiasco but I’m not prepared to call them liars when it could be an honest mistake.
  
  Terr_ 21 hours ago
  
  I imagine there's also an important difference between:
  1. We depend on X but could gracefully migrate to an alternate in a week if we really needed to.
  2. All data is mirrored instantly so that we can do seamless fail-over in case X has its own outage.
  
  impulser_ 20 hours ago
  
  Then don't say you're not a "Cloud on top of a cloud" provider.
  They even made fun of cloud providers being down when AWS was down.
- eoswald 21 hours ago
  
  Well, by the same token, is it smart to base your ENTIRE architecture on a single cloud architecture? Isn't that why some of us build in fallbacks for AWS-hosted services? I mean, their entire platform, both public and private facing, is running on the same thing. One error, one problem, takes out the entire service.
  
  irjustin 21 hours ago
  
  Taking this at face value, this doesn't happen to AWS clients - at least I don't read about it here.
  AWS may have data centers[0] go[1] down[2], but that's within expected bounds of standard ops.
  [0] https://hooks.slack.com/services/TJ7HQS7FC/B0B5S7UTBJ4/PUHIC...
  [1] https://www.aljazeera.com/news/2025/10/21/what-caused-amazon...
  [2] https://netflixtechblog.com/lessons-netflix-learned-from-the...

UrbanNorminal 20 hours ago

Is google allergic to humans or something? Can't they just send an email or call the company before taking a wrecking ball to the entire company's infra? Are they stupid?

BarryMilo 20 hours ago

Surely this is automated. They wouldn't waste precious dollars on employing humans just to keep other humans happy.
- snypher 17 hours ago
  
  It surprises me there's not a manual review for $$$$ accounts. Speculation at this stage, but it's weird they would be put in the Recycle Bin like that.
lateral_cloud 18 hours ago

Keep the pitchforks at bay for now. No one knows what actually happened yet and we are only seeing one side of this outage.

faangguyindia 22 hours ago

Google cloud also locked out a Korean Government Organization recently. The guy posted on GCP subreddit.

Google really need to improve their support team. It's strange such a big corp can't even afford to have proper support team.

King-Aaron 22 hours ago

> It's strange such a big corp can't even afford to have proper support team
This seems to be by design.
- ndneighbor 21 hours ago
  
  We have a CSM, Head of Customer Support contact, and further contacts with GCP. Despite that, we still had this issue.
danpalmer 21 hours ago

> It's strange such a big corp can't even afford to have proper support team
Railway say they are in touch with that support team.
- shooker435 21 hours ago
  
  god help them
  
  danpalmer 16 hours ago
  
  I had good experiences with their support, and bad experiences with AWS support. tldr: YMMV.
add-sub-mul-div 21 hours ago

Automating support, automating everything is the key to their whole deal. Tech giants leapfrogged the rest of the economy by innovating a company that can scale its customers without having to scale itself proportionally.
benwoodward 20 hours ago

pretty sure their support team is a flaky ML model that is haplessly flagging random accounts
choilive 20 hours ago

Not strange, Google has never had a proper support team unless you are an "Enterprise" level customer.
aranelsurion 11 hours ago

> support team
They must’ve upgraded them to Gemini 3.5 by now.

bearjaws 20 hours ago

I will never leverage GCP in an enterprise setting, it's honestly amazing how hard they fumble the bag. Will be interesting to see when GCP support started working with them, from the updates there was an hour and change from when they identified the issue and GCP support was confirmed.

In the cloud space it seems like AWS does nothing and wins.

brokenodo 20 hours ago

Well, as a 2 week tenured and very happy Railway customer until now, I am now a Render customer. Somehow DNS cut over within 1 min(!) and live after about 30 minutes of work. Not bad!

DrewADesign 20 hours ago

In my experience, DNS changes are a lot faster than they used to be. There’s some website that has a map that tries to resolve your domain with a bunch of name servers around the world that was pretty neat to look at last time I migrated something.
- nbarbettini 18 hours ago
  
  I became so conditioned to waiting hours(!) for DNS propagation that I'm always pleasantly surprised when it takes <5 min these days.
  
  DrewADesign 8 hours ago
  
  Yeah way back in the day I’d be used to waiting overnight
twostorytower 17 hours ago

I love pointing my name servers to Cloudflare so any DNS changes from that are practically instant.
- swyx 15 hours ago
  
  as with many things, we say we like decentralization but quietly vote for centralization

Avicebron 21 hours ago

Isn't Railway the "the API key to delete the backups is in the prod database, because that's where the backups live duh" guys?

trvz 16 hours ago

No, this is the company that failed those guys.
You should also read the story, as you're perpetuating a false version of it: https://x.com/lifeof_jer/status/2048103471019434248

codegeek 21 hours ago

This is bad. Even their own website is down at railway.com. Looks like total dependency on google cloud. Surprising for a company of their scale with all this VC money.

choilive 21 hours ago

They run a decent amount of their own compute/bare metal server for customer workloads. But likely still had some critical dependencies on GCP.
cube00 18 hours ago

> Surprising for a company of their scale with all this VC money.
Not sure too many VCs would be cool with deep redundancy when there's more features to build to bring in more customers instead.
rmeara 16 hours ago

Google has a total dependency on its own infra and does fine. Why do its customers need multicloud? Huge PITA unless you need an absurd number of 9s

whh 21 hours ago

This could kill a startup. I really don't like Google's automated and silent account murder functionality.

MrDarcy 20 hours ago

There’s no way this was automated or silent.
The only reasonable explanation is Railway lost control of their estate and something was happening that warranted a group of humans to decide flipping the kill switch was the best of a set of bad alternatives.
- macintux 20 hours ago
  
  You’re giving Google far more credit than they’ve earned.
  
  whh 12 hours ago
  
  It's almost certainly one of those Android Store related checks or YouTube account checks. It's why it's best to disable login for the services you don't want your staff messing with on Google Workspace.
- faangguyindia 19 hours ago
  
  you can go on google cloud subreddit and watch horror stories
  i actually built a good plan out of those horror stories for my companies.

throwaranay4933 23 hours ago

This screenshot from Discord suggests the outage was caused by an automated GCP account ban: https://x.com/acgfbr/status/2056866780866351323

Alive-in-2025 19 hours ago

Automated account bans are the bane of internet existence today. I was banned from reddit for "bad behavior", I appealed and both times it's oops, there was nothing there, some automated system thought your comment was rude even though it wasn't.
Then they send you very strongly worded messages that say trying to work around the ban will lead to something bad happening.
I've been worried my main email account provider would do this. The core issue is that even if you pay, even if you are a company as shown here, providers don't put careful enough limits on banning. I can only imagine they ban lots of scammy things every day so "they think it's working great".

enahs-sf 21 hours ago

I respect what railway is doing but also would never run my business on such a platform.

dpark 21 hours ago

That kind of sounds like you don’t respect what they are doing.
- enahs-sf 19 hours ago
  
  I think it’s good people are making IaaS platforms, but have dealt with enough firefighter hero bullshit to have seen this coming a mile away. Uptime and redundancy are strongly correlated.
eoswald 21 hours ago

Today changed my opinion on them completely. Was willing to give them the benefit of the doubt that they're growing fast, but now seeing that they've failed to scale properly, and are missing little things that become big things later. I can't take that risk.

usernametaken29 19 hours ago

I didn’t know Railway, so with this misleading headline I thought a Google Cloud data centre was being built in the way of a railroad. That’d have been a funny story to read.

astafrig 19 hours ago

How is the title misleading?
- tauntz 14 hours ago
  
  "Railway Blocked by Google Cloud"
  If you don't happen to know that "Railway" is referring to a company, then you might reasonably read that as "a GCP outage caused issues in the train network somewhere".
Polizeiposaune 17 hours ago

An elevated railroad once ran through one end of what is now a Google-owned building (Chelsea Market in Manhattan). It's now part of the High Line elevated pedestrian park.

TheTaytay 21 hours ago

I’ve seen a few smug “all your eggs in one basket” comments here.

I’m aware of some companies hosting their own metal and infra, but I’m not aware of large companies mitigating risk by hosting on separate cloud providers as a fallback mechanism. We might disagree with cloud provider choice, or think they should have been hosting their own metal, but that’s still an “all your eggs in one basket” choice, right?

Heck, they might even have multi-region fallback with GCP, but if GCP bans your account, that doesn’t matter.

Are there good examples of running a company of railway’s size so redundantly that their host could nuke one of their accounts and they’d just keep on trucking?

fontain 21 hours ago

They do run their own metal. That’s their entire ethos. Railway is their own cloud.
chradams 21 hours ago

Just google multi-cloud. Yes. It's a thing.
- wmf 20 hours ago
  
  99% of multi-cloud is fake though. True multi-cloud is incredibly rare.
  
  TheTaytay 18 hours ago
  
  I appreciate it. That's my belief as well. Very easy to write a post like, "Just use multiple clouds!" or to claim to have done it with a small project. But it's hard for me to imagine the benefits outweighing the extremely massive complexity costs at a certain scale.

zx8080 14 hours ago

For those who opened this link to read news about the real railway (with trains), it's not about it. Thank you for wasting my time!

padolsey 20 hours ago

Does anyone know how this even happens inside the walls of google? Is it an automated process? How is such a (presumably) high revenue account just magically blocked without human intervention? I'm quite perplexed.

jpollock 20 hours ago

There would have been efforts to contact them, but it would have been via their contact method, aka the email they set it up with.
Common ways this happens? They are using a credit card to run their business with no backup payment method. Then the company's contact person is on vacation.
Sign up for terms. It will get you payment terms!
- scratchyone 20 hours ago
  
  Honestly still insane to nuke a high-volume client's business after a single payment issue. There would be no reason for Google to believe that a single hiccup like that is evidence that they won't get paid and have to cut account access immediately.
  
  antran22 17 hours ago
  
  Railway might not be even in the realm of high-volume clients for Google. For all we know they might be efficient in utilizing Google infrastructure.
  But most likely, it's just automations in place without an appropriate human override coupled with gross negligence.
  
  4lx87 6 hours ago
  
  It is insane, but my past experience with GCP is they suspended all service only days after a failed payment, after years of paying on time. It's a major factor in why I don't use them anymore. I'm not waking up to angry customers again because the CC is expired and I missed an email.
  I'd be curious to know why Railway's account was suspended. Was it a similar payment issue or something else?
  
  jpollock 1 hour ago
  
  It's not a single payment failure; it would be multiple days, possibly even a week to 10 days.
  This is why businesses should put in the effort and sign up for credit terms. Then it's an invoice, and you reduce this risk substantially.
  Credit cards are _not_ reliable at this scale. Banks are offline all the time, cards are marked stolen, protocols change, all sorts of things that will cause flags indicating "the money can't move down that path".
  Businesses that pay for AWS/Cloud/etc via credit cards are trying to buy reliability but put it behind a single point of failure.
  Credit cards are not how you should be paying for business services with uptime requirements!
- mbreese 20 hours ago
  
  Yeah, I'm not sure what to think here. We know Google is not the best at customer service and has automated account suspensions. But, what I'm curious about here is why this happened.
  Railway hosts applications for customers. An uneducated guess for some possible reasons: 1) one of those customers hosted something they shouldn't have 2) railway had something spawn that took up too many resources 3) Or their account balance was too high 4) Or something...
  But all of this probably culminates in someone needing to read an email that was missed.
  Scaling a customer infrastructure setup like Railway is hard. This is one of the non-technical hard parts - how to make sure your account with your primary vendor is safe. But, I'm willing to wait to pass judgement here until more information is available. I'm sure the post-mortem will have lessons. I'd like to know more.
- thayne 19 hours ago
  
  > via their contact method, aka the email they set it up with
  If it's anything like AWS, that may be just one of hundreds of emails they send every day, most of which are just noise.
jasonkester 17 hours ago

Yeah, compared to the AWS experience:
I had a toy Free Tier account that managed to overstep a limit one month and rack up $0.0038 in charges.
AWS hounded me about it for an entire year before finally putting the account on hold. Then kept at it for months more before finally deleting it.
It’s like the paperboy from Better off Dead, if he were to continue delivering newspapers while hounding you for his two dollars.

cube00 15 hours ago

Railway "What we know so far: May 19th 2026": https://station.railway.com/community/what-we-know-so-far-ma...

mjy78 18 hours ago

All in on cloud so we don’t need to worry about backups. Now your subscription is the single point of failure.

jefborges 20 hours ago

Railway is back, but I’m not sure if I can trust keeping my projects there, so I’m going to migrate to another company.

oofbey 19 hours ago

After reading about how their delete database API also deletes all the backups, I concluded they are not to be trusted.
- CodesInChaos 15 hours ago
  
  Don't all major clouds do that by default? But at least they have additional protections you can configure, if you know about them.
marknutter 19 hours ago

It's not back.

hnburnsy 19 hours ago

From their founder on X...

"Absolutely. The Railway network is a mesh ring between AWS, GCP, and Metal

So: - High availability interconnects - High availability path routing between clouds - Database itself is high availability

However, Google's VPC itself is not. So we will add a shard to Metal and AWS"

hnburnsy 19 hours ago

More here...
https://x.com/JustJake

thrownthatway 18 hours ago

Huh.

Railway dot com

Has nothing to do with railways.

I wish software people would get their own words.

patrickmay 9 hours ago

I was also expecting a story about a physical railway being shut down.

sammy2255 19 hours ago

The 3-2-1 backup rule is pretty outdated in the world of cloud. You could have 3 complete copies of your data in different S3 buckets, but if they're all under the same account you've lost your blast radius protection

rsync 19 hours ago

If only there were a quick and easy way to replicate s3 buckets to an independent provider…
… on the Unix command line …
… to a cloud older than AWS…
… if only …
- eclipticplane 19 hours ago
  
  I don't think that technology exists. Sorry.
- oefrha 19 hours ago
  
  Well, having backups helps, but I certainly can’t migrate my infra to rsync.net on a moment’s notice (or ever, since rsync.net does storage and nothing else) so that my customers aren’t affected.
- funtech 19 hours ago
  
  Wish I could upvote this comment account more. Too many people look for something new and shiny when trusty ol tools are sitting right there. :)
- lemagedurage 18 hours ago
  
  Inflated egress costs might make this prohibitively expensive, $80 per TB at GCP and AWS
whalesalad 18 hours ago

You replicate data to different clouds.
zootboy 16 hours ago

It's not outdated, you just actually need to follow it. 3 copies of data in separate S3 buckets is ignoring the "2" in the 3-2-1 rule: 2 different mediums, and also the "1" rule: 1 copy offsite. In the cloud era, offsite means not on the same cloud provider. Different mediums ideally means a non-cloud provider (e.g. a NAS at your office under your control).
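The reading of 3-2-1 described above can be sketched as a small check. This is only an illustration: the `BackupCopy` fields, the provider labels, and the idea of treating "offsite" as "billed by a different provider than production" are assumptions made for the example, not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    provider: str  # who can suspend the account holding this copy (hypothetical labels)
    medium: str    # e.g. "object-storage", "nas-disk", "tape"


def check_3_2_1(copies, primary_provider):
    """Return a list of rule violations; an empty list means the plan passes."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 different mediums")
    # "Offsite" is interpreted here as: not held by the production provider.
    if not any(c.provider != primary_provider for c in copies):
        problems.append("no copy outside the primary provider")
    return problems


# Three buckets under one provider: three copies, but one medium and one blast radius.
same_provider = [
    BackupCopy("bucket-a", "aws", "object-storage"),
    BackupCopy("bucket-b", "aws", "object-storage"),
    BackupCopy("bucket-c", "aws", "object-storage"),
]

# One copy on a second cloud and one on a NAS under your own control.
mixed = [
    BackupCopy("bucket-a", "aws", "object-storage"),
    BackupCopy("bucket-b", "gcp", "object-storage"),
    BackupCopy("office-nas", "office", "nas-disk"),
]

print(check_3_2_1(same_provider, "aws"))
print(check_3_2_1(mixed, "aws"))
```

The first plan fails on both the medium and the offsite checks even though it has three copies, which is the case sammy2255 raised.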

jaspanglia 18 hours ago

Cloud platform dependencies are becoming a huge single point of failure

jkogara 13 hours ago

Interestingly, upon logging in this morning I was presented with a new terms and conditions banner that required me to agree to not deploy a list of, to varying degrees, nefarious things (bots, torrents, "anything illegal", etc.). Is it likely that some of these workloads resulted in the auto restriction from GCP?

gnabgib 22 hours ago

Dupe - join the discussion started an hour ago instead of query string work (12 points, 4 comments) https://news.ycombinator.com/item?id=48200827

aarondf 22 hours ago

I added the qs because it defaulted to a story from 3 months ago.

danpalmer 10 hours ago

7 minutes from bug filing to account restoration. This shouldn't have happened in the first place, but that's an excellent response time from the support team.

Mengkudulangsat 21 hours ago

That explains why all my vibe-coded hobby projects are down.

Thank God I'm not dealing with any public-facing sites! Would have been an expensive lesson for a newbie coder if my job depended on this.

orliesaurus 20 hours ago

I wonder if someone has exploited a weird Google-safety automated process to report something on Railway which caused Google to block the whole thing.

whh 12 hours ago

There's that "automated action" again. Regardless of the architectural decision, it makes me incredibly uneasy relying on GCP if these types of things can happen.

r_lee 20 hours ago

seriously, is it possible to trust GCP with critical data/services at this point if you're not a billion dollar company?

I'm exaggerating but someone said they got "auto banned"

what if that happens to a small account which hosts some really important data/services there?

Avicebron 20 hours ago

> what if that happens to a small account which hosts some really important data/services there?
Pray to @dang that you will make the front page of HN?
throwaway85825 20 hours ago

Even if you are a billion dollar company you still have problems like the Australian pension did. Google is just that bad.
ttoinou 20 hours ago

Railway isn't far from being a billion dollar company, no?
- intelVISA 7 hours ago
  
  I don't want to believe this, lol.
xyzzy_plugh 20 hours ago

I've managed several accounts with GCP over the years and I've always maintained a great relationship with our contacts there. Some of these accounts were quite small, on the order of <$20k/mo, and even then we were kept abreast of anything that might be cause for concern. I always maintain a standing biweekly meeting with at least someone on the other side (account exec, technical staff, whatever) and I've yet to be blindsided by anything.
Is Google's communication good? No, not particularly. The only way something like TFA happens is if the relationship is neglected (by one or both parties). I'm not saying Railway did something wrong, but there are usually many flags and opportunities to correct long before drastic actions.
I get the impression that Railway plays fast and loose with a lot of their limits and resources and that Google may not be a fan of that.
Edit: would also like to say that if you put all your resources in one GCP project you are going to have a bad time. If you organize stuff over many projects it is very unlikely that they will ever take account wide action. I've had issues with, for example, a particular tenant's behavior, but it never jeopardized the other tenants.
chi_features 20 hours ago

https://blog.railway.com/p/series-b
Agreed. Railway are probably not far off a billion dollar company though!
jrockway 19 hours ago

I don't think you can ever trust one service with critical data. Some Claude instance deletes your prod database, you have to restore from an offsite backup because it also deleted your local backups. Even at small startups we did pg_dump to AWS from GCP because ... who knows what is going to happen to GCP, and we want to continue to be in business if that happens.
I don't feel safe with any one single point of failure. "Your credit card bounced", "you thought it was dev", "you got hacked", etc. are all the same problem to me and no cloud provider solves those merely by setting up an account.
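A minimal sketch of the "pg_dump to a different cloud" habit described above, assuming `pg_dump` and the AWS CLI are installed and that the AWS credentials belong to an account independent of the primary provider. The bucket name, database URL, and `/tmp` path are hypothetical placeholders, and the function defaults to a dry run that only prints the commands.

```python
import datetime
import subprocess


def build_offsite_backup_commands(db_url, bucket, now=None):
    """Build the dump and upload commands without running them."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    local_path = f"/tmp/backup-{stamp}.dump"
    remote_url = f"s3://{bucket}/postgres/backup-{stamp}.dump"
    # pg_dump custom format produces an archive that pg_restore can read.
    dump_cmd = ["pg_dump", "--format=custom", f"--file={local_path}", f"--dbname={db_url}"]
    # Upload to a bucket owned by a different account/provider than production.
    upload_cmd = ["aws", "s3", "cp", local_path, remote_url]
    return [dump_cmd, upload_cmd]


def run_offsite_backup(db_url, bucket, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    commands = build_offsite_backup_commands(db_url, bucket)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


# Dry run with placeholder values; nothing is dumped or uploaded.
run_offsite_backup("postgresql://localhost/app", "example-offsite-backups")
```

A copy like this only helps if the second account has its own billing and its own credentials, so that a suspension or a deleted project on the primary side cannot reach it.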

AbstractH24 9 hours ago

Did anyone discover any unexpected tools/websites that use Railway during this outage?

zelon88 19 hours ago

Wild to me that any tech sector business would want to rent an operating environment to park their entire infrastructure into. This is the equivalent to traveling shoe salesmen setting up a tent in the parking lot of a strip mall.

brunooliv 14 hours ago

Having tried many of these hosting services to host/play with toy apps, DigitalOcean and Fly.io are both unparalleled GOATs.

mattbee 15 hours ago

The risk of an "upstream cloud provider" is not something you need to tolerate in your supplier of internet infrastructure!

dlcarrier 18 hours ago

This is the kind of outage worthy of a Kevin Fang video.

tux 20 hours ago

At this point you can’t trust Google anymore, it keeps breaking things. Imagine having Google AI do these things automatically. We'll have an apocalypse in a day.

yomismoaqui 13 hours ago

Remember, the cloud is someone else's computer.

If that person turns it off you're screwed.

dwa3592 21 hours ago

Wait, I thought railway was a cloud provider like AWS, GCP but better and more agile. At least that's the impression i got from their website.

pavelevst 17 hours ago

Avoid vendor lock-in, have backups, and keep a disaster recovery standby (or a plan for quick recovery elsewhere).

leventhan 19 hours ago

What's a good alternative to Railway?

brokenodo 21 hours ago

I’m a new customer and have been falling in love with Railway over the last 2 weeks, but this is quite the wake up call.

csw-001 21 hours ago

Literally in the same boat. I've been really happy with it, but this is a major eye opener.... It's been down for a looooong time by provider standards.
- reelvideocap 21 hours ago
  
  same
TheAtomic 21 hours ago

same same
choilive 20 hours ago

Been a customer with them for over a year now, small incidents here and there but never anything this major.

steve1977 17 hours ago

Lesson learned: don't rely on a single hyperscaler, even (or especially) as a startup.

burnerRhodov3 17 hours ago

I just... I don't really understand why startups even use AWS, GCP, or any other cloud hosted software? Hetzner, etc. are all extremely cheap, and honestly scale so well... Code nowadays is cheaper for configs, and having full control over your compute is... liberating.
- steve1977 17 hours ago
  
  Oh absolutely... and many use architectures that have evolved out of the needs of really big companies and are not really a good fit for a startup. But I guess they want to be "ready for growth".
- dannersy 17 hours ago
  
  Low cost to entry, easy to get scale from the beginning if you need it. The large cloud providers throw free credit at startups to lock them in all the time. I had a short lived stint trying to get my own startup off the ground and it was really easy to get free compute from Google with no strings attached. This was many years ago now, but I would be surprised if it is any different.
  I am with you entirely and would not have taken that route today, but it is really easy to see why people go that route.
- antran22 17 hours ago
  
  A few years ago, when I was kinda active in the startup scene in my area, you had people selling access to cloud credits at pennies-on-the-dollar prices. The credits were given out liberally to big corps and organizations by AWS/GCP through workshops, webinars, and events, all in the hope of roping departments into building MVPs and demos on AWS/GCP. But people also found a way to cheat that system and make a quick buck.
  I know a startup run by acquaintances of mine that has been running on AWS for 5 years straight without paying a single dollar to AWS. When the credits had almost run out, they migrated their data over to another account with credit. That has happened twice already.
  It helps to have a portable, replicable IaC config. But also this is sustainable because they are a pretty small struggling shop. You will probably not be able to do this if you are trying to maintain more than 3 nines for an enterprise client.
- chi_features 16 hours ago
  
  Perhaps Railway does a bit more than what you think, they have some great functionality (I'm not affiliated with them). Check out [Features | Railway](https://railway.com/features) "PR Environments", they are incredible for the QA process

bilalq 18 hours ago

Building a startup on GCP (or even Google Workspace) is an existential risk.

redanddead 20 hours ago

one of the many reasons companies are cloud agnostic and don't want to get locked in

fh67 20 hours ago

Yeah, but only until you find that the new cloud provider won't approve your compute quota, or doesn't have enough capacity in the region, or you hit fraud flags for a stagnant account spinning up lots of compute.

parineum 20 hours ago

There's a lot of, what seems to me, unfounded blame being directed at Google for this. Isn't railway the company that just blamed Anthropic for deleting their prod database?

mmmore 20 hours ago

Nope, Railway was the company who was hosting PocketOS, which is the company that blamed Cursor for deleting their prod database. Railway is only involved insofar as their API allowed an instant delete of the prod database.
- oofbey 19 hours ago
  
  Railway deserves a lot of blame here. Deleting backups along with the database is a lot like not having backups. Moronic design choice.
  
  Genego 19 hours ago
  
  Why does Railway deserve any blame here at all? It was an MCP with elevated infra access, that the user willingly connected through Cursor, which allowed an LLM Agent to manage infra on Railway. The user would first have gone through oAuth confirming the access level scope (I would have rejected the moment it indicates to me that it can delete critical infra and backups...). So obviously it has access to all commands the user would also have access to. From my perspective the blame is entirely on the user, and partly on Cursor for not enforcing HITL correctly across their agents.
  
  wmf 18 hours ago
  
  Putting AI aside, people make mistakes. One of the most common mistakes people make is deleting the wrong thing. After they realize the mistake, people want to restore the thing they deleted from backups. Thus deleting the thing and deleting the backups of the thing should always be separate operations.
  
  utunga 18 hours ago
  
  Absolutely.
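The separation wmf argues for in the thread above can be sketched with a toy in-memory service. This is not Railway's API or any real provider's; `ToyDatabaseService` and its method names are invented for illustration. The point is only that deleting the live database leaves backups in place, and that removing backups is a distinct call with its own confirmation.

```python
class ToyDatabaseService:
    """Hypothetical service where database and backup deletion are separate operations."""

    def __init__(self):
        self.databases = set()
        self.backups = {}  # database name -> list of backup ids

    def create_database(self, name):
        self.databases.add(name)
        self.backups.setdefault(name, [])

    def create_backup(self, name, backup_id):
        if name not in self.databases:
            raise KeyError(f"no such database: {name}")
        self.backups[name].append(backup_id)

    def delete_database(self, name):
        """Remove the live database only; its backups are left untouched."""
        self.databases.remove(name)

    def restore_database(self, name):
        """Recreate a database from its most recent backup and return that backup id."""
        if not self.backups.get(name):
            raise LookupError(f"no backups available for: {name}")
        self.databases.add(name)
        return self.backups[name][-1]

    def delete_backups(self, name, confirm_name):
        """Separate, explicit call; the caller must repeat the name to confirm."""
        if confirm_name != name:
            raise ValueError("confirmation does not match database name")
        self.backups.pop(name, None)


svc = ToyDatabaseService()
svc.create_database("prod")
svc.create_backup("prod", "backup-1")

svc.delete_database("prod")           # the mistaken delete
print(svc.restore_database("prod"))   # backups survived, so recovery is possible
```

With this shape, a credential or agent that can call `delete_database` does not automatically destroy the recovery path, which is the failure mode several commenters describe.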
sidrag22 20 hours ago

fairly certain you are remembering the goofy article that was going around where a railway user allowed an agent to delete his db. iirc he questioned the agent after and the agent told him it should have read the file that told him not to do things, so just sounds like he deleted his db and blamed his tools.

jujube3 19 hours ago

If you buy a cloud-on-a-cloud, you're a clown-on-a-clown.

koolhead17 18 hours ago

Let's blame some rogue AI agent at GCP causing this.

ryanisnan 22 hours ago

Yikes. I was wondering why my TLS certs were coming up as invalid.

bshack0 21 hours ago

so....what are we switching to y'all? cloud-run ? ;P

auxiliarymoose 21 hours ago

federated hardware (a bunch of raspberry pis networked into a high availability kubernetes cluster, hidden across various local coffee shops for free power and bandwidth)
throwatdem12311 21 hours ago

raspberry-pi cluster in my closet
- frio 20 hours ago
  
  16GiB Raspberry Pi 5s in my country are now going for ~$450USD, so I've gotta say that's out of reach for me now :(.

eezing 18 hours ago

“Deletion of private cloud subscription…”

Who deleted it?

isninkhamiss 21 hours ago

github got way more noise for less

ChrisArchitect 20 hours ago

Earlier: https://news.ycombinator.com/item?id=48200827

Drew-Aetherwave 21 hours ago

It is killing me...

jamwise 17 hours ago

There goes a 9

mcontrerazCL 22 hours ago

all my fkn postgres db in railway! what do i do now?

cactusplant7374 21 hours ago

Take a walk. Breathe in the fresh air. It feels good.
eoswald 21 hours ago

Hahah at least you're not getting called every five minutes because you can't shut off the alerts, because it's apparently deployed SOMEWHERE but good luck finding how to access it. Can't wait to see the bill from Twilio because of this lol

WhereIsTheTruth 16 hours ago

When your cloud depends on another cloud

All these companies are fraud

Osborn_Ojure 20 hours ago

compute recovered, get ready boys!

fnord77 19 hours ago

wish I knew what "railway" is

iloveplants 23 hours ago

seems like it's every day

shevy-java 18 hours ago

Do not become dependent on Google. Ever.

rvz 21 hours ago

Let me guess… Googler running AI agent in production that blocked this startup’s account.

paganel 13 hours ago

Apparently this has nothing to do with real-world trains or the real-world rail system. At first, reading the title alone, I thought that some trains might have gotten stuck somewhere because of an IT (Google Cloud) failure. It's just another SaaS story.

rekabis 21 hours ago

TL;DR: putting all your eggs into one basket is bad, man.

lfx 21 hours ago

That’s true, however having only a few eggs and shopping for several baskets does not make sense in the early days. Not sure how big Railway is, but usually you start small with one egg.
- christophilus 21 hours ago
  
  You’d think they wouldn’t have started with GCP. There are plenty of datacenters where you can buy racks and racks of servers, and talk to a human when something goes wrong, and even walk in and access your servers. That’s what I’d be using if I were to build a Rackspace today.
  
  tomschlick 21 hours ago
  
  They started on GCP and have been migrating to their own "Metal" DC doing exactly what you're describing. But GCP is still their overflow given how rapidly they are growing and holds some amount of networking that routes to their DC.
  
  wmf 20 hours ago
  
  Colo is worse than cloud when you're getting started. Sure, you can talk to a person but everything else is much lower quality. People are obsessed with having someone to yell at but yelling does not fix outages.

rekabis 21 hours ago

TL;DR: putting all your eggs into one basket is bad, man.

canpan 21 hours ago

How to handle domains? The rest is easy, but your domain registrar blocking you sounds like a pain. My current solution is to use a local small provider, just for the domain. Then if there is a problem with your play account it is out of any blast radius.
- FlamingMoe 21 hours ago
  
  What do you mean by local small provider? A registrar on main street?
- truekonrads 21 hours ago
  
  MarkMonitor
  
  Barbing 20 hours ago
  
  Any changes since acquisition?
  Looks like they were sold at the beginning of the year to a company without a Wikipedia page whose parent company doesn’t have one either https://en.wikipedia.org/wiki/Markmonitor
  Acquired in November 2022 by Newfold Digital, it was later announced that the firm would be sold to Com Laude, a company owned by PX3 Partners.
  -
  Edit-Private equity apparently https://px3partners.com
  PX3 stands for purpose, passion, and performance. It is a pan-European private equity firm with headquarters in London. It invests behind transformative themes and targets companies operating within select segments of the business services, consumer and leisure, and industrials sectors with strong business fundamentals.
- rekabis 16 hours ago
  
  What the deuce are you blathering on about. An account got blocked, this has nothing to do with a domain.
  And I’m talking about having disparate failovers that don’t rely on a single hosting provider. At that point, who cares what Google does to your cloud account… work with the hot failover and spin up another hot failover somewhere else.
binarycleric 20 hours ago

Same applies to all the companies betting the farm on AWS.
- rekabis 16 hours ago
  
  Precisely. If you’re going to have a hot failover, it behoves you to have an entirely separate entity billing you for that hosting.
  Honestly, I don’t know where the downvotes are coming from. Do people have no clue about service resiliency? I can understand if it’s a personal project or you haven’t yet scaled to paying customers, but anything at scale with serious money involved needs to be completely independent of the underlying hosting. It should remain up even if an entire provider goes titsup.
Aachen 5 hours ago

Note, you submitted a dupe: https://news.ycombinator.com/item?id=48201711 (the comment I'm replying to is 1 ID older so I guess this is the canonical one to reply to)