janalsncm 3 hours ago

Never used GPU spot instances before, but I would have to imagine getting interrupted is pretty annoying.

  • somehowadev 2 hours ago

    As long as your workload can handle resuming again and your instances aren't heavily in-demand (looking at the eviction rates), the cost saving for us is substantial enough to take the occasional interruption.

    I do wish Azure gave more than its 30-second eviction warning (as AWS does with its two-minute notice), but it's still usable.
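
    A minimal sketch of what "can handle resuming" looks like with a short warning window: break the work into steps that each take well under the warning time, and check for an eviction notice between them. The names `run_until_evicted`, `eviction_pending`, and `save_state` are hypothetical; in practice `eviction_pending` would query the cloud provider's instance metadata service, which is not shown here.

```python
from typing import Callable, Iterable


def run_until_evicted(
    steps: Iterable[Callable[[], None]],
    eviction_pending: Callable[[], bool],
    save_state: Callable[[int], None],
) -> int:
    """Run steps in order, stopping as soon as an eviction notice is seen.

    Returns the number of steps completed. `eviction_pending` and
    `save_state` are hypothetical hooks supplied by the caller.
    """
    done = 0
    for step in steps:
        # Check before starting each step, so a step is never begun
        # once the notice has arrived.
        if eviction_pending():
            save_state(done)
            return done
        step()
        done += 1
    # All steps finished without an eviction; record final progress.
    save_state(done)
    return done
```

    The shorter the warning, the smaller each step has to be, since the save has to fit in whatever time is left after the current step finishes.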

  • hhh 3 hours ago

    It depends. Our workloads can finish up in under two minutes and shut down without much effort, so we haven’t really noticed it, apart from one time when we had no spot capacity.

    • janalsncm 3 hours ago

      I guess if checkpointing is set up correctly and your runtime is saved to a Docker image, it’s feasible. Probably not going to get a 3-hour continuous chunk of time, I would assume.
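
      A rough sketch of checkpointing "set up correctly", assuming the state is small enough to serialize as JSON and lives on storage that survives the instance. The function names, the state layout, and the `stop_after` parameter (which only simulates an interruption) are all made up for illustration. The important part is writing to a temp file and renaming it, so an eviction mid-write never leaves a half-written checkpoint.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp, path)  # atomic swap on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or `default` when no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def train(path, total_steps, stop_after=None):
    """Toy loop that resumes from the last checkpoint.

    `stop_after` is a hypothetical knob that simulates an interruption
    after that many steps in the current run.
    """
    state = load_checkpoint(path, {"step": 0, "total": 0})
    ran = 0
    while state["step"] < total_steps:
        if stop_after is not None and ran >= stop_after:
            break
        state["total"] += state["step"]  # stand-in for real work
        state["step"] += 1
        ran += 1
        save_checkpoint(path, state)
    return state
```

      Running `train` again with the same path picks up at the saved step, so an interrupted run loses at most the step that was in progress.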

      • direwolf20 2 hours ago

        When I once used Spot it wasn't that bad. You're likely to have an instance for 3 hours.