points by drewg123 4 years ago

We test on production traffic. We don't have a testbench.

This is problematic, because sometimes results are not reproducible.

Eg, if I test on the day of a new release of a popular title, we might be serving a lot of it cached from RAM, so that cuts down memory bandwidth requirements and leads to an overly rosy picture of performance. I try to account for this in my testing.

cnst 4 years ago

Test in prod FTW!

However, how do you test the saturation point when dealing with production traffic? Won't you have to run your resources underprovisioned in order to achieve saturation? Doesn't that degrade the quality of service?

Or are these special non-ISP Netflix Open Connect instances that are specifically meant to be used for saturation testing, with the rest of the load spilling back to EC2?

  • drewg123 4 years ago

    We have servers in IX locations where there is a lot of traffic. Its not my area of expertise (being a kernel hacker, not a network architect), but our CDN load does not spill back to EC2.

    The biggest impact I have to QoE is when I crash a box, but clients are architected to be resilient against that.

Aissen 4 years ago

Thanks, it's an interesting set of tradeoffs.