Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month at the Massachusetts Institute of Technology, in Building E51.

BLU Discuss list archive

[Discuss] automatic daemon restarts

On 9/15/2014 4:15 PM, Tom Metro wrote:
> Not to say your points are invalid, but Netflix would disagree with you.
> They created a testing tool that intentionally kills random services on
> their production systems just to test that automated recovery works
> correctly.

Netflix runs a highly available application system that is designed to be
robust in the face of isolated faults and to degrade gracefully under
failure conditions. Chaos Monkey is the tool that they use to test the
implementations of their designs. It works by shutting down random
Netflix-owned instances within the AWS scalable architecture. Automated
recovery in the Netflix environment is simple: spin up a new instance
that is configured identically to the one that failed. They don't try to
restart the faulted instance. It's down for the count and it stays that
way so they can analyze the fault that knocked it out.
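
To make the replace-rather-than-restart idea concrete, here is a minimal
sketch in Python. It is only an illustration: the Instance and Fleet
classes, the "quarantined" state, and the config string are all
hypothetical names made up for this example, not anything from Netflix
or the AWS API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    instance_id: int
    config: str
    state: str = "running"  # either "running" or "quarantined"


@dataclass
class Fleet:
    """Hypothetical pool of identically configured instances."""
    config: str
    instances: dict = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def launch(self):
        # Every instance is built from the same configuration.
        inst = Instance(next(self._ids), self.config)
        self.instances[inst.instance_id] = inst
        return inst

    def handle_failure(self, instance_id):
        # Do not restart the faulted instance: leave it down so the
        # fault can be analyzed, and spin up an identical replacement.
        failed = self.instances[instance_id]
        failed.state = "quarantined"
        return self.launch()

    def running(self):
        return [i for i in self.instances.values() if i.state == "running"]
```

For example, after fleet = Fleet("web-v1") and two launch() calls,
calling handle_failure() on the first instance leaves two instances
running, and the failed one is still on record for a post-mortem.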

This is a /very/ different scenario from what you might have with a
single LAMP instance where systemd keeps restarting MySQL after a
persistent fault of some sort keeps knocking it out. This isn't
automated recovery; it's an automated disaster looking to wreck your tables.
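
One way to keep a supervisor from restarting a persistently failing
service forever is to cap restarts within a time window and then give
up until a human looks at it. The sketch below is a hypothetical
Python illustration of that idea; RestartLimiter and its parameters
are assumptions for this example and do not describe how systemd
itself is implemented or configured.

```python
import time


class RestartLimiter:
    """Allow at most max_restarts restarts per sliding time window."""

    def __init__(self, max_restarts, window_seconds, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the policy can be tested
        self._timestamps = []

    def allow_restart(self):
        now = self.clock()
        # Forget restarts that have aged out of the window.
        self._timestamps = [
            t for t in self._timestamps if now - t < self.window_seconds
        ]
        if len(self._timestamps) >= self.max_restarts:
            # Persistent fault: stop restarting and leave the service down.
            return False
        self._timestamps.append(now)
        return True
```

With RestartLimiter(3, 60), a service that dies four times inside a
minute is restarted three times and then left down, instead of being
bounced against the same fault indefinitely.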

Rich P.

BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.
