Lecture Operating systems: A concept-based approach: Chapter 19 - Dhananjay M. Dhamdhere

A fault may disrupt operation in a system by damaging the states of some data and processes. The focus of recovery is to restore some data or process(es) to a consistent state such that normal operation can be restored. Fault tolerance provides uninterrupted operation of a system despite faults. This chapter discusses recovery and fault tolerance techniques used in a distributed operating system. Resiliency, which is a technique for minimizing the impact of a fault, is also discussed.

Chapter 19: Recovery and Fault Tolerance
Copyright © 2008 Operating Systems, by Dhananjay Dhamdhere

Introduction
Faults, Failures, and Recovery
Byzantine Faults and Agreement Protocols
Recovery
Fault Tolerance Techniques
Resiliency

Faults, Failures, and Recovery
A fault may damage the state of a system
Error: a part of the system state that is erroneous
Failure: unexpected behavior or situation

Faults, Failures, and Recovery (continued)
Recovery: for reliable operation, the system is restored to a consistent state and operation is resumed
A recovery is performed when a failure is noticed

Classes of Faults
Fault model: properties that determine the kinds of errors/failures that might result from a fault
Classes of faults:
System fault: system crash
Amnesia and partial amnesia faults
A fail-stop fault brings a system to a halt
Process fault
Byzantine faults: malicious or arbitrary actions
Storage fault: amnesia faults
Communication fault: nonamnesia faults

Overview of Recovery Techniques
For non-Byzantine faults, recovery involves restoring the system or application to a consistent state
Involves reexecuting some actions

Overview of Recovery Techniques (continued)
Recovery approaches are classified into:
Backward recovery: resetting the state of the entity affected by a fault to a prior state and resuming its operation
Involves reexecution of some actions
Forward recovery: repairing the erroneous state of a system so the system can ...
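
Backward recovery, as described above, can be illustrated with a small checkpoint-and-rollback sketch. This is only a minimal illustration under stated assumptions, not the method from the chapter: the class CheckpointedCounter, the function run_with_backward_recovery, and the fault_at parameter are hypothetical names, the state is a simple in-memory dictionary, and the fault is simulated by corrupting that state on purpose.

```python
import copy


class CheckpointedCounter:
    """Hypothetical entity whose state can be checkpointed and rolled back."""

    def __init__(self):
        self.state = {"total": 0, "applied": []}
        self._checkpoint = copy.deepcopy(self.state)

    def checkpoint(self):
        # Record a consistent prior state that recovery can return to.
        self._checkpoint = copy.deepcopy(self.state)

    def apply(self, value):
        # A normal action that changes the state.
        self.state["total"] += value
        self.state["applied"].append(value)

    def rollback(self):
        # Backward recovery: reset the state to the last checkpoint.
        self.state = copy.deepcopy(self._checkpoint)


def run_with_backward_recovery(entity, actions, fault_at=None):
    """Apply actions; after a simulated fault, roll back and reexecute.

    fault_at (an assumption of this sketch) is the index of the action
    after which a fault is simulated.
    """
    entity.checkpoint()
    log = []  # actions performed since the checkpoint
    for i, value in enumerate(actions):
        entity.apply(value)
        log.append(value)
        if i == fault_at:
            # Simulated fault: damage the state, then notice the failure.
            entity.state["total"] = -999
            entity.rollback()
            # Reexecute the actions performed since the checkpoint.
            for logged in log:
                entity.apply(logged)
    return entity.state
```

For example, calling run_with_backward_recovery(CheckpointedCounter(), [1, 2, 3], fault_at=1) damages the state after the second action, resets it to the checkpoint, and reexecutes the logged actions before continuing. The reexecution step is what the slides refer to as "reexecution of some actions".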
