DISTRIBUTED AND PARALLEL SYSTEMS: CLUSTER AND GRID COMPUTING 2005, part 6

PROCESS MIGRATION IN CLUSTERS AND CLUSTER GRIDS

József Kovács
MTA SZTAKI Parallel and Distributed Systems Laboratory
H-1518 Budapest 63, Hungary
smith@

Abstract
The paper describes two working modes of the parallel program checkpointing mechanism of P-GRADE and its potential application in the nationwide Hungarian ClusterGrid (CG) project. The first-generation architecture of ClusterGrid enables the migration of parallel processes among friendly Condor pools. In the second-generation CG, Condor flocking is disabled, so a new technique is introduced to interrupt the whole parallel application and take it out of the Condor scheduler together with its checkpoint files. This mechanism allows a parallel application to be completely removed from the Condor pool after checkpointing and to be resumed under another, non-friendly Condor pool after resubmission. The checkpointing mechanism automatically supports, without user interaction, generic PVM programs created with the P-GRADE Grid programming environment.

Keywords
message-passing parallel programs, graphical programming environment, checkpointing, migration, cluster grid, PVM, Condor

1. Introduction

Process migration in distributed systems is a special event in which a process running on one resource is redeployed on another in such a way that the migration does not change the process's execution. Providing this capability requires special techniques to save the whole memory image of the target process and to reconstruct it. This technique is called checkpointing.
During checkpointing, a tool suspends the execution of the process, collects all the internal status information necessary for resumption, and terminates the

The work presented in this paper has been supported by the Hungarian Chemistrygrid OMFB-00580/2003 project, the Hungarian Supergrid OMFB-00728/2002 project, the Hungarian IHM 4671/1/2003 project, and the Hungarian Research Fund No. T042459.
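The suspend, collect, terminate and reconstruct cycle described in the introduction can be illustrated with a minimal sketch. Note that it uses application-level checkpointing, a simpler technique swapped in for illustration: the P-GRADE mechanism saves process state automatically and without user interaction for PVM programs, whereas here the program names its own state explicitly and serializes it with Python's standard pickle module. The class CounterTask and the functions checkpoint, restore and run are hypothetical and are not part of P-GRADE, PVM or Condor.

```python
import os
import pickle
import tempfile


class CounterTask:
    """Hypothetical long-running computation whose entire state is three fields."""

    def __init__(self, limit):
        self.limit = limit
        self.i = 0
        self.total = 0

    def step(self):
        # One unit of work: add the current index to the running total.
        self.total += self.i
        self.i += 1

    def done(self):
        return self.i >= self.limit


def checkpoint(task, path):
    """Collect the state needed for resumption and save it to a checkpoint file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file and rename it, so an interrupted write never
    # leaves a truncated checkpoint in place of a good one.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(vars(task), f)
    os.replace(tmp_path, path)


def restore(path):
    """Reconstruct a task from a checkpoint file, e.g. on another host."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    task = CounterTask.__new__(CounterTask)  # skip __init__; state comes from the file
    task.__dict__.update(state)
    return task


def run(task, path, stop_after=None):
    """Run to completion, or checkpoint and stop after `stop_after` steps.

    Returns the final total, or None if the task was checkpointed and stopped.
    """
    steps = 0
    while not task.done():
        if stop_after is not None and steps == stop_after:
            checkpoint(task, path)  # suspend: save the state ...
            return None             # ... and terminate this run
        task.step()
        steps += 1
    return task.total


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "task.ckpt")
    first = run(CounterTask(10), ckpt, stop_after=4)  # stops with i == 4
    resumed = restore(ckpt)                            # could happen on another machine
    print(first, resumed.i, run(resumed, ckpt))        # None 4 45
```

Because the checkpoint is an ordinary file, restore() could in principle be called on a different machine, which is the essence of migration. The sketch deliberately ignores what makes the paper's problem hard: checkpointing a parallel application means capturing the states of several communicating PVM processes consistently, which a single-process example does not attempt.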
