Fault Detection and Tolerance in Cluster of Workstations using Message Passing Interface
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | Sir Syed University of Engineering and Technology, Karachi, 2011-12-01 |
Series: | Sir Syed University Research Journal of Engineering and Technology |
Subjects: | |
Online Access: | http://www.sirsyeduniversity.edu.pk/ssurj/rj/index.php/ssurj/article/view/72 |
Summary: | A Cluster of Workstations (COW) is a network-based multicomputer system intended to replace supercomputers. A COW works on Divisible Load Theory (DLT), according to which a job is divided into n subtasks that are delegated to n workstations in the COW architecture. The job is complete only when all subtasks are complete, so satisfactory job completion requires every workstation to be functional. A single faulty node can therefore suspend the whole job unless fault avoidance and correction measures are taken. This paper presents a fault detection and fault tolerance algorithm that uses the Message Passing Interface (MPI) to identify faulty workstations and transfer the subtasks they were performing to normally working workstations. The workstations that receive these subtasks continue their original subtasks alongside the reassigned ones on a time-sharing basis. |
ISSN: | 1997-0641 2415-2048 |
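
The reassignment idea described in the summary can be illustrated with a minimal sketch. This is not the paper's algorithm and makes no MPI calls: it is a plain Python simulation under stated assumptions. The heartbeat timeout used for detection, the least-loaded placement policy, and every name (`Workstation`, `divide_job`, `detect_faulty`, `reassign`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Workstation:
    rank: int                  # node identifier (analogous to an MPI rank)
    last_heartbeat: float      # time the node was last heard from
    subtasks: list = field(default_factory=list)


def divide_job(job_size, n):
    """Split job_size work units into n near-equal (start, end) ranges."""
    base, extra = divmod(job_size, n)
    ranges, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def detect_faulty(nodes, now, timeout):
    """Assumption: a node is presumed faulty once its heartbeat is older than timeout."""
    return [w for w in nodes if now - w.last_heartbeat > timeout]


def reassign(nodes, now, timeout):
    """Move subtasks from faulty nodes to the least-loaded working nodes.

    Returns the sorted ranks of the nodes judged faulty.
    """
    faulty = detect_faulty(nodes, now, timeout)
    faulty_ranks = {w.rank for w in faulty}
    healthy = [w for w in nodes if w.rank not in faulty_ranks]
    if faulty and not healthy:
        raise RuntimeError("no working workstation left to take over")
    for bad in faulty:
        while bad.subtasks:
            # The receiving node keeps its own subtasks and takes on one more.
            target = min(healthy, key=lambda w: len(w.subtasks))
            target.subtasks.append(bad.subtasks.pop(0))
    return sorted(faulty_ranks)


if __name__ == "__main__":
    nodes = [Workstation(rank=r, last_heartbeat=10.0) for r in range(4)]
    for w, sub in zip(nodes, divide_job(100, 4)):
        w.subtasks.append(sub)
    nodes[2].last_heartbeat = 2.0  # rank 2 stopped responding
    print("faulty ranks:", reassign(nodes, now=11.0, timeout=5.0))
    for w in nodes:
        print(w.rank, w.subtasks)
```

In this sketch the workstation that receives an orphaned subtask simply holds two entries in its list; how the two subtasks would share processor time on a real node is left out.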