CLC number: TP391

Received: 2004-08-31

Revision Accepted: 2004-11-10

Journal of Zhejiang University SCIENCE A 2005 Vol.6 No.7 P.676-682


Policy driven and multi-agent based fault tolerance for Web services

Author(s):  TANG Jing-fan, ZHOU Bo, HE Zhi-jun

Affiliation(s):  School of Computer Science & Technology, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   jingfan_t@zju.edu.cn, bzhou@zju.edu.cn, hezj@zju.edu.cn

Key Words:  Policy driven, Multi-agent based, Fault tolerance, Web service

TANG Jing-fan, ZHOU Bo, HE Zhi-jun. Policy driven and multi-agent based fault tolerance for Web services[J]. Journal of Zhejiang University Science A, 2005, 6(7): 676-682.

%0 Journal Article
%T Policy driven and multi-agent based fault tolerance for Web services
%A TANG Jing-fan
%A ZHOU Bo
%A HE Zhi-jun
%J Journal of Zhejiang University SCIENCE A
%V 6
%N 7
%P 676-682
%@ 1673-565X
%D 2005
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.2005.A0676

This paper proposes a policy-driven, multi-agent-based model to enhance the fault tolerance and recovery capabilities of Web services in distributed environments. The evaluation functions of the fault specifications and the corresponding handling mechanisms of the services are both defined in policies, which are expressed in XML. While the services execute, the service monitor agent watches for fault occurrences using its local knowledge of the faults. This local knowledge is generated dynamically by the service policy agent, which queries and parses the service policies from the service policy repository. When a fault occurs, the service process agent carries out fault handling and service recovery, directed by the actions that the policies define for the specific conditions. Such a policy-driven, multi-agent-based fault handling approach addresses the issues of flexibility, automation and availability.
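
The division of labour among the three agents can be illustrated with a minimal Python sketch. It is not the authors' implementation: the XML element and attribute names, the `Rule` class, the comparison operators and the handler names are all hypothetical, chosen only to show how policies might be parsed into local knowledge, matched against observations, and mapped to recovery actions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical policy document. The schema is an assumption made for
# illustration; the paper only states that policies are expressed in XML.
POLICY_XML = """
<policies service="OrderService">
  <policy fault="timeout">
    <condition metric="latency_ms" op="gt" value="2000"/>
    <action>retry</action>
  </policy>
  <policy fault="unavailable">
    <condition metric="status" op="eq" value="503"/>
    <action>switch_replica</action>
  </policy>
</policies>
"""

# Comparison operators allowed in a condition (assumed set).
OPS: Dict[str, Callable[[float, float], bool]] = {
    "gt": lambda a, b: a > b,
    "lt": lambda a, b: a < b,
    "eq": lambda a, b: a == b,
}


@dataclass
class Rule:
    """One unit of local knowledge: a fault condition plus its action."""
    fault: str
    metric: str
    op: str
    value: float
    action: str

    def matches(self, observation: Dict[str, float]) -> bool:
        # A rule applies only if the observation reports its metric.
        if self.metric not in observation:
            return False
        return OPS[self.op](observation[self.metric], self.value)


class ServicePolicyAgent:
    """Parses the policy repository content into local knowledge."""

    def __init__(self, repository_xml: str) -> None:
        self.repository_xml = repository_xml

    def local_knowledge(self) -> List[Rule]:
        root = ET.fromstring(self.repository_xml)
        rules = []
        for policy in root.findall("policy"):
            cond = policy.find("condition")
            rules.append(Rule(
                fault=policy.get("fault"),
                metric=cond.get("metric"),
                op=cond.get("op"),
                value=float(cond.get("value")),
                action=policy.findtext("action", default="").strip(),
            ))
        return rules


class ServiceMonitorAgent:
    """Checks observations against the local knowledge."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules

    def detect(self, observation: Dict[str, float]) -> Optional[Rule]:
        # Return the first rule whose condition holds, or None.
        for rule in self.rules:
            if rule.matches(observation):
                return rule
        return None


class ServiceProcessAgent:
    """Runs the recovery action that the matched policy names."""

    def __init__(self, handlers: Dict[str, Callable[[], str]]) -> None:
        self.handlers = handlers

    def handle(self, rule: Rule) -> str:
        handler = self.handlers.get(rule.action)
        return handler() if handler else "unhandled"
```

In this sketch, changing the fault handling behaviour means editing the XML only; the agents are rebuilt from `ServicePolicyAgent(...).local_knowledge()` without code changes, which is one way to read the flexibility and automation claims above.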
