CLC number: TP391.3

On-line Access: 2015-06-04

Received: 2014-11-02

Revision Accepted: 2015-04-21

Crosschecked: 2015-05-18

ORCID:

Meng-ni Zhang: http://orcid.org/0000-0002-7547-0168

Can Wang: http://orcid.org/0000-0002-5890-4307

Frontiers of Information Technology & Electronic Engineering  2015 Vol.16 No.6 P.449-456

http://doi.org/10.1631/FITEE.1400377


A sampling method based on URL clustering for fast web accessibility evaluation


Author(s):  Meng-ni Zhang, Can Wang, Jia-jun Bu, Zhi Yu, Yu Zhou, Chun Chen

Affiliation(s):  College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   mengnier@zju.edu.cn, wcan@zju.edu.cn

Key Words:  Page sampling, URL clustering, Web accessibility evaluation


Meng-ni Zhang, Can Wang, Jia-jun Bu, Zhi Yu, Yu Zhou, Chun Chen. A sampling method based on URL clustering for fast web accessibility evaluation[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(6): 449-456.

@article{zhang2015sampling,
title="A sampling method based on {URL} clustering for fast web accessibility evaluation",
author="Meng-ni Zhang and Can Wang and Jia-jun Bu and Zhi Yu and Yu Zhou and Chun Chen",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="16",
number="6",
pages="449--456",
year="2015",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.1400377"
}

%0 Journal Article
%T A sampling method based on URL clustering for fast web accessibility evaluation
%A Meng-ni Zhang
%A Can Wang
%A Jia-jun Bu
%A Zhi Yu
%A Yu Zhou
%A Chun Chen
%J Frontiers of Information Technology & Electronic Engineering
%V 16
%N 6
%P 449-456
%@ 2095-9184
%D 2015
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1400377

TY  - JOUR
T1  - A sampling method based on URL clustering for fast web accessibility evaluation
A1  - Meng-ni Zhang
A1  - Can Wang
A1  - Jia-jun Bu
A1  - Zhi Yu
A1  - Yu Zhou
A1  - Chun Chen
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 16
IS  - 6
SP  - 449
EP  - 456
SN  - 2095-9184
Y1  - 2015
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1400377
ER  - 


Abstract: 
When evaluating the accessibility of a large website, we rely on sampling methods to reduce the cost of evaluation. This may lead to a biased evaluation when the distribution of checkpoint violations in a website is skewed and the selected samples do not represent the entire website well. To improve sampling quality, stratified sampling methods first cluster the web pages in a site and then draw samples from each cluster. In existing stratified sampling methods, however, all the pages in a website must be analyzed for clustering, which incurs high I/O and computation costs. To address this issue, we propose a novel page sampling method based on URL clustering for web accessibility evaluation, named URLSamp. Because it uses only URL information for stratified page sampling, URLSamp scales efficiently to large websites. Meanwhile, by exploiting similarities in URL patterns, URLSamp clusters pages by their generating scripts and can thus effectively detect accessibility problems originating in web page templates. We validate our method on a data set of 45 websites. Experimental results show that URLSamp is both effective and efficient for web accessibility evaluation.
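
The two-stage idea in the abstract (group pages by URL pattern, then draw samples from every group) can be illustrated with a short sketch. It is a simplified stand-in, not URLSamp itself: it assumes that any path segment containing a digit is a template variable and groups URLs by the resulting shape, whereas the paper clusters URLs with an MDL-based algorithm. The names `url_pattern`, `cluster_urls` and `stratified_sample`, the `ratio` and `seed` parameters, and the example URLs are all hypothetical.

```python
import math
import random
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def url_pattern(url):
    """Coarse pattern key for a URL (digit heuristic, not URLSamp's MDL step)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: segments containing a digit are template variables.
    shape = tuple("*" if any(c.isdigit() for c in s) else s for s in segments)
    # Query values vary per page; keep only the sorted set of parameter names.
    keys = tuple(sorted({k for k, _ in parse_qsl(parts.query, keep_blank_values=True)}))
    return (parts.netloc.lower(), shape, keys)


def cluster_urls(urls):
    """Group URLs that share the same pattern key (one group per presumed template)."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[url_pattern(url)].append(url)
    return clusters


def stratified_sample(urls, ratio, seed=0):
    """Draw roughly `ratio` of the URLs from every cluster (stratum)."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)  # seeded for reproducible samples
    clusters = cluster_urls(urls)
    sample = []
    for key in sorted(clusters):
        members = sorted(clusters[key])
        # Round up so that even a rarely used template contributes one page.
        k = min(len(members), math.ceil(ratio * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


urls = [f"http://example.com/news/2015/{i:04d}.html" for i in range(20)]
urls += ["http://example.com/list.php?cat=3&page=9", "http://example.com/about.html"]
picked = stratified_sample(urls, 0.25)  # 5 news pages + 1 list page + 1 about page
```

Rounding up per cluster reflects the premise stated in the summary: if accessibility problems trace back to templates, a template that contributes no sampled page goes unchecked. The digit rule is crude (it would also generalize a constant segment such as `v2`), which is the kind of decision the MDL-based clustering is meant to make from the data instead.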

The paper is very interesting, and the theme is relevant. The authors address problems that developers have to face day by day. The proposed approach is simple and quite good.

A sampling method based on URL clustering for fast accessibility evaluation

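Objective: Most people with disabilities encounter various barriers when using the web. To reduce these barriers, websites need to be evaluated for accessibility. Because most websites contain a massive number of pages and some checks must be performed manually, accessibility evaluation usually relies on sampling. Existing stratified sampling algorithms incur high I/O and computation costs. To address this problem, this paper proposes a sampling algorithm based on URL clustering: pages are clustered using only their URL information and then sampled, enabling fast accessibility evaluation.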
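Innovation: In most websites, page content and URLs are generated from a limited number of templates, so the accessibility problems of these websites can be traced back to the templates. Since pages generated by the same template share a similar structure and URL pattern, pages can be clustered by URL similarity so that the URLs of one template fall into the same cluster. The proposed sampling algorithm uses only URL pattern information and does not need to store the content of every page, which reduces I/O and computation costs and enables fast accessibility evaluation.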
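Method: Exploiting the fact that pages generated by the same template have similar URL patterns, we cluster URLs so that pages from the same template end up in one cluster. Specifically, the crawled URLs are first parsed to obtain candidate URL tokens and template URL tokens; the URLs are then clustered using the minimum description length (MDL) principle (Algorithm 1); finally, samples are drawn from each cluster according to the sampling ratio.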
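Conclusion: Unlike existing stratified sampling algorithms, the proposed algorithm clusters pages using only URL pattern information, which greatly reduces I/O and computation costs.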
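
The clustering step relies on the minimum description length (MDL) principle: prefer the set of URL patterns that describes the crawled URLs in the fewest bits. The toy below illustrates that principle on a single token position, deciding whether to keep each distinct value as a constant pattern or to generalize the position to a wildcard `*`. The cost model (8 bits per literal character, log2 k bits for a URL to name one of k constant patterns, a negligible cost for the wildcard pattern itself) is an assumption made for illustration. It is not the encoding used by the paper's Algorithm 1, and the function names are hypothetical.

```python
import math
from collections import Counter

BITS_PER_CHAR = 8  # assumed cost of spelling one character literally


def split_cost(values):
    """Bits needed if every distinct value becomes its own constant pattern."""
    distinct = Counter(values)
    # Model: each constant value is stored once.
    model_bits = sum(len(v) * BITS_PER_CHAR for v in distinct)
    # Data: each URL names which of the k constant patterns it follows.
    data_bits = len(values) * math.log2(len(distinct)) if len(distinct) > 1 else 0.0
    return model_bits + data_bits


def wildcard_cost(values):
    """Bits needed if the position is generalized to a single '*' pattern."""
    # The pattern is (nearly) free; each URL spells out its own value.
    return float(sum(len(v) * BITS_PER_CHAR for v in values))


def should_generalize(values):
    """MDL choice: take whichever encoding has the shorter description length."""
    return wildcard_cost(values) < split_cost(values)


categories = ["news"] * 500 + ["sports"] * 500  # few values, heavily repeated
ids = [str(100000 + i) for i in range(1000)]  # one distinct ID per page
```

Under this cost model, `should_generalize(categories)` is `False` (1080 bits for two constants versus 40000 bits for a wildcard), so `news` and `sports` stay as separate patterns, while `should_generalize(ids)` is `True` because storing 1000 distinct IDs as constants costs more than letting each URL carry its own value.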

Key words: Web page sampling; URL clustering; Accessibility evaluation

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE