Frontiers of Information Technology & Electronic Engineering  2015 Vol.16 No.6 P.449-456


A sampling method based on URL clustering for fast web accessibility evaluation

Author(s):  Meng-ni Zhang, Can Wang, Jia-jun Bu, Zhi Yu, Yu Zhou, Chun Chen

Affiliation(s):  College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   mengnier@zju.edu.cn, wcan@zju.edu.cn

Key Words:  Page sampling, URL clustering, Web accessibility evaluation

Meng-ni Zhang, Can Wang, Jia-jun Bu, Zhi Yu, Yu Zhou, Chun Chen. A sampling method based on URL clustering for fast web accessibility evaluation[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(6): 449-456.

When evaluating the accessibility of a large website, we rely on sampling methods to reduce the cost of evaluation. Sampling may lead to a biased evaluation when the distribution of checkpoint violations across a website is skewed and the selected samples do not represent the entire website well. To improve sampling quality, stratified sampling methods first cluster the web pages of a site and then draw samples from each cluster. In existing stratified sampling methods, however, all the pages in a website must be analyzed for clustering, which incurs substantial I/O and computation costs. To address this issue, we propose URLSamp, a novel page sampling method based on URL clustering for web accessibility evaluation. Because it uses only URL information for stratified page sampling, URLSamp scales efficiently to large websites. Meanwhile, by exploiting similarities in URL patterns, URLSamp clusters pages by their generating scripts and can thus effectively detect accessibility problems that originate in web page templates. We validate our method on a dataset of 45 websites. Experimental results show that URLSamp is both effective and efficient for web accessibility evaluation.
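To make the general idea of URL-based stratified sampling concrete, here is a minimal sketch. It is not the authors' URLSamp algorithm. It assumes a simple hypothetical heuristic: URLs that differ only in their digits or query-parameter values are treated as coming from the same generating script. It also assumes proportional allocation with at least one page per cluster. The function names `url_pattern`, `cluster_urls` and `stratified_sample` are illustrative only.

```python
import random
import re
from collections import defaultdict
from urllib.parse import parse_qsl, urlparse


def url_pattern(url: str) -> str:
    """Map a URL to a coarse pattern (hypothetical heuristic, not URLSamp's).

    Runs of digits in path segments become '<n>', and only the sorted
    query-parameter names are kept, so pages likely produced by the same
    script or template share one pattern.
    """
    parts = urlparse(url)
    segments = [re.sub(r"\d+", "<n>", seg) for seg in parts.path.split("/") if seg]
    pattern = parts.netloc + "/" + "/".join(segments)
    keys = sorted(k for k, _ in parse_qsl(parts.query, keep_blank_values=True))
    if keys:
        pattern += "?" + "&".join(keys)
    return pattern


def cluster_urls(urls):
    """Group URLs into strata keyed by their pattern, using URL strings only."""
    clusters = defaultdict(list)
    for u in urls:
        clusters[url_pattern(u)].append(u)
    return dict(clusters)


def stratified_sample(urls, total, seed=0):
    """Draw roughly `total` pages with proportional allocation per cluster.

    Every cluster contributes at least one page so that rare templates are
    still evaluated; because of rounding and this minimum, the sample size
    can differ slightly from `total`.
    """
    rng = random.Random(seed)
    clusters = cluster_urls(urls)
    n = len(urls)
    sample = []
    for _, members in sorted(clusters.items()):  # sorted for reproducibility
        k = min(len(members), max(1, round(total * len(members) / n)))
        sample.extend(rng.sample(members, k))  # sample without replacement
    return sample
```

Only URL strings are inspected here, so no page content has to be fetched or parsed before sampling. That is the property the abstract credits for scalability. A real system would need a more robust pattern-induction step than this digit-masking rule.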

The paper is very interesting, and the theme is relevant. The authors address problems that developers have to face day by day. The proposed approach is simple and quite good.




