CLC number: TP311
On-line Access: 2020-04-21
Received: 2019-10-06
Revision Accepted: 2020-01-17
Crosschecked: 2020-01-30
Mohammad Chegini, Jürgen Bernard, Jian Cui, Fatemeh Chegini, Alexei Sourin, Keith Andrews, Tobias Schreck. Interactive visual labelling versus active learning: an experimental comparison[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(4): 524-535.
@article{title="Interactive visual labelling versus active learning: an experimental comparison",
author="Mohammad Chegini and Jürgen Bernard and Jian Cui and Fatemeh Chegini and Alexei Sourin and Keith Andrews and Tobias Schreck",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="21",
number="4",
pages="524-535",
year="2020",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1900549"
}
%0 Journal Article
%T Interactive visual labelling versus active learning: an experimental comparison
%A Mohammad Chegini
%A Jürgen Bernard
%A Jian Cui
%A Fatemeh Chegini
%A Alexei Sourin
%A Keith Andrews
%A Tobias Schreck
%J Frontiers of Information Technology & Electronic Engineering
%V 21
%N 4
%P 524-535
%@ 2095-9184
%D 2020
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1900549
TY - JOUR
T1 - Interactive visual labelling versus active learning: an experimental comparison
A1 - Mohammad Chegini
A1 - Jürgen Bernard
A1 - Jian Cui
A1 - Fatemeh Chegini
A1 - Alexei Sourin
A1 - Keith Andrews
A1 - Tobias Schreck
JO - Frontiers of Information Technology & Electronic Engineering
VL - 21
IS - 4
SP - 524
EP - 535
SN - 2095-9184
Y1 - 2020
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1900549
ER -
Abstract: Methods from supervised machine learning allow new data to be classified automatically and are tremendously helpful for data analysis. The quality of supervised machine learning depends not only on the type of algorithm used, but also on the quality of the labelled dataset used to train the classifier. Labelling instances in a training dataset is often done manually, relying on selections and annotations by expert analysts, and is often a tedious and time-consuming process. Active learning algorithms can automatically determine a subset of data instances for which labels would provide useful input to the learning process. Interactive visual labelling techniques are a promising alternative, providing effective visual overviews from which an analyst can simultaneously explore data records and select items to label. By putting the analyst in the loop, higher accuracy can be achieved in the resulting classifier. While initial results of interactive visual labelling techniques are promising in the sense that user labelling can improve supervised learning, many aspects of these techniques are still largely unexplored. This paper presents a study conducted using the mVis tool to compare three interactive visualisations (similarity map, scatterplot matrix (SPLOM), and parallel coordinates) with each other and with active learning for the purpose of labelling a multivariate dataset. The results show that all three interactive visual labelling techniques surpass active learning algorithms in terms of classifier accuracy, and that users subjectively prefer the similarity map over SPLOM and parallel coordinates for labelling. Users also employ different labelling strategies depending on the visualisation used.
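As an illustration of how an active learning algorithm might determine which instances to label, the following minimal Python sketch implements smallest-margin uncertainty sampling, one common selection rule. The function name, the toy probabilities, and the choice of rule are assumptions made for illustration; the abstract does not state which active learning strategies were evaluated in the study.

```python
from typing import Dict, List, Sequence


def smallest_margin_query(
    probabilities: Dict[str, Sequence[float]], batch_size: int = 1
) -> List[str]:
    """Return the ids of the unlabelled instances with the smallest margin.

    `probabilities` maps an instance id to the class probabilities that an
    already-trained classifier assigns to it (hypothetical input; each
    instance needs at least two class probabilities).
    """

    def margin(probs: Sequence[float]) -> float:
        # Difference between the two most likely classes: a small margin
        # means the classifier is unsure which of the two applies.
        top_two = sorted(probs, reverse=True)[:2]
        return top_two[0] - top_two[1]

    # Rank instance ids from most ambiguous to least ambiguous.
    ranked = sorted(probabilities, key=lambda i: margin(probabilities[i]))
    return ranked[:batch_size]


# Toy pool: three unlabelled instances, three classes (made-up values).
pool = {
    "a": [0.90, 0.05, 0.05],  # confident prediction, low labelling priority
    "b": [0.40, 0.35, 0.25],  # top two classes nearly tied
    "c": [0.60, 0.30, 0.10],
}
print(smallest_margin_query(pool, batch_size=2))  # ['b', 'c']
```

In an interactive visual labelling setting, the analyst makes this choice instead, guided by the visual overview rather than by a scoring function.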