CLC number: TP391
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-12-12
Ziliang WU, Wei CHEN, Yuxin MA, Tong XU, Fan YAN, Lei LV, Zhonghao QIAN, Jiazhi XIA. Explainable data transformation recommendation for automatic visualization[J]. Frontiers of Information Technology & Electronic Engineering, 2023, 24(7): 1007-1027.
@article{title="Explainable data transformation recommendation for automatic visualization",
author="Ziliang WU and Wei CHEN and Yuxin MA and Tong XU and Fan YAN and Lei LV and Zhonghao QIAN and Jiazhi XIA",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="24",
number="7",
pages="1007-1027",
year="2023",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2200409"
}
%0 Journal Article
%T Explainable data transformation recommendation for automatic visualization
%A Ziliang WU
%A Wei CHEN
%A Yuxin MA
%A Tong XU
%A Fan YAN
%A Lei LV
%A Zhonghao QIAN
%A Jiazhi XIA
%J Frontiers of Information Technology & Electronic Engineering
%V 24
%N 7
%P 1007-1027
%@ 2095-9184
%D 2023
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2200409
TY - JOUR
T1 - Explainable data transformation recommendation for automatic visualization
A1 - Ziliang WU
A1 - Wei CHEN
A1 - Yuxin MA
A1 - Tong XU
A1 - Fan YAN
A1 - Lei LV
A1 - Zhonghao QIAN
A1 - Jiazhi XIA
JO - Frontiers of Information Technology & Electronic Engineering
VL - 24
IS - 7
SP - 1007
EP - 1027
SN - 2095-9184
Y1 - 2023
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2200409
ER -
Abstract: Automatic visualization generates meaningful visualizations to support data analysis and pattern finding for novice or casual users who are not familiar with visualization design. Current automatic visualization approaches rely mainly on aggregation and filtering to extract patterns from the original data. However, these limited data transformations fail to capture complex patterns such as clusters and correlations. Although recent advances in feature engineering make more kinds of automatic data transformation possible, the auto-generated transformations lack explainability: it is unclear how the resulting patterns relate to the original features. To tackle these challenges, we propose a novel explainable recommendation approach for an extended set of data transformations in automatic visualization. We summarize the space of feasible data transformations through a literature review, and derive measures of the explainability of transformation operations from a pilot study. A recommendation algorithm is designed to compute optimal transformations that reveal specified types of patterns while maintaining explainability. We demonstrate the effectiveness of our approach through two cases and a user study.