CLC number: TP391.4

On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2023-10-29

 ORCID:

Wei CHEN

https://orcid.org/0000-0002-8365-4741

Haiyang ZHU

https://orcid.org/0000-0002-4782-5654

Frontiers of Information Technology & Electronic Engineering 

Accepted manuscript available online (unedited version)


A visual analysis approach for data imputation via multi-party tabular data correlation strategies


Author(s):  Haiyang ZHU, Dongming HAN, Jiacheng PAN, Yating WEI, Yingchaojie FENG, Luoxuan WENG, Ketian MAO, Yuankai XING, Jianshu LV, Qiucheng WAN, Wei CHEN

Affiliation(s):  The State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China

Corresponding email(s):  hnsyzhy@zju.edu.cn, chenvis@zju.edu.cn

Key Words:  Data governance; Data incompleteness; Data imputation; Data visualization; Interactive visual analysis



Abstract: 
Data imputation is an essential pre-processing task in data governance that fills in the missing values of incomplete data. However, conventional imputation methods draw only on isolated tabular data, so they alleviate incompleteness only in part and fail to strike a good balance between accuracy and efficiency. In this paper, we present a novel visual analysis approach for data imputation. We develop a multi-party tabular data association strategy that uses intelligent algorithms to identify similar columns and establish column correlations across multiple tables. We then perform an initial imputation of the incomplete data using correlated data entries from other tables. In addition, we develop a visual analysis system for refining the imputation candidates. The interactive system combines the multi-party imputation approach with expert knowledge, giving users a better understanding of the relational structure of the data. This improves the accuracy and efficiency of data imputation, and in turn the quality of data governance and the intrinsic value of data assets. Experimental validation and user surveys demonstrate that the method supports users in verifying and judging the associated columns and similar rows using their domain knowledge.
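
The two automatic steps named in the abstract (finding similar columns across tables, then filling gaps from correlated entries) can be illustrated with a minimal sketch. The abstract does not specify which algorithms the authors use, so everything here is an assumption made for illustration: Jaccard similarity over value sets stands in for the column-matching algorithm, exact key matching stands in for row correlation, and the names `jaccard`, `match_columns`, `impute`, and the sample tables are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A table is a mapping from column name to cell values; None marks a missing cell.
Table = Dict[str, List[Optional[str]]]


def jaccard(a: List[Optional[str]], b: List[Optional[str]]) -> float:
    """Jaccard similarity of the non-missing value sets of two columns."""
    sa = {v for v in a if v is not None}
    sb = {v for v in b if v is not None}
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def match_columns(left: Table, right: Table, threshold: float = 0.5) -> Dict[str, str]:
    """Map each column of `left` to its most similar column of `right`.

    Assumption: value overlap is a usable proxy for "similar columns";
    pairs scoring below `threshold` are left unmatched.
    """
    matches: Dict[str, str] = {}
    for lname, lvals in left.items():
        best_name, best_score = None, threshold
        for rname, rvals in right.items():
            score = jaccard(lvals, rvals)
            if score >= best_score:
                best_name, best_score = rname, score
        if best_name is not None:
            matches[lname] = best_name
    return matches


def impute(
    left: Table, right: Table, matches: Dict[str, str], key: str
) -> Tuple[Table, List[Tuple[int, str, str]]]:
    """Fill missing cells of `left` from rows of `right` that share the same key.

    Returns the filled copy plus a list of (row, column, value) candidates,
    which a reviewer could accept or reject afterwards.
    """
    # Index the other table by its key column (first occurrence wins).
    index: Dict[str, int] = {}
    for i, k in enumerate(right[matches[key]]):
        if k is not None:
            index.setdefault(k, i)

    filled: Table = {name: list(vals) for name, vals in left.items()}
    candidates: List[Tuple[int, str, str]] = []
    for lname, rname in matches.items():
        if lname == key:
            continue
        for row, value in enumerate(left[lname]):
            k = left[key][row]
            if value is None and k in index:
                candidate = right[rname][index[k]]
                if candidate is not None:
                    filled[lname][row] = candidate
                    candidates.append((row, lname, candidate))
    return filled, candidates


# Hypothetical two-party example: the same entities under different column names.
left = {"id": ["a1", "a2", "a3"], "city": ["Hangzhou", None, "Beijing"]}
right = {
    "uid": ["a1", "a2", "a3", "a4"],
    "town": ["Hangzhou", "Shanghai", "Beijing", "Shenzhen"],
}

matches = match_columns(left, right)
filled, candidates = impute(left, right, matches, key="id")
```

Returning the candidate list separately from the filled table keeps the automatic step reviewable, which loosely mirrors the role the abstract assigns to expert refinement in the visual analysis system.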

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE