CLC number: TP391.4
On-line Access: 2021-07-20
Received: 2020-03-27
Revision Accepted: 2020-06-07
Crosschecked: 2021-07-05
Citations: BibTeX, RefMan, EndNote, GB/T 7714
Sihan Zhu, Jian Pu. A self-supervised method for treatment recommendation in sepsis[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(7): 926-939.
@article{Zhu2021,
title="A self-supervised method for treatment recommendation in sepsis",
author="Sihan Zhu and Jian Pu",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="22",
number="7",
pages="926--939",
year="2021",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2000127"
}
%0 Journal Article
%T A self-supervised method for treatment recommendation in sepsis
%A Sihan Zhu
%A Jian Pu
%J Frontiers of Information Technology & Electronic Engineering
%V 22
%N 7
%P 926-939
%@ 2095-9184
%D 2021
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2000127
TY  - JOUR
T1  - A self-supervised method for treatment recommendation in sepsis
A1  - Sihan Zhu
A1  - Jian Pu
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 22
IS  - 7
SP  - 926
EP  - 939
SN  - 2095-9184
Y1  - 2021
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2000127
ER  - 
Abstract: Sepsis treatment is a highly challenging effort to reduce mortality in hospital intensive care units, since the treatment response may vary for each patient. Tailored treatment recommendations are desired to assist doctors in making decisions efficiently and accurately. In this work, we apply a self-supervised method based on reinforcement learning (RL) to treatment recommendation for individual patients. An uncertainty evaluation method is proposed to separate patient samples into two domains according to their responses to treatments and the state value of the chosen policy. Examples from the two domains are then reconstructed with an auxiliary transfer learning task. A distillation method from privileged learning is tied to a variational auto-encoder framework for the transfer learning task between the low- and high-quality domains. Combining this with self-supervised learning for better state and action representations, we propose a deep RL method called high-risk uncertainty (HRU) control to provide flexibility in the trade-off between the effectiveness and accuracy of ambiguous samples and to reduce the expected mortality. Experiments on the large-scale, publicly available, real-world dataset MIMIC-III demonstrate that our model reduces the estimated mortality rate by up to 2.3% in total, and that the estimated mortality rate in the majority of cases is reduced to 9.5%.
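The abstract does not spell out the distillation objective, so the following is only a minimal sketch of the generic idea it builds on: generalized distillation in the sense of Lopez-Paz et al. (2016), in which a student is trained against a weighted mix of hard labels and temperature-softened teacher outputs. It is not the HRU loss from the paper, and it omits the variational auto-encoder part entirely. The function names (softmax, cross_entropy, distillation_loss) and the parameters temperature and imitation are hypothetical choices made for this illustration.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature, computed stably."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max to avoid overflow
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(targets, probs, eps=1e-12):
    """Mean cross-entropy between target distributions and predictions."""
    return float(-np.mean(np.sum(targets * np.log(probs + eps), axis=1)))


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, imitation=0.5):
    """Generic generalized-distillation loss (assumed form, not the paper's).

    loss = (1 - imitation) * CE(hard labels, student)
           + imitation     * CE(softened teacher, student)
    """
    student_logits = np.asarray(student_logits, dtype=float)
    n_classes = student_logits.shape[1]
    hard = np.eye(n_classes)[np.asarray(labels)]   # one-hot hard labels
    soft = softmax(teacher_logits, temperature)    # softened teacher targets
    student = softmax(student_logits)              # student predictions
    return ((1.0 - imitation) * cross_entropy(hard, student)
            + imitation * cross_entropy(soft, student))


# Toy usage: two samples, three discrete actions (made-up numbers).
student = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
teacher = np.array([[3.0, 0.0, -2.0], [0.0, 0.0, 1.0]])
actions = np.array([0, 2])
loss = distillation_loss(student, teacher, actions)
```

Setting imitation to 0 recovers ordinary supervised training on the hard labels, while values near 1 make the student follow the teacher almost exclusively; a higher temperature flattens the teacher distribution so that more of its relative preferences among non-chosen actions are passed on.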