CLC number: R73

Received: 2008-05-15

Revision Accepted: 2008-07-30

Journal of Zhejiang University SCIENCE B 2008 Vol.9 No.11 P.863~870


A data-mining approach to biomarker identification from protein profiles using discrete stationary wavelet transform

Author(s):  Hussain MONTAZERY-KORDY, Mohammad Hossein MIRAN-BAYGI, Mohammad Hassan MORADI

Affiliation(s):  Department of Electrical and Computer Engineering, Tarbiat Modares University, P.O. Box 14115-111, Tehran, Iran

Corresponding email(s):   Miranbmh@modares.ac.ir

Key Words:  Proteomics, Discrete stationary wavelet transform, Data mining, Feature selection, Biomarker, Cancer classification

Hussain MONTAZERY-KORDY, Mohammad Hossein MIRAN-BAYGI, Mohammad Hassan MORADI. A data-mining approach to biomarker identification from protein profiles using discrete stationary wavelet transform[J]. Journal of Zhejiang University Science B, 2008, 9(11): 863~870.

@article{title="A data-mining approach to biomarker identification from protein profiles using discrete stationary wavelet transform",
author="Hussain MONTAZERY-KORDY and Mohammad Hossein MIRAN-BAYGI and Mohammad Hassan MORADI",
journal="Journal of Zhejiang University Science B",
volume="9",
number="11",
pages="863-870",
year="2008",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.B0820163"
}

%0 Journal Article
%T A data-mining approach to biomarker identification from protein profiles using discrete stationary wavelet transform
%A Hussain MONTAZERY-KORDY
%A Mohammad Hossein MIRAN-BAYGI
%A Mohammad Hassan MORADI
%J Journal of Zhejiang University SCIENCE B
%V 9
%N 11
%P 863~870
%@ 1673-1581
%D 2008
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.B0820163

TY - JOUR
T1 - A data-mining approach to biomarker identification from protein profiles using discrete stationary wavelet transform
A1 - Hussain MONTAZERY-KORDY
A1 - Mohammad Hossein MIRAN-BAYGI
A1 - Mohammad Hassan MORADI
JO - Journal of Zhejiang University Science B
VL - 9
IS - 11
SP - 863
EP - 870
SN - 1673-1581
Y1 - 2008
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.B0820163
ER -

Objective: To develop a bioinformatic tool, based on a data-mining approach, that extracts the most informative proteins from protein profiles as candidate biomarkers for cancer detection.

Methods: Two independent datasets from serum samples of 253 ovarian cancer and 167 breast cancer patients were used. The samples were analyzed by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). Informative proteins were extracted with a data-mining method in the discrete stationary wavelet transform domain. Hard thresholding was applied as a dimensionality reduction step to reduce the number of wavelet coefficients, and a distance measure was then used to select the most discriminative coefficients. To find the potential biomarkers from the selected wavelet coefficients, the inverse discrete stationary wavelet transform was combined with a two-sided t-test.

Results: In the ovarian cancer dataset, a set of five proteins was detected as potential biomarkers that distinguished cancer patients from healthy cases with accuracy, sensitivity, and specificity of 100%. In the breast cancer dataset, a set of eight proteins was found that separated healthy cases from cancer patients with an accuracy of 98.26%, a sensitivity of 100%, and a specificity of 95.6%.

Conclusion: The new bioinformatic tool can be used in combination with high-throughput proteomic data such as SELDI-TOF MS to find potential biomarkers with high discriminative power.
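The abstract gives no implementation details, so the following is only a minimal sketch of the kind of steps it names, not the authors' method. It assumes a one-level undecimated Haar transform with periodic boundary handling and a Fisher-type ratio as the distance measure. The abstract specifies neither the wavelet, the decomposition depth, nor the distance measure. All function names and the toy spectra are hypothetical, and the two-sided t-test is omitted.

```python
from statistics import mean, variance


def haar_swt1(x):
    """One-level undecimated (stationary) Haar transform, periodic boundary.

    Returns (approximation, detail); both have the same length as x,
    because no downsampling is performed.
    """
    n = len(x)
    approx = [(x[i] + x[(i + 1) % n]) / 2.0 for i in range(n)]
    detail = [(x[i] - x[(i + 1) % n]) / 2.0 for i in range(n)]
    return approx, detail


def haar_iswt1(approx, detail):
    """Invert haar_swt1 by averaging the two redundant estimates of x[i]."""
    n = len(approx)
    return [0.5 * ((approx[i] + detail[i]) + (approx[i - 1] - detail[i - 1]))
            for i in range(n)]


def hard_threshold(coeffs, thr):
    """Zero every coefficient with magnitude <= thr; keep the rest unchanged."""
    return [c if abs(c) > thr else 0.0 for c in coeffs]


def fisher_score(group_a, group_b):
    """Per-coefficient separation: squared mean gap over summed variances.

    Assumed stand-in for the unspecified distance measure. Each group is a
    list of samples, each sample a list of coefficients of equal length.
    """
    scores = []
    for col_a, col_b in zip(zip(*group_a), zip(*group_b)):
        gap = (mean(col_a) - mean(col_b)) ** 2
        spread = variance(col_a) + variance(col_b) + 1e-12  # avoid 0/0
        scores.append(gap / spread)
    return scores


def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity); 1 = cancer, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return (tp + tn) / len(pairs), tp / (tp + fn), tn / (tn + fp)


# Toy spectra (hypothetical): the "cancer" group has an extra peak at index 3.
healthy = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
           [1.2, 1.0, 0.8, 1.0, 1.0, 1.1, 0.9, 1.0],
           [0.9, 1.1, 1.0, 1.0, 1.0, 0.9, 1.0, 1.1]]
cancer = [[1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0],
          [1.1, 0.9, 1.0, 3.2, 1.0, 1.0, 1.1, 0.9],
          [1.0, 1.0, 0.9, 2.9, 1.1, 1.0, 1.0, 1.0]]

# Detail coefficients per spectrum, pruned by hard thresholding.
healthy_d = [hard_threshold(haar_swt1(s)[1], 0.2) for s in healthy]
cancer_d = [hard_threshold(haar_swt1(s)[1], 0.2) for s in cancer]

# Rank coefficient positions by how well they separate the two groups.
scores = fisher_score(healthy_d, cancer_d)
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print("coefficient positions, most discriminative first:", ranked)
```

In a fuller pipeline, the top-ranked coefficients would be mapped back to the mass axis with the inverse transform (`haar_iswt1` here) and the corresponding intensities compared between groups with a two-sided t-test. `classification_metrics` shows how the reported accuracy, sensitivity, and specificity are defined from true and predicted labels.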

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE