Machine learning algorithms that predict the risk of prostate cancer based on metabolic syndrome and sociodemographic characteristics: a prospective cohort study

Abstract

Background

Given the rapid increase in the prevalence of prostate cancer (PCa), identifying its risk factors and developing suitable risk prediction models has important implications for public health. We used a machine learning (ML) approach to screen for participants at high risk of PCa and, specifically, investigated whether participants with metabolic syndrome (MetS) exhibited an elevated PCa risk.

Methods

A prospective cohort study was performed with 41,837 participants in South Korea. We predicted PCa based on MetS, its components, and sociodemographic factors using Cox proportional hazards and five ML models. Integrated Brier score (IBS) and C-index were used to assess model performance.

Results

A total of 210 incident PCa cases were identified. We found good calibration and discrimination for all models (C-index ≥ 0.800 and IBS = 0.01). Importantly, performance increased after excluding MetS and its components from the models; the highest C-index was 0.862 for survival support vector machine. In contrast, first-degree family history of PCa, alcohol consumption, age, and income were valuable for PCa prediction.

Conclusion

ML models are an effective approach to develop prediction models for survival analysis. Furthermore, MetS and its components do not seem to influence PCa susceptibility, in contrast to first-degree family history of PCa, age, alcohol consumption, and income.


Introduction

According to GLOBOCAN, a database that estimates cancer incidence and mortality in 185 countries, prostate cancer (PCa) is the second most frequent cancer and the fifth most common cause of cancer-related death among men [1]. PCa incidence differs among countries due to differences in economic development, Western lifestyle, and life expectancy [2]. The incidence rate of PCa is 3.0 times higher in countries with higher Human Development Indexes (HDIs) compared to those with lower HDIs (37.5/100,000 vs. 11.3/100,000) [1]. In Asia, PCa is one of the three most common cancers in 20 countries [2]. Although PCa incidence has shown varying trends since 1999, it has increased more rapidly than other major cancers among Korean men [3, 4]. Thus, identifying important risk factors for PCa and developing suitable risk prediction models has key implications for public health.

Many factors have driven the rise in PCa incidence [5]. Although evidence has robustly indicated that age, family history, and race are risk factors for PCa, other factors may influence the development of PCa [6]. Notably, the HDI is used to assess national development in terms of three indicators: health, education, and income. The positive association between the HDI and PCa is likely related to age and Western lifestyle [2]. The Western diet is typically high in sugary beverages, red meat, and processed meat and low in vegetables and fruits; this diet has been implicated as an etiologic factor for metabolic syndrome (MetS) [7]. Due to the spread of the Western diet worldwide, MetS has been recognized as a global epidemic [8].

MetS represents a cluster of risk factors for noncommunicable diseases, including dysglycemia, abdominal obesity, low levels of high-density lipoprotein (HDL), high blood pressure, and high triglyceride levels [9]. Accumulating evidence has demonstrated that MetS and its components are linked to cancer development. For example, hypertension may block and subsequently modify apoptosis and affect cell turnover, whereas diabetes is hypothesized to increase the risk of cancer through hyperinsulinemia and increased bioavailability of insulin-like growth factor-1 [10]. Notably, Asian migrants exhibited a high risk of PCa, which suggests that Westernization has a detrimental effect [11, 12]. Thus, MetS and its components are thought to increase PCa risk [13]. However, the roles of MetS and its components in the etiology of PCa remain ambiguous due to inconsistent results worldwide [11, 14, 15]. Indeed, contradictory results have been reported within Korea [14, 16]. Thus, the link between MetS and PCa remains unclear.

Machine learning (ML) has been widely used to predict risks in medical fields. Notably, the application of ML to time-to-event data has received increased attention because it is expected to overcome limitations of the Cox proportional hazards (CPH) model, such as the proportional hazards assumption and the assumption of linear associations [17]. Survival analysis methods can be classified into statistical methods and ML-based methods. Both have a similar goal: to make predictions and estimate survival probability. However, statistical methods focus on the distribution of the event of interest and the statistical properties of parameters, whereas ML-based methods combine traditional methods with various ML algorithms. Thus, ML is expected to be effective for survival analysis [18]. Given this background, we proposed that the ML approach may be a feasible and promising method for predicting PCa risk in a prospective cohort study.

To the best of our knowledge, ML survival models have not previously been used to predict PCa. In addition, PCa incidence is high in Korea; thus, it is important to identify individuals at greater risk. Our study aimed to use the ML approach to screen participants with a high risk of PCa and, specifically, investigate whether an elevated risk is observed in participants with MetS.

Materials and methods

Study population

The establishment of and recruitment for the Cancer Screenee Cohort Study have been described previously [19]. Briefly, this cohort was designed by the National Cancer Center (NCC) to determine potential etiologic factors for cancer development in Korea. From August 2002 to December 2014, 41,837 participants aged 16 years and older who underwent health examinations at the Center for Cancer Prevention and Detection at the NCC were asked to complete a baseline questionnaire. We followed participants until December 2019 to identify incident cancer cases. Of the recruited participants, 41,121 were linked to the Korea Central Cancer Registry. We excluded 1,754 participants for having an incomplete questionnaire and 2,100 participants due to a previous diagnosis of any cancer. Additionally, of the remaining participants, a total of 18,110 female participants, 5,780 participants who lacked information on individual characteristics, and 6 participants aged < 20 years were excluded. In total, 13,371 participants were included in the final analysis (Fig. 1).

Fig. 1

Flow chart of the study participants. Of the 41,837 participants recruited, 41,121 were linked to the Korea Central Cancer Registry. We excluded 1,754 participants for having an incomplete questionnaire, 2,100 participants due to a previous diagnosis of any cancer, 18,110 female participants, 5,780 participants who lacked information on individual characteristics, and 6 participants aged < 20 years. In total, 13,371 participants were included in the final analysis
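The participant counts in the flow chart can be checked with simple arithmetic. The following is a minimal illustrative sketch, not part of the original analysis. It assumes that the exclusions were applied in sequence to the 41,121 registry-linked participants, which is how the reported numbers reconcile; the function and variable names are hypothetical.

```python
# Illustrative check of the participant flow reported in Fig. 1.
# Assumption: exclusions are applied in sequence to the registry-linked cohort.
linked_to_registry = 41_121

exclusions = [
    ("incomplete questionnaire", 1_754),
    ("previous diagnosis of any cancer", 2_100),
    ("female participants", 18_110),
    ("missing individual characteristics", 5_780),
    ("aged < 20 years", 6),
]


def apply_exclusions(start, steps):
    """Return (reason, remaining count) after each exclusion step."""
    remaining = start
    trail = []
    for reason, n_excluded in steps:
        remaining -= n_excluded
        trail.append((reason, remaining))
    return trail


flow = apply_exclusions(linked_to_registry, exclusions)
final_n = flow[-1][1]  # size of the analytic sample

for reason, remaining in flow:
    print(f"after excluding {reason}: {remaining:,}")
```

The last value in the trail matches the 13,371 participants reported for the final analysis.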

Outcome and risk factor measurement

We ascertained potential incident PCa cases through linkage with the 2019 Korea National Cancer Incidence Database of the Korea Central Cancer Registry. The International Classification of Diseases, 10th Revision (C61) was used to identify PCa cases.

We used MetS criteria that are widely used in Korea and documented in a previous study [20]. In detail, the Adult Treatment Panel III (ATP III) of the National Cholesterol Education Program (NCEP) and specific values of waist circumference from the World Health Organization and the Korean Society for the Study of Obesity are used as criteria for MetS diagnosis. MetS is present if participants meet ≥ 3 of the following criteria:

  1. Waist circumference: males ≥ 90 cm.

  2. Blood pressure: ≥ 130/85 mmHg.

  3. Triglycerides: ≥ 150 mg/dL.

  4. High-density lipoprotein cholesterol (HDL cholesterol): males < 40 mg/dL.

  5. Fasting glucose: ≥ 100 mg/dL or a history of diabetes.
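The criteria above can be expressed as a small classification rule. The sketch below is illustrative only and is not the code used in the study. The class and function names are hypothetical, and it assumes that the blood pressure criterion is met when systolic pressure is ≥ 130 mmHg or diastolic pressure is ≥ 85 mmHg, which is the usual reading of the ATP III threshold but is not spelled out in the text.

```python
from dataclasses import dataclass


@dataclass
class MaleParticipant:
    """Baseline measurements for one male participant (hypothetical structure)."""
    waist_cm: float
    systolic_mmhg: float
    diastolic_mmhg: float
    triglycerides_mg_dl: float
    hdl_mg_dl: float
    fasting_glucose_mg_dl: float
    diabetes_history: bool = False


def mets_components(p):
    """Return which of the five MetS criteria the participant meets."""
    return {
        "abdominal_obesity": p.waist_cm >= 90,
        # Assumption: either elevated systolic or elevated diastolic pressure counts.
        "high_blood_pressure": p.systolic_mmhg >= 130 or p.diastolic_mmhg >= 85,
        "high_triglycerides": p.triglycerides_mg_dl >= 150,
        "low_hdl": p.hdl_mg_dl < 40,
        "high_fasting_glucose": p.fasting_glucose_mg_dl >= 100 or p.diabetes_history,
    }


def has_mets(p):
    """MetS is present when at least 3 of the 5 criteria are met."""
    return sum(mets_components(p).values()) >= 3


# Made-up example values, not study data.
with_mets = MaleParticipant(92, 128, 86, 160, 45, 95)
without_mets = MaleParticipant(85, 120, 80, 140, 50, 90)
print(has_mets(with_mets), has_mets(without_mets))
```

Keeping the five components as separate flags mirrors the analysis, in which MetS and each of its components were examined as predictors.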

Blood-related components of MetS were determined based on blood samples taken from participants after they had fasted for 8 h. We measured height (m) and weight (kg) with an automatic height and weight measuring device (DS-102, Dong Shin Jenix Co., Ltd., Seoul, Korea) or InBody 3.0 (Biospace, Seoul, Korea). Waist circumference was measured with a tape measure 1 cm above the umbilicus with minimal respiration. We used a chemistry analyser (TBA-200FR, Toshiba, Tokyo, Japan) to measure triglyceride, HDL cholesterol, and fasting glucose levels. Blood pressure was measured by trained personnel with an automatic blood pressure monitor (FT-200 S, Jawon Medical, Kyungsan, Korea) after the participants had rested for 15 min [21]. Furthermore, we collected information on demographic characteristics and health-related behaviors using a self-administered questionnaire at baseline.

Statistical analysis

We calculated person-years from baseline to the date of cancer diagnosis, death, or end of follow-up (December 31, 2019), whichever came first. The comparison of baseline characteristics between the incident PCa cases and nonincident PCa cases was conducted with chi-square tests and t tests.
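The person-years rule described above (follow-up ends at the earliest of cancer diagnosis, death, or December 31, 2019) can be sketched as follows. This is an illustrative example rather than the study's code; the function name is hypothetical, and converting days to years by dividing by 365.25 is an assumption.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2019, 12, 31)


def person_years(baseline, diagnosis=None, death=None, end=END_OF_FOLLOW_UP):
    """Return (follow-up time in years, event indicator) for one participant.

    Follow-up stops at the earliest of diagnosis, death, or end of follow-up.
    The event indicator is True only when diagnosis is that earliest date.
    """
    candidates = [d for d in (diagnosis, death, end) if d is not None]
    exit_date = min(candidates)
    event = diagnosis is not None and diagnosis == exit_date
    # Assumption: 365.25 days per year to account for leap years.
    years = (exit_date - baseline).days / 365.25
    return years, event


# Made-up dates, not study data.
print(person_years(date(2010, 1, 1), diagnosis=date(2015, 1, 1)))  # incident case
print(person_years(date(2014, 12, 31)))                            # censored at end
print(person_years(date(2010, 1, 1), death=date(2012, 1, 1)))      # censored at death
```

Participants who die or reach the end of follow-up without a diagnosis are treated as right-censored, which is the data structure the survival models in this study expect.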

We used the Cox proportional hazards (CPH) model and ML models involving survival trees (ST), random survival forest (RSF), extra survival trees (EST), survival support vector machine (SSVM), and gradient boosting (GB) to predict PCa. These ML models are suited to outcome prediction in right-censored time-to-event data [18].

We developed the models in several steps [22]. First, we randomly split the data into training and testing datasets using an 80:20 ratio. Second, we searched for hyperparameters that maximized the C-index using grid search with 10-fold cross validation. Third, we used the training dataset to fit the models based on the selected input variables, the optimal hyperparameters, and default values for the other hyperparameters. Fourth, the performance of the models was evaluated using the testing dataset. Then, predictor importance was explored using the ELI5 package [23]. The robustness of the models was assessed with bootstrapping and 10-fold cross validation (Fig. 2).

Fig. 2

Machine learning analysis. The steps of model development are shown. We randomly split the data into training and testing datasets and used the Cox proportional hazards model and ML survival models including random survival forest, survival trees, survival support vector machine, extra survival trees, and gradient boosting to predict prostate cancer
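The split and hyperparameter search steps can be sketched in plain Python. This is an illustrative outline under stated assumptions, not the study's pipeline: the function names are hypothetical, the split is a simple unstratified shuffle, and `toy_score` is a stand-in for fitting a survival model on the training folds and returning its C-index on the validation fold.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Randomly split row indices 0..n-1 into training and testing sets."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (training, validation) index lists for k-fold cross validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def grid_search(param_grid, train_idx, score_fn, k=10):
    """Return the parameter set with the highest mean cross-validated score."""
    best_params, best_score = None, float("-inf")
    for params in param_grid:
        scores = [score_fn(params, tr, va) for tr, va in k_fold_indices(train_idx, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, training_idx, validation_idx):
    """Hypothetical placeholder for 'fit on training folds, return validation C-index'."""
    return 1.0 - 0.05 * abs(params["max_depth"] - 3)


train_idx, test_idx = train_test_split_indices(13_371)
grid = [{"max_depth": depth} for depth in (1, 3, 5)]
best_params, best_score = grid_search(grid, train_idx, toy_score)
print(len(train_idx), len(test_idx), best_params)
```

The testing indices are never passed to the search, so the final evaluation on the held-out 20% is not influenced by hyperparameter selection.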

Additionally, a CPH model was used to investigate MetS and its components and sociodemographic factors in relation to PCa risk. Statistical analyses were performed using Python software (version 3.7.9) with the scikit-survival library [23] and SAS software (version 9.4, SAS Institute, Cary, NC, USA). A two-sided P-value less than 0.05 was considered statistically significant. Furthermore, multivariate imputation by chained equations (MICE) is considered a principled approach to handling missing values [24]. Thus, we applied this method to impute missing values in our data with the “MICE” package in R software and performed a sensitivity analysis to predict PCa.

Models

ML survival and CPH models were used to predict PCa. These models have been well documented for predicting outcomes in time-to-event analysis. The details are described as follows:

ST is a tree-based method for regression and classification tasks that is specifically tailored to survival analysis. It is implemented as follows: the data are partitioned based on splitting criteria, and objects with similar event outcomes are grouped into the same node. The splitting criteria in ST aim to maximize within-node homogeneity and between-node heterogeneity [18].

Random forest addresses regression and classification problems by building ensembles of decision trees and combining their results to give a final decision. RSF adapts random forest to censored time-to-event data [25]. Thus, random forest and RSF share the same implementation principles. In detail, bootstrapped data and random feature selection are used to grow the trees and split tree nodes, respectively. Then, the output is obtained by combining the results of the individual trees. Notably, while growing the trees, RSF incorporates censoring information through log-rank splitting rules [26].

Similar to RSF, the GB model is an ensemble model. While RSF averages predictions from independent trees to obtain the final prediction, the GB model follows an additive approach. The principle of this model is to construct new base learners that are maximally correlated with the negative gradient of the loss function associated with the whole ensemble [27].

Support vector machine determines a hyperplane that maximizes the separation between classes. SSVM is an extension of the support vector machine that deals with survival data. It uses an ordinal outcome to treat survival analysis as a classification task. It predicts the risk ranking among participants rather than estimating survival time [26].

EST is a slightly different version of RSF [28]. The splitting criteria of EST are more random compared to RSF. A subset of predictors is selected randomly, which is similar to RSF. However, EST exhibits a different splitting rule. In detail, thresholds are drawn randomly for each predictor, and the best of these randomly generated thresholds is picked as the splitting rule [23].

Model evaluation

The performance of the models was evaluated based on the C-index, Integrated Brier score (IBS), and time-dependent area under the curve (AUC), which are widely used in survival analysis [18, 23]. The C-index reflects overall discriminative ability, whereas the time-dependent AUC reflects discriminative ability at a time point of interest. As the C-index approaches 1, the model's performance improves [29]. We used the IBS to assess the calibration ability of the models. The IBS ranges from 0 to 1; a lower IBS indicates greater accuracy [25]. Furthermore, we used the D-index to evaluate the separation between two equal-sized high-risk and low-risk groups divided according to the risk score obtained from the different models [29].
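To make the C-index concrete, the sketch below computes Harrell's concordance index for right-censored data in plain Python. It is a simplified illustration rather than the scikit-survival implementation used in the study: pairs with tied follow-up times are ignored, tied risk scores count as half concordant, and the example values are made up.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    A pair (i, j) is comparable when participant i had the event and
    a strictly shorter follow-up time than participant j. The pair is
    concordant when the model assigns i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored participants cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in risk score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up example: follow-up in years, event flags, and predicted risk scores.
times = [2, 4, 6, 8]
events = [True, False, True, False]
risk_scores = [0.9, 0.4, 0.3, 0.5]
c_index = concordance_index(times, events, risk_scores)
print(c_index)
```

In this example there are four comparable pairs, three of which are ordered correctly by the risk scores, so the C-index is 0.75. A value of 0.5 corresponds to random ordering and 1 to perfect ranking.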

Results

Participant characteristics

Table 1 presents participant characteristics. We identified 210 incident PCa cases among 13,371 individuals over a median of 10.5 years of follow-up. Compared to participants without PCa, those with incident PCa were significantly older (58.7 ± 7.3 years vs. 49.4 ± 9.2 years, P < 0.001). Additionally, a higher proportion of individuals with incident PCa had first-degree relatives with PCa (3.8% vs. 1.2%, P < 0.001), and they were more likely to be ex-smokers (45.2% vs. 39.2%, P < 0.001). Moreover, a higher percentage of participants with PCa than of participants without PCa had high fasting glucose (33.3% vs. 23.8%, P = 0.001).

Table 1 Characteristics of the study subjects

Model performance

Table 2 shows the performance of the models. The CPH model had a slightly higher C-index than the other models (0.816 (0.815–0.818)). The RSF and GB models had a similar C-index: 0.808 (0.806–0.809) and 0.808 (0.806–0.810), respectively. We found a lower C-index for the ST and EST models of 0.795 (0.793–0.797) and 0.767 (0.765–0.769), respectively. Additionally, we found good calibration for all models with an IBS = 0.01. Regarding the D-index, the CPH and SSVM models exhibited the highest values, followed by the RSF and GB models. Furthermore, the mean AUC of the CPH and SSVM models was similar (Fig. 3).

Fig. 3

Time-dependent AUC. The time-dependent AUC is shown for six models: CPH, RSF, ST, GB, EST, and SSVM

Table 2 Performance of models

Furthermore, we used the MICE package to impute missing values and performed a sensitivity analysis. Notably, the CPH model again showed a slightly higher C-index than the remaining models (Supplementary Table 1).

MetS and PCa risk

We assessed the contribution of MetS and its components to predicting PCa by comparing the performance of models with and without MetS and its components. MetS and its individual components did not seem to contribute to predicting PCa. We found a higher performance in models that excluded MetS and its components compared to those that included MetS and its components. The highest C-index was found for the SSVM model, followed by the CPH model; the C-indexes were 0.862 (0.860–0.864) and 0.857 (0.855–0.858), respectively. All models were observed to have good calibration with an IBS value of 0.02. The SSVM model showed a higher D-index with a value of 8.65 (Table 3). Similar results were observed for the time-dependent AUC, and a higher mean AUC was observed for the SSVM model (data not shown).

Table 3 Performance of models after excluding MetS and its components from models

Table 4 presents the top 5 most important predictors for PCa among the ML models. As MetS and its components did not contribute to prediction of PCa, their importance is not shown. Age, first-degree family history of PCa, alcohol consumption, and income were the factors that contributed most to prediction of PCa. We also used the CPH model to analyze the association of MetS, its individual components, and the sociodemographic factors with PCa and found similar results. Specifically, MetS and its components were not significantly associated with PCa. Additionally, a nonsignificant association was also observed for the combination of two components of MetS (data not shown). In contrast, participants with first-degree relatives with PCa had a 4.0-fold higher risk of PCa than those without; the hazard ratio (HR) and 95% confidence interval (CI) were 4.00 (1.97–8.14) according to the multivariate analysis. Previous alcohol consumption (HR (95% CI) = 1.91 (1.18–3.10)), increased age (HR (95% CI) = 1.14 (1.12–1.16)), and higher household income (HR (95% CI) = 1.82 (1.15–2.88)) were risk factors for PCa (Table 5).
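The reported hazard ratios and confidence intervals can be read on the log scale. The sketch below is an illustrative consistency check, not part of the original analysis. It assumes that the 95% CIs are Wald-type intervals that are symmetric around the log hazard ratio, which is the usual output of a Cox model but is not stated in the text; the function name is hypothetical.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Recover the point estimate and log-scale standard error from a 95% CI.

    Assumption: the interval is exp(log_hr ± z * se), so the HR is the
    geometric mean of the two limits.
    """
    log_hr = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.exp(log_hr), se


# Hazard ratios and 95% CIs as reported in the text above.
reported = {
    "first-degree family history": (4.00, 1.97, 8.14),
    "previous alcohol consumption": (1.91, 1.18, 3.10),
    "age": (1.14, 1.12, 1.16),
    "higher household income": (1.82, 1.15, 2.88),
}

for factor, (hr, lower, upper) in reported.items():
    implied_hr, se = log_scale_summary(lower, upper)
    print(f"{factor}: reported HR {hr:.2f}, implied HR {implied_hr:.2f}, SE(log HR) {se:.3f}")
```

Under this assumption, the geometric mean of each pair of limits agrees with the reported hazard ratio to within rounding, and the wide interval for family history reflects a comparatively large standard error on the log scale.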

Table 4 Top 5 most important predictors for incident PCa of ML models
Table 5 Hazard ratios and 95% confidence intervals of incident PCa related to MetS and sociodemographic factors

Discussion

Our study found good discrimination and calibration performance of the CPH model and ML models for predicting PCa risk. MetS and its components did not contribute to predicting PCa. In contrast, first-degree family history of PCa, alcohol consumption, age, and income were risk factors for PCa.

In recent years, digital medicine and computational science have demonstrated substantial value in constructing predictive models and exploring potential predictors for diseases. As a result, ML techniques are increasingly being used [30] to effectively predict cancer cases. A previous study used artificial neural networks, random forest, support vector machine, prostate-specific antigen (PSA) density, and PSA velocity to predict PCa; the authors found that ML models better predicted PCa (higher AUCs) than PSA density and PSA velocity [31]. Similarly, ML was suggested to be a promising approach for predicting PCa in another study that examined the use of five models in predicting PCa among 551 patients [32].

Notably, in the present study, we emphasized the construction of ML models for PCa prediction in time-to-event data. All models in our study exhibited good calibration and discriminative ability. Although the ML approach has been suggested to be a useful survival analysis method in previous studies, it has not been identified as a common epidemiological tool. Our study provides evidence that ML should be considered a feasible and promising survival analysis approach for PCa prediction.

To date, the association of MetS with PCa has been debated due to contradictory conclusions from previous studies. MetS was considered a risk factor for PCa in previous cohort studies [15, 16]. This positive association was reinforced by the finding of a meta-analysis of 24 studies; the results indicated a slight association of MetS with PCa incidence [11]. However, some studies found a protective effect of MetS against PCa instead of a detrimental effect [33, 34]; moreover, other studies did not report a significant association [10, 35, 36]. The latter set of findings is similar to our findings in the current study. Specifically, models that excluded MetS and its components exhibited better performance, indicating that these variables did not contribute to prediction of PCa. Importantly, this lack of a significant association agrees with the findings of a previous nationwide population-based cohort study in Korea with 1,917,430 participants, in which MetS was not found to contribute to PCa prediction [14].

Thus, research has been unable to establish a link between MetS and PCa due to insufficient epidemiological evidence [36]. Furthermore, the impact of MetS on PCa differs across countries. For example, MetS is a risk factor for PCa in Europe but not in Asia or the US [35]. The key reasons for this discrepancy remain unclear, but the possible explanations are as follows. First, the prevalences of obesity and MetS are lower in the Scandinavian population than in the US. Thus, the more prevalent and aggressive MetS features in the US, especially insulin resistance and diabetes, have been speculated to account for the lower association [35]. Second, differences in the detection of PCa among populations might partially explain the observed discrepancies [33, 35]. Third, MetS includes at least 3 individual components, which may exert antagonistic effects, neutralizing the positive or negative effects. For example, diabetes and obesity were observed to have a negative association with localized PCa, whereas hypertension had a positive association with PCa [36]. Fourth, different cutoff values of MetS components may contribute to this variation [33, 35]. For example, the World Health Organization uses the waist-to-hip ratio to determine abdominal obesity, whereas the NCEP uses waist circumference; in Korean men, waist circumference ≥ 90 cm is categorized as abdominal obesity [14].

In contrast, age and family history are strong predictors for PCa. PCa incidence increases sharply after 55 years of age, peaks at 70–74 years, and then declines slightly thereafter [6]. Regarding family history, participants with first-degree relatives with PCa had a 3-fold higher PCa risk than those without a first-degree family history of PCa [6]. The risk of PCa further increased for individuals with relatives diagnosed with PCa at an early age and those with multiple relatives diagnosed with PCa [6]. Furthermore, we found a strong association of alcohol consumption with PCa susceptibility, which aligns with the findings of other studies [37, 38]. Alcohol is a known carcinogen for several cancers. Although the pathophysiological mechanisms by which alcohol affects the prostate are not fully understood, the main mechanisms proposed are as follows. Alcohol affects cell membrane composition and function, generates free radicals, affects the metabolism of detoxification enzymes, suppresses levels of enzymes responsible for DNA repair, and impairs the immune system [37, 38]. Furthermore, drinkers were found to have lower intakes of fiber, retinol, calcium, and iron and higher intakes of total fats and monounsaturated fatty acids [39]. Thus, alcohol consumption may be related to an unbalanced intake of micronutrients and macronutrients as well as increased PCa risk [37]. Additionally, PCa incidence is influenced by socioeconomic factors [40]. In our study, participants with higher household incomes had an elevated risk of PCa.

This study represents the first attempt to use ML approaches to predict PCa with time-to-event data and included MetS as a predictor to clarify its role in PCa susceptibility. We identified outcomes from the national cancer registry, which is a high-quality database with a long duration of follow-up. Furthermore, the laboratory tests were performed by trained personnel using standardized equipment and established protocols.

However, this study also has some limitations. First, we only used baseline measurements to identify MetS. Second, external validation was not considered in our study. Third, information on medication, nutrient intake, and other potential predictors for PCa was not available for inclusion in our analysis. Fourth, the role of screening in the validation of true prostate cancer risk factors was not considered in our study. PCa screening could affect the true association of risk factors with PCa because participants who undergo PCa screening are more likely to have healthy behaviors [41]. However, PSA screening is currently not included in the national cancer screening program in Korea. Thus, the development of a simple and effective prediction model for PCa that is independent of the PSA test is necessary to assess individual risk and to help participants make decisions regarding PCa screening [41, 42].

Conclusions

Our study found good performance of PCa prediction models, suggesting that ML models represent an effective approach for constructing prediction models in survival analysis. Furthermore, MetS and its components do not seem to contribute to the risk of PCa. In contrast, first-degree family history of PCa, age, alcohol consumption and income were risk factors for PCa.

Data availability

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

ATP III:

Adult Treatment Panel III

BMI:

Body mass index

CI:

Confidence interval

CPH:

Cox proportional hazards

EST:

Extra survival trees

GB:

Gradient boosting

HDL cholesterol:

High-density lipoprotein cholesterol

HR:

Hazard ratio

IBS:

Integrated Brier score

MetS:

Metabolic syndrome

MICE:

Multivariate imputation by chained equations

ML:

Machine learning

NCC:

National Cancer Center

NCEP:

National Cholesterol Education Program

PCa:

Prostate cancer

RSF:

Random survival forest

SSVM:

Survival support vector machine

ST:

Survival trees

References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71:209鈥49.

    听 听 听

  2. Zhu Y, Mo M, Wei Y, Wu J, Pan J, Freedland SJ, et al. Epidemiology and genomics of prostate cancer in Asian men. Nat Rev Urol. 2021;18:282鈥301.

    CAS听 听 听

  3. Hong S, Won YJ, Lee JJ, Jung KW, Kong HJ, Im JS, et al. Cancer statistics in Korea: incidence, mortality, survival, and prevalence in 2018. Cancer Res Treat. 2021;53:301鈥15.

    听 听 听 听

  4. Han HH, Park JW, Na JC, Chung BH, Kim CS, Ko WJ. Epidemiology of prostate cancer in South Korea. Prostate Int. 2015;3:99鈥102.

    听 听 听 听

  5. World Cancer Research Fund International. Diet, Nutrition, Physical activity and prostate cancer; World Cancer Research Fund International: London, UK, 2018.

  6. Gann PH. Risk factors for prostate cancer. Rev Urol. 2002;4(Suppl 5):S3鈥10.

    听 听 听

  7. Drake I, Sonestedt E, Ericson U, Wallstr枚m P, Orho-Melander M. A western dietary pattern is prospectively associated with cardio-metabolic traits and incidence of the metabolic syndrome. Br J Nutr. 2018;119:1168鈥76.

    CAS听 听 听

  8. Saklayen MG. The global epidemic of the metabolic syndrome. Curr Hypertens Rep. 2018;20:12.

    听 听 听 听

  9. Esposito K, Chiodini P, Colao A, Lenzi A, Giugliano D. Metabolic syndrome and risk of cancer: a systematic review and meta-analysis. Diabetes Care. 2012;35:2402鈥11.

    听 听 听 听

  10. Russo A, Autelitano M, Bisanti L. Metabolic syndrome and cancer risk. Eur J Cancer. 2008;44:293鈥7.

    听 听 听

  11. Gacci M, Russo GI, De Nunzio C, Sebastianelli A, Salvi M, Vignozzi L, et al. Meta-analysis of metabolic syndrome and prostate cancer. Prostate Cancer Prostatic Dis. 2017;20:146鈥55.

    CAS听 听 听

  12. Hsing AW, Devesa SS. Trends and patterns of prostate cancer: what do they suggest? Epidemiol Rev. 2001;23:3鈥13.

    CAS听 听 听

  13. Karzai FH, Madan RA, Dahut WL. Metabolic syndrome in prostate cancer: impact on risk and outcomes. Future Oncol. 2016;12:1947鈥55.

    CAS听 听 听 听

  14. Choi JB, Myong JP, Lee Y, Koh JS, Hong SH, Yoon BI, et al. Impact of age and metabolic syndrome-like components on prostate cancer development: a nationwide population-based cohort study. Transl Androl Urol. 2021;10:2990鈥7.

    听 听 听 听

  15. Laukkanen JA, Laaksonen DE, Niskanen L, Pukkala E, Hakkarainen A, Salonen JT. Metabolic syndrome and the risk of prostate cancer in Finnish men: a population-based study. Cancer Epidemiol Biomarkers Prev. 2004;13:1646鈥50.

    CAS听 听 听

  16. Yoo S, Oh S, Park J, Cho SY, Cho MC, Son H, et al. Effects of metabolic syndrome on the prevalence of prostate cancer: historical cohort study using the national health insurance service database. J Cancer Res Clin Oncol. 2019;145:775鈥80.

    听 听 听

  17. Wei YX, Liu BP, Zhang J, Wang XT, Chu J, Jia CX. Prediction of recurrent suicidal behavior among suicide attempters with Cox regression and machine learning: a 10-year prospective cohort study. J Psychiatr Res. 2021;144:217鈥24.

    听 听 听

  18. Wang P, Li Y, Reddy CK. Machine learning for survival analysis: a survey. ACM Comput Surv (CSUR). 2019;51:1鈥36.

    听 听

  19. Kim J. Cancer screenee cohort study of the National Cancer Center in South Korea. Epidemiol Health. 2014;36:e2014013.

    听 听 听 听

  20. Lee SH, Tao S, Kim HS. The prevalence of metabolic syndrome and its related risk complications among koreans. Nutrients. 2019;11:1755.

    CAS听 听 听 听

  21. Cho YA, Kim J, Cho ER, Shin A. Dietary patterns and the prevalence of metabolic syndrome in Korean women. Nutr Metab Cardiovasc Dis. 2011;21:893鈥900.

    CAS听 听 听

  22. Tran TT, Lee J, Gunathilake M, Kim J, Kim S-Y, Cho H et al. A comparison of machine learning models and Cox proportional hazards models regarding their ability to predict the risk of gastrointestinal cancer based on metabolic syndrome and its components. Front Oncol. 2023, 13.

  23. P枚lsterl S. Scikit-survival: a library for time-to-event analysis built on top of scikit-learn. J Mach Learn Res. 2020;21:1鈥6.

  24. Azur MJ, Stuart EA, Frangakis C, Leaf PJ. Multiple imputation by chained equations: what is it and how does it work? Int J Methods Psychiatr Res. 2011;20:40鈥9.

    听 听 听 听

  25. Liu Y, Zhou S, Wei H, An S. A comparative study of forest methods for time-to-event data: variable selection and predictive performance. 樱花视频 Med Res Methodol. 2021;21:193.

    CAS听 听 听 听

  26. Moncada-Torres A, van Maaren MC, Hendriks MP, Siesling S, Geleijnse G. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Sci Rep. 2021;11:6968.

    CAS听 听 听 听

  27. Natekin A, Knoll A. Gradient boosting machines, a tutorial. Front Neurorobot. 2013;7:21.

    听 听 听 听

  28. Dey AK, Suhas N, Teja TS, Juneja A. Some variations on Ensembled Random Survival Forest with application to cancer research. Available online (accessed on 04 February 2022).

  29. Xiao J, Mo M, Wang Z, Zhou C, Shen J, Yuan J, et al. The application and comparison of machine learning models for the prediction of breast cancer prognosis: retrospective cohort study. JMIR Med Inf. 2022;10:e33440鈥揺.

    听 听

  30. Abdullah Alfayez A, Kunz H, Grace Lai A. Predicting the risk of cancer in adults using supervised machine learning: a scoping review. BMJ open. 2021;11:e047755.

    听 听 听 听

  31. Nitta S, Tsutsumi M, Sakka S, Endo T, Hashimoto K, Hasegawa M, et al. Machine learning methods can more efficiently predict prostate cancer compared with prostate-specific antigen density and prostate-specific antigen velocity. Prostate Int. 2019;7:114鈥8.

  32. Chen S, Jian T, Chi C, Liang Y, Liang X, Yu Y, et al. Machine learning-based models enhance the prediction of prostate cancer. Front Oncol. 2022;12:941349.

  33. Blanc-Lapierre A, Spence A, Karakiewicz PI, Aprikian A, Saad F, Parent M-É. Metabolic syndrome and prostate cancer risk in a population-based case-control study in Montreal, Canada. BMC Public Health. 2015;15:913.

  34. Tande AJ, Platz EA, Folsom AR. The metabolic syndrome is associated with reduced risk of prostate cancer. Am J Epidemiol. 2006;164:1094–102.

  35. Esposito K, Chiodini P, Capuano A, Bellastella G, Maiorino MI, Parretta E, et al. Effect of metabolic syndrome and its components on prostate cancer risk: meta-analysis. J Endocrinol Invest. 2013;36:132–9.

  36. Xiang YZ, Xiong H, Cui ZL, Jiang SB, Xia QH, Zhao Y, et al. The association between metabolic syndrome and the risk of prostate cancer, high-grade prostate cancer, advanced prostate cancer, prostate cancer-specific mortality and biochemical recurrence. J Exp Clin Cancer Res. 2013;32:9.

  37. Sesso HD, Paffenbarger RS Jr., Lee IM. Alcohol consumption and risk of prostate cancer: the Harvard Alumni Health Study. Int J Epidemiol. 2001;30:749–55.

  38. Zhao J, Stockwell T, Roemer A, Chikritzhs T. Is alcohol consumption a risk factor for prostate cancer? A systematic review and meta-analysis. BMC Cancer. 2016;16:845.

  39. Fawehinmi TO, Ilomäki J, Voutilainen S, Kauhanen J. Alcohol consumption and dietary patterns: the FinDrink study. PLoS ONE. 2012;7:e38607.

  40. Coughlin SS. A review of social determinants of prostate cancer risk, stage, and survival. Prostate Int. 2020;8:49–54.

  41. Yeo Y, Shin DW, Lee J, et al. Personalized 5-year prostate cancer risk prediction model in Korea based on nationwide representative data. J Pers Med. 2021;12:2.

  42. Kim SH, Kim S, Joung JY, et al. Lifestyle risk prediction model for prostate cancer in a Korean population. Cancer Res Treat. 2018;50:1194–202.

Acknowledgements

Not applicable.

Funding

This work was supported by the International Cooperation & Education Program (NCCRI·NCCI 52210-52211, 2020) of the National Cancer Center, Korea, and by grants from the National Research Foundation of Korea (2021R1A4A1032861) and the National Cancer Center, Korea (NCC 2210990).

Author information

Contributions

Formal analysis, TT.T., J.L.; Preparation of original draft, TT.T.; Writing-review and editing, J.K., SY.K., H.C., J.K.; Data curation, J.L., J.K.; Investigation, J.L.; Methodology, J.L., J.K.; Funding acquisition, J.K.; Project administration, J.K.; Supervision, J.K. All authors have critically reviewed and approved the final version of the manuscript submitted for publication.

Corresponding author

Correspondence to Jeongseon Kim.

Ethics declarations

Ethics approval and consent to participate

We obtained written informed consent from all participants and approval for the study protocol from the institutional review board of the National Cancer Center (No. NCCNCS-07-077).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit .

About this article

Cite this article

Tran, T.T., Lee, J., Kim, J. et al. Machine learning algorithms that predict the risk of prostate cancer based on metabolic syndrome and sociodemographic characteristics: a prospective cohort study. BMC Public Health 24, 3549 (2024). https://doi.org/10.1186/s12889-024-20852-8

  • DOI: https://doi.org/10.1186/s12889-024-20852-8

Keywords