Identification of the prognostic value of Th1/Th2 ratio and a novel prognostic signature in basal-like breast cancer

Background 
 Breast cancer is a heterogeneous group of diseases. The polarization of CD4+ T helper (Th) lymphocytes (mainly Th1 and Th2) may differ in breast cancers with different outcomes, but this has not been fully validated. Methods This study is a bioinformatic analysis, in which differentially expressed genes (DEGs) were identified in patients with low and high Th1/Th2 ratios. And then, DEG functions, hub genes and independent predictors were determined. Results Low Th1/Th2 ratio was associated with poor outcome in Luminal A and basal-like breast cancer (p < 0.05). GSEA and KEGG analysis of DEGs obtained from comparing low and high Th1/Th2 ratios illuminated downregulation of immune-related gene sets and pathways affecting Th1/Th2 balance toward Th2 polarization (p < 0.05). Survival and Cox analyses of all the DEGs confirmed CCL1 and MYH6 were independent protective factors and IFNK and SOAT2 were independent risk factors for basal-like breast cancer (95%CI: 1.06–2.5, p = 0.026). Then a four-gene signature was constructed and achieved a promising prognostic value (C-index = 0.82; AUC = 0.826). Conclusions Low Th1/Th2 ratio predicts poor outcome in Luminal A and Basal-like breast cancer, and downregulation of immune-related gene sets and pathways contribute to Th1/Th2 balance toward Th2 polarization. CCL1, MYH6, IFNK, and SOAT2 have an independent prognostic value of survival outcome and might be novel markers in basal-like breast cancer. Supplementary Information The online version contains supplementary material available at 10.1186/s41065-023-00265-0 .


Introduction
Breast cancer is one of the most frequently diagnosed cancers worldwide and has become a global health concern for women [ 1 ]. Through continuous scientific efforts, a foundation for the treatment of breast cancer has been laid, mainly consisting of surgery, chemotherapy, endocrine therapy, and radical therapy [ 2 ]. Although a relatively better outcome has been achieved with breast cancer compared with other solid tumors, and a 5-year survival rate of over 80% is a remarkable success, there are still patients with poor prognosis [ 3 ],while immune related gene signature may contribute to a better prognostic assessment.
Immune-related studies are widely used in oncology, among which the balance of T helper (Th)1/Th2 lymphocytes has been investigated intensively and found to be linked with other conditions such as inflammation, immune diseases, and tumors [ 4,5 ]. Previous studies have reported that Th1 cells produce interleukin (IL)-2, tumor necrosis factor (TNF)-β, and interferon (IFN)-γ and activate cytotoxic T lymphocytes (Tc), NK cells, macrophages, and monocytes, playing an important role in the immune response against tumors. Th2 cells produce IL-4, IL-5, IL-6, IL-9, IL-10, and IL-13 and act against Th1 cells [ 6 ]. A shift of Th1/Th2 cell subsets toward Th2 cells in malignant tumors has been reported [ 7,8 ]. Furthermore, T cell differentiation is a complex process stimulated by different antigens, cytokines, or antigen-presenting cells. Th0 cells can be transformed into Th1 or Th2 cells, or promote Th1 cell transformation to Th2 cells, thus, causing a shift in Th1/Th2 balance. Among the cytokines associated with the Th1/Th2 balance, IL-4 and IFN-γ play a key role in the differentiation of Th0 cells into Th1 and Th2. When the IL-4 level is high, Th0 cells mainly differentiate into Th2 cells. While IL-4 is deficient, the expression of IFN-γ increases and induces differentiation into Th1 cells. IFN-γ secreted by Th1 and IL-4 and IL-10 secreted by Th2 can not only promote their own differentiation and maturation but can also inhibit the differentiation and maturation of each other and form a regulatory network with other factors [ 9,10 ].
Emerging evidence has confirmed the predictive value in the prognostic and drug efficacy of Th1/Th2 balance in breast cancer [ 11,12 ]. However, the differences of gene expression pattern at different levels of the Th1/Th2 ratio and the mechanism behind them are still not fully clarified. This study aims to investigate the prognostic value of the Th1/Th2 ratio in different breast cancer subtypes and further explore the prognostic value of Th1/Th2 balance related gene signature in breast cancer.

Data preparing and survival analysis
The data of 1075 patients who were female and had complete overall survival (OS) information were acquired from the TCGA BRCA dataset. The RNA-seq counts data underwent a normalization procedure using variance stabilizing transformation in R and was annotated by gencode.v22.annotation downloaded from https:// gdc. cancer. gov/ . According to the PAM50 criteria, patients were classified into luminal A (LumA), luminal B (LumB), Her2 overexpressed (Her2), basal-like (Basal), and normal-like (Normal) subtypes [ 15 ]. The Th1/Th2 ratio was calculated and survival analyses comparisons with different levels of Th1/Th2 ratio were performed in different breast cancer subtypes (clinical details in Supplementary data Table s 1 ).

Identification of DEGs
Setting criteria as p value < 0.05 and |Log2FC| > 1.5, DEGs were identified by the R package "DEseq2" comparing the low Th1/Th2 ratio group with the high Th1/Th2 ratio group, and genes with an average count value lower than 1 were excluded. The DEGs were visualized as a heatmap and MAplot using "pheatmap", "ggplot2", and "ggrepel" packages in R [ 16 ].

GSEA and KEGG enrichment analyses
Gene set enrichment analysis (GSEA) and analysis using Kyoto Encyclopedia of Genes and Genomes (KEGG) of whole DEGs were conducted to annotate gene functions [ 17 ]. All GSEA presented in this study were based on hallmark gene sets using the R package "clusterProfiler" [ 18 ]. Both adjusted P value and FDR value < 0.05 were considered as indicating significant enrichment. DEGs meeting the criteria of p value < 0.05 and |Log2FC| > 1.5 were analyzed for KEGG enrichment and the enriched pathways were visualized by the R package "clusterProfiler" and "pathview" [ 19 ]. Both an adjusted P value and FDR value of < 0.05 were considered as indicating a significant enrichment.

Risk score signature construction and validation
Univariate cox analysis and survival analysis by Logrank test were used to filter potential predictive markers and multivariate cox regression analysis was used in risk model construction. Then the predictive ability was assessed by receiver operating characteristic curve (ROC) to compare Th1/Th2 ratio, tumor stage, and age. Data in this study were randomly divided into a training set and testing set for risk model construction and validation. The risk score signature was assessed by the following formula: where Coefi is the multivariate Cox regression coefficient for the target mRNA and exp(i) is the expression value of each mRNA. According to the risk score signature cutoff point calculated by the "ROC" method in the R package "ggrisk" in the training set, all patient samples were divided into high-risk and low-risk groups.
The risk score signature was validated in testing set and the entire set, and then was further validated in an independent dataset GSE202203 with 288 cases basal-like breast cancer from GEO.

Statistical analyses
Survival curves were generated using the Kaplan-Meier method with R package "survival". The best cut points of variates of survival analysis were evaluated by the R package "survminer". The heatmap was performed by the R package "pheatmap" and the MA plot was constructed by "ggplot2" and "ggrepel". The t test, chi-square test, and Fisher's Exact test were used in the variance analyses. Cox analysis was used for multivariate analysis and correlation analysis was performed by the Spearman method. A risk score plot was constructed by using the R package "ggrisk" for Cox regression. All the statistical analyses were performed by R (version 4.1.0) and two-tailed P < 0.05 was considered as the standard for statistical significance.

Abundance of Th1 and Th2 cells and survival analysis
The detailed workflow of this study is shown in Fig. 1 . The abundance of Th1 and Th2 cells and the Th1/Th2 ratio level were analyzed with TCGA BRCA data ( Fig.  2 A-C). Grouped by high and low Th1/Th2 ratio, survival analyses were performed and showed that a low Th1/Th2 ratio was a poor prognostic factor in LumA and Basal subtypes (P < 0.05), whereas the prognostic value was not significant in LumB, Her2, and Normal subtypes (excluding patients with an OS time < 30 days) ( Fig. 2 D-H). We identified that the cutoff value of the Th1/Th2 ratio was 0.531 for the Basal subtype and extracted the Basal subtype breast cancer data for further analyses (clinical details of Basal subtype data in Table 1 ).

DEG identification
A total of 332 DEGs were identified from 19,495 protein-coding genes in the Basal subtype from the low Th1/Th2 ratio group in comparison with the high Th1/ Th2 ratio group (DEG details in Supplementary data Table S 2 ). A heatmap of the DEGs was constructed and then volcano plot was conducted to reveal the significance of DEGs (Fig. 3 ).

GSEA and KEGG analyses
To clarify the function and related pathways of these DEGs, we conducted GSEA and KEGG analysis. Enrichments were detected in the downregulation of the IFN-γ response, allograft rejection, inflammatory response, IFN-α response, IL6 JAK STAT3 signaling, complement, TNFα signaling via NF-κB, IL2 STAT5 signaling, apoptosis, KRAS signaling, and E2F target gene sets, and in the upregulation of myogenesis and epithelial-mesenchymal transition by GSEA (Table 2 & Fig. 4 B). Top five gene sets were shown in Fig. 4 C-G. In the KEGG analysis of 332 DEGs, 22 pathways were enriched and the top 10 enriched pathways estimated by gene ratio were cytokine-cytokine receptor interaction, viral protein interaction with cytokine and cytokine receptor, primary immunodeficiency, graft versus host disease, Th17 cell differentiation, antigen processing and presentation, natural killer  cell-mediated cytotoxicity, chemokine signaling pathway, inflammatory bowel disease, and hematopoietic cell lineage (Table 3 & Fig. 4 A).
We discovered that most of the enriched pathways were immune-related. A low Th1/Th2 ratio was associated with the downregulation of nearly all the enriched pathways mentioned above. In particular, the enrichment of the Th1 and Th2 cell differentiation pathway showed that the DEGs identified in this study were associated mostly with the downregulation of Th1 cell

Cox and survival analysis of DEGs
Univariate Cox analysis and survival analysis by Logrank test were performed with the DEGs. 30 genes with a p value < 0.05 in both univariate Cox analysis and survival analysis were identified (Table 4 ). Then, multivariate Cox regression analysis was used to determine the independent predictive values of the 30 genes in survival outcomes (Fig. 5 ). The data were randomly divided into a training set and testing set in a ratio of 6:4 (training set: 128 patients; testing set: 86 patients). We confirmed that, for basal subtype breast cancer, CCL1 (95%CI: 0.00-0.50, p = 0.022) and MYH6 (95%CI: 0.00-0.61, p = 0.026) were independent protective factors while IFNK (95%CI: 7.05-1482.33, p < 0.001) and SOAT2 (95%CI: 4.42-1184.82, p = 0.003) were an independent risk factor in a training set with Cox regression analysis and survival analyses with the four genes mentioned above (Fig. 6 & 7 A-D).

Risk score signature construction and validation
A Cox proportional-hazards model (a four-gene signature) comprising CCL1, IFNK, MYH6, and SOAT2 was constructed in a training set with the formula: By calculating the risk score, patients were regrouped into high-risk and low-risk training sets, and this model achieved a concordance index (C-index) of 0.82. A risk score plot showed the distribution of patients (Fig. 7  E). The survival analysis showed significant differences between the high and low-risk groups (p < 0.001) (Fig. 7  F). ROC analysis showed a superior predictive ability when comparing the four-gene signature with the Th1/Th2 ratio, tumor stage, and age (ROC of risk score: 0.826) (Fig. 7 G). Then model validation was conducted in the testing set, the entire set. Similar results of the risk score possessing  maximum area under the curve were achieved (ROC of the four-gene signature in testing set: 0.699; ROC of the four-gene signature in entire set: 0.744) (Fig. 8 A-F). ROC comparison in a different data set is listed in Table 5 .
We then further validated the four-gene signature in an independent dataset GSE202203 from GEO, which is a dataset of primary breast tumors with expression profiling from high throughput sequencing. Validation analyses showed that patients with low risk scores had better survival outcomes than those with high risk scores, and that the gene signature yielded good prediction results(ROC: 0.674) (Fig. 8 G-I).

Discussion
The Th1/Th2 balance status of tumor patients has been a concern of researchers and clinicians in recent years. Previous studies have shown that Th1/Th2 unbalance contributes to tumor progression and could be one of the mechanisms that cause immune escape. A shift in Th1/Th2 cell subsets has been reported in lung cancer, glioma, cervical cancer, breast cancer, gastric cancer, colorectal cancer, ovarian cancer, and liver cancer [20][21][22]. In the anti-tumor immune response, Th1 cells dominate the cellular immune function of the body and secrete Th1 cytokines, which play a vital role in the antitumor immune response. In contrast, Th2 cells work against Th1 cells. The hyposecretion of IL-2 and IFN-γ in peripheral blood is often detected in patients with advanced tumors, and the secretion of IL-10 increases, indicating that Th0 to Th2 differentiation is dominant during tumor growth [ 23 ]. The dominant state of Th2 is closely related to tumor immune escape, but the exact mechanism still needs clarification [ 24 ]. The shift of Th1/Th2 balance and its resulting genomic phenotypic changes may have an impact on tumor development. Basal-like breast cancer belongs to triple-negative breast cancer (TNBC), which is considered to be a highly heterogeneous type of breast cancer. Based on the gene expression profile, Lehmann's study divided TNBC into six subtypes: basal-like 1 (BL1), basallike 2 (BL2), immunomodulatory (IM), mesenchymal (M), mesenchymal stem-like (MSL), and luminal androgen receptor (LAR) [ 25 ]. Among these, the IM subtype has a high expression of immune response-related genes. Analogously, the FUSCC subtyping proposed by Jiang and colleagues, which classified TNBC into [ 1 ] luminal androgen receptor (LAR), [ 2 ] immunomodulatory (IM), [ 3 ] basal-like immune-suppressed (BLIS), and [ 4 ] mesenchymallike (MES), also includes a class of IM subtype with the high expression of PD1, PD-L1, CTLA4, and IDO1, which may benefit from immune-therapy targeting PD1 and/or PD-L1 [ 26 ]. Hence, at least for a significant percentage of basal like breast cancer patients, immunoregulation is strongly associated with their development and outcome. However, many tumor-related immune regulation mechanisms remain to be defined. Thus, we hope to further understand the mechanism of immune-related regulation in breast cancer by exploring the shift of the Th1/Th2 balance.
In this study, we began with Th1/Th2 balance, an important concept in immune regulation, and identified enriched gene sets and pathways that are related to its regulation. As shown in this study, a suppression mainly in the IFN-α response, IFN-γ response, allograft rejection, IL6 JAK STAT3 signaling, and inflammatory response could influence the Th1/Th2 balance. Furthermore, KEGG analysis demonstrates that the downregulation of IFN-γ and IL2 can be found in almost every pathway enriched. All the evidence that has emerged in this study relates to Th2 polarization [ 6,27 ]. In addition, it is noted that the downregulation of PD-L1 can be found in the cell adhesion molecule pathway and downregulation of PD-1 can be found in the T cell receptor signaling pathway, and the D-L1 expression and PD-1 checkpoint pathway in the cancer pathway . This may indicate that immune-therapy targeting PD1 and/or PD-L1 is not effective in breast cancer with a Th1/Th2 balance toward Th2.
We identified CCL1 and MYH6 as independent protective factors based on the different gene expression pattern with high or low Th1 / Th2 ratios, while IFNK and SOAT2 were independent risk factors from univariate and multivariate Cox regression analysis. Among them, CCL1 is a major Treg-attracting chemokine in human invasive breast cancer, positively correlated with Treg infiltration and ER-negative high-grade tumors [ 28 ]. On the contrary, IFNK, MYH6, and SOAT2 have rarely been reported in association with breast cancer. Previous studies showed that IFNK can be regulated by lncRNA and might affect the response to anthracycline treatment in ER-negative breast cancer [ 29 ]. MYH6 and SOAT2 may be associated with the progression of prostate cancer [ 30,31 ]. Therefore, the expression of the above four genes is associated with the development and prognosis of breast cancer. In addition, the four-gene signature constructed in our study indicates a synergistic prognostic value of the four genes in basallike breast cancer.
However, due to the lack of adequate research, their roles in breast cancer still need to be further clarified. We expect that these mystery genes may be novel markers for basal-like breast cancer. Furthermore, our study is based on a comprehensive bioinformatic analysis, further validation is needed to confirm our theory.

Conclusion
The Th1 / Th2 ratio is a prognostic factor for breast cancer, and was statistically significant in LumA and Basal-like breast cancer survival analysis. Downregulation of immune-related gene sets and pathways affects the balance of Th1/Th2 towards Th2 polarization and leads to poor outcome. We further constructed a four-gene signature comprising CCL1, IFNK, MYH6, and SOAT2 genes, which shows a promising predictive value for basal-like breast cancer and may be related to the underlying Th1/Th2 balance regulation mechanism, which is worthy of further study.
We would like to thank the associate editor and the reviewers for their useful feedback that improved this paper. We thank Guozi Learn Bioinformatics (WeChat Official Accounts). We also thank International Science Editing ( http:// www. inter natio nalsc ience editi ng. com ) for editing this manuscript.