Immunological analysis and differential genes screening of venous thromboembolism

Purpose To explore the pathogenesis of venous thromboembolism (VTE) and provide a bioinformatics basis for the prevention and treatment of VTE. Methods R software was used to obtain the gene expression profile data of GSE19151. Combined with the CIBERSORT database, immune cell proportions and differentially expressed genes (DEGs) were obtained for blood samples from VTE patients and normal controls, and the DEGs were subjected to GO and KEGG pathway enrichment analyses. A protein-protein interaction (PPI) network was then constructed using the STRING database, the key genes (hub genes) and immune differential genes were screened with Cytoscape software, and the transcription factors (TFs) regulating the hub genes and immune differential genes were analyzed using the NetworkAnalyst database. Results Compared with the normal group, monocytes and resting mast cells were significantly higher in the VTE group, while regulatory T cells were significantly lower. Ribosomes were closely related to the occurrence of VTE. The 10 hub genes and the immune differential genes were highly expressed in VTE. MYC, SOX2, XRN2, E2F1, SPI1, CREM and CREB1 can regulate the expression of the hub genes and immune differential genes. Conclusions Ribosomal protein family genes are most relevant to the occurrence and development of VTE, and the immune differential genes may be key molecules in VTE, which provides new ideas for further exploring the pathogenesis of VTE.


Background
Venous thromboembolism (VTE), which includes deep vein thrombosis and pulmonary embolism, is a highly prevalent and potentially fatal disease that causes 3 million deaths worldwide every year [1]. VTE is a common complication of cancer [2]. The risk of venous thrombosis in cancer patients is as high as 7% owing to hypercoagulability or surgical intervention [3], and VTE has become the second leading cause of death in cancer patients [4]. Compared with healthy people, patients with malignant tumors have a 4-7 times higher incidence of VTE, with high morbidity and mortality [5]. From 2004 to 2016, the incidence of VTE in China was on the rise, increasing from 34.8% in 2005 to 60.9% in 2014 [6]. Etiologically, the most common cause of VTE is active malignancy. Therefore, it is important to study the pathogenesis of VTE to reduce the risk of postoperative death in cancer patients. Previous studies have generally held that VTE is mainly related to activation of coagulation, slow blood flow and vascular wall injury, and it is treated with anticoagulant and thrombolytic drugs to prevent thrombus extension and reduce the thrombus. Nevertheless, in many VTE cases, tissue factor, which reflects vascular endothelial cell injury, and P-selectin, which reflects platelet activation, did not rise significantly, which is difficult to explain with the traditional theory. VTE is reported to be a complex disease influenced by environmental [7], genetic [8] and epigenetic [9] factors, all of which play important roles in its occurrence. A number of genes have been associated with susceptibility to VTE [10,11]. Recently, some scholars have shown that immune gene mutations may provide new insights into the pathogenesis of VTE [12]. However, the specific pathogenesis of VTE is still unclear.
Ribosomal proteins are important components of ribosomes. Abnormal expression of ribosomal protein genes seriously affects cell growth, proliferation and differentiation. Ribosomal protein synthesis has been found to be closely related to the growth and proliferation of tumor cells during tumorigenesis [13,14]. Recently, studies have found that ribosomes and immune-related genes are associated with systemic vasculitis [15]. However, studies on ribosomal proteins, immune-related genes and VTE are rarely reported. This study performed functional enrichment analysis of the genes differentially expressed in VTE through bioinformatics and identified important pathways and molecules involved in the development of VTE. The results provide a bioinformatics basis for research on the mechanism of VTE occurrence and development, and suggest new potential targets for research on the treatment of VTE.

Microarray data
This study downloaded the microarray dataset GSE19151 from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The dataset contains a total of 133 blood samples obtained from 70 VTE patients and 63 healthy controls. The chip platform is GPL571 [HG-U133A_2] Affymetrix Human Genome U133A 2.0 Array.

Obtaining immune cells
The R language was used to perform background correction, fill in missing values and remove duplicates in the original data. The probe names in the GSE19151 matrix data were converted into gene names using the platform file GPL571. The CIBERSORT database (http://cibersort.stanford.edu) provides a computational method for quantifying cell composition from the gene expression profiles of bulk tissue and can estimate the immune composition of a sample. CIBERSORT was used to convert the gene expression profile data into the corresponding immune cell proportion data of the samples, run with 100 permutations and a threshold of P < 0.05. Perl was then used to filter the data and delete the samples with P > 0.05. The R packages pheatmap and vioplot were used to draw the heatmap and violin plot of the immune cell distribution in the samples.
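The sample-filtering step can be sketched as follows. This is a minimal illustration in Python rather than the Perl script used in the study, which is not shown in the source. The row layout, the names `CibersortRow`, `filter_samples` and `mean_fraction`, and all numeric values are assumptions made up for the example.

```python
from typing import Dict, List, NamedTuple


class CibersortRow(NamedTuple):
    """One sample from a CIBERSORT-style result table (hypothetical layout)."""
    sample: str
    fractions: Dict[str, float]  # immune cell type -> estimated proportion
    p_value: float               # permutation P-value of the deconvolution


def filter_samples(rows: List[CibersortRow], threshold: float = 0.05) -> List[CibersortRow]:
    """Delete samples whose deconvolution P-value is above the threshold."""
    return [row for row in rows if row.p_value <= threshold]


def mean_fraction(rows: List[CibersortRow], cell_type: str) -> float:
    """Average estimated proportion of one cell type across the given samples."""
    if not rows:
        raise ValueError("no samples left after filtering")
    return sum(row.fractions.get(cell_type, 0.0) for row in rows) / len(rows)


# Made-up illustrative values, not data from GSE19151.
example_rows = [
    CibersortRow("S1", {"Monocytes": 0.20, "Tregs": 0.02}, 0.01),
    CibersortRow("S2", {"Monocytes": 0.10, "Tregs": 0.04}, 0.30),
    CibersortRow("S3", {"Monocytes": 0.30, "Tregs": 0.01}, 0.04),
]
kept = filter_samples(example_rows)
```

Only the retained samples would then be passed on to the heatmap and violin plot steps.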

Obtaining differentially expressed genes (DEGs)
The Limma package was used to screen DEGs, with the screening criterion set as |log2 FC| > 1 and adjusted P < 0.05. R software was used to draw the heatmap and volcano plot of the DEGs.
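The screening criterion can be expressed as a simple filter. The sketch below is written in Python for illustration and assumes that the per-gene log2 fold changes and adjusted P-values have already been computed (for example by Limma). The names `GeneStat` and `screen_degs` and the toy gene labels and values are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class GeneStat(NamedTuple):
    gene: str
    log2_fc: float  # log2 fold change, VTE versus normal control
    adj_p: float    # multiple-testing-adjusted P-value


def screen_degs(
    stats: Iterable[GeneStat],
    lfc_cutoff: float = 1.0,
    p_cutoff: float = 0.05,
) -> Tuple[List[str], List[str]]:
    """Return (upregulated, downregulated) genes with |log2 FC| > cutoff and adjusted P < cutoff."""
    up: List[str] = []
    down: List[str] = []
    for stat in stats:
        if abs(stat.log2_fc) > lfc_cutoff and stat.adj_p < p_cutoff:
            (up if stat.log2_fc > 0 else down).append(stat.gene)
    return up, down


# Toy values for illustration only, not results from GSE19151.
toy_stats = [
    GeneStat("GENE_A", 1.8, 0.001),   # passes both thresholds, upregulated
    GeneStat("GENE_B", 0.6, 0.001),   # fold change too small
    GeneStat("GENE_C", -1.4, 0.020),  # passes both thresholds, downregulated
    GeneStat("GENE_D", 2.1, 0.200),   # not significant after adjustment
]
up_genes, down_genes = screen_degs(toy_stats)
```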

Enrichment analysis of DEGs
Gene Ontology (GO) includes biological process (BP), cellular component (CC), and molecular function (MF). To understand the functions of the DEGs, we used the R package clusterProfiler to conduct GO enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis on the DEGs, and drew a bubble plot for the GO analysis and a network plot for the KEGG pathways. An adjusted P < 0.05 was considered statistically significant.

PPI network construction and hub gene screening
The STRING (Search Tool for the Retrieval of Interacting Genes, https://string-db.org/) database [16] can construct protein-protein interaction networks to evaluate functional genomics data. We used the STRING online database to construct the PPI network of the DEGs, and the resulting source files were imported into Cytoscape software for visual analysis and hub gene screening. The MCC algorithm was used to select the top 10 hub genes.
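Over-representation analysis of this kind typically rests on a hypergeometric test followed by a multiple-testing correction. The sketch below illustrates that calculation in Python. It is not the clusterProfiler implementation, and the Benjamini-Hochberg adjustment is an assumption, since the text does not state which adjustment method was used. The function names are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the DEG list, k: DEGs annotated to the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when more items are requested than are available.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A term would be reported as enriched when its adjusted P-value falls below 0.05, matching the threshold stated above.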
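For readers unfamiliar with MCC (maximal clique centrality), the sketch below shows one way to compute it in Python. It assumes the definition published for the cytoHubba plugin, in which the score of a node is the sum of (|C| - 1)! over all maximal cliques C that contain it. The source does not state the formula, so this is an assumption, and the code is not the Cytoscape implementation. The function names and node labels are hypothetical.

```python
from math import factorial
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list, ignoring self-loops."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        if a == b:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques: List[Set[str]] = []

    def expand(current: Set[str], candidates: Set[str], excluded: Set[str]) -> None:
        if not candidates and not excluded:
            cliques.append(current)
            return
        for node in list(candidates):
            expand(current | {node}, candidates & graph[node], excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return cliques


def mcc_scores(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """MCC score per node: sum of (clique size - 1)! over its maximal cliques."""
    scores = {node: 0 for node in graph}
    for clique in maximal_cliques(graph):
        weight = factorial(len(clique) - 1)
        for node in clique:
            scores[node] += weight
    return scores


def top_hubs(scores: Dict[str, int], n: int = 10) -> List[str]:
    """The n highest-scoring nodes, ties broken alphabetically."""
    return sorted(scores, key=lambda node: (-scores[node], node))[:n]


# Toy network: a triangle P1-P2-P3 plus a pendant edge P3-P4.
toy_graph = build_graph([("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")])
toy_scores = mcc_scores(toy_graph)
```

Because the factorial weight grows quickly with clique size, nodes that sit in densely interconnected groups dominate the ranking.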

Immune differential genes screening
To further understand the immune genes involved in VTE, immune-related genes were obtained from the IMMPORT (https://www.immport.org/) database [17], which consists of four components (Private Data, Shared Data, Data Analysis, and Resources) for data archiving, dissemination, analysis, and reuse. The VTE immune differential genes were screened from the VTE DEGs. The screening criterion was set as |log2 FC| > 1 and adjusted P < 0.05.
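In practice this screening amounts to intersecting the DEG list with the immune gene list. A minimal Python sketch is given below. The function name `immune_differential_genes` and the toy input lists are assumptions for illustration; a real run would load the full DEG table and the gene list exported from the database.

```python
from typing import Iterable, List


def immune_differential_genes(degs: Iterable[str], immune_genes: Iterable[str]) -> List[str]:
    """Genes present in both lists, compared case-insensitively and sorted."""
    immune = {gene.strip().upper() for gene in immune_genes}
    return sorted({gene.strip().upper() for gene in degs} & immune)


# Toy inputs: a few gene symbols named in this paper plus one placeholder.
toy_degs = ["RPS29", "RPL9", "S100A12", "NR1D2", "CD3G"]
toy_immune_list = ["S100A12", "NR1D2", "CD3G", "IMMUNE_ONLY_GENE"]
overlap = immune_differential_genes(toy_degs, toy_immune_list)
```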

TF-hub gene and immune gene network establishment
The NetworkAnalyst (http://www.networkanalyst.ca) database [18] is a comprehensive visual analysis platform for gene expression profiling and meta-analysis, designed to support the use of PPI networks in interpreting gene expression data. It can analyze gene expression profiles from 17 different species. In addition to the general PPI network, it can also create cell type-specific or tissue-specific PPI networks, gene regulatory networks, gene co-expression networks, and pharmacogenomics research networks. To further understand the TFs that regulate the hub genes and immune differential genes, we obtained the TFs regulating these genes through the NetworkAnalyst database, based on the data provided by ChEA.

Immune cells in VTE and normal samples
Converting the gene expression profile data into the corresponding immune cell proportion data of the samples, we obtained the difference in the distribution of the immune cells in the VTE and normal group. The results showed that compared with the normal group, monocytes and resting mast cells were higher in number in the VTE group, while regulatory T cells were significantly lower (P < 0.05), as shown in Fig. 1a and b.

Screening of differentially expressed genes
To further investigate the intrinsic differences in immunity between VTE and normal samples, we performed an empirical Bayes test with the Limma package on the GSE19151 gene chip data and obtained a total of 88 DEGs (VTE group/normal control group), of which 77 were upregulated and 11 were downregulated. The expression of the differential genes in the two groups is shown in Fig. 2a and b.
Fig. 1 Immune cells in VTE and normal control. a Heatmap analysis of the 22 immune cells. The horizontal axis shows the samples divided into the normal control and VTE groups. b Violin plot of immune cells. The blue and red violins represent the normal control and the VTE group, respectively.

Function and pathway enrichment analyses of DEGs
The R language was used to perform functional enrichment analysis on the 88 DEGs, and the results are shown in Fig. 3a. For cellular component, the DEGs are mainly involved in the composition of ribosomes and respiratory chains in the cytoplasm. The biological processes involve nuclear-transcribed mRNA metabolism, translation initiation, and protein localization to the endoplasmic reticulum. The molecular functions mainly concern ribosome composition, cytochrome C oxidase activity, and oxidoreductase activity. KEGG pathway enrichment analysis showed that the DEGs are mainly involved in ribosomes, oxidative phosphorylation, Huntington disease, Parkinson disease, myocardial contraction and other processes (Fig. 3b).

PPI network analysis and hub gene screening of DEGs
The 88 DEGs were input into the STRING database, and the obtained data were imported into Cytoscape to construct the PPI network (Fig. 4a) and to find the top 10 hub genes, which were RPS29, RPL9, RPL27, RPS15A, RPS17, RPS27, RPS24, RPL30, RPL34 and RPL35 (Fig. 4b). The expression differences of these 10 hub genes between the VTE group and the normal control group are shown in Fig. 4c.

Immune differential genes screening
To further clarify the relationship between VTE and immune genes, we intersected the selected DEGs with the immune genes of the IMMPORT database and obtained a total of 3 immune differential genes: S100A12, NR1D2 and CD3G. Compared with the control group, S100A12, NR1D2 and CD3G were significantly overexpressed in the VTE group. The results are shown in Fig. 5.

Analysis of TFs regulatory network of DEGs
To further study the molecules that regulate 10 hub genes and three immune differential genes S100A12, NR1D2 and CD3G, this study screened the potential regulatory relationship between TFs and DEGs based on the data provided on ChEA, and predicted TFs that can regulate the expression of hub genes and immune differential genes. It was found that MYC, SOX2, XRN2, E2F1, SPI1, CREM and CREB1 can regulate the expression of these 13 genes (Fig. 6).

Discussion
VTE is a systemic disease. A study of more than 1 million cancer patients showed that the incidence of VTE increased by 28% between 1995 and 2003 [19]. In addition, cancer patients can account for up to 20% of VTE patients, so it is crucial to study the mechanism of VTE. Mauracher LM et al. [20] studied the mechanism of cancer-related VTE and found that biomarkers of neutrophil extracellular trap (NET) formation are related to the occurrence of VTE in cancer patients and may reflect a new pathogenic mechanism of cancer-related VTE. At present, although much work has been devoted to the molecular mechanism of VTE, the related research is still at an exploratory stage. Therefore, research on the molecular mechanisms of VTE can not only deepen the understanding of its pathogenesis but also provide new ideas for clinical treatment. Omics and bioinformatic algorithms, mostly network-oriented, are among the most effective ways to capture new pathways of cardiovascular disease [21,22]. On this basis, a molecular-level analysis of VTE was carried out using bioinformatics to provide new targets for the prevention and treatment of VTE.
The immune cell analysis in this study showed that monocytes and resting mast cells were significantly more abundant in VTE, while regulatory T cells were significantly less abundant. One study suggested that monocyte-related inflammatory mechanisms may be involved in the pathophysiological processes of VTE [23]. Monocytes, platelet aggregation, and C-reactive protein are associated with VTE [24], which supports the conclusion that VTE is associated with inflammation. GO enrichment analysis found that the DEGs are mainly involved in the composition of ribosomes in the cytoplasm. The occurrence of VTE is related to ribosomes, cytochrome C oxidase activity, and oxidoreductase activity on heme. It has been reported that the heme released by extensive erythrolysis in early thrombi is the main source of vascular endothelial cell oxidation, which can further lead to endothelial cell dysfunction and thrombosis [25,26]. Since heme is an inflammatory substance, the inflammatory process is increasingly recognized as an important mechanism regulating thrombosis [27]. In addition, heme can promote the transformation of LDL into cytotoxic oxidation products. Elevated oxidized LDL is a known risk factor for cardiovascular and cerebrovascular diseases, and it is also a common independent risk factor for VTE [28]. Thus, oxidoreductases acting on heme may play an important role in VTE.
KEGG pathway enrichment analysis showed that the DEGs are mainly involved in ribosomes, oxidative phosphorylation, Huntington's disease, Parkinson's disease, myocardial contraction and other processes. In addition, the 10 hub genes RPS29, RPL9, RPL27, RPS15A, RPS17, RPS27, RPS24, RPL30, RPL34 and RPL35 screened from the DEGs all belong to the ribosomal protein family, and they are all highly expressed in VTE. Studies have shown that cardiovascular and cerebrovascular diseases are important risk factors for VTE [29], and ribosomes can be used as targets for cardiovascular diseases [30]. Therefore, it can be speculated that there is a certain connection between ribosomes and VTE. In addition, ribosomes have been found to play an important role in the translation of platelet proteins [31], and platelets can promote the formation of NETs, which in turn promotes the formation of deep vein thrombosis [32]. This finding supports our speculation. However, there is little direct evidence linking ribosomal protein family genes to thrombosis.
Fig. 2 DEGs in the VTE and normal group. a Heatmap analysis of differential genes. The red and green represent the significantly upregulated and downregulated DEGs. b Volcano plot of the differentially expressed genes. These genes consist of 77 upregulated genes and 11 downregulated genes. The screening criterion was |log2 FC| > 1 and adjusted P < 0.05.
Inflammation is one of the initial responses of the immune system to a stimulus. Studies have found that immune dysfunction may be associated with the occurrence of VTE [33]. Recently, immunology has been used to explain some "unprovoked" VTE cases [34]. To further clarify the relationship between VTE and the immune system, we selected three immune differential genes from the DEGs, S100A12, NR1D2 and CD3G, which were significantly overexpressed in VTE. The calcium-binding protein S100A12 is an immune molecule present on the mucosal surface and is mainly secreted by neutrophils [35]. In addition, an increase in the number of neutrophils can increase the risk of VTE events [36]. A cross-sectional study of 550 hemodialysis patients showed that S100A12 protein levels were closely related to the prevalence of cardiovascular disease [37]. Therefore, S100A12 can be considered a biomarker for inflammatory diseases such as VTE. Studies have found that genetic defects of CD3G can lead to impaired expression of CD3 and TCR on the cell surface, so CD3G mutations are related to immune function defects [38]. CD3G gene polymorphism may affect the occurrence of liver cancer [39]. At present, most research on CD3G focuses on tumors, and there are few reports on vascular diseases and thrombosis. NR1D2, a nuclear hormone receptor gene, is associated with circadian rhythm, and circadian dysregulation widely affects Ras signaling pathways and T cell receptor signaling pathways and can then trigger various diseases such as tumors [40]. NR1D2 is also an important regulator of vascular inflammation and is related to the occurrence of cardiovascular events [41]. As VTE is a common complication of tumors, it may be closely related to the immune genes CD3G and NR1D2.
Compared with previous studies, this article focuses on the genetics and immunology of VTE and finds that ribosomal protein family genes and immune-related genes are closely related to VTE, which provides new ideas for further exploring the pathogenesis of VTE. This study also has some limitations. First, the chip data in this study come from a single-center study and are therefore of limited representativeness. Second, the molecular mechanism and therapeutic targets of VTE still need to be verified by a series of experiments.

Conclusions
In summary, this study used bioinformatics analysis to find that the occurrence of VTE is related to abnormal levels of monocytes and resting mast cells. Ribosomes play a vital role in the occurrence of VTE. The ribosomal protein family genes may be potential therapeutic targets for VTE. In addition, the immune genes S100A12, NR1D2 and CD3G may also be effective targets for the prevention and treatment of VTE, providing new ideas for the mechanism and treatment of VTE. Taken together, the occurrence of VTE is related to inflammation, abnormal gene expression and immune abnormalities.