Identification of potential crucial genes and key pathways in osteosarcoma

Background The aim of this study is to identify the potential pathogenic and metastasis-related differentially expressed genes (DEGs) in osteosarcoma through bioinformatic analysis of Gene Expression Omnibus (GEO) database. Results Gene expression profiles of GSE14359, GSE16088, and GSE33383, in total 112 osteosarcoma tissue samples and 7 osteoblasts, were analyzed. Seventy-four normal-primary DEGs (NPDEGs) and 764 primary-metastatic DEGs (PMDEGs) were screened. VAMP8, A2M, HLA-DRA, SPARCL1, HLA-DQA1, APOC1 and AQP1 were identified continuously upregulating during the oncogenesis and metastasis of osteosarcoma. The enriched functions and pathways of NPDEGs include procession and presentation of antigens, activation of MHC class II receptors and phagocytosis. The enriched functions and pathways of PMDEGs include mitotic nuclear division, cell adhesion molecules (CAMs) and focal adhesion. With protein-protein interaction (PPI) network analyzed by Molecular Complex Detection (MCODE) plug-in of Cytoscape software, one hub NPDEG (HLA-DRA) and 7 hub PMDEGs (CDK1, CDK20, CCNB1, MTIF2, MRPS7, VEGFA and EGF) were eventually selected, and the most significant pathways in NPDEGs module and PMDEGs module were enriched in the procession and presentation of exogenous peptide antigen via MHC class II and the nuclear division, respectively. Conclusions By integrated bioinformatic analysis, numerous DEGs related to osteosarcoma were screened, and the hub DEGs identified in this study are possibly part of the potential biomarkers for osteosarcoma. However, further experimental studies are still necessary to elucidate the biological function and mechanism of these genes.


Introduction
Osteosarcoma is the most prevalent primary bone malignancy and the 8th most frequent type of malignancy that disproportionally affects children and young adults [1]. In recent decades, the improvement in osteosarcoma's treatment (surgery and chemotherapy) has largely increased the long-term survival rate (approximately 60-70%) of children and young adult patients with osteosarcoma without distal metastasis [2,3]. However, the etiology remains unknown, and this discourages the prevention and early diagnosis of osteosarcoma. Therefore, it is extremely necessary to explore the mechanisms behind the occurrence and progression of osteosarcoma.
In recent years, the development in molecular biology has provided some new insights into the potential diagnostic and therapeutic biomarkers for osteosarcoma [4]. Genome-wide molecular profiling, which reveals molecular changes in tumorigenesis and progression, has been proved to be an efficient approach to identify key genes [5,6]. However, it requires considerable time and fund to obtain clinical biological samples and subsequently conduct high-throughput genetic detection and analysis. Cumulative studies in the past have shown that re-analyzing gene datasets of previous experiments from online databases is a feasible way to find out biologically and clinically relevant biomarkers (genes) [7][8][9][10][11], and that some of those biomarkers (genes) have even been found to play important roles in osteosarcoma. For instance, by conducting bioinformatics analysis on three datasets deposited in GEO database (GSE36001, GSE19276 and GSE16088), Pan Liu et al. [10] revealed that tumor protein p53 (TP53), mitogen-activated protein kinase 1 (MAPK1), estrogen receptor 1 (ESR1), notch homolog protein 3 (NOTCH3) and caspase 1 (CASP1) might potentially be important osteosarcomaassociated genes. Among them, mutant TP53 was subsequently reported to be associated with poor survival of osteosarcoma patients, because it can increase the cell proliferation, migration, and chemoresistance in osteosarcoma [12]; MAPK1 has been confirmed to be highly expressed in osteosarcoma cells, and can be downregulated by osteosarcoma related tumor suppressive miR-511 [13]. Based on this, regulation of MAPK1 receptor expression may be a novel approach to treat osteosarcoma. Not long ago, Notch3 overexpression also was confirmed to be associated with metastasis and poor prognosis in osteosarcoma patients [14]. All those examples suggest that bioinformatics analysis is a feasible approach to identify specific genes that may provide valuable clues for investigating the pathogenesis of osteosarcoma.
The current study aims to investigate the crucial genes and key pathways potentially involved in osteosarcoma tumorigenesis and development. To achieve this, we integrated bioinformatics analysis based on Gene Expression Omnibus (GEO) datasets. The data obtained indicate that some genes might continue to participate in the occurrence and metastasis of osteosarcoma.
The three gene expression profiles above were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) [18] for identification of DEGs. Detailed information of all datasets included is listed in Table 1.

Data preprocessing
The analysis of raw probe-level data (.CEL files) was performed using the robust multiarray average algorithm (RMA) in the Affy package of R [19]. After background correction and quantile normalization, the expression values were obtained. The averages of the probe set of values were calculated as the expression values for the same gene with multiple probe sets [20].

Identification of DEGs
Identification of DEGs was performed using the LIMMA package of R [21]. The adjusted P-values (adj P-value) were adopted to avoid the occurrence of false-positive results. Using the Benjamini-Hochberg method [22] via the multtest package in R, the FDR (that is, adjusted pvalue) < 0.05 and |log2fold-change (FC)| > 1 were used as the cut-off criteria, as previously reported [7][8][9][10][11]. Online tool EHBIO ImageGP (http://www.ehbio.com/ ImageGP) operated by EHBIO Gene Technology (Beijing) Co., Ltd. (Beijing, China) was applied to generate volcano plot and Venn diagram, respectively, for the visualization of the identified DEGs.

Functional enrichment analysis
GO (Gene Ontology) function and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analyses of the DEGs were performed using the clusterProfiler package of R [8]. The GO and KEGG terms with FDR < 0.05 were regarded as significant functions and pathways.

Protein-protein interaction network construction and module analysis
The Search Tool for the Retrieval of Interacting Genes (STRING; http://string.embl.de/) is a database of protein-protein interactions known and predicted (PPIs) [23]. Based on the STRING online tool, PPIs of the DEGs were constructed with a confidence score ≥ 0.7. Subsequently, the PPI network was visualized by means of Cytoscape software (version 3.7.2). Furthermore, Molecular Complex Detection (MCODE) [24] plug-in in Cytoscape software was applied to explore the significant modules in PPI network. The advanced options set as degree cutoff = 2, K-Core = 2, and Node Score Cutoff = 0.2. Given that it's hard to conduct enrichment analysis based on small gene sets with the clusterProfiler package of R, instead conducted was the GO function enrichment analysis of DEGs in each module using ClueGo [25] and CluePedia [26] plug-ins of Cytoscape software. The GO terms with FDR < 0.05 (Benjamini-Hochberg method) were regarded as significant functions.

Identification of DEGs between normal osteoblasts and osteosarcoma samples
According to the screening criteria, this study enrolled three datasets (Table 1) Table 1).

Functional enrichment analysis of DEGs between Normal tissue and osteosarcoma samples
To further investigate the biological functions of the 74 NPDEGs, GO and pathway analysis were performed using the clusterProfiler package of R. GO analysis (Supplementary Table S2) showed that the NPDEGs between osteosarcoma and normal tissue samples were clustered in 82 significant biological process (BP) categories. As shown in Fig. 2a (top ten BP categories), most were clustered in antigen procession and presentation. NPDEGs were clustered in 43 significant cellular component (CC) categories. As shown in Fig. 2b (top ten CC categories), the most significant CC category was MHC class II protein complex. DEGs were clustered in 4 significant molecular function (MF) categories. As shown in Fig. 2c, the most significant MF category was MHC class II receptor activity. KEGG analysis identified 28 significant pathways, such as phagosome and antigen procession and presentation (Fig. 2b, Supplementary Table S2).

Identification of DEGs between primary and metastatic osteosarcoma samples
Based on the screening criteria, only GSE14359 dataset was selected for identifying genes differentially expressed between primary and metastatic osteosarcoma samples. There were 764 primary-metastasis DEGs (PMDEGs, 309 up-regulated and 455 down-regulated) in GSE14359 ( Fig. 3a and Supplementary Table S3). Interestingly, seven overlapping up-regulated DEGs (VAMP8, A2M, HLA-DRA, SPARCL1, HLA-DQA1, APOC1 and AQP1) were identified between the NPDEGs and PMDEGs (Fig. 3b, Table 2), whereas there was none overlapping down-regulated DEG (Fig. 3c). This suggests that these Functional enrichment analysis of DEGs between primary and metastatic osteosarcoma samples GO analysis (Supplementary Table S4) showed the 764 PMDEGs were clustered in 162 significant BP categories. As shown in Fig. 4a (top ten BP categories), the most significant BP category was mitotic nuclear division.
PMDEGs were clustered in 57 significant cellular CC categories. As shown in Fig. 4b (top ten CC categories), the most significant CC category was extracellular matrix (ECM). PMDEGs were clustered in 16 significant molecular function (MF) categories. As shown in Fig. 4c (top ten MF categories), the most significant MF category was alpha-amylase activity. KEGG analysis identified 25 significant biological pathways, such as cell adhesion molecules (CAMs) and focal adhesion (Fig. 4d, Supplementary Table S4).

PPI (protein-protein interaction) network and module analysis
PPI Network was subsequently analyzed and proteins were selected based on a combined score ≥ 0.7 in STRI NG analysis. There were 49 nodes and 91 interactions among the 74 NPDEGs (Fig. 5a, Supplementary Table  S5). In addition, one significant module with a score = 5 was screened out via MCODE, and HLA-DRA was the hub gene with the highest degree of connectivity (Table 3). GO analysis with ClueGO showed that the most significant BP category in this module was enriched in the antigen processing and presentation of exogenous peptide antigen via MHC class II (Fig. 5b, Supplementary Table S6). There were 521 nodes and 2415 interactions among the 764 PMDEGs, and three significant modules with a score ≥ 10 were screened out via MCODE (Fig. 6a,  Supplementary Table S7). Module 1 (score = 32.5) included 36 genes, with CDK1, CDK20 and CCNB1 as hub nodes (Table 3). GO analysis with ClueGO showed that the most significant BP category in this module was enriched in the nuclear division (Fig. 6b, Supplementary  Table S8). Module 2 (score = 13.8) included 14 genes, with MTIF2 and MRPS7 as hub nodes ( Table 3). The most significant BP category was enriched in the mitochondrial translation (Fig. 6c, Supplementary Table  S8). Module 3 (score = 12) included 12 genes, with VEGFA and EG as hub nodes ( Table 3). The most significant BP category was enriched in the platelet degranulation (Fig. 6d, Supplementary Table S8).

Discussion
This study has gained some insights into gene expression modules in osteosarcoma at a genome-wide scale through analyzing three osteosarcoma datasets. A panel of 74 NPDEGs was identified as associated with osteosarcoma tumorigenesis; and 364 PMDEGs were identified as associated with the osteosarcoma metastasis. In addition, it was noticed that seven genes (VAMP8, A2M, HLA-DRA, SPARCL1, HLA-DQA1, APOC1 and AQP1) were continuous upregulating during the oncogenesis and metastasis of osteosarcoma, which suggested that these genes may act as oncogenes and be consistently involved in the pathophysiological process of osteosarcoma. This study further obtained major histocompatibility complex, class II, DR alpha (HLA-DRA) as the hub NPDEGs from the top module with MCODE score = 5, and 7 hub PMDEGs (CDK1, CDK20, CCNB1, MTIF2, MRPS7, VEGFA and EGF) from three top modules with MCODE score > 10. These may be pivotal genes involved in the pathophysiological process of osteosarcoma.
Among these 8 hub genes, HLA-DRA, which is correlated with the procession and presentation of peptide antigen via MHC class II, continued to be up-regulated during the osteosarcoma oncogenesis and metastasis. Apparently, HLA-DRA may have the "driving" function in osteosarcoma. It has been proved as a predictor for metastasis of osteosarcoma [27]. Although until now there is no report about the function and mechanism of HLA-DRA in osteosarcoma, previous studies have shown that HLA-DRA is involved in the evasion of the virus from the immune system [28] and Alzheimer's disease [29]. Pan Y et al. also listed HLA-DRA as one of the crucial genes in the regulatory network of osteosarcoma they constructed from the dataset GSE28424 [30]. These data indicate that the function of HLA-DRA in osteosarcoma is worthy of attention, especially on the topic of whether it plays a pathophysiological role in osteosarcoma through the process of antigen procession and presentation of peptide antigen via MHC class II. The function enrichment analysis results revealed that HLA class II alleles may be a main impactive factor in osteosarcoma. HLA-DQA1 is also an HLA class II variant that has been reported to be associated with the osteosarcoma risks [31]. Profound understanding of those genes' immunologic contribution to the etiology of osteosarcoma may be helpful for selecting rational therapeutic targets.
SPARCL1 is an ECM remodeling gene. It modulates extracellular calcium by binding to collagen I [32,33], which may reveal its potential role in osteosarcoma cell metastasis. Although Zhao SJ et al. [34] reported that SPARCL1 was downregulated in OS by epigenetic methylation of promoter DNA, and that SPARCL1 could suppress osteosarcoma metastasis and recruit macrophages by activation of canonical WNT/β-catenin signaling through stabilization of the WNT-receptor complex, this study, on the contrary, noticed that SPARCL1 continued to be upregulated during osteosarcoma development and metastasis. This contradiction is worthy of further confirmation by collecting clinical samples and expression analysis. Aquaporin 1 (AQP1) is a waterselective transporting protein in cell membranes, and it has been found to be overexpressed in various tumors and promote metastasis and neo-angiogenesis. AQP1 can promote osteosarcoma cell proliferation, adhesion, invasion and tumorigenesis by targeting TGF-β signaling pathway and focal adhesion genes [35], and recruit human BM-MSCs into the osteosarcoma   microenvironment [36]. These reports strongly support our current study's analysis result and confirm that AQP1 is an oncogene and metastasis promoter in osteosarcoma. Although seldom previous studies have revealed the expression and role of VAMP8, APOC1 and A2M in osteosarcoma, exploration in some other tumors has well proved that these genes are important tumorrelated regulatory factors [37][38][39]. However, due to the specificity of osteosarcoma in pathogenesis and signaling pathways involved, additional work is needed to extend the current observation and to clarify the potential causal mechanisms underlying the deregulation of these genes in osteosarcoma. With regards to these hub genes identified, cyclindependent kinase 1 (CDK1) and cyclin-dependent kinase 20 (CDK20) belong to serine/threonine protein kinase family. Cyclin B1 (CCNB1) is a pivotal protein responsible for the control of the cell cycle at the G2/M (mitosis) transition. All the three genes are involved in the cell cycle and growth. Reduction of CDK1 activities is crucial for the survival of osteosarcoma cells [40]. Overexpression of CCNB1 can facilitate the growth rate of osteosarcoma cells and increase their sensitivity to paclitaxel [41]. Several drugs were reported to inhibit cell proliferation or induce cell cycle arrest and apoptosis in human osteosarcoma by downregulating CCNB1 and CDK1 [42][43][44][45]. Both mitochondrial translational initiation factor 2 (MTIF2) and mitochondrial ribosomal protein S7 (MRPS7) are proteins implicated in mitochondrial translation. In this study, we have identified MRPS7 and MTIF2 as hub genes involved in the metastasis of osteosarcoma. Mitochondrial translation pathway plays essential roles in programmed cell death. The implication of mitochondria-mediated intrinsic pathway in human osteosarcoma has been observed [46], and inhibition of mitochondrial translation has been reported to be effective and selective in targeting osteosarcoma [47]. Therefore, protein synthesis involved in MRPS7 and MTIF2 within the mitochondrion might also have a potential connection with the development of osteosarcoma. Vascular endothelial growth factor A (VEGFA) is a classic angiogenic factor, which facilitates endothelial proliferation, migration and new vessel formation [48]. Currently, VEGFA has been reported to be very important in evaluating the angiogenesis in osteosarcoma [49]. Inhibition of VEGFA can successfully suppress osteosarcoma growth, metastasis and angiogenesis [50]. All these highlight its therapeutic value in osteosarcoma. Indeed, VEGFA pathway has been prioritized for the development of antiangiogenic therapies in osteosarcoma [51]. Epidermal growth factor (EGF) promotes cell epithelialmesenchymal transition, metastasis, and progression of osteosarcoma by activating MAPK and PI3K/AKT pathway, which can be blocked by the EGFR-specific inhibitor gefitinib [52]. Thus, EGF-targeting agents should be evaluated to prevent osteosarcoma from deteriorating.
Among the 74 NPDEGs identified, notable dysregulation of gene expression was observed clustered in immune related diseases, phagocytosis, antigen procession and presentation. Bone resorption is accomplished by osteoclasts, which can be seen as highly specialized macrophages [53]. Thus, bone microenvironment represents a unique compartment of the immune system, in which immunological cytokines form part of an intercellular crosstalk that is relevant to the development of osteosarcoma [54,55]. Osteosarcoma cells control the recruitment and differentiation of immune infiltrating cells and establish a local immune tolerant environment that is favorable to the tumor growth [56]. This is in agreement with the current demonstration that those NPDEGs in osteosarcoma are clustered in multiple immune diseases and T helper cells differentiation. Besides, osteoblasts can express major histocompatibility complex II (MHC class II) to present antigen [4]. Thus, deregulation of genes involved in antigen presentation may be an early event in osteosarcoma oncogenesis. MHC II is only expressed on the surface of antigen presenting cells (APC), such as macrophages, dendritic cells and B cells. APC presents exogenous peptides or endogenous peptides to helper T cells by binding MHC-II to peptides, and thus informs that the body is being invaded [57]. Previous studies have shown that osteosarcoma cells can express moderate to high levels of Herpes virus entry mediator on the tumor [58], and osteosarcoma cells can  [59]. Therefore, during the process of malignant transformation, osteosarcoma cells express some antigenic substances, which are recognized by APC and presented to helper T cells via MHC-II. In this way, APC helps to connect innate and adaptive immunity to tumor. These suggest that MHC-II mediates immune responses in the tumor microenvironment, thus it could be an alternative target for novel immune therapies and targeting antigen presentation may be clinically valuable in early intervention. Among the 764 PMDEGs, notable dysregulation of gene expression was observed in well-known metastatic related pathways including CAMs, Focal adhesion and ECM-receptor interaction. It was also found that the tumor necrosis factor (TNF) signaling pathway, which is always activated in human osteosarcoma cells [60], was significantly correlated to osteosarcoma metastasis. Hence, to abnormalize the function of the TNF signaling pathway might be a potential target for chemotherapy of advanced osteosarcoma [61]. Interestingly, cell cycle is also the key signal involved in osteosarcoma metastasis. Previous reports have revealed that cell cycle and apoptosis are two major dysregulated events in human malignancy cells [62]. Evolution of cancer is a complex process. Potentially oncogenic proliferative signals can couple to the induction of apoptosis, which restricts subsequent clonal expansion and neoplastic evolution. However, tumor progression occurs when these growthinhibitory mechanisms are thwarted by compensatory mutations. Deregulated cell proliferation and the obligate compensatory suppression of apoptosis provide a minimal 'platform' that is necessary to support further neoplastic progression, which in turn propels the tumor cell and its progeny into uncontrolled expansion and invasion [62].
The limitations of this study also should be recognized. First of all, when analyzing the DEGs, in view of the complexity of datasets in the study, it is impossible to consider all important factors-for example, different ages, races, regions, cell lineage as well as tumor stages and classification of patient. Secondly, according to the results, all the seven genes, which are continuously deregulating during the oncogenesis and metastasis of osteosarcoma, are actually up-regulated ones. Yet, the mechanism of upregulation has not been clear. Therefore, more evidences are required to find out the biological foundation. Finally, this study mainly focuses on analyzing the expression levels of genes involved in tumorigenesis and metastasis. Some of these genes have been reported as biomarkers for osteosarcoma, while the role of HLA-DRA, MTIF2, MRPS7 and CDK20, should also be further systematically investigated based on actual diseased tissues or even cell lines and animal models.
In conclusion, this study identified several DEGs that may be involved in the carcinogenesis and metastasis of osteosarcoma through comprehensive bioinformatics analyses, and unveiled a series of hub genes and pathways. However, further experimental studies are needed to elucidate the biological function and underlying mechanism of these genes in osteosarcoma.