Integrated identification of key genes and pathways in Alzheimer’s disease via comprehensive bioinformatical analyses

Background Alzheimer’s disease (AD) is known to be caused by multiple factors, meanwhile the pathogenic mechanism and development of AD associate closely with genetic factors. Existing understanding of the molecular mechanisms underlying AD remains incomplete. Methods Gene expression data (GSE48350) derived from post-modern brain was extracted from the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) were derived from hippocampus and entorhinal cortex regions between AD patients and healthy controls and detected via Morpheus. Functional enrichment analyses, including Gene Ontology (GO) and pathway analyses of DEGs, were performed via Cytoscape and followed by the construction of protein-protein interaction (PPI) network. Hub proteins were screened using the criteria: nodes degree≥10 (for hippocampus tissues) and ≥ 8 (for entorhinal cortex tissues). Molecular Complex Detection (MCODE) was used to filtrate the important clusters. University of California Santa Cruz (UCSC) and the database of RNA-binding protein specificities (RBPDB) were employed to identify the RNA-binding proteins of the long non-coding RNA (lncRNA). Results 251 & 74 genes were identified as DEGs, which consisted of 56 & 16 up-regulated genes and 195 & 58 down-regulated genes in hippocampus and entorhinal cortex, respectively. Biological analyses demonstrated that the biological processes and pathways related to memory, transmembrane transport, synaptic transmission, neuron survival, drug metabolism, ion homeostasis and signal transduction were enriched in these genes. 11 genes were identified as hub genes in hippocampus and entorhinal cortex, and 3 hub genes were identified as the novel candidates involved in the pathology of AD. Furthermore, 3 lncRNAs were filtrated, whose binding proteins were closely associated with AD. Conclusions Through GO enrichment analyses, pathway analyses and PPI analyses, we showed a comprehensive interpretation of the pathogenesis of AD at a systematic biology level, and 3 novel candidate genes and 3 lncRNAs were identified as novel and potential candidates participating in the pathology of AD. The results of this study could supply integrated insights for understanding the pathogenic mechanism underlying AD and potential novel therapeutic targets. Electronic supplementary material The online version of this article (10.1186/s41065-019-0101-0) contains supplementary material, which is available to authorized users.


Background
Alzheimer's disease (AD) is recognized as the most common neurodegenerative disease and a typical hippocampal amnesia, and also one of the dominating deadly disease affecting elderly population. The disease is characterized by the extracellular senile plaques formed by amyloid-β (Aβ) peptides, intracellular neurofibrillary tangles (NFTs), and also structure and function changes of brain regions related to memory [1][2][3]. It is well known that AD has complex multiple pathogenic factors, such as genetic factor, environmental factor, immunological factor, head injuries, depression, or hypertension [4][5][6][7][8]. Among these factors, genetic factors are estimated to attribute about 70% to the risk for AD [9]. Dominant mutations of genes encoding APP (amyloid precursor protein), PSEN1 (presenilin 1), and PSEN2 (presenilin 2), which enhanced generation and aggregation of Aβ, were included in the established genetic causes of AD [10]. However, APP, PSEN1 and PSEN2 are only partially accountable for the pathogenic mechanism of AD patients [11,12]. Besides, genetic analyses have demonstrated that, individual differences of AD could be resulted from multiple genes and their variants, which exert various biological functions in coordination to enhance the risk of the disease [13][14][15]. Except for identifying mechanisms involved in the AD pathogenesis, comprehensive analyses of potential candidate genes could suggest novel potential strategies to predictive or diagnostic test for AD.
Hub genes, regulatory transcription factors and micro-RNAs in the entorhinal cortex tissues of mid-stage AD cases have been identified via analyzing the database of GSE4757 and therapeutic targets or biomarkers of the AD were demonstrated in previous study [16]. Multiple methods were employed in the identification of potential molecules targets and drug candidates to AD, and hub genes like ZFHX3, ErbB2, ErbB4, OCT3, MIF, CDK13, GPI and so forth were found in the analyses of current datasets, such as GSE48350, GSE36980, GSE5281, and so forth [17][18][19][20][21][22][23]. Whereas, the remarkable and integrated details of key candidate genes and pathways related with the pathogenesis of AD are still incomplete. Furthermore, it is well documented that long noncoding RNAs (lncRNAs) play vital roles in the regulation of gene expression epigenetic, transcriptional, and posttranscriptional levels [24][25][26], and only several lncRNAs have been validated to be involved in the pathogenesis of AD [27][28][29][30]. Given that sufficiently illuminating human lncRNA-AD associations have great potential benefit to diagnosis, prevention, treatment, and prognosis of AD, it is an urgent task to find novel connections between lncRNA and AD.
In the present study, we implemented integrated analyses of genes involved in AD from the information filtration of the database, GSE48350 [31][32][33]. We employed Morpheus, an online tool, to identified differentially expressed genes (DEGs). Then biological enrichment analyses were conducted to detect the remarkable functional terms and analyzed the reciprocities among the biological pathways enriched by pathway analyses methods. Moreover, a protein network specific in AD was speculated and evaluated in the background of the human protein-protein interaction (PPI) network. 3 novel genes and 3 lncRNAs were differently expressed in the tissues of hippocampus and/or entorhinal cortex between AD patients and normal ones were identified as novel and potential candidates of AD pathology, and binding proteins of these lncRNAs closely associate with pathogenic mechanism of AD. The results of the present study should supply ponderable hints for understanding the pathogenesis molecular mechanisms of AD from a standpoint of systems biology.

Identification of DEGs
The gene expression profile and sample information of post-mortem brain tissue samples of AD patients and normal people of GSE48350 were obtained from National Center of Biotechnology Information-Gene Expression Omnibus (NCBI-GEO) and ArrayExpress database, respectively, which are free databases of microarray/gene profile and next-generation sequencing. There were total 253 samples in this database, including microarray data from normal controls and AD cases aged 20-99 years, from 4 brain regions: hippocampus, entorhinal cortex, superior frontal cortex, and postcentral gyrus. We analyzed the differences of gene expression between 18 AD samples (69-99 years old) and age-matched 24 normal samples of hippocampus tissues, and 15 AD samples (69-99 years old) and age-matched 17 normal samples of entorhinal cortex tissues in the present study (Additional file 1: Table S1). Employing Morpheus software and using p < 0.05 and |log2FC| ≥ 1 (FC, fold change) as cut-off criterion, 251 genes (56 upregulated and 195 down-regulated genes) and 74 genes (16 up-regulated and 58 down-regulated genes) were identified as DEGs in the AD samples compared with the normal ones in the tissues of hippocampus and entorhinal cortex, respectively (Table 1 and Additional file 2: Table S2).

Gene ontology (GO) enrichment analyses of DEGs
GO analyses for the DEGs after gene integration were performed via Cytoscape and its plugs, Cluego and Cluepedia. 86 of the 251 DEGs from hippocampus tissues were mapped to 34 different biological processes (Fig. 1a), of which prominent examples are memory 17.65%, response to anesthetic 14.71%, chemical synaptic transmission 11.76% and cellular potassium ion transport 11.76% and neuropilin signaling pathway 11.76% ( Fig. 1b and Additional file 3: Table S3). 54 of the 251 DEGs from hippocampus tissues were mapped to 23 different cellular components (Fig. 1c), of which prominent examples are integral component of synaptic membrane 39.13%, leading edge membrane 21.74% ( Fig. 1d and Additional file 4: Table S4). 44 of the 251 DEGs from hippocampus tissues were mapped to 19 different molecular functions (Fig. 1e), of which prominent examples are potassium ion transmembrane transporter activity 47.37%, dicarboxylic acid transmembrane transporter activity 15.79% ( Fig. 1f and Additional file 5: Table S5). 17 of the 74 DEGs from entorhinal cortex tissues were mapped to 20 different biological processes (Fig. 1g), of which prominent examples are sodium ion homeostasis 45%, positive regulation of muscle contraction 15% and endoderm formation 15% ( Fig. 1h and Additional file 6: Table S6).

Pathway enrichment analyses of DEGs
In all, pathway enrichment analyses of DEGs were classified by the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome and Wikipathway databases, respectively, using p < 0.05 as cut-off value. The different databases provided similar information with the majority of AD-related proteins acting in 19 major pathways (13 from hippocampus and 6 from entorhinal cortex), mainly related transmembrane transportation, drug reactions, synapses function, ion homeostasis, neurogenesis and signal transduction (Additional file 7: Table S7).

PPI network analyses and module analyses
Using the Search Tool for the Retrieval of Interacting Genes database (STRING) online database and Cytoscape software, total of 135 DEGs (26 up-regulated and 109 down-regulated genes) of the 251 commonly altered DEGs from hippocampus were screened into the DEGs PPI network, containing 135 nodes and 221 edges (Fig. 2a), and 116 of the 251 DEGs did not fall into the DEGs PPI network. Based on the STRING database, we made the PPI network of a total of 135 nodes and 221 protein pairs was obtained with a combined score > 0.4. As shown in Fig. 2a, the majority of the nodes in the network were down-regulated DEGs in AD samples. Among the 135 nodes, 9 central node genes were identified with the filtering of degree≥10 criteria (i.e., each node had more than 10 connections/interactions) as top 9 hub genes, which were CDC42, BDNF, TH, PDYN, VEGFA, CALB, CD44, TAC1 and CACNA1A (Fig. 2b). Figure 2b presents these 9 genes and their first neighbor genes.
In total, 2 modules (Modules 1 and 2) with score > 3 were detected by Cytoscape plug-in Molecular Complex 56 up-regulated genes and 195 down-regulated genes were included in the hippocampus tissues of AD patients' samples compared to normal hippocampus tissues; meanwhile, 16 up-regulated genes and 58 down-regulated genes were included in the entorhinal cortex tissues of AD patients' samples compared to normal entorhinal cortex tissues. (The up-regulated genes were listed from the largest to the smallest of fold changes, and down-regulated genes were listed from the smallest to largest of fold changes) Detection (MCODE) (Fig. 2c and d). KEGG pathway enrichment analyses showed that Module 1 consisted of 4 nodes and 5 edges, which are mainly associated with Endocytosis, Rap1 signaling pathway and Ras signaling pathway ( Table 2), and that Module 2 consisted of 10 nodes and 14 edges, which are mainly associated with neuroactive ligand-receptor interaction, and cocaine addiction (Table 3).
Similarly, total of 32 DEGs (6 up-regulated and 26 down-regulated genes) of the 74 commonly altered DEGs from entorhinal cortex were screened into the DEGs PPI network, containing 32 nodes and 41 edges (Fig. 3a), and 42 of the 74 DEGs did not fall into the DEGs PPI network. As shown in Fig. 3a, the majority of the nodes in the network were down-regulated DEGs in AD samples. Among the 32 nodes, 2 central node genes were identified with the filtering of degree≥8 criteria (i.e., each node had more than 8 connections/interactions) as top 2 hub genes, which were OXT and TAC1 (Fig. 3b). Figure 3b presents these 2 genes and their first neighbor genes. In total, 1 module (Modules 3) with score > 3 was detected by MCODE (Fig. 3c). KEGG pathway enrichment analyses showed that Module 3 consisted of 5 nodes and 10 edges, which are mainly associated with signal transduction (including multiple receptors) ( Table 4).

Identification of lncRNAs and analysis of binding proteins
21 mutual DEGs were discovered through the tool of venn by employing Funrich software, and among which 1 lncRNA, linc00622, was identified as differently expressed both in the tissues of hippocampus and entorhinal cortex between AD patients and normal controls (Fig. 4a). Figure 4b and c show the relative expressed values of linc00622 in the tissues of hippocampus and entorhinal cortex. Linc00282 and linc00960 were differently expressed in the tissues of hippocampus and entorhinal cortex, respectively. After using the online tools, University of Califorina Santa Cruz (UCSC) and the Database of RNA-binding protein specificities (RBPDB), we obtained the RNA-binding proteins lists of linc00662, linc00282 and linc00960. Then, 14 mutual RNA-binding proteins were found to be shared by these 3 lncRNAs via the tool of venn (Fig. 4d). The biofunctions of these RNA-binding proteins were summarized in the Table 5.

Discussion
In general, AD is a genetically complex neurodegenerative disease and is characterized by the presence of extracellular deposition of senile plaques, intracellular NFTs and loss of neuron and synapses [50,51]. AD affects patients' living quality and is detrimental to their  life, and further imposes a considerable burden on their families and the whole society. However, there are rare effective therapies for AD patients nowadays; it is urgent to develop novel perspectives to improve treatment outcomes [52]. We selected GSE48350 database, which contains microarray data from AD cases (aged 20-99 years) and age matched normal controls, from 4 brain regions: hippocampus, entorhinal cortex, superior frontal cortex, and post-central gyrus. Previous study has demonstrated that frontal cortical dysfunction contributed a significant extent to cognitive deficits and memory loss, which was considered as the late characteristic of AD [53]; meanwhile, AD has been widely considered as an early amnesic syndrome of hippocampal type, which on behalf of the most significant clinic feature for the diagnosis of AD [54][55][56], and we believed that the data between agematched AD patients and normal controls were more convictive. Besides, entorhinal cortex is also considered as a vital brain region in characterizing AD, and aberrant changes of entorhinal cortex happen before hippocampus in the pathological mechanism of AD [57][58][59]. Therefore, we analyzed the data from hippocampus and entorhinal cortex tissues between aged AD cases from 69 to 99 years and age matched normal controls. We employed several types of tools to recognize critical molecular terms and mechanisms involved in AD.
We firstly used Morpheus to filtrate the DEGs from post-mortem samples between AD patients and normal controls using p < 0.05 and |log2FC| ≥ 1 as the criteria, and then we obtained 251 DEGs including 56 upregulated genes and 195 down-regulated genes in hippocampus tissues, and 74 DEGs including 16 up-regulated genes and 58 down-regulated genes in entorhinal cortex tissues.
The overrepresented biological processes, cellular components and molecular functions obtained from GO analyses of DEGs from hippocampus and entorhinal cortex tissues may give valuable information about the pathogenic molecular mechanisms of AD. Among the GO terms overrepresented in AD patients, those related to memory-related processes, drug reactions, transmembrane transportation, synaptic transmission and ion homeostasis were included. These results were in accordance with previous studies that complex interrelationships of synaptic depression, cognitive impairment, aberrant drug metabolism, and imbalance of ion homeostasis existed among the nosetiology and development processes of AD [60][61][62][63][64][65]. The multiformity in the biological process of genes involved in AD indicated the complicacy of the disease. Recent studies convinced that ion channels are well-known to be involved in AD pathophysiology, especially potassium ion channel, and is emerging as a new target candidate for AD [66,67]. Pathway enrichment analyses of DEGs from hippocampus and entorhinal cortex tissues were classified by the Reactome, KEGG and/or Wikipathway databases, respectively. The different databases provided similar information with the majority of AD-related proteins acting in 19 major pathways (13 from hippocampus and 6 from entorhinal cortex), mainly related transmembrane transportation, drug reactions, synapses function, ion homeostasis, neurogenesis and signal transduction, which were consistent with the results of GO analyses of  these DEGs and previous works focused on the aetiology of AD [60][61][62][63][64][65]68]. Besides, in the pathway analyses of 3 modules (2 for DEGs from hippocampus and 1 for entorhinal cortex), it was indicated that multiple signal transduction pathways were involved in the pathological mechanism of AD. From the results of functional enrichment analyses of DEGs, it can be concluded that synaptic depression, cognitive impairment, aberrant drug metabolism, and imbalance of ion homeostasis participated in the pathology of AD and signaling pathways that regulate these biological phenomena would be the efficient treatment targets for AD.
Among 11 hub genes (9 from hippocampus and 2 from entorhinal cortex) obtained from PPI network analyses in this study, several genes involved in the regulation of cell survival and cell growth, such as CDC42 and VEGFA; several genes involved in the memory, learning, and cognitive functions, such as BDNF, PDYN, CALB, TH, CACNA1A, and OXT; several genes involved in the immune and neuroprotective functions, such as CD44 and TAC1. Detailed information of these genes is as seen-shown in Table 6. Specially, as far as we know, CALB, CACNA1A and OXT were identified as the hub participants in the pathological mechanism of AD for the first time in this study.
Given multiple biofunctions of human lncRNA, the associations between lncRNA and AD have great potential benefits to understanding the cause of AD. So far, only several lncRNAs, such as BACe1-AS [27], 51A [28], 17A [29], BC200 [95] and so on, have been validated to be involved in the pathogenesis of AD. Identifying potential diagnostic lncRNA biomarkers by employing computational methods is promising in the biomarker filtration for AD. In the present study, among 21 mutual DEGs of hippocampus and entorhinal cortex tissues, 1 lncRNA, linc00622 was identified as differently expressed both in the tissues of hippocampus and entorhinal cortex. Besides, linc00282 and linc00960 were differently expressed in the tissues of hippocampus and entorhinal cortex, respectively. 14 mutual RNA-binding proteins were proved to have close relationship with paroxysm and development of AD. Therefore, linc00622, linc00282 and linc00960 could be considered as novel potential candidates participating in the pathological mechanism of AD.

Conclusions
Combined the results of comprehensive and systematic analyses focusing on the biological functions and interactions of the genes extracted from GSE48350 genome database of AD patients and normal controls, genes mainly related the biological functions of memory, synapse, neuron survival, drug metabolism, ion homeostasis and signal transduction were differently expressed in the hippocampus and entorhinal cortex tissues of AD patients aged from 69 to 99 years and age matched normal controls. Our study should shed some light toward a better understanding of the underlying molecular mechanisms and crucial molecular players of AD, and provide a new viewpoint for researchers with target the cause of the disease, and also these understandings need to be further validated by experiments in the future.

Microarray data extraction and identification of DEGs
The network-based analyses of AD began with the authentication of microarray gene expression dataset. We downloaded the gene expression profile and sample information of GSE48350 from the public availability repository GEO  [97]. GSE48350 contains postmortem brain tissue samples from diseased (patients with AD) and control (normal) conditions. Hippocampus region of brain plays significant role in memory formation, which is necessary in diagnostics since loss of memory and cognitive competence and disorientation are the early signs of AD [56]. In this study, we analyzed 18 AD samples of hippocampus region of post-mortem brain aged from 69 to 99 (mean age 84.33 ± 6.56 years) and 24 age-matched control samples of hippocampus region of post-mortem brain (mean age 82.71 ± 9.47 years), and 15 AD samples of entorhinal cortex region aged from 69 to 99 (mean age 86.47 ± 5.46 years) and 17 age-matched control samples of entorhinal cortex region (mean age 81.65 ± 9.76 years). Morpheus (https://software.broadinstitute.org/morpheus/ ) [98] online tool allows researchers to carry out of GEO data to identify DEGs. A gene is defined as a DEG between the patients' samples and the normal control samples when the p-value is < 0.05 and the FC is at least 2 times higher or lower (|log2FC| ≥ 1).

Functional enrichment analyses for DEGs
Executing functional enrichment analyses for DEGs gives a functional overview of the DEGs through computing the whole conspicuousness of the gene expression. GO comprises biological process, cellular component, and molecular function, providing biological functional interpretation of large lists of genes screened from genomic studies such as microarray and proteomics experiments [99,100]. KEGG is an encyclopedical database resource consisting of graphical diagrams of biochemical pathways for functional gene and molecules to be integrally analyzed [101,102]. Reactome, an online bioinformatics resource of pathway information, supplies integrated analysis of the biologic reaction network [103,104]. WikiPathway provides a database in a curated, machine readable way to analyze and visualize data [105]. Pathway analyses of KEGG, Reactome and Wikipathway were employed to illuminate how DEGs perform function through a certain path.
We selected Functional Enrichment analysis tool (Cytoscape v3.7.0), which is an autocephalous software tool employed mainly for functional enrichment and interaction network analyses of genes and proteins. Cytoscape is open source software who can integrate interaction networks of high-throughput expression data and other molecular states of genes and proteins into a unitive conceptual framework [106]. This software has been widely employed by researchers to study biological domains, the genome, proteome and metabonomics [107][108][109][110]. The functional enrichment analyses for the up-regulated and down-regulated DEGs and pathways were performed via Cytoscape and its plugins, ClueGO v2.5.3 and Cluepedia v1.5.3, using p < 0.05 as the selected criterion in the present study.

Construction of PPI network for DEGs and recognition of hub proteins
PPI was employed to analyze the interrelationship among DEGs, and further illustrate the models of genes which play significant roles in physiological and pathological status. The STRING database (https://www.string-db.org/) [111] supplies information about the predicted and experimental interrelationships of proteins, and helps to assess and integrate PPI, including direct (physical) and indirect (functional) correlations [112,113]. In this study, the DEGs were mapped into PPI using STRING database neurotrophic and inflammation related functions [90,91] v10.5. Then, Cytoscape software was employed to visualize the PPI network. The network module was one of the peculiarities of the protein network and contains peculiar biological importance. The MCODE (v1.5.1) was employed to identify remarkable modules in this PPI network. Degree cutoff = 2, Node Score Cutoff = 0.2, and K-Core = 2 were set as the advanced settings. MCODE was applied to filtrate hub proteins within the PPI network. At last, the enrichment analyses of the DEGs in different modules were also conducted by the STRING database.

Identification of lncRNAs and binding proteins prediction
Mutual DEGs were discovered through the tool of venn by employing Funrich (3.1.3) software. The relative expressed values of linc00622 in the tissues of hippocampus and entorhinal cortex were analyzed by Graphpad Prism (7.0) software. Online tools, UCSC (https://genome.ucsc.edu/index. html) [114,115] and RBPDB (http://rbpdb.ccbr.utoronto. ca/index.php) [116,117] were used to obtain the RNAbinding proteins lists of linc00662, linc00282 and linc00960. Then, mutual RNA-binding proteins shared by these 3 lncRNAs were found via the tool of venn.

Additional files
Additional file 1