- Open Access
Contrasting patterns of nucleotide polymorphism suggest different selective regimes within different parts of the PgiC1 gene in Festuca ovina L.
Hereditasvolume 154, Article number: 11 (2017)
Phosphoglucose isomerase (PGI, EC 188.8.131.52) is an essential metabolic enzyme in all eukaryotes. An earlier study of the PgiC1 gene, which encodes cytosolic PGI in the grass Festuca ovina L., revealed a marked difference in the levels of nucleotide polymorphism between the 5’ and 3’ portions of the gene.
In the present study, we characterized the sequence polymorphism in F. ovina PgiC1 in more detail and examined possible explanations for the non-uniform pattern of nucleotide polymorphism across the gene.
Our study confirms that the two portions of the PgiC1 gene show substantially different levels of DNA polymorphism and also suggests that the peptide encoded by the 3’ portion of PgiC1 is functionally and structurally more important than that encoded by the 5’ portion. Although there was some evidence of purifying selection (d N/d S test) on the 5’ portion of the gene, the signature of purifying selection was considerably stronger on the 3’ portion of the gene (d N/d S and McDonald–Kreitman tests). Several tests support the action of balancing selection within the 5’ portion of the gene. Wall’s B and Q tests were significant only for the 5’ portion of the gene. There were also marked peaks of nucleotide diversity, Tajima’s D and the d N/d S ratio at or around a PgiC1 codon site (within the 5’ portion of the gene) that a previous study had suggested was subject to positive diversifying selection.
Our results suggest that the two portions of the gene have been subject to different selective regimes. Purifying selection appears to have been the main force contributing to the relatively low level of polymorphism within the 3’ portion of the sequence. In contrast, it is possible that balancing selection has contributed to the maintenance of the polymorphism within the 5’ portion of the gene.
Levels of nucleotide polymorphism have been shown to vary greatly between different parts of the genome (e.g. [1,2,3]), and there may also be variation in the levels of polymorphism within individual genes (e.g. [4,5,6,7]). A non-uniform pattern of nucleotide polymorphism within genes may arise if different types of selective pressure are operating on different regions of the gene (cf. [8, 9]). Different regions of a gene may code for peptides that have different structural or functional significances, and the regions of a gene with more stringent structural and/or functional requirements are expected to be subject to stronger purifying selection  and, therefore, tend to show lower levels of nucleotide polymorphism than regions that are subject to less stringent constraints [11, 12]. Positive directional selection may reduce the levels of local nucleotide polymorphism within a gene , while balancing selection may increase the levels of nucleotide polymorphism at, and in the vicinity of, the selected sites [13, 14]. A classic example of a case where selection results in non-uniform levels of nucleotide polymorphism between different gene regions is that of the major histocompatibility complex (MHC) genes. These genes are crucial for the ability of a vertebrate host’s immune system to detect evolving pathogens, and it is frequently suggested that the maintenance of the high levels of non-synonymous polymorphism in the MHC gene regions encoding the antigen binding site is a reflection of pathogen-driven balancing selection [15, 16]. In addition to selective processes, varying rates of recombination and mutation, as well as stochastic processes, may also contribute to non-uniform levels of nucleotide polymorphism between different regions of a gene (cf. [11, 17]).
The PgiC1 gene, which encodes the cytosolic version of the metabolic enzyme phosphoglucose isomerase (PGI, EC 184.108.40.206), in the grass Festuca ovina L., represents one of the few reported cases in which the levels of nucleotide polymorphism differ substantially between the 3’ and 5’ portions of a gene . PGI catalyses the second step of glycolysis , and is also known to have diverse moonlighting functions (see the references in ). The functional PGI enzyme is formed by two monomers, with each monomer being composed of two main domains (the “small domain” and the “large domain”) [21, 22] (Fig. 1-a). High levels of allozyme/isozyme variation have been frequently reported for PGI in many different species . Observed differences in enzyme activity between PGI variants in a number of species are consistent with observed associations between the PGI variation and environmental variables or life-history traits – suggesting that the loci coding for PGI may be under selection (e.g. [24, 25]).
Festuca ovina is a perennial, tussock-forming and outcrossing grass, with wind-dispersed pollen and seeds . The species has a broad ecological amplitude and is widespread in unfertilized grasslands in northern Europe (e.g. [26, 27]). The steppe-like “alvar” grasslands on the Baltic island of Öland (Sweden) are characterized by a fine-scale edaphic mosaic, with moist and dry, and high and low pH microhabitats. Earlier studies suggest that cytosolic PGI isozyme variation in F. ovina may be involved in fine-scale microhabitat adaptation on Öland [26, 28, 29]. Analysis of replicated samples from different alvar sites shows that, despite the fact that F. ovina is strongly outcrossing, the frequencies of different cytosolic PGI isozyme electromorphs are significantly associated with microhabitat variation in the alvar grasslands and that electromorph frequencies change in response to experimental habitat manipulation [26, 28].
In F. ovina, cytosolic PGI is coded for by two loci, PgiC1 and PgiC2 . The “native” PgiC1 locus is present in all F. ovina individuals, whereas PgiC2 is only present in some individuals and appears to have been horizontally acquired from a distantly related grass genus [29,32,, 31–33]. Earlier analyses of the PgiC1 gene in F. ovina suggest that two PgiC1 amino acid codon sites may be affected by positive selection , and SNP (single nucleotide polymorphism) alleles at these two codon sites show significant associations with microhabitat variables in the alvar grasslands (Y Li, B Hansson, M Lönn, HC Prentice, unpublished results).
The uneven distribution of polymorphic nucleotide sites along the PgiC1 gene was noted in an earlier study that included five PgiC1 coding sequences from Skåne, S Sweden . The longest intron (intron 12, Fig. 1-b) was used as a demarcation point between the polymorphic 5’ portion of the gene and the, substantially less polymorphic, 3’ portion of the gene. The aim of the present study was to investigate the possible evolutionary mechanisms that may have contributed to the contrasting levels of nucleotide polymorphism in the two portions of the PgiC1 gene in F. ovina. We analysed the levels of PgiC1 nucleotide polymorphism within a larger dataset (29 PgiC1 cDNA sequences) from F. ovina individuals collected from the alvar grasslands on Öland, and carried out a range of tests to assess the relative importance of different types of selection that may have contributed to the non-uniform pattern of nucleotide polymorphism within PgiC1.
The 3’ portion of PgiC1 in F. ovina encodes the structurally important large domain and three functionally essential active site residues (Figs. 1-a and 2). The extensive inter-monomer interaction between the large domains of the two monomers is necessary for the formation of a stable PGI dimer  and the three active site residues (equivalent to Glu360, His391 and Lys516 in F. ovina) participate directly in the isomerization reaction of PGI . If the 3’ portion of PgiC1 codes for products that are subject to greater structural or functional constraints than the products of the 5’ portion of the gene, then a relatively stronger level of purifying selection (i.e. negative selection) may be expected to have contributed to the low level of nucleotide polymorphism within the 3’ portion of PgiC1 in F. ovina. The 5’ portion of PgiC1 contains the amino acid codon sites 173 and 200 (Fig. 2). If these sites are under balancing selection (i.e. positive intraspecific diversifying selection), as suggested by , then balancing selection targeting the two sites might be expected to contribute to the high level of nucleotide polymorphism within the 5’ portion of PgiC1. The present study provides support for the prediction that there is a stronger purifying selection on the 3’ portion than on the 5’ portion of PgiC1, and suggests that there is balancing selection on the 5’ portion of the gene.
Plant material and sequences
The present study examined variation within 29 PgiC1 cDNA sequences (GenBank accession numbers KF487737-KF487765, ) from Öland populations of F. ovina. The sequences were derived from 15 individuals that were chosen to represent the five cytosolic PGI electromorphs (EMs 1, 2, 4, 5 and 6) that occur most frequently within populations of F. ovina on Öland [26, 28, 29] – with a particular focus on the two most common electromorphs, EM 1 and EM 2 . The sequences were obtained by, first, synthesizing the total cDNA from the total RNA of each studied F. ovina individual . The PgiC1 cDNA was then PCR-amplified from the synthesized total cDNA, and the amplified PgiC1 cDNA was cloned and sequenced . Two PgiC1 cDNA alleles were acquired from each (diploid) individual, giving a total of 30 alleles from the 15 studied individuals . However, one of the alleles (GenBank accession number KF487766) contained an aberrant (113 bp) insertion  and was excluded from the analyses in the present study, unless specified.
Each of the 29 analysed PgiC1 cDNA sequences covers 96% (1 633 bp) of the full-length (1701 bp, excluding stop codon) PgiC1 coding sequence, and ranges from exon 1 to exon 22 (Fig. 1-b). For comparative purposes, we also downloaded the five Skåne F. ovina PgiC1 coding sequences (GenBank accession numbers DQ225731-DQ225735) which were examined in the earlier study that noted the difference in the levels of polymorphism between the 5’ and 3’ portions of PgiC1 . These five Skåne sequences represent the common cytosolic PGI isozyme electromorphs 1, 2 and 6, as well as the rare electromorph 8 [26, 28]. Each of the five sequences covers 1 182 bp, out of the full-length PgiC1 coding sequence , and ranges from exons 5 to 11 and from exons 13 to 21.
Analysis of sequence data
The number of polymorphic sites (S), nucleotide diversity (π; ), Watterson’s estimator of the population mutation rate (θ W; ), and the number of haplotypes (N h) were calculated using DnaSP v. 5.10.01 . All the statistics were calculated, separately, for the two portions of the five Skåne PgiC1 sequences (5’: exons 5-11; 3’: exons 13-21) and of the 29 Öland (5’: exons 1-12; 3’: exons 13-22) PgiC1 sequences. Total nucleotide diversity (π T) was estimated separately for each of the 22 studied PgiC1 exons for the 29 Öland PgiC1 sequences. Sliding window analysis of π T was also carried out for the 29 Öland PgiC1 sequences using DnaSP v. 5.10.01 (window length: 100 bp; step size: 10 bp). The remaining analyses only considered the 29 Öland PgiC1 sequences, unless specified.
The level of recombination was estimated (as the minimum number of recombination events, R M, using the method of Hudson RR and Kaplan NL  as implemented in DnaSP v. 5.10.01) for each of the two PgiC1 gene portions in the 29 Öland sequences. The level of recombination was also estimated as the population recombination rate (ρ = 4N e r, where Ne is the effective population size and r is the per-generation per-site recombination rate ), using the program omegaMap . We used the same procedure as in  to run omegaMap, but used a sliding window of 10 codons to estimate ρ in the present study. The level of linkage disequilibrium (LD) within PgiC1 was estimated using r 2 statistics , calculated between all pairs of polymorphic sites (excluding a single site that segregates with more than two nucleotides ), using Haploview v. 4.2 . The genotypes of the 15 studied Öland individuals , at each of the analyzed polymorphic sites, were used as input to the Haploview analyses. In order to generate a complete set of genotype data for the 15 individuals, as required for the Haploview analyses, we included the additional sequence (GenBank accession number KF487766) – acquired from the 15 F. ovina individuals but containing an aberrant insertion that may have resulted from incomplete splicing of the PgiC1 precursor mRNA . This insertion was removed for the Haploview analyses.
The dN/dS ratio (ω) (dN = non-synonymous substitution rate; dS = synonymous substitution rate) was estimated, together with ρ (using omegaMap), for each amino acid codon translated from the PgiC1 sequence. The estimated ω value was used to examine whether purifying (i.e. negative selection, ω < 1) or balancing (i.e. positive intraspecific diversifying) selection (ω > 1) may have contributed to the amounts and patterning of sequence variation within and between the two gene portions for the 29 studied PgiC1 sequences (cf. ). Sliding window analysis of ω was also carried out (manually, on the basis of the results from OmegaMap) with a window length of 99 and a step size of 12.
Neutrality tests, including the Hudson-Kreitman-Aguadé (HKA) test , Tajima’s D test , Fay and Wu’s H test , MacDonald and Kreitman’s (MK) test  and Wall’s B and Q tests  were also used to examine whether selection may have contributed to the amounts and patterning of sequence variation within each of the two PgiC1 gene portions in the 29 sequences. All the neutrality tests were carried out using DnaSP v. 5.10.01. The significances of the Tajima’s D, Fay and Wu’s H, and Wall’s B and Q were conservatively estimated (without allowing for recombination), using 10 000 coalescent simulations and on the basis of the observed number of segregating sites. A single PgiC cDNA sequence from F. altissima (GenBank accession number DQ225740), encoding cytosolic PGI, was used as the outgroup for the HKA test, Fay and Wu’s H test and the MK test. We used the HKA test to compare the 5’- and 3’-portions of PgiC1 in terms of intraspecific polymorphism (within F. ovina) and interspecific divergence (between F. ovina and F. altissima). In the MK test, we compared the ratio of D N/D S (D N or D S = the number of fixed non-synonymous [for D N] or synonymous [for D S] substitutions per gene between F. ovina and F. altissima), with the ratio of P N/P S (P N or P S = the number of non-synonymous [for P N] or synonymous [for P S] polymorphic sites per gene within F. ovina). The degree of synonymous divergence (K S [JC]: Jukes-Cantor corrected number of synonymous substitutions per synonymous site) between the outgroup PgiC sequence and the 29 F. ovina PgiC1 sequences was 0.240. Sliding window analyses were also carried out, respectively, for Tajima’s D, ω, Fay and Wu’s H, as well as for Ka/Ks (the number of nonsynonymous substitutions per nonsynonymous site/the number of sysnonymous substitutions per synonymous site), with a window length of 100 bp and a step size of 10 bp using DnaSP v. 5.10.01.
Analysis of the evolutionary conservation of amino acid sites
The degree of evolutionary conservation at each of the respective PGI amino acid sites corresponding to the PgiC1 translated amino acid sites was estimated, on the basis of the phylogenetic relationships among a large set of homologous sequences, from a wide range of different species, using the online application ConSurf Server . The database UniRef90  was searched for sequences that were homologous with the F. ovina PgiC1 input sequence, using CSI-BLAST  (cutoff E-value = 0.0001; number of interactions = 3; maximum homologs to collect = 150). Within CSI-BLAST, redundant sequences were filtered out by clustering blast hits with a sequence identity of 95% or more and only using one representative of each cluster in the analysis. BLAST hits that shared a sequence identity of less than 35% with the input sequence were ignored. A multiple-species alignment (Additional file 1: File S1) of the acquired homologous sequences was constructed using MAFFT [51, 52], and this alignment was then used to build a phylogenetic tree using the neighbour-joining algorithm as implemented in the Rate4Site program . The level of evolutionary conservation was then estimated, as a conservation score for each amino acid site using an empirical Bayesian algorithm  implemented in the ConSurf Server. The lower the conservation score, the more evolutionarily conserved are the amino acid residues at that specific site. The translated amino acid sequence from a PgiC1 sequence (GenBank accession number KF487738), representing the most common PgiC1 sequence in F. ovina, was used as the input to the ConSurf Server.
The analyses of the five Skåne PgiC1 coding sequences confirmed the earlier finding  that the 5’ portion of the gene (L [sequence length] = 570, S = 30, N h = 5, π = 0.025, θ W = 0.025) was considerably more polymorphic than the 3’ portion (L = 612, S = 3, N h = 4, π = 0.002, θ W = 0.002). Also in agreement with the observed pattern of sequence polymorphism within the five Skåne PgiC1 sequences, the analyses of the 29 Öland PgiC1 cDNA sequences showed that the 5’ portion of the sequence (π T = 0.019, θ W = 0.020) was substantially more polymorphic than the 3’ portion of the sequence (π T = 0.004, θ W = 0.007) (Fig. 2, Table 1). There was a significant difference between the π T values for each of the 12 exons (exons 1–12) within the 5’ portion of the PgiC1 gene in the 29 sequences, and those for each of the 10 exons (exons 13–22) within the 3’ portion of the 29 sequences (Wilcoxon rank sum test; W = 96, P = 0.019, Additional file 2: Table S1). The sliding window analyses of π T showed that the PgiC1 codon site 200 (under positive diversifying selection ) was within the highest peak of π T at exon 8 within the 5’ portion of the gene (Fig. 2).
Recombination and linkage disequilibrium
The analyses of the 29 PgiC1 sequences from Öland revealed a high overall level of recombination (R M = 22; ρ = 0.217, Additional file 3: Table S2). The level of recombination in the highly polymorphic 5’ portion (R M = 20; ρ = 0.383) was substantially higher than that for the less variable 3’ portion of the gene (R M = 1; ρ = 0.026) (Table 1 and Fig. 3_a). However, the matrix of r 2 values (Fig. 3_b) shows that there is a low level of LD throughout the entire PgiC1 gene, with no “strong LD” blocks (cf. ).
The ω and neutrality tests
The fact that the overall ω (dN/dS ratio) value for the entire PgiC1 sequence was substantially lower than 1 (mean ω over all the studied amino acid codons = 0.209, Additional file 3: Table S2) indicates that purifying selection has acted on the overall sequence. Purifying selection is also indicated for both the 5’ portion (291 amino acid codons; average ω = 0.280) and the 3’ portion (253 amino acid codons; average ω = 0.128) of the sequence. The fact that codons within the 3’ portion of the PgiC1 sequence had a significantly lower ω than those in the 5’ portion (Wilcoxon rank sum test; W = 42229, P = 0.003) suggests that the 3’ portion may be under stronger purifying selection than the 5’ portion. The sliding window analyses of ω showed a marked plateau of ω values (between exons 8-9 within the 5’ portion) around the PgiC1 codon site 200 (Fig. 2), that a previous study had suggested was subject to positive diversifying selection . There were 40 segregating sites out of the total of 570 nucleotide sites included in the analysis of the 5’ portion, whereas there were only 14 segregating sites out of 612 sites in the 3’ portion (Additional file 4: Table S3). The levels of interspecific DNA divergence between the two gene portions were similar: 29.6 nucleotide differences among 570 sites for the 5’ portion, and 35.8 nucleotide differences among 612 sites for the 3’ portion (Additional file 4: Table S3). However the HKA test didn’t reject the null neutral model (P = 0.064, Additional file 4: Table S3).
Wall’s B and Q values were significant for the 5’ portion of the PgiC1 sequence (Table 1), indicating an excess of LD between adjacent segregating sites, which may reflect balancing selection (cf. [47, 56]). Wall’s B and Q values were non-significant for the 3’portion of the PgiC1 sequence (Table 1). There was a negative, but non-significant (P = 0.064, Table 1) Tajima’s D for the 3’ portion of the PgiC1 sequence, while the Tajima’s D for the 5’ portion was near zero (D = -0.195, Table 1). The sliding window analyses of the Tajima’s D showed that the highest peak of D was located at the PgiC1 codon site 200 (exon 8) (Fig. 2) which was identified as a potential target of positive diversifying selection in an earlier study . The D value at this peak (D = 1.439) was near-significant (P = 0.060; post hoc significance test without correction for multiple tests). The Fay & Wu’s H values were non-significant for both the 5’ (P = 0.669) and the 3’ (P = 0.148) portions of the PgiC1 sequence (Table 1). The sliding window analysis of Fay and Wu’s H showed that the deepest valley of H was at exon 16 (within the 3’ portion), near which a striking valley of Tajima’s D was also observed (Additional file 5: Figure S1). Both D and H had negative values in these valleys (Additional file 5: Figure S1), and according to the post hoc tests of significance (not subject to multiple test correction), the H was significant (H = -1.862, P = 0.006), while D was near-significant (D = -1.509, P = 0.095).
The MK test rejected the neutral null model for the 3’ portion of the PgiC1 sequence (Fisher’s exact test, P = 0.018). The significant MK test reflects a lower D N/D S ratio (0.064) relative to the P N/P S ratio (0.556), which is consistent with purifying selection within this portion of the gene (Table 2). The MK test for the 5’ portion of the sequence was non-significant (D N/D S = 0.235; P N/P S = 0.323; Fisher’s exact test, P = 0.755, Table 2).
Degree of amino acid site conservation
The PGI amino acid sites showed a tendency to be less evolutionarily conserved within the region corresponding to the 5’ portion of the translated PgiC1 amino acid sequence (291 amino acid sites; mean conservation score = 0.105) than those corresponding to the 3’ portion of the translated PgiC1 amino acid sequence (253 amino acid sites; mean conservation score = -0.121) (Wilcoxon rank sum test; W = 39886, P = 0.093) (Table 1, Additional file 6: Table S4).
Analyses of the 29 Öland F. ovina PgiC1 cDNA sequences in the present study, together with the analyses of the five Skåne PgiC1 sequences, shows that the nucleotide polymorphism is not evenly distributed within the PgiC1 gene (Fig. 2). The 5’ portion of the PgiC1 sequence is substantially more polymorphic than the 3’ portion, and our analyses suggest that the difference in the level of polymorphism may have resulted from different selective regimes in the two portions of the gene.
Which evolutionary mechanisms may have contributed to the relatively low level of nucleotide polymorphism within the 3’ portion of the PgiC1 sequence?
The parts of a protein that are more important for the stability and/or function of an enzyme are likely to be subject to stronger purifying selection  and, therefore, tend to exhibit a lower level of intraspecific polymorphism than the parts of the protein with less stringent functional and structural requirement [11, 12]. In F. ovina, the 3’ portion of the PgiC1 sequence encodes a peptide that includes the structurally important large domain of the PGI monomer  and the three most conserved, functionally essential, active site residues . The fact that the peptide translated from the 3’ portion of PgiC1 contains important components of the 3-D structure of PGI suggests that this peptide may have a greater overall importance for the function of PGI than the peptide translated from the 5’ portion of PgiC1. The suggested difference in the functional and structural significance of the translated peptides between the two portions of PgiC1 is supported by the estimated amino acid conservation scores in the present study, which show that the PGI amino acid sites corresponding to the 3’ portion of the PgiC1 sequence tend to be evolutionarily more conserved than those of the 5’ portion in a wide range of species (cf. ).
If the peptide translated from the 3’ portion of PgiC1 is more important for the function of the PGI enzyme than the peptide from the 5’ portion, then the 3’ portion of the gene may be expected to be under stronger purifying selection than the 5’ portion (cf. [10, 12]). In line with this expectation, the average ω value for the 3’ portion of PgiC1 was considerably lower than that for the 5’ portion. The average values for both portions were much lower than 1 (suggesting purifying selection ). The fact that the value of Tajima’s D was more strongly negative for the 3’ portion (D = -1.363; P = 0.064) than for the 5’ portion (D = -0.195; P = 0.484) of the PgiC1 sequence, is also consistent with the 3’ portion being under stronger purifying selection than the 5’ portion of the sequence (cf. ). The significant MK test result for the 3’ portion and the non-significance of the test for the 5’ portion of the PgiC1 sequence may, again, suggest that the 3’ portion is under stronger purifying selection than the 5’ portion of the sequence. The significant MK test for the 3’ portion of PgiC1, with a lower ratio of fixed non-synonymous/synonymous substitutions between species (D N /D S = 0.064) than the ratio of non-synonymous/synonymous polymorphism within species (P N /P S = 0.556), suggests purifying selection, where the within-species nonsynonymous polymorphism that is maintained in selection–mutation balance consists mainly of weakly deleterious mutations .
Because the PGI protein structural elements are closely similar in a wide range of organisms (e.g. [21, 35, 59]), the functional and structural significance and the pattern of nucleotide polymorphism within the Pgi gene might be expected to be similar between F. ovina and other species. However, at present, only a few studies of Pgi have investigated gene-wide patterns of nucleotide polymorphism. Two of these studies, on the butterflies Melitaea cinxia  and Colias eurytheme [61, 62], reveal a nearly uniformly high level of nucleotide polymorphism across the entire Pgi gene. The pattern of nucleotide polymorphism in these two species was interpreted in terms of balancing selection (targeting a few amino acid sites located within the large domain) and moderate to high levels of LD within the entire Pgi gene [60, 61]. A high level of synonymous polymorphism was observed regionally around a nonsynonynous mutation within the 5’ portion of the PgiC gene in Arabidopsis thaliana, and interpreted in terms of balancing selection and an overall low level of recombination .
In contrast to the Pgi genes of the three species mentioned above, the F. ovina PgiC1 gene showed a high level of recombination (R M = 22): M. cinxia had a R M of 6 , whereas C. eurytheme R M = 11  and A. thaliana showed no clear evidence of recombination ). The high estimated recombination value is consistent with the fact that F. ovina is a highly outcrossing species , and may indicate that different parts of the sequence could have evolved (at least to some extent) independently, resulting in a non-uniform pattern of nucleotide polymorphism across the sequence. Most of the identified recombination is within the 5’ portion of the PgiC1 sequence, while the 3’ portion of the sequence shows limited recombination. However, the relatively low level of recombination detected for the 3’ portion of the sequence may be a consequence of purifying selection having removed variation at both the sites under selection and linked neutral sites (cf. ) – thereby removing the molecular signatures of recombination and lowering the numbers of identified recombination events (cf. ). The hypothesis that purifying selection may have removed the detectable signatures of recombination within the 3’ portion of PgiC1 agrees with the PgiC1 LD matrix, where a uniformly low level of LD is observed along the entire PgiC1 cDNA sequence.
In addition to purifying selection, a selective sweep may also have contributed to the low level of nucleotide diversity within the 3’ portion of the gene: the near, but non-significant (P = 0.064), result of the HKA test suggests that variation-reducing selective forces may be acting on the PgiC1 3’ portion and/or variation-increasing forces acting on the 5’ portion (cf. ). The sliding window analyses revealed striking valleys of both Fay and Wu’s H and Tajima’s D at exon 16 (within the 3’ portion) (Additional file 5: Figure S1). In these valleys, the Fay and Wu’s H is significantly negative (H = -1.862; P = 0.006) and the Tajima’s D is also near-significantly negative (D = -1.509; P = 0.095). These negative values reflect a high-frequency of derived SNPs (around the valleys), suggesting a selective sweep (cf. ). A nearby shallow valley of total nucleotide diversity (Fig. 2) is also suggestive of a selective sweep . The valleys of D and H at exon 16 are close to active site residue His391 (Additional file 5: Figure S1). Sequence patterns that are identified as the signatures of selection in neutrality tests (e.g. Tajima’s D and Fay and Wu’s H) may also result from factors such as population size changes or reflect population structure [66,67,68]. In the case of the highly outcrossing populations of F. ovina on Öland, population structure is unlikely to be a confounding factor in the neutrality tests. However, because possible confounding effects resulting from changes in population size cannot be excluded, the selective sweep suggested by the H and D tests should be interpreted with caution.
Which evolutionary mechanisms may have contributed to the relatively high level of nucleotide polymorphism within the 5’ portion of the PgiC1 sequence?
An earlier study  identified two F. ovina PgiC1 codon sites (sites 173 and 200) as candidate targets of balancing selection (i.e. positive intraspecific diversifying selection) with a considerably stronger signal of selection for site 200 than for site 173 . The sliding window analyses in the present study support the results of the earlier study and reveal a marked plateau of ω around the selected site 200 (but no peak associated with site 173) (Fig. 2). Protein structure modelling suggests that the translated amino acid polymorphism at these two PgiC1 sites may affect either the interaction between the two monomers, or the domain-domain packing of the encoded PGI enzyme and, thus, influence the biochemical properties of the cytosolic PGI enzyme in F. ovina . Biochemical studies in humans have shown that mutations at a few amino acid sites, which have similar 3-D structural locations to the two selected amino acid sites in F. ovina, significantly affect the activity of the PGI enzyme [69, 70].
Within F. ovina PgiC1, both codon sites 173 and 200, which were previously identified as candidates for balancing selection , are located within the 5’ portion of the sequence (Fig. 2). The significant results from the Wall’s B and Q tests (Table 1) support the suggestion that there has been balancing selection on the 5’ portion of the PgiC1 sequence in F. ovina (cf. ). In addition to the significant B and Q tests, signals of balancing selection were also detected at the putative selected site 200. The highest peaks of, respectively, positive Tajima’s D and total nucleotide diversity were observed at or around codon site 200 in the sliding window analyses (Fig. 2), and these peaks are a typical signature of balancing selection (cf. [13, 14]). No marked peak or plateau for polymorphism or for Tajima’s D was observed for the second putative selected site (site 173) in the present study (Fig. 2), in agreement with the previous study  which showed a weaker signal of balancing selection for site 173 than for site 200.
The PgiC1 gene in F. ovina represents one of the few reported cases in which the levels of nucleotide polymorphism differ substantially between the 5’ and 3’ portions of a gene. The present study suggests that the contrasting levels of nucleotide polymorphism between the two portions of PgiC1 may have resulted from different selective regimes in the two gene portions. Relatively strong purifying selection appears to have reduced the level of polymorphism within the 3’ portion, whereas balancing selection may have contributed to the maintenance of the polymorphism in the 5’ portion of the sequence. A high overall level of recombination and a low level of LD within PgiC1 may have allowed partially independent selection and evolution within the two portions of the gene.
Highest posterior density
The major histocompatibility complex
- MK test:
MacDonald and Kreitman’s test
Single nucleotide polymorphism
Huang X, Kurata N, Wei X, Wang Z-X, Wang A, Zhao Q, et al. A map of rice genome variation reveals the origin of cultivated rice. Nature. 2012;490:497–501.
Hufford MB, Xu X, van Heerwaarden J, Pyhäjärvi T, Chia J-M, Cartwright RA, et al. Comparative population genomics of maize domestication and improvement. Nat Genet. 2012;44:808–11.
Jiao Y, Zhao H, Ren L, Song W, Zeng B, Guo J, et al. Genome-wide genetic changes during modern breeding of maize. Nat Genet. 2012;44:812–5.
Le Corre V, Roux F, Reboud X. DNA polymorphism at the FRIGIDA gene in Arabidopsis thaliana: extensive nonsynonymous variation is consistent with local selection for flowering time. Mol Biol Evol. 2002;19:1261–71.
Moore RC, Grant SR, Purugganan MD. Molecular population genetics of redundant floral-regulatory genes in Arabidopsis thaliana. Mol Biol Evol. 2005;22:91–103.
Hasselmann M, Beye M. Signatures of selection among sex-determining alleles of the honey bee. Proc Natl Acad Sci U S A. 2004;101:4888–93.
Ding Z, Wang C, Chen S, Yu S. Diversity and selective sweep in the OsAMT1; 1 genomic region of rice. BMC Evol Biol. 2011;11:61.
Charlesworth B. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 2009;10:195–205.
Cutter AD, Payseur BA. Genomic signatures of selection at linked sites: unifying the disparity among species. Nat Rev Genet. 2013;14:262–74.
Tourasse NJ, Li W-H. Selective constraints, amino acid composition, and the rate of protein evolution. Mol Biol Evol. 2000;17:656–64.
Hudson RR, Kreitman M, Aguadé M. A test of neutral molecular evolution based on nucleotide data. Genetics. 1987;116:153–9.
Kimura M. The neutral theory of molecular evolution. Cambridge: Cambridge University Press; 1983.
Weedall GD, Conway DJ. Detecting signatures of balancing selection to identify targets of anti-parasite immunity. Trends Parasitol. 2010;26:363–9.
Charlesworth D. Balancing selection and its effects on sequences in nearby genome regions. PLoS Genet. 2006;2:e64.
Lamaze FC, Pavey SA, Normandeau E, Roy G, Garant D, Bernatchez L. Neutral and selective processes shape MHC gene diversity and expression in stocked brook charr populations (Salvelinus fontinalis). Mol Ecol. 2014;23:1730–48.
Bernatchez L, Landry C. MHC studies in nonmodel vertebrates: what have we learned about natural selection in 15 years? J Evol Biol. 2003;16:363–77.
Wright SI, Foxe JP, DeRose-Wilson L, Kawabe A, Looseley M, Gaut BS, et al. Testing for effects of recombination rate on nucleotide diversity in natural populations of Arabidopsis lyrata. Genetics. 2006;174:1421–30.
Ghatnekar L. Genetic analysis of cytosolic PGI in Festuca ovina. PhD thesis. Lund University, 2003.
Gillespie JH. The causes of molecular evolution. New York: Oxford University Press; 1991.
Marden JH. Nature’s inordinate fondness for metabolic enzymes: why metabolic enzyme loci are so frequently targets of selection. Mol Ecol. 2013;22:5743–64.
Shaw PJ, Muirhead H. Crystallographic structure analysis of glucose 6-phosphate isomerase at 3.5 Å resolution. J Mol Biol. 1977;109:475–85.
Wang B, Watt WB, Aakre C, Hawthorne N. Emergence of complex haplotypes from microevolutionary variation in sequence and structure of Colias phosphoglucose isomerase. J Mol Evol. 2009;68:433–47.
Riddoch BJ. The adaptive significance of electrophoretic mobility in phosphoglucose isomerase (PGI). Biol J Linn Soc Lond. 1993;50:1–17.
Dahlhoff EP, Rank NE. Functional and physiological consequences of genetic variation at phosphoglucose isomerase: Heat shock protein expression is related to enzyme genotype in a montane beetle. Proc Natl Acad Sci U S A. 2000;97:10056–61.
Watt WB. Eggs, enzymes, and evolution: natural genetic variants change insect fecundity. Proc Natl Acad Sci U S A. 1992;89:10608–12.
Prentice HC, Lönn M, Lefkovitch LP, Runyeon H. Associations between allele frequencies in Festuca ovina and habitat variation in the alvar grasslands on the Baltic island of Öland. J Ecol. 1995;83:391–402.
Cope T, Gray A. Grasses of the British Isles. London: Botanical Society of the British Isles; 2009.
Prentice HC, Lönn M, Lager H, Rosén E, van der Maarel E. Changes in allozyme frequencies in Festuca ovina populations after a 9-year nutrient/water experiment. J Ecol. 2000;88:331–47.
Prentice HC, Li Y, Lönn M, Tunlid A, Ghatnekar L. A horizontally transferred nuclear gene is associated with microhabitat variation in a natural plant population. Proc R Soc Lond Ser B Biol Sci. 2015;282:20152453.
Ghatnekar L. A polymorphic duplicated locus for cytosolic PGI segregating in sheep’s fescue (Festuca ovina L.). Heredity (Edinb). 1999;83:451–9.
Vallenback P, Jaarola M, Ghatnekar L, Bengtsson BO. Origin and timing of the horizontal transfer of a PgiC gene from Poa to Festuca ovina. Mol Phylogenet Evol. 2008;46:890–6.
Vallenback P, Ghatnekar L, Bengtsson BO. Structure of the natural transgene PgiC2 in the common grass Festuca ovina. Plos One. 2010;5:e13529.
Vallenback P, Bengtsson BO, Ghatnekar L. Geographic and molecular variation in a natural plant transgene. Genetica. 2010;138:355–62.
Li Y, Canbäck B, Johansson T, Tunlid A, Prentice HC. Evidence for positive selection within the PgiC1 locus in the grass Festuca ovina. Plos One. 2015;10:e0125831.
Jeffery CJ, Hardré R, Salmon L. Crystal structure of rabbit phosphoglucose isomerase complexed with 5-phospho-D-arabinonate identifies the role of Glu357 in catalysis. Biochemistry. 2001;40:1560–6.
Nei M. Molecular evolutionary genetics. New York: Columbia University Press; 1987.
Watterson GA. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975;7:256–76.
Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25:1451–2.
Hudson RR, Kaplan NL. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics. 1985;111:147–64.
Stumpf MPH, McVean GAT. Estimating recombination rates from population-genetic data. Nat Rev Genet. 2003;4:959–68.
Wilson DJ, McVean G. Estimating diversifying selection and functional constraint in the presence of recombination. Genetics. 2006;172:1411–25.
Hill WG, Robertson A. Linkage disequilibrium in finite populations. Theor Appl Genet. 1968;38:226–31.
Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–5.
Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–95.
Fay JC, Wu CI. Hitchhiking under positive Darwinian selection. Genetics. 2000;155:1405–13.
McDonald JH, Kreitman M. Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991;351:652–4.
Wall JD. Recombination and the power of statistical tests of neutrality. Genet Res. 1999;74:65–79.
Ashkenazy H, Erez E, Martz E, Pupko T, Ben-Tal N. ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res. 2010;38:W529–33.
Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH, the UniProt Consortium. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics. 2015;31:926–32.
Biegert A, Söding J. Sequence context-specific profiles for homology searching. Proc Natl Acad Sci U S A. 2009;106:3770–5.
Yamada KD, Tomii K, Katoh K. Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees. Bioinformatics. 2016;32:3246–51.
Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–66.
Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics. 2002;18:S71–7.
Mayrose I, Graur D, Ben-Tal N, Pupko T. Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol Biol Evol. 2004;21:1781–91.
Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, et al. The structure of haplotype blocks in the human genome. Science. 2002;296:2225–9.
Olsen KM, Kooyers NJ, Small LL. Recurrent gene deletions and the evolution of adaptive cyanogenesis polymorphisms in white clover (Trifolium repens L.). Mol Ecol. 2013;22:724–38.
Nei M, Suzuki Y, Nozawa M. The neutral theory of molecular evolution in the genomic era. Annu Rev Genom Hum G. 2010;11:265–89.
Kreitman M. Methods to detect selection in populations with applications to the human. Annu Rev Genom Hum G. 2000;1:539–59.
Anand K, Mathur D, Anant A, Garg LC. Structural studies of phosphoglucose isomerase from Mycobacterium tuberculosis H37Rv. Acta Crystallogr F Struct Biol Commun. 2010;66:490–7.
Wheat CW, Hagg CR, Marden JH, Hanski I, Frilander MJ. Nucleotide polymorphism at a gene (Pgi) under balancing selection in a butterfly metapopulation. Mol Biol Evol. 2010;27:267–81.
Wheat CW, Watt WB, Pollock DD, Schulte PM. From DNA to fitness differences: sequences and structures of adaptive variants of Colias phosphoglucose isomerase (PGI). Mol Biol Evol. 2006;23:499–512.
Wang BQ, DePasse JM, Watt WB. Evolutionary Genomics of Colias Phosphoglucose Isomerase (PGI) Introns. J Mol Evol. 2012;74:96–111.
Kawabe A, Yamane K, Miyashita NT. DNA polymorphism at the cytosolic phosphoglucose isomerase (PgiC) locus of the wild plant Arabidopsis thaliana. Genetics. 2000;156:1339–47.
Hahn MW. Toward a selection theory of molecular evolution. Evolution. 2008;62:255–65.
Martin DP, Lemey P, Posada D. Analysing recombination in nucleotide sequences. Mol Ecol Resour. 2011;11:943–55.
Jensen JD, Foll M, Bernatchez L. The past, present and future of genomic scans for selection. Mol Ecol. 2016;25:1–4.
Teshima KM, Coop G, Przeworski M. How reliable are empirical genomic scans for selective sweeps? Genome Res. 2006;16:702–12.
Excoffier L, Hofer T, Foll M. Detecting loci under selection in a hierarchically structured population. Heredity (Edinb). 2009;103:285–98.
Lin HY, Kao YH, Chen ST, Meng M. Effects of inherited mutations on catalytic activity and structural stability of human glucose-6-phosphate isomerase expressed in Escherichia coli. Biochim Biophys Acta. 2009;1794:315–23.
Somarowthu S, Brodkin HR, D’Aquino JA, Ringe D, Ondrechen MJ, Beuning PJ. A tale of two isomerases: compact versus extended active sites in ketosteroid isomerase and phosphoglucose isomerase. Biochemistry. 2011;50:9283–95.
We would like to thank Torbjörn Säll and Anders Tunlid for comments on an earlier version of the manuscript.
The study was supported by grant 621-2008-5617 (to H.C.P and Anders Tunlid) from the Swedish Research Council. Bengt Hansson also acknowledges support from the Swedish Research Council (grant 621-2014-5222). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
All data generated or analyzed during this study are either available in the GenBank database (GenBank accession numbers DQ225731-DQ225735 & KF487737-KF487765; https://www.ncbi.nlm.nih.gov/nucleotide) or included in this published article and its additional files.
YL conceived the study with input from BH, HCP and LG. YL was responsible for the data collection and data analyses. YL wrote the paper with input from BH and HCP. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
File S1. The aligned multiple-species homologous amino acid sequences acquired from the Consurf Server. (TXT 96 kb)
The total nucleotide diversity (π T) for each studied PgiC1 exon. (DOCX 47 kb)
Estimates of ω and ρ (made with omegaMap) for each analyzed PgiC1 codon. (XLS 119 kb)
Comparison (Hudson-Kreitman-Aguadé test) between the 5’ and 3’ portions of the sequenced F. ovina PgiC1 in terms of level of polymorphism and level of divergence from the outgroup F. altissima. (DOC 33 kb)
Results for the sliding window analyses of, respectively, Tajima’s D, Fay and Wu’s H, and Ka/Ks. The ticks on the x axis represent the boundary of each analysed PgiC1 exon within the PgiC1 coding sequence. In F. ovina, PgiC1 exons 5–12 encode the small domain of a PGI monomer while exons 13–21 encode the large domain. The three stars on the x axis represent the three active site residues (equivalent to Lys516, Glu360, and His391 in F. ovina) that are directly involved in the PGI isomerization reaction . The dotted vertical line highlights a position where both D and H have marked valleys. (TIF 1634 kb)
The estimated (normalized) conservation scores for each PGI amino acid site. (XLS 49 kb)