Contrasting patterns of nucleotide polymorphism suggest different selective regimes within different parts of the PgiC1 gene in Festuca ovina L.

Background: Phosphoglucose isomerase (PGI, EC 5.3.1.9) is an essential metabolic enzyme in all eukaryotes. An earlier study of the PgiC1 gene, which encodes cytosolic PGI in the grass Festuca ovina L., revealed a marked difference in the levels of nucleotide polymorphism between the 5’ and 3’ portions of the gene.
Methods: In the present study, we characterized the sequence polymorphism in F. ovina PgiC1 in more detail and examined possible explanations for the non-uniform pattern of nucleotide polymorphism across the gene.
Results: Our study confirms that the two portions of the PgiC1 gene show substantially different levels of DNA polymorphism and also suggests that the peptide encoded by the 3’ portion of PgiC1 is functionally and structurally more important than that encoded by the 5’ portion. Although there was some evidence of purifying selection (dN/dS test) on the 5’ portion of the gene, the signature of purifying selection was considerably stronger on the 3’ portion of the gene (dN/dS and McDonald–Kreitman tests). Several tests support the action of balancing selection within the 5’ portion of the gene. Wall’s B and Q tests were significant only for the 5’ portion of the gene. There were also marked peaks of nucleotide diversity, Tajima’s D and the dN/dS ratio at or around a PgiC1 codon site (within the 5’ portion of the gene) that a previous study had suggested was subject to positive diversifying selection.
Conclusions: Our results suggest that the two portions of the gene have been subject to different selective regimes. Purifying selection appears to have been the main force contributing to the relatively low level of polymorphism within the 3’ portion of the sequence. In contrast, it is possible that balancing selection has contributed to the maintenance of the polymorphism within the 5’ portion of the gene.
Electronic supplementary material The online version of this article (doi:10.1186/s41065-017-0032-6) contains supplementary material, which is available to authorized users.


Background
Levels of nucleotide polymorphism have been shown to vary greatly between different parts of the genome (e.g. [1][2][3]), and there may also be variation in the levels of polymorphism within individual genes (e.g. [4][5][6][7]). A non-uniform pattern of nucleotide polymorphism within genes may arise if different types of selective pressure are operating on different regions of the gene (cf. [8,9]). Different regions of a gene may code for peptides that have different structural or functional significances, and the regions of a gene with more stringent structural and/or functional requirements are expected to be subject to stronger purifying selection [10] and, therefore, tend to show lower levels of nucleotide polymorphism than regions that are subject to less stringent constraints [11,12]. Positive directional selection may reduce the levels of local nucleotide polymorphism within a gene [9], while balancing selection may increase the levels of nucleotide polymorphism at, and in the vicinity of, the selected sites [13,14]. A classic example of a case where selection results in non-uniform levels of nucleotide polymorphism between different gene regions is that of the major histocompatibility complex (MHC) genes. These genes are crucial for the ability of a vertebrate host's immune system to detect evolving pathogens, and it is frequently suggested that the maintenance of the high levels of non-synonymous polymorphism in the MHC gene regions encoding the antigen binding site is a reflection of pathogen-driven balancing selection [15,16]. In addition to selective processes, varying rates of recombination and mutation, as well as stochastic processes, may also contribute to non-uniform levels of nucleotide polymorphism between different regions of a gene (cf. [11,17]).
The PgiC1 gene, which encodes the cytosolic version of the metabolic enzyme phosphoglucose isomerase (PGI, EC 5.3.1.9), in the grass Festuca ovina L., represents one of the few reported cases in which the levels of nucleotide polymorphism differ substantially between the 3' and 5' portions of a gene [18]. PGI catalyses the second step of glycolysis [19], and is also known to have diverse moonlighting functions (see the references in [20]). The functional PGI enzyme is formed by two monomers, with each monomer being composed of two main domains (the "small domain" and the "large domain") [21,22] (Fig. 1-a). High levels of allozyme/isozyme variation have been frequently reported for PGI in many different species [23]. Observed differences in enzyme activity between PGI variants in a number of species are consistent with observed associations between the PGI variation and environmental variables or life-history traits, suggesting that the loci coding for PGI may be under selection (e.g. [24,25]).
Festuca ovina is a perennial, tussock-forming and outcrossing grass, with wind-dispersed pollen and seeds [26]. The species has a broad ecological amplitude and is widespread in unfertilized grasslands in northern Europe (e.g. [26,27]). The steppe-like "alvar" grasslands on the Baltic island of Öland (Sweden) are characterized by a fine-scale edaphic mosaic, with moist and dry, and high and low pH microhabitats. Earlier studies suggest that cytosolic PGI isozyme variation in F. ovina may be involved in fine-scale microhabitat adaptation on Öland [26,28,29]. Analysis of replicated samples from different alvar sites shows that, despite the fact that F. ovina is strongly outcrossing, the frequencies of different cytosolic PGI isozyme electromorphs are significantly associated with microhabitat variation in the alvar grasslands and that electromorph frequencies change in response to experimental habitat manipulation [26,28].
In F. ovina, cytosolic PGI is coded for by two loci, PgiC1 and PgiC2 [30]. The "native" PgiC1 locus is present in all F. ovina individuals, whereas PgiC2 is only present in some individuals and appears to have been horizontally acquired from a distantly related grass genus [29,31-33]. Earlier analyses of the PgiC1 gene in F. ovina suggest that two PgiC1 amino acid codon sites may be affected by positive selection [34], and SNP (single nucleotide polymorphism) alleles at these two codon sites show significant associations with microhabitat variables in the alvar grasslands (Y Li, B Hansson, M Lönn, HC Prentice, unpublished results).
Fig. 1 The structure of the PgiC1 gene and the 3-D protein structure of its gene product. a PGI dimer, coded for by PgiC1, in F. ovina and homology modelled in an earlier study [34]. One monomer is shown in yellow. In the other monomer, the large domain is shown in green and the small domain is shown in dark blue. The three active site residues (equivalent to Lys516, Glu360, and His391 in F. ovina) that are directly involved in the PGI isomerization reaction are shown in red. The rest of the monomer is represented in light blue. b The gene structure is summarized for the part of the PgiC1 gene corresponding to the 1 633 bp sequence of the 29 Öland sequences characterized in the present study. Boxes represent the exons and lines represent introns. The gene structure is scaled according to an earlier published PgiC1 gene sequence (GenBank accession number HQ616103). The 5' and 3' portions (see Fig. 2) of the PgiC1 gene that are compared in the present study include, respectively, exons 1-12 and exons 13-22.
The uneven distribution of polymorphic nucleotide sites along the PgiC1 gene was noted in an earlier study that included five PgiC1 coding sequences from Skåne, S Sweden [18]. The longest intron (intron 12, Fig. 1-b) was used as a demarcation point between the polymorphic 5' portion of the gene and the, substantially less polymorphic, 3' portion of the gene. The aim of the present study was to investigate the possible evolutionary mechanisms that may have contributed to the contrasting levels of nucleotide polymorphism in the two portions of the PgiC1 gene in F. ovina. We analysed the levels of PgiC1 nucleotide polymorphism within a larger dataset (29 PgiC1 cDNA sequences) from F. ovina individuals collected from the alvar grasslands on Öland, and carried out a range of tests to assess the relative importance of different types of selection that may have contributed to the non-uniform pattern of nucleotide polymorphism within PgiC1.
The 3' portion of PgiC1 in F. ovina encodes the structurally important large domain and three functionally essential active site residues (Figs. 1-a and 2). The extensive inter-monomer interaction between the large domains of the two monomers is necessary for the formation of a stable PGI dimer [21] and the three active site residues (equivalent to Glu360, His391 and Lys516 in F. ovina) participate directly in the isomerization reaction of PGI [35]. If the 3' portion of PgiC1 codes for products that are subject to greater structural or functional constraints than the products of the 5' portion of the gene, then a relatively stronger level of purifying selection (i.e. negative selection) may be expected to have contributed to the low level of nucleotide polymorphism within the 3' portion of PgiC1 in F. ovina. The 5' portion of PgiC1 contains the amino acid codon sites 173 and 200 (Fig. 2). If these sites are under balancing selection (i.e. positive intraspecific diversifying selection), as suggested by [34], then balancing selection targeting the two sites might be expected to contribute to the high level of nucleotide polymorphism within the 5' portion of PgiC1. The present study provides support for the prediction that there is a stronger purifying selection on the 3' portion than on the 5' portion of PgiC1, and suggests that there is balancing selection on the 5' portion of the gene.

Plant material and sequences
The present study examined variation within 29 PgiC1 cDNA sequences (GenBank accession numbers KF487737-KF487765, [34]) from Öland populations of F. ovina. The sequences were derived from 15 individuals that were chosen to represent the five cytosolic PGI electromorphs (EMs 1, 2, 4, 5 and 6) that occur most frequently within populations of F. ovina on Öland [26,28,29], with a particular focus on the two most common electromorphs, EM 1 and EM 2 [34]. To obtain the sequences, total cDNA was first synthesized from the total RNA of each studied F. ovina individual [34]. The PgiC1 cDNA was then PCR-amplified from the synthesized total cDNA, and the amplified PgiC1 cDNA was cloned and sequenced [34]. Two PgiC1 cDNA alleles were acquired from each (diploid) individual, giving a total of 30 alleles from the 15 studied individuals [34]. However, one of the alleles (GenBank accession number KF487766) contained an aberrant (113 bp) insertion [34] and was excluded from the analyses in the present study, unless otherwise specified.
Each of the 29 analysed PgiC1 cDNA sequences covers 96% (1 633 bp) of the full-length (1701 bp, excluding stop codon) PgiC1 coding sequence, and ranges from exon 1 to exon 22 (Fig. 1-b). For comparative purposes, we also downloaded the five Skåne F. ovina PgiC1 coding sequences (GenBank accession numbers DQ225731-DQ225735) which were examined in the earlier study that noted the difference in the levels of polymorphism between the 5' and 3' portions of PgiC1 [18]. These five Skåne sequences represent the common cytosolic PGI isozyme electromorphs 1, 2 and 6, as well as the rare electromorph 8 [26,28]. Each of the five sequences covers 1 182 bp out of the full-length PgiC1 coding sequence [32], and ranges from exons 5 to 11 and from exons 13 to 21.
Fig. 2 […] [34]. The three stars on the x axis represent the three active site residues (equivalent to Lys516, Glu360, and His391 in F. ovina) that are directly involved in the PGI isomerization reaction [35]. The grey dotted vertical line shows the location of intron 12, which is used as the demarcation point for defining the 5' and 3' portions of the PgiC1 sequence. The brown dotted vertical line indicates codon site 200, which is under positive diversifying selection and located at or near to peaks of π T , Tajima's D and ω.
The level of recombination was estimated (as the minimum number of recombination events, R M , using the method of Hudson and Kaplan [39] as implemented in DnaSP v. 5.10.01) for each of the two PgiC1 gene portions in the 29 Öland sequences. The level of recombination was also estimated as the population recombination rate (ρ = 4N e r, where N e is the effective population size and r is the per-generation per-site recombination rate [40]), using the program omegaMap [41]. We used the same procedure as in [34] to run omegaMap, but used a sliding window of 10 codons to estimate ρ in the present study. The level of linkage disequilibrium (LD) within PgiC1 was estimated using the r 2 statistic [42], calculated between all pairs of polymorphic sites (excluding a single site that segregates for more than two nucleotides [34]), using Haploview v. 4.2 [43]. The genotypes of the 15 studied Öland individuals [34], at each of the analyzed polymorphic sites, were used as input to the Haploview analyses. In order to generate a complete set of genotype data for the 15 individuals, as required for the Haploview analyses, we included the additional sequence (GenBank accession number KF487766) that was acquired from the 15 F. ovina individuals but contains an aberrant insertion that may have resulted from incomplete splicing of the PgiC1 precursor mRNA [34]. This insertion was removed for the Haploview analyses.
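As an illustration of the LD statistic described above, the following minimal Python sketch computes r 2 between two biallelic sites. It assumes phased haplotypes (one allele per sequence at each site), whereas the Haploview analysis in the text started from genotypes; the function name `r_squared` and the toy alleles are hypothetical and are not taken from the study.

```python
def r_squared(site_a, site_b):
    """Squared allele-frequency correlation (r^2) between two biallelic sites.

    site_a and site_b are equal-length sequences of alleles, one entry per
    phased haplotype (an assumption of this sketch).
    """
    if len(site_a) != len(site_b):
        raise ValueError("both sites need one allele per haplotype")
    n = len(site_a)
    # Use the first observed allele at each site as the reference allele.
    ref_a, ref_b = site_a[0], site_b[0]
    p_a = sum(x == ref_a for x in site_a) / n
    p_b = sum(y == ref_b for y in site_b) / n
    p_ab = sum(x == ref_a and y == ref_b for x, y in zip(site_a, site_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic site")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


# Toy data: complete association, then no association.
print(r_squared("AAGG", "CCTT"))  # 1.0
print(r_squared("AGAG", "CCTT"))  # 0.0
```

Applying such a function to every pair of polymorphic sites would give a matrix of the kind shown in Fig. 3-b, although the published values come from Haploview and may differ in how phase is handled.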
The dN/dS ratio (ω) (dN = non-synonymous substitution rate; dS = synonymous substitution rate) was estimated, together with ρ (using omegaMap), for each amino acid codon translated from the PgiC1 sequence. The estimated ω value was used to examine whether purifying selection (i.e. negative selection, ω < 1) or balancing selection (i.e. positive intraspecific diversifying selection, ω > 1) may have contributed to the amounts and patterning of sequence variation within and between the two gene portions for the 29 studied PgiC1 sequences (cf. [41]). Sliding window analysis of ω was also carried out (manually, on the basis of the results from omegaMap) with a window length of 99 and a step size of 12.
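The manual sliding window step can be sketched as follows. This is only an illustration under stated assumptions: the function name `sliding_window_mean` is hypothetical, the window statistic is assumed to be a simple mean of per-position values, and the text does not state whether the window length of 99 and the step size of 12 are counted in codons or in nucleotides, so the sketch treats them as positions in whatever list is supplied.

```python
def sliding_window_mean(values, window, step):
    """Return (window midpoint, mean value) pairs for overlapping windows.

    values: per-position estimates (e.g. one omega value per codon).
    window: number of positions per window.
    step:   number of positions the window advances each time.
    Trailing positions that do not fill a complete window are ignored.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        midpoint = start + window / 2
        results.append((midpoint, sum(chunk) / window))
    return results


# Toy example with ten positions, a window of 4 and a step of 2.
for midpoint, mean in sliding_window_mean(list(range(10)), window=4, step=2):
    print(midpoint, mean)
```

With per-codon ω estimates as input, `sliding_window_mean(omega_values, window=99, step=12)` would mimic the settings quoted above under the assumptions just listed.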
Neutrality tests, including the Hudson-Kreitman-Aguadé (HKA) test [11], Tajima's D test [44], Fay and Wu's H test [45], McDonald and Kreitman's (MK) test [46] and Wall's B and Q tests [47], were also used to examine whether selection may have contributed to the amounts and patterning of sequence variation within each of the two PgiC1 gene portions in the 29 sequences. All the neutrality tests were carried out using DnaSP v.
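To make one of these tests concrete, the sketch below computes Tajima's D from the standard published formula, using the number of sequences, the number of segregating sites and the mean number of pairwise differences. It is an illustration rather than a reimplementation of DnaSP: the function names are hypothetical, alignment gaps and missing data are not handled, and no significance test is included.

```python
from itertools import combinations
from math import sqrt


def summary_stats(seqs):
    """Return (n, S, k) for equal-length aligned sequences:
    sample size, segregating sites, mean pairwise differences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    num_segregating = sum(
        1 for i in range(length) if len({s[i] for s in seqs}) > 1
    )
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return len(seqs), num_segregating, total_diffs / len(pairs)


def tajimas_d(n, num_segregating, mean_pairwise_diff):
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences k."""
    if n < 4 or num_segregating == 0:
        raise ValueError("D needs at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = num_segregating
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_diff - s / a1) / sqrt(variance)


# Textbook-style check values (not data from this study).
print(round(tajimas_d(10, 16, 3.888889), 3))  # -1.446
```

A negative D reflects an excess of low-frequency variants relative to the neutral expectation, and a positive D an excess of intermediate-frequency variants, which is how the values for the two gene portions are read in the text.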

Analysis of the evolutionary conservation of amino acid sites
The degree of evolutionary conservation at each of the PGI amino acid sites corresponding to the PgiC1 translated amino acid sites was estimated, on the basis of the phylogenetic relationships among a large set of homologous sequences from a wide range of different species, using the online application ConSurf Server [48]. The database UniRef90 [49] was searched for sequences that were homologous with the F. ovina PgiC1 input sequence, using CSI-BLAST [50] (cutoff E-value = 0.0001; number of iterations = 3; maximum homologs to collect = 150). Within CSI-BLAST, redundant sequences were filtered out by clustering BLAST hits with a sequence identity of 95% or more and only using one representative of each cluster in the analysis. BLAST hits that shared a sequence identity of less than 35% with the input sequence were ignored. A multiple-species alignment (Additional file 1: File S1) of the acquired homologous sequences was constructed using MAFFT [51,52], and this alignment was then used to build a phylogenetic tree using the neighbour-joining algorithm as implemented in the Rate4Site program [53]. The level of evolutionary conservation was then estimated, as a conservation score for each amino acid site, using an empirical Bayesian algorithm [54] implemented in the ConSurf Server. The lower the conservation score, the more evolutionarily conserved are the amino acid residues at that specific site. The translated amino acid sequence from a PgiC1 sequence (GenBank accession number KF487738), representing the most common PgiC1 sequence in F. ovina, was used as the input to the ConSurf Server.

Sequence variation
The analyses of the five Skåne PgiC1 coding sequences confirmed the earlier finding [18] that the 5' portion of the gene (L [sequence length] = 570, S = 30, N h = 5, π = 0.025, θ W = 0.025) was considerably more polymorphic than the 3' portion (L = 612, S = 3, N h = 4, π = 0.002, θ W = 0.002). Also in agreement with the observed pattern of sequence polymorphism within the five Skåne PgiC1 sequences, the analyses of the 29 Öland PgiC1 cDNA sequences showed that the 5' portion of the sequence (π T = 0.019, θ W = 0.020) was substantially more polymorphic than the 3' portion of the sequence (π T = 0.004, θ W = 0.007) (Fig. 2, Table 1). There was a significant difference between the π T values for each of the 12 exons (exons 1-12) within the 5' portion of the PgiC1 gene in the 29 sequences, and those for each of the 10 exons (exons 13-22) within the 3' portion of the 29 sequences (Wilcoxon rank sum test; W = 96, P = 0.019, Additional file 2: Table S1). The sliding window analyses of π T showed that the PgiC1 codon site 200 (under positive diversifying selection [34]) was within the highest peak of π T at exon 8 within the 5' portion of the gene (Fig. 2).
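The per-site θ W values quoted for the five Skåne sequences can be checked against Watterson's estimator, θ W = S / (a n L), where a n is the sum of 1/i for i from 1 to n - 1. The short Python sketch below does this arithmetic with the S, L and sample size given in the text; the function name is hypothetical, and the agreement to three decimals suggests, but does not prove, that the reported values were computed in this way.

```python
def watterson_theta_per_site(num_segregating, num_sequences, seq_length):
    """Watterson's theta per site: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    if num_sequences < 2 or seq_length <= 0:
        raise ValueError("need at least 2 sequences and a positive length")
    a_n = sum(1 / i for i in range(1, num_sequences))
    return num_segregating / (a_n * seq_length)


# Skane sample from the text: n = 5 sequences.
theta_5prime = watterson_theta_per_site(30, 5, 570)  # 5' portion: S = 30, L = 570
theta_3prime = watterson_theta_per_site(3, 5, 612)   # 3' portion: S = 3, L = 612
print(round(theta_5prime, 3), round(theta_3prime, 3))  # 0.025 0.002
```

The roughly tenfold difference between the two portions is driven almost entirely by the number of segregating sites, since the two portions are of similar length.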

Recombination and linkage disequilibrium
The analyses of the 29 PgiC1 sequences from Öland revealed a high overall level of recombination (R M = 22; ρ = 0.217, Additional file 3: Table S2). The level of recombination in the highly polymorphic 5' portion (R M = 20; ρ = 0.383) was substantially higher than that for the less variable 3' portion of the gene (R M = 1; ρ = 0.026) (Table 1 and Fig. 3-a). However, the matrix of r 2 values (Fig. 3-b) shows that there is a low level of LD throughout the entire PgiC1 gene, with no "strong LD" blocks (cf. [55]).

The ω and neutrality tests
The fact that the overall ω (dN/dS ratio) value for the entire PgiC1 sequence was substantially lower than 1 (mean ω over all the studied amino acid codons = 0.209, Additional file 3: Table S2) indicates that purifying selection has acted on the overall sequence. Purifying selection is also indicated for both the 5' portion (291 amino acid codons; average ω = 0.280) and the 3' portion (253 amino acid codons; average ω = 0.128) of the sequence. The fact that codons within the 3' portion of the PgiC1 sequence had a significantly lower ω than those in the 5' portion (Wilcoxon rank sum test; W = 42229, P = 0.003) suggests that the 3' portion may be under stronger purifying selection than the 5' portion. The sliding window analyses of ω showed a marked plateau of ω values (between exons 8-9 within the 5' portion) around the PgiC1 codon site 200 (Fig. 2), which a previous study had suggested was subject to positive diversifying selection [34]. There were 40 segregating sites out of the total of 570 nucleotide sites included in the analysis of the 5' portion, whereas there were only 14 segregating sites out of 612 sites in the 3' portion (Additional file 4: Table S3). The levels of interspecific DNA divergence between the […] Table S3). However, the HKA test did not reject the null neutral model (P = 0.064, Additional file 4: Table S3). Wall's B and Q values were significant for the 5' portion of the PgiC1 sequence (Table 1), indicating an excess of LD between adjacent segregating sites, which may reflect balancing selection (cf. [47,56]). Wall's B and Q values were non-significant for the 3' portion of the PgiC1 sequence (Table 1). There was a negative, but non-significant (P = 0.064, Table 1) Tajima's D for the 3' portion of the PgiC1 sequence, while the Tajima's D for the 5' portion was near zero (D = -0.195, Table 1). The sliding window analyses of Tajima's D showed that the highest peak of D was located at the PgiC1 codon site 200 (exon 8) (Fig. 2), which was identified as a potential target of positive diversifying selection in an earlier study [34]. The D value at this peak (D = 1.439) was near-significant (P = 0.060; post hoc significance test without correction for multiple tests). The Fay and Wu's H values were non-significant for both the 5' (P = 0.669) and the 3' (P = 0.148) portions of the PgiC1 sequence (Table 1). The sliding window analysis of Fay and Wu's H showed that the deepest valley of H was at exon 16 (within the 3' portion), near which a striking valley of Tajima's D was also observed (Additional file 5: Figure S1). Both D and H had negative values in these valleys (Additional file 5: Figure S1), and according to the post hoc tests of significance (not subject to multiple test correction), H was significant (H = -1.862, P = 0.006), while D was near-significant (D = -1.509, P = 0.095). The MK test rejected the neutral null model for the 3' portion of the PgiC1 sequence (Fisher's exact test, P = 0.018). The significant MK test reflects a lower D N /D S ratio (0.064) relative to the P N /P S ratio (0.556), which is consistent with purifying selection within this portion of the gene (Table 2). The MK test for the 5' portion of the sequence was non-significant (D N /D S = 0.235; P N /P S = 0.323; Fisher's exact test, P = 0.755, Table 2).
Fig. 3 […] [42] for all the polymorphic nucleotide sites (except for one that segregates for more than two nucleotides [34]). Shades of grey indicate the r 2 values, ranging from r 2 = 0 (white) to r 2 = 1 (black). The proportional spacing of the polymorphic sites, which is scaled according to an earlier published PgiC1 gene sequence (GenBank accession number HQ616103), is indicated by black vertical lines on a white horizontal bar (shown above the LD matrix).
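One way to summarise the direction of an MK result is the ratio of the polymorphism ratio to the divergence ratio, often called the neutrality index. The study itself does not report this quantity, so the sketch below is purely illustrative arithmetic on the ratios quoted above; the function name is hypothetical, and the underlying counts from Table 2 (which the Fisher's exact test requires) are not reproduced here.

```python
def neutrality_index(pn_ps, dn_ds):
    """Ratio of within-species (P N / P S) to between-species (D N / D S) ratios.

    Values above 1 indicate an excess of non-synonymous polymorphism relative
    to divergence, as expected when weakly deleterious variants segregate
    within a species but rarely become fixed.
    """
    if dn_ds <= 0:
        raise ValueError("D N / D S must be positive")
    return pn_ps / dn_ds


# Ratios quoted in the text for the two gene portions.
print(round(neutrality_index(0.556, 0.064), 2))  # 3' portion: 8.69
print(round(neutrality_index(0.323, 0.235), 2))  # 5' portion: 1.37
```

The much larger value for the 3' portion is in line with the significant MK test reported for that portion, although the index on its own says nothing about statistical significance.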

Degree of amino acid site conservation
The PGI amino acid sites showed a tendency to be less evolutionarily conserved within the region corresponding to the 5' portion of the translated PgiC1 amino acid sequence (291 amino acid sites; mean conservation score = 0.105) than those corresponding to the 3' portion of the translated PgiC1 amino acid sequence (253 amino acid sites; mean conservation score = -0.121) (Wilcoxon rank sum test; W = 39886, P = 0.093) (Table 1, Additional file 6: Table S4).

Discussion
Analyses of the 29 Öland F. ovina PgiC1 cDNA sequences in the present study, together with the analyses of the five Skåne PgiC1 sequences, show that the nucleotide polymorphism is not evenly distributed within the PgiC1 gene (Fig. 2). The 5' portion of the PgiC1 sequence is substantially more polymorphic than the 3' portion, and our analyses suggest that the difference in the level of polymorphism may have resulted from different selective regimes in the two portions of the gene.
Which evolutionary mechanisms may have contributed to the relatively low level of nucleotide polymorphism within the 3' portion of the PgiC1 sequence?
The parts of a protein that are more important for the stability and/or function of an enzyme are likely to be subject to stronger purifying selection [10] and, therefore, tend to exhibit a lower level of intraspecific polymorphism than the parts of the protein with less stringent functional and structural requirements [11,12]. In F. ovina, the 3' portion of the PgiC1 sequence encodes a peptide that includes the structurally important large domain of the PGI monomer [21] and the three most conserved, functionally essential, active site residues [35]. The fact that the peptide translated from the 3' portion of PgiC1 contains important components of the 3-D structure of PGI suggests that this peptide may have a greater overall importance for the function of PGI than the peptide translated from the 5' portion of PgiC1. The suggested difference in the functional and structural significance of the translated peptides between the two portions of PgiC1 is supported by the estimated amino acid conservation scores in the present study, which show that the PGI amino acid sites corresponding to the 3' portion of the PgiC1 sequence tend to be evolutionarily more conserved than those of the 5' portion in a wide range of species (cf. [48]).
If the peptide translated from the 3' portion of PgiC1 is more important for the function of the PGI enzyme than the peptide from the 5' portion, then the 3' portion of the gene may be expected to be under stronger purifying selection than the 5' portion (cf. [10,12]). In line with this expectation, the average ω value for the 3' portion of PgiC1 was considerably lower than that for the 5' portion. The average values for both portions were much lower than 1 (suggesting purifying selection [41]). The fact that the value of Tajima's D was more strongly negative for the 3' portion (D = -1.363; P = 0.064) than for the 5' portion (D = -0.195; P = 0.484) of the PgiC1 sequence, is also consistent with the 3' portion being under stronger purifying selection than the 5' portion of the sequence (cf. [57]). The significant MK test result for the 3' portion and the non-significance of the test for the 5' portion of the PgiC1 sequence may, again, suggest that the 3' portion is under stronger purifying selection than the 5' portion of the sequence. The significant MK test for the 3' portion of PgiC1, with a lower ratio of fixed non-synonymous/synonymous substitutions between species (D N /D S = 0.064) than the ratio of nonsynonymous/synonymous polymorphism within species (P N /P S = 0.556), suggests purifying selection, where the within-species nonsynonymous polymorphism that is maintained in selection-mutation balance consists mainly of weakly deleterious mutations [58].
Because the PGI protein structural elements are closely similar in a wide range of organisms (e.g. [21,35,59]), the functional and structural significance and the pattern of […] Two of these studies, on the butterflies Melitaea cinxia [60] and Colias eurytheme [61,62], reveal a nearly uniformly high level of nucleotide polymorphism across the entire Pgi gene. The pattern of nucleotide polymorphism in these two species was interpreted in terms of balancing selection (targeting a few amino acid sites located within the large domain) and moderate to high levels of LD within the entire Pgi gene [60,61]. A high level of synonymous polymorphism was observed regionally around a nonsynonymous mutation within the 5' portion of the PgiC gene in Arabidopsis thaliana, and interpreted in terms of balancing selection and an overall low level of recombination [63]. In contrast to the Pgi genes of the three species mentioned above, the F. ovina PgiC1 gene showed a high level of recombination (R M = 22; M. cinxia had an R M of 6 [60], C. eurytheme had an R M of 11 [61], and A. thaliana showed no clear evidence of recombination [63]). The high estimated recombination value is consistent with the fact that F. ovina is a highly outcrossing species [26], and may indicate that different parts of the sequence could have evolved (at least to some extent) independently, resulting in a non-uniform pattern of nucleotide polymorphism across the sequence. Most of the identified recombination is within the 5' portion of the PgiC1 sequence, while the 3' portion of the sequence shows limited recombination. However, the relatively low level of recombination detected for the 3' portion of the sequence may be a consequence of purifying selection having removed variation at both the sites under selection and linked neutral sites (cf. [64]), thereby removing the molecular signatures of recombination and lowering the numbers of identified recombination events (cf. [65]).
The hypothesis that purifying selection may have removed the detectable signatures of recombination within the 3' portion of PgiC1 agrees with the PgiC1 LD matrix, where a uniformly low level of LD is observed along the entire PgiC1 cDNA sequence.
In addition to purifying selection, a selective sweep may also have contributed to the low level of nucleotide diversity within the 3' portion of the gene: the near-significant (P = 0.064) result of the HKA test suggests that variation-reducing selective forces may be acting on the PgiC1 3' portion and/or variation-increasing forces acting on the 5' portion (cf. [11]). The sliding window analyses revealed striking valleys of both Fay and Wu's H and Tajima's D at exon 16 (within the 3' portion) (Additional file 5: Figure S1). In these valleys, Fay and Wu's H is significantly negative (H = -1.862; P = 0.006) and Tajima's D is near-significantly negative (D = -1.509; P = 0.095). These negative values reflect a high frequency of derived SNPs (around the valleys), suggesting a selective sweep (cf. [45]). A nearby shallow valley of total nucleotide diversity (Fig. 2) is also suggestive of a selective sweep [9]. The valleys of D and H at exon 16 are close to active site residue His391 (Additional file 5: Figure S1). Sequence patterns that are identified as the signatures of selection in neutrality tests (e.g. Tajima's D and Fay and Wu's H) may also result from factors such as population size changes or reflect population structure [66][67][68]. In the case of the highly outcrossing populations of F. ovina on Öland, population structure is unlikely to be a confounding factor in the neutrality tests. However, because possible confounding effects resulting from changes in population size cannot be excluded, the selective sweep suggested by the H and D tests should be interpreted with caution.
Which evolutionary mechanisms may have contributed to the relatively high level of nucleotide polymorphism within the 5' portion of the PgiC1 sequence?
An earlier study [34] identified two F. ovina PgiC1 codon sites (sites 173 and 200) as candidate targets of balancing selection (i.e. positive intraspecific diversifying selection) with a considerably stronger signal of selection for site 200 than for site 173 [34]. The sliding window analyses in the present study support the results of the earlier study and reveal a marked plateau of ω around the selected site 200 (but no peak associated with site 173) (Fig. 2). Protein structure modelling suggests that the translated amino acid polymorphism at these two PgiC1 sites may affect either the interaction between the two monomers, or the domain-domain packing of the encoded PGI enzyme and, thus, influence the biochemical properties of the cytosolic PGI enzyme in F. ovina [34]. Biochemical studies in humans have shown that mutations at a few amino acid sites, which have similar 3-D structural locations to the two selected amino acid sites in F. ovina, significantly affect the activity of the PGI enzyme [69,70].
Within F. ovina PgiC1, both codon sites 173 and 200, which were previously identified as candidates for balancing selection [34], are located within the 5' portion of the sequence (Fig. 2). The significant results from the Wall's B and Q tests (Table 1) support the suggestion that there has been balancing selection on the 5' portion of the PgiC1 sequence in F. ovina (cf. [47]). In addition to the significant B and Q tests, signals of balancing selection were also detected at the putative selected site 200. The highest peaks of, respectively, positive Tajima's D and total nucleotide diversity were observed at or around codon site 200 in the sliding window analyses (Fig. 2), and these peaks are a typical signature of balancing selection (cf. [13,14]). No marked peak or plateau for polymorphism or for Tajima's D was observed for the second putative selected site (site 173) in the present study (Fig. 2), in agreement with the previous study [34] which showed a weaker signal of balancing selection for site 173 than for site 200.