A comparative genomics approach revealed evolutionary dynamics of microsatellite imperfection and conservation in genus Gossypium

Ongoing molecular processes in a cell could target microsatellites, a kind of repetitive DNA, owing to length variations and motif imperfection. Mutational mechanisms underlying such kind of genetic variations have been extensively investigated in diverse organisms. However, obscure impact of ploidization, an evolutionary process of genome content duplication prevails mostly in plants, on non-coding DNA is poorly understood. Genome sequences of diversely originated plant species were examined for genome-wide motif imperfection pattern, and various analytical tools were employed to canvass characteristic relationships among repeat density, imperfection and length of microsatellites. Moreover, comparative genomics approach aided in exploration of microsatellites conservation footprints in Gossypium evolution. Based on our results, motif imperfection in repeat length was found intricately related to genomic abundance of imperfect microsatellites among 13 genomes. Microsatellite decay estimation depicted slower decay of long motif repeats which led to predominant abundance of 5-nt repeat motif in Gossypium species. Short motif repeats exhibited rapid decay through the evolution of Gossypium lineage ensuing drastic decrease of 2-nt repeats, of which, “AT” motif type dilapidated in cultivated tetraploids of cotton. The outcome could be a directive to explore comparative evolutionary footprints of simple non-coding genetic elements i.e., repeat elements, through the evolution of genus-specific characteristics in cotton genomes.


Background
Microsatellites are DNA structural elements in which a short sequence pattern (motif ) is repeated by various numbers and ubiquitously presented in genomes of eukaryotes and prokaryotes. Various mechanisms like replication slippage, unequal crossover, realignment after disassociation of replicating strands, mispairing and base substitution are causative to variations which ultimately lead to extensive polymorphism in microsatellites [1,2].
Replication slippage events are majorly responsible for extensive repeat length variations, and mispairing of replication strands is key determinant for base substitution in DNA sequence [3]. Moreover, stable wobble mispairing and inefficiency of mismatch repair system could incorporate point mutations anywhere in genome sequence [4]. These processes could impinge on microsatellites and generate repeats with occasional mismatch, motif imperfection, in their repeat units.
During the replication process, insertion or deletion in repeat motif units due to reduced strand specificity or slippage errors could generate length variations. Various models comprehend and illustrate the mechanism of replication slippage event [5]. A single unit repeat is inserted or deleted at a time according to stepwise mutation model [6], while repeat unit variations were found associated to a distinct probability distribution under specific assumptions [7]. Similarly, another model proposed generation of microsatellites as 3′ extension of retrotransposons in a way similar to polyadenylation of mRNA [8]; while, idea of probable generation and insertion of microsatellites in 5′ region and inside the mobile elements further extended the model [9]. On contrary, the mechanisms underlying mispairing and interruptions leading to single nucleotide changes are poorly understood [10].
Non-coding DNA elements, including microsatellites, also experience various evolutionary forces and show high mutation rates (10 −2 -10 −6 ) in response to selection forces as these usually go unnoticed [11]. Thus, higher mutability and hyper variability of microsatellites could be key determinants of a dynamic equilibrium state in which rapid loss or conservation of non-coding DNA elements exist in species among or within a clade [12]. Various studies reported microsatellite conservation over longer time [13], and recently a study reported conservation up to 450 million years ago (MYA) in vertebrates [14]. Similarly, evolutionary forces might regulate the mechanism which drives fate determination processes of noncoding or coding DNA elements. Thus, conservation and/or deterioration of DNA structural elements over a geographic time period determine impact of species-evolving processes like speciation, diversification, domestication and even duplication through polidization in plant genomes.
Non-coding elements constitute a major proportion of DNA in all forms of life and play a crucial role in modulating coding regions of DNA. Occurrence and applicability of microsatellites in coding sequence (CDS) have elucidated their functional importance [15]. Since microsatellites are implied to regulate biological functions [16], mutations in coding microsatellites could be informative and even point mutation might induce deleterious effects. However, our brief understandings about mechanisms underlying motif imperfection, either through base substitutions or point mutation, cause an eminent impediment in this regard.
In this report, genome-wide motif imperfection pattern were discerned in 13 plant genomes of diverse origins. The relationship between repeat length and degree of motif imperfection with its impact on the genomic abundance of imperfect microsatellites was determined, particularly for Gossypium species. Moreover, evolutionary patterns of microsatellite conservation and/ or loss among Gossypium species were also established to ascertain structural consequences of whole genome duplication and allopolypolidization events through evolution of cultivated cotton tetraploids.

Genome assemblies of 13 plant species
The genome sequences of thirteen plant species were investigated comprising ten dicots and three monocots. The dicot species belonged to family Brassicaceae (Arabidopsis thaliana and Brassica rapa), Fabaceae (Glycine max and Phaseolus vulgaris), Salicaceae (Populus trichocarpa) and majorly to Malvaceae (Gossypium arboreum, G. raimondii, G. hirsutum, G. barbadence and Theobroma cacao). The three monocots belonged to family Poaceae (Brachypodium distachyon, Oryza sativa and Zea mays). In addition, coding sequences of annotated genes were also searched for imperfect repeats among four Gossypium species. The genome assemblies and CDS sequences of Gossypium species were retrieved from Cottongen [17], while genome sequences of other nine plant species were obtained from NCBI Genome Portal [18]. The name of each species was abbreviated to four letters where first capital letter denoted to genus and trailing three letters as species name.

Imperfect microsatellites identification
The SciRoKo (v3.4) program utility [19] was used with default imperfect search parameters to identify imperfect microsatellites of varying motif length from 1 to 6 nucleotide (nt) that were designated as 1-nt, 2-nt, 3-nt, 4-nt, 5-nt and 6-nt among 13 plant genomes. Imperfect search criteria regarding repeat length and mismatch penalty were modified, as previously described [20]. Maximum number of successive mismatch was set to '3' , and minimum repeat length was set to 15 nt. Compound microsatellites were excluded where maximum distance for association between two repeats was less than 100 nt. For motif standardization, each microsatellite motif underwent a two-step complete standardization procedure. In the first step, repeat motifs like "AG" and "GA" were all categorized as "AG" and termed as partially standardized motifs. Subsequently, in the second step, reverse complement sequence of motifs was determined and all microsatellites were assigned to completely standardized motifs. Eventually, motifs like "AG", "GA", "TC" and "CT" were all designated to a single group "AG"" for further analysis. Such complete motif standardization resulted in 2, 4, 10, 33, 102 and 350 group categories for 1-nt, 2-nt, 3-nt, 4-nt, 5-nt and 6-nt repeat motifs, respectively.

Motif imperfection and repeat length analyses
The genomic abundance of imperfect repeats was ascertained with varying mismatch counts and repeat lengths. Initially, short length repeats were selected and motif imperfection was employed to discriminate between perfect (with no mismatch) and imperfect microsatellites (with mismatch >0) among all genomes. It was noticed that loci with repeat length < 20 nt did not carry mismatch, thus were excluded from further analysis. While, repeats <25 nt had only '1' mismatch per locus and there were up to '2' mismatch in repeat loci <30 nt. The two datasets were mined separately, one having genomic abundance of microsatellites with '0' or '1' mismatch (no or low degree imperfection). While, other comprised of exactly '1' or '2' mismatch (low or higher degree imperfection) for repeats of length 20 to 24 nt (dataset I) and 25 to 29 nt (dataset II), respectively.
Thereafter, three different analytical strategies were employed. Firstly, principal coordinates method [21] was used to conduct generalized discriminant analysis on both datasets separately. The analysis generated canonical axes scores which were used to determine correlation among the variables, and statistical significance was estimated by permutation (n = 9999) method implemented in 'CAP' program [22]. Moreover, the R package [23] was also used to test the significance of canonical correlations by add-on package 'CCP' [24]. Secondly, longer microsatellites of length 35,40,45,50,55,60,65,70, 75 and 80 nt were targeted with different levels of imperfection (1, 2, 3 and 4 mismatch per loci). The relationships between motif imperfection and varied length repeats were estimated using permutational analysis of variance PERMANOVA [25] by calling "adonis" function under package vegan [26], and statistical significance of F-statistic was measured (0.05) after permutations (n = 999) in R environment. Thirdly, the genomic abundance of imperfect microsatellites was related to mismatch counts and repeat length through t-statistic and significant difference were determined (0.05). A linear regression model was fitted to mismatch count and repeat length in each Gossypium species.

Transposable elements (TEs) distribution of perfect vs imperfect microsatellites
Perfect (no mismatch) and imperfect repeats (mismatch ≥1) were compared for relatedness to nearby intact TEs. Each chromosome was considered as independent segment, and TEs abundance was determined among the four cotton genomes in vicinity of microsatellites (500 nt flanked region on both sides) for both sets of repeats. Then, a linear regression model was fitted to the repeat density and TEs abundance in each Gossypium species.

Motif imperfection in coding region of Gossypium species
Annotated coding sequences were searched for imperfect repeats under default criteria. All publically available annotated single nucleotide polymorphism (SNP) markers sequences [17] for cotton genomes were downloaded and employed for functional characterization of imperfect microsatellites in coding region.

Microsatellites conservation among Gossypium species
For conservation analysis, microsatellite libraries were developed separately for each species containing repeats flanked with 350 nt on both sides. Because of the drawbacks pertained to sequencing homo-polymers, 1-nt repeats were excluded. Moreover, all compound and overlapping sequences were filtered out and Basic Local Alignment Search Tool (BLAST) was used to validate uniqueness of each locus by NCBI BLAST 2.2.31 [27] through self-BLASTing each library and discarding false positives. A custom perl script was used to hard mask repeat sequences for each locus, and genome to sub-genome (diploid vs sub-diploid in tetraploids) BLAST searches was conducted among libraries. The output was filtered for algorithms like alignment ≥50% of query cover, identity ≥70% and E_value 10 −10 . Since a whole genome duplication (WGD) event was reported through cotton tetraploids' evolution [28], duplication of each progenitor loci was allowed up to two physical positions for pair wise comparisons (A 2 vs A T & D 5 vs D T ) among homologous chromosomes. The results were validated in reciprocal BLAST searches. Thereafter, conserved microsatellites proportion was employed to develop an intuitive diagram for genome-wide microsatellite conservation by motif size using circos tool [29].

Estimate of microsatellite decay during paleopolyploidization
The WGD event involved through paleopolyploidization of cultivated tetraploid species was estimated with varied divergence time [30]. A representative median divergence time as~6 MYA was used. Assuming steady loss of microsatellite loci over the period of divergence time, an exponential fitted decay function was employed to estimate "comparative exponential decay" of microsatellite in G. hirsutum and G. barbadence. On contrary, the "relative decay" estimates of varying motif repeats in tetraploids were determined by comparing proportion of conserved repeats. Thereafter, "relative decay" and "comparative exponential decay" were fitted to an exponential decay function. All non-parametric test statistics were measured and tested for significance (0.05) in R environment.

Microsatellite relative abundance
The repeat density patterns in cotton diploids exploited evolutionary footprints regarding impact of paleopolyploidization event on it in three different ways. Firstly, proportional abundance of microsatellites in diploid and tetraploid Gossypium species was estimated by comparing their relative abundance in T. cacao, a close relative belong to Malvaceae. Secondly, the relative abundance of microsatellites according to motif kind and size in four cotton genomes was determined individually. And lastly, comparative abundance of standardized 2-nt motif was analyzed in four cotton genomes. All nonparametric test statistics were measured and tested for significance (0.05) in R environment.

Frequency and distribution of microsatellites among monocots and dicots
Microsatellite distribution of varying motif length (1-6 nt) was examined in ten dicots and three monocots. Generally, 2-nt and 3-nt repeats were found in higher percentage in dicots and monocots (Table 1), respectively. While, Atha exhibited almost equal proportion of three motifs (1-3 nt). A plentiful abundance of 3-nt microsatellite was observed in Osat (33.67%) and the trend persisted among Bdis and Zmay. Among dicots, the proportion of 2-nt repeats prevailed and Brap featured the highest proportion of it (39.67%). However, frequency of 5-nt microsatellites predominated and featured specific to four Gossypium genomes. Reckoning to the four cotton genomes, a drastic reduction in percentage of 2-nt repeats was observed in tetraploids (Ghir and Gbar) up to 11.95%. While, density of microsatellites with short motif repeats decreased, those with longer motifs (4-6 nt) exhibited incessant abundance in tetraploids compared to their progenitors (Table 1).
Our data depicted differential variations in repeat density (per Mb) and frequencies of imperfect microsatellites related to genome sizes among 13 plant genomes (Additional file 1: Table S1). Density of imperfect microsatellites varied from 15.61 to 49.07%, and an eminent variation pattern was observed in monocots compared to dicots. Lower repeat density was noticed for species with genome sizes bigger than 1 Gb (10 9 bases) like Zmay, Ghir, Gbar and Garb and the trend persisted except for Bdis. All species depicted differential variations in imperfection percentage, and species belonging to same genus (Gossypium) exhibited varied extent of imperfection. Moreover, differential variations in frequency of imperfect microsatellites was observed among chromosomes of 13 genomes (Additional file 2: Table S2). The four Gossypium species contained substantial variations among cognate homologous chromosomes.

Motif imperfection and repeat length relationship
The average length of imperfect microsatellites increased over perfect ones among 13 plant species (Additional file 3: Table S3). Based on significant canonical variations observed for both datasets (dataset I and II), the genomic abundance of microsatellites was found related to variations in repeat length and motif imperfection. The results showed a significant relationship between perfect and '1' mismatch repeats where first squared canonical correlation to be 0.958 (p = 0.0002). Likewise, a significant canonical correlation was determined with first squared canonical variation to be 0.926 (p = 0.0008) for dataset II in which '1' and '2' mismatch repeats of varied lengths (25,26,27,28, and 29 nt) were included. In dataset I, '1' mismatch repeats explained lesser canonical variation while '2' mismatch repeats commuted in dataset II (Fig. 1). These findings affirmed that repeat length and motif imperfection were somehow modulated in subordinated ways.
Degree of motif imperfection (mismatch count) could be another factor whose relationship with repeat length might determine genomic abundance of imperfect microsatellites. Outcome of PERMANOVA analysis suggested a significant relationship among mismatch count, repeat length and frequency of imperfect microsatellites. The variation in genomic abundance of imperfect microsatellites was significantly related to repeat length (F = 8.050, p = 0.0029) and motif imperfection (F = 1.495, p = 0.0055). These findings speculated existence of an evolutionary mechanism which determined length and motif heterogeneity of microsatellites and the pattern prevailed among the studied set of genomes. Pair-wise Pearson correlation estimates were calculated by controlling motif imperfection (Additional file 4: Table S4a-b) for different repeat lengths (35,40,45,50,55, 60, 65, 70 and 75 nt). The results established strong relationships among different length microsatellites carrying varying mismatch ( Table 2). Genomic abundance of '1' mismatch repeats was found more related to '2' mismatch repeats, and the relationship maintained with '3' mismatch repeats. The scenario became more stringent for pair-wise comparisons between '2' mismatch and '3' mismatch repeats. Thus, the genomic abundance of loci with lower degree of imperfection (1 mismatch) was related to varying repeat lengths. Moreover, repeats with low imperfection (1 mismatch) seemed to modulate genomic abundance of repeats with moderate imperfection (2 mismatch). Similarly, repeats with higher imperfection  The repeat length presented different lengths while frequencies of repeats harbored 1 mismatch (low imperfection), 2 mismatch (moderate imperfection) and 3 mismatch (high imperfection) were compared among 13 plant species. All estimates were significant while underlined were non-significant at 0.05 (3 mismatch) were more inflected by repeats which entertained '2' mismatch. These findings suggested a significant role of motif imperfection in determining length of repeats.

Motif imperfection in Gossypium genomes
To draw deep insights into evolutionary footprints, subsets of mismatched repeats in four Gossypium species were considered for following analyses. Linear regression between repeat length and motif imperfection was determined and a prominent relationship was observed in each case (Fig. 2), but with varied degree of variation accounted in the model (Additional file 5: Table S5). The genomic abundance of intact TEs was determined in 500 nt flanking region (both sides) of perfect and mismatched microsatellites (Additional file 6: Table S6). The two sets of repeats were compared for relatedness between repeat frequency and intact TEs, and significant relationships were observed (Additional file 7: Figure S1a-d).
Generally, fewer TEs were observed in Grai and D T subgenomes; while, higher abundance of TEs discerned for Garb and A T sub-genomes of tetraploids. Moreover, imperfect repeats were more colligated to genomic abundance of TEs than perfect ones except in Garb; where perfect repeats were more related than mismatched ones. This relatedness was further affirmed by Pearson correlations and significance determined at 0.05 (Table 3). Correlation values of imperfect repeats were also expectedly higher than perfect ones, but not for Garb in which perfect repeats were most likely imbedded in TEs.
Differential abundance of imperfect repeats along with intact TEs distribution was determined among individual chromosomes. Interestingly, the frequency of imperfect microsatellites and intact TEs for all species was found at par (Additional file 6: Table S6). The higher abundance of intact TEs was observed in Garb than Grai, while vice versa was true for imperfect repeats. Chromosomes A05, A08, A10 and A11 exhibited higher density of both repeat elements, while a mixed pattern was observed for D03, D07 and D12 among tetraploids. Structural anomalies of chromosomes in two genetically distant progenitors, evolutionary processes, biased selection forces, mutations, deletions and translocations of larger DNA segments could be causative to such abrupt differentiations among tetraploids and diploids. Thus, comparative distributive pattern and its relatedness suggested more likely presence of intact TEs wherever a mismatch found in a microsatellite.

Motif interruptions in coding sequence
Coding sequences of Gossypium species were also searched for imperfection and results were summarized in Additional file 8: Table S7. Generally, tri-and hexa-motif repeats prevailed, while only tri-motif repeats accounted for >70% of microsatellites in CDS regions (Table 4). It was expected due to triplet and degenerate nature of codons. The motif imperfection in Grai repeats was elevated and slight increment in average repeat length was noticed. The tandem repeat density was reduced than genomic ones, while the higher proportion of genes in Ghir harbored repeats in exons (Additional file 8: Table S7). Overall, the motif imperfection mechanisms targeted triand hexa-repeats in genomic and coding sequences respectively (Additional file 9: Figure S2).

Long motif repeats well conserved in Gossypium species
Our data showed predominant abundance of large microsatellites in Gossypium species compared to other dicots and monocots. Microsatellite conservation pattern elucidated the possible influence of paleopolyploidization event and traced out the evolutionary footprints in Gossypium species. The microsatellites proportion of each A T and D T sub-genome in two cultivated species was ascertained which retained from the progenitors (Fig. 3). The shorter repeats were found less conserved, while repeats with long motifs remained intact as in progenitor species (Additional file 10: Table S8). Among short repeats, the 2-nt microsatellites were found more deteriorated in tetraploids where Gbar retained the lowest proportion from two diploids. On the contrary, 3-6 nt repeats were more conserved, while 5-nt repeats were highly conserved (56.86-78.26%) in both tetraploids. Overall, Ghir retained higher proportion repeats from Garb (66.58%); while, slightly lower proportion retained (61.39%) from another progenitor. Since microsatellites were more deteriorated through series of evolutionary events in Gbar, thus fewer loci retained intact (37.83-43.99%).

Estimation of microsatellite decay
The conserved microsatellite proportion and divergence time were fitted to an exponential decay function and microsatellites decay rate of varied length motifs was calculated among Gossypium species. A static decay was observed in Ghir for loci from both progenitors except 2-nt repeats; of which D T originated repeats were lost significantly faster (Friedman statistic = 48.44, p = 0.00001) (Additional file 11: Figure S3a). Whereas in Gbar, 2-nt repeats of D T origin were highly distorted and lost due to significantly faster decay (Friedman statistic = 55.083, p = 0.00001) (Additional file 11: Figure S3b). In comparison, Ghir retained higher proportion of microsatellites from diploid species than Gbar. Similarly, large motif repeats (3-6 nt) were more evolutionary stable and conserved compared to 2-nt repeats in both A-genome and D-genome of Gossypium lineage. Overall, repeats loci in Ghir depicted slower decay of repeats than Gbar (Fig. 4a), while a significant faster decay (Friedman statistic = 18.762, p = 0.0021) of Grai repeats was observed in tetraploids compared to Garb microsatellites. Considering both "relative decay" and "comparative exponential decay", all repeat motif exhibited faster decay in Gbar than Ghir (Fig. 4b).
Biased distribution of dinucleotide repeats in Gossypium species A WGD event followed by allopolypolidization could be causative to observed evolutionary pattern of microsatellite conservation in Gossypium tetraploids. The relative abundance estimates of four Gossypium species' repeats relative to T. cacao (as of their closest phylogenetic ancestor) further substantiated the microsatellites decay pattern in Gossypium lineage (Additional file 12: Figure S4a-d).

Discussion
The study provided a comprehensive account of microsatellite imperfection in 13 plant genomes and explored motif imperfection and repeat length relationships in Gossypium lineage. The microsatellite conservation pattern and evolutionary footprints of non-coding repetitive DNA elements were also discerned in closely related species of Malvaceae family. Genomic abundance of imperfect microsatellites is well studied in simple organisms like viruses [31], mushroom [32], complex animal [33], insect [34], human [35] and even in green plants [36]. However, a clear understanding of imperfection mechanism, its role in genesis, preserving sequence variations in non-coding repeat elements and their potential impact on gene regulation are not well understood. As focus on evolution of microsatellites got reincarnated with the advent of high throughput sequencing technology [37,38], this study emphasized on role of motif imperfection in microsatellites stability. Plant species were emphasized as microsatellites are being extensively employed in revealing diversity, heterozygosity and exploiting phenotypic variations in plants [39,40]. Moreover, microsatellites have also been employed to unravel polyploidy in plants [41]. Evolutionary dynamics of microsatellites were revealed in genus Gossypium, a best model to investigate ploidy increase impact on microsatellite imperfection and conservation.
Various tools are available for imperfect repeats search, but SciRoKo [19] was preferred since it provides a utility to efficiently digest large genome datasets into 50 Mb chunks. Moreover, assessing efficiency of various tools was beyond the scope of this study. Our results witnessed predomination of 3-nt and 2-nt repeats in Fig. 3 Intuitive diagram shows conserved microsatellite proportion based on motif length. Motif repeats like 2-nt (dinucleotide), 3-nt (trinucleotide), 4-nt (tetra-nucleotide), 5-nt (penta-nucleotide) and 6-nt (hexa-nucleotide) were compared for conservation among Garb (G. arboreum), Grai (G. raimondii), Ghir (G. hirsutum) and Gbar (G. barbadence) monocots and dicots respectively. Many researchers have reported 3-nt repeats' abundance as characteristics in monocots [42,43]. Similarly, predominate abundance of 2-nt repeats has been reported in dicots [44,45], but paramount abundance of other motifs is not out of the scope. The resultant fluctuations might be owed to variations in search parameters. Appropriate parameters were employed for imperfect microsatellites search in all studied plant genomes (Table 1 & Additional file 1:  Table S1). Likewise, the distribution and abundance of imperfect repeats varied among the studied species, and it also varied among homologous chromosomes of species (Additional file 2: Table S2). Previous studies supported varied distribution of imperfect repeats among different species of genus Drosophila [46]. The trend was not affected by varying quality of 13 genome assemblies as a consistent pattern of motif imperfection was observed.
The genomic abundance of imperfect microsatellites was intricately related to degree of motif imperfection and repeat length variations. Previously, repeat unit variations were found linked to repeat length and mutations [47,48], while an equilibrium state was proposed to regulate abundance of mismatched repeats [1]. A varying stability of mismatched repeats due to sequence interruptions has been experimentally tested [49]. While, a study determined relationship between mutation and repeat units and explained why longer repeats likely to undergo contractions and short repeats experience expansion in case of slippage mutations [50]. The longer repeats (>40 nt) with no mismatch or lower imperfection (mismatch 1 and/or 2) were more abundant than those with higher degree of imperfection (Additional file 4: Table S4a-b). Moreover, a significant relationship between mismatch count and repeat length existed among Gossypium species (Additional file 5: Table S5). Generally, the longer repeats do not remain stable and immediately split to give shorter repeats as these undergo contraction. However, our results predicted longer repeats were more likely to harbor mismatch, but these repeats were less prone to contraction as motif interruptions could impart stabilizing effect to such repeats [51]. Furthermore, a low to moderate positive correlation between microsatellites and TEs was recurrently reported [52,53] but not between imperfect microsatellites and TEs. Due to sequence interruptions, the longer repeats were found more related to TEs abundance except for Garb, where perfect repeats were more likely to be present in the vicinity of TEs (Table 3). Thus, it can be speculated that microsatellite genesis, prolonged stability, length variations and motif imperfection of mismatched repeats could be better comprehended when the replication slippage errors, polyadenylation of 5′ and 3′ regions of TEs are considered simultaneously.
Recent studies reported that 5-nt repeats were predominated in Garb and Grai [54,55]. However, another study reported 6-nt repeats prevailed in Ghir [56]; while our results affirmed that 5-nt repeats were consistently frequent Fig. 4 Comprison of microsatellite decay between two cotton tetraploids compared to repeats originated in diploids (Garb and Grai). a Relative loss of all SSRs among A T and D T subgenomes of tetraploids (Ghir and Gbar). The black bar within box represens median and dotted horizontal line represent y = 0, while significance was denoted as lowercase letter. b Comparing relationship of microsatellite relative decay and comparative exponential decay between Ghir and Gbar in four sequenced Gossypium genomes. Although the different search parameters generated variations, our results were in accordance to all previous reports. Previously, phylogenetic and molecular studies have suggested that genus Gossypium diverged from close relatives during Miocene about 10-15 MYA [57]. The two progenitor diploids; G. arboreum and G. raimondii, underwent a series of rapid speciation and diversification. While allopolyploids accomplished domestication and diversified distribution in mid Pleistocene about 1-2 MYA [58]. Proportion of conserved microsatellites strongly suggested rapid loss of shorter repeats (especially 2-nt repeats) and relative abundance analysis evoked preservation of higher repeats in four sequenced cotton species. It was noted that 5-nt repeats were the second more abundant after 2-nt repeats in T. cacao. Thus, it could be speculated that through the series of evolutionary landmarks in Gossypium evolution; WGD, allopolypolidization and speciation, 2-nt repeats were drastically lost and 5-nt repeats prevailed until Gossypium early diploids evolved. The 2-nt repeat density was higher in cotton diploids than tetraploids, since the loss of shorter repeats could be a gradual process (Table 1).

Conclusion
This study presented a comprehensive view of motif imperfection pattern in 13 genome assemblies representing five taxonomic families in plant kingdom. It determined the pattern of microsatellite motif imperfection and decay in complex higher plants exhibiting polyploidy and laid foundation for further studies; as plenty of plant genome sequences available to question evolutionary theory about origin of life. The results explicated evolution and stability of motif-interrupted repeats, its relationship with repeat length and its impact on their genomic prevalence in monocot and dicots. Moreover, it exploited clues to evolutionary footprints for predominate abundance of 5-nt repeats, which appeared as a characteristic in genus Gossypium. Likewise, 2-nt repeats experienced accelerated decay in Gossypium tetraploids. When compared with a close relative T. cacao, "AT" repeats started being less abundant in geographic time while Gossypium lineage was undergoing paleopolyploidization processes through the evolution of today's domesticated cotton. Characterization of "AT" motif decay hotspots and the impact of motif decay on modulation of gene regulatory elements is mandatory to fully comprehend the basic underlying mechanisms.