POSTER NO: 51
Statistics of trinucleotides in coding sequences and evolution
1Fumihiko Takeuchi, 1Yasuhiro Futamura, 2Hiroshi Yoshikura, 1Kenji Yamamoto

Our aim is to provide measures that indicate the evolutionary stage of a species. We analyze two types of trinucleotide statistics in the coding regions of 27 species. The first is the codon space: the nucleotide ratios at each of the three codon positions. We apply principal component analysis to this space and extract two principal components that faithfully describe the original distribution of the codon space. The first principal component corresponds to the GC content. The second principal component classifies the species into three evolutionary groups: Archaea, Bacteria and Eukaryota. The second statistic compares the real and theoretical frequencies of amino acids. The real frequency of an amino acid in a coding sequence is its frequency in the translated protein. The theoretical frequency is the expected frequency calculated from the nucleotide ratios. We introduce the discrepancy between these two frequencies as an index of the nonrandomness of nucleotides in the sequence. This index divides the species into two groups: eukaryotes, which have lower nonrandomness (i.e., are more random), and prokaryotes, which have higher nonrandomness.
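
As an illustration only, the two statistics described above could be computed for a single coding sequence roughly as follows. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions that the abstract does not state: the standard genetic code is used, stop codons and codons containing ambiguous bases are skipped, the theoretical frequency is derived from the position-specific nucleotide ratios under independence of the three positions, and the discrepancy is taken as the sum of absolute differences. All function names are hypothetical. The principal component analysis step is not shown.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
AMINO_ACIDS = sorted(set(AMINO) - {"*"})


def sense_codons(seq):
    """Split an in-frame coding sequence into codons.

    Assumption: stop codons and codons with ambiguous bases are dropped.
    """
    seq = seq.upper().replace("U", "T")
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    codons = [c for c in triplets if CODON_TABLE.get(c, "*") != "*"]
    if not codons:
        raise ValueError("sequence contains no usable sense codons")
    return codons


def codon_space(seq):
    """Nucleotide ratios at codon positions 1, 2 and 3 (12 numbers in total)."""
    codons = sense_codons(seq)
    ratios = []
    for pos in range(3):
        counts = Counter(c[pos] for c in codons)
        ratios.append({b: counts[b] / len(codons) for b in BASES})
    return ratios


def real_frequency(seq):
    """Amino-acid frequencies observed in the translated protein."""
    codons = sense_codons(seq)
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {a: counts[a] / len(codons) for a in AMINO_ACIDS}


def theoretical_frequency(seq):
    """Amino-acid frequencies expected from the nucleotide ratios alone.

    Assumption: the three codon positions are treated as independent, and
    the result is renormalized after excluding stop codons.
    """
    ratios = codon_space(seq)
    freq = dict.fromkeys(AMINO_ACIDS, 0.0)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            freq[aa] += (ratios[0][codon[0]]
                         * ratios[1][codon[1]]
                         * ratios[2][codon[2]])
    total = sum(freq.values())
    return {a: f / total for a, f in freq.items()}


def nonrandomness(seq):
    """Discrepancy between real and theoretical frequencies.

    Assumption: measured as the sum of absolute differences.
    """
    real = real_frequency(seq)
    theo = theoretical_frequency(seq)
    return sum(abs(real[a] - theo[a]) for a in AMINO_ACIDS)
```

Calling `codon_space(seq)` on each species' concatenated coding sequences would give one 12-dimensional vector per species, which could then be passed to any standard principal component analysis routine. `nonrandomness(seq)` returns a single non-negative number per sequence, with larger values indicating a greater departure from the frequencies expected from nucleotide composition alone.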