Bioinformatics Lab. English


Welcome to the Bioinformatics Laboratory, Information Engineering, Niigata University.

Research Topics

One of the most important tasks in life science is to uncover previously unknown fundamental knowledge from the large amount of accumulated genomic sequence information. As the number of available sequences grows, novel tools are needed for comprehensive analyses of species-specific sequence characteristics across a wide range of genomes. The self-organizing map (SOM), an unsupervised neural network algorithm, is an effective tool for clustering and visualizing complex high-dimensional data on a single map, and we modified the SOM for genome analysis by developing the Batch-Learning SOM (BLSOM).
The alignment-free clustering method BLSOM can characterize even one million genomic sequences simultaneously on a single map and is therefore especially suitable for analyzing the vast numbers of sequences obtained by currently available high-throughput DNA sequencing methods. Such analyses were impossible with conventional phylogenetic methods based on sequence homology searches.
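
The batch-learning idea can be illustrated with a generic batch SOM. In each epoch, every input is first assigned to its best-matching node, and then each node's weight vector is replaced by the neighborhood-weighted mean of all inputs; because the update uses all inputs at once, it does not depend on the order in which they are presented. The sketch below is a minimal pure-Python version under stated assumptions: a rectangular grid, a Gaussian neighborhood whose radius shrinks linearly, and a simple initialization that copies evenly spaced input vectors. The function names and parameters are hypothetical, and this is not the lab's BLSOM implementation.

```python
import math

# Hypothetical helper names; an illustrative batch SOM, not the lab's BLSOM code.

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))

def batch_som(data, rows, cols, epochs=10, sigma0=None, sigma_end=0.5):
    """Train a rectangular batch-learning SOM on a list of equal-length vectors."""
    dim = len(data[0])
    n_nodes = rows * cols
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Assumed initialization: copy input vectors spaced evenly through the data.
    weights = [list(data[(k * len(data)) // n_nodes]) for k in range(n_nodes)]
    coords = [(k // cols, k % cols) for k in range(n_nodes)]

    for epoch in range(epochs):
        # Neighborhood radius shrinks linearly from sigma0 to sigma_end.
        t = epoch / max(epochs - 1, 1)
        sigma = sigma0 + (sigma_end - sigma0) * t
        # Step 1: assign every input to its best-matching unit with the current weights.
        bmus = [best_matching_unit(weights, x) for x in data]
        # Step 2: replace each node's weight by the neighborhood-weighted mean of all inputs.
        new_weights = []
        for j in range(n_nodes):
            num = [0.0] * dim
            den = 0.0
            for x, c in zip(data, bmus):
                d2 = ((coords[j][0] - coords[c][0]) ** 2
                      + (coords[j][1] - coords[c][1]) ** 2)
                h = math.exp(-d2 / (2 * sigma * sigma))
                den += h
                for i in range(dim):
                    num[i] += h * x[i]
            new_weights.append([v / den for v in num] if den > 0 else weights[j])
        weights = new_weights
    return weights
```

For example, `batch_som(vectors, 1, 2)` on two well-separated groups of points places each group on its own node; real genome analyses use far larger maps and many more inputs.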

1. Comparative genome analyses for unveiling genome signatures

We found that, for a wide range of genomes, BLSOM could classify genomic sequence fragments according to species using no information other than oligonucleotide frequencies (Fig. 1). BLSOMs recognized and visualized species-specific characteristics of oligonucleotide frequencies in most sequence fragments, permitting species-specific classification of sequences without any prior information about their species of origin. Because it can systematically characterize the species-specific genome signatures of all analyzable prokaryotes and eukaryotes, BLSOM provides a powerful new bioinformatics strategy for studying biodiversity and molecular evolution.
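
The input to BLSOM in this kind of analysis is a vector of oligonucleotide frequencies computed from each sequence fragment. A minimal sketch of that step follows, assuming tetranucleotides (k = 4, giving 4^4 = 256 dimensions) and skipping windows that contain ambiguous bases such as N. The function name, the default k, and the optional counting of the reverse-complement strand are illustrative assumptions rather than the lab's exact settings.

```python
from itertools import product

# Hypothetical helper; illustrative only, not the lab's feature-extraction code.

def oligo_frequencies(seq, k=4, both_strands=False):
    """Relative frequencies of all 4**k oligonucleotides, in lexicographic (ACGT) order.

    Windows containing characters other than A/C/G/T are skipped. If both_strands
    is True (an assumed option), the reverse complement is counted as well.
    """
    seq = seq.upper()
    strands = [seq]
    if both_strands:
        strands.append(seq[::-1].translate(str.maketrans("ACGT", "TGCA")))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for s in strands:
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if word in counts:  # ambiguous windows (e.g. containing N) are ignored
                counts[word] += 1
                total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]
```

Each fragment then becomes one fixed-length vector, so fragments from many genomes can be placed on the same map.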

2. A novel bioinformatics strategy for unveiling microbial diversity and protein functions of metagenome sequences

Metagenomic studies of uncultivable microorganisms in environmental and clinical samples should allow extensive surveys of genes useful in medical and industrial applications. Traditional methods of phylogenetic assignment have been based on sequence homology searches and have therefore inevitably focused on well-characterized genes, for which the orthologous sequences required to construct a reliable phylogenetic tree are available. However, most well-characterized genes are not industrially attractive. Our alignment-free clustering method, BLSOM, is well suited to this purpose. To phylogenetically classify sequences of unknown species origin obtained from environmental and clinical samples, a BLSOM must first be constructed with all available sequences from species-known prokaryotes and eukaryotes, as well as from viruses and organelles. Using high-performance supercomputers, we clustered (self-organized) these sequences on a BLSOM according to phylotype with high accuracy. By mapping a large number of environmental genomic sequences onto this large-scale BLSOM, we could predict their phylotypes. Because BLSOM does not require orthologous sequence sets, this alignment-free method can provide a new systematic strategy for revealing the microbial diversity and the relative abundance of different phylotypes of uncultured microorganisms, including viruses, in environmental and clinical samples.
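
Phylotype prediction by mapping can be sketched as follows, assuming a map has already been trained on oligonucleotide-frequency vectors from species-known sequences. Each node is labeled with the majority phylotype of the reference vectors mapped to it, each environmental vector receives the label of its best-matching node, and the predictions are summarized as relative abundances. The helper names and the majority-vote labeling rule are hypothetical choices for illustration; nodes with no reference hits stay unlabeled (`None`).

```python
from collections import Counter

# Hypothetical helpers; an illustrative mapping step, not the lab's pipeline.

def nearest_node(weights, x):
    """Index of the node closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(weights[k], x)))

def label_nodes(weights, ref_vectors, ref_labels):
    """Label each node with the majority phylotype of reference vectors mapped to it."""
    hits = {}
    for x, lab in zip(ref_vectors, ref_labels):
        hits.setdefault(nearest_node(weights, x), Counter())[lab] += 1
    return {node: c.most_common(1)[0][0] for node, c in hits.items()}

def predict_phylotypes(weights, node_labels, query_vectors):
    """Label of each query's best-matching node, or None if that node is unlabeled."""
    return [node_labels.get(nearest_node(weights, q)) for q in query_vectors]

def relative_abundance(predictions):
    """Fraction of labeled predictions assigned to each phylotype."""
    counts = Counter(p for p in predictions if p is not None)
    total = sum(counts.values())
    return {lab: n / total for lab, n in counts.items()} if total else {}
```

Leaving sparsely populated nodes unlabeled, rather than forcing a nearest label, keeps ambiguous environmental fragments out of the abundance estimate.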

3. Application of BLSOM to medical topics

Influenza A viruses pose a significant threat to public health, as highlighted by the recent introduction of the swine-derived H1N1 virus (pandemic H1N1/09) into human populations. Using BLSOM to analyze oligonucleotide and codon frequencies, we can compare all influenza A virus genome sequences registered in the public DNA databanks on a single map. Separation according to host animal, subtype, and epidemic year was efficiently visualized (Fig. 2). Notably, H1N1/09 strains have oligonucleotide and codon compositions clearly distinct from those of seasonal human influenza strains. This enabled us to infer likely directional changes in H1N1/09 sequences in the near future and to list codons and oligonucleotides whose frequencies may decrease in H1N1/09 sequences. The strong visualization power of BLSOM also provides a surveillance strategy for efficiently detecting potential precursors of pandemic viruses.
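
Codon frequencies for such an analysis can be computed by reading each coding sequence in frame. The sketch below also shows one hypothetical heuristic for listing codons that might decrease: rank codons by how much more frequent they are in one group of strains (for instance H1N1/09) than in another (for instance seasonal human strains), on the assumption that the first group will drift toward the second. The function names, the equal weighting of sequences, and the drift assumption are illustrative and are not taken from the lab's published method.

```python
from itertools import product

# Hypothetical helpers; an illustrative heuristic, not the lab's published analysis.

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

def codon_frequencies(cds):
    """Relative frequencies of the 64 codons in an in-frame coding sequence.

    RNA input is accepted (U is read as T); codons with ambiguous bases are skipped.
    """
    cds = cds.upper().replace("U", "T")
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}

def mean_frequencies(seqs):
    """Per-codon mean frequency, giving every sequence equal weight (an assumption)."""
    freqs = [codon_frequencies(s) for s in seqs]
    return {c: sum(f[c] for f in freqs) / len(freqs) for c in CODONS}

def codons_higher_in(group_a, group_b, min_diff=0.0):
    """Codons more frequent in group_a than group_b, largest difference first."""
    a, b = mean_frequencies(group_a), mean_frequencies(group_b)
    gaps = [(c, a[c] - b[c]) for c in CODONS if a[c] - b[c] > min_diff]
    return sorted(gaps, key=lambda t: t[1], reverse=True)
```

The same ranking could be applied to oligonucleotide frequencies; raising `min_diff` keeps only the largest compositional gaps.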

Fig. 1
Fig. 2

Publication

Please see this page for recent publications.

Member

Associate Professor Takashi Abe
E-mail: takaabe{{at}}ie.niigata-u.ac.jp
Please see this page for details.