BLSOM


Batch-Learning Self-Organizing Map (BLSOM) program for multithreaded environments on Linux

Kohonen's Self-Organizing Map (SOM), an unsupervised neural network algorithm, is a powerful tool for clustering and visualizing high-dimensional complex data on a two-dimensional map (Kohonen, 1982 and 1990; Kohonen et al., 1996). Building on batch learning, we have developed a modification of the conventional SOM for genome sequence analyses, the Batch-Learning SOM (BLSOM), which makes the learning process and the resulting map independent of the order of data input (Kanaya et al., 2001; Abe et al., 2003). We have used BLSOM for phylogenetic classification of metagenomic sequences obtained from mixed genomes of environmental microorganisms by analyzing tetranucleotide frequencies (Abe et al., 2005 and 2006), and for protein function prediction of metagenomic sequences by analyzing oligopeptide frequencies (Abe et al., 2009).

Please download the executable program for Linux from the following link.

executable program

After extracting the file, please check the README file for detailed execution instructions.

Install BLSOM on Linux

  1. Download BLSOM.tar.gz.
  2. Extract the archive and change into the BLSOM directory:
cd <TheDirectoryYouPutTheTarball>
tar zxvf BLSOM.tar.gz
cd BLSOM

The BLSOM directory contains five executable files (run.csh, PCA, makeweight, BLSOM and PLOT) and param.dat, the parameter file used when running BLSOM.
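
Before running anything, it can help to confirm that the extracted directory is complete. The following is a minimal sketch, not part of the BLSOM distribution: the function name check_blsom_dir is hypothetical, and the only assumption it makes is the file list given above.

```sh
#!/bin/sh
# Hypothetical pre-flight check (not shipped with BLSOM).
# Reports which of the files named above are missing or not executable
# in the given directory (default: the current directory).
check_blsom_dir() {
    dir="${1:-.}"
    status=0
    for f in run.csh PCA makeweight BLSOM PLOT; do
        # Require a regular file with the execute bit set.
        if [ ! -f "$dir/$f" ] || [ ! -x "$dir/$f" ]; then
            echo "missing or not executable: $f"
            status=1
        fi
    done
    if [ ! -f "$dir/param.dat" ]; then
        echo "missing: param.dat"
        status=1
    fi
    return "$status"
}
```

After sourcing the function, running `check_blsom_dir .` inside the BLSOM directory prints nothing and returns 0 when all six files are in place.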

Quick Start

The commands required for simple execution are as follows.

./run.csh  <Your input file>  <Number of threads>

Sample command:

cd BLSOM
./run.csh  input.dat  8

All executable files and param.dat must be placed in the same directory.
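
If you are unsure what to pass as the number of threads, one option is to derive it from the machine. The sketch below is an illustration only: pick_threads is a hypothetical helper, and using the number of online CPUs as the default is an assumption, not a recommendation from the BLSOM README.

```sh
#!/bin/sh
# Hypothetical helper (not shipped with BLSOM): choose a thread count.
# Uses the first argument if one is given; otherwise falls back to the
# number of online CPUs reported by getconf, or 1 if that is unavailable.
pick_threads() {
    if [ -n "${1:-}" ]; then
        echo "$1"
    else
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}

# Usage, with the sample input file name from above:
#   ./run.csh input.dat "$(pick_threads)"
```

Passing an explicit value, as in `pick_threads 8`, simply echoes that value, so the same call works on shared machines where you want fewer threads than cores.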

References

  1. Kanaya S, Kinouchi M, Abe T, Kudo Y, Yamada Y, Nishi T, Mori H, Ikemura T (2001) Analysis of codon usage diversity of bacterial genes with a self-organizing map: characterization of horizontally transferred genes with emphasis on E. coli O157 genome. Gene, 276:89-99.
  2. Abe T, Kanaya S, Kinouchi M, Ichiba Y, Kozuki T, Ikemura T (2003) Informatics for unveiling hidden genome signatures. Genome Research, 13:693-702.
  3. Abe T, Sugawara H, Kinouchi M, Kanaya S, Ikemura T (2005) Novel phylogenetic studies of genomic sequence fragments derived from uncultured microbe mixtures in environmental and clinical samples. DNA Research, 12:281-290.
  4. Abe T, Sugawara H, Kanaya S, Ikemura T (2006) Sequences from almost all prokaryotic, eukaryotic, and viral genomes available could be classified according to genomes on a large-scale Self-Organizing Map constructed with the Earth Simulator. Journal of the Earth Simulator, 6:17-23.
  5. Abe T, Kanaya S, Uehara H, Ikemura T (2009) A novel bioinformatics strategy for function prediction of poorly-characterized protein genes obtained from metagenome analyses. DNA Research, 16:287-298.
  6. (Review) Iwasaki Y, Abe T, Wada K, Wada Y, Ikemura T (2013) A novel bioinformatics strategy to analyze microbial big sequence data for efficient knowledge discovery: Batch-Learning Self-Organizing Map (BLSOM). Microorganisms, 1:137-157.