History of PEMS4HGT (No. 2)



Application example of PEMS: Detection of HGT candidates

How to use PEMS for detecting HGT candidates in microbial genomes and predicting their origin by using large-scale BLSOM results.

With our software PEMS, you can detect HGT candidates in your own genome sequence.

PEMS requires Microsoft Windows 7 or later and the Microsoft .NET Framework 4.0 or later runtime environment.
Please check the PEMS user guide for the required PC performance and environment.

The analysis procedure is as follows.

1. Segmentation of your genome sequence using a 5-kb window with a 1-kb step.

You can fragment the genome sequence using software such as EMBOSS splitter (e.g., http://bioinfo.nhri.org.tw/cgi-bin/emboss/splitter) (Fig. 1).

Figure 1. Input form of EMBOSS splitter

For example, to obtain segments using a 5-kb window with a 1-kb step, enter the following options and then run the program. The overlap is the window size minus the step (5000 - 1000 = 4000).

  • 5000 in “Size to split at” in Fig. 1.
  • 4000 in “Overlap between split sequences” in Fig. 1.

After execution, save the output to a file.
When saving, set the file extension to “.fa”, “.fna” or “.fas”.
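
If you prefer to script this step, the same windowing can be sketched in a few lines of Python. This is only an illustrative sketch, not part of PEMS or EMBOSS: the function names `sliding_windows` and `write_segments` and the `name_start-end` header style are assumptions made here, and the handling of the final, shorter segment may differ from what EMBOSS splitter produces.

```python
from typing import Iterator, Tuple


def sliding_windows(seq: str, window: int = 5000,
                    step: int = 1000) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, subsequence) using 1-based inclusive coordinates.

    Consecutive windows overlap by window - step bases (4000 by default).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start < len(seq):
        end = min(start + window, len(seq))
        yield start + 1, end, seq[start:end]
        if end == len(seq):
            break  # the window reached the end of the sequence
        start += step


def write_segments(name: str, seq: str, path: str,
                   window: int = 5000, step: int = 1000) -> int:
    """Write the segments as multi-FASTA and return how many were written.

    The header format ">name_start-end" is an assumption for illustration.
    """
    count = 0
    with open(path, "w") as out:
        for start, end, sub in sliding_windows(seq, window, step):
            out.write(f">{name}_{start}-{end}\n")
            # Wrap sequence lines at 60 characters.
            for i in range(0, len(sub), 60):
                out.write(sub[i:i + 60] + "\n")
            count += 1
    return count
```

For example, `write_segments("U00096", genome_sequence, "U00096_segments.fa")` would write the segments of a sequence string you have already loaded into `genome_sequence`; the output file name here is hypothetical, chosen only to carry one of the accepted extensions.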

For an example, please refer to the file created from the E. coli K-12 genome (accession number: U00096) in the “sampledata” folder.

2. Run PEMS using the prepared genome sequence segments in FASTA format as input data.

Only the basic procedure is introduced here. Please check the PEMS user guide for the detailed procedure.

Figure 2. PEMS input screen

  1. Click “Multi Fasta” to select the prepared genome sequence segment file (“1” in Fig. 2).
  2. Click “Threshold” to set the threshold value.
  3. Change the threshold value from “40” to “0”.
    This threshold is the percentage relative to the most abundant taxonomic rank when microbial genomic segments were mapped into taxonomic territories.
  4. Click “start”.
    At this point, create a folder to save the output files and specify the prefix of the output files.

3. Output files

The main output files are described below.

  1. [PREFIX_Top.txt]: Taxonomic assignment results at the Kingdom/Phylum/Genus level for each sequence segment.
  2. [PREFIX_Hist.txt]: The number of sequence segments assigned to each category at each of the Kingdom/Phylum/Genus levels.
  3. Detailed results for each taxonomic rank are output to separate files, e.g., "PREFIX_Alphaproteobacteria_All.txt" in the case of Alphaproteobacteria. If you want to refine the results further, you can change the assignment criteria using these files.

Please see the README for details of the output files.
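
Once the per-segment assignments have been read from the output, a simple first pass at listing HGT candidates could look like the sketch below. Everything in it is an assumption made for illustration: the in-memory `(segment ID, taxon)` form, the function names, and above all the rule itself (flag every segment whose assigned taxon differs from the most frequent one), which is not taken from PEMS. Parsing of the output files is deliberately left out, because their layout is described in the README rather than here.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical in-memory form: (segment ID, assigned taxon).
Assignment = Tuple[str, str]

# Made-up values for illustration only; these are not real PEMS results.
TOY_ASSIGNMENTS: List[Assignment] = [
    ("U00096_1-5000", "Escherichia"),
    ("U00096_1001-6000", "Escherichia"),
    ("U00096_2001-7000", "Salmonella"),
    ("U00096_3001-8000", "Escherichia"),
]


def count_assignments(assignments: List[Assignment]) -> Counter:
    """Count how many segments were assigned to each taxon."""
    return Counter(taxon for _, taxon in assignments)


def flag_candidates(assignments: List[Assignment]) -> List[Assignment]:
    """Return the segments whose taxon differs from the most frequent one.

    Assumed criterion for illustration; it is not the PEMS rule.
    """
    counts = count_assignments(assignments)
    if not counts:
        return []
    dominant, _ = counts.most_common(1)[0]
    return [(seg, taxon) for seg, taxon in assignments if taxon != dominant]
```

Calling `flag_candidates(TOY_ASSIGNMENTS)` keeps only the segments assigned outside the dominant taxon, and the taxon attached to each flagged segment serves as a rough hint of its possible origin. Because the windows overlap, neighbouring flagged segments will often describe the same region.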

     