ONLINE HELP



CHAPTER 1: Getting Started with SeqFIRE

What is SeqFIRE?

SeqFIRE is a user-friendly web application for the identification and extraction of indel and conserved regions from multiple sequence alignments. The output is provided in several different formats useful as input for further analyses, such as phylogenetics. Users do not need to install any prerequisite software in order to use SeqFIRE. It can be accessed online at the URL www.seqfire.org.

The program comprises six tabs as follows:

Home: home page of the program
Indels: input page for the indel module
Conserved blocks: input page for the conserved region module
Download: link for downloading SeqFIREprep, a stand-alone program for analysing multiple datasets
Help: links to the Wiki online help, the manual in PDF, and some useful FAQs
Contact: credits and e-mail addresses of the developers



Input File

SeqFIRE consists of two modules: an indel region module and a conserved region module. Both modules require a protein alignment in FASTA format as input, so unaligned protein sequences must be aligned first. On the Help tab, we provide links to some good alignment programs, such as Muscle, MAFFT, ProbCons, and Prank. These are all easy-to-use, advanced iterative alignment programs that give good-quality alignments even for quite divergent sequences. Output from any other alignment program is also fine as long as it is in FASTA format.
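Before submitting a file, it can help to confirm that it really is an alignment, that is, that every sequence has the same length once gaps are counted. The following is a minimal sketch of such a check. It is not part of SeqFIRE, and the function names read_fasta and is_aligned are chosen here for illustration only.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def is_aligned(records):
    """True if there is at least one record and all sequences share one length."""
    lengths = set(len(seq) for _, seq in records)
    return len(lengths) == 1


# Hypothetical two-sequence alignment, used only to exercise the check.
example = ">seq1\nMKT-AYIAK\n>seq2\nMKTQAYI-K\n"
records = read_fasta(example)
print(is_aligned(records))
```

If the check fails, the file most likely contains unaligned sequences and should be run through an alignment program first.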



The Indel Region Module

This section describes how the SeqFIRE indel region module identifies and extracts indel regions based on default or user-defined criteria. These criteria are used to build consensus sequences, which are then used to define indel regions.

Input: Sequences can be input either by copy and paste into the input box or by uploading a file using the “Choose File” button. If you want to run SeqFIRE with our example, click the “Load example alignment” button, and the example protein alignment will appear in the input box.

Parameters: SeqFIRE provides the following adjustable parameters for identifying indel regions.

Amino acid conservation threshold: The percentage sequence similarity required for an alignment position to be included in the consensus. Default is 75% similarity.

Amino acid substitute group: Amino acid substitution matrices based on protein evolution. SeqFIRE provides six alternative matrices: PAM60, PAM250, BLOSUM40, BLOSUM62, BLOSUM80, and NONE. Default is NONE, in which case all amino acid differences are weighted equally.

Inter-indel space: The minimum number of consensus sites required between two indels, in order for them to be treated as separate indels. Default is three sites.

Detect partial sequence: Allows the program to search for indels in the N- and C-terminal ends of an alignment, in case some sequences are incomplete. Default is “yes” (allows).

Twilight treatment: For use with highly divergent sequences. This sets the similarity score cutoff to 30%. The twilight zone concept is based on the observation that proteins with sequence similarity as low as 30% can still have the same structure. Default is “no”.

Once all parameters are set, click the FIRE button to begin analysis.

Note: If no parameters are changed, the program still runs, using the default parameters.
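To make the conservation threshold and the inter-indel space more concrete, here is a rough sketch of how they could interact. It is one simplified reading of the descriptions above and not SeqFIRE's actual implementation. It assumes the NONE substitution group (residues either match or they do not), treats any column containing a gap as part of an indel, and uses function names invented for this example.

```python
from collections import Counter


def consensus_sites(seqs, threshold=0.75):
    """Flag columns whose most common residue reaches the threshold."""
    flags = []
    for column in zip(*seqs):
        residues = [c for c in column if c != "-"]
        if not residues:
            flags.append(False)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        flags.append(top_count / float(len(column)) >= threshold)
    return flags


def gap_runs(seqs):
    """Return (start, end) column ranges, end exclusive, that contain gaps."""
    runs, start = [], None
    for i, column in enumerate(zip(*seqs)):
        has_gap = "-" in column
        if has_gap and start is None:
            start = i
        elif not has_gap and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(seqs[0])))
    return runs


def merge_runs(runs, flags, min_space=3):
    """Merge neighbouring runs separated by fewer than min_space consensus sites."""
    merged = []
    for start, end in runs:
        if merged:
            prev_start, prev_end = merged[-1]
            if sum(flags[prev_end:start]) < min_space:
                merged[-1] = (prev_start, end)
                continue
        merged.append((start, end))
    return merged


# Hypothetical alignment of four sequences, 15 columns each.
seqs = ["MKTA--LLVVAG-KW", "MKTAQQLL-VAGSKW",
        "MKTA--LLVVAGSKW", "MKTAQ-LLVVAGSKW"]
indels = merge_runs(gap_runs(seqs), consensus_sites(seqs))
print(indels)
```

In this example the first two gapped stretches are separated by only two consensus sites, so with the default inter-indel space of three they are reported as one indel spanning columns 4 to 8, while the gap at column 12 stays separate.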



Output from the Indel Region Module

This module provides several outputs: an indel list, an annotated alignment, and an indel matrix in NEXUS format. These can be viewed by scrolling down the page or by using the links in the top section of the output page. You can download any or all of these results by right-clicking the links and saving the linked files.
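For readers unfamiliar with NEXUS, the sketch below shows what a presence/absence (0/1) character matrix generally looks like in that format. The exact layout of the file SeqFIRE produces is not described in this help page, so treat this as a generic illustration; the function name to_nexus and the taxon names are made up.

```python
def to_nexus(matrix):
    """Format {taxon: '0101...'} as a NEXUS DATA block with 0/1 characters."""
    names = sorted(matrix)
    nchar = len(matrix[names[0]])
    width = max(len(name) for name in names)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        "  DIMENSIONS NTAX=%d NCHAR=%d;" % (len(names), nchar),
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
        "  MATRIX",
    ]
    for name in names:
        # Pad taxon names so the character columns line up.
        lines.append("  %s  %s" % (name.ljust(width), matrix[name]))
    lines.extend(["  ;", "END;"])
    return "\n".join(lines)


# Hypothetical matrix: one column per indel, 1 = insertion present.
nexus_text = to_nexus({"taxonA": "10", "taxonB": "11", "taxonC": "01"})
print(nexus_text)
```

Each row is a sequence and each column is one indel, which is the shape that phylogenetics programs reading NEXUS expect for discrete characters.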

The result page also shows the original protein alignment with the indel and conservation profiles in JalView.



The Conserved Block Module

The SeqFIRE conserved block module identifies and extracts conserved regions from a multiple protein sequence alignment. This can be done at various levels of stringency, depending on the parameters chosen. At low stringency, the program extracts conserved sequence blocks that can be used for phylogenetic analysis of highly divergent sequences or for deep taxon phylogeny. At higher stringency, the program can be used to identify highly or universally conserved motifs ("signature sequences"). These can be useful as diagnostics or for PCR primer design.

The conserved region module is accessed by clicking the “Conserved regions” link or the “Conserved blocks” tab at the top of the SeqFIRE homepage. As in the indel region module, an alignment in FASTA format can be entered either by copy and paste or by uploading a file with the “Choose File” button. There are five adjustable parameters for determining the conserved regions. The first two are identical to the indel region module parameters.

Amino acid conservation threshold: The percentage similarity used as the cutoff for a conserved position when calculating the consensus profile. Default is 75% similarity.

Amino acid substitute group: Amino acid substitution matrices used to calculate the conservation profile. The program provides six alternatives: PAM60, PAM250, BLOSUM40, BLOSUM62, BLOSUM80, and NONE. Default is NONE (no matrix used).

Minimum size of conserved block: The minimum number of adjacent alignment positions required for a conserved block. Default is three sites.

Maximum size of non-conserved block: Maximum number of non-conserved (below the similarity threshold) alignment positions allowed in a conserved block. Default is 15 sites.

Maximum percentage of gaps allowed in a conserved column: Some gapped positions in an alignment might be informative sites for phylogenetic reconstruction. This cutoff tells the program whether to keep or discard such gapped positions from the conserved block. Default is 40%.
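The way these five parameters combine can be sketched roughly as follows. This is only one plausible reading of the descriptions above, not the code SeqFIRE runs: it assumes the NONE substitution group, and it assumes that conserved runs of at least the minimum size are joined when the non-conserved stretch between them does not exceed the maximum. All function names are invented for the example.

```python
from collections import Counter


def conserved_flags(seqs, threshold=0.75, max_gap=0.40):
    """Flag columns that pass both the gap limit and the similarity threshold."""
    n = float(len(seqs))
    flags = []
    for column in zip(*seqs):
        residues = [c for c in column if c != "-"]
        if not residues or column.count("-") / n > max_gap:
            flags.append(False)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        flags.append(top_count / n >= threshold)
    return flags


def conserved_blocks(flags, min_block=3, max_nonconserved=15):
    """Return (start, end) blocks, end exclusive, built from conserved runs."""
    # Step 1: keep runs of conserved columns with at least min_block sites.
    runs, start = [], None
    for i, flag in enumerate(list(flags) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_block:
                runs.append((start, i))
            start = None
    # Step 2: join runs separated by at most max_nonconserved columns.
    blocks = []
    for start, end in runs:
        if blocks and start - blocks[-1][1] <= max_nonconserved:
            blocks[-1] = (blocks[-1][0], end)
        else:
            blocks.append((start, end))
    return blocks


# Hypothetical four-sequence alignment; the last column is 50% gaps.
seqs = ["MKT-", "MKT-", "MKTA", "MRTA"]
flags = conserved_flags(seqs)
print(conserved_blocks(flags))
```

Raising the conservation threshold or lowering the gap limit leaves fewer conserved columns, which is how the same module can move from broad blocks for phylogenetics to short signature motifs.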



Output from the Conserved Block Module

The general format of the conserved block output page is similar to the indel region output page. Results can be viewed by scrolling down the page or using the links at the top of the page. In addition, the alignment is shown with the conserved blocks indicated in JalView.




CHAPTER 2: Working with Multiple Datasets

This chapter is for users who want to run SeqFIRE on large amounts of data (several sequence alignments). We provide a batch mode for both the indel region and conserved block modules. To use the batch mode, you have to prepare the input files in a format that SeqFIRE can read. For this purpose we provide a small Python program called SeqFIREprep, which you can download from the SeqFIRE website. SeqFIREprep handles both the input and the output of SeqFIRE: it merges multiple input alignment files into a single large input file, and it can also separate the results into several output files after the analysis.

SeqFIREprep Installation

You need a Python interpreter in order to run SeqFIREprep. You can download one from the official Python website. SeqFIREprep works with Python versions 2.6 to 3.2.

Installing SeqFIREprep is very easy. After you have installed the Python interpreter, just copy SeqFIREprep into a folder that you can access. SeqFIREprep is then ready to use from the command line.



Preparation of the Input Data

Open a terminal or command prompt and move to the directory where SeqFIREprep was installed. Then type the following command.

>>> python seqfireprep.py

You will see the menu as shown below.

      SeqFIREprep version 1.0 beta
      ----------------------------

      1 making a batch file (merge multiple input files)
      2 splitting an output

      Please enter your choice (1 or 2):

Then type 1 and press Return. A new menu appears, as follows.

      Making a batch file
      -------------------
      +---------+
      | EXAMPLE |
      +---------+

      Mac OS X: /Users/sam/Documents/target/
      DOS : C:\Documents\target\

      Please ENTER the path of the target folder (see example):

Once you enter the target folder, SeqFIREprep reads all files in that folder and combines them into a single input file called “batch.fa”. In this file, each alignment begins with a header line containing the alignment's filename flanked by ‘==seq==’ and ‘==fire==’. The alignment itself follows, beginning on the next line.
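The batch layout described above can be illustrated with a short sketch. This is not the SeqFIREprep source. In particular, it assumes that "flanked" means the header line reads ==seq== followed directly by the filename and then ==fire==, with no spaces; the function name build_batch and the demo files are hypothetical.

```python
import os
import tempfile


def build_batch(folder):
    """Return batch text: a header line per file, then that file's alignment."""
    parts = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path) as handle:
            body = handle.read().strip()
        # Assumed header layout: ==seq==<filename>==fire==
        parts.append("==seq==%s==fire==\n%s\n" % (name, body))
    return "".join(parts)


# Demo with two tiny, made-up alignment files in a temporary folder.
folder = tempfile.mkdtemp()
for name, text in [("a.fa", ">x\nAC\n"), ("b.fa", ">y\nAG\n")]:
    with open(os.path.join(folder, name), "w") as handle:
        handle.write(text)

batch_text = build_batch(folder)
print(batch_text)
```

Because every file in the target folder is merged, it is sensible to keep only the alignments you want to analyse in that folder.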



Using the Batch Mode

To use the batch mode in the indel region or conserved block module, click the batch button that appears above the input box. This turns the batch mode on.

You can then upload the batch file or copy and paste its contents into the input box, and choose the parameters as in a normal analysis. The chosen parameters are applied to every alignment in the batch.



Separation of the Output

Open a terminal or command prompt and move to the directory where SeqFIREprep is installed. Then type the following command.

>>> python seqfireprep.py

You will see the menu of SeqFIREprep (same as in the previous section).

      SeqFIREprep version 1.0 beta
      ----------------------------

      1 making a batch file (merge multiple input files)
      2 splitting an output

      Please enter your choice (1 or 2):

Then type 2 and press Return. A new menu appears, as follows.

      splitting an output
      -------------------
      +---------+
      | EXAMPLE |
      +---------+

      Mac OS X: /Users/sam/Documents/target/outfile.txt
      DOS : C:\Documents\target\outfile.txt

      Please ENTER the batch file with path (see example):

After you enter the path of the result file (obtained from the SeqFIRE website), SeqFIREprep separates the result into several files and gives each output file the same name as the corresponding input file.
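The splitting step can be pictured with the sketch below. The layout of the SeqFIRE result file is not documented in this help page, so the sketch simply assumes that each result section starts with the same ==seq==filename==fire== header line used in the batch input. That assumption, the function name split_sections, and the sample text are all hypothetical.

```python
import re

# Assumed header layout: ==seq==<filename>==fire==
HEADER = re.compile(r"^==seq==(.+)==fire==\s*$")


def split_sections(text):
    """Return a list of (filename, content) pairs, one per header line."""
    sections = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            sections.append((match.group(1), []))
        elif sections:
            # Lines before the first header are ignored.
            sections[-1][1].append(line)
    return [(name, "\n".join(lines) + "\n") for name, lines in sections]


# Made-up result text with two sections.
sample = ("==seq==a.fa==fire==\nresult A\n"
          "==seq==b.fa==fire==\nresult B\nmore\n")
sections = split_sections(sample)
for name, content in sections:
    print("%s: %d characters" % (name, len(content)))
```

Each (filename, content) pair could then be written to its own file, which matches the behaviour described above of naming each output after its input alignment.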



© Copyright 2011 by SeqFIRE Development Team.