This guide covers the download, installation, and local use of Prodigal 2.6.3 (Hyatt et al. 2010) for predicting open reading frames (ORFs) in prokaryotic sequences.

A Linux-based system with Miniconda is required for this guide, and substantial computing power is suggested. Such a system may be available at your institution or can be purchased through an online cloud computing provider. Follow my guide on cloud-based VM setup here: https://scottc-bio.github.io/guides/Virtual-machines-for-bioinformatics.html

A basic understanding of Linux, such as creating and moving directories, is assumed for this guide.

1 Installing Prodigal

Create a conda environment with Prodigal installed, then activate it.

conda create -n prodigal_env -c conda-forge -c bioconda prodigal
conda activate prodigal_env
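
Before moving on, it can help to confirm that the executable is visible in the active environment. The check below is a minimal sketch: it assumes only that the conda package provides an executable named prodigal, and it uses the -v flag to print the version. The variable name prodigal_status is introduced here for illustration.

```shell
# Check that the prodigal executable is on the PATH of the active environment.
if command -v prodigal >/dev/null 2>&1; then
    # -v prints the Prodigal version and exits.
    prodigal -v
    prodigal_status="found"
else
    echo "prodigal not found; is prodigal_env activated?" >&2
    prodigal_status="missing"
fi
```

If the message about prodigal not being found appears, re-run the conda activate command above and try again.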

2 Running Prodigal

If your sequences belong to a single genome, such as a genome assembly, run Prodigal as follows:

prodigal -i contigs.fa -a proteins_out.fa -d genes_out.fa -f gff -o genes_out.gff

If you have shotgun metagenomic contigs, add the ‘-p meta’ argument:

prodigal -i contigs.fa -a proteins_out.fa -d genes_out.fa -f gff -o genes_out.gff -p meta

Both modes will provide the following outputs:

  • proteins_out.fa - The amino acid sequences of predicted ORFs
  • genes_out.fa - The nucleotide sequences of predicted ORFs
  • genes_out.gff - Gene coordinates in GFF3 format for predicted ORFs
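
A quick sanity check after a run is to confirm that the three outputs describe the same number of ORFs, since each predicted ORF should appear once in each file. The helpers below are a sketch rather than part of Prodigal: count_records and count_gff_features are hypothetical function names, and the GFF count assumes that every line not starting with '#' is one predicted feature.

```shell
# count_records (hypothetical helper): print the number of FASTA records
# in a file by counting header lines, which begin with '>'.
count_records() {
    grep -c '^>' "$1"
}

# count_gff_features (hypothetical helper): print the number of feature
# lines in a GFF file, i.e. non-empty lines that do not start with '#'.
count_gff_features() {
    grep -c '^[^#]' "$1"
}
```

For example, running count_records on proteins_out.fa and genes_out.fa, and count_gff_features on genes_out.gff, should print the same number three times.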

Conveniently, every output records the name of the contig on which each ORF was predicted (in the sequence headers of the FASTA files and in the first column of the GFF file), which is useful for downstream processing.
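
As an illustration of using the contig names downstream, the sketch below builds a two-column table linking each ORF to its contig. The function name orf_to_contig is hypothetical, and the code assumes headers of the form ">contigname_N # start # end # strand # ...", where N is the running number of the gene on that contig; check a few headers in your own output before relying on it.

```shell
# orf_to_contig (hypothetical helper): print "ORF_ID<TAB>contig" for each
# record in a Prodigal FASTA output.
# Assumes the ORF ID is the contig name followed by "_<gene number>".
orf_to_contig() {
    awk '/^>/ {
        id = substr($1, 2)          # first field without the leading ">"
        contig = id
        sub(/_[0-9]+$/, "", contig) # drop the trailing gene number
        print id "\t" contig
    }' "$1"
}
```

For example, orf_to_contig proteins_out.fa > orf_contig_map.tsv would write the table to a file (orf_contig_map.tsv is an arbitrary name chosen here).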

References

Hyatt, Doug, Gwo-Liang Chen, Philip F. LoCascio, Miriam L. Land, Frank W. Larimer, and Loren J. Hauser. 2010. “Prodigal: Prokaryotic Gene Recognition and Translation Initiation Site Identification.” BMC Bioinformatics 11 (1): 119. doi:10.1186/1471-2105-11-119.