This guide covers the download, installation, and local use of Prodigal 2.6.3 (Hyatt et al. 2010) for predicting ORFs in prokaryotic sequences.
This guide requires a Linux-based system with Miniconda installed, and a machine with substantial computing power is suggested. Such a system may be available at your institution, or can be rented from an online cloud computing provider. Follow my guide on cloud-based VM setup here: https://scottc-bio.github.io/guides/Virtual-machines-for-bioinformatics.html
A basic understanding of Linux, such as creating and moving directories, is assumed for this guide.
Create a conda environment with Prodigal installed, then activate it.
conda create -n prodigal_env -c conda-forge -c bioconda prodigal
conda activate prodigal_env
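Before running anything, it can be worth confirming that the executable is visible in the activated environment. The following is a minimal sketch, not part of the original workflow; the `status` variable is only an illustrative name, and the version banner is expected to report 2.6.3 only if that is the version conda resolved.

```shell
# Check that the prodigal executable is on PATH inside the activated environment
if command -v prodigal >/dev/null 2>&1; then
    prodigal -v    # prints the Prodigal version banner
    status="found"
else
    echo "prodigal not found: is prodigal_env activated?" >&2
    status="missing"
fi
echo "prodigal status: $status"
```

If the status is "missing", re-run `conda activate prodigal_env` and try again.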
If you have sequences belonging to a single genome, such as a genome assembly, then run Prodigal as follows:
prodigal -i contigs.fa -a proteins_out.fa -d genes_out.fa -f gff -o genes_out.gff
If you have shotgun metagenomic contigs, then simply add the ‘-p meta’ argument:
prodigal -i contigs.fa -a proteins_out.fa -d genes_out.fa -f gff -o genes_out.gff -p meta
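Both modes will provide the following outputs: proteins_out.fa (protein translations of the predicted genes), genes_out.fa (nucleotide sequences of the predicted genes), and genes_out.gff (gene coordinates in GFF format).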
Helpfully, the sequence headers in all outputs contain the name of the contig each gene was identified on, which is useful for downstream processing.
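As an illustration of using the contig name in the headers, the sketch below counts predicted genes per contig. It assumes the header ID is the contig name followed by an underscore and a gene number. The file example_proteins.fa, its contig names, and its sequences are made up for the example; in practice you would point the same commands at proteins_out.fa.

```shell
# Small made-up example of Prodigal-style protein headers (sequences are placeholders)
cat > example_proteins.fa <<'EOF'
>contig_1_1 # 3 # 98 # 1 # ID=1_1;partial=10
MKT*
>contig_1_2 # 150 # 410 # -1 # ID=1_2;partial=00
MAL*
>contig_2_1 # 20 # 331 # 1 # ID=2_1;partial=00
MSD*
EOF

# Take the ID of each header, strip the trailing _<gene number>, and count per contig
counts=$(grep '^>' example_proteins.fa \
    | awk '{ id = substr($1, 2); sub(/_[0-9]+$/, "", id); n[id]++ }
           END { for (c in n) print c "\t" n[c] }' \
    | sort)
echo "$counts"
```

Stripping only the final underscore and number keeps contig names that themselves contain underscores intact.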