This guide covers the download, installation, and local use of the SignalP 6.0 tool (Teufel et al. 2022) for predicting signal peptides in amino acid sequences.
A Linux-based system with Miniconda is required for this guide, and substantial computing power is recommended. Such a system may be available through your institution or can be purchased from an online cloud computing provider. Follow my guide on cloud-based VM setup here: https://scottc-bio.github.io/guides/Virtual-machines-for-bioinformatics.html
A basic understanding of Linux (e.g. creating and moving directories) is assumed for this guide.
You will need the compressed ‘.tar.gz’ files for the fast and slow modes of SignalP 6.0, which must be obtained from DTU by filling out the forms in the Downloads section: https://services.healthtech.dtu.dk/services/SignalP-6.0/
DTU will then email you a download link for each one. The easiest approach is to move to the directory on your Linux system where you want to run the SignalP prediction, e.g. the directory containing your amino acid sequence FASTA file.
Then use ‘wget’ to start the downloads:
wget url/of/download/page/sent/to/you/signalp-6.0i.fast.tar.gz
wget url/of/download/page/sent/to/you/signalp-6.0i.slow_sequential.tar.gz
Each download took around 30 minutes for me.
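Because the archives are large and slow to download, an interrupted transfer is easy to miss. The following is a minimal sketch of an integrity check, assuming GNU tar and gzip are available; ‘check_archive’ is a hypothetical helper name. It lists the archive contents without extracting anything, which fails if the file is truncated or is not a gzip-compressed tar archive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the file is a readable .tar.gz archive.
# Listing the contents (-t) forces tar/gzip to read the whole file,
# so a truncated or corrupted download gives a non-zero exit status.
check_archive() {
  local archive="$1"
  if tar -tzf "$archive" > /dev/null 2>&1; then
    echo "OK: $archive"
  else
    echo "Problem reading $archive; consider downloading it again" >&2
    return 1
  fi
}

# Usage:
# check_archive signalp-6.0i.fast.tar.gz
# check_archive signalp-6.0i.slow_sequential.tar.gz
```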
Extract both archives:
tar -xzvf signalp-6.0i.fast.tar.gz
tar -xzvf signalp-6.0i.slow_sequential.tar.gz
This will produce a new sub-directory structure for each mode, e.g. ‘signalp6_fast/signalp-6-package/’ and ‘signalp6_slow_sequential/signalp-6-package/’.
Each of these contains the files needed to install SignalP, along with the model(s) for its mode. The ‘signalp6_fast/’ directory contains the smaller model used in ‘fast’ mode, which is the most appropriate choice if you need to screen many sequences at once. The ‘signalp6_slow_sequential/’ directory contains the full model, which is used by both ‘slow’ mode, requiring 14 GB of RAM, and ‘slow-sequential’ mode, which runs the full model sequentially to reduce RAM usage but takes 6 times longer. ‘fast’ is the default if no mode is specified and is generally fine for multiple sequences; if you want to probe a particular sequence in more detail, you can use one of the slower modes.
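To decide between the two full-model modes on a given machine, the figure of roughly 14 GB of RAM for ‘slow’ mode can be compared with the memory the system reports. This is a minimal sketch, assuming Linux (where /proc/meminfo reports MemTotal in kB) and treating 14 GB as 14 GiB; ‘full_model_mode’ is a hypothetical helper name, and since the threshold is approximate, a machine close to the limit may still run out of memory.

```shell
#!/usr/bin/env bash
# Approximate RAM needed for 'slow' mode (14 GB, treated as 14 GiB),
# expressed in kB to match the units of /proc/meminfo.
SLOW_MODE_MIN_KB=$((14 * 1024 * 1024))

# Hypothetical helper: given total memory in kB, print which full-model
# mode is likely to fit in memory.
full_model_mode() {
  local mem_kb="$1"
  if [ "$mem_kb" -ge "$SLOW_MODE_MIN_KB" ]; then
    echo "slow"
  else
    echo "slow-sequential"
  fi
}

# Usage on Linux, reading MemTotal (in kB) from /proc/meminfo:
# full_model_mode "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```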
Regardless of which mode you plan to use, the package itself only needs to be installed once, from either directory.
First, create a conda environment for the installation:
conda create -n signalp6 python=3.10 -y
conda activate signalp6
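Before running pip, it can be worth confirming that the new environment really is the active one, so that the package does not end up in the base environment. This is a minimal sketch relying on the CONDA_DEFAULT_ENV variable, which conda sets to the environment's name on activation by name; ‘require_env’ is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the named conda environment is active.
# 'conda activate <name>' sets CONDA_DEFAULT_ENV to that name.
require_env() {
  local expected="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
    echo "Active conda environment: $expected"
  else
    echo "Expected '$expected' but the active environment is '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
}

# Usage, after 'conda activate signalp6':
# require_env signalp6
```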
Move into the ‘fast’ package directory, which contains the installation files:
cd signalp6_fast/signalp-6-package/
Use pip to install.
pip install .
Then pin NumPy to a version below 2, as suggested in the GitHub installation instructions (https://github.com/fteufel/signalp-6.0/blob/main/installation_instructions.md):
pip install "numpy<2"
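To confirm that the pin took effect, the installed NumPy version can be checked. This is a minimal sketch; ‘numpy_below_2’ is a hypothetical helper name that only compares the major version number of a version string, and the usage line assumes ‘python’ is the interpreter of the activated signalp6 environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a version string such as "1.26.4"
# has a major version number below 2.
numpy_below_2() {
  local version="$1"
  local major="${version%%.*}"   # text before the first dot
  [ "$major" -lt 2 ]
}

# Usage inside the activated signalp6 environment:
# numpy_below_2 "$(python -c 'import numpy; print(numpy.__version__)')" && echo "NumPy OK"
```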
Then return to the parent directory.
cd ../../
An amino acid sequence FASTA file can now be used as input for SignalP, for example in ‘fast’ mode:
signalp6 --fastafile example.fasta --output_dir signalp_out --organism euk --format txt --mode fast --model_dir signalp6_fast/signalp-6-package/models/
For ‘slow’ or ‘slow-sequential’ mode, change the --mode argument accordingly and point the --model_dir argument to the ‘signalp6_slow_sequential/signalp-6-package/models/’ directory.
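Since each mode needs a matching --model_dir, the pairing can be kept in one place so the two arguments never disagree. This is a minimal sketch, assuming both archives were extracted into the current directory with the directory layout produced by the tar commands in this guide; ‘signalp_model_dir’ is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the --model_dir matching a SignalP 6.0 mode,
# assuming both archives were extracted into the current directory.
signalp_model_dir() {
  case "$1" in
    fast)
      echo "signalp6_fast/signalp-6-package/models/" ;;
    slow|slow-sequential)
      echo "signalp6_slow_sequential/signalp-6-package/models/" ;;
    *)
      echo "Unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Usage (same arguments as the 'fast' example, with the mode switched):
# MODE=slow-sequential
# signalp6 --fastafile example.fasta --output_dir signalp_out --organism euk \
#   --format txt --mode "$MODE" --model_dir "$(signalp_model_dir "$MODE")"
```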
For non-eukaryotic sequences, set the --organism argument to ‘other’; this covers Gram-negative and Gram-positive bacteria and archaea.
Additional optional arguments are explained on the GitHub page: https://github.com/fteufel/signalp-6.0/blob/main/installation_instructions.md
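Once a run has finished, a quick sanity check is to compare the number of sequences in the input FASTA file with the number of _plot.txt files written, assuming (as noted in this guide) that one plot file is produced per sequence with --format txt. This is a minimal sketch; ‘count_fasta_records’ and ‘count_plot_files’ are hypothetical helper names, the check has to run before the plot files are archived and removed, and a mismatch is a prompt to look closer rather than proof of a failed run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records (header lines starting with '>').
count_fasta_records() {
  grep -c '^>' "$1"
}

# Hypothetical helper: count per-sequence plot files in an output directory.
count_plot_files() {
  find "$1" -maxdepth 1 -type f -name '*_plot.txt' | wc -l
}

# Usage, before compressing and removing the plot files:
# n_seq=$(count_fasta_records example.fasta)
# n_plot=$(count_plot_files signalp_out)
# [ "$n_seq" -eq "$n_plot" ] && echo "All $n_seq sequences have a plot file"
```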
Once the run is complete, the output directory will contain the prediction results along with an individual _plot.txt file for each sequence.
These individual _plot.txt files clutter the directory, so I like to store them by compressing them into a .tar.gz file and then removing the originals:
tar -czf signalp_out/signalp_plots.tar.gz signalp_out/*_plot.txt
rm signalp_out/*_plot.txt
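With very large input files, the *_plot.txt wildcard can expand to more arguments than a single command is allowed to receive, in which case tar or rm fails with ‘Argument list too long’. This is a minimal sketch of an alternative, assuming GNU tar and GNU find: file names are streamed to tar through a pipe rather than the command line, and the originals are deleted only if tar succeeded. ‘archive_plots’ is a hypothetical helper name; the archive name matches the one used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compress all *_plot.txt files in a SignalP output
# directory into signalp_plots.tar.gz, then delete the originals.
# Names are passed NUL-separated through a pipe (find -print0 | tar --null -T -),
# so the command line never has to hold the full list of files.
archive_plots() {
  local dir="$1"
  local archive="$dir/signalp_plots.tar.gz"
  find "$dir" -maxdepth 1 -type f -name '*_plot.txt' -print0 \
    | tar -czf "$archive" --null -T - \
    && find "$dir" -maxdepth 1 -type f -name '*_plot.txt' -delete
}

# Usage:
# archive_plots signalp_out
```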
The SignalP prediction is now complete.