This guide covers downloading, installing, and running the SignalP 6.0 tool (Teufel et al. 2022) locally to predict signal peptides in amino acid sequences.

A Linux-based system with Miniconda is required for this guide, and substantial computing resources are recommended. Your institution may already provide such a system; otherwise, one can be rented from an online cloud computing provider. Follow my guide on setting up a cloud-based VM here: https://scottc-bio.github.io/guides/Virtual-machines-for-bioinformatics.html

This guide assumes a basic understanding of Linux, such as creating directories and moving between them.

1 Downloading SignalP 6.0

You will need the compressed ‘.tar.gz’ files for the fast and slow modes of SignalP 6.0. These must be obtained from DTU by filling out the forms in the Downloads section: https://services.healthtech.dtu.dk/services/SignalP-6.0/

DTU will then email you a download link for each package. The easiest approach is to first move to the directory on your Linux system where you want to run the SignalP prediction, e.g. the directory containing your amino acid sequence FASTA file.

Then use ‘wget’ to start the downloads:

wget url/of/download/page/sent/to/you/signalp-6.0i.fast.tar.gz
wget url/of/download/page/sent/to/you/signalp-6.0i.slow_sequential.tar.gz

Each download took around 30 minutes for me.
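Given how long the downloads take, it is worth confirming that neither archive was truncated before extracting it. The helper below is not part of the SignalP instructions; it is a minimal sketch that asks tar to list each archive, which fails if the gzip stream is incomplete or corrupted. The function name check_archives is hypothetical.

```shell
# Hypothetical helper: test each downloaded archive before extracting it.
# Usage: check_archives file1.tar.gz [file2.tar.gz ...]
check_archives() {
  local archive status=0
  for archive in "$@"; do
    # Listing the archive forces a full decompression pass, so a truncated
    # or corrupted gzip stream makes tar exit with a non-zero status.
    if tar -tzf "$archive" >/dev/null 2>&1; then
      echo "OK: $archive"
    else
      echo "Corrupt or incomplete: $archive" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, run check_archives signalp-6.0i.fast.tar.gz signalp-6.0i.slow_sequential.tar.gz, and re-download any archive reported as incomplete.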

Extract both archives:

tar -xzvf signalp-6.0i.fast.tar.gz
tar -xzvf signalp-6.0i.slow_sequential.tar.gz

This will produce a new sub-directory structure for each mode, e.g. ‘signalp6_fast/signalp-6-package/’ and ‘signalp6_slow_sequential/signalp-6-package/’.
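A quick way to confirm that extraction worked is to check that both model directories referenced later in this guide exist. This is a sketch assuming the directory layout described above; the function name check_signalp_dirs is hypothetical.

```shell
# Hypothetical helper: confirm the model directories later passed to --model_dir exist.
# Paths follow the layout described in this guide; adjust them if your archives differ.
# Usage: check_signalp_dirs [base_dir]   # defaults to the current directory
check_signalp_dirs() {
  local base="${1:-.}" dir missing=0
  for dir in signalp6_fast/signalp-6-package/models \
             signalp6_slow_sequential/signalp-6-package/models; do
    if [ -d "$base/$dir" ]; then
      echo "found:   $base/$dir"
    else
      echo "missing: $base/$dir" >&2
      missing=1
    fi
  done
  return "$missing"
}
```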

Each of these contains the files needed to install SignalP, plus the model files for its respective mode(s). The ‘signalp6_fast/’ directory contains the smaller model used in ‘fast’ mode, which is the most appropriate choice when you need to screen many sequences at once. The ‘signalp6_slow_sequential/’ directory contains the full model, which supports two modes: ‘slow’ mode, which runs the full model and requires 14 GB of RAM, and ‘slow-sequential’ mode, which runs the full model sequentially to reduce RAM usage but takes around 6 times longer. If no mode is specified, ‘fast’ is used by default; this is generally fine for multiple sequences, and the slower modes are useful if you want to examine a particular sequence more closely.
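If you are unsure whether your machine can run ‘slow’ mode, the sketch below compares the available memory reported by /proc/meminfo (Linux only) against the 14 GB figure quoted above and suggests ‘slow’ or ‘slow-sequential’ accordingly. The threshold is only as accurate as that figure, so leave some headroom for other processes; suggest_full_mode is a hypothetical helper, not part of SignalP.

```shell
# Hypothetical helper: suggest 'slow' or 'slow-sequential' from available memory.
# Uses the ~14 GB requirement for 'slow' mode quoted in this guide (an approximation).
# Usage: suggest_full_mode [available_kB]   # defaults to MemAvailable from /proc/meminfo
suggest_full_mode() {
  local avail_kb need_kb
  avail_kb=${1:-$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)}
  need_kb=$((14 * 1024 * 1024))   # 14 GB expressed in kB, the unit /proc/meminfo uses
  if [ "$avail_kb" -ge "$need_kb" ]; then
    echo slow
  else
    echo slow-sequential
  fi
}
```

Calling suggest_full_mode with no argument reads the current machine's memory; passing a value in kB, e.g. suggest_full_mode 8000000, is handy for checking the logic.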

The package itself only needs to be installed once, from either directory.

2 Installing

First, create and activate a conda environment for the installation.

conda create -n signalp6 python=3.10 -y
conda activate signalp6

Move into the ‘fast’ directory where the install files are.

cd signalp6_fast/signalp-6-package/

Use pip to install.

pip install .

Then install a numpy version below 2, as suggested in the installation instructions on the SignalP 6.0 GitHub page (https://github.com/fteufel/signalp-6.0/blob/main/installation_instructions.md):

pip install "numpy<2"
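Before leaving the directory, you can check that the environment looks right. Assuming the signalp6 environment is still active, the sketch below tests whether a numpy version string has a major version below 2; numpy_below_2 is a hypothetical helper name, not part of SignalP or numpy.

```shell
# Hypothetical helper: succeed only if a numpy version string is below 2.
# Usage: numpy_below_2 "<version string>"
numpy_below_2() {
  local major
  major=${1%%.*}                # keep everything before the first dot, e.g. 1.26.4 -> 1
  case $major in
    ''|*[!0-9]*) return 1 ;;    # empty or non-numeric: treat as a failed check
  esac
  [ "$major" -lt 2 ]
}
```

For example, the first line below should print "numpy pin OK", and the second should print the path of the installed signalp6 command:

numpy_below_2 "$(python -c 'import numpy; print(numpy.__version__)')" && echo "numpy pin OK"
command -v signalp6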

Then return to the parent directory.

cd ../../

3 Running SignalP 6.0

An amino acid sequence FASTA file can now be used as input for SignalP, for example in ‘fast’ mode:

signalp6 --fastafile example.fasta --output_dir signalp_out --organism euk --format txt --mode fast --model_dir signalp6_fast/signalp-6-package/models/
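Since the choice between ‘fast’ and the slower modes depends partly on how many sequences you are screening, it can help to count them first. This sketch counts FASTA header lines (those starting with ‘>’) and fails if the file is unreadable or has no headers; fasta_summary is a hypothetical name.

```shell
# Hypothetical helper: count the sequences in a FASTA file via its '>' header lines.
# Usage: fasta_summary example.fasta
fasta_summary() {
  local n
  if [ ! -r "$1" ]; then
    echo "Cannot read $1" >&2
    return 1
  fi
  n=$(grep -c '^>' "$1")
  if [ "$n" -eq 0 ]; then
    echo "No FASTA headers found in $1" >&2
    return 1
  fi
  echo "$n"
}
```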

For ‘slow’ or ‘slow-sequential’ mode, change the --mode argument accordingly and point the --model_dir argument to the ‘signalp6_slow_sequential/signalp-6-package/models/’ directory.

For non-eukaryotic sequences, set the --organism argument to ‘other’. This covers Gram-negative and Gram-positive bacteria and archaea.
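If you switch between modes often, a small wrapper can keep --mode and --model_dir in step. The sketch below is hypothetical (run_signalp6 and DRY_RUN are illustrative names, not part of SignalP) and uses only the arguments shown in this guide, with the model directories assumed to sit in the current directory as set up above. Setting DRY_RUN=1 prints the command instead of running it.

```shell
# Hypothetical wrapper: keep --mode and --model_dir consistent, using this guide's layout.
# Usage: run_signalp6 <fasta> <output_dir> <organism> [fast|slow|slow-sequential]
# Set DRY_RUN=1 to print the command instead of running it.
run_signalp6() {
  if [ "$#" -lt 3 ]; then
    echo "usage: run_signalp6 <fasta> <output_dir> <organism> [mode]" >&2
    return 1
  fi
  local fasta="$1" outdir="$2" organism="$3" mode="${4:-fast}" model_dir
  case $mode in
    fast) model_dir=signalp6_fast/signalp-6-package/models/ ;;
    slow|slow-sequential) model_dir=signalp6_slow_sequential/signalp-6-package/models/ ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  # Rebuild the positional parameters as the full signalp6 command line.
  set -- signalp6 --fastafile "$fasta" --output_dir "$outdir" \
    --organism "$organism" --format txt --mode "$mode" --model_dir "$model_dir"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, run_signalp6 example.fasta signalp_out euk slow-sequential runs the full model sequentially, while DRY_RUN=1 run_signalp6 example.fasta signalp_out other only prints the ‘fast’ mode command for a non-eukaryotic input.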

Additional arguments are explained on the GitHub page: https://github.com/fteufel/signalp-6.0/blob/main/installation_instructions.md

4 The output

The output directory will contain the following files once complete:

  • prediction_results.txt - The main results file. For each sequence it gives the predicted signal type, a likelihood score for each signal type, and, if a signal peptide is predicted, the cleavage site position and its score.
  • processed_entries.fasta - A FASTA file of the sequences with predicted signal peptides, with the signal peptides removed.
  • output.json - A machine-readable version of all results.
  • output.gff3 - Annotations in GFF3 format for predicted signal peptides and cleavage sites.
  • *_plot.txt - For every sequence, a .txt file containing the data used to plot its predictions.
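To get a quick overview of a run, the predicted types in prediction_results.txt can be tallied with awk. This is a sketch assuming the file is tab-separated, marks its header lines with a leading ‘#’, and holds the predicted type in the second column; check these assumptions against your own output first. tally_predictions is a hypothetical helper name.

```shell
# Hypothetical helper: count how many sequences received each predicted type.
# Assumes tab-separated rows, '#'-prefixed header lines, and the prediction in column 2.
# Usage: tally_predictions signalp_out/prediction_results.txt
tally_predictions() {
  awk -F '\t' '!/^#/ && NF >= 2 { count[$2]++ }
    END { for (type in count) print type "\t" count[type] }' "$1" | sort
}
```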

The individual _plot.txt files clutter the output directory, so I like to compress them into a single .tar.gz file and then remove the originals. Chaining the two commands with ‘&&’ ensures the originals are only deleted if the archive was created successfully.

tar -czf signalp_out/signalp_plots.tar.gz signalp_out/*_plot.txt && \
rm signalp_out/*_plot.txt
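With very large inputs there can be enough plot files that the *_plot.txt glob exceeds the shell's argument-length limit, causing an ‘Argument list too long’ error. One workaround, sketched below under the assumption that GNU tar is available (for its --null and --remove-files options), streams the file names from find into tar, which deletes each file after adding it to the archive. archive_plot_files is a hypothetical helper name.

```shell
# Hypothetical helper: archive and delete per-sequence plot files without a long glob.
# Assumes GNU tar (for --null and --remove-files) and a find supporting -maxdepth/-print0.
# Usage: archive_plot_files [output_dir]   # defaults to signalp_out
archive_plot_files() {
  local outdir="${1:-signalp_out}"
  # find emits NUL-separated names; tar reads them from stdin (-T -) and,
  # because of --remove-files, deletes each file after adding it to the archive.
  find "$outdir" -maxdepth 1 -type f -name '*_plot.txt' -print0 |
    tar --null --remove-files -czf "$outdir/signalp_plots.tar.gz" -T -
}
```

For example, archive_plot_files signalp_out produces the same signalp_out/signalp_plots.tar.gz as the commands above.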

The SignalP run is now complete.

References

Teufel, Felix, José Juan Almagro Armenteros, Alexander Rosenberg Johansen, Magnús Halldór Gíslason, Silas Irby Pihl, Konstantinos D. Tsirigos, Ole Winther, Søren Brunak, Gunnar von Heijne, and Henrik Nielsen. 2022. “SignalP 6.0 Predicts All Five Types of Signal Peptides Using Protein Language Models.” Nature Biotechnology 40 (7). Nature Publishing Group: 1023–25. doi:10.1038/s41587-021-01156-3.