This guide covers running the Phobius 1.01 tool (Käll, Krogh, and Sonnhammer 2007) locally to predict transmembrane protein topology and signal peptides.

A Linux-based system with Miniconda is required for this guide, and a reasonably powerful machine is suggested. Your institution may already provide such a system, or you can rent one from an online cloud computing provider. Follow my guide on cloud-based VM setup here: https://scottc-bio.github.io/guides/Virtual-machines-for-bioinformatics.html

A basic understanding of Linux, such as creating and moving directories, is assumed for this guide.

1 Downloading the Phobius 1.01 tool

Like many protein prediction tools, Phobius is distributed under a licensing agreement and needs a manual download from: https://software.sbc.su.se/phobius.html

The archive cannot be fetched directly on the command line, so download it to your local machine first and then transfer it to your VM or computing cluster. File management tools like WinSCP are useful for this because they allow you to drag and drop files between your local machine and the remote system.

Alternatively, open Windows PowerShell and, before connecting to the VM or cluster, transfer the file with scp.

scp path\to\download\phobius101_linux.tgz user@ipforyourcluster:~

Notice the backslashes used on Windows for file paths. The path to the file depends on where you saved the download, and the address depends on your VM or cluster. If it is a VM you have set up yourself, the address will likely be ‘root@’ followed by the IP of the VM. If you are on a managed cluster, it will likely have the format ‘username@’ followed by the IP, using your own username.

This copies the file to your home directory on the remote system (that is what ‘~’ means), which is perfectly reasonable because Phobius will be installed into a conda environment anyway. If you want the download to be somewhere tidier, change the path after the address of your cluster, e.g. ‘:/path/to/desired/location’, or move the file after connecting.

2 Installing Phobius

Connect to the system.

ssh user@ipforyourcluster

You can now move the file somewhere else if you haven’t done this already.

mv phobius101_linux.tgz path/to/your/desired/location
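
Before going further it can be worth confirming that the archive survived the transfer. The following is a minimal sketch, not part of Phobius: ‘check_archive’ is a hypothetical helper name, and it only tests that tar can list the file as a gzip-compressed archive.

```shell
# Succeed only if the file exists and tar can list it as a gzip archive.
# check_archive is a hypothetical helper, not a Phobius or conda command.
check_archive() {
  archive="$1"
  [ -f "$archive" ] || { echo "missing: $archive" >&2; return 1; }
  tar -tzf "$archive" > /dev/null 2>&1 \
    || { echo "unreadable archive: $archive" >&2; return 1; }
  echo "ok: $archive"
}

# Example, run from the directory that holds the download:
#   check_archive phobius101_linux.tgz
```

A truncated or interrupted transfer normally fails this check, in which case repeating the scp step is the simplest fix.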

Now let’s set up a dedicated conda environment.

conda create -n phobius_env python=3.10 -y
conda activate phobius_env

Install the Phobius stub package from the ‘predector’ channel, which provides the tool needed to register the download.

conda install -c predector phobius -y

Register the downloaded file. If you moved the archive earlier, give its path here.

phobius-register phobius101_linux.tgz

Now let’s make the phobius command available whenever the environment is active by writing a small wrapper script. Paste the following code:

echo '#!/bin/bash
perl '"$CONDA_PREFIX"'/share/phobius-1.01-5/phobius.pl "$@"' \
  > $CONDA_PREFIX/bin/phobius

Then make the wrapper executable.

chmod +x $CONDA_PREFIX/bin/phobius
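
The wrapper and permission steps can also be rolled into one small function. This is a sketch under stated assumptions rather than the required method: ‘write_phobius_wrapper’ is a hypothetical name, and it assumes that registration placed ‘phobius.pl’ somewhere under the ‘share’ directory of the environment, so it searches for the script instead of hard-coding the versioned folder name.

```shell
# Write <prefix>/bin/phobius, a wrapper that forwards its arguments to phobius.pl.
# Assumption: phobius.pl lives somewhere under <prefix>/share after registration.
# write_phobius_wrapper is a hypothetical helper, not shipped with Phobius.
write_phobius_wrapper() {
  prefix="$1"
  script=$(find "$prefix/share" -type f -name phobius.pl 2>/dev/null | head -n 1)
  [ -n "$script" ] || { echo "phobius.pl not found under $prefix/share" >&2; return 1; }
  # $script is expanded now; \$@ is kept literal so the wrapper passes its own arguments.
  cat > "$prefix/bin/phobius" <<EOF
#!/bin/bash
exec perl "$script" "\$@"
EOF
  chmod +x "$prefix/bin/phobius"
  echo "$prefix/bin/phobius -> $script"
}

# Example, with the environment active:
#   write_phobius_wrapper "$CONDA_PREFIX"
```

Searching for the script means the function should still work if the folder suffix differs from the ‘phobius-1.01-5’ used above, and quoting the paths keeps it working if the prefix contains spaces.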

Now test by bringing up the ‘help’ documentation.

phobius -h

If this brings up the help documentation then Phobius has been installed correctly.
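
If the help text does not appear, checking what the shell actually resolves can narrow the problem down. A minimal sketch; ‘which_cmd’ is a hypothetical helper name and uses only the standard ‘command -v’ builtin.

```shell
# Print the path a command resolves to, or fail with a hint if it is not on PATH.
# which_cmd is a hypothetical helper, not part of Phobius or conda.
which_cmd() {
  command -v "$1" || {
    echo "$1 not found on PATH; is the conda environment active?" >&2
    return 1
  }
}

# Example:
#   which_cmd phobius
#   which_cmd perl
```

With the environment active, the path printed for phobius would be expected to sit inside the ‘bin’ directory of that environment.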

3 Running Phobius

Navigate to wherever your .fasta file of interest is and use the following line to run Phobius.

phobius example.fasta -short > phobius_out.txt
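
Counting the sequences in the input gives a number to compare the results against afterwards. A minimal sketch; ‘count_fasta_records’ is a hypothetical helper name, and it simply counts header lines beginning with ‘>’.

```shell
# Count FASTA records by counting header lines that start with '>'.
# count_fasta_records is a hypothetical helper, not part of Phobius.
count_fasta_records() {
  grep -c '^>' "$1"
}

# Example:
#   count_fasta_records example.fasta
```

Since the short output has one line per input sequence, a much smaller number of result lines than input records suggests that some sequences were skipped or that the run stopped early.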

This will produce a simple phobius_out.txt file where each line is an input sequence with the following columns:

  • Sequence ID - ID of sequence extracted from fasta headers
  • TM - Number of predicted transmembrane (TM) segments
  • SP - ‘0’ or ‘Y’ indicating detection of a signal peptide
  • Prediction - The topology of the sequence, with numbers indicating amino acid positions. Each number range is a transmembrane segment, and the letter on either side of it says where the adjoining loop lies: i means inside (cytoplasmic) and o means outside (non-cytoplasmic). When a signal peptide is predicted, the string begins with the positions of its hydrophobic h-region written between an n and a c (standing for the positively charged n-region and the c-region), followed by the cleavage site given as two positions separated by a slash.
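
The column layout makes the results easy to filter with awk. The following is a sketch under assumptions: ‘tm_ids’ and ‘sp_ids’ are hypothetical helper names, columns are taken to be whitespace-separated in the order listed above, sequence IDs are assumed to contain no spaces, and any line whose second column is not a whole number (such as a header line) is skipped.

```shell
# Print IDs of sequences with at least one predicted TM segment.
# Lines whose TM column is not a whole number (e.g. a header) are ignored.
tm_ids() {
  awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }' "$1"
}

# Print IDs of sequences with a predicted signal peptide (SP column is Y).
sp_ids() {
  awk '$2 ~ /^[0-9]+$/ && $3 == "Y" { print $1 }' "$1"
}

# Example:
#   tm_ids phobius_out.txt > tm_proteins.txt
#   sp_ids phobius_out.txt > sp_proteins.txt
```

Keeping the two filters separate makes it straightforward to combine them later, for example to find secreted proteins that have a signal peptide but no TM segment.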

For the longer-form output, omit the ‘-short’ argument.

References

Käll, Lukas, Anders Krogh, and Erik L. L. Sonnhammer. 2007. “Advantages of Combined Transmembrane Topology and Signal Peptide Prediction—the Phobius Web Server.” Nucleic Acids Research 35 (suppl_2): W429–32. doi:10.1093/nar/gkm256.