For fungal community experiments, the ITS region is a popular amplicon target for metabarcoding. The PIPITS pipeline is specifically designed to process ITS1 or ITS2 region amplicon data for fungal metabarcoding experiments and is a very simple pipeline to both install and run.
I would strongly recommend visiting the PIPITS GitHub page (https://github.com/hsgweon/pipits) and reading the PIPITS paper: https://doi.org/10.1111/2041-210X.12399
The GitHub page is extremely helpful and explains each step in detail. It also notes that a powerful computing system, with at least 16 GB of memory, is required to run PIPITS. Here I summarise the steps for installing and running PIPITS on a high-performance computing (HPC) system or a VM.
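Before starting, it is worth confirming that the machine actually meets the memory requirement. A minimal check on a Linux HPC node or VM, reading from /proc/meminfo:

```shell
# PIPITS needs roughly 16 GB of RAM; check what the machine actually has.
# MemTotal in /proc/meminfo is reported in kB.
mem_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
echo "Total memory: $((mem_kb / 1024 / 1024)) GB"
```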
It is strongly advised that the quality of sequencing results is investigated prior to any amplicon sequence processing. I have a guide on how to use FastQC for assessing the quality of amplicon sequencing results here: https://scottc-bio.github.io/guides/Metabarcoding-quality-control-with-FastQC.html
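As a quick sketch, a basic FastQC run over all raw read files might look like the following. This assumes FastQC is installed and on your PATH, and that the fastq.gz files sit in a local rawdata/ directory (both are assumptions; see the linked guide for full details):

```shell
# Run FastQC over all raw reads, writing one HTML report per file to fastqc_out/
# (the rawdata/ path is a hypothetical example location for the raw files)
mkdir -p fastqc_out
if command -v fastqc >/dev/null 2>&1; then
    fastqc rawdata/*.fastq.gz -o fastqc_out || echo "No reads found in rawdata/"
else
    echo "fastqc not found; install it with: conda install -c bioconda fastqc"
fi
```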
Log in to your high-performance computing platform, or set up a cloud VM (https://scottc-bio.github.io/guides/Virtual-machines-for-bioinformatics.html).
Create a PIPITS conda environment.
conda create -n pipits_env --channel bioconda --channel conda-forge python=3.10 pipits hmmer
The environment is now ready.
If you are using a cloud computing service such as a VM, you will need to log out of the VM and transfer the raw read sequencing files (in fastq.gz format) to the VM using Windows PowerShell or the macOS/Linux terminal.
First, compress all raw read sequencing files into a tar.gz file. Note the backslashes used on the Windows command line compared with the forward slashes on the Linux command line of the VM.
tar -czvf rawdata.tar.gz -C "C:\path\to\directory\containing\raw\sequencing\files" *fastq.gz
Transfer the compressed rawdata file to the VM; you will be prompted for the VM password during the transfer.
scp "C:\path\to\directory\containing\compressed\file\rawdata.tar.gz" root@ipforyourvm:~
Then connect to the VM as before, and make a directory in which to perform the analysis, with a rawdata directory below it.
mkdir process
mkdir process/rawdata
Move the .tar.gz file into the new directory.
mv rawdata.tar.gz process
Move into the process/ directory and extract the compressed rawdata file into the rawdata directory.
cd process
tar -xvzf rawdata.tar.gz -C rawdata/
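A quick sanity check after extraction: count the fastq.gz files that landed in rawdata/ and confirm the number matches what you expect (paired-end data should give two files per sample).

```shell
# Count extracted read files; paired-end data should give 2 files per sample
n_files=$(ls rawdata/*.fastq.gz 2>/dev/null | wc -l)
echo "Found ${n_files} fastq.gz files in rawdata/"
```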
Activate the pipits environment prepared earlier.
conda activate pipits_env
Create a list of read pairs from the paired raw read files.
pispino_createreadpairslist -i rawdata/ -o readpairslist.txt
Prep the sequences for processing.
pispino_seqprep -i rawdata/ -o out_seqprep -l readpairslist.txt
The next step is to extract the ITS regions from the read data, and this is the most computationally intensive step of the PIPITS pipeline. Using multiple CPUs will massively speed up this process. The 'nohup' command will also be used to run the process in the background of the VM without relying on a connection from a local machine, i.e. you can log out of the VM and the process will continue to run. For context, processing 480 raw read files on a single CPU took 7 days, but with 6 CPUs it took less than 48 hours.
Extract the ITS regions utilising 6 CPUs and run in the background.
nohup pipits_funits -i out_seqprep/prepped.fasta -o out_funits -x ITS2 -t 6 > pipits_funits.log 2>&1 &
You can check that the process is running by searching the active processes for the name pipits_funits. You can also use the top command, where you should see vsearch or hmmer processes using the highest CPU allocations.
ps aux | grep pipits_funits
top
When you are confident that the process is running, you can log out of the VM and close PowerShell.
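Alternatively, if you would rather keep the session open and be told when the job finishes, a simple polling loop works. This is just a sketch; pgrep -f matches against the full command line of running processes:

```shell
# Poll until the background pipits_funits job is no longer running
while pgrep -f pipits_funits > /dev/null; do
    sleep 300   # re-check every 5 minutes
done
echo "pipits_funits is no longer running"
```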
Once complete, the final step of the processing pipeline is to process the ITS sequences into the output files. Again, this can be run in the background because it will take a few hours.
nohup pipits_process -i out_funits/ITS.fasta -o out_process > pipits_process.log 2>&1 &
When complete, you can use an additional command included in PIPITS that prepares an OTU table for FUNGuild analysis, which assigns functional classifications to OTUs.
pipits_funguild.py -i out_process/otu_table_sintax.txt -o out_process/otu_table_funguild.txt
The PIPITS pipeline for ITS amplicon raw read sequencing files is now complete. The directories and output files of interest are the following: