This guide covers the download, installation, and local use of EukRep 0.6.7 (West et al. 2018) for classifying metagenomic sequences as eukaryotic or prokaryotic.
A Linux-based system with Miniconda is required for this guide, and substantial computing power is suggested. Such a system may be available at your institution, or can be rented from an online cloud computing provider. Follow my guide on cloud-based VM setup here: https://scottc-bio.github.io/guides/Virtual-machines-for-bioinformatics.html
A basic understanding of Linux, such as creating and moving directories, is assumed for this guide.
Firstly, create a conda environment to perform the install.
conda create -y -n eukrep_env -c bioconda scikit-learn==0.19.2 eukrep
conda activate eukrep_env
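Then install EukRep with pip inside the activated environment. The conda command above already requests the eukrep package from bioconda, so this step mainly confirms that EukRep is present in the environment.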
pip install EukRep
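As a quick sanity check after the install, the sketch below reports whether a command is visible on the PATH of the current shell. The check_tool function is a hypothetical helper written for this guide, not part of conda or EukRep, and it only confirms that the executable can be found, not that it runs correctly.

```shell
# check_tool is a hypothetical helper (an assumption for illustration).
# It prints whether the named command can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# Run this with eukrep_env activated
check_tool conda
check_tool EukRep
```

If EukRep is reported as not found, check that eukrep_env is the active environment before repeating the install.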
EukRep can be run with a single command to isolate the eukaryotic sequences from a FASTA file.
EukRep -i input_contigs.fa -o euk_seqs_out.fa
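To see how many sequences were classified as eukaryotic, the FASTA header lines in the input and output files can be counted. This is a minimal sketch: count_seqs is a hypothetical helper written for this guide, and it assumes every record starts with a ">" header line, as in standard FASTA.

```shell
# count_seqs is a hypothetical helper (an assumption for illustration).
# It counts header lines, i.e. lines that begin with ">".
count_seqs() {
    # grep -c exits non-zero when the count is 0, so ignore its exit status
    grep -c '^>' "$1" || true
}

# File names match the EukRep command above; files that are absent are skipped
for f in input_contigs.fa euk_seqs_out.fa; do
    if [ -f "$f" ]; then
        echo "$f: $(count_seqs "$f") sequences"
    fi
done
```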
By adding the '--prokarya' argument, the prokaryotic sequences can also be isolated.
EukRep -i input_contigs.fa -o euk_seqs_out.fa --prokarya prok_seqs_out.fa
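If there are several assemblies to classify, the same command can be wrapped in a loop. The sketch below is one possible way to do this, not an official EukRep workflow: run_eukrep_batch, the assemblies and eukrep_out directory names, and the _euk.fa and _prok.fa suffixes are all assumptions made for illustration. Only the -i, -o and --prokarya arguments shown above are used.

```shell
# run_eukrep_batch is a hypothetical helper (an assumption for illustration).
# Usage: run_eukrep_batch <input_dir> <output_dir>
run_eukrep_batch() {
    indir=$1
    outdir=$2
    mkdir -p "$outdir"
    for fasta in "$indir"/*.fa; do
        # Skip the unexpanded pattern when no .fa files are present
        [ -e "$fasta" ] || continue
        base=$(basename "$fasta" .fa)
        echo "Classifying $fasta"
        EukRep -i "$fasta" \
               -o "$outdir/${base}_euk.fa" \
               --prokarya "$outdir/${base}_prok.fa"
    done
}

# Example call; "assemblies" and "eukrep_out" are placeholder directory names
if [ -d assemblies ]; then
    run_eukrep_batch assemblies eukrep_out
fi
```

Writing the results to a separate directory keeps the output files from being picked up as inputs if the loop is run again.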
The resulting .fa files contain the sequences predicted to be eukaryotic and prokaryotic, respectively. EukRep is an incredibly quick and easy tool to use.