This guide covers downloading, installing, and running EukRep 0.6.7 (West et al. 2018) locally to classify metagenomic sequences as eukaryotic or prokaryotic.

This guide requires a Linux-based system with Miniconda, and access to substantial computing resources is recommended. Such a system may be available through your institution or can be rented from an online cloud computing provider. See my guide on setting up a cloud-based VM here: https://scottc-bio.github.io/guides/Virtual-machines-for-bioinformatics.html

A basic understanding of Linux, such as creating and moving between directories, is assumed.

1 Installing EukRep

First, create a conda environment containing EukRep, with scikit-learn pinned to version 0.19.2, and activate it.

conda create -y -n eukrep_env -c bioconda scikit-learn==0.19.2 eukrep
conda activate eukrep_env

Then install EukRep with pip inside the activated environment.

pip install EukRep
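Before classifying any data, it can be worth confirming that the environment is set up as intended. The shell function below is a minimal sketch and is not part of EukRep; the name check_eukrep_env is hypothetical. It assumes the environment has been activated and that 0.19.2 (the version pinned in the conda command above) is the scikit-learn version you want.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for eukrep_env (not part of EukRep).
# Returns 0 only if the EukRep command is on PATH and Python imports
# scikit-learn 0.19.2, the version pinned in the conda command above.
check_eukrep_env() {
    local expected_sklearn="0.19.2"
    local sklearn_version

    # Is the EukRep entry point visible in the active environment?
    if ! command -v EukRep >/dev/null 2>&1; then
        echo "EukRep not found on PATH; is eukrep_env activated?" >&2
        return 1
    fi

    # Ask Python which scikit-learn version it actually imports.
    sklearn_version=$(python -c 'import sklearn; print(sklearn.__version__)' 2>/dev/null) || {
        echo "scikit-learn could not be imported" >&2
        return 1
    }

    if [ "$sklearn_version" != "$expected_sklearn" ]; then
        echo "scikit-learn $sklearn_version found; $expected_sklearn expected" >&2
        return 1
    fi

    echo "EukRep found at $(command -v EukRep); scikit-learn $sklearn_version"
}
```

Run check_eukrep_env after conda activate eukrep_env; a non-zero exit status points to either a missing EukRep command or a mismatched scikit-learn version.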

2 Running EukRep

EukRep is quick and simple to run: a single command extracts the eukaryotic sequences from a FASTA file of contigs.

EukRep -i input_contigs.fa -o euk_seqs_out.fa

Adding the ‘--prokarya’ argument writes the prokaryotic sequences to a separate file as well.

EukRep -i input_contigs.fa -o euk_seqs_out.fa --prokarya prok_seqs_out.fa
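To see how the contigs were split, the number of records in each file can be compared. The function below is a minimal sketch that counts FASTA headers (lines starting with '>'); the name count_fasta_records is hypothetical and the function is not part of EukRep. The eukaryotic and prokaryotic counts need not add up to the input count if some sequences are left unclassified, so check the options described in the EukRep help text for any filtering it applies.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EukRep): print "<file><TAB><record count>"
# for each FASTA file given, counting lines that begin with '>'.
count_fasta_records() {
    local f n
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
        # grep -c prints 0 (and exits 1) when there are no matches; the count is still captured.
        n=$(grep -c '^>' "$f")
        printf '%s\t%s\n' "$f" "$n"
    done
}
```

For example, count_fasta_records input_contigs.fa euk_seqs_out.fa prok_seqs_out.fa prints one tab-separated line per file.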

The resulting .fa files contain the sequences predicted to be eukaryotic or prokaryotic, respectively. EukRep is an incredibly quick and easy tool for this task.
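Because each run is a single command, several assemblies can be processed in a loop. The sketch below assumes that all input files end in .fa and sit in one directory; the function name run_eukrep_batch and the <sample>_euk.fa / <sample>_prok.fa output names are hypothetical choices, not EukRep conventions. It uses only the -i, -o and ‘--prokarya’ arguments shown above.

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper: run EukRep on every .fa file in an input directory.
# Assumes assemblies end in ".fa"; adjust the glob for .fasta/.fna files.
run_eukrep_batch() {
    local in_dir=$1 out_dir=$2
    local fa sample

    mkdir -p "$out_dir" || return 1

    for fa in "$in_dir"/*.fa; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$fa" ] || continue
        sample=$(basename "$fa" .fa)
        echo "Classifying $sample" >&2
        # Eukaryotic and prokaryotic predictions go to separate files per sample.
        EukRep -i "$fa" \
            -o "$out_dir/${sample}_euk.fa" \
            --prokarya "$out_dir/${sample}_prok.fa" || {
            echo "EukRep failed on $fa" >&2
            return 1
        }
    done
}
```

For example, run_eukrep_batch assemblies eukrep_out would classify every .fa file in assemblies/ and stop at the first file EukRep fails on.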

References

West, Patrick T., Alexander J. Probst, Igor V. Grigoriev, Brian C. Thomas, and Jillian F. Banfield. 2018. “Genome-Reconstruction for Eukaryotes from Complex Natural Microbial Communities.” Genome Research 28 (4): 569–80. doi:10.1101/gr.228429.117.