We are going to run a pre-made pipeline, called MIntO, to go from trimmed sequencing reads to fully annotated genomes.
Note: If you lose connection to the server and re-login, these following steps need to be repeated.
Activate the pre-installed conda virtual environment and define some shell environment variables:
source activate MIntO
export MINTO_DIR="$HOME/MIntO"
export PROJ_DIR="$HOME/tutorial"
Go to the folder where we already run some preparatory parts of MIntO:
cd $PROJ_DIR
cd metaG
Look around in the folder! Note the structure of the directories.
Next, define some Snakemake parameters, to use conda environments and limit resources:
SNAKE_PARAMS="--use-conda --restart-times 1 --keep-going --latency-wait 60 --jobs 10 --cores 16 --resources mem=58 --conda-prefix $MINTO_DIR/conda_env --shadow-prefix $PWD/.snakemake"
Copy metadata file from the MIntO repository’s tutorial folder:
cp $MINTO_DIR/tutorial/metadata/tutorial_metadata.txt .
snakemake $SNAKE_PARAMS --snakefile $MINTO_DIR/smk/assembly.smk --configfile assembly.yaml
snakemake $SNAKE_PARAMS --snakefile $MINTO_DIR/smk/binning_preparation.smk --configfile assembly.yaml
First, modify the configuration file, to switch off taxonomical annotation by the pipeline.
sed "s/RUN_TAXONOMY: yes/RUN_TAXONOMY: no/" mags_generation.yaml | grep -v "aae" > mags_generation.yaml.fixed
Now, we can run the binning step:
snakemake $SNAKE_PARAMS --snakefile $MINTO_DIR/smk/mags_generation.smk --configfile mags_generation.yaml.fixed
Like in the last step, we need to modify the standard configuration file for this step, mapping.yaml, to only do the annotation with KEGG:
sed -e "s|/mypath|$HOME|; s|/IBD_tutorial|/tutorial|; s|dbCAN|KEGG|; s|BWA_threads: 10|BWA_threads: 8|; s|BWA_memory: 45|BWA_memory: 10|" $MINTO_DIR/configuration/mapping.yaml | grep -v eggNOG > mapping.yaml
Thereafter, we can run gene_annotation.smk:
snakemake $SNAKE_PARAMS --snakefile $MINTO_DIR/smk/gene_annotation.smk --configfile mapping.yaml
Outputs will be created in $PROJ_DIR/DB
folder
Run gene abundance estimation, using the MAGs:
snakemake $SNAKE_PARAMS --snakefile $MINTO_DIR/smk/gene_abundance.smk --configfile mapping.yaml
Collect outputs from the metagenomics data:
cd $PROJ_DIR
egrep -v " (eggNOG|dbCAN|eCAMI|KEGG|PFAMs)" data_integration.yaml | sed "s/metaG_metaT/metaG/" > data_integration.yaml.fixed
snakemake $SNAKE_PARAMS --snakefile $MINTO_DIR/smk/data_integration.smk --configfile data_integration.yaml.fixed