Parsing annotation

Use KronaTools and MEGAN tools to parse the annotation outputs. First, source the configuration file:

source conf_cluster.conf

Create a link to the taxonomy database:

ln -s $DB/all.accession2taxid.sorted $wd/deps/KronaTools-2.7/taxonomy/
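
A dangling link may only surface later, when the taxonomy file is actually read. The sketch below shows one way to check that a symbolic link resolves. It works in a throwaway directory, and every path in it is made up for illustration rather than taken from the pipeline.

```shell
# Toy demonstration in a temporary directory; all paths are hypothetical.
tmpd=$(mktemp -d)
touch "$tmpd/all.accession2taxid.sorted"
mkdir -p "$tmpd/taxonomy"
ln -s "$tmpd/all.accession2taxid.sorted" "$tmpd/taxonomy/"

link="$tmpd/taxonomy/all.accession2taxid.sorted"
# -L is true if the path is a symlink; -e follows the link and is
# true only if the target exists.
if [ -L "$link" ] && [ -e "$link" ]; then
    status=ok
else
    status=broken
fi
echo "$status"
rm -rf "$tmpd"
```

The same two tests can be pointed at the link created under $wd/deps/KronaTools-2.7/taxonomy/.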

Extract the taxonomic annotation from the BLAST output:

ktClassifyBLAST $GENE_PRED/diamond.prodigal.idba.$PRJ_NAME.m8 -o $GENE_PRED/id_gene2tax.$PRJ_NAME.tab

sed -i -e "1d" $GENE_PRED/id_gene2tax.$PRJ_NAME.tab
cut -f1,2 $GENE_PRED/id_gene2tax.$PRJ_NAME.tab > $GENE_PRED/temp.id_gene2tax.$PRJ_NAME.tab
mv $GENE_PRED/temp.id_gene2tax.$PRJ_NAME.tab $GENE_PRED/id_gene2tax.$PRJ_NAME.tab
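
The three commands above drop the header line and keep only the first two columns. As a minimal sketch of that transformation, the example below applies the same steps to a toy table. The header text and the values are assumptions made for illustration and are not real ktClassifyBLAST output.

```shell
# Toy input with a header and three tab-separated columns (made-up data).
tmp=$(mktemp)
printf '#queryID\ttaxID\tAvg. log e-value\n' > "$tmp"
printf 'gene_1\t562\t-50.2\n' >> "$tmp"
printf 'gene_2\t1280\t-31.7\n' >> "$tmp"

# Skip the first line (like sed "1d"), then keep columns 1 and 2
# (like cut -f1,2).
result=$(tail -n +2 "$tmp" | cut -f1,2)
printf '%s\n' "$result"
rm -f "$tmp"
```

What remains is a two-column table with one row per gene: the gene identifier and its taxonomy identifier.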

“Meganize” the BLAST output:

$HOME/megan/tools/blast2rma --format BlastTab -i $GENE_PRED/diamond.*.$PRJ_NAME.m8 -a2seed $DB/acc2seed-May2015XX.abin -o $GENE_PRED/temp_tax_seed.rma

Extract the functional information from the RMA file:

$HOME/megan/tools/rma2info -i $GENE_PRED/temp_tax_seed.rma -u false -r2c SEED -n true -v > $GENE_PRED/id_gene2seed.$PRJ_NAME.tab

Merge all the outputs into a single table

Prepare the Python environment:

module load autoload python/3.6.4
source $wd/env/bin/activate

Create the output directory:

mkdir -p $OUT

Move to the coverage folder and build the list of coverage files for the next steps:

cd $wd/$PRJ_NAME/5_Coverages/
SAMPLES_COV=
for i in $SAMPLE_LIST ; do
        echo $i;
        A="$COV/coverage.$i" ;
        SAMPLES_COV="$A $SAMPLES_COV";
done
cd $wd
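
To see what the loop leaves in SAMPLES_COV, the sketch below runs the same logic on hypothetical values. The sample names and the COV path are assumptions for illustration; in the pipeline they are presumably set by conf_cluster.conf.

```shell
# Hypothetical values, for illustration only.
SAMPLE_LIST="S1 S2 S3"
COV="/path/to/5_Coverages"

SAMPLES_COV=
for i in $SAMPLE_LIST; do
    # Each new path is prepended to the list built so far.
    SAMPLES_COV="$COV/coverage.$i $SAMPLES_COV"
done
echo "$SAMPLES_COV"
```

Because each path is prepended, the coverage files end up in the reverse order of SAMPLE_LIST (S3, S2, S1 here), followed by a trailing space. The variable must stay unquoted when passed to a command so that the shell splits it into one argument per file.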

Run the “join.rev.py” script to merge the outputs:

python $wd/join.rev.py $GENE_PRED/id_gene2tax.$PRJ_NAME.tab $GENE_PRED/id_gene2seed.$PRJ_NAME.tab $wd/unique.seed.subsys.tab  $SAMPLES_COV $OUT/join_gene_id_seed.tab
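
The merge reads several files produced by earlier steps, so it can help to confirm that they exist and are non-empty before running it. The helper below is a hypothetical sketch of such a check. The name check_inputs is invented for illustration and is not part of the pipeline or of join.rev.py.

```shell
# check_inputs: hypothetical helper, not part of the pipeline.
# Returns 0 if every argument is an existing, non-empty file,
# and 1 otherwise, reporting each problem file on stderr.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

One possible use is to call check_inputs with the two id_gene2*.tab files, $wd/unique.seed.subsys.tab, and the unquoted $SAMPLES_COV, and to run the python command only if it succeeds.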