Parsing annotation
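Parse the DIAMOND outputs with KronaTools (taxonomic annotation) and MEGAN tools (functional annotation). First, load the configuration file: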
source conf_cluster.conf
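The commands in this section rely on variables that conf_cluster.conf is expected to define: wd, DB, PRJ_NAME, GENE_PRED, COV, OUT and SAMPLE_LIST. Below is a minimal sketch of a hypothetical helper, check_vars, which is not part of the pipeline. It reports any of these variables that are unset or empty, so you can catch them before the longer steps start:

```shell
# Hypothetical helper (not part of the pipeline): report every named variable
# that is unset or empty. Returns 1 if at least one is missing, 0 otherwise.
check_vars() {
    missing=0
    for name in "$@"; do
        # POSIX-compatible indirect expansion: read the variable called $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run this right after sourcing the configuration: `check_vars wd DB PRJ_NAME GENE_PRED COV OUT SAMPLE_LIST || echo "check conf_cluster.conf"`.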
Create a link to the taxonomy database in the KronaTools taxonomy folder:
ln -s $DB/all.accession2taxid.sorted $wd/deps/KronaTools-2.7/taxonomy/
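Extract the taxonomic annotation from the BLAST-tabular output with ktClassifyBLAST. Then remove the header line and keep only the first two columns (gene ID and taxonomy ID):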
ktClassifyBLAST $GENE_PRED/diamond.prodigal.idba.$PRJ_NAME.m8 -o $GENE_PRED/id_gene2tax.$PRJ_NAME.tab
sed -i -e "1d" $GENE_PRED/id_gene2tax.$PRJ_NAME.tab
cut -f1,2 $GENE_PRED/id_gene2tax.$PRJ_NAME.tab > $GENE_PRED/temp.id_gene2tax.$PRJ_NAME.tab
mv $GENE_PRED/temp.id_gene2tax.$PRJ_NAME.tab $GENE_PRED/id_gene2tax.$PRJ_NAME.tab
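You can try the header removal and column selection above on a toy file. The sketch assumes only what the commands imply: a tab-separated table with one header line, gene IDs in column 1 and taxonomy IDs in column 2. The file name, header text and third column are invented for illustration. The sketch also combines both steps into one pass with tail and cut, which gives the same result as the sed/cut/mv sequence:

```shell
# Toy stand-in for id_gene2tax.$PRJ_NAME.tab (all contents are invented)
tmpdir=$(mktemp -d)
printf 'header_a\theader_b\theader_c\ngene_1\t562\t-50\ngene_2\t1280\t-32\n' > "$tmpdir/id_gene2tax.tab"

# tail -n +2 skips the first (header) line, like sed -e "1d";
# cut -f1,2 keeps the first two tab-separated columns
tail -n +2 "$tmpdir/id_gene2tax.tab" | cut -f1,2 > "$tmpdir/tmp.tab" \
    && mv "$tmpdir/tmp.tab" "$tmpdir/id_gene2tax.tab"

cat "$tmpdir/id_gene2tax.tab"
# gene_1<TAB>562
# gene_2<TAB>1280
```

Remove the scratch directory with `rm -rf "$tmpdir"` when you are done.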
“Meganize” the BLAST-tabular output, i.e. convert it into a MEGAN RMA file with SEED functional assignments:
$HOME/megan/tools/blast2rma --format BlastTab -i $GENE_PRED/diamond.prodigal.idba.$PRJ_NAME.m8 -a2seed $DB/acc2seed-May2015XX.abin -o $GENE_PRED/temp_tax_seed.rma
Extract the SEED functional assignment of each gene from the RMA file:
$HOME/megan/tools/rma2info -i $GENE_PRED/temp_tax_seed.rma -u false -r2c SEED -n true -v > $GENE_PRED/id_gene2seed.$PRJ_NAME.tab
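Before merging, it can help to check how many genes received both a taxonomic and a SEED assignment. Below is a minimal sketch with a hypothetical helper, count_shared_ids. It assumes both tables are tab-separated with the gene ID in the first column. That holds for id_gene2tax after the cut step; for the rma2info output it is an assumption:

```shell
# Hypothetical helper: print how many distinct IDs (column 1) occur in both
# tab-separated tables given as $1 and $2.
count_shared_ids() {
    ids_a=$(mktemp)
    ids_b=$(mktemp)
    # comm needs sorted input; LC_ALL=C keeps sort and comm on the same collation
    cut -f1 "$1" | LC_ALL=C sort -u > "$ids_a"
    cut -f1 "$2" | LC_ALL=C sort -u > "$ids_b"
    # -12 suppresses lines unique to either file, leaving only the shared IDs
    LC_ALL=C comm -12 "$ids_a" "$ids_b" | wc -l | tr -d ' '
    rm -f "$ids_a" "$ids_b"
}
```

Usage: `count_shared_ids $GENE_PRED/id_gene2tax.$PRJ_NAME.tab $GENE_PRED/id_gene2seed.$PRJ_NAME.tab`.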
Merge all the outputs into a single table
Prepare the Python environment:
module load autoload python/3.6.4
source $wd/env/bin/activate
Create the output directory:
mkdir -p $OUT
Move to the coverage folder and build the list of per-sample coverage files for the next step:
cd $wd/$PRJ_NAME/5_Coverages/
SAMPLES_COV=
for i in $SAMPLE_LIST; do
    echo "$i"
    A="$COV/coverage.$i"
    # prepend this sample's coverage file; the final list is in reverse $SAMPLE_LIST order
    SAMPLES_COV="$A $SAMPLES_COV"
done
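The loop above prepends each file. As a result, $SAMPLES_COV lists the coverage files in the reverse order of $SAMPLE_LIST, with a trailing space. This page does not say whether join.rev.py depends on that order. If the coverage columns should follow $SAMPLE_LIST, the minimal variant below appends instead. It uses a hypothetical helper, build_cov_list, and assumes, like the original loop, that each file is named coverage.<sample>:

```shell
# Hypothetical helper: print the coverage file of each sample, space-separated,
# in the same order as the samples are given. $1 is the coverage directory.
build_cov_list() {
    cov_dir=$1
    shift
    list=
    for s in "$@"; do
        # ${list:+$list } adds a separating space only once the list is non-empty
        list="${list:+$list }$cov_dir/coverage.$s"
    done
    printf '%s\n' "$list"
}
```

Usage: `SAMPLES_COV=$(build_cov_list "$COV" $SAMPLE_LIST)`. $SAMPLE_LIST is left unquoted on purpose, as in the original loop, so that it splits into one argument per sample. For example, `build_cov_list /data/cov S1 S2` prints `/data/cov/coverage.S1 /data/cov/coverage.S2`.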
cd $wd
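Run the “join.rev.py” script to merge the taxonomic table, the SEED table, the SEED subsystem table ($wd/unique.seed.subsys.tab) and the per-sample coverage files into $OUT/join_gene_id_seed.tab: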
python $wd/join.rev.py $GENE_PRED/id_gene2tax.$PRJ_NAME.tab $GENE_PRED/id_gene2seed.$PRJ_NAME.tab $wd/unique.seed.subsys.tab $SAMPLES_COV $OUT/join_gene_id_seed.tab
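join.rev.py itself is not shown in this section. As a rough illustration of a gene-level merge (not a replacement for the script), the standard join utility can combine the taxonomy and SEED tables on the gene ID. This assumes both tables are tab-separated with the gene ID in column 1. The sketch uses a hypothetical helper, merge_on_gene_id, and leaves out the SEED subsystem hierarchy and the coverage columns:

```shell
# Hypothetical helper: inner join of two tab-separated tables on column 1.
# join requires both inputs sorted on the join field with the same collation.
merge_on_gene_id() {
    tab=$(printf '\t')
    left=$(mktemp)
    right=$(mktemp)
    LC_ALL=C sort -t "$tab" -k1,1 "$1" > "$left"
    LC_ALL=C sort -t "$tab" -k1,1 "$2" > "$right"
    # only genes present in both tables are printed, with their columns concatenated
    LC_ALL=C join -t "$tab" "$left" "$right"
    rm -f "$left" "$right"
}
```

Usage: `merge_on_gene_id $GENE_PRED/id_gene2tax.$PRJ_NAME.tab $GENE_PRED/id_gene2seed.$PRJ_NAME.tab`. A gene with several SEED assignments appears once per assignment.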