2.4 Codes

PlanT2T is an open-source platform. The source code is available on GitHub, and you are free to access, download, and modify it. If you have any questions or suggestions, please contact us or submit an issue on GitHub.

Help you, help us

PlanT2T is a community-driven project. We will show you how to analyze the genome data step by step. You can also contribute to the project by submitting your code to GitHub. We will review and merge your code into the project. Let’s work together to make PlanT2T better!

2.4.1 Pipeline

The 0.runAllPipeline.sh script runs the entire pipeline. It takes one argument, the genome directory, which must contain metadata.txt.

bash 0.runAllPipeline.sh /Your/Genome/Pathway/

0.runAllPipeline.sh

#!/bin/bash
ulimit -s 10240000

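# Require the genome directory as the first argument
if [ -z "$1" ]; then
    echo "Usage: bash 0.runAllPipeline.sh <GenomePathway>"
    exit 1
fi
GenomePathway=$(realpath "$1")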

function CheckSoftware() {
    if command -v "$1" >/dev/null 2>&1; then
        sleep 0.05
    else
        echo -e "Error: ${2} cannot be found in the PATH environment variable."
        exit 1
    fi
}

CheckSoftware "faSize" "faSize"
CheckSoftware "gffread" "gffread"
CheckSoftware "STAR" "STAR"
CheckSoftware "rsem-prepare-reference" "rsem"
CheckSoftware "faToTwoBit" "faToTwoBit"
CheckSoftware "assembly-stats" "assembly-stats"
CheckSoftware "exec_annotation" "kofamscan"
CheckSoftware "busco" "busco"
CheckSoftware "Rscript" "Rscript"
CheckSoftware "R" "R"
CheckSoftware "python" "python"
CheckSoftware "perl" "perl"
CheckSoftware "quartet.py" "quarTeT"
CheckSoftware "taxonkit" "taxonkit"
CheckSoftware "makeblastdb" "makeblastdb"
CheckSoftware "samtools" "samtools"
CheckSoftware "mysql" "mysql"

cd "$GenomePathway" || { echo "Error: cannot enter ${GenomePathway}"; exit 1; }

# Show metadata.txt, hiding the server-side upload prefix
echo "Your form"
echo "----------------------------------"
sed 's#/public/workspace/biobigdata/project/Plant2t/UserUpload##g' metadata.txt

awk -F': ' '
BEGIN { OFS="\t" }
$1 ~ /^ID/ { id=$2 }
$1 ~ /^Formatted Name/ { formatted_name=$2 }
$1 ~ /^Formatted Species ID/ { FormattedSpeciesID=$2 }
END {
    print id, formatted_name, FormattedSpeciesID
}
' metadata.txt > namelist.txt 

nameList=namelist.txt

# -s is true only if the file exists and is not empty
if [ ! -s "${nameList}" ]; then
    echo "File not found or empty: ${nameList}"
    exit 1
fi

echo "------------------------------------------"
echo "ID    GeneNamePrefix FormattedSpeciesID" && cat "${nameList}"

# Auto-increment the step number
step=0
function StepCounter() {
    echo -e "------------------------------------------"
    echo -e "Step${step}: $1"
    echo -e "------------------------------------------"
    ((step++))
}

# Run one step script with the name list; stop the pipeline if it fails
function RunStep() {
    StepCounter "$1"
    if ! bash "$2" "$nameList"; then
        echo -e "Error: $1 failed."
        # rm -rf $GenomePathway
        exit 1
    fi
}

RunStep "00.genomeCheck" "/PlanT2T/00.genomeCheck.sh"
RunStep "01.makePEPCDS" "/PlanT2T/01.makePEPCDS.sh"
RunStep "02.renameGff" "/PlanT2T/02.renameGff.sh"
RunStep "03.teloExplorer" "/PlanT2T/03.teloExplorer.sh"
RunStep "04.rsemIndex" "/PlanT2T/04.rsemIndex.sh"
RunStep "05.genome2bit" "/PlanT2T/05.genome2bit.sh"
RunStep "06.assemblyStats" "/PlanT2T/06.assemblyStats.sh"
RunStep "07.tfIdent" "/PlanT2T/07.tfIdent.sh"
RunStep "08.runInterProScan" "/PlanT2T/08.runInterProScan.sh"
RunStep "09.runKoFamScan" "/PlanT2T/09.runKoFamScan.sh"
RunStep "10.runBUSCO" "/PlanT2T/10.runBUSCO.sh"
RunStep "11.orgDBmaker" "/PlanT2T/11.orgDBmaker.sh"
RunStep "12.centroMiner" "/PlanT2T/12.centroMiner.sh"
RunStep "13.ideogram" "/PlanT2T/13.ideogram.sh"
RunStep "14.getKEGG" "/PlanT2T/14.getKEGG.sh"
RunStep "15.cleanPEP" "/PlanT2T/15.cleanPEP.sh"
RunStep "16.pepStatic" "/PlanT2T/16.pepStatic.sh"
RunStep "17.taxonkitFinder" "/PlanT2T/17.taxonkitFinder.sh"
RunStep "18.genomeStats" "/PlanT2T/18.genomeStats.sh"
RunStep "19.protparamAnalysis" "/PlanT2T/19.protparamAnalysis.sh"
RunStep "20.SaveResultToMySQL" "/PlanT2T/20.SaveResultToMySQL.sh"
RunStep "21.makeBlastDB" "/PlanT2T/21.makeBlastDB.sh"
RunStep "22.JBrowse2" "/PlanT2T/22.JBrowse2.sh"
RunStep "23.downloadFile" "/PlanT2T/23.downloadFile.sh"

echo -e "All done!"
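The script above stops at the first missing program. Before a first run it can be convenient to list every missing dependency at once. The following is a minimal sketch under that assumption: the function name ListMissing is hypothetical and not part of the repository, and the tool list is copied from the CheckSoftware calls above.

```shell
#!/bin/bash
# Hypothetical helper (not part of PlanT2T): report every missing command
# instead of exiting at the first one.
ListMissing() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if nothing was missing
    [ "$missing" -eq 0 ]
}

# Command names taken from the CheckSoftware calls in 0.runAllPipeline.sh
ListMissing faSize gffread STAR rsem-prepare-reference faToTwoBit \
    assembly-stats exec_annotation busco Rscript R python perl \
    quartet.py taxonkit makeblastdb samtools mysql \
    || echo "Install the tools listed above before running the pipeline."
```

Because the helper returns a non-zero status when anything is missing, it can also be used as a guard in front of the pipeline call.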

metadata.txt (example)

The pipeline generates namelist.txt automatically from three fields in metadata.txt: ID, Formatted Name, and Formatted Species ID.

Latin Name: Oryza sativa
Common Name: Rice
Cultivar Name: NIP
Haploid Type: Haploid1
Formatted Name: Osa_NIP_H1
Formatted Species ID: Oryza sativa NIP Hap1
Ploidy: 2x
Chromosome Number: 1
BUSCO: 98.1
QV: 58.3
LAI: 25.2
Email: example@gmail.com
Author: Tom
Unit: PlanT2T
DOI: https://doi.org/
Message: This is a test genome and has no biological significance.
Genome Sequencing: Illumina + HiFi + ONT + Hi-C
Genome Survey: Jellyfish v2.3.1 + GenomeScope v2.0
Genome Assembly by HiFi: Hifiasm-0.24.0-r702
Genome Assembly by ONT: NextDenovo v2.5.0 + Canu v1.5
Organelle Genomes Assembly: Getorganelle v1.7.6
Genome Polish: Pilon v1.24 + Nextpolish v1.2.4
Hi-C Scaffolding: Juicer v1.5 + 3D-DNA + Juicebox v1.11.08
Gap Filling: TGS-GapCloser + LR_Gapcloser
Telomere Identification: Screening CCCTAAA by TRF v4.09
Centromere Identification: Screening CENH3 repeat
Polyploid Subgenome Phasing: SubPhaser v1.2
Repeat Annotation: LTR_FINDER v1.1 + RepeatModeler v2.0.1 + EDTA v1.9.3 + RepeatMasker v4.0.9
Gene Model Prediction: AUGUSTUS v3.2.3 + MAKER v2.31.9 + Trinity v2.13.2 + CD-HIT v4.6
Genome File: /public/workspace/biobigdata/project/Plant2t/UserUpload/73387/example.fa.gz
GFF3 File: /public/workspace/biobigdata/project/Plant2t/UserUpload/73387/example.gff3.gz
PNG File: /public/workspace/biobigdata/project/Plant2t/UserUpload/73387/example.png
ID: 73387
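To see what the pipeline derives from this file, the awk command from 0.runAllPipeline.sh can be run on a trimmed copy of the metadata. This is a minimal sketch: the scratch directory and the shortened metadata.txt are assumptions made for illustration, while the awk program is the one used in the script.

```shell
#!/bin/bash
# Work in a scratch directory so no real files are overwritten (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Trimmed copy of the example metadata; only three of these fields are read.
cat > metadata.txt <<'EOF'
Latin Name: Oryza sativa
Formatted Name: Osa_NIP_H1
Formatted Species ID: Oryza sativa NIP Hap1
DOI: https://doi.org/
ID: 73387
EOF

# Same extraction as in 0.runAllPipeline.sh: split each line on ": " and
# remember the ID, the formatted name, and the formatted species ID.
awk -F': ' '
BEGIN { OFS="\t" }
$1 ~ /^ID/ { id=$2 }
$1 ~ /^Formatted Name/ { formatted_name=$2 }
$1 ~ /^Formatted Species ID/ { FormattedSpeciesID=$2 }
END { print id, formatted_name, FormattedSpeciesID }
' metadata.txt > namelist.txt

cat namelist.txt
```

For the example above, namelist.txt contains a single tab-separated line with 73387, Osa_NIP_H1, and Oryza sativa NIP Hap1. The DOI line is not picked up as an ID because the pattern is anchored to the start of the field name.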

Each step in the pipeline is a separate script. You can find the scripts in the PlanT2T GitHub repository.
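RunStep calls each step script with the path to namelist.txt as its only argument and treats a non-zero exit status as a failure. A contributed step could therefore follow the skeleton below. This is a hypothetical sketch, not one of the repository scripts: the function name run_example_step, the variable names, and the messages are assumptions.

```shell
#!/bin/bash
# Hypothetical skeleton for a new step script (not taken from the repository).
# Assumption: like the existing steps, it receives the path to namelist.txt.

run_example_step() {
    local name_list="$1"

    # Fail early if the name list is missing or empty, as the driver script does.
    if [ ! -s "$name_list" ]; then
        echo "Error: name list not found or empty: $name_list" >&2
        return 1
    fi

    # namelist.txt holds one tab-separated line: ID, formatted name, species ID.
    local id prefix species
    IFS=$'\t' read -r id prefix species < "$name_list"

    echo "ID=${id} prefix=${prefix} species=${species}"
    # ... the actual analysis for this step would go here ...
}

# Demo on a throwaway name list so the sketch runs on its own.
demo_list=$(mktemp)
printf '73387\tOsa_NIP_H1\tOryza sativa NIP Hap1\n' > "$demo_list"
run_example_step "$demo_list"
```

In a real step script the last lines would be replaced by a call such as run_example_step "$1", so that a failure inside the function becomes the script's exit status and stops the pipeline.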