ChIP-Hub is an integrative, web-based Shiny application for exploring plant regulomes. It helps experimental biologists from various fields make comprehensive use of the available epigenomic information to gain new insights into their specific questions.
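a=$(bedtools intersect -a "$ds1" -b "$ds2" -u | wc -l)   # peaks in $ds1 that overlap at least one peak in $ds2
b=$(wc -l < "$ds1")                                      # total number of peaks in $ds1
rate=$(bc -l <<< "$a/$b")                                # overlap rate
out=${bed%.bed}                                          # strip the .bed suffix
cut -f11 "$bed" | awk '($1!=".")' | tr ',' '\n' | sort -u > "$out.gene"   # unique gene IDs from column 11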
This code may take a long time to run; you can download it and run it yourself.
computeMatrix scale-regions -p 2 --afterRegionStartLength %s --beforeRegionStartLength %s -S %s -R *.bed --skipZeros -o plot.mat.gz
plotHeatmap -m plot.mat.gz --heatmapHeight 11 --heatmapWidth %s --colorMap %s --samplesLabel %s --whatToShow 'heatmap and colorbar' --startLabel start --endLabel end --refPointLabel center -out heatmap.pdf
This code may take a long time to run; you can download it and run it yourself.
The downloadable data are organized in three levels.
The first level contains one folder for each of the 43 species; each folder holds the full information for that species:
ChIPHub_download/
├── aegilops_tauschii
├── arabidopsis_lyrata
├── arabidopsis_thaliana
…
├── triticum_urartu
├── vitis_vinifera
└── zea_mays
The second level contains the experiments for each species. Taking rice (Oryza sativa) as an example:
oryza_sativa/
├── DRP000207
├── ERP108685
├── ERP109752
├── CREs
├── Lastz
├── SRP005296
…
├── SRP300369
├── SRP303912
└── SRP308960
Note that there are two special folders named CREs and Lastz. These folders contain open-chromatin information and comparative genomics information for the corresponding species. Some species also have regulatory network information; the related data are stored in the corresponding species folder in a file named species.regulation.rds.
The last level corresponds to the peak information and signal information for each experiment. Taking DRP001345 as an example:
DRP001345/
├── hammock
└── signal
The data can now be accessed at https://biobigdata.nju.edu.cn/ChIPHub_download
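The three-level layout described above can be turned into download paths with a small helper. This is only a sketch: the base URL comes from the text, but the assumption that the web directory mirrors the folder tree exactly, and the function name `chiphub_url`, are illustrative and not confirmed by ChIP-Hub.

```shell
#!/usr/bin/env bash
# Base URL as given in the text. ASSUMPTION: the path below it mirrors
# the species/project/data-type folder tree shown above.
CHIPHUB_BASE="https://biobigdata.nju.edu.cn/ChIPHub_download"

# chiphub_url SPECIES [PROJECT [KIND]]   (hypothetical helper)
# Prints the URL of a species folder, a project folder, or a
# hammock/signal folder, depending on how many arguments are given.
chiphub_url() {
  local species="$1" project="${2:-}" kind="${3:-}"
  local url="$CHIPHUB_BASE/$species"
  if [ -n "$project" ]; then url="$url/$project"; fi
  if [ -n "$kind" ]; then url="$url/$kind"; fi
  printf '%s/\n' "$url"
}

# Example: print the assumed location of the peak (hammock) files
# of one rice project.
chiphub_url oryza_sativa DRP000207 hammock
```

The printed URL can then be passed to any download tool; whether the server allows recursive listing of these folders is not stated in the text.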
January 26, 2022, by Xinkai Zhou
ChIP-seq and complementary assays are powerful methods to measure protein-DNA binding events and chemical modifications of histone proteins at genome-wide level. These technologies have become widely used to study gene-regulatory programs in animals and plants. Accordingly, a tremendous amount of data have been generated by several large consortiums (such as the ENCODE consortium in human[1] and mouse[2], as well as the modENCODE consortium in fly[3] and nematode[4]) or various smaller projects (such as the fruitENCODE project in flowering plants[5]). Several databases[6-8] were recently established for visualization and efficient deployment of public ChIP-seq data by the research community. However, no comprehensive resource is available for plant research. Another major bottleneck in current plant research is the lack of a standardized routine for evaluation and analysis of ChIP-seq data. Therefore, the comparison of data generated by different laboratories is not straightforward, hampering data integration to generate novel hypotheses for further investigation.
To this end, we launched a project in mid-2015 to fully reanalyze and explore ChIP-seq datasets in plants. We recently evaluated our analytical framework through a systematic reanalysis of ~100 ChIP-seq datasets for a set of floral regulators, which provided a valuable resource for studying the regulatory circuits controlling floral organ development[9]. We then released the full reanalysis results to the public and developed an easy-to-use database and associated data-mining tools in a web-based platform called ChIP-Hub (https://biobigdata.nju.edu.cn).
References
[1] The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
[2] Yue, F. et al. A comparative encyclopedia of DNA elements in the mouse genome. Nature 515, 355–364 (2014).
[3] The modENCODE Consortium et al. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330, 1787–1797 (2010).
[4] Gerstein, M. B. et al. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330, 1775–1787 (2010).
[5] Lü, P. et al. Genome encode analyses reveal the basis of convergent evolution of fleshy fruit ripening. Nat. Plants 4, 784–791 (2018).
[6] Oki, S. et al. ChIP-Atlas: a data-mining suite powered by full integration of public ChIP-seq data. EMBO Rep. 19, e46255 (2018).
[7] Chèneby, J., Gheorghe, M., Artufel, M., Mathelier, A. & Ballester, B. ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding ChIP-seq experiments. Nucleic Acids Res. 46, D267–D275 (2018).
[8] Mei, S. et al. Cistrome Data Browser: a data portal for ChIP-Seq and chromatin accessibility data in human and mouse. Nucleic Acids Res. 45, D658–D662 (2017).
[9] Chen, D., Yan, W., Fu, L.-Y. & Kaufmann, K. Architecture of gene regulatory networks controlling flower development in Arabidopsis thaliana. Nat. Commun. 9, 4534 (2018).
Q: How can I contact the researchers?
A: Please email Dr. Dijun Chen at dijunchen@nju.edu.cn with any question about ChIP-Hub.
Q: How regularly is ChIP-Hub updated?
A: A routine to maintain and update ChIP-Hub has been established. According to our current plan, ChIP-Hub is updated monthly.
Q: How can I cite this work?
A: We are preparing a manuscript. At the current stage, please cite the website (Chen et al., biobigdata.nju.edu.cn).
Q: Could this platform be used to deposit data that are under consideration for publication?
A: Absolutely. If you would like to discuss this, please contact the main researcher, Dr. Dijun Chen, by email.
Q: Where can I get the data to use?
A: The metadata and result files can be downloaded by clicking the Download or Action buttons in the app. Once the paper is finalised, we would appreciate it if you cited it when using the data.
Q: Could the pipeline be applied to other species?
A: Absolutely. In principle, the whole pipeline can easily be adapted to any species with an available reference genome.
SPOT: signal portion of tags
FRiP: fraction of reads in peaks
NSC: normalized strand cross-correlation coefficient
RSC: relative strand cross-correlation coefficient
NRF: non-redundant fraction
PBC1: PCR bottlenecking coefficient 1
PBC2: PCR bottlenecking coefficient 2
QT: quality tags
RPKM: reads per kilobase per million mapped reads
CPM: counts per million mapped reads
BPM: bins per million mapped reads
RPGC: reads per genomic content normalized to 1x sequencing depth
MPR: mapping rate of reads
FUR: final used reads for peak calling
GC: GC content
We acknowledge the North-German Supercomputing Alliance (HLRN; https://www.hlrn.de/) and the Center for Information Technology and Media Management (ZIM) at Potsdam University (https://www.uni-potsdam.de/de/zim/angebote-loesungen/hpc.html) for providing high performance computing (HPC) resources.
We would like to thank all the data contributors who make this project possible.
Disclaimer: All original data were downloaded from public databases and reanalyzed automatically by our computational pipeline. You should evaluate the original papers integrated into ChIP-Hub before making any interpretation.
January 26, 2022, by Xinkai Zhou
ChIP-Hub provides dynamic guidelines through a Getting Started button at the top of the **Home** page, offering a quick start for exploring the features of ChIP-Hub. The nearby Samples and Experiments buttons link to the result page matching the last committed input, listed by samples or experiments, respectively.
ChIP-Hub also provides a quick search function: users may use the dropdown menus and the search box to search for keywords. The left dropdown menu offers three options: search by samples, experiments or gene names. The right dropdown menu allows users to select species by clicking on their names; the list can be narrowed by typing a name into the corresponding input field.
Clicking the Search button brings up a table of all available datasets matching the input, which is shown on the Browser page for further manipulation.
The Overview tab of the Home page presents an overview of datasets categorized by species. These statistics can be filtered using the top-left dropdown list of plant species; selecting one or more species updates the pie charts and the timeline dot plot in real time. Users can click on the legend of the timeline plot to filter for species of interest.
The Recent Study tab shows basic information on the two newest references in ChIP-Hub, including the title, authors, PubMed links and the numbers of samples and experiments involved. Clicking the corresponding image on the right gives a quick view of the samples or experiments in the article; the dataset is then shown on the Browser page.
ChIP-Hub provides multiple ways to browse data. Besides the quick search on the **Home** page, users may directly click the Browser button in the top toolbar. Users may also change the way datasets are classified depending on their needs.
The **Browser** page shows the results in the Samples tab by default; clicking one of the three tabs switches to another view. The two dropdown menus along the top of the page help users further filter for species and BioProjects of interest.
A data table containing details for each sample is presented on the Browser page. The table includes the factor, sample type, accession number, read information, metric scores, sample title and attributes. Any of these categories can be used to filter the results; users can also enter a keyword in the top-right search box to display matching datasets. Click a column name to sort the datasets in ascending or descending order. Statistics for the datasets of the current species are displayed by clicking the green leaf button.
The quality of the datasets in ChIP-Hub is measured by 7 metrics, which can be displayed by clicking the Plot Metrics button. ChIP-Hub also provides visualization of samples of interest in the WashU EpiGenome Browser.
Some species also have CRE and network information. On the CREs page, users can get the promoter and enhancer peaks of each related sample. After clicking a “specific” peak, users can locate that peak directly in epiBrowser by clicking the Visualize button.
In addition to promoter and enhancer information, users can obtain the correlation of CRE sequences among the corresponding species on the CREs page. Species are grouped by taxonomic rank (i.e., class, order, family, genus), and users can filter species according to the number of related peaks.
On the Networks page, the network is drawn based on the user's selection of TFs and associated genes after clicking the Draw Network button. Users can also display node names by clicking the Show Names button.
The Tools button in the top navigation bar leads to the Tools page, which includes three tools: Peak Annotation, Overlap Analysis and Signal Plot.
To apply these functions, users need to choose the desired datasets (rows in the table). Selecting a species and project in the dropdown lists narrows the data table, and keywords can be used to filter the datasets further. Owing to limited computing resources, up to 10 rows (items) are supported in one run.
After data selection, users may click the Run Analysis button to run the analysis pipeline. The settings can be adjusted to suit specific needs. The plots, annotation files or tables can be downloaded by clicking the download button.
The “Peak Annotation” tool annotates the location of a set of peaks in terms of genomic features, which may be useful for finding potential regulators functioning in TSS-distal regions.
The “Overlap Analysis” tool helps identify datasets with significant overlap, from which co-regulation or transcription factor complexes can be inferred for further investigation.
The “Signal Plot” tool displays the distribution of ChIP-seq signal over a given set of genomic regions (such as annotated protein-coding regions).
An overview of samples is presented in the Samples tab of the Statistics page. The detailed statistics for each factor are displayed in a doughnut chart. A world map of data contributions and a timeline of datasets are also shown on this page. Selecting species in the top-left dropdown list updates all of these interactive visualizations simultaneously.
Similarly, the statistics of experiments are displayed in the Experiments tab.
The background and motivation of ChIP-Hub are described on this page, together with the FAQs and abbreviations.
The detailed procedures for data collection, processing and target-gene assignment are described on the Methods page.
Currently, the datasets in ChIP-Hub are collected from ~360 published references. A data table containing details (including PubMed links) for each reference is presented in the List tab of the References page.
The statistics for these references are shown in the Statistics tab, including their journals, authors, keywords and timeline data.
January 27, 2022, by Xinkai Zhou
Metadata of ChIP-seq and DAP-seq samples (equivalent to datasets; accession numbers start with SRX/ERX/DRX) and projects (accession numbers start with SRP/ERP/DRP) were retrieved from the NCBI SRA (https://www.ncbi.nlm.nih.gov/sra), BioSample (https://www.ncbi.nlm.nih.gov/biosample), BioProject (https://www.ncbi.nlm.nih.gov/bioproject) and/or GEO (https://www.ncbi.nlm.nih.gov/geo) databases. ChIP-Hub focuses on data from “green plants” (i.e., only plants in the taxonomy tree under root ID 33090 were considered). Only data generated on Illumina platforms were kept.
First, each dataset was associated with its publication(s) if available (more than 90% of samples can be linked to publications). Then, each dataset was manually curated to determine its investigated factor (i.e., which TF or histone modification mark), its experimental type (ChIP or control) and its associated replicates (an experiment may have several replicates), based on the metadata and the original publications. Manually checking the metadata against the corresponding publication is important because some metadata are misannotated in the databases. For example, the dataset SRX4063234 in fact contains two different samples, one for the ChIP experiment (SRR7142417) and another for the control experiment (SRR7142416). In such cases (ca. 250), “Run” accessions (starting with SRR/ERR/DRR) were used as sample accessions instead. Datasets without a related publication so far were marked as “unconfirmed” and will be checked regularly in the future.
In general, an experiment is designed to investigate the regulation of a specific factor (e.g., a TF or histone modification) of interest under specific conditions; it may contain replicate samples (i.e., datasets), including ChIP sample(s) as well as input control sample(s). In the analysis (see the section below), each experiment was processed independently.
Furthermore, annotation information for the investigated factors was also manually curated. Broadly, factors are grouped into “TFs and other proteins”, “histone-related” or “unclassified”. For TFs, gene IDs and family information were also determined where applicable. Finally, a meta file was obtained for each experiment after curation, which serves as the input file for the ChIP-seq computational pipeline (see below).
Raw fastq files for each experiment were downloaded from the European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena) database. If fastq files were not available at ENA, raw data in the SRA format were downloaded from the SRA database and converted into fastq format using the “fastq-dump” command provided by the SRA Toolkit (version 2.5.1). The “--split-files” option was used for paired-end reads. Fastq files were further checked for completeness before being submitted for analysis.
Genome sequences and gene annotations were downloaded from public databases. Additional annotation data were also included in the ChIP-Hub database in order to better annotate the regulatory factors and their regulatory networks. Annotations for miRNA genes were obtained from miRBase[1], and their genomic locations were updated (by BLAST) based on the current reference genomes. TF family information was retrieved from PlantTFDB[2]. TF DNA binding motifs were downloaded from the JASPAR[3], CIS-BP[4] and PlantTFDB[2] databases and were scanned for occurrences in the genome using FIMO[5]. These data are provided as separate data tracks in the genome browser.
We followed the ChIP-seq data analysis guidelines[6] recommended by the ENCODE project to develop a computational pipeline for ChIP-seq and DAP-seq data analysis. The pipeline consists of quality control, read mapping, peak calling and assessment of reproducibility among biological replicates, and was used to analyze all annotated experiments in a standardized and uniform manner. Specifically, potential adapter sequences were removed from the sequencing reads using the Trim Galore program (version 0.4.1), and the quality of the sequencing data was then evaluated by FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Clean reads were mapped to the corresponding reference genomes using Bowtie2 (version 2.2.6; ref.[7]) with the parameters “-q --no-unal --threads 8 --sensitive”. The parameter “-k” was set to 1, 2 and 3 for diploid genomes (e.g., Oryza sativa), tetraploid genomes (e.g., Gossypium barbadense) and hexaploid genomes (e.g., Triticum aestivum), respectively. Redundant reads and PCR duplicates were removed using Picard tools (v2.60; http://broadinstitute.github.io/picard/) and SAMtools[8] (version 0.1.19).
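The ploidy-dependent choice of “-k” described above can be sketched as a small lookup. The function names, index name and file name are hypothetical; only the Bowtie2 options and the 1/2/3 mapping come from the text. The second function prints the command rather than running it, so the sketch works without Bowtie2 installed.

```shell
#!/usr/bin/env bash
# bowtie2_k PLOIDY -> prints the value used for Bowtie2's -k option
# (mapping taken from the text: diploid 1, tetraploid 2, hexaploid 3).
bowtie2_k() {
  case "$1" in
    diploid)    echo 1 ;;
    tetraploid) echo 2 ;;
    hexaploid)  echo 3 ;;
    *) echo "unknown ploidy: $1" >&2; return 1 ;;
  esac
}

# bowtie2_cmd PLOIDY INDEX FASTQ -> prints (does not run) a single-end
# alignment command. INDEX and FASTQ are placeholders chosen by the caller.
bowtie2_cmd() {
  local k
  k=$(bowtie2_k "$1") || return 1
  printf 'bowtie2 -q --no-unal --threads 8 --sensitive -k %s -x %s -U %s\n' \
    "$k" "$2" "$3"
}

# Example with made-up index and read file names.
bowtie2_cmd hexaploid tae_index reads.fq
```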
Peak calling was performed using MACS2 (version 2.1.0; ref.[9]). Duplicated reads were not considered (“--keep-dup=1”) during peak calling in order to achieve better specificity[10]. The shift size (“--shift”) used in the model was determined by analyzing cross-correlation scores with the phantompeakqualtools package (https://code.google.com/p/phantompeakqualtools/). The parameter “--call-summits” was used to call narrow peaks. For broad histone modification marks (including H3K36me3, H3K20me1, H3K4me1, H3K79me2, H3K79me3, H3K27me3, H3K9me3 and H3K9me1), broad peaks were also called by turning on the “--broad” parameter in MACS2. A relaxed p-value threshold (p-value < 1e-2) was used to enable the correct computation of IDR (irreproducible discovery rate) values[6], because IDR requires input peak data across the entire spectrum of high confidence (signal) and low confidence (noise) so that a bivariate model can be fitted to separate signal from noise[11]. Following the recommendations for the analysis of self-consistency and reproducibility between replicates[11], replicate control samples (if available) were combined into a single control within the same experiment. Peak calling was applied to all replicates, the pooled data (pooled replicates), the pseudo-replicates (half subsamples of reads) of each replicate and the pseudo-replicates of the pooled sample, using the same merged control as input (if applicable). By default, “reproducible” peaks across pseudo-replicates and true replicates with an IDR < 0.05 are recommended for analysis. Peaks with different statistical thresholds are also available upon request. For example, “significant” peaks were defined as having a fold-change (fold enrichment above background) > 2 and a -log10 (q-value) > 3, and “lenient” peaks as having a fold-change > 2 and a -log10 (q-value) > 2. “Relaxed” peaks without additional thresholding were also provided so that any custom threshold can be applied.
All peak-based analyses in the pipeline (including peak overlapping, merging and summary) were performed using BEDTools (v2.25.0; ref.[12]).
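The “significant” and “lenient” thresholds described above can be applied to a relaxed peak file with a short awk filter. This is a sketch, not the pipeline's own code: it assumes MACS2 narrowPeak input, where column 7 holds the fold enrichment and column 9 the -log10(q-value), and the function name `classify_peaks` is made up for illustration. Each peak gets the most stringent label it satisfies, so every “significant” peak would also pass the “lenient” cutoff.

```shell
#!/usr/bin/env bash
# classify_peaks: reads narrowPeak lines on stdin and appends a label column.
# Column 7 = fold enrichment, column 9 = -log10(q-value) (narrowPeak format).
classify_peaks() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         if      ($7 > 2 && $9 > 3) label = "significant"
         else if ($7 > 2 && $9 > 2) label = "lenient"
         else                       label = "relaxed"
         print $0, label
       }'
}

# Example with three made-up peaks.
printf 'chr1\t100\t200\tp1\t50\t.\t5.2\t8.0\t6.1\t50\n'  | classify_peaks
printf 'chr1\t300\t400\tp2\t30\t.\t3.0\t4.0\t2.5\t40\n'  | classify_peaks
printf 'chr1\t500\t600\tp3\t10\t.\t1.5\t2.5\t1.0\t20\n'  | classify_peaks
```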
Various metric scores were calculated to assess different aspects of the quality of experiments (https://genome.ucsc.edu/ENCODE/qualityMetrics.html and https://www.encodeproject.org/data-standards/terms/). For example, library complexity is measured using the non-redundant fraction (NRF) and PCR bottlenecking coefficients 1 and 2 (PBC1 and PBC2). The SPOT (signal portion of tags) score, characterizing the enrichment of signal for each experiment, was calculated by the Hotspot[13] algorithm on a subsample of ten million reads. The fraction of reads in peaks (FRiP), another measure of enrichment, is highly correlated with the SPOT score. NSC and RSC (normalized and relative strand cross-correlation coefficients) are related measures of enrichment that do not depend on pre-defined peaks; they were calculated by the phantompeakqualtools program.
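As an illustration of the library complexity metrics mentioned above, the sketch below computes NRF, PBC1 and PBC2 following the ENCODE definitions: NRF is the number of distinct read positions divided by the total number of reads, PBC1 is the number of positions covered by exactly one read divided by the number of distinct positions, and PBC2 is the number of positions covered by exactly one read divided by the number covered by exactly two. The input format (one line per mapped read with chromosome, start and strand) and the function name are assumptions made for this example, not the pipeline's actual implementation.

```shell
#!/usr/bin/env bash
# library_complexity: reads tab-separated "chrom start strand" lines on stdin
# (one per uniquely mapped read) and prints NRF, PBC1 and PBC2.
library_complexity() {
  awk -F'\t' '
    { count[$1 FS $2 FS $3]++; total++ }
    END {
      if (total == 0) exit 1                  # no reads: nothing to report
      for (pos in count) {
        distinct++
        if (count[pos] == 1) m1++             # positions with exactly one read
        if (count[pos] == 2) m2++             # positions with exactly two reads
      }
      pbc2 = (m2 > 0) ? sprintf("%.3f", m1 / m2) : "NA"
      printf "NRF=%.3f PBC1=%.3f PBC2=%s\n", distinct / total, m1 / distinct, pbc2
    }'
}

# Example: 5 reads at 4 distinct positions (one position is duplicated).
printf 'chr1\t100\t+\nchr1\t200\t+\nchr1\t200\t+\nchr1\t300\t-\nchr2\t50\t+\n' \
  | library_complexity
```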
For visualization purposes, wiggle tracks (using pooled data across replicates) were generated by DeepTools[14] with the “bamCoverage” program; different normalization methods (including RPKM [reads per kilobase per million mapped reads], CPM [counts per million mapped reads], BPM [bins per million mapped reads], RPGC [reads per genomic content normalized to 1x sequencing depth] and None) were used to generate different types of signal files. ChIP-seq tracks were visualized in the WashU Epigenome Browser[15].
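A sketch of how the five normalization variants could be produced with bamCoverage is given below. It prints the commands instead of running them, so it can be inspected without deepTools installed. The file names, the bigWig output extension and the effective genome size are placeholders, and the exact options used by the ChIP-Hub pipeline are not stated in the text; `--normalizeUsing` and `--effectiveGenomeSize` are the option names in deepTools 3.

```shell
#!/usr/bin/env bash
# coverage_cmds BAM PREFIX GENOME_SIZE  (hypothetical helper)
# Prints one bamCoverage command per normalization method.
coverage_cmds() {
  local bam="$1" prefix="$2" gsize="$3" method extra
  for method in RPKM CPM BPM RPGC None; do
    extra=""
    # RPGC scales coverage to 1x depth and therefore needs the
    # effective genome size.
    if [ "$method" = "RPGC" ]; then
      extra=" --effectiveGenomeSize $gsize"
    fi
    printf 'bamCoverage -b %s -o %s.%s.bw --normalizeUsing %s%s\n' \
      "$bam" "$prefix" "$method" "$method" "$extra"
  done
}

# Example with placeholder values.
coverage_cmds pooled.bam demo 1000000
```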
Regulatory elements (in layman's terms, called “peaks”) were assigned to putative target genes based on the following rules. For a regulatory region overlapping with any gene(s) (protein-coding genes or miRNAs), the overlapping gene(s) were considered as its targets. Otherwise, the regulatory element was assigned to its nearest annotated gene within up to N bp, where N is the median size of intergenic regions (N was set to 3000 if the median size exceeded 3000). The start of genes (i.e., the transcription start site [TSS] of protein-coding genes and the 5’ end of miRNA precursors [pre-miRNAs]) was used to calculate the distance. In general, this approach associates a single regulatory element with no more than two genes, with a few exceptions in the case of the regulatory element overlapping multiple genes. This procedure was performed in each species independently.
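The assignment rules above can be sketched for a single peak as follows. This is a simplified illustration under stated assumptions, not the pipeline's code: genes are given in a five-column tab-separated file (chrom, start, end, gene ID, strand), the distance is measured from the peak midpoint to the gene start, and only the single nearest gene is reported for non-overlapping peaks. The function names and the example values are hypothetical.

```shell
#!/usr/bin/env bash
# distance_cutoff MEDIAN_INTERGENIC_BP -> prints N, capped at 3000 bp.
distance_cutoff() {
  if [ "$1" -gt 3000 ]; then echo 3000; else echo "$1"; fi
}

# assign_targets GENES_FILE CHROM START END N
# Prints overlapping gene IDs (comma-separated), else the nearest gene
# whose start lies within N bp, else ".".
assign_targets() {
  awk -F'\t' -v c="$2" -v s="$3" -v e="$4" -v n="$5" '
    $1 != c { next }
    # Rule 1: genes overlapping the peak are always targets.
    $2 < e && $3 > s { overlap = overlap (overlap ? "," : "") $4; next }
    {
      tss = ($5 == "-") ? $3 : $2          # gene start depends on strand
      mid = int((s + e) / 2)               # ASSUMPTION: use the peak midpoint
      d = (tss > mid) ? tss - mid : mid - tss
      if (best == "" || d < bestd) { best = $4; bestd = d }
    }
    END {
      if (overlap != "") print overlap
      else if (best != "" && bestd <= n) print best   # Rule 2: nearest within N
      else print "."
    }' "$1"
}

# Example with two made-up genes.
genes=$(mktemp)
printf 'chr1\t1000\t2000\tgeneA\t+\nchr1\t6000\t7000\tgeneB\t-\n' > "$genes"
assign_targets "$genes" chr1 2400 2600 "$(distance_cutoff 7145)"
rm -f "$genes"
```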
[1] Kozomara, A., Birgaoanu, M. & Griffiths-Jones, S. miRBase: from microRNA sequences to function. Nucleic Acids Res. (2019). doi:10.1093/nar/gky1141
[2] Jin, J. et al. PlantTFDB 4.0: Toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res. (2017). doi:10.1093/nar/gkw982
[3] Khan, A. et al. JASPAR 2018: Update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res. (2018). doi:10.1093/nar/gkx1126
[4] Weirauch, M. T. et al. Determination and Inference of Eukaryotic Transcription Factor Sequence Specificity. Cell 158, 1431–1443 (2014).
[5] Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
[6] Landt, S. G. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research 22, 1813–1831 (2012).
[7] Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
[8] Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
[9] Zhang, Y. et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
[10] Bailey, T. et al. Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data. PLoS Comput. Biol. 9, (2013).
[11] Li, Q., Brown, J. B., Huang, H. & Bickel, P. J. Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 5, 1752–1779 (2011).
[12] Quinlan, A. R. & Hall, I. M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
[13] John, S. et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nature Genetics 43, 264–268 (2011).
[14] Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. DeepTools: A flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 42, (2014).
[15] Zhou, X. et al. The human epigenome browser at Washington University. Nature Methods 8, 989–990 (2011).
August 12, 2019, by Dijun Chen
Information of Reference Genomes
Plant species | Tax ID | Common name | Short ID (used for Figures) | Genome release version | Genome size (in Mb) | Average intergenic size (bp) | #miRNA genes | #protein-coding genes |
---|---|---|---|---|---|---|---|---|
Aegilops tauschii | 37682 | Tausch's goatgrass | ata | Aet v4.0 | 4078.89 | 106088 | 86 | 39152 |
Arabidopsis lyrata | 59689 | Lyre-leaved rock-cress | aly | v1.0 | 200.93 | 4113 | 198 | 32667 |
Arabidopsis thaliana | 3702 | Mouse-ear cress | ath | TAIR10 | 119.15 | 2310 | 323 | 27206 |
Arabis alpina | 50452 | Alpine rock-cress | aal | v5.0 | 317.91 | 8123 | 64 | 34220 |
Beta vulgaris | 161934 | Sugar beet | bau | RefBeet-1.2.2 | 520.56 | 15871 | 0 | 26521 |
Brachypodium distachyon | 15368 | Purple false brome | bdi | v3.0 | 271.07 | 5052 | 313 | 34310 |
Brassica napus | 3708 | Rape | bna | v4.1 | 850.29 | 6452 | 88 | 101040 |
Brassica oleracea | 3712 | Savoy cabbage | bol | v1.0 | 385.01 | 9127 | 8 | 35400 |
Brassica rapa | 3711 | Turnip mustard | bra | v1.3 | 297.59 | 5298 | 84 | 40492 |
Carica papaya | 3649 | Papaya | cpa | ASGPBv0.4 | 288.98 | 9940 | 75 | 27769 |
Chlamydomonas reinhardtii | 3055 | Chlamydomonas reinhardtii | cre | v5.5 | 110.59 | 3540 | 40 | 17741 |
Citrullus lanatus | 260674 | Watermelon | cla | v1.0 | 355.25 | 12393 | 0 | 23440 |
Cucumis melo | 3656 | Muskmelon | cme | v4.0 | 417.00 | 11448 | 118 | 29980 |
Cucumis sativus | 3659 | Cucumber | csa | ASM407-v2 | 192.95 | 5119 | 5 | 23780 |
Eucalyptus grandis | 71139 | Flooded gum | egr | v2.0 | 651.05 | 15522 | 2 | 36349 |
Eutrema salsugineum | 72664 | Saltwater cress | esa | v1.0 | 238.95 | 6958 | 77 | 26351 |
Fragaria vesca | 57918 | Woodland strawberry | fve | v1.1 | 206.89 | 3439 | 118 | 32831 |
Glycine max | 3847 | Soybean | gma | Wm82.a2.v1 | 949.18 | 13704 | 671 | 56044 |
Gossypium arboreum | 29729 | Tree cotton | gar | BGI v2.0 | 1541.29 | 35944 | 1 | 40134 |
Gossypium barbadense | 3634 | Sea-island cotton | gba | HAU-SGI v1.0 | 2045.17 | 28144 | 7 | 82099 |
Gossypium hirsutum | 3635 | Upland cotton | ghi | NAU-NBI v1.1 | 2053.61 | 28285 | 72 | 70478 |
Gossypium raimondii | 29730 | New World cotton | gra | v2.1 | 751.27 | 17458 | 294 | 37505 |
Hordeum vulgare | 4513 | Barley | hvu | Hv_IBSC_PGSB_v2 | 4833.79 | 127710 | 52 | 39734 |
Lotus japonicus | 34305 | Lotus japonicus | lja | MG20_v3.0 | 446.89 | 8808 | 291 | 39648 |
Malus domestica | 3750 | Apple | mdo | GDDH13 v1.1 | 709.56 | 12587 | 254 | 45116 |
Medicago truncatula | 3880 | Barrel medic | mtr | Mt4.0v1 | 397.59 | 5360 | 653 | 50894 |
Musa acuminata | 4641 | Banana | mac | v1.0 | 472.96 | 9216 | 1 | 36528 |
Oryza sativa | 39947 | Japonica rice | osa | IRGSP-1.0 | 374.47 | 7145 | 589 | 39049 |
Phaseolus vulgaris | 3885 | French bean | pvu | v2.1 | 532.24 | 15480 | 8 | 27433 |
Physcomitrella patens | 3218 | Moss | ppa | v3.0 | 470.36 | 1821 | 244 | 32926 |
Populus trichocarpa | 3694 | Western balsam poplar | ptr | v3.0 | 422.50 | 7069 | 334 | 42950 |
Prunus persica | 3760 | Peach | ppe | v2.0 | 226.00 | 5504 | 180 | 26873 |
Pyrus bretschneideri | 225117 | Chinese white pear | pbr | v121010 | 500.23 | 36957 | 3 | 10974 |
Rosa chinensis | 74649 | China rose | rch | v1.0 | 518.52 | 10479 | 5 | 39669 |
Setaria italica | 4555 | Foxtail millet | sit | v2.0 | 403.32 | 9211 | 1 | 35831 |
Solanum lycopersicum | 4081 | Tomato | sly | ITAG2.4 | 823.94 | 20989 | 110 | 34725 |
Solanum tuberosum | 4113 | Potato | stu | v4.03 | 773.03 | 17472 | 221 | 39028 |
Sorghum bicolor | 4558 | Sorghum | sbi | v3.0.1 | 704.28 | 17686 | 205 | 34129 |
Triticum aestivum | 4565 | Wheat | tae | IWGSC_RefSeq_v1.0 | 14547.26 | 130533 | 107 | 110790 |
Triticum urartu | 4572 | Red wild einkorn | tur | WheatTU | 4712.41 | 123641 | 1 | 41493 |
Vitis vinifera | 29760 | Grape | vvi | IGGP_12X | 486.20 | 11990 | 161 | 26346 |
Zea mays | 4577 | Maize | zma | AGPv3 | 2066.43 | 49870 | 167 | 39295 |