• How is the genome annotated?

    Most of the v5.1 genes (95%) were lifted over from the v3.1 genome. The rest were newly identified in the v5.1 genome, mainly by expression-based or de novo prediction methods.
    See Montgomery, Tanizawa, Galic, et al. (2020) for more details.

  • Genome version

    The latest version of the Marchantia genome is v5.1, and the latest release of the annotation is r1 (MpTak1v5.1r1).

  • Gene identifiers

    Permanent ID (mRNA, miRNA)
    Format: Mp#gNNNNN.M (example: Mp2g01240.2, MpVg00200.1, Mpzg00150.3)
    '#' indicates the chromosome: 1 to 8 for autosomes, U (reserved for the upcoming female genome) and V for sex chromosomes, and z for unplaced scaffolds. 'NNNNN' is a number that reflects the position of the locus on the chromosome and is incremented by 10, e.g. 00010, 00020, 00030. When a new gene is identified between two existing loci, it is given an intermediate number such as 00015 or 00018. 'M' distinguishes transcript variants derived from the same locus, e.g. Mp3g17630.1, Mp3g17630.2, Mp3g17630.3.
    The permanent IDs will be lifted over as much as possible when the genome/annotation is updated in the future.

    Temporary ID (rRNA, tRNA)
    rRNAs and tRNAs are predicted by computational methods only, so they are assigned temporary identifiers, which will not be lifted over to future versions of the genome/annotation. Temporary IDs have a letter at the end of the locus identifier (e.g. Mp1g00015a.1, Mp1g00015b.1, Mp1g00015c.1, ...).
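
    The identifier format described above can be checked programmatically. The following Python sketch is one possible way to split an ID into its parts and to tell permanent from temporary IDs. The names ID_PATTERN, GeneId and parse_gene_id are hypothetical and not part of any official tool. The pattern assumes exactly five digits for the locus number and a single lowercase letter as the temporary suffix, as in the examples given here.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the format description: Mp#gNNNNN.M
# Assumption: a temporary ID carries one lowercase letter after the
# five-digit locus number (e.g. Mp1g00015a.1).
ID_PATTERN = re.compile(r"^Mp([1-8UVz])g(\d{5})([a-z]?)\.(\d+)$")


class GeneId(NamedTuple):
    chromosome: str    # '1'-'8', 'U', 'V', or 'z' (unplaced scaffolds)
    locus_number: int  # NNNNN as an integer
    temporary: bool    # True for rRNA/tRNA temporary IDs
    variant: int       # transcript variant M


def parse_gene_id(identifier: str) -> Optional[GeneId]:
    """Return the parsed parts of an ID, or None if it does not match."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        return None
    chromosome, number, suffix, variant = match.groups()
    return GeneId(chromosome, int(number), bool(suffix), int(variant))
```

    For example, parse_gene_id("Mp2g01240.2") gives chromosome '2', locus number 1240 and variant 2, while parse_gene_id("Mp1g00015a.1") is flagged as temporary.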

  • GTF/GFF file for RNA-seq analysis

    Please use MpTak1v5.1_r1_primary_transcripts.gtf/.gff on the download page.
    It contains only the primary mRNA transcript for each locus (Mp#gNNNNN.1). Duplicates such as Mp1g28950 and Mp1g28960 were removed. (Duplicates arise when v3.1 genes are mapped to the same locus in the v5.1 genome.)
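
    Before starting an RNA-seq analysis, it can be useful to confirm that the annotation file in use really contains only primary transcripts. The Python sketch below lists any transcript IDs that do not end in ".1". The function name non_primary_transcripts is hypothetical, and the code assumes the file follows the usual GTF layout (nine tab-separated columns, with a transcript_id "..." entry in the attribute column), which has not been verified against the actual file.

```python
import re

# Assumption: attributes use the common GTF form  transcript_id "Mp1g00010.1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def non_primary_transcripts(gtf_lines):
    """Return a sorted list of transcript IDs that do not end in '.1'."""
    unexpected = set()
    for line in gtf_lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match and not match.group(1).endswith(".1"):
            unexpected.add(match.group(1))
    return sorted(unexpected)
```

    The function accepts any iterable of lines, so an open file handle for the downloaded GTF can be passed directly. An empty list means every record refers to a primary transcript.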