Our service: genome annotation

Your advantages

  • considerably higher quality of the gene predictions
  • annotation of genomes at all assembly levels
  • higher consistency across genomes and time
  • annotation of transposons and UTRs
  • short delivery times

Choose from three packages according to your needs

Pricing of the Basic package depends slightly on genome size because the required computing power differs strongly between genomes.

Basic Standard Enterprise

Protein genes
protein-coding genes
pseudogenes (e.g. from reverse transcription)

RNA genes
tRNAs | rRNAs | spliceosomal RNAs | RNAse-P

Additional annotations

12 months quarterly

Statistics table
summarizing genome assembly and annotation statistics

genome assembly plots (L50, GC content, etc.)
genome annotation plots (genetic code, codon frequencies, intron distributions, exon statistics, functional annotations)

Comparative genomics analyses
plots, comparing customer genome/annotation with data from related species

Upon request
detailed analysis of up to 5 genes
(e.g. across plants, animals, fungi)





Higher accuracy through a novel proprietary approach

With our innovative method, we achieve an unprecedented level of quality for genome annotations. Our iterative approach continuously improves datasets that have already been annotated. In this way we avoid inconsistent assignment and naming of homologous genes/proteins across different organisms.

Protein genes

Protein-coding genes
Annotated by our proprietary software. UTRs are determined from existing RNAseq data or, where none is available, by prediction.

Genes are annotated at full length, independent of potential assembly problems. All genes containing in-frame stop codons or frame-shifts are termed potential pseudogenes.
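The pseudogene rule above can be illustrated with a minimal Python sketch. It is not our proprietary software: it assumes a spliced coding sequence given as a plain string, the standard genetic code, and a hypothetical function name `classify_cds`. A length that is not a multiple of three is treated as a sign of a frame-shift.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def classify_cds(cds: str) -> str:
    """Label a spliced coding sequence (hypothetical helper, illustration only)."""
    seq = cds.upper()
    # A CDS whose length is not a multiple of three cannot be read in one frame.
    if len(seq) % 3 != 0:
        return "potential pseudogene"
    # Collect all codons except the last one, which is the regular stop
    # codon in an intact CDS.
    internal = [seq[i:i + 3] for i in range(0, len(seq) - 3, 3)]
    if any(codon in STOP_CODONS for codon in internal):
        return "potential pseudogene"
    return "protein-coding gene"


print(classify_cds("ATGAAATAA"))     # intact reading frame
print(classify_cds("ATGTAAAAATAA"))  # in-frame stop codon
print(classify_cds("ATGAAAATAA"))    # length not divisible by three
```

A real annotation pipeline would detect frame-shifts from the alignment of the gene model to the genome rather than from sequence length alone; the length check is only the simplest stand-in.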

Transposons are annotated as "potential transposons" regardless of whether the in-silico transcripts have uninterrupted or interrupted reading frames. Transposons with a fragmented reading frame are termed pseudotransposons.

RNA genes

tRNA genes
Annotated using tRNAscan-SE.

All other RNA genes are annotated using Infernal and the RNA family models from the most recent Rfam database.

rRNA genes
All RNA genes that are building blocks of ribosomes.

spliceosomal RNA genes
All RNA genes involved in the assembly and rearrangement of the various spliceosomal complexes during splicing. This includes the U1-U6 subunits as well as the subunits for splicing AT-AC introns and the subunits of the minor spliceosome (U11/U12).

telomerase RNA
The RNA part of the telomerase ribonucleoprotein.

nuclear ribonuclease P
Abbreviated as RNAse-P. Responsible for generating the mature 5'-ends of tRNAs. Should be present in every eukaryote.

1 Gene.

1 ID.


GOENOMICS Universal Gene ID

Unique ID

We follow the strict concept of a universal gene ID, i.e. each gene locus gets exactly one unique ID. In this way, we avoid one of the biggest and most disturbing problems in public databases, where up to dozens of different IDs (accession numbers) are assigned to one and the same gene locus in one organism.

Versioning of corrections | additions

Corrections and additions are tracked by versioning, not by renaming. In our database, every change can be traced back to any point in time.
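The idea of one stable ID with versioned changes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of our database: the class `GeneRecord`, its fields, and the ID `GENE000001` are hypothetical, and revisions are assumed to be added in chronological order.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class GeneRecord:
    """One gene locus with one stable ID (hypothetical structure)."""
    gene_id: str                                  # never renamed
    history: list = field(default_factory=list)   # (version, date, note, sequence)

    def revise(self, sequence: str, note: str, when: date) -> int:
        """Store a correction or addition as a new version; the ID stays the same."""
        version = len(self.history) + 1
        self.history.append((version, when, note, sequence))
        return version

    def as_of(self, when: date):
        """Return the latest revision dated on or before `when`, or None."""
        valid = [entry for entry in self.history if entry[1] <= when]
        return valid[-1] if valid else None


record = GeneRecord("GENE000001")  # hypothetical ID format
record.revise("ATGAAATAA", "initial annotation", date(2023, 1, 10))
record.revise("ATGAAAGGCTAA", "exon added", date(2023, 6, 1))
print(record.gene_id, record.as_of(date(2023, 3, 1)))
```

Because earlier versions are kept rather than overwritten, a query for any past date returns the annotation as it was known at that time while the ID stays unchanged.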

Updates for customers' annotations

As part of the Enterprise package, updates of genome annotations are provided quarterly.





Sequencing is getting faster -
so is genome annotation!

You provide

a genome assembly in fasta-format
The assembly does not need to be at supercontig or chromosome scale. Annotating fragmented assemblies, however, might take longer.

a species or taxon name
Through internal testing we can determine the most likely taxon, but this requires additional computational effort.

RNAseq data (optional)
If RNAseq data are provided, we can enhance the exon annotations with coverage information. In addition, UTRs will be based on RNAseq evidence; without RNAseq data they are only predicted.

You get

a genome annotation in gff3-format
The provided gff3 is consistent with the gt tool (GenomeTools) and can be used directly in JBrowse. We can provide gtf/gff files in other versions upon request.
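As a quick check of a delivered file, the feature types in a gff3 can be counted with a short Python sketch. It relies only on the general gff3 layout (nine tab-separated columns, feature type in column three, comment and directive lines starting with "#"); the function name `count_feature_types` and the example records are hypothetical and do not reflect the exact content of our deliveries.

```python
from collections import Counter


def count_feature_types(gff3_text: str) -> Counter:
    """Count features per type (column 3) in gff3 text."""
    counts = Counter()
    for line in gff3_text.splitlines():
        # Skip blank lines, comments and directives such as ##gff-version.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a feature line (e.g. embedded sequence)
        counts[columns[2]] += 1
    return counts


# Hypothetical example records, for illustration only.
example = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "example", "gene", "1", "900", ".", "+", ".", "ID=gene1"]),
    "\t".join(["chr1", "example", "mRNA", "1", "900", ".", "+", ".", "ID=mrna1;Parent=gene1"]),
    "\t".join(["chr1", "example", "exon", "1", "400", ".", "+", ".", "Parent=mrna1"]),
    "\t".join(["chr1", "example", "exon", "600", "900", ".", "+", ".", "Parent=mrna1"]),
])
print(count_feature_types(example))
```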

protein sequences in fasta-format
Although many tools can extract protein sequences from the gff3, we provide them separately so that frame-shifts and in-frame stop codons are included. gff3 parsing tools do not reveal these.

cDNA sequences in fasta-format
Provided with "N" nucleotides at frame-shift and in-frame stop codon positions.

an annotation results table
A table with detailed information about assembly and annotation statistics.
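Some of the assembly statistics in such a table can be reproduced from the fasta file itself. The following Python sketch computes total length, N50, L50 and GC content using the common definitions (N50: length of the contig at which the sorted, cumulative length reaches half of the assembly; L50: number of contigs needed to get there). The function names and the tiny example sequences are hypothetical, and the exact set of statistics in the delivered table may differ.

```python
def parse_fasta(text: str) -> dict:
    """Read fasta text into a dict of header -> upper-case sequence."""
    parts, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def assembly_stats(seqs: dict) -> dict:
    """Compute total length, N50, L50 and GC fraction of an assembly."""
    lengths = sorted((len(s) for s in seqs.values()), reverse=True)
    total = sum(lengths)
    running, n50, l50 = 0, 0, 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running * 2 >= total:  # half of the assembly is covered
            n50, l50 = length, count
            break
    gc = sum(s.count("G") + s.count("C") for s in seqs.values())
    acgt = sum(s.count(base) for s in seqs.values() for base in "ACGT")
    # GC fraction is taken over unambiguous bases only, so "N" is ignored.
    return {"total_length": total, "N50": n50, "L50": l50,
            "GC_fraction": gc / acgt if acgt else 0.0}


example = ">contig1\nGGCCA\n>contig2\nAT\nAT\n>contig3\nGCA\n"
print(assembly_stats(parse_fasta(example)))
```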