The datasets listed below are used in the UMD3.1 JBrowse.
ASSEMBLY
UMD3.1_chromosomes.fa.gz - This file contains FASTA-formatted sequences for chromosomes 1-29 (accessions GK000001.2 to GK000029.2), X (GK000030.2), MT (AY526085.1) and 3286 unassigned scaffolds (accessions that begin with "GJ").
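As a quick sanity check on the contents described above, the records in the FASTA file can be tallied by accession prefix. The sketch below is illustrative only: it assumes each header line contains the accession somewhere in its text (the exact header layout is not documented here), and the function names are hypothetical.

```python
import gzip
import re
from collections import Counter

# Assumption: every header contains an accession that starts with one of the
# prefixes named in the description (GK, GJ or AY) and ends in ".version".
ACCESSION_RE = re.compile(r"(GK|GJ|AY)\d+\.\d+")

CATEGORY_BY_PREFIX = {
    "GK": "chromosome",           # chromosomes 1-29 and X
    "AY": "mitochondrion",        # MT
    "GJ": "unassigned_scaffold",  # unassigned scaffolds
}


def count_sequence_categories(lines):
    """Count FASTA records per category from an iterable of text lines."""
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = ACCESSION_RE.search(line)
        category = CATEGORY_BY_PREFIX[match.group(1)] if match else "other"
        counts[category] += 1
    return counts


def count_in_gzip(path):
    """Open a gzipped FASTA file in text mode and tally its records."""
    with gzip.open(path, "rt") as handle:
        return count_sequence_categories(handle)
```

Calling `count_in_gzip("UMD3.1_chromosomes.fa.gz")` on a local copy of the file should, if the header assumption holds, report 30 chromosome records, 1 mitochondrial record and 3286 unassigned scaffolds.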
Note: A newer assembly named 'UMD3.1.1' is available at NCBI. NCBI created this assembly after conducting contamination studies and identifying groups of scaffolds that are not part of the Bos taurus genome. Please refer to PMC4243333 for more information.
The only difference between the two assemblies is that these contaminant scaffolds were removed from UMD3.1.1. There are 138 such scaffolds, and the list of scaffold names can be found here.
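Because the two assemblies differ only by the removed scaffolds, a UMD3.1.1-like sequence set can in principle be derived from the UMD3.1 FASTA by dropping the listed records. The following is a minimal sketch, not the procedure NCBI used. It assumes the record ID is the first whitespace-delimited token of each header and that it matches the names in the scaffold list; `drop_scaffolds` is a hypothetical helper.

```python
def drop_scaffolds(fasta_lines, contaminant_ids):
    """Yield FASTA lines, skipping every record whose ID is in contaminant_ids.

    Assumption: the record ID is the first whitespace-delimited token after
    ">" and is written the same way as in the contaminant scaffold list.
    """
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            # Decide once per record; the sequence lines that follow inherit it.
            keep = record_id not in contaminant_ids
        if keep:
            yield line
```

With the scaffold names loaded into a set, the generator can be streamed over a decompressed copy of the FASTA file and written straight to a new file, so the whole genome never has to be held in memory.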
ANNOTATION
Ensembl:
Ensembl79_UMD3.1_genes.gff3.gz - This file contains coordinates for Ensembl (release 79) bovine protein-coding genes and non-protein-coding genes. We downloaded the Bos taurus Ensembl GTF and reformatted it for our JBrowse/Apollo database.
NCBI:
The following files contain data from NCBI Bos taurus Annotation Release 104. We downloaded the GFF file for the Bos taurus genome from the NCBI FTP site and split it into individual files by feature type for our JBrowse/Apollo database.
RefSeq_UMD3.1.1_protein_coding.gff3.gz - This file contains protein-coding genes.
RefSeq_UMD3.1.1_frameshift.gff3.gz - This file contains protein-coding genes that are supported by cDNA alignments to the genome assembly, but have translational discrepancies due to assembly errors such as indels.
RefSeq_UMD3.1.1_microRNA.gff3.gz - This file contains microRNA genes.
RefSeq_UMD3.1.1_noncoding.gff3.gz - This file contains other non-protein-coding genes.
RefSeq_UMD3.1.1_pseudogene.gff3.gz - This file contains pseudogenes.
RefSeq_UMD3.1.1_multitype_genes.gff3.gz - This file contains ambiguous genes that have coding as well as non-coding transcripts.
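The split into feature type-based files can be approximated by grouping gene rows on an attribute in column 9 of the GFF3. This is only a sketch under stated assumptions: it assumes gene rows carry a `gene_biotype` attribute, which may not hold for every NCBI release, and it does not reproduce the rules actually used to build the frameshift or multitype files. The function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse the GFF3 attributes column (key=value pairs split by ';')."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attributes[key.strip()] = unquote(value.strip())
    return attributes


def group_genes_by_type(gff_lines, type_key="gene_biotype"):
    """Map each value of type_key to the IDs of the gene rows that carry it.

    Assumption: the type of a gene is stored under type_key on rows whose
    third column is "gene"; rows without it are grouped under "unknown".
    """
    groups = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # directive, comment or blank line
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "gene":
            continue  # malformed row or a non-gene feature
        attributes = parse_attributes(columns[8])
        groups[attributes.get(type_key, "unknown")].append(attributes.get("ID", ""))
    return dict(groups)
```

A full split would also have to carry each gene's child features (transcripts, exons, CDS) into the same output file by following their `Parent` attributes, which this sketch leaves out.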
Bovine Official Gene Set version 2:
Bovine_OGSv2_liftOver_UMD3.1_intact_genes.gff3.gz - This file contains the Bovine Official Gene Set version 2, which includes manual annotations submitted by bovine researchers as part of the Bovine Genome Sequencing Consortium project. Genes predicted on the Btau_4.0 assembly were mapped to UMD3.1 using the UCSC liftOver Tool. We are in the process of generating a new Official Gene Set on UMD3.1 using RNA-Seq data from Dominette (the individual whose genome was sequenced).
Bovine_OGSv2_liftOver_UMD3.1_broken_genes.gff3.gz - This file contains genes that did not lift over completely from the Btau_4.0 assembly to the UMD3.1 assembly.
BOVINE HAPMAP SNP
BovineHerefordAssemblySNP_UMD3.1.gff3.gz - This file contains Single Nucleotide Polymorphisms (SNPs) from the Bovine HapMap Consortium project, mapped from Btau_4.0 to UMD3.1 using the UCSC liftOver tool.
HolsteinHaplotypeBlock_UMD3.1.gff3.gz - This file contains the Holstein haplotype blocks, mapped from Btau_4.0 to UMD3.1 using the UCSC liftOver tool.
BovineInterbreedSNP_UMD3.1.gff3.gz - This file contains the Bovine Interbreed SNPs, mapped from Btau_4.0 to UMD3.1 using the UCSC liftOver tool.
BovineRepresentativeSNP_UMD3.1.gff3.gz - This file contains the Bovine Representative SNPs, mapped from Btau_4.0 to UMD3.1 using the UCSC liftOver tool.
QTLs
Bovineqtl_liftOver_UMD3.1_QTL.gff3.gz - This file contains bovine QTL predicted on Btau_4.0, which were mapped to UMD3.1 using the UCSC liftOver tool.
Animalgenome_UMD3.1_QTL.gff3.gz - This file contains QTL for the UMD3.1 assembly as provided by Animalgenome.org. We downloaded the GFF3 file and reformatted it for our JBrowse/GBrowse database.
PROTEIN HOMOLOG ALIGNMENTS
The following files contain alignments of protein homologs to the UMD3.1 assembly. Protein sequences were aligned to the genome using Exonerate (protein2genome) via MAKER.
Proteins from Ensembl:
Ensembl_Canis_familiaris.BROADD2.67.pep.all_vs_UMD3.1.gff3.gz
Ensembl_Equus_caballus.EquCab2.67.pep.all_vs_UMD3.1.gff3.gz
Ensembl_Homo_sapiens.GRCh37.67.pep.all_vs_UMD3.1.gff3.gz
Ensembl_Mus_musculus.NCBIM37.67.pep.all_vs_UMD3.1.gff3.gz
Ensembl_Sus_scrofa.Sscrofa10.2.67.pep.all_vs_UMD3.1.gff3.gz
Proteins from RefSeq:
RefSeq_Canis_lupus_familiaris_protein_vs_UMD3.1.gff3.gz
RefSeq_Equus_caballus_protein_vs_UMD3.1.gff3.gz
RefSeq_Homo_sapiens.protein_vs_UMD3.1.gff3.gz
RefSeq_Mus_musculus.protein_vs_UMD3.1.gff3.gz