The release of the new NCBI RefSeq track marks a major shift in how we include annotations from NCBI’s Reference Sequence Database (RefSeq) in the UCSC Genome Browser. This new track is a composite track that contains the combined set of curated and predicted annotations from the RefSeq database for hg38/GRCh38. It also contains tracks that break up the annotation set into a few subsets. These subsets include only the curated transcripts (NM, NR, or YP transcripts), only the predicted transcripts (XM or XR transcripts), all of the other annotations from RefSeq that don’t fit into the curated or predicted subsets, and the alignments of the curated and predicted transcripts to the genome. All of the coordinates and alignments in these tracks are provided by the RefSeq group.
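To make the subset definitions above concrete, here is a minimal Python sketch that sorts an accession into the curated, predicted, or other group using only the prefixes named in this post. The function name `classify_refseq_accession` and the constant names are hypothetical. The actual assignment of annotations to subtracks is done upstream, so treat this as an illustration of the naming convention rather than the method used to build the tracks.

```python
# Prefixes as listed in this post; the names below are illustrative only.
CURATED_PREFIXES = {"NM", "NR", "YP"}
PREDICTED_PREFIXES = {"XM", "XR"}


def classify_refseq_accession(accession: str) -> str:
    """Return 'curated', 'predicted', or 'other' based on the accession prefix.

    The prefix is the text before the first underscore, e.g. 'NM' in
    'NM_001130970'. Anything not listed above falls into 'other'
    (an assumption standing in for the catch-all subset).
    """
    prefix = accession.strip().split("_", 1)[0].upper()
    if prefix in CURATED_PREFIXES:
        return "curated"
    if prefix in PREDICTED_PREFIXES:
        return "predicted"
    return "other"


if __name__ == "__main__":
    # NM_001130970 is the transcript mentioned in this post; the others are
    # placeholder accessions used only to exercise each branch.
    for acc in ["NM_001130970", "XM_000000001", "NG_000000001"]:
        print(acc, classify_refseq_accession(acc))
```

A version suffix such as `.1` does not affect the result, since only the text before the first underscore is inspected.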
This new NCBI RefSeq composite also includes a “UCSC RefSeq” track that is based on our original method of producing the “RefSeq Genes” track. This “UCSC RefSeq” track is built by aligning RNAs obtained from the RefSeq Database to the genome. In the early days of the UCSC Genome Browser, only RNA sequences were provided by RefSeq, so we used BLAT to align them to the genome. This was a good solution in the past, but over time it has led to two kinds of issues: transcripts matching multiple places in the genome, and our alignments of small exons or other regions differing slightly from those found in the RefSeq database. This type of minor alignment difference can be seen in the following session, where the RefSeq Curated (top) and UCSC RefSeq (bottom) tracks place the small fifth exon of transcript NM_001130970 at different locations because there are multiple matches to this exon sequence in that region.
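The ambiguity with small exons can be illustrated with a toy Python example. This is not how BLAT works internally; it is a plain exact-substring search over invented sequences, meant only to show that a short sequence can occur more than once in a region, leaving an aligner with several equally good placements. The function name `find_all_matches` and both sequences are made up for illustration.

```python
from typing import List


def find_all_matches(region: str, exon: str) -> List[int]:
    """Return the 0-based start offset of every exact occurrence of
    `exon` in `region`, including overlapping occurrences."""
    starts = []
    pos = region.find(exon)
    while pos != -1:
        starts.append(pos)
        # Resume one base past the previous hit so overlaps are found too.
        pos = region.find(exon, pos + 1)
    return starts


if __name__ == "__main__":
    # Invented sequences: a short "exon" that occurs twice in the region.
    region = "TTGACCAGGTAAGTCCAGGACCAGGTTT"
    exon = "ACCAGG"
    print(find_all_matches(region, exon))  # prints [3, 19]
```

With two identical hits, sequence similarity alone cannot decide where the exon belongs, which is the kind of situation in which two alignment methods can place the same exon differently.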
The new set of RefSeq tracks differs from the “UCSC RefSeq” track in a few key ways. First, as mentioned previously, the new tracks are based entirely on positions and alignments provided by RefSeq. This means that if you obtain the hg38 coordinates for a RefSeq transcript from the UCSC Genome Browser, these coordinates should be the same as those from the entry found at NCBI’s RefSeq Database. Second, the new tracks are currently available only for the hg38/GRCh38 assembly. Lastly, these new NCBI RefSeq tracks include predicted transcripts, which were absent from our original RefSeq track.
This has been a long and exciting collaboration between the UCSC Genome Browser staff and NCBI’s RefSeq group. We trust that this full complement of tracks from the Reference Sequence Database will be helpful to you, our Browser users. We hope to bring these tracks to more genome assemblies in the future.
If, after reading this blog post, you have any public questions, please email firstname.lastname@example.org. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to email@example.com.