How to make a bigBed file – Part 1

In this blog post, I’ll share the experience a user could be having where they have an existing text-based custom track that could be made into a more shareable bigBed version.

Let’s say the original track is in the bedDetail format that allows for BED12+ columns using tabs to define additional columns. This original track can  be made into a bigBed track to be put in a Track Hub or to be hosted alone and shared across multiple sessions, where the bigBed could act as a universal custom track.  If it were updated at the bigBed hosted location, all the related sessions that referenced the new bigBed remotely-hosted location of the data would have their representations of the data updated as well.

Let’s begin with the idea that Jerry’s Lab would like to host a primers track and share it between sessions for their lab group. The lab has already created a primers custom track in text files that can be updated and uploaded successfully.

The below steps will take Jerry’s lab from this uploading approach, to putting the data in a  shared online location and using a binary-indexed format of the custom track called bigBed. The bigBed is hosted at an online location defined by a bigDataUrl which allows Jerry’s entire lab to see the updated data as new primers are added.  This way each lab member in Jerry’s lab can use their early sessions, but get new data in their views, provided the bigBed is updated with the new information at the URL shared between all the sessions.

For this example, imagine Jerry’s lab is already using a tab-separated bedDetail custom track text file that might look like this:

track name=Primers type=bedDetail description=Primers visibility=2 color=221,55,118
browser position chr5:1405000-1448000
chr5    1413367    1413387    hDAT32061R    0    .    1413367    1413387    221,55,118    1    20    0    catggagtgggccctttcag
chr5    1414322    1414343    hDAT31086F    0    .    1414322    1414343    221,55,118    1    21    0    cctcaagcccaaatgcagctg
...

This track with type=bedDetail can upload  a text file to display BED12 items (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) with an additional 13th column with sequence (making it a bedDetail format: http://genome.ucsc.edu/FAQ/FAQformat.html#format1.7). With bedDetail a user has either the first 4 or 12 columns of data in BED format, and can extend the format with additional fields, such as sequence data here, to enhance the track details pages.

By going to the My Data and Custom Tracks page, the above text can be pasted and will work (provided there are tabs between the columns, some cut and paste interfaces will remove tabs).

When this custom track is added to a session as a text file, it is uploaded one time and does not update further unless there is a new upload. If Jerry’s lab wanted to update the Primers tracks in their sessions, a future upload of the text-based track would be required in each individual session. Once created, the original sessions that have uploaded text data are static. To solve this issue for Jerry’s lab, an option is to make a URL-hosted location of the data and  turn the data into a binary-indexed bigBed format.  In this way the new URL-hosted bigBed could act as a universal custom track across many sessions.

Here are the steps to do that.

1. The first would be to edit the file and remove the top track and browser lines, they will be used again at a later step after the bigBed is created.

chr5    1413367    1413387    hDAT32061R    0    .    1413367    1413387    221,55,118    1    20    0    catggagtgggccctttcag
chr5    1414322    1414343    hDAT31086F    0    .    1414322    1414343    221,55,118    1    21    0    cctcaagcccaaatgcagctg
...

This link is an example of that file for those that want to follow along with the next steps.

curl -O https://data.cyverse.org/dav-anon/iplant/home/brianlee/Lab_Primers.txt

2. Next, in a command-line environment, you can use the UNIX sort command to sort the data in your file and call the file Lab_Primers.txt

sort -k1,1 -k2,2n Lab_Primers.txt > Lab_Primers_sorted.txt

The command creates a new file Lab_Primers_sorted.txt where all the entries are ordered correctly.

3. Next we will acquire the bedToBigBed utility assuming you are using a MacBook

curl -O http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/bedToBigBed.

4. Then we will make the bedToBigBed utility executable:

chmod 700 bedToBigBed

5. With this utility we will need a definitions file to explain what each column means. We will get an example that will work with these 13 columns, but we could edit this file or make our own.

curl -O https://genome-source.gi.ucsc.edu/gitlist/kent.git/raw/master/src/hg/lib/bed12Source.as

6. With the bedToBigBed utility and the bed12Source.as file, we can now use the tool to build from the Lab_Primers_sorted.txt file a new Lab_Primers.bigBed file for the hg19 genome, using a URL to find the chromosome sizes for the hg19 assembly.

./bedToBigBed -type=bed12+ -as=bed12Source.as -tab Lab_Primers_sorted.txt http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes Lab_Primers.bigBed

With the following three optional steps, we can get another tool called bigBedToBed to check the extraction of data from the file:

curl -O http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/bigBedToBed
chmod 700 bigBedToBed
./bigBedToBed -chrom=chr5 -start=1419444 -end=1445682 Lab_Primers.bigBed stdout

7. Now we need to host this data somewhere online so that it can be found by the Browser. One option is CyVerse, you can read more about them at this location: http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html#Hosting

8. Once you have an online location to the bigBed (for example: https://data.cyverse.org/dav-anon/iplant/home/brianlee/Lab_Primers.bigBed) you can add it to your sessions. Go to the custom track page and put in a track like the following, where you can use your track and browser lines again, but change type=bedDetail to type=bigBed and use a bigDataUrl:

browser position chr5:1405000-1448000
track name=Primers type=bigBed description=Primers visibility=2 color=221,55,118 bigDataUrl=https://data.cyverse.org/dav-anon/iplant/home/brianlee/Lab_Primers.bigBed

9. Save a session with this bigBed as a custom track. Example: https://www.genome.ucsc.edu/s/brianlee/Primers

10. Now anytime  the file has updates, the session that references this bigDataUrl location of the bigBed data should also update. If  CyVerse is used to host the bigBed data file online, this may require deleting and replacing your file to force a browser to reload (Control-Shift-R) the file to trigger caching to expire. Contact CyVerse directly for help.

Finding your own institution to host  your data is often the best solution as you can then work with your system administrators to have the best experience.

Once you have the bigBed, it is not much more work to take it to the next step and put it inside a Track Hub. Once in a Track Hub, many additional features are possible, such as using a searchIndex feature that allows finding unique named items within your custom track on the search bar or ultimately creating a Public Hub to share your data with the wider community.


This entry written by Brian Lee. If after reading this blog post you have any public questions, please email genome@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genome-www@soe.ucsc.edu.