Sunday, September 9, 2012

Prepare custom track of annotation data (or "bed" track)

0 Make sure you have the tabix program installed.
You can download the latest source and compile:
http://sourceforge.net/projects/samtools/files/tabix/

Or, if you are using Ubuntu, install it with apt-get (root privileges are required):
$ sudo apt-get install tabix
You should have both tabix and bgzip programs available on your computer.
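
A quick way to confirm that both programs are on your PATH is sketched below. The helper name check_tools is made up for this example and is not part of tabix or bgzip; it relies only on the standard shell built-in "command -v".

```shell
#!/bin/sh
# check_tools: hypothetical helper, not part of tabix or bgzip.
# Prints each missing program to stderr and returns non-zero if any is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Report whether the two programs needed by the following steps are available.
if check_tools tabix bgzip; then
    echo "tabix and bgzip found"
else
    echo "install tabix before continuing"
fi
```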

1 Skip this step if your file is already in BED format.

Otherwise, run the bigBedToBed command from the UCSC Genome Browser tool set to convert the bigBed file to a BED text file.

2 Compress the BED file:
$ bgzip input.bed

bgzip replaces the original file with the compressed file "input.bed.gz".

3 Build tabix index of the compressed BED file:
$ tabix -p bed input.bed.gz

The file "input.bed.gz" is left untouched, and an index file "input.bed.gz.tbi" is generated next to it.
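
tabix expects the BED file to be sorted by chromosome and then by start coordinate before it is compressed. If indexing fails with a complaint about sorting, a sketch like the following may help. The file names and the three sample lines are invented for illustration, and the compress and index commands run only if bgzip and tabix are installed.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up, deliberately unsorted BED lines (tab-separated).
printf 'chr2\t500\t600\tb\t2\t+\n'  > input.bed
printf 'chr1\t900\t950\tc\t3\t-\n' >> input.bed
printf 'chr1\t100\t200\ta\t1\t.\n' >> input.bed

# Sort by chromosome name (column 1), then numerically by start (column 2).
LC_ALL=C sort -k1,1 -k2,2n input.bed > sorted.bed
cat sorted.bed

# Compress and index only if both tools are installed.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    bgzip -c sorted.bed > sorted.bed.gz   # -c writes to stdout, so sorted.bed is kept
    tabix -p bed sorted.bed.gz            # writes sorted.bed.gz.tbi
fi
```

Here "bgzip -c" is used instead of plain "bgzip" only so that the uncompressed file stays around for inspection.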

4 Display this file as a custom BED track on the WashU Epigenome Browser.
Put the .gz and .gz.tbi files in the SAME directory on your web server.
Use only the URL of the .gz file to create the custom track.

The BED format used by WashU Epigenome Browser:
  1. chromosome name
  2. start coordinate
  3. stop coordinate
  4. Name (if absent, use dot)
  5. ID (unique non-negative integer)
  6. Strand (+/-/.)

2 comments:

  1. Subject: Viewing ENCODE broadPeak data in epigenome browser.

    Text:
    I would like to view peak-called ENCODE data (BED6+3 format) in the epigenome browser.
    I downloaded a *SigPk.txt.gz from the UCSC FTP site, decompressed it, recompressed it (bgzip). When I run tabix "tabix -p bed .txt.gz" I get the following error:
    [ti_index_core] the chromosome blocks not continuous at line 1564, is the file sorted? [pos 1

    What am I doing wrong? Is there an alternate way to view peak-called data on the browser?

    Also: is it correct that the ENCODE TFBS data in the browser (I use mm9) does not have input data subtracted from it? So I would need to somehow account for the input channel before running the gene plot?

    Thanks in advance,
    Shraddha

    PS: I am posting the same comment on the epigenome browser mailing list. Pardon the duplication
    ----
    Shraddha Pai, Ph.D.
    Post-doctoral fellow
    Krembil Family Epigenetic Research Laboratory (Lab head: Dr. Art Petronis)
    Centre for Addiction and Mental Health, Toronto


    Replies
    1. Many of the files distributed by the UCSC Genome Browser contain a "bin" field in the first column, as is the case with this file. You must remove this "bin" field so that the rest of the file can be used as a valid BED file.

      For example: "cut -f2,3,4,.... ucscfile > yourfile"

      Also note that the 5th field must be a UNIQUE INTEGER when browsing a BED track on the WashU Browser. Good luck!
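
The two fixes described in this reply can be sketched as follows. This assumes the downloaded file has the bin column followed by the nine broadPeak columns (chrom, start, end, name, score, strand, signalValue, pValue, qValue); the sample lines are invented, and "ucscfile" and "yourfile" are the placeholder names from the reply.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Invented lines in the assumed layout: bin, then the nine broadPeak columns.
printf '585\tchr1\t100\t200\t.\t500\t.\t5.1\t3.2\t-1\n'  > ucscfile
printf '586\tchr1\t300\t400\t.\t700\t.\t6.4\t4.0\t-1\n' >> ucscfile

# Drop the leading bin column, keep the next six columns, and overwrite
# column 5 (the score) with the line number so each ID is a unique integer.
cut -f2-7 ucscfile |
    awk 'BEGIN { FS = OFS = "\t" } { $5 = NR; print }' > yourfile
cat yourfile
```

The resulting file still has to be compressed and indexed as in steps 2 and 3 of the post.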
