Saturday, September 15, 2012

Prepare custom long-range interaction track

A sample script for converting certain UCSC ChIA-PET track files into the WashU Browser track format is now available at http://epigenomegateway.wustl.edu/browser/script/ under the name "makeTrack_from_ucscChiapet.py".

To use this script, first download a ChIA-PET track file from the UCSC/ENCODE public file directory: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/

Here we use "wgEncodeGisChiaPetHct116Pol2InteractionsRep1.bed.gz" as input for the script. Only files ending in ".bed.gz", or files in a similar format, can be processed by this script.

Make sure you have the bedSort, bgzip and tabix programs installed on your computer. On a Linux computer, run these commands:

gunzip wgEncodeGisChiaPetHct116Pol2InteractionsRep1.bed.gz

python makeTrack_from_ucscChiapet.py wgEncodeGisChiaPetHct116Pol2InteractionsRep1.bed abcd

After these two steps, two files will be generated: "abcd.gz" and "abcd.gz.tbi". Follow step 4 below to display this track via the custom track mechanism.

You will likely need to make small modifications to the script before it can process data in a different format.




0. Make sure you have the tabix program installed.
You can download the latest source and compile:
http://sourceforge.net/projects/samtools/files/tabix/

Or, if you are using the Ubuntu operating system, install it with apt-get:
$ apt-get install tabix
You should have both tabix and bgzip programs available on your computer.
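
As a quick way to confirm step 0, the following minimal Python sketch reports which of the two programs cannot be found on your PATH. The function name missing_tools is only illustrative and is not part of the browser's script.

```python
import shutil


def missing_tools(tools=("bgzip", "tabix")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("bgzip and tabix are both available.")
```

If you also plan to run the conversion script described above, pass ("bedSort", "bgzip", "tabix") instead of the default.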

1. Make a tab-delimited text file for your long-range interaction data with the following columns:
  1. chromosome name
  2. start coordinate
  3. stop coordinate
  4. information about the interacting region (e.g. chrX:123-456,3.14, where "chrX:123-456" is the coordinate of the mate, and "3.14" is the score of the interaction)
  5. ID (unique non-negative integer)
  6. relative direction of the interacting region
Be sure to make TWO records for a pair of interacting loci, one record for each locus.


As an example, if interval "chr1:111-222" interacts with interval "chr2:333-444" with a score of 55, the following two lines represent this interaction (here "\t" stands for a tab character):

chr1   \t   111   \t   222   \t   chr2:333-444,55   \t   1   \t   .
chr2   \t   333   \t   444   \t   chr1:111-222,55   \t   2   \t   .
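
The two-records-per-pair rule can be sketched in Python as follows. This is only an illustration under some assumptions: the function name interaction_records and the input layout (a list of locus pairs with a score) are hypothetical, IDs are simply assigned in increasing order starting from 1, and the direction column is always written as "." as in the example above. The records are sorted by chromosome and start coordinate because tabix expects position-sorted input.

```python
def interaction_records(pairs):
    """Build track lines from interaction pairs.

    `pairs` is an iterable of (locus_a, locus_b, score), where each locus
    is a (chromosome, start, stop) tuple. Two tab-delimited lines are
    produced per pair, one for each locus.
    """
    records = []
    next_id = 1  # column 5: unique non-negative integer
    for locus_a, locus_b, score in pairs:
        for locus, mate in ((locus_a, locus_b), (locus_b, locus_a)):
            # column 4: coordinate of the mate, then the score
            mate_info = "%s:%d-%d,%s" % (mate[0], mate[1], mate[2], score)
            records.append((locus[0], locus[1], locus[2], mate_info, next_id, "."))
            next_id += 1
    # Keep each chromosome contiguous and ordered by start, then stop.
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    return ["\t".join(str(field) for field in r) for r in records]


if __name__ == "__main__":
    example = [(("chr1", 111, 222), ("chr2", 333, 444), 55)]
    with open("interaction.txt", "w") as handle:
        for line in interaction_records(example):
            handle.write(line + "\n")
```

Running the sketch writes the two example lines to "interaction.txt", ready for steps 2 and 3.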


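2. Compress the text file (it must first be sorted by chromosome and start coordinate, since tabix can only index position-sorted input):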
$ bgzip interaction.txt

The original file is replaced by a new file, "interaction.txt.gz".

3. Build a tabix index for the compressed file:
$ tabix -p bed interaction.txt.gz

The file "interaction.txt.gz" is left unchanged, and an index file "interaction.txt.gz.tbi" is generated alongside it.
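
If you prefer to drive the compression and indexing from a script, the two commands can be wrapped in Python roughly like this. It is a sketch, not part of the browser's tooling: the function names are hypothetical, and it assumes the text file is already position-sorted and that bgzip and tabix are on your PATH.

```python
import subprocess


def track_commands(text_path):
    """Return the bgzip and tabix command lines for `text_path`."""
    gz_path = text_path + ".gz"
    return [
        ["bgzip", text_path],             # replaces text_path with gz_path
        ["tabix", "-p", "bed", gz_path],  # writes gz_path + ".tbi"
    ]


def build_track(text_path):
    """Run both commands in order, stopping at the first failure."""
    for command in track_commands(text_path):
        subprocess.run(command, check=True)
    return text_path + ".gz", text_path + ".gz.tbi"
```

Calling build_track("interaction.txt") would run the same two commands shown above and return the names of the two resulting files.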

4. Display this file as a custom long-range interaction track on the WashU Genome Browser.
Place both the ".gz" and ".gz.tbi" files in the SAME directory on your web server.
Use only the URL of the .gz file to make the custom track.
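
Before submitting the URL, it can help to confirm that both files are reachable from your web server. The sketch below assumes the index is found by appending ".tbi" to the .gz URL, which follows from the same-directory rule above. The function names and the example URL are hypothetical.

```python
import urllib.error
import urllib.request


def track_urls(gz_url):
    """Return (data URL, index URL) for a track's .gz URL."""
    if not gz_url.endswith(".gz"):
        raise ValueError("expected a URL ending in .gz: " + gz_url)
    return gz_url, gz_url + ".tbi"


def is_reachable(url, timeout=10):
    """Return True if a HEAD request to `url` succeeds with status 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False


if __name__ == "__main__":
    # Hypothetical URL; replace it with the location on your own server.
    for url in track_urls("http://example.org/tracks/interaction.txt.gz"):
        print(url, "reachable" if is_reachable(url) else "NOT reachable")
```

Only the first of the two URLs is entered in the browser. The second is checked here simply because a missing index file is an easy mistake to make.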
