WashU EpiGenome Browser: The data hub

The tabular format described in this post is deprecated; see the documentation of the new JSON hub format instead.

Updated 9/24/2013: switched to the JSON hub format
Updated 6/1/2013: the file format indicator "sam" has been replaced by "bam"
Updated 12/27/2012
Updated 10/2/2012
Posted on 4/2012

We are very happy to announce a new function, the "data hub".

The "data hub" concept is about organizing custom tracks in a tidy way and uploading them in batches. A user can prepare a data hub to host the bedGraph or BAM tracks generated from her experiments, write up some metadata terms, and bring everything up in the browser with a single mouse click.

The benefits

Batch uploading (no more uploading tracks manually one at a time)
Custom track information is made persistent (encoded in a hub descriptor file)
Track files don't have to be on the same web server
All types of tracks can be annotated by custom metadata
With a datahub you can configure:

Default track display mode (you can have some tracks turned on by default while keeping the rest hidden)
Default rendering style (this feature is still being worked on, but you can already control some key styles, e.g. color and height)

Submitting a sample datahub

On the toolbox panel, click the CustomTK button to display the custom track submission panel.

Click the "DataHub" button to show its contents.

Enter the URL of a sample hub (or click the text as indicated; the URL of the hg19 sample hub is http://vizhub.wustl.edu/hubSample/hg19/hub2.txt). Click the "Load" button, and the tracks of the sample hub will be displayed.

As shown in the screenshot above, the hg19 sample hub contains many tracks, including four heatmap tracks (in blue), one bed track named "mattress", one long-range interaction track named "fractal globule", and one BAM track named "tempest". It also contains two metadata terms, shown as two columns in the metadata color map, and the heatmap tracks are annotated with these terms.

Go back to the "Custom track" control panel.

This time, click the "Manage" button. The number in parentheses is the number of custom tracks registered in the management table; it increments as more tracks are added. The table looks like this:

Creating your own datahub

You need to create a hub descriptor file, place it on your web server, and that's all.

The file is a simple text file. It is line-oriented and tab-delimited, and each line defines either a track or a metadata term. Different track types may require different numbers of fields.

As you've already seen in the custom track panel, five types of custom tracks can be included in your data hub:

Quantitative data (bedGraph format, slow and non-restrictive)
Quantitative data (bigWig format, fast and restrictive)
Genomic features or annotation (BED)
Long-range genome interaction (BED-derived format)
Read alignment (BAM format)

Lines starting with "#" are comments. Blank lines are allowed.

Non-comment lines are split into fields by tabs. Each track type has its own requirements on the fields, and the order of the fields must be obeyed.
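The line rules above (comments start with "#", blank lines are allowed, other lines are tab-separated fields in a fixed order) are simple enough to check before uploading. Below is a minimal Python sketch, not the browser's own parser; `parse_hub` and the example URLs are hypothetical, and the sketch assumes a comment line has "#" as its very first character.

```python
def parse_hub(text):
    """Split a tabular hub descriptor into lists of fields.

    Blank lines and lines starting with '#' are skipped; every other
    line is split on tab characters, preserving the field order.
    """
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        records.append(line.split("\t"))
    return records


# Hypothetical hub content; the URL is a placeholder.
hub_text = (
    "# example hub\n"
    "\n"
    "bigWig\thttp://example.org/a.bw\tsample A\tshow\tSample:brain\n"
    "metadata\tSample\tbrain,liver\n"
)

for fields in parse_hub(hub_text):
    print(fields[0], len(fields))
```

Each record keeps its fields in file order, so the 1st field (the track type keyword) decides how the remaining fields should be read.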

Note that the track style feature offers only limited options and is still under development. To use it, follow this simple example.

Quantitative data (bigWig)

1st field must be bigWig
2nd field is the URL of the bigWig file
3rd field is the track name, which must not contain tabs, quotes, or other unusual characters
4th field is the mode string, which must be either show or hide
5th field is the metadata annotation, written as "name:attribute" pairs: a colon separates the name and the attribute within a pair, and multiple pairs are separated by commas
6th field is optional custom style
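As an illustration of the bigWig field layout, here is a small Python sketch that assembles one line and encodes the metadata annotation as comma-separated "name:attribute" pairs. The helper names, the URL, and the "Assay" term are hypothetical; the sketch assumes term names and attributes contain no commas or colons, and it passes the optional 6th field through untouched because the style syntax is not described here.

```python
def format_annotation(pairs):
    """Encode (name, attribute) pairs as 'name:attribute,name:attribute'.

    Assumes names and attributes contain no ',' or ':' characters.
    """
    return ",".join(f"{name}:{attr}" for name, attr in pairs)


def bigwig_line(url, name, mode, pairs, style=None):
    """Build one tab-delimited bigWig line for a tabular hub (hypothetical helper)."""
    if mode not in ("show", "hide"):
        raise ValueError("mode must be 'show' or 'hide'")
    if any(ch in name for ch in "\t\"'"):
        raise ValueError("track name must not contain tabs or quotes")
    fields = ["bigWig", url, name, mode, format_annotation(pairs)]
    if style is not None:
        fields.append(style)  # optional 6th field, copied as given
    return "\t".join(fields)


line = bigwig_line(
    "http://example.org/brain.bw",  # placeholder URL
    "brain H3K4me3",
    "show",
    [("Sample", "brain"), ("Assay", "H3K4me3")],  # "Assay" is a made-up term
)
print(line.split("\t"))
```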

Quantitative data (bedGraph)

1st field must be bedGraph
2nd field is the URL of the bedGraph file
3rd field is the track name
4th field is the mode string, which must be either show or hide
5th field is the metadata annotation, same as above
6th field is optional custom style
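Going the other way, the 5th field of a bedGraph (or any other) track line can be turned back into pairs. This Python sketch is a hypothetical helper, not part of the browser; it splits on commas, then on the first colon of each pair, and (as an assumption) ignores surrounding whitespace and empty items.

```python
def parse_annotation(field):
    """Turn 'name:attribute,name:attribute' into a list of (name, attribute)."""
    pairs = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty annotation or a trailing comma
        name, sep, attr = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in annotation {item!r}")
        pairs.append((name, attr))
    return pairs


print(parse_annotation("Sample:liver,Assay:RNA-seq"))
# [('Sample', 'liver'), ('Assay', 'RNA-seq')]
```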

Genomic annotation

1st field must be the word bed
2nd field is the URL of the bed file
3rd field is the track name
4th field is the display mode, which must be one of hide, density, thin, or full
5th field is the metadata annotation, same as above
6th field is optional custom style

Long-range genome interaction

1st field must be the word longrange
2nd field is the URL of the track file
3rd field is the track name
4th field is the display mode, which must be one of hide, arc, trihm, thin, or full
5th field is the metadata annotation, same as above
6th field is optional custom style

Read alignment

1st field must be the word BAM
fields 2 to 6 have the same requirements as for a bed track
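Since every track type differs mainly in its keyword and its allowed display modes, the rules above fit in one lookup table. This Python sketch is a hypothetical pre-upload check, not the browser's validator. Assumptions: keywords are matched exactly as spelled in this post (whether the browser is case-sensitive is not stated), and a track line has 5 fields plus the optional 6th.

```python
# Allowed values of the 4th field, keyed by the 1st-field keyword.
DISPLAY_MODES = {
    "bigWig": {"show", "hide"},
    "bedGraph": {"show", "hide"},
    "bed": {"hide", "density", "thin", "full"},
    "longrange": {"hide", "arc", "trihm", "thin", "full"},
}
# BAM lines follow the bed rules for fields 2 to 6.
DISPLAY_MODES["BAM"] = DISPLAY_MODES["bed"]


def check_track(fields):
    """Return a list of problems found in one already-split track line."""
    kind = fields[0]
    if kind not in DISPLAY_MODES:
        return [f"unknown track type {kind!r}"]
    if len(fields) not in (5, 6):  # assumed: 5 required fields + optional style
        return [f"{kind}: expected 5 or 6 fields, got {len(fields)}"]
    mode = fields[3]
    if mode not in DISPLAY_MODES[kind]:
        allowed = ", ".join(sorted(DISPLAY_MODES[kind]))
        return [f"{kind}: mode {mode!r} not one of {allowed}"]
    return []


# Track names from the sample hub; the URLs are placeholders.
print(check_track(["longrange", "http://example.org/interactions.txt", "fractal globule", "arc", ""]))
# []
print(check_track(["bed", "http://example.org/mattress.bed", "mattress", "arc", ""]))
# ["bed: mode 'arc' not one of density, full, hide, thin"]
```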

Metadata term

1st field must be the word metadata
2nd field is the term name; names identical to native metadata terms (such as "Sample") can be used
3rd field lists the attributes of the term, separated by commas
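A metadata line is just three fields, so building one and reading the declared terms back out of a hub are both short. The Python sketch below uses hypothetical helper names and a made-up "Assay" term; it assumes attribute values contain no commas.

```python
def metadata_line(term, attributes):
    """Build a metadata line: keyword, term name, comma-separated attributes."""
    return "\t".join(["metadata", term, ",".join(attributes)])


def collect_terms(records):
    """Map each declared term name to its list of attributes.

    `records` is an iterable of already-split hub lines.
    """
    terms = {}
    for fields in records:
        if fields[0] == "metadata" and len(fields) >= 3:
            terms[fields[1]] = [a for a in fields[2].split(",") if a]
    return terms


lines = [
    metadata_line("Sample", ["brain", "liver"]),
    metadata_line("Assay", ["H3K4me3"]),  # hypothetical term
]
print(collect_terms(l.split("\t") for l in lines))
# {'Sample': ['brain', 'liver'], 'Assay': ['H3K4me3']}
```

Collecting the terms this way makes it easy to compare them with the "name:attribute" pairs used in the 5th field of each track line.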

Finally, when in doubt, consult the sample hg19 hub as a reference.
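Putting the pieces together, a hub descriptor is just these lines written to one text file that you then copy to your web server and load by URL through the DataHub panel. The Python sketch below writes such a file into a temporary directory; the URLs, track names, and line order are illustrative assumptions, not taken from the sample hub.

```python
import tempfile
from pathlib import Path

# Hypothetical hub content: replace the URLs with files on your own server.
ROWS = [
    ("metadata", "Sample", "brain,liver"),
    ("bigWig", "http://example.org/brain.bw", "brain signal", "show", "Sample:brain"),
    ("bedGraph", "http://example.org/liver.bedgraph", "liver signal", "hide", "Sample:liver"),
    ("bed", "http://example.org/peaks.bed", "peaks", "full", "Sample:brain"),
]


def write_hub(path, rows):
    """Write rows as a tab-delimited hub descriptor, one line per row."""
    lines = ["# tabular hub descriptor"]
    lines += ["\t".join(row) for row in rows]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as d:
    hub = Path(d) / "hub.txt"
    write_hub(hub, ROWS)
    text = hub.read_text(encoding="utf-8")

print(text)
```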

Monday, April 2, 2012