Monday, April 2, 2012

The data hub


The tabular format described here is deprecated; see the newer post about the JSON format.


Updated 24/9/2013: started using the JSON hub format
Updated 6/1/2013: the file format indicator "sam" was replaced by "bam"
Updated 12/27/2012
Updated 10/2/2012
Posted on 4/2012


We are very happy to announce a new function, the "data hub".

The "data hub" concept is about organizing custom tracks tidily and uploading them in a batch. A user can prepare a data hub to host the bedGraph or BAM tracks generated from her experiments, write up some metadata terms, and bring everything up on the browser with a single mouse click.


The benefits

  1. Batch uploading (no need to upload tracks manually, one at a time)
  2. Custom track information is made persistent (encoded in a hub descriptor file)
  3. Track files don't have to be on the same web server
  4. All types of tracks can be annotated with custom metadata
  5. With a datahub you can configure:
    1. Default track display mode (you can have some tracks turned on by default while keeping the rest hidden)
    2. Default rendering style (this feature is still being worked on, but you can control some key styles such as color and height)



Submitting a sample datahub


On the toolbox panel, click the CustomTK button to display the custom track submission panel:


Click the "DataHub" button to show its contents:




Enter the URL of a sample hub (or click the text as indicated; the URL of the hg19 sample hub is http://vizhub.wustl.edu/hubSample/hg19/hub2.txt). Click the "Load" button, and the tracks of the sample hub will be displayed:



As shown in the screenshot above, the hg19 sample hub contains many tracks, including 4 heatmap tracks (in blue), one bed track named "mattress", one long-range interaction track named "fractal globule", and one BAM track named "tempest". It also contains two metadata terms, shown as two columns in the metadata color map. The heatmap tracks are annotated with these terms.

Go back to the "Custom track" control panel:


This time, click the "Manage" button. The number in parentheses is the number of custom tracks registered in the management table; it increases as more tracks are added. The table looks like this:






Creating your own datahub

All you need to do is create a hub descriptor file and place it on your web server.

The file is a plain text file. It is line-oriented and tab-delimited: each line defines a track or a metadata term, and each type of track may require a different number of fields to describe itself.
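To make this layout concrete, here is a minimal Python sketch that assembles a descriptor from tuples, one tuple per line. The helper names `hub_line` and `build_hub`, the "Tissue" metadata term and the example.org URLs are hypothetical; the field order follows the per-type field lists given later in this post.

```python
# Hypothetical helpers for writing a tab-delimited hub descriptor.
# Field order follows this post; the URLs and the "Tissue" term are made up.


def hub_line(fields):
    """Join one track or metadata definition into a tab-delimited line."""
    for field in fields:
        # A tab or newline inside a field would break the line/field structure.
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a tab or newline: {field!r}")
    return "\t".join(fields)


def build_hub(entries, header=None):
    """Return the descriptor text; an optional header is written as a '#' comment."""
    lines = ["# " + header] if header else []
    lines.extend(hub_line(entry) for entry in entries)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    entries = [
        ("metadata", "Tissue", "liver,brain"),
        ("bigWig", "http://example.org/hub/liver.bw", "liver signal", "show", "Tissue:liver"),
        ("bed", "http://example.org/hub/brain_peaks.bed", "brain peaks", "hide", "Tissue:brain"),
    ]
    print(build_hub(entries, header="hypothetical example hub"), end="")
```

The script only produces the text; saving it as a file on a web server and entering its URL in the DataHub tab is the workflow described in this post.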

As you've already seen in the custom track panel, five types of custom tracks can be included in your data hub:
  1. Quantitative data (bedGraph format, slow and non-restrictive)
  2. Quantitative data (bigWig format, fast and restrictive)
  3. Genomic features or annotation (BED)
  4. Long-range genome interaction (BED-derived format)
  5. Read alignment (BAM format)
Lines starting with "#" are comments. Blank lines are allowed.

Non-comment lines are split into fields by tabs. Each file type has its own requirements on the fields, and the order of the fields must be followed.
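A minimal sketch of these reading rules in Python, assuming the descriptor has already been fetched as a string. `parse_hub` is a hypothetical helper, not part of the browser: it only skips comments and blank lines and splits the rest on tabs, keeping line numbers so that later checks can report where a problem is.

```python
def parse_hub(text):
    """Return (line_number, fields) for every non-comment, non-blank line.

    Lines starting with '#' and blank lines are skipped; fields stay in their
    original order because the meaning of each field depends on its position.
    """
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        records.append((lineno, line.split("\t")))
    return records


if __name__ == "__main__":
    sample = "# my hub\n\nbed\thttp://example.org/a.bed\tpeaks\tfull\tTissue:liver\n"
    for lineno, fields in parse_hub(sample):
        print(lineno, fields[0], len(fields))  # prints: 3 bed 5
```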

Note that the track style feature has only limited options and is still under development. To use it, follow this simple example.
  1. Quantitative data (bigWig)
    1. 1st field must be the word bigWig
    2. 2nd field is the URL of the bigWig file
    3. 3rd field is the track name; it must not contain tabs, quotes, or other unusual characters
    4. 4th field is the mode string, which must be either show or hide
    5. 5th field is the metadata annotation, written as "name:attribute" pairs: a colon separates the name from the attribute within a pair, and multiple pairs are separated by commas
    6. 6th field is an optional custom style
  2. Quantitative data (bedGraph)
    1. 1st field must be the word bedGraph
    2. 2nd field is the URL of the bedGraph file
    3. 3rd field is the track name
    4. 4th field is the mode string, which must be either show or hide
    5. 5th field is the metadata annotation, in the same format as above
    6. 6th field is an optional custom style
  3. Genomic annotation
    1. 1st field must be the word bed
    2. 2nd field is the URL of the bed file
    3. 3rd field is the track name
    4. 4th field is the display mode, which must be one of hide, density, thin, or full
    5. 5th field is the metadata annotation, in the same format as above
    6. 6th field is an optional custom style
  4. Long-range genome interaction
    1. 1st field must be the word longrange
    2. 2nd field is the URL of the track file
    3. 3rd field is the track name
    4. 4th field is the display mode, which must be one of hide, arc, trihm, thin, or full
    5. 5th field is the metadata annotation, in the same format as above
    6. 6th field is an optional custom style
  5. Read alignment
    1. 1st field must be the word BAM
    2. Fields 2 to 6 have the same requirements as for a bed track
  6. Metadata term
    1. 1st field must be the word metadata
    2. 2nd field is the term name; terms identical to native metadata terms (such as "Sample") can be used
    3. 3rd field lists the attributes of the term, with multiple attributes separated by commas
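The field rules above can be checked before a hub is published. The following Python sketch validates one tab-split line. It is an illustration under stated assumptions, not the browser's own validator: the names `MODES`, `parse_annotation` and `validate` are hypothetical, type keywords are matched case-insensitively (the post spells the alignment type both "BAM" and "bam"), fields 1 to 5 of a track line are treated as required, an empty annotation is accepted, and a metadata line is assumed to have exactly three fields.

```python
# Allowed display modes per track type, transcribed from the field lists above.
# Assumption: type keywords are compared case-insensitively.
MODES = {
    "bigwig": {"show", "hide"},
    "bedgraph": {"show", "hide"},
    "bed": {"hide", "density", "thin", "full"},
    "longrange": {"hide", "arc", "trihm", "thin", "full"},
    "bam": {"hide", "density", "thin", "full"},  # same rules as bed
}


def parse_annotation(text):
    """Parse comma-separated 'name:attribute' pairs into a list of tuples.

    Assumption: an empty annotation is allowed and yields an empty list.
    """
    pairs = []
    for item in filter(None, (p.strip() for p in text.split(","))):
        name, sep, attr = item.partition(":")
        if not sep or not name or not attr:
            raise ValueError(f"bad metadata pair: {item!r}")
        pairs.append((name, attr))
    return pairs


def validate(fields):
    """Check one tab-split line; return a dict describing it or raise ValueError."""
    kind = fields[0].lower()
    if kind == "metadata":
        if len(fields) != 3:  # assumption: exactly term name plus attributes
            raise ValueError("metadata line needs 3 fields")
        attrs = [a.strip() for a in fields[2].split(",") if a.strip()]
        return {"type": "metadata", "term": fields[1], "attributes": attrs}
    if kind not in MODES:
        raise ValueError(f"unknown track type: {fields[0]!r}")
    if len(fields) not in (5, 6):
        raise ValueError(f"{fields[0]} line needs 5 or 6 fields, got {len(fields)}")
    url, name, mode, annotation = fields[1:5]
    if any(q in name for q in "\"'"):
        raise ValueError(f"track name contains quotes: {name!r}")
    if mode not in MODES[kind]:
        raise ValueError(f"mode {mode!r} not allowed for {fields[0]}")
    return {
        "type": kind,
        "url": url,
        "name": name,
        "mode": mode,
        "metadata": parse_annotation(annotation),
        "style": fields[5] if len(fields) == 6 else None,
    }


if __name__ == "__main__":
    line = ["bed", "http://example.org/a.bed", "peaks", "full", "Tissue:liver,Assay:ChIP"]
    print(validate(line)["metadata"])  # prints: [('Tissue', 'liver'), ('Assay', 'ChIP')]
```

Keeping the mode table as data rather than as branches makes it easy to follow the field lists if new track types or modes are added.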