Thursday, September 1, 2011

Gene Plot

The "Gene Plot" function was developed as a downstream visualization step that follows the Gene Set View.

Let's start by submitting a gene set. As an example, I will use the genes of the KEGG glycolysis pathway, submitted directly by pathway ID:

Voilà, the pathway is on display:

Once the gene set is displayed, the control panel updates to show the operations available for this gene set. Operations that modify gene sets are folded into the top blue bar by default, and the "Gene plot" panel is shown:

In this panel, the four steps for making a gene plot are marked out.

Step 1 is to choose the track whose data will be plotted. You can either select one from the drop-down menu (which lists all currently displayed tracks), or go to the genome heatmap and select the track from its context menu. As an example, I selected a track that is red in color via the context menu:

Step 2 is to choose a graph type. Three plotting styles are currently available. I will show them one by one below:

The screenshot above shows the first graph type, an average profile over all genes (or coordinates). If the item is a gene, its data are arranged from 5' to 3', no matter which strand the gene is on. This particular plot uses an MRE-Seq experiment on CD4 memory cells, and it shows that the 5' ends of the genes in the glycolysis pathway have a high unmethylation signal (on CpGs) in this sample.
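The 5'-to-3' arrangement described above can be sketched in a few lines of Python. This is only an illustration, not the browser's actual code: the function names and the toy numbers are made up, and I assume every gene has already been summarized into the same number of points in genomic (left-to-right) order.

```python
from statistics import mean

def orient_5_to_3(values, strand):
    """Return the values ordered 5'->3'.

    Assumption: `values` is in genomic (left-to-right) order, so a
    minus-strand gene has to be reversed to read 5'->3'.
    """
    return list(reversed(values)) if strand == "-" else list(values)

def average_profile(genes):
    """Average point by point over genes.

    `genes` is a list of (values, strand) pairs; every `values` list
    must have the same length.
    """
    oriented = [orient_5_to_3(values, strand) for values, strand in genes]
    return [mean(column) for column in zip(*oriented)]

# Two toy genes, both with high signal at their 5' end.
genes = [
    ([4.0, 2.0, 0.0], "+"),  # 5' end is on the left
    ([0.0, 2.0, 6.0], "-"),  # 5' end is on the right
]
profile = average_profile(genes)
print(profile)
```

Without the strand flip, the two toy genes would cancel each other out and the 5' peak would disappear from the average.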

The plot above shows the second style with the same data. Instead of averaging the signals, each gene now gets its own curve. It displays the same pattern we saw in the first plot. Maybe I should call it a "Spaghetti Plot"?

The third plotting style, which I believe is the most interesting, splits the data into five gene parts (3 kb promoter, 5' UTR, exons, introns, 3' UTR) and plots an average profile for each part separately. It therefore applies only to genes. This plot shows that the 5' UTRs have a really high unmethylation signal, and that the promoter signal rises sharply towards the TSS. Please note that the previous two graph types do not include information on promoters.
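To make the gene parts a little more concrete, here is a rough Python sketch of how two of them (promoter and introns) could be derived from a gene's coordinates. The definitions are my assumptions for illustration, not necessarily what the browser does: coordinates are 0-based and half-open, the promoter is taken as the 3 kb immediately upstream of the TSS, and introns are the gaps between consecutive exons. The function names are hypothetical.

```python
def promoter(tx_start, tx_end, strand, size=3000):
    """Return the (start, end) interval `size` bp upstream of the TSS.

    Assumption: the TSS is tx_start on the plus strand and tx_end on
    the minus strand, so "upstream" points in opposite directions.
    """
    if strand == "+":
        return (max(0, tx_start - size), tx_start)
    return (tx_end, tx_end + size)

def introns(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    ordered = sorted(exons)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end
    ]

# A toy gene with three exons.
exons = [(10000, 10200), (10500, 10700), (11000, 11300)]
print(promoter(10000, 11300, "+"))
print(promoter(10000, 11300, "-"))
print(introns(exons))
```

The UTRs would additionally need the coding start and end positions, which I left out to keep the sketch short.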

Below are some more details about the Gene Plot function.

There are graph-type-specific options in the graph type selection panel. The most important one is "Number of summary points", which is available for all graph types. The track data of each gene are summarized into the same number of points, and those points are used for plotting, so this number controls the granularity (resolution) of the plot. If you want to see more detail, set this option to a larger number. But beware that a large number is likely to yield a "URL too long" error in the spaghetti plot with Google Image Charts.
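Here is a minimal Python sketch of what summarizing into a fixed number of points can look like. I am assuming simple averaging over consecutive bins; the browser may well summarize differently, and `summarize` is a made-up name.

```python
def summarize(values, n_points):
    """Average `values` into `n_points` consecutive, roughly equal bins.

    Assumption: plain bin averaging. If there are fewer values than
    points, neighbouring points reuse the same value.
    """
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    if not values:
        raise ValueError("values must not be empty")
    length = len(values)
    summary = []
    for i in range(n_points):
        start = i * length // n_points
        end = (i + 1) * length // n_points
        if end <= start:  # bin would be empty, so take one value
            end = start + 1
        chunk = values[start:end]
        summary.append(sum(chunk) / len(chunk))
    return summary

print(summarize([1, 2, 3, 4, 5, 6], 3))
print(summarize([2.0, 4.0], 4))
```

This also hints at why the spaghetti plot runs into trouble: an average profile sends only one curve of summary points, while the spaghetti plot sends one curve per gene, so the amount of data packed into the chart URL grows with the number of genes times the number of summary points.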

You can choose from three rendering methods, as listed in Step 3. The examples above used Google Chart Tools, which generates very nice interactive graphs. When you mouse over a data point, a tooltip pops up displaying some more information.

The other methods (Google Image Charts, and server-side rendering with R) generate a plain PNG image. They are fast, and they make it easier to save the image (by right-clicking on it, which Google Chart Tools doesn't allow). I recommend using Google's rendering services; server-side rendering by R should serve as a fallback when Google's service is unavailable.
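The fallback idea can be sketched as "try the renderers in order and use the first one that works". The Python below is a toy illustration only: the two renderer functions are hypothetical stand-ins (one always fails, one returns a string), not real calls to Google or R.

```python
def render_with_fallback(renderers, data):
    """Try each (name, function) pair in order.

    Returns (name, result) for the first renderer that does not raise,
    and raises RuntimeError if every renderer fails.
    """
    errors = []
    for name, render in renderers:
        try:
            return name, render(data)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all renderers failed: " + "; ".join(errors))

# Hypothetical stand-ins for the real rendering back ends.
def google_renderer(data):
    raise ConnectionError("service unavailable")

def r_renderer(data):
    return f"PNG with {len(data)} points"

used, result = render_with_fallback(
    [("google", google_renderer), ("r", r_renderer)],
    [1, 2, 3],
)
print(used, result)
```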

Below is the same spaghetti plot produced by the other two rendering methods:

Via Google Image Charts:

By R:

Future work:

1. Add one more graph type, the kind you've seen so many times in ChIP-Seq papers (the following graph is from PMID: 21765417). You're welcome to suggest other graph types.

2. Apply Gene Plot to custom bigBed tracks.