Friday, September 30, 2011

User manual

An online user manual is now available. It took me quite a while to write, and Junchen and Brett have started proofreading it. In the meantime, I've put it up anyway :)

Each panel in the browser now has a question mark button:


Clicking it will take you to the corresponding chapter in the manual:


On the manual page, the left column is the "Table of contents". Click the column with the "<" sign to toggle its display:


And finally, the "Instruction" section at the bottom of each control panel has been removed from the browser page.

Let us know if you have any comments!

Thursday, September 15, 2011

Gene Plot for custom bigBed track

Gene Plot can now be applied to a custom bigBed track: the features inside the track are used to query data and organize the plot. Let's start by uploading a custom bigBed file.


In the navigation bar, go to Tracks > Custom track > bigBed to submit the sample bigBed file:


Click submit and the track will appear:


This sample track actually contains gene body information, as you can tell by comparing it with the "gene body" track.

There's an orange button in the custom track table entry. Click it to launch the Gene Plot panel. Alternatively, right-click on the bigBed track image and choose the "Gene Plot" option:


Once launched, the Gene Plot panel will pop out from thin air:



Here you see the familiar Steps 1, 2 and 3 for making a gene plot. But pay attention to the little message above Step 1. It warns that this bigBed track contains so many items that only the first 4000 will be processed. Well, the intent of applying Gene Plot to a bigBed track is to use ALL the items it contains, but until we get powerful enough servers I don't want that to crash the browser service... So please bear with this restriction for the moment. And anyway, this is already a big increase compared with Gene Set View (<300 items).
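For the curious, the cap can be pictured with a few lines of Python. This is only a sketch: the function name and the wording of the message are made up for illustration, and only the 4000-item limit comes from this post. The browser's own code may well look different.

```python
from typing import List, Optional, Sequence, Tuple

# The 4000-item cap is the one mentioned in the post; everything else is hypothetical.
MAX_BIGBED_ITEMS = 4000


def limit_items(items: Sequence, limit: int = MAX_BIGBED_ITEMS) -> Tuple[List, Optional[str]]:
    """Keep only the first `limit` items and return a warning if any were dropped."""
    kept = list(items[:limit])
    message = None
    if len(items) > limit:
        message = (
            f"Track contains {len(items)} items; "
            f"only the first {limit} will be processed."
        )
    return kept, message
```

A track with fewer items than the cap passes through untouched and produces no message.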


Choose your favorite graph type and make the plot. The figure will show up in another panel at the top. Once you're done looking at it, click the button at the top-right corner to dismiss the panels.

So this is the way to use it. If you have a long list of target genes, you can use the bigBed generator to make the file, load it into the browser and run Gene Plot on it. I think it's cool, and I hope you'll enjoy it.

However, a pitfall lurks here. The second plot type, the "spaghetti plot", can be dangerous when rendered with the Google Chart service on a large number of items. It can use so much client-side memory that it might hang your little laptop (I've managed that on my desktop with 8 GB of RAM). As a first line of protection, this plot will reject attempts with over 500 items. However, you can use R as the rendering method to bypass this restriction.
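The rule can be sketched in Python as follows. The renderer labels and the function name are hypothetical, and I am assuming here that the cap applies to every renderer except R; only the 500-item threshold and the R bypass are stated in this post.

```python
# Threshold taken from the post; renderer names below are illustrative labels only.
MAX_SPAGHETTI_ITEMS = 500


def can_render_spaghetti(item_count: int, renderer: str) -> bool:
    """Decide whether a spaghetti plot should be attempted.

    Assumption: any renderer other than "r" draws in, or via, the client
    and is therefore subject to the item cap.
    """
    if renderer == "r":
        # Server-side R returns a plain image, so the client-side cap is skipped.
        return True
    return item_count <= MAX_SPAGHETTI_ITEMS
```

For example, `can_render_spaghetti(4000, "r")` is allowed, while the same request with a Google renderer label is rejected.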




All of the above applies to Gene Set View as well. Here is a quick tour:


Click the large orange button to launch Gene Plot on this gene set:


Make a plot:



So that's it. Looking at the stack of platters in the screenshot above, you might have recognized that this is a new style in my browser. I may use this style more often in the future. Let me know what you think.

Tuesday, September 13, 2011

Somewhat unified style on user interface

So I've been modifying the user interface again to unify it into one style: contents share the same panel and are paged by tabs. Here is a comparison of the old and current Gene Set View submission panels:


The panel of available operations shown after submitting a gene set has been styled in the same way. You can see how it used to look in previous posts:



The custom track upload panel has been restyled as well:


Other places that changed include the "Genomic juxtaposition" panel, the "Correlation" panel, and the "Config & Misc" panel. I hope you like this style. I really think it is much more concise than before, and helps you focus on the data without being distracted. Any comments are welcome.

Friday, September 9, 2011

Get data from Gene Plot

Now the data underlying a Gene Plot can be displayed for you. After you run a Gene Plot, two buttons will appear to the right of the "Make gene plot" button:


Click either one and a table holding the data will be created at the bottom of the page. Here is the data in text format:


Clicking the second button will generate a nice-looking interactive table via the Google Chart service:


Of course, the "get data" function is available for the other 3 graph types.

Tuesday, September 6, 2011

Clustering analysis on Gene Set

Clustering analysis has been used a lot in gene expression microarray studies, and lots of techniques and routines are readily available. This methodology for exploring data structure is also useful for interpreting sequencing-based datasets -- just check our Browser!

A lightweight clustering function has been implemented as an additional graph type in Gene Plot, making it another very useful function following up the Gene Set View.

I will use an example to demonstrate this feature in full. First let's run Gene Set View on a gene set.


I'm using the human glycolysis pathway, identified by its ID: "path:hsa00010". But don't rush to submit it. Let's first change the default gene part by clicking the "change" button:


Options will be shown for choosing which part of each gene is displayed. Click the "custom region..." option and select a 5 KB region around the transcription start site, where most of the interesting stuff lies (batteries of regulatory elements, mysterious CpG methylation, crazy nucleosome positioning and... )
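If you want to work out such a region yourself, here is a small Python sketch. It assumes the window is centered on the TSS (2.5 KB on each side), that the TSS is the gene start on the plus strand and the gene end on the minus strand, and that coordinates are plain integers clamped at zero. The function name and the coordinate convention are my own assumptions, not the browser's code.

```python
from typing import Tuple


def tss_window(gene_start: int, gene_end: int, strand: str, width: int = 5000) -> Tuple[int, int]:
    """Return (window_start, window_end) of a region centered on the TSS."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand transcription begins at the higher coordinate.
    tss = gene_start if strand == "+" else gene_end
    half = width // 2
    # Clamp at 0 so genes near the chromosome start do not yield negative positions.
    return max(0, tss - half), tss + half
```

For a plus-strand gene at 10000-20000 this gives 7500-12500; for the same gene on the minus strand it gives 17500-22500.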

Then click "use pathway". The view will update shortly to something like this:


In Step 2 of the "Gene plot" panel, a fourth graph type with a heatmap icon has been added for the clustering function. Click it to reveal its content:


You will notice that the Step 3 option for the graph rendering method is gone. This is because the clustering and the heatmap rendering are only carried out by native code (on our server and in your web browser).

The "Number of data points" option is still there and lets you control the resolution of the data. It is followed by the clustering method, currently with two available choices: hierarchical and K-means. Each has its own options ("distance metric" and "agglomeration" for hierarchical, and "number of clusters" for K-means), which are just the standard parameters for running a clustering analysis.

Let's run the hierarchical analysis first. Click the "Make gene plot" button:


This is what the hierarchical clustering result looks like for those sweet 5 KB regions of the glycolysis pathway genes, using an MRE-Seq experiment done on a CD4 cell sample... On the right is the heatmap, where each row is one 5 KB region (the middle point is the TSS and the left side is upstream, no matter which strand the gene is on). A darker color means a higher MRE signal, indicating a higher likelihood of the CpGs being unmethylated. Mouse over it for the tooltip:


And on the left is the dendrogram, in horizontal fashion. In addition to looking at the lines and branches, you can sort it out by clicking the branching points:


See that I clicked on a juncture, and that entire sub-tree turns red now. The genes composing this tree are also displayed in a list.

That was a brief intro to hierarchical clustering. The result is sensitive to the choice of distance metric and agglomeration method. Just play with it and you'll see.
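To see why those two choices matter, here is a deliberately naive agglomerative clustering sketch in plain Python. The metric and linkage names ("single", "complete", "average") are common textbook options that I picked for illustration; the post does not list which ones the browser offers, and the server certainly does not run this code.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def euclidean(a: Row, b: Row) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Row, b: Row) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


# Agglomeration rules: how pairwise distances between two clusters become one number.
LINKAGE = {"single": min, "complete": max, "average": mean}


def hierarchical(rows: Sequence[Row],
                 metric: Callable[[Row, Row], float] = euclidean,
                 linkage: str = "average") -> List[Tuple[List[int], List[int], float]]:
    """Return the merges (left members, right members, distance) in the order they happen."""
    combine = LINKAGE[linkage]
    clusters: List[List[int]] = [[i] for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairwise = [metric(rows[i], rows[j])
                            for i in clusters[a] for j in clusters[b]]
                d = combine(pairwise)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so delete b first to keep index a valid
        del clusters[a]
        clusters.append(merged)
    return merges
```

Each merge corresponds to one branching point of the dendrogram. Swapping `metric` or `linkage` changes the merge distances, and on real data often the merge order too.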

Next is K-means. Run it with the same data and search for 3 clusters:


So... the result doesn't look *so interesting*. Maybe the data profiles of the glycolysis genes are just too similar to each other, or maybe I haven't tried hard enough? Anyway, the clusters are denoted by buttons on the right side of the heatmap, which you can click to show the genes belonging to each cluster.
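For reference, K-means itself fits in a few lines. The sketch below is the textbook Lloyd iteration in plain Python with a deliberately simple start (the first k rows become the initial centroids), which is my assumption and not necessarily how the browser seeds it. The outcome depends on both k and the starting centroids, which is one reason a single run can look unexciting.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: Sequence[Sequence[float]], k: int, max_iter: int = 50) -> List[int]:
    """Return a cluster label (0..k-1) for each row."""
    if not 0 < k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")
    # Naive, deterministic initialisation: the first k rows are the centroids.
    centroids = [list(row) for row in rows[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [rows[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Rows that share a label would sit behind the same cluster button next to the heatmap.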

That's it. It would be great to hear your opinions, so just leave a comment below. Enjoy the day!

Monday, September 5, 2011

A glimpse of what's coming

A *new* Gene Plot graph type is in shape now:


Will be available shortly. Enjoy Labor Day!

Thursday, September 1, 2011

Add / remove genes for Gene Set View

Now in the Gene Set View, you can edit an existing gene set by adding or removing genes.

After you submit a gene set, the available operations are displayed in the Gene Set View panel. Unfold the "modify gene set" panel by clicking the blue banner:



In the "Add / remove genes" section, you can add new genes (text area on left), and review/remove existing ones (list on right).



Adding new genes is simple: just enter a list of genes or coordinates and press the "Add" button. They will be displayed immediately, and will show up at the bottom of the list on the right.
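As a rough picture of how such mixed input can be told apart, here is a Python sketch that classifies one entry as either a coordinate or a gene name. The `chrN:start-end` pattern matches the interval format used in this post; the function name, the tuple layout and the validation rule are assumptions for illustration only.

```python
import re
from typing import Tuple, Union

# Matches intervals written like chr2:172948000-172969000.
COORD_RE = re.compile(r"^(chr\w+):(\d+)-(\d+)$")

Entry = Union[Tuple[str, str, int, int], Tuple[str, str]]


def parse_entry(text: str) -> Entry:
    """Classify one input entry as ('coord', chrom, start, end) or ('gene', name)."""
    text = text.strip()
    match = COORD_RE.match(text)
    if match:
        chrom = match.group(1)
        start, end = int(match.group(2)), int(match.group(3))
        if start >= end:
            raise ValueError(f"interval start must be below end: {text}")
        return ("coord", chrom, start, end)
    # Anything that is not an interval is treated as a gene name.
    return ("gene", text)
```

With this, `parse_entry("chr2:172948000-172969000")` yields a coordinate tuple and `parse_entry("mef2c")` yields a gene entry.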

In the example below, I'm adding a genomic interval (chr2:172948000-172969000, the DLX gene cluster) and the mef2c gene to an existing list of cytochrome P450 genes:



After submission, the view looks like this:



Removing genes is also simple. In the right-side list, each gene occupies one row and is preceded by a button with a cross mark. Click the button to mark the gene for removal. The row's background will also turn gray. Click the button again to unmark it. In the example below, I marked the three genes preceding the DLX gene cluster interval:


Click update to actually remove them, and the view will update accordingly.

Gene Plot

The "Gene Plot" function was developed as a downstream visualization function following Gene Set View.

Let's start by submitting a gene set. As an example, I will use KEGG glycolysis pathway genes by directly submitting the pathway ID:



Voilà, the pathway is on display:



Once the gene set is displayed, the control panel also updates to show the operations available on this gene set. Operations that modify the gene set are folded into the top blue bar by default, and the "Gene plot" panel is shown:



In this panel the 4 steps to do a gene plot are marked out.

Step 1 is to choose the track whose data will be plotted. You can either select one from the drop-down menu (which contains all currently displayed tracks), or go to the genome heatmap and select it from the context menu. As an example, I selected a track that is red in color via the context menu:




Step 2 is to choose a graph type. Currently 3 plotting styles are available. I will show them one by one below:


The screenshot above shows the first graph type, which is the average profile of all genes (or coordinates). If an item is a gene, its data is placed from 5' to 3', no matter which strand the gene is on. This particular plot uses an MRE-Seq experiment on CD4 memory cells, and it shows that the 5' ends of the genes in the glycolysis pathway have a high unmethylation signal (on CpGs) in this sample.




Above is the second plot style with the same data. Instead of averaging the signals, each gene now has its own curve. It displays the same pattern we saw in the first plot. Maybe I should name it the "Spaghetti Plot"?



The third plotting style, which I believe is the most interesting, splits the data into 5 gene parts (3 kb promoter, 5' UTR, exons, introns, 3' UTR) and plots an average profile for each part separately, so it only deals with genes. This plot shows that the 5' UTRs have a really high unmethylation signal, and that the promoter signal rises drastically towards the TSS. Please note that the previous two graph types do not include information on promoters.
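The strand handling behind the first graph type, the average profile, can be sketched in Python like this. It assumes each gene already has the same number of data points, stored in genomic (left-to-right) order together with its strand. The function name and the input layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_profile(genes: Sequence[Tuple[str, Sequence[float]]]) -> List[float]:
    """Average equal-length profiles after orienting every gene 5' to 3'.

    Each element of `genes` is (strand, values), with values in genomic order.
    """
    # Minus-strand genes run right to left, so flip them to read 5' to 3'.
    oriented = [
        list(values) if strand == "+" else list(values)[::-1]
        for strand, values in genes
    ]
    count = len(oriented)
    # zip(*oriented) walks the genes column by column, one column per data point.
    return [sum(column) / count for column in zip(*oriented)]
```

Two genes with a strong signal at their 5' ends, one on each strand, average to a profile that is high on the left, which is the pattern described for the glycolysis genes.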



Here are some more details about the Gene Plot function.

There are graph type-specific options in the graph type selection panel. The most important one is "Number of summary points", available for all graph types. The track data of each gene is summarized into the same number of points, which are then used for plotting. So this number controls the granularity, or resolution, of the plot. If you want to see more detail, set this option to a larger number. But beware that a large number is likely to yield a "URL too long" error in the spaghetti plot with Google Image Charts.
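One simple way to picture the summarizing step is to cut each gene's data into equal-sized bins and take the mean of each bin. The Python sketch below does exactly that. Using the mean is my assumption, since the post does not say which statistic the browser uses, and the function name is made up.

```python
from typing import List, Sequence


def summarize(values: Sequence[float], n_points: int) -> List[float]:
    """Reduce `values` to exactly `n_points` numbers by averaging equal-width bins."""
    if n_points <= 0 or not values:
        raise ValueError("need at least one value and one summary point")
    length = len(values)
    summary = []
    for i in range(n_points):
        lo = i * length // n_points
        # Guarantee at least one value per bin, even when there are
        # fewer values than requested summary points.
        hi = max(lo + 1, (i + 1) * length // n_points)
        chunk = values[lo:hi]
        summary.append(sum(chunk) / len(chunk))
    return summary
```

Because every gene ends up with the same number of points regardless of its length, long and short genes can be averaged or drawn on a common axis. More points mean a finer curve but also more data to send to the renderer.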


When using this function, you can choose from 3 rendering methods, as listed in Step 3. The plots above used Google Chart Tools, which generates very nice interactive graphs. When you mouse over a data point, a tooltip pops up displaying some more information.




The other methods (Google Image Charts, and server-side rendering using R) generate a plain PNG image. They are fast, and they make it easier to save the image (by right-clicking on it, which Google Chart Tools doesn't allow). I recommend using Google's rendering services; server-side rendering by R should serve as a fallback when Google's service is unavailable.

Here is the same spaghetti plot produced by the other two rendering methods:

Via Google Image Charts:


By R:



Future work:

1. add one additional graph type, which is the kind you've seen so many times in ChIP-Seq papers (following graph is from PMID: 21765417). You're welcome to suggest other graph types.



2. apply Gene plot on custom bigBed tracks