
** Please mention Bio-Job.org when replying to this advertisement **

Research Guidance and Tutoring in Bioinformatics and Genomics for visually impaired student

Posted by: UALR

Posted date: Feb-10-2017

Location: Little Rock

I am looking for informal guidance to speed up my bioinformatics dissertation focusing on analyzing genomic data. 


I am looking for an informal bioinformatics research adviser, trainer and tutor.  I urgently need a publication to remain in the program.  But since I am almost blind I won't be able to submit one on time unless I can get some help.  My vocational rehabilitation agency is giving me funds to pay for assistants and tutors to help me compensate for the shortcomings caused by my visual impairment.


My research involves yeast because my adviser has a yeast lab.  We are interested in understanding how and why caloric restriction extends lifespan.  We are specifically interested in changes in membrane composition because it is a marker for aging.  We would like to improve our understanding of the mechanisms underlying and driving the aging process so that we can eventually reverse it.


I am especially looking for help in learning the most relevant Bioconductor packages for analyzing time series microarray, RNA-Seq, ChIP-Seq, proteome and any kind of epigenetic, metabolomic and transcription-factor-binding-site data for constructing highly predictive causality-inferring co-expression, regulatory and protein binding networks.  My aim is to use computational methods for predicting a novel lifespan extending intervention for yeast, which I hope that my adviser can biologically validate in his yeast lab.  I have stayed up all night looking for good tutorials, datasets and review articles, which I thought might be good for learning and developing these skills and techniques, but unfortunately my Firefox browser crashed.  This caused me to lose all my many open browser windows, in which I had opened this kind of learning material, i.e. tutorials, review articles and sample datasets.  If you’d like a list of articles, which employ the skills and techniques I’d like to gain, I’d gladly repeat this literature search.  Maybe after having posted this text I will start working on a second post listing the references to the resources I thought could be helpful for us and also make a note for each publication explaining why I think it is useful.  But feel free to refer to and use any information, which has helped you to learn all of this.  For example, I have never seen a regulatory network for transcription factors and therefore, I am still not sure whether the colorful circular figure 6 of one particular article is actually considered a transcription-factor-binding regulatory network and how to read it.


I started my dissertation by plotting lots of time series curves from yeast microarray data.  So far I have only learned how to analyze microarray data when the Affymetrix Yeast Genome 2.0 Array chip was used.  But I'd like to learn how to analyze other chip data.  I was hoping, after having plotted enough time series graphs from different GEO microarray datasets, to see some aging related trends in the gene expression pattern at least for some genes.  But unfortunately, I was disappointed because no clear trend became visible for any gene.  So far my time series plots look so random that one could assume that there is no relationship between the temporal gene expression pattern and gene function.  There does not seem to be any difference in time series plot similarity between genes belonging to the same GO term and the remaining genes.  But if this were the case then co-expression networks would not work because one only connects genes with an edge when their plots are highly correlated.  If we used this approach on my data, almost none of the genes belonging to the same GO term would be grouped together.  Therefore, before trusting co-expression networks I would like to establish, show and prove that there is indeed a relationship between time series curves and gene function.  My analysis may be flawed because I did not use any Bioconductor package to analyze my microarray data.  If you could teach me how to do that then maybe that would make my data look better, because I did not exclude the genes that are not differentially expressed from the analysis; I was not aware of this requirement when I did this work.


I also would like help in finding a good way to rank the similarities between my different time series plots.  When I used the regular Pearson correlation in R to group together genes whose time series curves were correlated by more than 0.85, I found almost no enrichment, indicating that the genes I had grouped together based on this criterion were not at all functionally related.  This made me even more skeptical whether co-expression networks could really tell us what we expect, i.e. which genes are working together.  But since so many people are publishing co-expression networks I think something may be wrong with my analysis for me to get such counterintuitive results.
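
To make the grouping step concrete, here is a minimal sketch in Python (one of the alternatives to R I am open to).  It only illustrates the idea of linking genes whose curves correlate above 0.85; the gene names and numbers are made up and do not come from my GEO datasets, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat curve: correlation is undefined, treated as 0 here
    return sxy / sqrt(sxx * syy)

def correlated_pairs(profiles, threshold=0.85):
    """Return (gene1, gene2, r) for every pair correlated above the threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r > threshold:
            pairs.append((g1, g2, round(r, 3)))
    return pairs

# Hypothetical toy profiles (illustrative numbers only).
profiles = {
    "geneA": [1.0, 2.0, 4.0, 2.0, 1.0],
    "geneB": [2.0, 4.1, 8.0, 4.0, 2.1],  # same shape as geneA, larger scale
    "geneC": [4.0, 2.0, 1.0, 2.0, 4.0],  # mirror image of geneA
}
pairs = correlated_pairs(profiles)
```

Only geneA and geneB end up linked here.  Pearson correlation ignores the scale of the curves, so two genes with the same shape but very different intensities are still grouped together.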


According to my understanding the trajectory of the time series plots is determined by the way each particular gene is regulated.  Therefore, I am much more in favor of regulatory networks.  If nobody has done this already I would like to show that genes with more similar promoter regulatory regions have higher correlated time series curves.  That at least sounds logical to me.  But I need help to figure out whether somebody has already done this kind of research.


I need help constructing transcription factor based regulatory networks.  I spent all night hoping to find all the components plus instructions to build them, but unfortunately I could not find enough material.  I would like to learn how regulatory networks for transcription factor binding sites can be constructed.


I have read that many of the regulatory and co-expression networks have been constructed based on ChIP-chip and ChIP-seq data.  Therefore, I would like somebody to teach me how to do that.


Last week I plotted the time series curves for each of the 5,116 genes on the Yeast 2 chips.  In those datasets the transcriptome was measured at 10 to 30 minute intervals.  Finally I could see something on these plots that I know is true.  Within the first 100 minutes of the cell cycle about half of the genes had either a big peak or a deep valley.  I need help quantifying the exact percentages of these motifs.  The cell cycle has driver and passenger genes.  The driver genes drive the cell cycle forward across its checkpoints.  The time series curves of the passenger genes follow the expression pattern of the driver genes.  That is why I would like to use these cell cycle datasets to define functional / regulatory units.  Such a unit consists of at least one driver gene and all of its passenger genes with a similar pattern.  I need help in identifying genes that can be grouped together based on their time series curves.
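
One possible way to turn "big peak" and "deep valley" into reproducible percentages is sketched below in Python.  The definitions are my own assumptions for illustration: a peak means the curve reaches at least twice its median within the first 100 minutes, and a valley means it drops to at most half its median.  The thresholds, function names and toy intensities are hypothetical, and the intensities are assumed to be positive.

```python
from statistics import median

def classify_motif(times, values, window=100, fold=2.0):
    """Label a curve 'peak', 'valley' or 'neither' within the first `window` minutes.

    Assumes positive intensities; `fold` is an arbitrary illustrative cutoff.
    """
    base = median(values)
    early = [v for t, v in zip(times, values) if t <= window]
    if max(early) >= fold * base:
        return "peak"
    if min(early) <= base / fold:
        return "valley"
    return "neither"

def motif_percentages(times, profiles, window=100, fold=2.0):
    """Percentage of genes in each motif class."""
    counts = {"peak": 0, "valley": 0, "neither": 0}
    for values in profiles.values():
        counts[classify_motif(times, values, window, fold)] += 1
    return {label: 100.0 * n / len(profiles) for label, n in counts.items()}

# Hypothetical toy data: minutes and made-up intensities.
times = [0, 30, 60, 90, 120, 150]
profiles = {
    "g1": [100, 100, 400, 100, 100, 100],  # early peak
    "g2": [100, 100, 25, 100, 100, 100],   # early valley
    "g3": [100, 110, 95, 105, 100, 98],    # no clear motif
    "g4": [50, 50, 50, 200, 50, 50],       # early peak
}
percentages = motif_percentages(times, profiles)
# → {'peak': 50.0, 'valley': 25.0, 'neither': 25.0}
```

With real data the cutoff would have to be chosen and justified, for example by checking how the percentages change as `fold` is varied.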


I was surprised to see the reality of the cell cycle reflected in the time series plots for the cell cycle data, because I could not find anything definitely resembling reality in all those time series plots where the time points of measurement were more than half of a yeast cell cycle apart.  Yeast can divide in 2-3 hours.  Some of the major cell cycle genes, such as RNR1, change by more than 128 fold within the period of one cell cycle.  Therefore, even if there is a linear trend in an absolute reference frame (y=0), we might never see it because it is just chance whether we measure the expression of such cyclical genes when they have reached their maximum, their minimum or any level in between.


We have not been able to make much progress in understanding and manipulating the aging process.  I was wondering for a long time why I could not find an aging related gene expression pattern in my time series plots.  But since I could not find anything despite knowing that there must be something that causes aging, I thought maybe we are looking for the wrong thing.  We are primarily looking for trends affecting the absolute amount of transcription or translation.  But maybe aging is not caused by such changes in an absolute reference frame.  Maybe aging is caused by relative temporal expression changes between groups of genes with respect to one another.


The gene expression pattern for many human genes might also be cyclical because of our circadian rhythm.  Therefore it could be that the cyclical changes totally overshadow linear changes.  For our life processes to take place properly the expression of many sub-groups of genes must be temporally tightly controlled and regulated.  For example, for sleep to occur, the eyes must be closed.  When I was younger I felt that there was no time gap from the time I fell asleep until I woke up the next morning.  But now I can tell that lots of time elapsed in between.  This could be caused by a gradually increasing deregulation of gene expression patterns, which must be synchronized.  Therefore, this temporal deregulation of these initially totally synchronized processes could serve as a marker of aging.  If their synchronization is completely lost then the life processes, which depend on this synchronization, can no longer take place, thus causing death.  If this is indeed the case then aging could be reversed by restoring synchronicity.  Therefore, I would like to find out whether the initial synchronicity of cyclical co-expression is lost over time.


I feel that the cell cycle data is ideal in defining initial groups of co-expression.  Out of the maybe 12-16 time points of measurements for my cell cycle data, maybe I should look, which genes behave like a group in the first 3 time points.  The cell cycle regulating genes will be the driver genes.  But I found just by visual inspection that many proteins of unknown functions related to lipids follow their expression pattern but with smaller variance; thus, having a much smaller range.  I refer to these genes as passenger genes.  I’d like to check whether the synchronicity of the initially very highly synchronized gene expression pattern has declined for the last 3 time points.  If this is the case, then I’d like to check whether the synchronicity of expression is higher in the first than in the last replication.  But for that we’d need new data since I am not aware that such kind of data already exists.


Can expression values of different genes be directly compared on an absolute scale for microarray and RNA-Seq data?  I mean if the measured intensity of gene A is twice as high as for gene B, can I conclude that the expression of gene A is double that of gene B?  If there is indeed a relationship between time series curve and gene function then we can only find it if we can distinguish between the time series curves with a high enough resolution to distinguish between their functions.  But from visual inspection it looks like the time series curves of genes of the same molecular function or pathway, which must work together, are not more correlated to one another than they are to all the other genes.  I am looking for help to verify this programmatically.  But if this is the case then trying to cluster together genes of the same function within a clique in a network must inevitably fail because most of the genes, despite belonging to the same GO term, could not be clustered together based on the similarity of their time series curves.
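
A rough sketch of how this check could be done programmatically is shown below in Python.  It compares the average correlation among the genes of one GO term with the average correlation between those genes and all the other genes.  The gene names, the term membership and the numbers are hypothetical; a real analysis would also need a significance test, for example by permuting the gene labels.

```python
from itertools import combinations, product
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat curve: correlation is undefined, treated as 0 here
    return sxy / sqrt(sxx * syy)

def within_vs_outside(profiles, term_genes):
    """Mean correlation inside a gene set versus between the set and the rest."""
    inside = [g for g in profiles if g in term_genes]
    outside = [g for g in profiles if g not in term_genes]
    within = [pearson(profiles[a], profiles[b])
              for a, b in combinations(inside, 2)]
    across = [pearson(profiles[a], profiles[b])
              for a, b in product(inside, outside)]
    return sum(within) / len(within), sum(across) / len(across)

# Hypothetical toy data: three genes of one made-up GO term and two others.
profiles = {
    "t1": [1, 3, 5, 3, 1],
    "t2": [2, 5, 9, 6, 2],
    "t3": [1, 2, 4, 2, 1],
    "o1": [5, 4, 3, 2, 1],
    "o2": [1, 1, 2, 1, 5],
}
mean_within, mean_across = within_vs_outside(profiles, {"t1", "t2", "t3"})
```

If the within-term mean is not clearly higher than the across mean for most GO terms in the real data, that would support what I think I see by visual inspection.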


Is it actually generally assumed that the time series curves for genes involved in the same function are more highly correlated to each other than they are to the remaining genes?  Why is it that people appear to assume that in order to change the rate of a function many if not all of its genes must be changed by the same factor?  If I were evolution I would find it much easier to only change the rate of one rate-limiting protein / enzyme than having to regulate all other enzymes of the same pathway.  I assume that there are certain pathways / functions / processes whose rate can be controlled by changing the expression of one or only very few of their rate-limiting enzymes.  But then there appear to be other pathways / functions / processes for which almost all of their enzymes must be changed by about the same factor to affect the overall rate of this particular pathway / function / process.  But we don’t seem to know for most of them whether they can be up- or down-regulated by changing the expression of only a very few rate-limiting enzymes or of almost all of their enzymes.  But if this is the case then the concept on which GO term based gene enrichment is based is flawed because the speed of some pathways can be changed by changing the expression of only one of their genes versus many at the same time.  I think it is wrong to conclude that a pathway, for which more genes are differentially expressed, is more affected than another one, for which much fewer genes are differentially expressed but by a much larger factor.  Who determines which genes will be grouped together into one particular GO term and on which criteria are such decisions based?  Every year the definitions for some GO terms are changed.  This means that the earlier definitions were wrong.  Then, most likely, many genes, which we today believe are functionally or regulatorily related because they share a GO term, may not be.
How, for example, can it be that genes of unknown function are assigned to a particular GO term?  How is the entity of a GO term defined?  What criteria must a group of genes satisfy to be considered as forming a particular GO term?  I think we need to compare the expression changes for all genes belonging to a particular GO term with the resulting overall metabolic change of the entire GO term, in order to determine for each GO term individually for how many of its members the expression must change to achieve a particular overall change.  But for such an analysis we’d need metabolomic, transcriptomic and proteomic data.  If you can teach me how to do this and where to find such data I’d be interested in conducting such an analysis because I expect it to fundamentally change our current understanding of the concept of GO term enrichment analysis.


I noticed that I have already plotted time series curves for almost one third of the time series datasets I could find for yeast.  Unfortunately, I have already used up all of the datasets, which have at least ten time points.  Therefore, for my dissertation, we might want to consider plotting and analyzing time series curves from different species hoping to find that homologous genes are affected in a similar way in order to generalize our conclusions.  I found that for every dataset the correlation of its expression time series plots is different.  I further found that time series curves for different conditions are most similar to one another if they came from the same experiments and are based on the same microarray chip.  But how can one then effectively compare expression patterns from different experiments if a high time series curve similarity most likely implies that their measurements were obtained in the same experiment?


I have plotted cell cycle time series curves for at least 11 different conditions based on 2 different microarray experiments.  Their time points were taken in 10-30 minute intervals.  I need help to determine whether the expression time series plots for their GO terms are more similar to one another than they are for my other 9 conditions, for which the intervals between expression measurements exceed the length of one yeast cell cycle / replication.  From visual inspection this seems to be the case but I need help in programmatically generating a reproducible percentage.  If my preliminary visual inspections are true then it can be concluded that co-expression networks might be suited for grouping together genes of similar function if their expression measurement interval is less than 1/4th of the cell cycle but not otherwise, because then it will be due to chance at which point we measure the expression of a particular gene whose cyclical expression component far overshadows any possibly underlying linear time and age dependent trend.  Even if a linear age-associated expression trend undoubtedly exists we might never be able to detect it for any gene with a much stronger cyclical expression component as long as our time points are more than 1/4th of a cell cycle, i.e. approximately 30 minutes, apart because we have no control over when in its cyclical expression pattern we happen to measure the expression of such a primarily cyclically expressed gene.  Considering this, I anticipate that functional co-expression networks could successfully cluster functionally related genes together if the expression measurements are no more than 1/4th of a cell cycle apart.  This would explain why the changes in the cell cycle are properly reflected in my time series plots for all conditions, for which the interval between expression measurements was no longer than 1/4th of the yeast cell cycle, i.e. not longer than 30 minutes, whereas no trend could be identified for any conditions, for which the time points were more than 30 minutes, i.e. more than 1/4th of the cell cycle, apart.
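
The effect of the measurement interval can be illustrated with a small simulation, sketched below in Python.  It is not based on my real data.  It assumes a hypothetical gene whose expression is a weak linear trend plus a strong cyclical component with a period of 120 minutes (so that 1/4th of a cycle is 30 minutes), and it counts how many peaks are visible when the same curve is sampled every 30 minutes versus every 150 minutes.  All parameter values are made up.

```python
from math import sin, pi

PERIOD = 120.0  # assumed cell cycle length in minutes (1/4 cycle = 30 min)

def expression(t, base=500.0, slope=0.05, amplitude=100.0):
    """Hypothetical gene: weak linear trend plus strong cyclical component."""
    return base + slope * t + amplitude * sin(2 * pi * t / PERIOD)

def sample(interval, total=600):
    """Measure the gene every `interval` minutes from 0 to `total` minutes."""
    return [expression(t) for t in range(0, total + 1, interval)]

def count_peaks(values):
    """Count interior points that are higher than both of their neighbours."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])

dense_peaks = count_peaks(sample(30))    # → 5, one peak per true cell cycle
sparse_peaks = count_peaks(sample(150))  # → 1, a misleading slow oscillation
```

With 30 minute intervals all five cycles within the 600 minutes are visible.  With 150 minute intervals the same gene appears to go through a single slow cycle, which has nothing to do with the real period.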


I assume that the synchronicity of the cell cycle for different individual yeast cells is getting lost over time.  How would this affect our measurements and the conclusions we can draw from them?  For our microarray experiments would we have enough yeast cells so that their average cyclical expression pattern would average out with increasing loss of cell cycle synchronicity of individual cells with advancing number of replications?  But if their average expression pattern over time remains the same then those kinds of genes are at risk to appear as not differentially expressed at all. 
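
A small simulation can show what I mean by the cyclical pattern averaging out.  The Python sketch below assumes a hypothetical population of 201 cells that all start at the same point of the cycle but have slightly different cell cycle lengths, spread evenly between 110 and 130 minutes.  These numbers are invented for illustration and are not measured values.

```python
from math import cos, pi

N_CELLS = 201
# Assumed cell cycle lengths, evenly spread from 110 to 130 minutes.
periods = [110.0 + 20.0 * i / (N_CELLS - 1) for i in range(N_CELLS)]

def population_mean(t):
    """Average cyclical signal of all cells at time t (each cell swings -1..1)."""
    return sum(cos(2 * pi * t / p) for p in periods) / N_CELLS

def amplitude(start, stop, step=5):
    """Largest absolute population signal seen between start and stop minutes."""
    return max(abs(population_mean(t)) for t in range(start, stop + 1, step))

early = amplitude(0, 120)      # first cell cycle: cells still synchronized
late = amplitude(1200, 1320)   # about ten cycles later: cells have drifted apart
```

In this toy population the averaged signal starts at full strength and is much weaker about ten cycles later, even though every individual cell is still cycling with the same amplitude.  A microarray measures such an average, so a strongly cyclical gene could look almost flat in an old, desynchronized culture.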


Why is it actually important to remove genes not considered as differentially expressed from microarray analysis?  Why can’t we keep them but draw and define them as a flat line parallel to the X axis so that we don’t have to deal with N/A values when comparing the expression of those genes with the same gene from a different dataset?  Are genes whose expression differs by less than 1.5-fold between all their time points considered as not differentially expressed?  Don’t we have to distinguish between housekeeping genes, which are expressed at a very constant level all the time, and genes with expression values so low that they can be considered as being turned off, i.e. not expressed at all?  Aren’t housekeeping genes at risk of being mistakenly removed from further analysis because they appear as not differentially expressed (i.e. expressed at very high but constant levels) even though their expression is essential for survival?
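
The distinction I have in mind could be sketched as follows in Python.  The 1.5-fold cutoff is the one mentioned above; the intensity level below which a gene counts as turned off (50) is an arbitrary assumption, as are the function name and the toy values.

```python
def classify_gene(values, min_fold=1.5, off_level=50.0):
    """Label one gene's time series as 'off', 'changing' or 'constant'.

    Assumes positive intensities. Both cutoffs are illustrative assumptions.
    """
    lowest, highest = min(values), max(values)
    if highest < off_level:
        return "off"        # never rises above the assumed detection level
    if highest >= min_fold * lowest:
        return "changing"   # at least 1.5-fold between some pair of time points
    return "constant"       # clearly expressed but flat, e.g. housekeeping

# Hypothetical toy intensities.
genes = {
    "housekeeping": [1000, 1050, 980, 1020],
    "silent": [12, 15, 10, 14],
    "cyclical": [100, 800, 120, 750],
}
labels = {name: classify_gene(values) for name, values in genes.items()}
# → {'housekeeping': 'constant', 'silent': 'off', 'cyclical': 'changing'}
```

Keeping the "constant" genes as a separate class, instead of dropping them together with the "off" genes, would avoid losing the housekeeping genes from later comparisons.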


Can networks be considered as very complex algorithms, which happen to form a network, but whose meanings and implications cannot be evaluated in advance until one has compared the implications of each network with our experimentally obtained observations?  Are we not simply taking the networks whose implications best reflect our experimentally obtained results because it is reasonable to assume that the network, which is most consistent with our experimentally obtained observations, will be most suited to predict conditions for which we don’t yet have experimentally obtained data?  But if this is the case, can one claim that a certain network constructing algorithm is necessarily always better than another, given that it may be more consistent with our experimentally obtained data under some conditions but not at all under other conditions?  And if this is the case, would it then be correct to conclude that the algorithm for constructing a particular network might be totally unrelated to the regulatory interplay between many transcription factors, but that, almost by chance or coincidence, the conclusions obtained from a network constructed in a particular way just happen to be most consistent with our experimental observations?  Can it then be concluded that the network, which is most consistent with our experimentally obtained observations, may not be better in modeling the truly occurring interplay between the many transcriptional, post-transcriptional, translational, post-translational, epigenetic or non-coding RNA based components affecting expression than alternative networks that are less consistent with our experimentally obtained results?
For example, when making a co-expression or regulatory network we have no component reflecting the effect that the length of poly-A-tail has on the overall protein abundance or reflecting the regulation of gene expression by non-coding RNAs even if the prediction from such kind of a network is totally consistent with our experimentally obtained observations in one particular situation whereas it might no longer be consistent with what we see in reality if the situation and conditions have changed?  But would this imply that this entire math and all the steps needed to determine the position and connectivity of any particular gene in a network just happens to group together genes, which are more similar to one another than to the rest, in some respects but not in others?  For example it can be expected that regulatory networks will affect the expression of many even unrelated seeming aspects, functions, pathways, processes affecting only the very few but rate-limiting enzymes of many only distantly related aspects / functions / processes at the same time consistent with the very systemic acting effects of some master key-regulatory transcription factors, such as Yap6 or Tor1 (under YEPD (normal media)) whereas they may totally fail if the condition has changed, e.g. to Caloric Restricting (CR) because the target genes of a particular transcription factor may have been turned off (i.e. may have become totally unresponsive to a particular transcription factor) since their expression is primarily regulated by nutrient sensing whereas genes regulated by another transcription factor have been turned on in response to starvation


I’d like to explore these kinds of questions in my dissertation but I need help in understanding and analyzing the data, because my low vision forces me to rely on too much time consuming trial and error and starting over, which prevents me from developing these skills fast enough to graduate before my funding runs out.  I’d especially need help in writing and defending my dissertation proposal and making a quick discovery that can be published in order to keep making satisfactory progress.  I am very open to any suggestions or additions from those who will be working with me, because I have learned that many of my tutors will gladly share with me anything they already know well but that it is much harder for them to acquire a new skill or technique only because I need to master it.


If we can figure out how to construct regulatory and co-expression networks based on time-series data of which I have drawn plenty of time series plots solely based on microarray intensities obtained with the Yeast 2 chip from Affymetrix then we would finally have a new dimension we could add to the analysis, results and conclusions by the original publishers of the GEO microarray – and hopefully soon RNA Seq. datasets – available at NCBI. 


It might take me some time and require some patience to teach me a new way of analyzing different kinds of data but once I have mastered it I can apply it in many situations over and over again.  So far, most of my past tutors have taught me most of the analytical techniques I am confident in so far using R but I am also open to alternatives, such as e.g. Cytoscape, MATLAB, Python or Galaxy if that would make it easier because I need to use results quickly but nobody is telling me which techniques and programs I need to use to obtain them as long as they are correct and I can understand them.


The training can take place remotely via TeamViewer and Skype.  I am flexible regarding my availability since I have already completed all my course work and everyone is now expecting me to focus only on my dissertation.  If you would be willing to teach me some, but not all, of the aspects and components needed for completing a dissertation that is fine too, because most of the time I had different tutors, who had their strengths and expertise in different areas, since it would probably be too much for a single person to teach me everything.


Although there might be an implicit understanding not to include personal contact information I would like to ask you to email me directly if you are interested in working with me because time is running against me, especially since my text-to-speech software, on which I depend to access electronic information because I am almost blind, has problems reading aloud all parts of the often very short conversations / replies here on this website, and I am having a hard time finding them.  For example, due to my eye disease it is very hard for me to distinguish between black, blue and green.  But this kind of distinction is almost necessary to see and access any replies to my posts.  I am having a hard time seeing which words of the replies are actually clickable hyperlinks.  I have configured my text-to-speech software to read very well and efficiently in my Gmail and Skype environment.  I can easily distinguish visually between read and unread emails because I have enhanced the settings for font size and contrast to better help me in making this distinction.  But here I am having a very hard time reading and following every reply because it has a green background, which lowers the contrast to the foreground font to such an extent that I cannot read it anymore unless I copy and paste everyone’s reply into Word and enlarge its font.  Moreover, due to my low vision, it is much easier for me to query emails for terms I remember from reading them, especially when I need to review people’s replies to my questions.  I have not been able to find a way of effectively using the “Control F” function for quickly jumping to keywords I remember, and it is very hard for me to find a particular section in a long text solely by using the magnification function of ZoomText.  If you’d like you are welcome to reply to me here on this website but please email me a copy of your reply since I may not otherwise see it for a long time.
When emailing I can enlarge the font as much as I want but I have not been able to figure out how to enlarge it here on this website.  The light blue clickable functions here on this website are very challenging for me to find because the contrast between the light blue foreground font color and the white background color is already too low for me for using these functions efficiently. 


Also when you are working with me please don’t send me long written explanations because processing written language is much harder for me than dealing with spoken language.  When coding in R I’d prefer to type the code since that helps me to better remember than when only watching somebody else typing but I’d like you to be remotely logged into my computer with Team Viewer so that you can explain me how to best use new functions and packages and to help me to faster troubleshoot error messages since many of them are the result of me having made a typo because I especially have lots of trouble to clearly visually discern between {,[,( for example because all three of them look to me like a vertical pipe since their horizontal dimension, without which they cannot be distinguished from one another, is too small for me to see because of the involuntary movement of my eyes, which causes the horizontal dimension of any kind of written language to look much more blurry and fuzzy to me than the vertical dimension.  The problem is that the harder I try to read with my eyes the more their involuntary movement increases and the harder the reading process will become for me over time.  That is why I’d prefer spoken language.


With regard to my R script for performing GO term enrichment I would need an additional option for redefining my reference gene set depending on the genes on the microarray chip and the availability of expression data for them.  I recently noticed that the outcome depends on the gene set that serves as the standard of comparison (i.e. the genetic background): a GO term counts as enriched only if the selected genes contain a higher proportion of its members than could be expected by chance alone given that background.  Most of my microarrays only have a little more than 5,000 but much less than 6,000 genes whereas my genetic background consists of all the yeast genes, which is 7,993.  Hence the threshold by which the observed proportion of genes coming from a particular GO term must exceed the proportion expected by chance alone is not the one it should be, and I believe it must be even higher than it currently is before the term satisfies the criteria for being considered as enriched.
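
The influence of the background can be shown with a plain hypergeometric test, sketched below in Python using only the standard library.  The background sizes 7,993 and 5,116 are the numbers mentioned in this post.  Everything else (the size of the GO term in each background, the number of selected genes and the number of hits) is hypothetical and chosen only for illustration.

```python
from math import comb

def enrichment_p(background, in_term, selected, hits):
    """Hypergeometric P(X >= hits): chance of seeing at least `hits` term genes
    among `selected` genes drawn from `background` genes, of which `in_term`
    belong to the GO term."""
    total = comb(background, selected)
    upper = min(in_term, selected)
    favourable = sum(comb(in_term, k) * comb(background - in_term, selected - k)
                     for k in range(hits, upper + 1))
    return favourable / total

# Hypothetical example: 50 selected genes, 5 of them in the GO term.
# Assumed term sizes: 100 genes in the whole genome, 80 of them on the chip.
p_genome = enrichment_p(background=7993, in_term=100, selected=50, hits=5)
p_chip = enrichment_p(background=5116, in_term=80, selected=50, hits=5)
```

In this made-up example the p-value is larger (less significant) when only the genes on the chip are used as background, so the same observation is harder to call enriched.  How large the difference is for a real GO term depends on how many of its genes are actually on the chip.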


If you have any other suggestions or ideas what other aspects to explore please let me know.  I am also looking for help in describing, explaining and presenting the findings I hope to make according to what I have outlined above in a very exciting, convincing, professional manner, which can clearly and easily be understood and replicated; thus, hoping to be able to meet the requirements for a peer reviewed manuscript to get accepted for publication soon because I desperately need a publication in genomics, of which I’ll be the first author, in order to remain in good academic standing and for having a much more realistic chance for my funding to get extended beyond June 30th 2017, when my current funding is scheduled to end. 


Literature references for what I hope to learn, understand and apply


An example of the kind of causality-inferring networks, which I hope somebody can soon teach me how to construct and interpret (i.e. what kind of reasonable conclusions and inferences could be based on them), is shown in figure 6 of the publication listed below and is further explained in its supplementary material.  Based on this publication I started working on this project in August 2016 by generating lots of time series plots:


Janssens, G. E., Meinema, A. C., González, J., Wolters, J. C., Schmidt, A., Guryev, V., … Heinemann, M. (2015). Protein biogenesis machinery is a driver of replicative aging in yeast. eLife, 4, e08527. http://doi.org/10.7554/eLife.08527


I have plotted time series curves for any gene under any condition covered by the 2 yeast microarray cell cycle datasets, which I have listed below:


1.    Series GSE8799 Global Control of cell cycle transcription by coupled CDK and network oscillators https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE8799


2.    Series GSE49650 Checkpoints Couple Transcription Network Oscillator Dynamics to Cell-Cycle Progression https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49650


The time series plots, which are based on the raw microarray datasets (i.e. Affymetrix .CEL files), gave me some hope that there may be a relationship between the trajectory of a gene's time series plot and its function.  These plots show very clearly that about 1/4th of the genes have a steep peak within the first 100 minutes of the cell cycle whereas another 1/4th of the genes have a deep valley, i.e. a dip, within exactly the same time interval, i.e. the first half of the cell cycle.  The expression of the remaining half of the genes does not seem to be much influenced by the cyclical expression of the genes that drive the cell cycle forward.  These highly cyclically expressed cell cycle driver genes, such as RNR1, can change their expression levels by a factor greater than 128 within the time interval of a single yeast replication lasting less than 3 hours.  There are also surprisingly many passenger genes, whose expression appears to be regulated by the cell cycle driver genes.  Many of these cell cycle passenger genes are genes of unknown function, some of which are most likely affecting lipids in some way.


Figure 2 of the Nature publication with the title "Global control of cell-cycle transcription by coupled CDK and network oscillators" (Nature 453, 944-947, 12 June 2008; doi:10.1038/nature06955; received 22 November 2007; accepted 31 March 2008; published online 7 May 2008; http://www.nature.com/nature/journal/v453/n7197/full/nature06955.html#top) shows the time series plots for the 6 most prominent genes, whose expression changes most rapidly throughout the cell cycle and follows a highly periodic expression pattern.  Its period is exactly the time needed by the yeast for a single replication, i.e. one cell cycle, lasting less than 3 hours.  To look at figure 2, please click here: http://www.nature.com/nature/journal/v453/n7197/fig_tab/nature06955_F2.html.  The title of this figure is "FIGURE 2. Transcription dynamics of established cyclin–CDK-regulated genes".



The caption of this figure reads: 

Absolute transcript levels (dChip-normalized Affymetrix intensity units/1,000) are shown for the genes CLN2 (a) and RNR1 (b), which are regulated by SBF and MBF, respectively; the Ace2/Swi5-regulated genes SIC1 (c) and NIS1 (d); and the Clb2-cluster genes CDC20 (e) and ACE2 (f). Solid lines, wild-type cells; dashed lines, cyclin-mutant cells.


The solid black line shows the cyclical expression pattern of the 6 most prominent cell cycle driver genes in wild-type cells, and the dashed line shows that this highly cyclical expression is compromised in the cyclin-mutant cells.  The total length of time, which is equal to the length of the X-axis, is less than 3 hours and is divided into the durations of the 4 consecutive phases of the cell cycle, which take place in the intervals separating the cell cycle checkpoints.


Although my time series plots clearly resemble the highly cyclical expression of the 6 genes, which are most commonly associated with the cell cycle, they don’t show that the cyclical temporal expression pattern of these 6 cell cycle driver genes starts to repeat within the short time window spanned by the X-axis.  Maybe the authors had expression measurements at more time points for generating these 6 time series plots than they provided in the corresponding GEO microarray dataset.
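One possible way to check whether a profile starts to repeat within the sampled time window is to look at its autocorrelation, i.e. how well the series matches a shifted copy of itself.  This is a technique swapped in here for illustration and is not something the publication or the datasets prescribe.  The Python sketch below assumes evenly spaced time points; the function names and the cut-off of 0.3 are hypothetical.

```python
def autocorrelation(values, lag):
    """Correlation of a series with itself shifted by `lag` samples."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0 or lag >= n:
        return 0.0
    covariance = sum((values[i] - mean) * (values[i + lag] - mean)
                     for i in range(n - lag))
    return covariance / variance


def estimate_period(times, values, min_lag=2, min_correlation=0.3):
    """Return the time shift at which the profile best repeats itself.

    Returns None when no shift reaches min_correlation, which suggests
    that the window does not contain a second cycle.
    Assumes evenly spaced time points (an assumption of this sketch).
    """
    step = times[1] - times[0]
    best_lag, best_r = None, min_correlation
    for lag in range(min_lag, len(values) // 2 + 1):
        r = autocorrelation(values, lag)
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else best_lag * step
```

With few time points such an estimate is rough, so it can only hint at whether a second cycle is present in the data.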


The time series plots for the 2 biological replicates of the WT from the cell cycle dataset, for which I generated the time series plots first, follow the most extreme, distinct, tightly regulated and most varying cyclical expression pattern.  Therefore, I am planning to compare the warp-based correlations among the time series plots of genes, which are part of the same GO-term, with the warp-based correlations between those genes and all remaining genes.  I really hope that at least the average warp-based correlation among the plots belonging to the same GO-term is significantly higher than the average warp-based correlation between the genes of that GO-term, considered as a group, and all remaining genes.  I only learned today about using the time warp function for comparing, ranking and quantifying the similarities between time series plots.  I am not yet certain whether this time warp-based correlation can be calculated for many time series plots simultaneously or whether this function can only consider 2 time series plots at a time.  I know it exists in the form of an R function and I have located its documentation, but I cannot yet predict how much trial and error I will need before I can apply it in a manner, which would make me feel confident about having used this function appropriately and that the conclusions, which can be drawn from using this time-warping R function, are really
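The planned comparison can be sketched in a few lines.  Classic dynamic time warping compares 2 series at a time and returns a distance (smaller means more similar) rather than a correlation, so many series are handled by computing all pairs.  The Python sketch below is not the R function mentioned above; it is a plain textbook implementation, and the function names, gene names and group membership used with it are hypothetical.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric series."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def mean_distance(pairs, profiles):
    """Average DTW distance over a list of (gene, gene) pairs."""
    distances = [dtw_distance(profiles[g], profiles[h]) for g, h in pairs]
    return sum(distances) / len(distances)


def within_vs_between(profiles, group):
    """Compare genes sharing one annotation with all remaining genes.

    profiles : dict mapping gene name -> list of expression values
    group    : set of gene names that share the annotation (e.g. a GO-term)
    Returns (mean distance within the group,
             mean distance between group members and all other genes).
    """
    members = [g for g in profiles if g in group]
    others = [g for g in profiles if g not in group]
    within = mean_distance(list(combinations(members, 2)), profiles)
    between = mean_distance([(g, h) for g in members for h in others],
                            profiles)
    return within, between
```

If genes sharing a GO-term really follow similar trajectories, the first returned value should be clearly smaller than the second; whether the difference is significant would still need a proper test, for example by repeating the calculation with randomly drawn gene groups of the same size.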

Job Title Research Guidance and Tutoring in Bioinformatics and Genomics for visually impaired student
Post Details
Email TFHahn(at)UALR.edu
Job Discipline Bioinformatics
Job Classification Contract Job
Job Type Full-time
Location Little Rock
Key Words Bioconductor, time series, microarray, RNA Seq, Transcription Factor Regulatory Networks
Start Date 08/15/2016
Deadline Aug-15-2016