Here are my suggestions for the post-analysis step of the manuscript. They mostly concern analyzing the dynamic behavior of gene expression across different time points and across different virus types.
1. We can find the genes that are selected most frequently among the different feature selection methods that showed good predictive performance. If a gene is selected by most of the feature selection methods, then that gene is probably important.
2. We can repeat feature selection using different methods on individual time points for individual experiments and then identify the genes that are selected most often in each time point separately.
3. We can find the genes that are commonly selected across different time points.
4. We can analyze which genes are selected at Time 0 and at Time 24 and compare them. This way we can identify genes that are selected at Time 24 but not at Time 0, and vice versa. This type of comparison may help us understand why accuracy is reduced when we use only Time=24 features as opposed to Time=0+Time=24 features. Could it be that individuals respond differently in the early period after exposure to the virus, but then those responses converge to a common gene network at later time points? We can repeat this comparison for other time points as well (e.g. Time 0 vs Time 36, Time 0 vs Time 72, etc.).
5. We can combine datasets for the same virus type (since we have four virus types but seven experiments) and repeat some of the feature selection experiments above. This way we can see whether results differ between experiments even when the virus type is the same. In the next step, we can combine all the datasets and repeat feature selection to identify the genes that are common to all four viruses studied.
6. We can run gene network inference methods such as GENIE3 and build gene networks for individual time points. GENIE3 also has the option to build gene networks by taking multiple time points together.
7. We can build ensembles using the predictions submitted by participants and combine them using different techniques (majority voting, weighted score averaging, rank fusion etc).
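Suggestions 1 and 4 above could be sketched roughly as follows. This is only a minimal illustration, assuming each feature selection method returns a set of gene identifiers; the method names, gene names (G1, G2, ...) and the majority threshold are made-up placeholders, not results or settings from the manuscript.

```python
from collections import Counter


def selection_frequency(selections):
    """Count in how many methods each gene was selected.

    `selections` maps a method name to the set of genes it selected
    (hypothetical input format, assumed for this sketch).
    """
    counts = Counter()
    for genes in selections.values():
        counts.update(set(genes))  # each method votes at most once per gene
    return counts


def consensus_genes(selections, min_fraction=0.5):
    """Genes selected by more than `min_fraction` of the methods."""
    counts = selection_frequency(selections)
    threshold = min_fraction * len(selections)
    return {gene for gene, count in counts.items() if count > threshold}


def compare_time_points(genes_a, genes_b):
    """Split two gene sets into 'only first', 'only second' and 'shared'."""
    a, b = set(genes_a), set(genes_b)
    return {"only_a": a - b, "only_b": b - a, "both": a & b}


# Placeholder data, for illustration only.
methods = {
    "method_1": {"G1", "G2", "G3"},
    "method_2": {"G1", "G2"},
    "method_3": {"G1", "G4"},
}
frequent = consensus_genes(methods)

time_0 = {"G1", "G2"}
time_24 = {"G2", "G5"}
diff = compare_time_points(time_0, time_24)
```

The same `compare_time_points` helper could be reused for the other pairs (Time 0 vs Time 36, Time 0 vs Time 72), and `consensus_genes` could be run per time point or per virus type once the selections are grouped that way.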
Created by Zafer Aydin zaferaydin
Another suggestion for post-analysis, focusing on the permutation analysis performed to compute each score's p-value relative to random prediction.
1. I would like to see the cumulative distribution functions (CDFs) generated by the permutation analysis for each score.
2. I would like to see where each contribution falls on the CDFs.
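The two requests above could be produced along these lines. This is a hedged sketch, not the procedure used in the manuscript: it assumes the null distribution is built by shuffling the true labels, uses plain accuracy as a stand-in score, and all function names and the toy data are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Stand-in score; the manuscript's actual scores may differ."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_null(y_true, y_pred, score_fn, n_perm=1000, seed=0):
    """Scores obtained after shuffling the true labels (assumed null model)."""
    rng = random.Random(seed)
    labels = list(y_true)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        null_scores.append(score_fn(labels, y_pred))
    return null_scores


def empirical_cdf(null_scores):
    """Return sorted scores and the empirical CDF value at each of them."""
    xs = sorted(null_scores)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def cdf_position(null_scores, observed):
    """Fraction of null scores at or below the observed score."""
    return sum(s <= observed for s in null_scores) / len(null_scores)


def permutation_p_value(null_scores, observed):
    """One-sided p-value with the usual +1 correction."""
    exceed = sum(s >= observed for s in null_scores)
    return (1 + exceed) / (1 + len(null_scores))


# Toy data: a hypothetical submission that predicts every label correctly.
y_true = [0, 1] * 20
y_pred = list(y_true)

observed = accuracy(y_true, y_pred)
null = permutation_null(y_true, y_pred, accuracy)
xs, cdf = empirical_cdf(null)
position = cdf_position(null, observed)
p_value = permutation_p_value(null, observed)
```

Plotting `xs` against `cdf` as a step curve gives the CDF for one score, and marking each submission's `observed` value on the x-axis shows where that contribution falls on it.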