**[Full text of proposal](https://www.synapse.org/Portal/filehandle?ownerId=syn5659209&ownerType=ENTITY&xsrfToken=1EA1466FCA55F7EAE33833333900F1BC&fileName=Idea8.pdf&preview=false&wikiId=414654)**

###Anonymous Review 1 and Authors Response

_**Impact:** The submitter proposes to generate a test set on which a program to un-mix a cell population can be tested. They propose to generate controlled sets of mixtures of co-cultured immune-related cells. Assuming that the composition of the cells indeed has an impact on individual cells, this will indeed be a challenging test set._

_**Feasibility:** The main hypothesis tested is that the behavior of the tumor as a whole critically depends on the organization of different modules (including different types of cells). The submitter merely plans to test a known method (ICA). It is not clear why this is even an appropriate method under the assumption of interactions between different cell types. There are no specific preliminary results._

_**Overall evaluation:** There are a lot of problems with the proposal; the lack of specific preliminary data/results is one._

**Response:**

1) Our goal is to describe the immune landscape in the tumor microenvironment, the interactions between immune modules, and their impact on patient prognosis and disease progression. We cannot claim that all aspects of tumor biology depend on it, but it is currently believed that the long-term effect of any cancer treatment depends both on the state of the immune infiltrate in the tumor and on the general state of the immune system.

2) In addition to already published results on distinguishing components of the tumor microenvironment by ICA (Biton et al., 2014), we provide preliminary results showing that ICA is able to separate signals mixed in bulk cancer samples into meaningful immune cell groups.
To provide a more accurate validation, we are working on a simulated dataset, and we would need access to the data we requested in the challenge in order to tune the deconvolution parameters and to compare the methods objectively. This follows from the fact that our model is data-driven and cannot be properly validated without accurate data.

3) Just as the reviewer describes our approach, most of the existing and successful methods for deconvoluting cell mixtures are based on established statistical methods (such as linear regression, which has been known for more than 130 years). Applying ICA involves tuning a number of its parameters, and there are a number of competing protocols for applying ICA that need to be assessed objectively. The essence of our application is to compare existing supervised approaches (based on pre-existing definitions of cell type biomarkers) with blind deconvolution-based methods, which have never before been applied to the task of this project. The datasets we ask to generate will make it possible to judge which methodology is more promising.

4) Our assumption is that interactions between cell types might distort individual cell profiles, but that these distortions will remain relatively minor perturbations. Therefore, by analyzing the deviations of the components estimated in cell co-cultures from the components derived from pure cell cultures, we will be able to estimate the nature of the cell-cell interactions and the magnitude of the cell interaction effect.

_The unfamiliarity of the submitter with some of the recent mixture decomposition methods (e.g. work from the Russell Schwartz group) is another._

5) Thank you for pointing out the work of this group, which is very interesting and with which we are indeed familiar.
However, we should underline that, despite a similar mathematical formulation of the problem (deconvolution), it is based on a different data type (such as copy number variation data) and addresses a completely different biological problem (clonal decomposition of a tumor). We now include the reference in the proposal.

_How the main hypothesis that the behavior of the tumor as a whole critically depends on the organization of different modules (including different types of cells) will be tested, and how this dataset will be helpful in achieving this goal, is not properly described._

6) Data generated in the challenge will be used for predicting the outcome of treatment based on existing prospective studies. We do not aim to explicitly prove that this will be the main determinant of cancer treatment; however, in the context of immune therapies, the state of the immune system within the tumor is critical for the long-term effect of the cancer treatment. This point of view is shared by many prominent researchers in the field.

_Still, the goal of the proposal aside, the test set should be helpful to the community._

Created by Chloé-Agathe Azencott caz
###Anonymous Review 3 and Authors Response

_**Impact:** medium._

_**Feasibility:** medium._

_**Overall evaluation:** This proposal aims to collect benchmark data that would allow building and evaluating models that decompose cell-averaged data containing different cell types into the underlying mixture components and mixing weights._

_The context of the proposal is cancer and its microenvironment. Overall, it is an important task to know what a tumor sample is composed of, and how e.g. cell type composition might change over the course of a treatment. To this end, the authors propose to generate simulated data as well as to experimentally control the mixing process and generate bulk and single cell data from samples with known proportions._

_This is overall a nice idea but leaves open questions in a number of places:_

_1) The sequencing costs are likely over-estimated by a factor of 5 at the least. Depending on technology (e.g. drop-seq), hundreds of cells can be pooled on a single, say, HiSeq run. Unless the applicants can show that they need exceptional sensitivity and quantification of thousands of mRNAs that would necessitate unusually deep sequencing, the costs are way too high._

**Response:** We value the reviewer's input in this matter, and we agree that the cost of sequencing (especially single cell) can be significantly lower. We do not need exceptional sensitivity or depth of coverage for our task. However, quantifying the expression of all mRNAs is necessary, because we do not select genes beforehand and the model is based on blind deconvolution; we would need 10-20k genes to be profiled.

_2) Given the large number of single cell studies appearing by the day now, the question is whether data such as proposed here still needs to be collected or is already out there.
If hundreds of cells are drawn i.i.d. in a given study, wouldn't you only need to sequence the bulk mixed sample, as you already have the individual expression data and the cell type proportions in the single-cell data?_

**Response:** It is true that single-cell sequencing is a great technology that will shed new light on the composition of the tumor microenvironment. However, today there is only one publicly available single-cell transcriptomics dataset representing the tumor microenvironment in its full complexity (Tirosh et al., 2016), and it has no coupled bulk data. Single-cell sequencing by itself does not provide a reliable estimate of cell type proportions (as one can see from the same Tirosh et al. paper). In the near future, due to cost limitations, most research groups will not be able to generate single-cell data of considerable size (>100 cells) for many cancer types. Even more importantly, in clinical applications the state of the art remains the transcriptomics of bulk tumor, and we estimate that this tendency will last for at least the next 5 years. As a matter of fact, all prospective clinical studies contain only bulk data, mostly from a single location in the tumor sample. In our project, we expressed our need to have transcriptomics data coupled to FACS data in order to validate our blind deconvolution-based method. This method would be used to quantify the immune infiltration profile from already profiled bulk tumor samples, which remain the richest and largest source of data today.

_3) ICA is a reasonable approach but doesn't really solve the problem in a more principled way than the other approaches mentioned; you still need to set or guesstimate the number of components. Possibly, (nonparametric) Bayesian approaches could provide a more formal way to address this._

**Response:** Thank you for the suggestion. Indeed, our lab uses stability-based criteria to define the optimal number of components.
This method is described in more detail in J. Himberg and A. Hyvärinen. Icasso: software for investigating the reliability of ICA estimates by clustering and visualization. In _Proc. 2003 IEEE Workshop on Neural Networks for Signal Processing (NNSP2003)_, pp. 259-268, Toulouse, France, 2003. In addition, our recent (unpublished) study suggests that determining the number of components is not the critical part of the methodology, because computing more components (within some reasonably wide range) does not degrade the definition of the most stable components and, even more importantly, does not degrade their biological interpretation.

_4) The part on environment interactions needs some more detail -- do you expect e.g. nonlinear interactions based on different proportions of cell types? How would ICA be able to deal with this?_

**Response:** We agree with the reviewer that this part is not described in sufficient detail in the proposal. Initially, we will consider only linear interactions, as ICA is a linear method. With the suggested data on co-cultured cell lines, we will be able to estimate the deviation of mixture-derived components from components based on pure cell cultures, which will allow us to quantify interactions between cell types.
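
As a minimal sketch of the comparison described in the last response, and not the authors' actual pipeline, the Python snippet below assumes that a co-culture-derived component and its pure-culture reference are already available as gene-to-weight mappings on a comparable scale. The function names `pearson` and `component_deviation`, the choice of Pearson correlation as the similarity measure, the gene list, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def component_deviation(pure, mixed, top=1):
    """Compare a co-culture-derived component with a pure-culture reference.

    `pure` and `mixed` map gene name -> component weight (hypothetical inputs,
    assumed to be on a comparable scale). ICA components have an arbitrary
    sign, so the mixed component is flipped when it is anti-correlated with
    the reference before per-gene differences are computed.
    Returns (absolute correlation, genes with the largest absolute shift).
    """
    genes = sorted(set(pure) & set(mixed))  # only genes present in both
    p = [pure[g] for g in genes]
    m = [mixed[g] for g in genes]
    r = pearson(p, m)
    sign = -1.0 if r < 0 else 1.0
    diffs = {g: sign * mixed[g] - pure[g] for g in genes}
    shifted = sorted(diffs, key=lambda g: abs(diffs[g]), reverse=True)[:top]
    return abs(r), shifted


# Illustrative values only; these are not measurements from any dataset.
pure = {"CD3E": 2.0, "CD8A": 1.5, "GZMB": 0.5, "ACTB": 0.0, "GAPDH": 0.1}
mixed = {"CD3E": -1.9, "CD8A": -1.6, "GZMB": -1.4, "ACTB": 0.0, "GAPDH": -0.1}

similarity, most_shifted = component_deviation(pure, mixed, top=1)
```

In this toy case the sign-corrected component remains strongly correlated with its reference, while `GZMB` carries the largest shift. Under the linear assumption stated above, a small number of genes with large shifts against an otherwise well-preserved component would be one possible signature of a cell-cell interaction effect.
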
###Anonymous Review 2 and Authors Response

_**Impact:** Understanding the expression of specific cell type populations within complex tissues is an important problem. With the increasing use of single cell RNA-seq, it is less clear how important computational deconvolution of bulk RNA-seq will be in the future relative to experimentally generating single cell RNA-seq data._

_**Feasibility:** Feasibility was mostly addressed. For the synthetic data in 2.1, the criteria for deciding whether to collect new data or just use public data are not clear. For 2.3, in quantifying cell type proportions with FACS, it is not clear that markers will be available for all cell subtypes, or even that all cell subtypes to profile would be known._

_**Overall evaluation:** In the proposal "Computational Deconvolution of Cell and Environment Specific Signals in Tumor Environment and Their Interactions from Complex Mixtures in Biological Samples" it is proposed to investigate computational approaches to deconvolution of gene expression. Three types of data are proposed to be generated for evaluation of such approaches:_

_(1) Synthetic data generated based on single cell transcriptomics_

_(2) Transcript profiling of controlled mixtures of co-cultured immune-related cells_

_(3) Transcript profiling of tumor profiles and cell type proportions quantified with FACS_

_The authors have developed an ICA-based method for this problem, which they will evaluate._

_Deconvolution methods in general have been difficult to evaluate because of a lack of gold standards. This work will create resources to understand the performance of methods._

_It is not clear how important computational deconvolution of RNA-seq samples will be in the future with the continued increasing use of single cell-RNA seq data. For the third collection it is not clear what the relative benefits are of FACS vs.
single cell-RNA-seq for estimating cell type proportions._

_For collection aim 2.1, the criteria to decide whether new single cell RNA-seq experiments should be conducted or public data should be used are not clear._

_For the problem of estimating potentially unknown cell types, and not just the proportion of each cell type, comparing methods could become difficult, since the problem could become too unconstrained without a single metric that adequately summarizes the relative performance of methods._

**Response:**

1) As in our response to the comment of Reviewer 1: it is true that single-cell sequencing is a great technology that will shed new light on the composition of the tumor microenvironment. However, right now there is only one publicly available single-cell transcriptomic dataset (Tirosh et al.), and there is no bulk data coupled with it. Moreover, single-cell technology in its present state does not allow cell type proportions to be estimated explicitly. Due to cost limitations, many research groups will not be able, at least in the near future, to generate single-cell data of considerable size (>100 cells) for many cancer types. Even more importantly, in clinical practice the state of the art remains the transcriptomics of bulk tumor, and we estimate that this tendency will last for at least the next 5 years. As a matter of fact, all prospective clinical studies contain only bulk data, mostly from a single location in the tumor sample. In our project, we expressed our need to have transcriptomics data coupled to FACS data in order to validate our blind deconvolution-based method. This method would be used to quantify the immune infiltration profile from already profiled bulk tumor samples, which remain the richest and largest source of data today.

2) The advantage of FACS is that it provides the cell proportion information and that it has a lower cost than single-cell sequencing.
Moreover, to generate correct synthetic data we need to estimate certain parameters, i.e. the correlation structure between the presence of different cell types, which is not easy to find in existing public datasets. The use of single-cell data lies in defining specific signatures of various cell types in the context of the tumor. If we understood your remark correctly, FACS has its limitations and, moreover, deconvolution may not be able to detect minor subtypes; we will therefore focus on the major actors of immune infiltration at first.

3) We focus on the major components of the tumor microenvironment (TME), which are relatively well known. We cannot exclude that we will identify cells in an unusual or unknown state, but our preliminary study shows that some components have a clear biological interpretation.

4) As rightly pointed out, there is no gold standard dataset available right now for an objective comparison of algorithms and models of TME deconvolution. The point of this proposal is to provide, to us and to the community, a dataset that will for some time serve as a gold standard for benchmarking the competing methods. We believe that our method, being unsupervised (and not based on pre-existing definitions of molecular states for distinct cell types), would differ from other methods in its better reproducibility and practicality of application.

_There were typos throughout the proposal._

5) We are sorry for this. We hope many of the typos have been corrected now.
