**[Full text of proposal](https://www.synapse.org/Portal/filehandle?ownerId=syn5659209&ownerType=ENTITY&xsrfToken=1EA1466FCA55F7EAE33833333900F1BC&fileName=Idea4.pdf&preview=false&wikiId=414654)**

The authors thank the reviewers for their constructive comments and for the time spent reviewing this proposal.

###Anonymous Review 1 and Authors' Response

_**Impact:** This proposal proposes to use computational phenotyping based on language samples and cognitive assessments to pre-screen the population for abnormal FMR1 variations. The studied problem is important for public health._

_**Feasibility:** There are a number of concerns with the proposed outline and statistical analysis, which are detailed below, and which lead one to doubt whether the project can be successfully executed._

_From what I understand, the authors have a sample of 100 women/mothers carrying the mutation, and a sample of 100 mothers without the mutation, as mentioned in Section 4.1. However, the following questions arise:_

_1. It is unclear what Table 1 is showing; is the purpose to show that the two samples (FX carriers/controls) are not significantly different in terms of age and education? This needs to be clarified in the text._

**Response:** The FX premutation carriers and the comparison group were matched on age and education. We have clarified this point on page 4.

_2. Also, the t-test is not appropriate for ordinal data, and so should not be used on the "education" data in Table 1. The authors should consider using another test, such as the Mann-Whitney U test (also known as the Mann-Whitney-Wilcoxon test); implementations of this test are widely available (R, Python, SAS, MATLAB, etc.)._

**Response:** Agreed. We have performed the Mann-Whitney-Wilcoxon test in R. The samples used in the preliminary analysis were matched on education.

_3. What is then the "required data" in Section 2.1? Do the authors plan to collect a new data set?
How large will this data set be (since the incidence rates of 1/151 for women and 1/468 for men are so low)?_

**Response:** We are planning to collect data from 20000 participants.

_4. One very unclear thing about the preliminary results is how many samples the authors currently have. They only show preliminary results of the classification on 200 samples. Will they use only these samples to build the classifier to screen hundreds of thousands of people? Table 3 is also confusing: how do they evaluate on the entire US population?_

**Response:** Currently, we have collected language samples from about one thousand participants. In many cases, data were collected at multiple time points, which enriches our current dataset to approximately 5000 speech samples. We are also collecting language samples from 300 more participants, including both male and female cases, with and without the premutation. We have collected genetic data from these participants, and we are planning to use these speech samples to train and test the mobile app before releasing it to the public. Data collection is ongoing; therefore, we have not used these data in the current preliminary results.

Table 3 presents an estimate of the cost-effectiveness of the proposed method for screening the population. Given the prevalence of the premutation (1 in 151 women), identifying 1000 premutation carriers requires performing genetic tests on 151,000 individuals. However, if we prescreen the population with our proposed method, we only need to perform genetic tests on 37,145 individuals.

_5. This model has been tested on women/mothers mostly above 40 (as suggested in Table 1). However, it is mentioned that the purpose of the project is to develop a tool in order to prescreen potential mothers, who are (or at least include) younger women.
This leads to the following two (sub)questions:_

_a) Is the current plan to use the model developed on the current data set (older women) to train the model, and then test on all women (younger and older)? If so, this may not work because the model on older women may not generalise to women of all ages._

_b) If (a) is not the plan, then we go back to point (1) - how do you plan to obtain a data set with young women who carry the mutation? Eventually, if you obtain a large enough data set, will you then stratify your model according to age (younger women and older women have different cognitive tests)?_

**Response:** As correctly noted by the reviewer, there is a correlation between cognitive decline and age in FX premutation carriers (Sterling et al., _Brain Cognition_, 2013). Almost all research studies on the phenotypic profiling of FX premutation carriers are based on data collected from clinically ascertained participants. In most cases, FX premutation carriers are identified because they have a child with the full mutation. Therefore, nearly all studies to date, including this one, are biased towards older FX premutation carriers. To address this issue, we propose to use the mobile app to reach a larger group of participants. As mentioned in our preliminary analysis, we are planning to match the target and control groups on several important features, including age and education level.

_6. In the preliminary results, 100 FX premutation carrier mothers are compared to 100 mothers of individuals with autism spectrum disorder. Why not compare to healthy mothers? What would the results look like?_

**Response:** Currently, we are collecting new data from more than 300 participants, including healthy mothers. We have compared the linguistic profiles of FX premutation carriers with those of healthy mothers as well as mothers of children with schizophrenia.
Our analysis showed that FX premutation carriers have more dysfluencies in language and tend to use more short utterances compared to both comparison groups. Our data collection process is still ongoing. Therefore, we have not included the data from healthy mothers at this stage of the proposal.

_7. The number of cases and controls in the preliminary results is balanced, but in real life, there are way more controls than cases. While reporting TPR and FPR, this should be considered._

**Response:** Agreed. In case of obtaining the larger population-based data, we are planning to use a cost function to reflect the size differences.

_8. In Section 4.1, the third paragraph of page 5, we have the following: "We then proceeded to use a random forest classifier in order to evaluate the efficiency of the reduced and optimized profiles (Fig. 5B)." I assume this is a typo and the authors mean "Figure 4B" instead of "Figure 5B"? In any event, this paragraph proceeds to discuss F1 scores. However, there is no table of F1 scores. If Figures 4A and 4B were used, then which parameter set(s) (with the particular true positive and false positive rates, since presumably a range of parameter sets were used to obtain the true/false positive rates in Figure 4) were used to obtain the F1 scores? This is extremely unclear, and needs clarification._

**Response:** We thank the reviewer for highlighting this point. We have corrected the figure number, and we have included tables with accuracy measures, including sensitivity, specificity, F1 score and AUC, in the revised proposal.

_9. Paragraph 7 on page 4 (Section 4.1) is the following: "We performed independent two-sample t-tests, to evaluate group differences between FX premutation carrier and comparison group samples. A p-value of less than 0.05 was established for statistical significance.
When comparing the groups, we found that more dysfluency features occur in FX premutation carriers (11.15) than in the comparison group (5.38)." Where and how are these values (11.15 and 5.38) computed? The authors have provided Table 1, which is not very helpful (and contains p-values which are not significant), but do not provide a table showing these (more important) results. Clarification is needed, and the authors should include a table that supports these results._

**Response:** These numbers represent the average number of dysfluencies in FX premutation carriers and in the comparison group. We have added Table 1 with these results.

_10. It is unclear whether the proposed strategy can be realized without a clinical partner, as the real use of the developed prescreening may require a clinical study._

**Response:** We have a strong collaboration with [Marshfield Clinic](https://www.marshfieldclinic.org) for our FX premutation study.
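Regarding comment 2 above (the Mann-Whitney U test for the ordinal "education" variable), the following is a minimal illustrative sketch in Python. The authors report running the test in R; this is not their analysis. The education codes are hypothetical, and the function uses the normal approximation with a tie correction and no continuity correction, so its p-values can differ slightly from those of a statistics package.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold tied values; they share the average of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation is identical
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value

# Hypothetical ordinal education codes (1 = high school ... 4 = graduate degree).
carriers = [1, 2, 2, 3, 3, 3, 4, 4]
comparison = [1, 2, 3, 3, 3, 4, 4, 4]
u, p = mann_whitney_u(carriers, comparison)
print(f"U = {u:.1f}, p = {p:.3f}")
```

Because the test works on ranks rather than raw values, it does not assume that the distance between education levels is meaningful, which is the reviewer's objection to the t-test.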

Created by Chloé-Agathe Azencott caz
Dean, thank you for highlighting this point. Performing follow-up genetic tests depends on our ability to obtain IRB approval. Our strategy at this point is to rely on self-reported data. Prior to collecting speech samples, we will ask the participants to answer a few questions about the history of fragile X in their families. All the participants are clinically ascertained cases. Therefore, all 100 cases are known to carry the FMR1 premutation. Genetic data were available for all of these cases. As we have mentioned in our proposal, impairment in language occurs in a variety of clinical syndromes. We believe collecting language samples has the potential to facilitate further research on various neurocognitive disorders.
Xiao, thank you for your comments. Currently we are collecting data from more than 3300 participants. Together with the data collected in our previous studies, we will have more than 4000 samples with genetic data. We are planning to use them for training. Table 3 is an estimate of how computational phenotyping would reduce the cost of population screening. We understand your concern, and we will try to address this issue using the data collected in our current study.
My main concern is the data collection and experimental design, which is critical for the feasibility of the study. One of the questions/responses was the following:

> 3. What is then the "required data" in Section 2.1? Do the authors plan to collect a new data set? How large will this data set be (since the incidence rates of 1/151 for women and 1/468 for men are so low)?
>
> Response: We are planning to collect data from 20000 participants.

If data from 20000 participants are collected, that would work out to around 132 cases for women (if all 20000 were women). This is not a very large sample for case/control studies. How will any "hits" (positive classifications) be checked? Will there be a follow-up genetic study with these patients?

Also, in Section 2.1 is the following:

> Task 3: Testing of prototype on target group
>
> We will select 100 participants to test the prototype. The cost of recruitment and genetic test will be reflected in the budget.

Again, if only 100 participants are selected, and the probability of any one of them carrying the mutation is 1/151, there is a strong chance that none of these test patients will be carriers of the mutation, unless all of these 100 participants are known to carry the mutation (but this was not mentioned in the proposal).

Overall, I think the idea is good, but am just concerned that the experimental design has not been sufficiently thought through, and that the desired results cannot be obtained through this experimental design.
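The two figures in this comment can be checked with a short calculation. The sketch below assumes participants are sampled independently from a population of women with a carrier rate of 1/151; that sampling model is an assumption for illustration and does not apply if participants are recruited as known carriers.

```python
def expected_carriers(n_participants, prevalence):
    """Expected number of carriers among n randomly sampled participants."""
    return n_participants * prevalence

def prob_no_carriers(n_participants, prevalence):
    """Probability that a random sample of n participants contains no carrier."""
    return (1 - prevalence) ** n_participants

prevalence_women = 1 / 151  # carrier rate for women quoted in the review

print(f"expected carriers among 20000 women: {expected_carriers(20000, prevalence_women):.1f}")
print(f"P(no carrier among 100 women): {prob_no_carriers(100, prevalence_women):.2f}")
```

Under these assumptions, 20000 women would include about 132 carriers on average, and a random test group of 100 would contain no carrier at all roughly half of the time.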
> 3. What is then the "required data" in Section 2.1? Do the authors plan to collect a new data set? How large will this data set be (since the incidence rates of 1/151 for women and 1/468 for men are so low)?
>
> Response: We are planning to collect data from 20000 participants.

How many of them will be collected with genotype information and can be used for training?

> 4. One very unclear thing about the preliminary results is how many samples the authors currently have. They only show preliminary results of the classification on 200 samples. Will they use only these samples to build the classifier to screen hundreds of thousands of people? Table 3 is also confusing: how do they evaluate on the entire US population?
>
> Response: Currently, we have collected language samples from about one thousand participants. In many cases, data were collected at multiple time points, which enriches our current dataset to approximately 5000 speech samples. We are also collecting language samples from 300 more participants, including both male and female cases, with and without the premutation. We have collected genetic data from these participants, and we are planning to use these speech samples to train and test the mobile app before releasing it to the public. Data collection is ongoing; therefore, we have not used these data in the current preliminary results.

It seems that only 300 patients have genetic data. Is that correct? If so, the training data is still far too small for the purpose of the project.

> 7. The number of cases and controls in the preliminary results is balanced, but in real life, there are way more controls than cases. While reporting TPR and FPR, this should be considered.
>
> Response: Agreed. In case of obtaining the larger population-based data, we are planning to use a cost function to reflect the size differences.

The problem is not on the technique side, but on the data side. Again, the authors do not have healthy samples with genetic data.
###Anonymous Review 1 and Authors' Response

_**Impact:** The goal of the proposal is noble: accurate screening of a diverse population of individuals for FMR1 mutations at very low cost to the individual (e.g. just needs an app installed on the phone) is of high theoretical impact (accurate screening tests are always welcome), but of low practical impact (see feasibility below). Whether a "substantial scientific discovery" (one of the criteria listed above for proposals) will be made even if the proposal is successful is unclear -- it seems like the features are hand-engineered in the proposal and no discussion is made of unsupervised learning approaches (likely because the training data is very small)._

_**Feasibility:** Here I was confused while reading the proposal. From what I understood, to test the feasibility of building a computational phenotyping method for predicting FMR1 mutation status, a group of 100 FX premutation carrier mothers of adolescent and adult children with full mutation FXS were compared to an age- and education-matched comparison group of 100 mothers of individuals with autism spectrum disorder. A classifier was built and tested with cross validation (Fig. 4; there was mention of a Fig. 5 in the text, but I didn't see it in the PDF). On the other hand, Table 3 seems to indicate the classifier was tested on a cohort of millions of individuals -- are these individuals from the wider US population-based studies conducted by UW-M? If so, were their 5 minute audio clips also parsed in the same way, etc.? Why then wasn't the larger dataset used to train the model?_

**Response:** As correctly mentioned by the reviewer, we built the model using a group of 100 FX premutation carrier mothers of adolescent and adult children with the full mutation and an age- and education-matched comparison group of 100 mothers of individuals with autism spectrum disorder.
Table 3 shows an estimate of how the proposed method would decrease the resources needed for population screening. We have clarified this in the revised proposal. Currently, we are collecting data from a broader range of participants, including both male and female cases, with and without the premutation. Data collection is not complete, and therefore we have not used these data in the preliminary results reported in this proposal. We have also corrected the figure number in the revised proposal.

_So I am highly unconvinced that a classifier built using just 100 FX premutation carriers and 100 controls from mothers of ASD patients will generalize well to the diverse populations the proposal is aiming to target. How can one claim the 100 controls used are representative of the 'normal' population? How often do other intellectual disorders (besides ASD) get misclassified as FMR1 mutation positive? 100 cases/controls seems far too few for all of the possible ways in which you could imagine a five minute, uninterrupted blurb might be parsed into features (e.g. there seems to be a massive potential for overfitting the small amount of data). The proposal doesn't seem to include any plans to build a classifier with more data, even though this would seem like the most important improvement needed. The PPV of 2.69% for the current model seems incredibly low -- from a practical standpoint, who will pay for further genetic testing when the classifier PPV is so low?_

_As I understand the proposal, the budget will be used to develop an app and test on 100 individuals, before releasing the app to the public. It's unclear whether testing on 100 individuals, even if they are selected from diverse backgrounds, would be convincing that it will be an accurate tool for the general public. Furthermore, the poor performance in the preliminary results suggests a lot more work (or more likely, training data) is needed to build a reasonable model.
What are the plans for expanding the training data?_

**Response:** We agree with the reviewer that 100 participants are not representative of the normal population and that further training is needed to develop a more robust and accurate model. We believe language dysfluency is an important hallmark of cognitive decline and has the potential to operate as a phenotype to identify FX premutation carriers. As we have mentioned in response to comments 4 and 6 from reviewer 1, we are currently collecting data from more participants, including males and females without the premutation. We also have access to speech samples from 700 families affected by other neurodevelopmental disorders such as schizophrenia and autism spectrum disorder.

_**Overall evaluation:** In conclusion, while the study is well motivated, and the proposed plan is feasible to conduct (it really seems to involve just app design, implementation, genetic and phenotypic testing of 100 individuals and releasing), I'm not convinced at all it will be a useful tool to the wider public because the training data is so very, very, very small for such a complex problem (and the proposal does not mention any plans to greatly expand the training data), and because of the poor initial results._
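The concern about the low PPV, and the earlier comment that cases and controls are balanced in the preliminary data but not in real life, both come down to how PPV depends on prevalence. The sketch below applies Bayes' rule; the sensitivity and specificity values are hypothetical placeholders, not figures from the proposal.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = P(carrier | classifier flags the person), by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical classifier performance, for illustration only.
sens, spec = 0.90, 0.80

balanced = positive_predictive_value(sens, spec, 0.5)        # 100 cases vs 100 controls
population = positive_predictive_value(sens, spec, 1 / 151)  # carrier rate among women

print(f"PPV in a balanced sample: {balanced:.1%}")
print(f"PPV at 1/151 prevalence:  {population:.1%}")
```

With the same classifier, the PPV falls from about 82% in a balanced sample to about 3% at a prevalence of 1/151, which is why rates measured on a balanced case/control set cannot be read directly as population-level screening performance.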
###Anonymous Review 2 and Authors' Response

_**Impact:** low._

_**Feasibility:** might work but some details need to be fleshed out_

_**Overall evaluation:** The idea behind this proposal is that genetic tests for fragile X syndrome (FXS) are expensive/time-consuming and that it's possible and more cost effective to collect phenotypic data and use ML techniques to pre-screen individuals for more targeted genotyping. The phenotypic data consists of five-minute language samples and questionnaires collected through an Apple HealthKit app. The preliminary results are based on the application of simple ML techniques (random forests, feature selection via information gain)._

_I think this proposal points out an interesting correlation between easily observable phenotypes and genetic information. My concern is that the field is moving towards a reduction in genotyping costs even for clinical-grade tests and as a result this project might not be relevant other than in the short term. There are already a series of companies that provide targeted clinical genotyping for relatively cheap and Illumina offers clinical WGS 30x for around $5000, roughly the same cost as a brain MRI. As a result, while I find the idea intriguing, I don't think this is a particularly future-proof project._

**Response:** We thank the reviewer for highlighting this point. As correctly noted, the cost of genotyping has decreased in recent years. However, we believe prescreening the population will reduce the cost significantly. As mentioned in the response to comment 4 of the first reviewer, the proposed method is estimated to lead to a four-fold decrease in the resources needed to detect FX premutation status with follow-on genetic testing. Impairments in language occur in a variety of clinical syndromes, including neurocognitive disorders (e.g. Parkinson's disease and Alzheimer's disease). Our framework can potentially be customized to further study these conditions.
_Comments/questions about the ML analysis:_

_- you use cross-validation to pick the model, but did you evaluate performance of the final model on a completely held out test set? Without that, you might risk overfitting to the test set._

**Response:** We agree with the reviewer and performed additional analysis with new external samples. We were able to classify all FX premutation carriers correctly, with misclassification of only two participants in the comparison group, resulting in an overall F1 score of 0.91 and AUC of 0.97. We have included this analysis in the revised proposal. The result provides strong evidence for the phenotypic value of linguistic and cognitive profiles in identifying FX premutation carriers.

_- did you perform feature selection on the whole dataset or on each CV split? Again, overfitting is a concern._

**Response:** We have performed feature selection on the whole dataset as well as on each CV split. The results were similar, but to avoid the possibility of over-fitting to the training values, we have reported and used in-fold feature selection in our proposal.
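To make the distinction in the last response concrete, the sketch below shows in-fold feature selection: features are chosen using only the training portion of each cross-validation split, so the held-out samples never influence which features are kept. It is illustrative only and not the authors' pipeline. A mean-difference score and a nearest-centroid classifier stand in for the information-gain selection and random forest described in the proposal, and the data are synthetic.

```python
import random
from statistics import mean

def select_features(rows, labels, k):
    """Rank features by absolute difference of class means; keep the top k."""
    scores = []
    for f in range(len(rows[0])):
        pos = [r[f] for r, y in zip(rows, labels) if y == 1]
        neg = [r[f] for r, y in zip(rows, labels) if y == 0]
        scores.append((abs(mean(pos) - mean(neg)), f))
    return [f for _, f in sorted(scores, reverse=True)[:k]]

def nearest_centroid_predict(train_rows, train_labels, feats, row):
    """Predict the class whose centroid (on the selected features) is closest."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        centroid = [mean(r[f] for r in members) for f in feats]
        dist = sum((row[f] - c) ** 2 for f, c in zip(feats, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_accuracy_in_fold_selection(rows, labels, n_folds=5, k=2, seed=0):
    """k-fold cross-validation with feature selection repeated inside each fold."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        tr_rows = [rows[i] for i in train_idx]
        tr_labels = [labels[i] for i in train_idx]
        # Selection sees the training fold only, never the held-out samples.
        feats = select_features(tr_rows, tr_labels, k)
        for i in test_idx:
            correct += nearest_centroid_predict(tr_rows, tr_labels, feats, rows[i]) == labels[i]
    return correct / len(rows)

# Synthetic data: feature 0 is informative, the other five are noise.
rng = random.Random(1)
labels = [i % 2 for i in range(40)]
rows = [[rng.gauss(3.0 * y, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(5)] for y in labels]
accuracy = cv_accuracy_in_fold_selection(rows, labels)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Selecting features once on the whole dataset and then cross-validating would let the held-out labels leak into the feature choice, which tends to inflate the reported accuracy, especially with only 200 samples.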
