Looking at the pilot data, it seems you have 2 files - crosswalk and image metadata. But, for SC1, I believe, we will get access to only the image metadata. But will this single file then also contain the labels (cancer or no)? Currently this is not the case and one has to map between the two files as shown in the example preprocessing files. Can you please clarify this?
Created by Amrit Krishnan amrit110
> If the training data is exactly the same for both SC1 and SC2, can we use the clinical information, which is supposed to be only used for SC2, to make predictions in SC1? If yes, what's the difference between SC1 and SC2?
The _scoring phase_ is different in SC1 and SC2. Also, the input data that will be given to your inference method in that phase are different (see [Challenge Dictionary](https://www.synapse.org/#!Synapse:syn4224222/files/)).
> I'm wondering how you would provide invL and invR for previous exams. Based on my reading of the Wiki, it clearly says: "All previous exams correspond to Negative exams both in left and right breasts." Therefore, both invL and invR shall be negative by definition for all previous exams.
That's correct. Technically, these two columns are included in the table, and you will find that their value is always 0 for prior exams. It is up to you whether you hard-code in your method the statement "All previous exams correspond to Negative exams both in left and right breasts." or let your method infer it from the values in _invL_ and _invR_. The general idea is that we still provide all the data that we have from past subjects (training set) and prior exams for a given subject (though here _invL_ and _invR_ don't provide more information than the statement that we have made).
Thomas,
I'm quite confused with your answers:
1. If the training data is exactly the same for both SC1 and SC2, can we use the clinical information, which is supposed to be only used for SC2, to make predictions in SC1? If yes, what's the difference between SC1 and SC2?
2. I'm wondering how you would provide invL and invR for previous exams. Based on my reading of the Wiki, it clearly says: "All previous exams correspond to Negative exams both in left and right breasts." Therefore, both invL and invR shall be negative by definition for all previous exams.
> does this mean we would have to filter in our preprocessing/data loading depending on which SC we wish to train our model for?
The training dataset exposed to your training Docker container when submitted to SC1 and SC2 is the same. We provide all the data/exams that we have for the subjects in the training set as well as the exams metadata table (see [Challenge Dictionary](https://www.synapse.org/#!Synapse:syn4224222/files/)). When training your method in SC1, you can choose which data you want to use from the wealth of information that has been collected from "past" subjects. We decided to provide you with all the information about the "past" subjects to give you the freedom to choose which data you want to harness. As you have correctly observed, participants who develop machine learning-based methods may decide to use only the data/covariates that are exposed to their trained method during the scoring submission (see Challenge Dictionary).
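For participants who do want to filter during data loading, here is a minimal sketch of one way to do it. The column names other than `invL` and `invR` (`subjectId`, `examIndex`, `age`, `cancerL`) and the per-sub-challenge column sets are assumptions made for illustration only; the Challenge Dictionary is the authoritative list of what is exposed in each scoring phase.

```python
# Hypothetical sets of covariates exposed at scoring time.
# These are placeholders: check the Challenge Dictionary for the real lists.
SCORING_COLUMNS = {
    "SC1": {"subjectId", "examIndex"},
    "SC2": {"subjectId", "examIndex", "age", "invL", "invR"},
}


def restrict_to_scoring_covariates(rows, sub_challenge):
    """Keep only the columns assumed to be available at scoring time.

    rows: list of dicts, one per exam (e.g. as read by csv.DictReader).
    sub_challenge: "SC1" or "SC2".
    """
    allowed = SCORING_COLUMNS[sub_challenge]
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


# Toy row, not real challenge data.
training_rows = [
    {"subjectId": "s1", "examIndex": "1", "age": "52",
     "invL": "0", "invR": "0", "cancerL": "0"},
]
sc1_rows = restrict_to_scoring_covariates(training_rows, "SC1")
sc2_rows = restrict_to_scoring_covariates(training_rows, "SC2")
```

Labels such as the hypothetical `cancerL` are dropped here on purpose, since they would be handled separately as training targets rather than as model inputs.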
> Could you also tell if 'invL' and 'invR' will or will not be used as inputs to the prediction model for sub-challenge 2 please?
I'm assuming here that you are referring to the input of the trained model when submitted for scoring. The covariates _invL_ and _invR_ will be provided for all the prior exams of a given subject but not for the exam for which you will make a prediction (see [Challenge Dictionary](https://www.synapse.org/#!Synapse:syn4224222/files/)).
@tschaffter
Could you also tell if 'invL' and 'invR' will or will not be used as inputs to the prediction model for sub-challenge 2 please?
Yes, sorry I was talking about the training set. So if it's the same for SC1 and SC2, does this mean we would have to filter in our preprocessing/data loading depending on which SC we wish to train our model for?
> Looking at the pilot data, it seems you have 2 files - crosswalk and image metadata.
The two files are the _exams metadata_ and the _images crosswalk_ files.
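As a rough illustration of the mapping between the two files, the sketch below attaches a label to each image by looking up its exam. It assumes tab-separated files and hypothetical column names (`subjectId`, `examIndex`, `laterality`, `filename` in the crosswalk; `cancerL`, `cancerR` in the exams metadata); the inline data is a toy example, not challenge data, and the example preprocessing files remain the reference.

```python
import csv
import io

# Toy stand-ins for the two files (hypothetical columns, made-up values).
CROSSWALK_TSV = (
    "subjectId\texamIndex\tlaterality\tfilename\n"
    "s1\t1\tL\ta.dcm\n"
    "s1\t1\tR\tb.dcm\n"
    "s2\t1\tL\tc.dcm\n"
)
METADATA_TSV = (
    "subjectId\texamIndex\tcancerL\tcancerR\n"
    "s1\t1\t0\t1\n"
    "s2\t1\t0\t0\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def label_images(crosswalk_rows, metadata_rows):
    """Return (filename, label) pairs by joining on (subjectId, examIndex)."""
    exams = {(r["subjectId"], r["examIndex"]): r for r in metadata_rows}
    labeled = []
    for img in crosswalk_rows:
        exam = exams.get((img["subjectId"], img["examIndex"]))
        if exam is None:
            continue  # no metadata row for this image; skipped in this sketch
        # Pick the label for the breast shown in the image (L or R).
        label = int(exam["cancer" + img["laterality"]])
        labeled.append((img["filename"], label))
    return labeled


labeled = label_images(read_tsv(CROSSWALK_TSV), read_tsv(METADATA_TSV))
```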
> SC1, I believe, we will get access to only the image metadata.
UPDATED ANSWER: You are not mentioning it but I assume that you are talking about the training set. The training set is the same for SC1 and SC2, for which you are given access to the exams metadata.