That was a very useful webinar! Near the end, you discussed the HR_FLAG field and the position of those who left the study early, or who did not have events before the cut-off date. I'd like to understand this better.

As I understand it now, the training data includes all patients. However, to make predictions, we should only include patients who can be clearly identified as being in the High-Risk group, or not, because they did have an event before the cut-off date. I'm not quite clear, from the discussion, how we filter patients. Clearly, if the HR_FLAG is set, we know they are included and are in the group. However, if the HR_FLAG is not set, how do we distinguish between patients who are genuinely not High Risk and those who left the study early, or for whom there is no event before the cut-off date? I had thought it was the D_OS flag but, after the brief discussion on the webinar, I'm not so sure.

- If a patient has left the study, or there has been no event before cut-off, what will D_OS be set to?
- Is the D_PFS_FLAG relevant to this? Could there be patients who stay in the study and have HR_FLAG=0 and D_PFS_FLAG=0?

I'd be grateful for a clarification on this point. As I said, I thought I'd got the picture, but the webinar has left me uncertain. [I have read the discussion about this, but I'm still not sure how the HR_FLAG works with 'Censored' or 'Unknown' after reading that, and after the webinar.] It might be easier to put a definitive statement in the wiki, rather than just answering this here.

Created by Peter Brooks fustbariclation
Hi Mi, Submissions will be scored on the binary highriskflag in the tie-breaking metric (BAC). We do not expect half of the samples to be high risk. Published studies report between 15% and 30% of samples as high risk, so your approach to defining a cut-off might need to be slightly more sophisticated.
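As a rough illustration of the point about cut-offs, here is a minimal sketch that flags a fixed fraction of the highest-scoring samples instead of splitting at the median. The function name `high_risk_flags`, the toy scores, and the 20% fraction are all assumptions made for the example (20% simply sits inside the 15% to 30% range mentioned above); this is not the challenge's prescribed method.

```python
def high_risk_flags(scores, high_risk_fraction=0.2):
    """Flag the top `high_risk_fraction` of samples by predicted risk score.

    Hypothetical helper: the fraction is an illustrative assumption,
    not a value specified by the challenge.
    """
    n_high = round(len(scores) * high_risk_fraction)
    # Indices sorted from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n_high])
    return [i in flagged for i in range(len(scores))]


# Toy prediction scores for ten samples (made up for illustration).
scores = [0.91, 0.15, 0.40, 0.77, 0.05, 0.33, 0.62, 0.28, 0.50, 0.11]
flags = high_risk_flags(scores, high_risk_fraction=0.2)
print(flags)
```

In practice the fraction (or an equivalent score threshold) would more sensibly be tuned on the training data than fixed in advance.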
Dear Mike, Regarding the output result (predictionscore and highriskflag): how do you define the highriskflag? I read "disease progression (or death) within 18 months", so basically it is every patient with D_PFS_FLAG=1 within 18 months. I can simply compute the risk score and flag those I consider high risk, i.e. D_PFS_FLAG=1 within 18 months. Are we evaluated on the highriskflag? Thanks, Mi
Dear Peter,

I apologize for the delay in this response. I thought we had another forum post that answered this, but it did not quite suffice.

Some clarification: D_PFS_FLAG is the basis of the HR_FLAG.

* If a patient "left the study" early (i.e. before any event), D_OS will be 0, as will D_PFS.
* Indeed, a patient can have an HR_FLAG of 0 and a D_PFS_FLAG of 0.

A good way to think about the HR_FLAG is in terms of survival analysis metrics. Measures based on contingency tables, like balanced accuracy or F1 score, will not use any information from these CENSORED patients. If there were 50 patients and 5 had HR_FLAG censored, then the sum of all cells in a contingency table would be 45. Similarly, AUC will not actually use these samples. The integrated time-dependent AUC handles this censoring internally as the time "slices" move. This is not custom or unique to this challenge; it is the standard way these metrics work. For example, the R package [caret](http://topepo.github.io/caret/index.html) computes contingency tables with all data as input, but if you sum the cells you will notice that the total _N_ is decreased as described above.

In the training set, participants can do what they like with these samples (exclude, include, down-weight, etc.).

I think some of the confusion comes from the term "censored". All samples that lack an event are censored in the traditional survival analysis context. These samples are **additionally** censored before the time cut-off of interest.

Regards, Mike
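To make the 50-patient example concrete, here is a minimal sketch of how censored samples fall out of a contingency table. The labelling rule in `hr_flag` is an assumption pieced together from this thread (an event before the cut-off means high risk, follow-up past the cut-off means not high risk, anything else is censored), as are the cut-off of 18 x 30.5 days, the assumption that D_PFS is measured in days, and all of the toy data. It is not the organisers' scoring code.

```python
from collections import Counter

# Assumption: D_PFS is in days and the cut-off is roughly 18 months.
CUTOFF_DAYS = 18 * 30.5


def hr_flag(d_pfs, d_pfs_flag, cutoff=CUTOFF_DAYS):
    """Return 'TRUE', 'FALSE' or 'CENSORED' (hypothetical labelling rule)."""
    if d_pfs_flag == 1 and d_pfs < cutoff:
        return "TRUE"       # progression or death before the cut-off
    if d_pfs >= cutoff:
        return "FALSE"      # followed past the cut-off without an early event
    return "CENSORED"       # left before the cut-off with no event


def contingency_table(flags, predictions):
    """Count (actual, predicted) pairs, skipping censored samples."""
    table = Counter()
    for flag, pred in zip(flags, predictions):
        if flag == "CENSORED":
            continue
        table[(flag == "TRUE", bool(pred))] += 1
    return table


def balanced_accuracy(table):
    """Mean of sensitivity and specificity from the 2x2 table."""
    tp, fn = table[(True, True)], table[(True, False)]
    tn, fp = table[(False, False)], table[(False, True)]
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# 50 toy patients as (D_PFS, D_PFS_FLAG): 10 early events,
# 35 followed past the cut-off, 5 who left early with no event.
patients = [(200, 1)] * 10 + [(700, 0)] * 35 + [(100, 0)] * 5
flags = [hr_flag(t, e) for t, e in patients]

# Made-up binary predictions, one per patient.
predictions = [1] * 8 + [0] * 2 + [0] * 30 + [1] * 5 + [1] * 5

table = contingency_table(flags, predictions)
total = sum(table.values())
bac = balanced_accuracy(table)
print(total)  # 45: the 5 censored patients do not appear in any cell
print(bac)
```

Whatever is predicted for the five censored patients has no effect on the table, which is why the cells sum to 45 rather than 50.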
