According to wiki section 2.3, "Training and Validation Dataset Description", the training MMRF data "was collected from bone marrow tumor cells from patients with newly diagnosed active MM". However, some patients have multiple samples with sequential numbers, for example in Strelka: MMRF_1049_1_BM, MMRF_1049_2_BM, ... It seems from the MMRF website that samples are collected "every time a patient progresses" and numbered sequentially. Is only sample 1 therefore collected from a newly diagnosed MM patient? In the validation set, will samples come only from newly diagnosed patients?

Created by Giovanni Paternostro gpaternostro
Good eye, your guess is correct. One of our collaborators went through and mapped the newly diagnosed samples, so those are the only samples referred to in the clinical annotations. By the way, the _N_ in the sample name does not always refer to the first sample, so an MMRF_Patnum_2_BM might be the newly diagnosed one. This situation does not exist for any of the validation cohorts. Only newly diagnosed samples were included in the validation studies, so no filtering was needed on our end.
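For anyone filtering the training VCFs themselves, here is a minimal sketch of the idea in Python. It assumes you have already pulled the list of newly diagnosed sample IDs out of the clinical annotations; the function names and the example IDs in `annotated` are hypothetical and only illustrate the pattern. It deliberately does not assume that sample 1 is the newly diagnosed one.

```python
import re

# Matches IDs shaped like MMRF_1049_1_BM: patient number, sample number, tissue code.
SAMPLE_ID = re.compile(r"^MMRF_(\d+)_(\d+)_([A-Za-z]+)$")


def parse_sample_id(sample_id):
    """Split an MMRF sample ID into (patient, sample_number, tissue)."""
    match = SAMPLE_ID.match(sample_id)
    if match is None:
        raise ValueError(f"unrecognised sample ID: {sample_id!r}")
    patient, number, tissue = match.groups()
    return patient, int(number), tissue


def keep_newly_diagnosed(sample_ids, annotated_ids):
    """Keep only samples that appear in the clinical annotations.

    The sample number is ignored on purpose: the newly diagnosed
    sample is not guaranteed to be number 1.
    """
    annotated = set(annotated_ids)
    return [s for s in sample_ids if s in annotated]


# Hypothetical inputs: sample IDs found in the VCFs, and the IDs
# listed in the clinical annotations (made up for illustration).
vcf_samples = ["MMRF_1049_1_BM", "MMRF_1049_2_BM", "MMRF_1050_1_BM"]
annotated = {"MMRF_1049_2_BM", "MMRF_1050_1_BM"}

selected = keep_newly_diagnosed(vcf_samples, annotated)
```

Filtering by membership in the annotated set, rather than by the sample number, keeps the selection tied to the collaborators' mapping described above.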

MMRF VCF files for Challenge 1