Dear organizers,

I'm Hanrui Zhang, a graduate student at the University of Michigan. Dr. Yuanfang and I are honored to participate in the 2019 Allen Institute Cell Lineage Reconstruction DREAM Challenge. While we're excited to start building our program, we have a minor question about the training data. Could you help us with it?

While assessing the training data for subchallenge 1, we noticed that the cell ID assigned to each cell state differs between the recording and the ground truth, which produces mismatched taxa names. For example, sub1_train_24.nwk looks like:

`((((1_2101101201:42,2_0121201121:42):64,(3_2102111000:43,4_2102101000:43):63):64,((5_2100111000:48,6_0100001000:48):54,(7_2100101001:41,8_2100101001:41):61):68):42,9_0200000021:212);`

and looks like this when visualized with DendroPy:

![screen-shot_DendroPy](https://i.postimg.cc/NFQsx9Sz/Screen-Shot-2019-10-26-at-4-06-49-PM.png)

while the recording sub1_train_24.txt looks like:

| cell | state |
|------|------------|
| 1 | 0200000021 |
| 2 | 2101101201 |
| 3 | 0121201121 |
| 4 | 2102111000 |
| 5 | 2102101000 |
| 6 | 2100111000 |
| 7 | 0100001000 |
| 8 | 2100101001 |
| 9 | 2100101001 |

This could give misleading scores when reconstructions are evaluated with the Robinson-Foulds (RF) distance. We observed this kind of mismatch in sub1_train_24, 26, 31, 58, and 68; all the other training files match between the recordings and the ground truth.

For the ground truth, should we rely on the cell IDs or on the recordings? Thank you!

Best wishes,
Hanrui Zhang
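A check like the one described above can be automated. The sketch below compares the cell ID to state pairing in a Newick ground-truth file against the matching recording, using the sub1_train_24 data quoted in the post. It assumes leaf labels have the form `<cell id>_<state>` (as in the example tree) and that the recording is a whitespace-separated table with a `cell state` header row; the exact delimiter is an assumption, since the post shows the file flattened. The helper names are hypothetical and not part of any challenge tooling.

```python
import re

# Example data copied from the post (sub1_train_24.nwk and sub1_train_24.txt).
TREE_24 = ("((((1_2101101201:42,2_0121201121:42):64,(3_2102111000:43,4_2102101000:43):63):64,"
           "((5_2100111000:48,6_0100001000:48):54,(7_2100101001:41,8_2100101001:41):61):68):42,"
           "9_0200000021:212);")
RECORDING_24 = """cell state
1 0200000021
2 2101101201
3 0121201121
4 2102111000
5 2102101000
6 2100111000
7 0100001000
8 2100101001
9 2100101001
"""


def tree_leaf_states(newick: str) -> dict[str, str]:
    """Map cell id -> state from leaf labels of the assumed form '<id>_<state>'."""
    # A leaf label follows '(' or ',' and ends at ':', ',', ')' or ';'.
    labels = re.findall(r"(?<=[(,])([^(),:;]+)", newick)
    states = {}
    for label in labels:
        cell_id, state = label.split("_", 1)
        states[cell_id] = state
    return states


def recording_states(text: str) -> dict[str, str]:
    """Map cell id -> state from a whitespace-separated table with a header row."""
    states = {}
    for line in text.strip().splitlines()[1:]:  # skip the 'cell state' header
        if line.strip():
            cell_id, state = line.split()
            states[cell_id] = state
    return states


def find_mismatches(newick: str, recording: str) -> list[str]:
    """Return cell ids whose state differs between the tree and the recording."""
    tree = tree_leaf_states(newick)
    rec = recording_states(recording)
    ids = tree.keys() | rec.keys()
    return sorted((c for c in ids if tree.get(c) != rec.get(c)), key=int)


def same_states(newick: str, recording: str) -> bool:
    """True if both files hold the same states, ignoring which id carries them."""
    return (sorted(tree_leaf_states(newick).values())
            == sorted(recording_states(recording).values()))


if __name__ == "__main__":
    print(find_mismatches(TREE_24, RECORDING_24))
    print(same_states(TREE_24, RECORDING_24))
```

For sub1_train_24 this flags every cell except 8 (whose state happens to coincide because cells 7 and 8 share a state in the tree), while `same_states` returns `True`. That combination suggests the states were reassigned to different IDs rather than being wrong, which matches the mismatch described in the post.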

Created by Hanrui Zhang (rayezh)
Yes, all fixes noted in this thread have been applied as of today. Thanks!
So has the combined training file train_setDREAM2019.txt also been updated with the fixes?
Message from Igor, who found another mislabeling: "I see that some of the taxa may have been mislabeled previously, and that there was a fix. Today I synced my workspace using the `synapse get` shell command and found that the fix may not have propagated to all available files. The test case I'm looking at is sub1_train_74: https://www.synapse.org/#!Synapse:syn20948842 and https://www.synapse.org/#!Synapse:syn20948930.2 If I'm reading this correctly, the labels for cells 4 and 5 appear to be reversed in one of those files. I'm not sure whether the issue exists in other files, but I wanted to raise it anyway."
Thank you very much!
Hi, the cell IDs in the training-set ground-truth .nwk files have been fixed. Thanks for your patience!
Hi Hanrui, we are thrilled that you are participating! Thanks for catching this mistake; the correct label is the cell ID that comes from the video microscopy reconstruction. We will check and update the training set. Best, and good luck, Pablo Meyer

Mismatch of taxa names in some training sets in subchallenge1