Hello, In the ground truth tree for the first training instance, the closest point to the twins 8 and 9 (string 2120010021) is the point 10 (string 0112212221). So if this is right, a cell divided, then one of this pair underwent "2" mutations in positions 4, 5, 7, and 8 and a "0" mutation in position 1, and its sibling did the exact reverse: "0" mutations in positions 4, 5, 7, and 8 and a "2" mutation in position 1. My expertise is in machine learning, not the laboratory procedure involved in this experiment, so my question is this: Is it more likely that this really happened, or that there was "noise" in the automated procedure that generated the ground truth tree, so that these cells are not actually so closely related? In the first case, we should be using our tools to predict similar clump-and-reverse mutation instances; in the second, we should be trying to model error in the ground truth tree generation process. There are a number of similarly surprising pairings with reversed multiple mutations in other training cases. Could someone shed some more light on what's going on? Thanks, Phil Rennert

Created by Phil Rennert philrennert
Thanks! This points me in a useful direction. Phil
Hi Phil, thanks for your question and your interest in the challenge. You can see in a [previous thread](https://www.synapse.org/#!Synapse:syn20692755/discussion/threadId=6205) that a couple of cells were mislabeled in the gold standard for the training set, but that was not due to "noise" in the gold standard, just simple mislabeling. Your observation regarding twin mutations describes perfectly how the process generating the barcode works: a trit in state 1 can mutate either to 0 or to 2. So what you observe is exactly the same sites mutating to the opposite option in the two cells. This phenomenon probably has to do with bias in the biological process, and not with noise in the gold standard. I hope this clarifies things. Thanks!
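For anyone who wants to look for this pattern in the other training cases, here is a minimal Python sketch of the comparison discussed above. It takes the two barcode strings quoted in the question and lists the 1-based positions where one cell shows 0 and the other shows 2. The function name `opposite_sites` is made up for illustration and is not part of any challenge tooling, and the sketch assumes each barcode is a plain string of the characters 0, 1, and 2.

```python
def opposite_sites(a: str, b: str) -> list:
    """Return 1-based positions where one barcode has '0' and the other has '2'.

    Assumption: barcodes are equal-length strings over the characters 0, 1, 2,
    where 1 is the state a trit can mutate from (as described in the reply above).
    """
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return [
        pos
        for pos, (x, y) in enumerate(zip(a, b), start=1)
        if {x, y} == {"0", "2"}  # same site, opposite outcomes
    ]


twins = "2120010021"     # cells 8 and 9, as quoted in the question
neighbor = "0112212221"  # cell 10, as quoted in the question

print(opposite_sites(twins, neighbor))  # prints [1, 4, 5, 7, 8]
```

The result matches the positions listed in the question. Position 3 also differs between the two strings (2 versus 1), but that is a one-sided change rather than an opposite pair, so it is not reported.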

subchallenge1: ground truth accuracy?