The 2000-column training data files have 4000 columns in them. They appear to hold not only the input training data but also the annotated data. Furthermore, the names appear to be incorrect: the files in Data » training-col2000 all have the prefix "Annotated-" even though they are not annotated, which makes them inconsistent with the 100-column training data.

**Example:** in Data » training-col2000 » Annotated-table-121516.015892.tsv

Column 3 header: ```CrviclCncrAmricnJointCommittonCncrAJCCEdition7GroupStg```

Column 4 header: ```DE:3431313 Cervical Cancer American Joint Committee on Cancer (AJCC) Edition 7 Group Stage \ DEC:3431296 Cervical Cancer Stage Grouping | ncit:C12844 ncit:C3262 ncit:C38027```

The same pattern holds for all other pairs of columns in all of the files in that folder. Is this a feature of those datasets or an error?

Created by JoshReed
Hello,

Sorry about that. It looks like the annotated files wound up in both the "training-col2000" and "training-col2000_annotated" directories. However, the training-col2000 directory also includes the unannotated files. You should be able to find the unannotated file table-121516.015892.tsv further down in the directory listing. I'll inform someone with access to the site that the annotated files were accidentally copied into this directory.

Regards,
Gilberto
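Until the directory is cleaned up, the two kinds of files can be told apart by name and by header width. The following is only a minimal sketch, assuming the files are plain tab-separated text with a single header row and that the "Annotated-" prefix is the only naming difference. The function names `split_by_prefix` and `header_width` are hypothetical helpers, not part of any challenge tooling.

```python
import csv
from pathlib import Path

# Prefix reported in this thread for the annotated copies.
ANNOTATED_PREFIX = "Annotated-"


def split_by_prefix(directory):
    """Return (unannotated, annotated) lists of .tsv paths found in `directory`."""
    unannotated, annotated = [], []
    for path in sorted(Path(directory).glob("*.tsv")):
        if path.name.startswith(ANNOTATED_PREFIX):
            annotated.append(path)
        else:
            unannotated.append(path)
    return unannotated, annotated


def header_width(path):
    """Count the columns in the first (header) row of a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: treat quote characters in headers as ordinary text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return len(next(reader, []))
```

Calling `split_by_prefix` on a local copy of the training-col2000 folder and then `header_width` on each returned path should show whether a given file has the expected 2000 columns or the doubled 4000.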
