Hi, I imported the mutation data into SAS, but some of the institution IDs were not read in completely, so they cannot be matched with the sample IDs in the sample data.

Created by Ruqin Chen Ruchen
Dear Ruqin, If it helps, every SAMPLE_ID has `GENIE_` in front of it, so you could delete that to shorten the SAMPLE_ID. However, my initial thought is that you shouldn't truncate any of the SAMPLE_IDs, or else they won't match. I usually process the data in R or Python, where the characters aren't truncated. I urge you to look into why the variables were truncated and to import them so that none of them are cut short. Best, Tom
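On finding out why the IDs were cut short: with SAS PROC IMPORT, one likely cause is that the width of a character column is guessed from only the first rows it scans (see its GUESSINGROWS option), so longer IDs further down the file are truncated. Whatever the cause, measuring the longest ID in each file tells you the minimum width the variable needs. The Python sketch below is a minimal illustration, assuming tab-delimited files whose metadata lines start with `#`; the helper names are hypothetical, and the `GENIE_` prefix is taken from the post above, so check it against your own files.

```python
import csv
from typing import Iterable


def read_column(lines: Iterable[str], column: str) -> list[str]:
    """Return every value in `column` of a tab-delimited table.

    Lines starting with '#' are skipped, assuming the file carries
    commented metadata lines above the real header row.
    """
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    return [row[column] for row in reader]


def longest(values: Iterable[str]) -> int:
    """Length of the longest string: the minimum width that avoids truncation."""
    return max((len(v) for v in values), default=0)


def strip_prefix(sample_id: str, prefix: str = "GENIE_") -> str:
    """Drop a leading prefix only; the rest of the ID is kept intact,
    so two different IDs never collapse into one (unlike truncation)."""
    return sample_id.removeprefix(prefix)
```

For example, passing an open handle to the mutation file with the column name `Tumor_Sample_Barcode` (the standard MAF column) to `read_column`, then calling `longest` on the result, gives the width to declare on import. Removing the prefix from both files shortens each ID by the prefix length without losing any characters that distinguish one sample from another.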
Hi Tom, I was trying to match SAMPLE_identifier in the sample data with Tumor_sample_barcode in the mutation data. However, Tumor_sample_barcode in the mutation data was read in as a 21-character variable, while SAMPLE_identifier in the sample data was read in as a 23-character variable, so when I matched the two variables, quite a lot of the data couldn't be matched. I had to truncate both variables to 21 characters to match them. I'm not sure that's appropriate, because when I examined the original file, the IDs seemed to be up to 27 characters long. Thanks, Ruqin
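Truncating both sides to 21 characters makes more rows match, but it can also make two different samples look identical: any IDs that differ only after the 21st character collapse into one key, and the merge raises no error. A minimal Python sketch for checking whether a given width would merge distinct IDs (the function names and example IDs are hypothetical):

```python
from collections import defaultdict
from typing import Iterable


def truncation_collisions(ids: Iterable[str], width: int) -> dict[str, set[str]]:
    """Group IDs by their first `width` characters and return only the groups
    holding more than one distinct full ID: these would be merged by mistake."""
    groups: dict[str, set[str]] = defaultdict(set)
    for full_id in ids:
        groups[full_id[:width]].add(full_id)
    return {key: members for key, members in groups.items() if len(members) > 1}


def unmatched(left: Iterable[str], right: Iterable[str]) -> set[str]:
    """IDs in `left` that have no exact, untruncated match in `right`."""
    return set(left) - set(right)


# Hypothetical IDs: the first two differ only in their 22nd character.
example_ids = ["GENIE_SITE-0000001-T01", "GENIE_SITE-0000001-T02", "GENIE_SITE-0000002-T01"]
```

Here `truncation_collisions(example_ids, 21)` reports that `GENIE_SITE-0000001-T01` and `GENIE_SITE-0000001-T02` share the key `GENIE_SITE-0000001-T0`, while a width of 22 or more reports nothing. An empty result for your real IDs at the chosen width is necessary, but matching on the full, untruncated IDs avoids the question entirely.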
Dear @Ruchen, can you please clarify what you mean by the institution IDs not being read in completely? By the GENIE convention, samples that are in the clinical file but not in the mutation file are samples with no mutations. Best, Tom
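Under that convention, an unmatched clinical sample is expected rather than an error, so the safer direction is to keep every clinical sample and count its mutation records, treating absent samples as having zero. Mutation barcodes with no clinical sample are a different matter and more likely point to an ID problem such as truncation. A minimal Python sketch of both checks, assuming you already have the two ID lists; the function names are hypothetical:

```python
from collections import Counter
from typing import Iterable


def mutations_per_sample(sample_ids: Iterable[str], mutation_barcodes: Iterable[str]) -> dict[str, int]:
    """Count mutation records per clinical sample, keeping every sample.

    A sample in the clinical file but absent from the mutation file gets 0,
    following the GENIE convention that it has no mutations.
    """
    counts = Counter(mutation_barcodes)
    return {sample_id: counts.get(sample_id, 0) for sample_id in sample_ids}


def orphan_barcodes(sample_ids: Iterable[str], mutation_barcodes: Iterable[str]) -> set[str]:
    """Mutation barcodes with no clinical sample; a non-empty result suggests
    the IDs were altered (for example truncated) somewhere along the way."""
    return set(mutation_barcodes) - set(sample_ids)
```

If `orphan_barcodes` returns an empty set on the full IDs, every mutation record has a sample and the remaining unmatched clinical samples are simply samples without mutations.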
