Dear GENIE team,
The file data_mutations_extended.txt in version 6.0 contains 45 columns instead of the 123 that were available in the same file from version 5.0.
A lot of the missing data is extremely useful for the research community.
I was wondering if this was intentional or if an updated version of the file will be released soon.
Thank you very much,
Artur Veloso
Created by Artur Veloso (abveloso)
Dear @abveloso,
We understand that these are valuable data fields for you and are glad you find the GENIE dataset useful. The GENIE dataset consists of the genetic variants and clinical information, which are found in the GENIE release. The fields you mention are generated from the genetic data by the publicly available annotation tool VEP, which we run using vcf2maf. This is one of several publicly available tools that can be used to annotate genetic variants.
It is helpful to know that you find these fields valuable and we will take this into consideration. However, please refrain from statements invoking the wishes or medical treatment of the patients involved. As described in the GENIE publication, samples in the GENIE dataset were collected as part of routine clinical care, not explicitly for the purpose of generating the GENIE dataset. This data sharing is permitted based on the nature of the patient consent.
Dear @thomas.yu,
The columns SIFT and PolyPhen contain measurements that indicate if a mutation affected the functioning of the mutated gene. They are extremely important for researchers to understand if the mutation is functional or just a passenger mutation.
The uncompressed size of the v5.0 file is 0.5GB. That is not a very large file. If file size is a concern, why not just store the file in compressed form?
Please keep in mind that these data come from cancer patients who went through a very painful, invasive procedure for sample collection. I imagine the patients would be very disappointed to learn that their data is not being fully utilized because of large file sizes.
I sincerely hope that you reconsider and share the complete file.
Thank you,
Artur
Dear @abveloso,
Thanks for the feedback. I will unfortunately be unable to provide the updates in columns for this release, but you bring up a good point about retaining the entire dataset.
Best,
Tom
Dear @thomas.yu,
Thanks for your reply.
I would suggest releasing the full dataset since different columns might be useful for different scientists.
On my end, I am missing the columns "Gene", "SIFT", and "PolyPhen". If they could be added to the current release, I would be very thankful.
Thank you very much,
Artur
Dear @abveloso,
Thanks for letting me know; this is very helpful. This was intentional due to the ever-increasing mutation file size.
That being said, do you have the column names that we took out that are useful to the community? I can work on adding them back in.
Best,
Tom
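For anyone who wants to list exactly which columns differ between two releases, here is a minimal sketch in Python. It assumes the mutation file is tab-delimited with a single header row, possibly preceded by comment lines starting with "#", and that a compressed copy would carry a ".gz" suffix. The function names and the file paths in the usage note are hypothetical and are not part of any GENIE tooling.

```python
import gzip


def read_columns(path):
    """Return the column names from the header row of a tab-delimited file.

    Assumption: lines starting with '#' are comments and the first
    remaining non-blank line is the header. Files ending in '.gz' are
    read through gzip, so a compressed release works the same way.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            return line.rstrip("\r\n").split("\t")
    return []


def missing_columns(old_path, new_path):
    """List columns present in the old file but absent from the new one,
    in the order they appear in the old file."""
    new_cols = set(read_columns(new_path))
    return [col for col in read_columns(old_path) if col not in new_cols]
```

Calling something like missing_columns("v5.0/data_mutations_extended.txt", "v6.0/data_mutations_extended.txt") (placeholder paths) would return the names of the dropped columns, which could then be posted in reply to the request above. Only the header is read, so the size of the file does not matter.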
file data_mutations_extended.txt in version 6.0 is missing columns