Hi, I don't know how much of this I can discuss here, since it could easily be considered part of my strategy, but I think the following details are safe to share. I get the following in my log file:

```
*** caught segfault ***
address (nil), cause 'unknown'
```

Based on a few hours of googling, it seems everyone suggests this is simply a memory allocation and addressing issue. Would you please check the following submission? (Note that I intentionally start the loop from the 3rd index, which is explained in the following lines.)

* submission name: SC1.expresslane.v17vl_forDiscussionBoard
* submission ID: 9639059
* log URL: https://www.synapse.org/#!Synapse:syn10959744

I did quite a few submissions in the express lane to pinpoint the issue and came to the same conclusion: reading gzipped VCF files iteratively (without doing anything except overwriting the previous variable with the values of the new file through vcfR) causes the error. It always reads the first file successfully and breaks while reading the second one, regardless of which sample is first or second. Even deleting objects in R at the end of each iteration and running garbage collection does not change the situation. A minimal sketch of the failing pattern follows.
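For reference, here is a minimal sketch of the pattern that fails. The file names are placeholders, not the challenge's actual paths, and it assumes vcfR is installed:

```r
library(vcfR)

# Placeholder paths; in the real submission these come from the
# challenge's annotation files.
vcf_files <- c("sample1.vcf.gz", "sample2.vcf.gz")

for (f in vcf_files) {
  # The same variable is overwritten on every iteration; on the
  # challenge infrastructure the second read.vcfR() call segfaulted.
  vcf <- read.vcfR(f, verbose = FALSE)

  # Explicit cleanup did not change the outcome either:
  rm(vcf)
  gc()
}
```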
It is either a memory shortage or an internal bug in the suggested R package (vcfR). In case of the former, here are related threads and discussions:

* https://www.synapse.org/#!Synapse:syn10288546 from 54:31 to 55:16
* https://www.synapse.org/#!Synapse:syn6187098/discussion/threadId=2473
* https://www.synapse.org/#!Synapse:syn6187098/discussion/threadId=2540

Created by Mehrad Mahmoudian (michelangelo)
As an update, I posted this issue on the vcfR package's [GitHub page](https://github.com/knausb/vcfR/issues/79#issuecomment-334281327) and Brian (the creator and author of the package) has been very responsive. There is a high chance that we can resolve the issue on GitHub, and consequently on CRAN, in the very near future.
Just for some additional clarification: we are supplying the matching .tbi (tabix index) file for each .vcf.gz. Most tools like vcfR will take in the .vcf.gz and simply look for a .vcf.gz.tbi in the same folder location. If the package finds it there, it uses the .vcf.gz.tbi to load the VCF info faster.
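To illustrate the sidecar-index convention described above (just a sketch; the helper name is hypothetical and this is not vcfR's actual internal logic):

```r
# Check whether a .tbi index sits next to the compressed VCF.
has_tabix_index <- function(vcf_path) {
  file.exists(paste0(vcf_path, ".tbi"))
}

has_tabix_index("sample1.vcf.gz")  # TRUE if sample1.vcf.gz.tbi is in the same folder
```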
I want to thank @Michael.Mason for the effort he put into this issue. For those who have the same issue (if any), we finally got it working. The vcfR package has an internal bug (not one of those bugs that can turn into a feature, though): it does not check its input file. My code was feeding vcfR a tabix file (my regular expression was not strict enough), and vcfR was doing something unexpected, trying to access memory it does not have access to (?!), which caused the error. My suggestion to those who have the same problem:

* Make sure the file you are trying to read **ends with** `.vcf` or `.vcf.gz` (see the sketch below)
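A stricter filter than the one that caused my bug might look like this (the directory path is a placeholder); it keeps only files whose names end with `.vcf` or `.vcf.gz`, so the `.tbi` index files are excluded:

```r
all_files <- list.files("data/", full.names = TRUE)  # placeholder directory

# Anchor the pattern to the end of the name so "sample1.vcf.gz.tbi" is rejected.
vcf_files <- all_files[grepl("\\.vcf(\\.gz)?$", all_files)]
```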
> Are you imputing just the clinical data like ISS and sex or are you trying to impute the vcf data too?

I think I should not answer this question in this public channel. But regarding this "segfault" issue, I can tell you that it happened way before I wanted to do anything serious with the data. It actually happens while reading the data in, as I mentioned before:

> Reading gzipped VCF files iteratively (without doing anything and only overwriting the previous variable with new values of the new file through vcfR) causes the error.

I also made a specific submission for this purpose that only tries to read the data in (not even storing it in any variable); the remaining parts are either commented out or deleted. Fortunately you have my code, environment, and log file, so you can run the code manually on your own computer and see if it works.
It should not matter, actually. We now list the filtered files in those columns, so they are much smaller. I thought you might be trying to access the larger Mutect files that are not listed but are still there. I have asked my collaborator, who set up the more advanced threading, to look at it. It is possible that the threading is causing your Docker agent to have access to less memory, although I don't expect that. Are you imputing just the clinical data like ISS and sex, or are you trying to impute the VCF data too? You should be able to keep all the VCF data in memory to impute; is that what you are doing?
Hi Mike, I'm trying to read in the VCF files named in the `RNASeq_mutationFileMutect` and `WES_mutationFileMutect` columns of `sc1_Validation_ClinAnnotations.csv`. Do you think it is related to the files I'm trying to read? Should it matter at all?
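For context, this is roughly how I pull those file lists from the annotation table (the CSV path is relative to my working directory; the column names are as above):

```r
clin <- read.csv("sc1_Validation_ClinAnnotations.csv", stringsAsFactors = FALSE)

rna_vcfs <- clin$RNASeq_mutationFileMutect
wes_vcfs <- clin$WES_mutationFileMutect
```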
Hi Mehrad, is this from using the FILTERED VCFs or the original VCFs?
@Michael.Mason @thomas.yu Do you have any idea how this can be solved? It's been more than 24 hours since I posted this and I haven't gotten any reply from MultipleMyelomaDREAMChallengeAdmins.
Since it works fine on my own computer, my hypothesis is that you have allocated a small amount of memory, so that after the first iteration, despite all garbage collection, there is no room for a second one. The code underneath vcfR is not robust enough to detect that the available memory is less than what it needs to read in the new file (and it does not pre-allocate and check either), so it ends up addressing a memory block that is not at its disposal. I would still appreciate it if you investigated this matter. One way to watch memory between iterations is sketched below.
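One way to test the memory-shortage hypothesis (a sketch, reusing the loop from my first post) is to log R's memory bookkeeping between iterations; `gc()` returns a table of current and peak memory use in MB:

```r
for (f in vcf_files) {
  vcf <- read.vcfR(f, verbose = FALSE)
  rm(vcf)
  print(gc())  # if the "used" columns keep growing across iterations,
               # memory is not actually being reclaimed
}
```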
