Hi, I noticed that loading all the VCF files takes a long time. Considering the limited resources of the Docker machine, and the fact that everyone needs to load the VCF files, I think it would be helpful to also offer a table (CSV) version of all these variants merged together, including annotations. This would cut a lot of overhead on the Docker machine. Thanks, DA
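For illustration only, a minimal sketch of what the consumer side could look like if such a table existed; the file name `all_variants_annotated.csv` and the `sample` / `"TUMOR_01"` values are made up here, not existing files or columns:

```{r}
library(data.table)

# hypothetical pre-merged, pre-annotated table offered alongside the vcfs
variants <- fread("/home/schtuff_file/all_variants_annotated.csv")

# e.g. pull the PASS calls for one sample without decompressing any .vcf.gz
pass_calls <- variants[FILTER == "PASS" & sample == "TUMOR_01"]
```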

Created by exquirentibus veritatem exquirentibus
thanks
Sorry, I should explain what it does: it unzips one vcf.gz, filters for the mutations that passed, writes out a .csv of those, and then zips the original file back up. It loops over however many VCFs are in the directory.
```{r}
library(doParallel)
library(data.table)
library(tidyverse)
#install.packages("R.utils")
library(R.utils)

# a parallel backend is registered here, but note the loop below still runs serially
cl <- makeCluster(detectCores())
registerDoParallel(cl)

# get only the passing alterations out of each vcf
path <- "/home/schtuff_file"
filenames <- dir(path, pattern = "\\.vcf\\.gz$", full.names = TRUE)

for (i in seq_along(filenames)) {
  # decompress, then read each line of the vcf as a single string
  vcf_path <- R.utils::gunzip(filenames[i])
  file <- fread(vcf_path, sep = "", header = FALSE, col.names = "stringy")

  # drop the ## meta lines: keep everything from the #CHROM header row down
  header <- which(file$stringy == "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR\tNORMAL")
  file <- file[header:.N]

  # split the tab-delimited lines into an 11-column character matrix,
  # promote the first row to column names, and turn "." into NA
  df <- str_split_fixed(file$stringy, "\t", 11)
  colnames(df) <- df[1, ]
  df <- df[-1, ]
  df <- subset(df, select = -c(QUAL))
  is.na(df) <- df == "."
  df <- as.data.frame(df)

  # keep only the calls that passed the filters
  df <- df %>%
    select(`#CHROM`, POS, ID, FILTER, REF, ALT) %>%
    filter(FILTER == "PASS") %>%
    select(-FILTER)

  outfile <- paste0(sub("\\.vcf\\.gz$", "", filenames[i]), "_pass.csv")
  fwrite(df, file = outfile, sep = ",")

  # re-compress the original vcf
  gzip(vcf_path)
}

stopCluster(cl)
```
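To get from the per-file `_pass.csv` outputs above to the single merged table DA asked for, something along these lines could work (a sketch: it assumes the per-sample CSVs produced above sit in the same directory, and the merged file name `all_variants_pass.csv` is just a placeholder):

```{r}
library(data.table)

path <- "/home/schtuff_file"
csvs <- dir(path, pattern = "_pass\\.csv$", full.names = TRUE)

# read each per-sample CSV, tag it with the sample it came from,
# and stack everything into one long table
merged <- rbindlist(
  lapply(csvs, function(f) {
    dt <- fread(f)
    dt[, sample := sub("_pass\\.csv$", "", basename(f))]
    dt
  }),
  use.names = TRUE, fill = TRUE
)

fwrite(merged, file.path(path, "all_variants_pass.csv"))
```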
