I tried to align the samples in syn8612213 with STAR, but a number of them give "unexpected end of file" errors. I re-downloaded syn8620828 to check whether the problem was caused by the initial download, and I get the same error when I read the whole file:

```
zcat 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz | tail
gzip: 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz: unexpected end of file
```

Could some of the fastq files be corrupt? I get the error with these fastq files as well:

```
732_CER.FCC7BMUACXX_L8IGAGTGG.r1.fastq.gz
751_CER.FCC7KUNACXX_L8IACTGAT.r1.fastq.gz
775_CER.FCC7DRJACXX_L5IACTTGA.r1.fastq.gz
777_CER.FCC7L2YACXX_L7ITAGCTT.r1.fastq.gz
785_CER.FCC7KRCACXX_L7ICGTACG.r1.fastq.gz
786_CER.FCC7L0WACXX_L6IGGCTAC.r1.fastq.gz
791_CER.FCC7L0WACXX_L1ITTAGGC.r1.fastq.gz
797_CER.FCC7BMUACXX_L8IATTCCT.r1.fastq.gz
7015_CER.FCC7KUNACXX_L6ITAGCTT.r1.fastq.gz
725_CER.FCC7L0WACXX_L1IACTTGA.r1.fastq.gz
991_CER.FCC7KUNACXX_L3ICGTACG.r1.fastq.gz
813_CER.FCC7L0WACXX_L5IACTTGA.r1.fastq.gz
816_CER.FCC7L0WACXX_L7IGTGGCC.r1.fastq.gz
1058_CER.FCC7L0WACXX_L4IATTCCT.r1.fastq.gz
1085_CER.FCC79ETACXX_L1IACTTGA.r2.fastq.gz
1103_CER.FCC7KRCACXX_L5IATCACG.r2.fastq.gz
1103_CER.FCC7KRCACXX_L5IATCACG.r2.fastq.gz
1104_CER.FCC7KUNACXX_L4IACTGAT.r2.fastq.gz
1029_CER.FCC7L0WACXX_L2IGGCTAC.r2.fastq.gz
1036_CER.FCC7KUNACXX_L6IGGCTAC.r1.fastq.gz
11285_CER.FCC7L0WACXX_L3ICGTACG.r1.fastq.gz
11288_CER.FCC7KUNACXX_L2IGATCAG.r1.fastq.gz
1129_CER.FCC7L2YACXX_L8IGGCTAC.r2.fastq.gz
11300_CER.FCC7L0WACXX_L7IGTTTCG.r1.fastq.gz
11303_CER.FCC7L0WACXX_L2ITAGCTT.r1.fastq.gz
11374_CER.FCC7KRCACXX_L4IACTGAT.r1.fastq.gz
11396_CER.FCC7KUNACXX_L7IGTTTCG.r1.fastq.gz
11474_CER.FCC7DRJACXX_L4IATTCCT.r1.fastq.gz
11456_CER.FCC7KRCACXX_L6ITAGCTT.r2.fastq.gz
1146_CER.FCC7KRCACXX_L7IGTTTCG.r1.fastq.gz
11460_CER.FCC79ALACXX_L6ITTAGGC.r1.fastq.gz
11505_CER.FCC7L0WACXX_L5ITTAGGC.r1.fastq.gz
11507_CER.FCC7L0WACXX_L2IGATCAG.r1.fastq.gz
11491_CER.FCC7KKVACXX_L2ITAGCTT.r2.fastq.gz
1214_CER.FCC7KRCACXX_L4IATTCCT.r1.fastq.gz
11494_CER.FCC7KUNACXX_L1IACTTGA.r1.fastq.gz
11497_CER.FCC7L0WACXX_L1IATCACG.r1.fastq.gz
11500_CER.FCC7KUNACXX_L4IATTCCT.r1.fastq.gz
1933_CER.FCC7KUNACXX_L1ITTAGGC.r2.fastq.gz
1934_CER.FCC7E15ACXX_L4IACTGAT.r1.fastq.gz
142_CER.FCC7L0WACXX_L8IACTGAT.r2.fastq.gz
1963_CER.FCC7KRCACXX_L7IGTGGCC.r2.fastq.gz
6821_CER.FCC7KKVACXX_L8IATTCCT.r1.fastq.gz
6880_CER.FCC7L0WACXX_L4IACTGAT.r1.fastq.gz
```
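For anyone who wants to build a list like this without decompressing each file by hand, a minimal sketch follows. It assumes only that `gzip -t` (the standard integrity test) is available; the function name `list_bad_gzip` is made up for illustration and is not part of any tool mentioned in this thread.

```shell
# Print the name of every file argument that fails gzip's integrity test.
# A truncated file typically fails with "unexpected end of file".
# The body runs in a subshell ( ... ), so the loop variable does not leak.
list_bad_gzip() (
    for f in "$@"; do
        # gzip -t reads the whole stream and checks it without writing output
        if ! gzip -t -- "$f" 2>/dev/null; then
            printf '%s\n' "$f"
        fi
    done
)
```

It could be called as `list_bad_gzip *.fastq.gz > bad_files.txt`. Note that it also reports files that are not gzip data at all, so a reported file is not necessarily truncated.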

Created by Niek de Klein NiekdeKlein
Hi @NiekdeKlein, head, tail, and more do produce the output you describe, but for some reason the less and wc -l commands don't see the file as gzipped. If you cp the file to .fastq, it all appears to be gzipped. For now you're more than welcome to use the BAM files from which the reads were extracted, located in syn5049322. You will just need to convert them to fastq first.

```
module load star/2.5.1b picard

# $1: name of the input BAM file (*.snap.bam), $2: region
sample=$(basename "$1" .snap.bam)
region=$2

# Define paths
rootdir="/sc/orga/projects/AMP_AD/reprocess"
indir="${rootdir}/inputs/Mayo/Mayo${region}-BAM-from-synapse"
fastqdir="${rootdir}/inputs/Mayo/Mayo${region}-fastq-from-synBam"

# Reference files
index='/sc/orga/projects/PBG/REFERENCES/GRCh38/star/Gencode24'

# Sort aligned BAM by read name and convert to FASTQ
java -Xmx8G -jar "$PICARD" SortSam \
    INPUT="${indir}/${1}" \
    OUTPUT=/dev/stdout \
    SORT_ORDER=queryname \
    QUIET=true \
    VALIDATION_STRINGENCY=SILENT \
    COMPRESSION_LEVEL=0 \
| java -Xmx4G -jar "$PICARD" SamToFastq \
    INPUT=/dev/stdin \
    FASTQ="${fastqdir}/${sample}.r1.fastq" \
    SECOND_END_FASTQ="${fastqdir}/${sample}.r2.fastq" \
    VALIDATION_STRINGENCY=SILENT

# Zip FASTQ files
gzip "${fastqdir}/${sample}.r1.fastq"
gzip "${fastqdir}/${sample}.r2.fastq"
```
It looks gzipped to me:

```
tail 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz
??????Fh?t?J?t??B?T9?=?w;??????j@????M? ????C??^>T?P.?r?J?j?|g?a?{/?J????????J??S???1@J?)??:?Sl???K?Y??@?\????GG?y??? 7NP?jy?n?I
```

When I rename it to .fastq and then gzip it, it is double-zipped (still binary after zcat):

```
mv 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq
gzip 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq
zcat 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz | tail
??????Fh?t?J?t??B?T9?=?w;??????j@????M?
```
Hi @NiekdeKlein, those files do not appear to be gzipped, despite having a .gz extension:

```
ls -ltr 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz
-rw-rw-r-- 1 jgockley jgockley 7072366592 Dec 16 22:13 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz

less 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz
@R0230412:381:C7KUNACXX:5:1101:10000:13114/1
+
CCCFFFFFFHAHHJJJIHJFIJJJEHHHGHJEGGIGHHJJIIEHHFFFDBC>A??@@;?B?BDDBDCDDCDDDDDDCDECCD@C>CCC@B@DC@@C@BBDB

mv 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq
gzip 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq
less 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz
^_<8B>^H^Hx^RX^@^C1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq^@<92>6$_f <9A>j$3<99>$@<90>^DYl^?\5^?<92><8B>^H^@<99>U=?Q A^B<81>^H^Ow^O^^Nxz<9C>#<9F>^? 8^^^O<90>?^^
^_<8B>^H^Hx^RX^@^C1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq^@<92>6$_f <9A>j$3<99>$@<90>^DYl^?\5^?<92><8B>^H^@<99>U=?Q A^B<81>^H^Ow^O^^Nxz<9C>#<9F>^? 8^^^O<90>?^^
```

Apologies for the inconvenience!
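Since the two of us read the same file differently, a check that does not depend on the viewer may help. On some systems `less` is configured with an input preprocessor that decompresses `.gz` files transparently, so readable output in `less` does not by itself show that a file is uncompressed. The sketch below looks at the first two bytes instead: gzip data always starts with the magic bytes `1f 8b`. The function name `gz_layers` is invented for this example, and it only distinguishes zero, one, or at least two layers of gzip.

```shell
# Classify a file by gzip magic bytes (0x1f 0x8b), ignoring its extension.
# Prints "not gzip", "gzip", or "double gzip".
gz_layers() (
    f=$1
    # First two bytes of the file as lowercase hex, e.g. "1f8b"
    outer=$(head -c 2 -- "$f" | od -An -tx1 | tr -d ' \n')
    if [ "$outer" != "1f8b" ]; then
        echo "not gzip"
        exit 0
    fi
    # First two bytes after removing one layer of compression
    inner=$(gzip -dc -- "$f" 2>/dev/null | head -c 2 | od -An -tx1 | tr -d ' \n')
    if [ "$inner" = "1f8b" ]; then
        echo "double gzip"
    else
        echo "gzip"
    fi
)
```

For example, `gz_layers 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz` on the file as downloaded would show which of the two readings above applies. It does not detect truncation; `gzip -t` is the tool for that.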

Are some of the fastq files of syn8612213 corrupt?