To whom it may concern,

We are trying to run a pipeline on samples from the BrainVar database (syn21557948 on synapse.org), namely the whole-genome sequencing and RNA-seq FASTQ files stored on Synapse. I was wondering whether there is an efficient way to transfer these data to a Google bucket so that we can process the samples using our pipelines in Terra/GCP. I have tried using the Synapse API to copy the files to a local folder and then upload them to the Google Cloud bucket (roughly as sketched in the P.S. below), but that takes too much time. Another approach I'm considering is to have the pipeline download each file individually, but that would require significant work, so if there is an easier way to transfer the files, that would be great.

-Eduardo
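P.S. For reference, the "download locally, then upload" approach I tried looks roughly like the sketch below. It assumes synapseclient and google-cloud-storage are installed and a SYNAPSE_AUTH_TOKEN environment variable is set; the bucket name and local path are placeholders, and syn21557948 is the BrainVar container ID from above.

```python
import os

import synapseclient
import synapseutils
from google.cloud import storage

# Log in to Synapse with a personal access token.
syn = synapseclient.Synapse()
syn.login(authToken=os.environ["SYNAPSE_AUTH_TOKEN"])

# Recursively download everything under the BrainVar container to local disk.
files = synapseutils.syncFromSynapse(syn, "syn21557948", path="/tmp/brainvar")

# Re-upload each downloaded file to a GCS bucket (placeholder name).
bucket = storage.Client().bucket("my-terra-workspace-bucket")
for f in files:
    blob = bucket.blob(os.path.basename(f.path))
    blob.upload_from_filename(f.path)
    print(f"uploaded {f.path} -> gs://{bucket.name}/{blob.name}")
```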

Created by Eduardo Maury emauryg
The data is in a private AWS bucket, inaccessible to Synapse users except through Synapse.
@brucehoff Where could we find information about the location of the PsychENCODE Knowledge Portal data in AWS?
@KevinBoske @ryanluce You may be interested in this request.
What about attempting something like this: [transferring data from Amazon S3 to Cloud Storage using Storage Transfer Service](https://cloud.google.com/solutions/transferring-data-from-amazon-s3-to-cloud-storage-using-vpc-service-controls-and-storage-transfer-service)? Our pipeline is currently set up to run in Terra/GCP, so reconfiguring it for Nextflow/AWS and setting up accounts etc. would require a fair amount of effort.
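If we could get credentials for the source S3 bucket, my understanding is that kicking off such a transfer with the Python Storage Transfer Service client would look roughly like the sketch below. The project ID, bucket names, and AWS keys are all placeholders, and since the Synapse bucket is private, this hinges on whether that access can be granted at all.

```python
from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

# Create an S3 -> GCS transfer job; all identifiers are placeholders.
job = client.create_transfer_job(
    {
        "transfer_job": {
            "project_id": "my-gcp-project",
            "status": storage_transfer.TransferJob.Status.ENABLED,
            "transfer_spec": {
                "aws_s3_data_source": {
                    "bucket_name": "source-s3-bucket",
                    "aws_access_key": {
                        "access_key_id": "AKIA...",
                        "secret_access_key": "...",
                    },
                },
                "gcs_data_sink": {"bucket_name": "my-terra-workspace-bucket"},
            },
        }
    }
)

# The job has no schedule, so trigger a single run explicitly.
client.run_transfer_job({"job_name": job.name, "project_id": "my-gcp-project"})
print(f"started transfer job: {job.name}")
```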
Hi, Eduardo: The files for the PsychENCODE Knowledge Portal are stored in the Amazon (AWS) cloud. As of today, we have not investigated tools for mirroring large amounts of data from AWS to GCP. Another idea is to use analytical tools that run in the AWS cloud. Some data engineers in our org are excited about [Nextflow](https://www.nextflow.io/).
