Hi, I'm looking to take advantage of the local storage of pre-processed data, and I was wondering if there is a way/command to run multiple pre-processing scripts in parallel on multiple CPUs on your servers. Ex. I have a pre-processing script I would like to run on each individual training image, so I have as many pre-processing scripts as there are images. I'd estimate they each take about 1 hour to run. Can this be parallelized, or should I just run each script individually? Thanks!

Created by Luli Zou
I can recommend two approaches: GNU Parallel and xargs. Here is an example of how to convert all the DICOM images in a directory to PNG images using [GNU Parallel](https://www.gnu.org/software/parallel/) and [ImageMagick](http://www.imagemagick.org/script/index.php):

```bash
find /data/ -name "*.dcm" | parallel 'convert {} {/.}.png'
```

`{}` and `{/.}` are placeholders defined by GNU Parallel:

- `{}`: absolute path to a DICOM image (e.g. `/data/image.dcm`)
- `{/.}`: the `/` means "without the path to the directory" and the `.` means "without the extension" (if `{}` represents `/data/image.dcm`, then `{/.}` is set to `image`)

By default, GNU Parallel uses all the CPU cores available. You can also place a list of commands in a text file (one per line) and use the following command to run them in parallel:

```bash
parallel < commands.txt
```

See the [GNU Parallel Tutorial](https://www.gnu.org/software/parallel/parallel_tutorial.html) for more detailed information.
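For the xargs approach mentioned above, here is a sketch of the same conversion; the worker count of 4 is an assumption, and note that xargs has no equivalent of GNU Parallel's `{/.}` placeholder, so the output files here keep `.dcm` in their names:

```bash
# -print0/-0 handle paths containing spaces; -P 4 runs up to 4
# conversions at once; -I {} substitutes each file path.
find /data/ -name "*.dcm" -print0 | xargs -0 -P 4 -I {} convert {} {}.png
```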
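For your specific use case (one pre-processing run per training image), a minimal sketch, assuming a hypothetical `preprocess.sh` that takes an image path as its only argument and a hypothetical `/data/train/` directory; `-j 8` caps the number of simultaneous jobs so you don't oversubscribe the machine:

```bash
# At most 8 jobs at a time; --joblog records each job's exit status
# and runtime, which is handy when every job takes ~1 hour.
find /data/train/ -name "*.png" | parallel -j 8 --joblog preprocess.log './preprocess.sh {}'
```

With the job log in place, re-running the same command with GNU Parallel's `--resume` flag skips jobs already recorded in the log (`--resume-failed` additionally retries the failed ones).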
