Same submission finishes with Express Lane, but fails full Model Training

Hello,

I did some testing with the Express Lane and the new template Dockers released last week. Submission 7514144 (Caffe preprocess + training template) appears to have finished both steps on the Express Lane. However, resubmitting the same submission as 7515200 to the full Model Training mechanism produced the following error:

    STDOUT: Starting Caffe AlexNet training
    STDERR: F1111 14:48:48.464579 6 caffe.cpp:93] Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version
    STDERR: * Check failure stack trace: *
    STDERR: @ 0x7f502f314e6d (unknown)
    STDERR: @ 0x7f502f316ced (unknown)
    STDERR: @ 0x7f502f314a5c (unknown)
    STDERR: @ 0x7f502f31763e (unknown)
    STDERR: @ 0x40a32e get_gpus()
    STDERR: @ 0x40b3d3 train()
    STDERR: @ 0x408e6c main
    STDERR: @ 0x7f501e506b15 __libc_start_main
    STDERR: @ 0x409775 (unknown)
    STDERR: /train.sh: line 20: 6 Aborted (core dumped) caffe train --solver=$MODEL -gpu $GPUS
    STDOUT: Done

Is there a difference in the CUDA runtime version between the two routes of submission?

Thank you in advance,
Jeff

Created by mobileroaming
> Is there a difference in the CUDA runtime version between the two routes of submission?

The Express Lane and Challenge GPU servers have new NVIDIA drivers that support CUDA 8.0. The examples that we released last week are based on `nvidia/cuda:8.0-cudnn5-devel-centos7` and have been tested on the Express Lane and Challenge machines.

I've observed that the CUDA Docker images provided by NVIDIA don't work with the NVIDIA drivers installed on the Open Phase machines, which are going to be decommissioned tonight at 6 pm ET.

On the Express Lane machines:

```
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.48 Sat Sep 3 18:21:08 PDT 2016
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
```
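
To compare machines, the driver version can be pulled out of that file programmatically. The following is only a minimal sketch: the function names `parse_driver_version` and `driver_at_least` are hypothetical, and the threshold `(367, 48)` is simply the version reported on the Express Lane machines above, not an official minimum for any CUDA release.

```python
import re

# Sample text in the format shown above for the Express Lane machines.
SAMPLE = (
    "NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.48 Sat Sep 3 18:21:08 PDT 2016\n"
    "GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)\n"
)


def parse_driver_version(text):
    """Return the kernel module version as a tuple of ints, or None if absent."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def driver_at_least(text, minimum):
    """True if the parsed driver version is >= minimum (a tuple of ints)."""
    found = parse_driver_version(text)
    return found is not None and found >= minimum


if __name__ == "__main__":
    try:
        # Present only on hosts with the NVIDIA kernel module loaded.
        with open("/proc/driver/nvidia/version") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the sample so the sketch still runs
    # (367, 48) is an assumed threshold taken from the output above.
    print(parse_driver_version(text), driver_at_least(text, (367, 48)))
```

Running a check like this before `caffe train` would surface a driver mismatch as a readable message rather than the `cudaSuccess (35 vs. 0)` abort shown in the question.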
