Cannot start from pretrained model using TensorFlow 0.11.0, please help

Hello @thomas.yu,

When I use tensorflow/tensorflow:0.9.0-gpu as the base Docker image, I can start from a pretrained model. However, after switching to the Docker image tensorflow/tensorflow:0.11.0-gpu, I can no longer do so and get the following error:

...
STDOUT: using the checkpoint ./model.ckpt-2600
STDERR: main(sys.argv)
STDERR: File "DREAM_DM.py", line 608, in main
STDERR: finetune(X_tr, X_te, opts)
STDERR: File "DREAM_DM.py", line 445, in finetune
STDERR: final_saver.restore(sess, ckpt.model_checkpoint_path)
STDERR: File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1340, in restore
STDERR: if not file_io.get_matching_files(file_path):
STDERR: File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 231, in get_matching_files
STDERR: compat.as_bytes(filename), status)]
STDERR: File "/usr/lib/python2.7/contextlib.py", line 24, in exit
STDERR: self.gen.next()
STDERR: File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
STDERR: pywrap_tensorflow.TF_GetCode(status))
STDERR: tensorflow.python.framework.errors.NotFoundError: ./sys/dev/block/253:738/subsystem/dm-812

When I run "ls -la" inside my Docker image, I see that the checkpoint file and the snapshot were copied:

STDOUT: -rw-rw-r--. 1 root root 87 Dec 15 14:35 checkpoint
STDOUT: -rw-rw-r--. 1 root root 510005647 Dec 15 14:35 model.ckpt-2600
STDOUT: -rw-rw-r--. 1 root root 6968371 Dec 15 14:35 model.ckpt-2600.meta

I need TensorFlow 0.11.0 because several things have been fixed in the new version. Can you please help me with this?

Best

Created by Yaroslav Nikulin (Therapixel) ynikulin
I would like to raise this question again. TensorFlow 0.9.0 has some issues with its batch_norm layer (discussed at length here: https://github.com/tensorflow/tensorflow/issues/1122). All I know is that with TensorFlow 0.9.0 I see a significant performance drop when I switch from is_training=True to is_training=False for testing. Locally I have TensorFlow 0.11.0, and on the pilot data it works the other way around (which is how it should be!): is_training=False improves performance. However, a Docker image built from TensorFlow 0.11.0 cannot load a model and continue training from a snapshot, as I described above. My guess is that the container has trouble accessing the virtual file system (it cannot find the file with the odd name ./sys/dev/block/253:738/subsystem/dm-812). Right now I am always using is_training=True, and most probably the models underperform. Could you please help me with this? Yaroslav
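For readers unfamiliar with what is_training toggles, here is a minimal pure-Python sketch of the usual batch normalization behaviour: batch statistics during training, stored moving averages at test time. It is not TensorFlow's implementation; the function name batch_norm_1d, the state dictionary, and the decay and eps values are all assumptions made for illustration.

```python
import math


def batch_norm_1d(xs, state, is_training, decay=0.9, eps=1e-5):
    """Normalize a list of floats for a single feature.

    `state` is a dict with keys "mean" and "var" holding the moving
    averages (hypothetical structure, not TensorFlow's).
    """
    if is_training:
        # Training: normalize with the statistics of the current batch.
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        # Update the moving averages that inference will rely on.
        state["mean"] = decay * state["mean"] + (1 - decay) * mean
        state["var"] = decay * state["var"] + (1 - decay) * var
    else:
        # Inference: use the stored moving averages, do not update them.
        mean, var = state["mean"], state["var"]
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


state = {"mean": 0.0, "var": 1.0}
train_out = batch_norm_1d([1.0, 2.0, 3.0], state, is_training=True)
test_out = batch_norm_1d([2.0], state, is_training=False)
```

If the moving averages are never updated, or are updated too slowly, the is_training=False branch normalizes with stale statistics, which is one possible explanation for a drop in test performance. Whether that is what happens in TensorFlow 0.9.0 is not established in this thread.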
I've been using 11 and it's working for me (but I haven't tried migrating a model from 9 to 11).
Dear @thomas.yu, @tschaffter, I am sorry, but I have to raise this question again. I do need TensorFlow 0.11.0, but apparently it is not supported. I always get the same error: it seems that when a Docker container is built from tensorflow/tensorflow:0.11.0-gpu, it cannot see the virtual file system, and I have no idea why: STDERR: tensorflow.python.framework.errors.NotFoundError: ./sys/dev/block/253:738/subsystem/dm-812 Please help! Yaroslav
Just to add: when I change back to TensorFlow 0.9.0, the model is loaded and everything is all right. An example of a submission with the error is ID 7874131. Does anyone else use TensorFlow 0.11.0? Did you have any problems with it?
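To help narrow this down, below is a small diagnostic sketch that could be run just before the restore call. It uses only the Python standard library and makes no TensorFlow calls. The helper name describe_checkpoint and the keys of the returned dictionary are hypothetical. It reports the working directory, the absolute checkpoint path, the files sharing the checkpoint prefix, and whether the checkpoint sits directly in the file system root.

```python
import os


def describe_checkpoint(ckpt_path):
    """Collect basic facts about a checkpoint path (no TensorFlow needed)."""
    abs_path = os.path.abspath(ckpt_path)
    ckpt_dir = os.path.dirname(abs_path)
    prefix = os.path.basename(abs_path)
    # List every file in the same directory that shares the checkpoint prefix.
    if os.path.isdir(ckpt_dir):
        files = sorted(f for f in os.listdir(ckpt_dir) if f.startswith(prefix))
    else:
        files = []
    return {
        "cwd": os.getcwd(),
        "abs_path": abs_path,
        "exists": os.path.isfile(abs_path),
        "files": files,
        # True when the checkpoint lives directly in the file system root.
        "in_fs_root": ckpt_dir == os.path.abspath(os.sep),
    }


if __name__ == "__main__":
    # "./model.ckpt-2600" is the checkpoint path printed in the log above.
    for key, value in sorted(describe_checkpoint("./model.ckpt-2600").items()):
        print("%s: %s" % (key, value))
```

If in_fs_root comes back True, the relative path ./sys/dev/block/... in the error would resolve to an entry under /sys, which would be consistent with a file search starting from the root directory. That is only a guess, not something confirmed in this thread. Under that assumption, one thing to try would be copying the checkpoint files into a dedicated subdirectory and restoring from there.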
