Hi,
I have some TensorFlow code that loads a pre-trained model:
tf_saver = tf.train.Saver()
tf_saver.restore(session, save_path="some path")
The pre-trained model files are copied into the Docker image:
COPY model.cpkt.data-00000-of-00001 /model.cpkt.data-00000-of-00001
COPY model.cpkt.index /model.cpkt.index
COPY model.cpkt.meta /model.cpkt.meta
Loading the pre-trained model took several seconds when I ran this locally. However, it timed out when I ran it on the Express Training Lane; an example is submission ID 8019899. The log file I received contains no exception or error. Am I missing anything needed to load a model dynamically from the Docker image?
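One thing that may be worth ruling out before the restore call is whether the checkpoint files are actually visible inside the container under the prefix passed as save_path. For checkpoints stored as .index and .data-* files, save_path is the common prefix (here that would be /model.cpkt, judging from the COPY lines), not the name of any single file. The following is only a minimal sketch under that assumption; missing_checkpoint_files is a hypothetical helper, not part of TensorFlow.

```python
import os


def missing_checkpoint_files(prefix):
    """Return the checkpoint parts that are absent for `prefix`.

    `prefix` is the value that would be passed as save_path,
    e.g. "/model.cpkt" (an assumption based on the COPY lines).
    The .meta file is not checked: it is only needed when the graph
    is rebuilt with tf.train.import_meta_graph.
    """
    directory = os.path.dirname(prefix) or "."
    base = os.path.basename(prefix)
    try:
        names = set(os.listdir(directory))
    except OSError:
        # The directory itself is missing or unreadable.
        return [directory]

    missing = []
    if base + ".index" not in names:
        missing.append(prefix + ".index")
    # Data shards are named <prefix>.data-XXXXX-of-YYYYY.
    if not any(name.startswith(base + ".data-") for name in names):
        missing.append(prefix + ".data-*")
    return missing


if __name__ == "__main__":
    problems = missing_checkpoint_files("/model.cpkt")
    if problems:
        print("Missing checkpoint files: %s" % ", ".join(problems))
    else:
        print("Checkpoint files found for prefix /model.cpkt")
```

Running this at container start only lists one directory, so it should return quickly and leave a line in the log even if the later restore call stalls.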
Thanks for your help!
Created by Yiqiu Shen ashen
The problem was solved by changing the TensorFlow version in the Docker image from 11 to 12. I waited in the normal training lane for several hours and received an exception that I did not see in the Express Lane logs.
It seems there is some issue between TensorFlow and the OS-level infrastructure. The exception is:
STDERR: File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1340, in restore
STDERR: if not file_io.get_matching_files(file_path):
STDERR: File "/usr/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 231, in get_matching_files
STDERR: compat.as_bytes(filename), status)]
STDERR: File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
STDERR: self.gen.next()
STDERR: File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
STDERR: pywrap_tensorflow.TF_GetCode(status))
STDERR: tensorflow.python.framework.errors.NotFoundError: /sys/dev/block/7:0/subsystem/dm-809
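Since this traceback only showed up after several hours in the normal lane, it may help to log around the restore call so that a stall or failure leaves a trace early. The sketch below is an illustration only: run_logged is a hypothetical helper, and whether the challenge infrastructure returns stderr written before a timeout is an assumption.

```python
import sys
import time
import traceback


def run_logged(label, fn, *args, **kwargs):
    """Call fn, writing start, duration and any traceback to stderr right away."""
    sys.stderr.write("%s: starting\n" % label)
    sys.stderr.flush()
    start = time.time()
    try:
        result = fn(*args, **kwargs)
    except Exception:
        sys.stderr.write("%s: failed after %.1fs\n" % (label, time.time() - start))
        traceback.print_exc()
        sys.stderr.flush()
        raise  # keep the original failure behaviour
    sys.stderr.write("%s: finished in %.1fs\n" % (label, time.time() - start))
    sys.stderr.flush()
    return result
```

With the code from the question, the call would look like run_logged("restore", tf_saver.restore, session, save_path="some path"). If the job is killed while restore is still running, the "starting" line without a matching "finished" line at least shows where it stalled.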
Hi @tschaffter, @brucehoff, I think I might be missing some Docker- or server-specific code for restoring a session in TensorFlow. Do the organizers have any example code that restores a model from /modelState or from the Docker image? Thanks!