Hi everyone, I tried running the TensorFlow example locally on the given dataset to see how it works, but got some sort of KeyError. What did I do wrong?

```
Python 3.5.2 :: Continuum Analytics, Inc.
---
Metadata-Version: 2.0
Name: tensorflow
Version: 0.9.0
Summary: TensorFlow helps the tensors flow
Home-page: http://tensorflow.org/
Author: Google Inc.
Author-email: opensource@google.com
Installer: pip
License: Apache 2.0
Location: /home/lintangsutawika/anaconda3/envs/tensorflow/lib/python3.5/site-packages
Requires: six, protobuf, numpy, wheel
Classifiers:
  Development Status :: 4 - Beta
  Intended Audience :: Developers
  Intended Audience :: Education
  Intended Audience :: Science/Research
  License :: OSI Approved :: Apache Software License
  Programming Language :: Python :: 2.7
  Topic :: Scientific/Engineering :: Mathematics
  Topic :: Software Development :: Libraries :: Python Modules
  Topic :: Software Development :: Libraries
Entry-points:
  [console_scripts]
  tensorboard = tensorflow.tensorboard.tensorboard:main
Parsing the csv's.
Traceback (most recent call last):
  File "DREAM_DM_starter_tf.py", line 700, in <module>
    main(sys.argv)
  File "DREAM_DM_starter_tf.py", line 689, in main
    X_tr, X_te, Y_tr, Y_te = create_data_splits(path_csv_crosswalk, path_csv_metadata)
  File "DREAM_DM_starter_tf.py", line 96, in create_data_splits
    Y_tot.append(dict_tuple_to_cancer[dict_img_to_patside[img_name]])
KeyError: ('65725', '646644.dcm.gz')
```

Created by Lintang Adyuta Sutawika lintangsutawika
This is explained in the Dictionary.xls file. If you want to read the label values from the metadata file in a safe way, you can use something like this:

```
# Labels should be 0 or 1
# In some (rare) cases they can be:
# . - not imaged
# * - value masked
def force_num_label(a):
    # Force the label to be a number
    try:
        return int(a)
    except (ValueError, TypeError):
        # Non-numeric entries ('.' or '*') get the bogus label 2
        return 2
```

It maps values that are not numeric (. or *) to a bogus numeric label value of 2.
Hi Joshua, The dot '.' represents missing data.
Hey guys, I finally got my training submission to start. I solved this problem by inserting a check on the values of row[3] and row[4]: I noticed that those values can sometimes be '.', so I added a check for that. Also, may I ask what '.' means? Does it mean that the dataset is incomplete, or does it have a special meaning? Joshua
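A minimal sketch of the kind of check Joshua describes, in case it helps others. It assumes the row layout implied by the traceback in this thread (row[0] is the patient id, row[3] the left-breast label); treating row[4] as the right-breast label is my assumption, and `add_labels` plus the toy rows are hypothetical, not part of the starter code.

```python
def add_labels(row, dict_tuple_to_cancer):
    """Store the labels of one metadata row, skipping '.' and '*' entries."""
    patient_id = row[0].strip()
    # Assumption: row[3] is the left-breast label, row[4] the right-breast label.
    for side, raw in (('L', row[3]), ('R', row[4])):
        raw = raw.strip()
        if raw in ('.', '*'):
            # '.' = missing / not imaged, '*' = masked: no usable label
            continue
        dict_tuple_to_cancer[(patient_id, side)] = int(raw)
    return dict_tuple_to_cancer


# Toy rows (made-up values) in the assumed column layout
labels = {}
add_labels(['65725', 'x', 'y', '0', '.'], labels)
add_labels(['123', 'x', 'y', '1', '0'], labels)
print(labels)
```

Note that skipped entries are simply absent from the dictionary, so any later lookup has to tolerate missing keys.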
Hey guys,

The way the code generally works is that I need a dictionary that maps the name of each dicom file to its condition (binary 0/1 for whether there was an abnormality on that breast). We can get this from our .tsv files. Later in the code, when I create my batches, I do so randomly inline, choosing a random batch of dicom files to read in. These are stored in something I call dataXX. Then, to train, I need the labels corresponding to all the images stored in dataXX. This should be something like dataYY.

It seems like you guys aren't getting to the actual training part, but are failing while parsing the .tsv's and creating those dictionaries.

In reference to the original question, Lintang, I believe the files are no longer compressed and are now plain dicom files. It seems you may be using an older version of my code, from when the files were still compressed to .gz. Can you confirm that you are indeed using the most recent version of the code? Other than that, I'd have to ask if there are any differences between your local environment and the Synapse environments in which I tested the code.

- Darvin
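The two-dictionary lookup and random batching described above can be sketched roughly as follows. This is not the starter code itself: `sample_labeled_batch`, the toy file names, and the ids are hypothetical, and the sketch returns file names where the real code would read pixel data into dataXX. It only draws images whose (patient, side) key has a label, which avoids the kind of KeyError shown in the original post.

```python
import random


def sample_labeled_batch(dict_img_to_patside, dict_tuple_to_cancer,
                         batch_size, seed=None):
    """Pick a random batch of image names and return them with their labels."""
    rng = random.Random(seed)
    # Keep only images whose (patient, side) key actually has a label
    labeled = [img for img, key in dict_img_to_patside.items()
               if key in dict_tuple_to_cancer]
    chosen = rng.sample(labeled, min(batch_size, len(labeled)))
    labels = [dict_tuple_to_cancer[dict_img_to_patside[img]] for img in chosen]
    return chosen, labels


# Toy dictionaries (made-up names and ids)
dict_img_to_patside = {
    'a.dcm': ('1', 'L'),
    'b.dcm': ('1', 'R'),
    'c.dcm': ('2', 'L'),
    'd.dcm': ('3', 'R'),  # no label below, so it is never sampled
}
dict_tuple_to_cancer = {('1', 'L'): 0, ('1', 'R'): 1, ('2', 'L'): 0}

batch, batch_labels = sample_labeled_batch(
    dict_img_to_patside, dict_tuple_to_cancer, batch_size=2, seed=0)
print(batch, batch_labels)
```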
Michael Kawczynski (MichaelK), I also had this problem. I removed the int, changing ```int(row[3])``` to ```row[3]```.
Also getting a similar ValueError:

```
STDOUT: Mon Oct 24 06:47:20 2016
STDOUT: +------------------------------------------------------+
STDOUT: | NVIDIA-SMI 352.99 Driver Version: 352.99 |
STDOUT: |-------------------------------+----------------------+----------------------+
STDOUT: | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
STDOUT: | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
STDOUT: |===============================+======================+======================|
STDOUT: | 0 Tesla K80 Off | 0000:87:00.0 Off | 0 |
STDOUT: | N/A 31C P8 26W / 149W | 55MiB / 11519MiB | 0% Default |
STDOUT: +-------------------------------+----------------------+----------------------+
STDOUT: | 1 Tesla K80 Off | 0000:88:00.0 Off | 0 |
STDOUT: | N/A 31C P8 28W / 149W | 55MiB / 11519MiB | 0% Default |
STDOUT: +-------------------------------+----------------------+----------------------+
STDOUT:
STDOUT: +-----------------------------------------------------------------------------+
STDOUT: | Processes: GPU Memory |
STDOUT: | GPU PID Type Process name Usage |
STDOUT: |=============================================================================|
STDOUT: | No running processes found |
STDOUT: +-----------------------------------------------------------------------------+
STDERR: Python 2.7.6
STDOUT: ---
STDOUT: Metadata-Version: 2.0
STDOUT: Name: tensorflow
STDOUT: Version: 0.9.0
STDOUT: Summary: TensorFlow helps the tensors flow
STDOUT: Home-page: http://tensorflow.org/
STDOUT: Author: Google Inc.
STDOUT: Author-email: opensource@google.com
STDOUT: Installer: pip
STDOUT: License: Apache 2.0
STDOUT: Location: /usr/local/lib/python2.7/dist-packages
STDOUT: Requires: numpy, six, wheel, protobuf
STDOUT: Classifiers:
STDOUT:   Development Status :: 4 - Beta
STDOUT:   Intended Audience :: Developers
STDOUT:   Intended Audience :: Education
STDOUT:   Intended Audience :: Science/Research
STDOUT:   License :: OSI Approved :: Apache Software License
STDOUT:   Programming Language :: Python :: 2.7
STDOUT:   Topic :: Scientific/Engineering :: Mathematics
STDOUT:   Topic :: Software Development :: Libraries :: Python Modules
STDOUT:   Topic :: Software Development :: Libraries
STDOUT: Entry-points:
STDOUT:   [console_scripts]
STDOUT:   tensorboard = tensorflow.tensorboard.tensorboard:main
STDERR: /usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
STDERR:   "This module will be removed in 0.20.", DeprecationWarning)
STDERR: I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
STDERR: I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
STDERR: I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
STDERR: I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
STDERR: I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
STDOUT: hdf5 not supported (please install/reinstall h5py)
STDOUT: Parsing the csv's.
STDERR: Traceback (most recent call last):
STDERR:   File "DREAM_DM_starter_tf.py", line 700, in <module>
STDERR:     main(sys.argv)
STDERR:   File "DREAM_DM_starter_tf.py", line 689, in main
STDERR:     X_tr, X_te, Y_tr, Y_te = create_data_splits(path_csv_crosswalk, path_csv_metadata)
STDERR:   File "DREAM_DM_starter_tf.py", line 89, in create_data_splits
STDERR:     dict_tuple_to_cancer[(row[0].strip(), 'L')] = int(row[3])
STDERR: ValueError: invalid literal for int() with base 10: '.'
```

KeyError when running tensorflow example