Created by Alberto Albiol alalbiol > Please cancel submissions 8464317 and 8464238, so we can submit the fixed one
Done. You should receive error messages saying that these two submissions failed, but they stopped because we manually intervened.
Dear moderators,
we think we have finally found a possible bug that explains this strange behaviour
(basically, the bug corrupted the memory and the program behaved randomly).
Please cancel submissions 8464317 and 8464238, so we can submit the fixed one
Apologies, I'm from Alberto's team, '42 is the answer'.
I've just submitted a job, created last Sunday, that finished with a core dump. The job (submitted on Sunday) ran for 3 days and then core dumped.
We have tried to reproduce the situation, and the result is shown above.
Now we have tried to resubmit the job that had stayed alive for several days, and the result is:
```
STDOUT: READING PATIENT DATABASE: :[20000] 20000
STDOUT: READING PATIENT DATABASE: :[30000] 30000
STDOUT: READING PATIENT DATABASE: :[40000] 40000
STDOUT: [***] [ READING PATIENT DATABASE: ] Processed Elements: 42817
STDOUT: [mamo_contest::application::application()] /root/modelState/porpoise_filename_data_3.tsv not found.
STDOUT: Exam list not present using only: "/metadata/images_crosswalk.tsv"
STDOUT: [void patient_list::parse_files(const path&)] "?z??(?????(????(???|??)???)????(????lN.??52??????+????+???x(??? |u????*????jN.?)????+???? ?.? ?.??5a.??g??g??g??g?g??g?M7a.??g?57a.? ?.`*????M7a.?e.?)???@I7a.?H7a.?H7a.P`i?-?+???.???P`i?-R?1.?+????-????+???1.?+????-????+???VK0.????-????+????K0.???,???1CN..????,????0????3N.?????Gf.X????=??mH?h?7'f.?jh?,????f?-???H?h;??H?s ?,???X??/??52???.1.0-???H?h[nw/-????7fa9'f.|??Gf.?S7a.???h-?????-.f?-?-?????-.?E0. ??.?-?????-.x??. ??.?-????6/.p??. ??.?-?????/.p??..??? .????N./????bG????67a.?67a.?)hP/???M?L.8P?A?'????s ?jh/?????5a.??5a.?M7a./???`Gf.??????P`i?-???e.>d?t[?4d?t[?xz??67a.0z?P?`1?????L.?1???P?5a.?.;q??-???-l???-?*??-??????;q??-???-l???-?*??-???-?gP`i?-???-???q?p.1?52???\??-@???-?0????;/.0z?@???-?0???????0'U?-?.0=????(?-?]PP`i?-?]P?j?-?(?-??M0????-?J???:?,.??M??2?-8?2?-??3e.1V y0pCf.?~Cf.???.P>???`>???p>????>????>????(?-??RX?2?-X?2?-?9Y-?(?-0????????-u?L0??????R?@????????0????@????p?????????????? ????P@???Z?M??e.?Cf.Tf.?Th?Cf.?????TTf.Q7f.%?Y????-?Gf.?P7f.Tf.P`Cf.??-@?M????-??M????-P`Cf.?:?,.????-P`Cf.??K?P7f.h?Cf.jCP@???e??.P?Ef.6.p??0??,.??2f.????-P`Cf.d^?0??,.?y@?K??I\ ???-?y?]?Xy???pECf.?4Af.?A??????????0E1f.?y0P?Af.h?@f.hGf.??????e.0?,.?0?rS?0E1f.P?Af.W?L?????y?y?A???0E1f.???P`Cf.?y??Af. Jh?@f.TsP0E1f.h?@f.0E1f.?Gh?t?-???-
uV?q???0f.??x?~?-??Af.E?I.P?Af.?????Af.????,?.P?Ff.????????????H1f.??Af.??Af.&"?H1f.P???&&W?=@f.?H1f.?H1f.?H1f.?L??Af.h?@f.?H1f.P?Af.?????Af.4J ?e.????????pZ?]????????h?@f.??DpZ?h?@f.h?@f.?|?f.h?@f.]??????*]?????D`E??????`E???fr???]??????G???l1f.]????D??Hf.??????Hf.?(A??? ??Xh?? ???X??XQ?p60F???%?WG???E?e.G?????D-a\?1/?%?WG???-??R??A-??R?s@??[G???%?WG???N?WG???
STDERR: /sc1_infer.sh: line 21: 8 Segmentation fault (core dumped) python /root/bin/evaluation_merge.py
STDOUT: ]???]???/]????]???U]???d]???{]????]????]???5^???E^???a^???s^????^????^???_???_???/_???7_???B_???S_???g_???|_????_????_????_???!P??????d@@8 ?&&f. %?W
)I????_???9I?????bG???0.??_??x86_64
```
But last time, the job stopped with just:
```
STDOUT: Data Shape = (4, 339)
STDOUT: X shape = (4, 339)
STDOUT: getting studio scores
STDOUT: 9957 R 0.105210187559
STDOUT: 9957 L 0.0125300827631
STDOUT: Batch Processed: elapsed since start: 164035.803323
STDOUT: 9965
STDOUT: New batch elapsed since start: 164046.099556
STDOUT: The score label new_score is not available
STDOUT: Data Shape = (4, 339)
STDOUT: X shape = (4, 339)
STDOUT: getting studio scores
STDOUT: 9965 R 0.0756200886285
STDOUT: 9965 L 0.0292592714558
STDOUT: Batch Processed: elapsed since start: 164046.358021
STDOUT: 9969
STDOUT: New batch elapsed since start: 164057.629836
STDOUT: The score label new_score is not available
STDOUT: Data Shape = (5, 339)
STDOUT: X shape = (5, 339)
STDOUT: getting studio scores
STDOUT: 9969 R 0.107686007631
STDOUT: 9969 L 0.0256583138454
STDERR: /sc1_infer.sh: line 21: 7 Segmentation fault (core dumped) python /root/bin/evaluation_merge.py
STDOUT: Batch
```
This is quite different: in the job from Sunday, the segmentation fault came well after it had read the inference files from the **/metadata** directory. Now all jobs stop as soon as they try to read the **/metadata** directory, the same as the job we prepared just to find out what the issue was.
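Since the crash now happens right when the **/metadata** files are read, one cheap check is to confirm from plain Python that a file such as `/metadata/images_crosswalk.tsv` exists and parses before handing it to any compiled code. The sketch below is only an illustration: `summarize_tsv` is a hypothetical helper, not part of our submission, and it assumes the file is ordinary tab-separated text.

```python
import csv
import os


def summarize_tsv(path, max_rows=5):
    """Return (header, first rows) of a tab-separated file.

    Hypothetical helper: fails early with a clear Python exception
    instead of letting a missing or unreadable file reach native code.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        # zip stops once max_rows indices are used up, so only a few rows are read.
        rows = [row for _, row in zip(range(max_rows), reader)]
    return header, rows
```

If this prints a sensible header and rows, the file itself is probably fine and the garbage in the log is more likely coming from how the path or buffer is handled inside the parser.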
So about your question:
```
Do you obtain systematically the same output when resubmitting the same job?
```
**Yes**, and we systematically get the same output, even with jobs that had previously passed the first steps.
And I can give now more information:
**No**, we still don't know whether the problem we had before these submissions is our fault or something that will be fixed.
Alberto,
Do you obtain systematically the same output when resubmitting the same job? It looks like you have some kind of serious error in your code, causing a core dump. Here's a bit more of your log file, which might help you "sleuth" the problem:
```
Loading init weights from : model_init_weights.hdf5
Simulation starts at 2017-03-15 22:00:28.037119
****************
Init test generator reto 1
READING PATIENT DATABASE: :[1] 1
READING PATIENT DATABASE: :[2] 2
READING PATIENT DATABASE: :[3] 3
READING PATIENT DATABASE: :[4] 4
READING PATIENT DATABASE: :[5] 5
READING PATIENT DATABASE: :[6] 6
READING PATIENT DATABASE: :[7] 7
READING PATIENT DATABASE: :[8] 8
READING PATIENT DATABASE: :[9] 9
READING PATIENT DATABASE: :[10] 10
READING PATIENT DATABASE: :[20] 20
READING PATIENT DATABASE: :[30] 30
READING PATIENT DATABASE: :[40] 40
READING PATIENT DATABASE: :[50] 50
READING PATIENT DATABASE: :[60] 60
READING PATIENT DATABASE: :[70] 70
READING PATIENT DATABASE: :[80] 80
READING PATIENT DATABASE: :[90] 90
READING PATIENT DATABASE: :[100] 100
READING PATIENT DATABASE: :[200] 200
READING PATIENT DATABASE: :[300] 300
READING PATIENT DATABASE: :[400] 400
READING PATIENT DATABASE: :[500] 500
READING PATIENT DATABASE: :[600] 600
READING PATIENT DATABASE: :[700] 700
READING PATIENT DATABASE: :[800] 800
READING PATIENT DATABASE: :[900] 900
READING PATIENT DATABASE: :[1000] 1000
READING PATIENT DATABASE: :[2000] 2000
READING PATIENT DATABASE: :[3000] 3000
READING PATIENT DATABASE: :[4000] 4000
READING PATIENT DATABASE: :[5000] 5000
READING PATIENT DATABASE: :[6000] 6000
READING PATIENT DATABASE: :[7000] 7000
READING PATIENT DATABASE: :[8000] 8000
READING PATIENT DATABASE: :[9000] 9000
READING PATIENT DATABASE: :[10000] 10000
READING PATIENT DATABASE: :[20000] 20000
READING PATIENT DATABASE: :[30000] 30000
READING PATIENT DATABASE: :[40000] 40000
[***] [ READING PATIENT DATABASE: ] Processed Elements: 42817
[mamo_contest::application::application()] /root/modelState/porpoise_filename_data_3.tsv not found.
Exam list not present using only: "/metadata/images_crosswalk.tsv"
[void patient_list::parse_files(const path&)] "?W??(?W?(?W?X?W?X?W???W??|?
? ?G???? ???W??W???W?p?? ?? ??P?W??z?
? ?W???W??0!
??0!
??@?f?????????e???????f??????f??0!
??W????f??k??W?@??f????f????f??s????W?P ?W??s??R?
???W?0 ?W???W?
$?
??W?0 ?W??W?V[?
???0 ?W?0?W??[?
? ??P?W?1S?
?P ?W??W?
?W??C?
????l?H????=x????k??????W?? ?W?;???W??/??G?????1??W?[n?W??+a??k?|?l????f????W????
?f??
?W????
??U?
? ?&"
??W????
?x?&"
? ?&"
? ?W??F?
?p?&"
? ?&"
?0 ?W????
?p?&"
?P ?W?p ?W??/?
?P
?W?!?C#????f????f???
?W?M??
?8???A?'????????
?W??]?f??I?f????f?P
?W?`?k??????s????k?%hq_(-??lq_(-?????????
????
?W?P?f???&"
?;?i???i??l?????????????;?i???i??l?????????j??????s??j????q?K ???G?????f??@5g???
?W??K?
?5g??
?W??L?
?@5g??
?W??L?
?@5g????@
?W??M?
?`
?W@5g???
?W????
????????
?? ???s?? ?W?
?W?T?????0??????kJ0??????W??9?k??l{??k??l??uV??W???W?i??k??l{??l{?*??k??q????J?0!?I???k?P?W?????W?P??p?,?? ?????????
???W?&&?W???????????0?h{???l{??l{?Pq?G????????Eu???Wo?Wcow??W?0@?W?|????k?8
?`?k?P??s????k?PP?W????f?Pg{?^Pp?P?W????
???W?????W??]?
???|??h?k?`?k?8????W?s?
??????0?W???
?0?W????p?????
?p?W?Io?
?`?k???W???6??W?????6??W?B?
?+????s????6??0?????JT?s???F??6?W?k/?
/sc1_infer.sh: line 21: 8 Segmentation fault (core dumped) python /root/bin/evaluation_merge.py
??W??s????6Jf?*??s??0??k????????6??W?pR?*?`??*???6?i?*? ?W??
?*?=3I ????0! ?W?pR?*?`??*??R?*??o????W???6?s????*??W?????RP?W??s??pR?*???M?W?????????K`?k???37119??W???W???W??l{?P?W?P??W?eq?k??????W???W???W???W???W???W???W???W?/\
????W?<=?4??9?R*\0?W??????0!p?U?????(J??? 0?W?0?W???F=?4??Y?
P ?4?0?R 0?W?0?W?????W?Za<=??4????`&"????{S??W?@?MPp???Pp???P??k?Q?*??_P???k?P?6??6P?6?M???P??k?P??k???k?
?uV??W?=?4?=?4?P ?4?P??k?P?68?tS????P?60????P??k???????8P?9P?????P??k?Mm?4?P?6??II?L?$?j??]P0????Dm?4???????0!?l?4?pT?4????????????4???????l{?HR???U?J0p?W???W??0!???4? ú???4? úW?L?? ????????????????0!P??k?h@????J????W?0????????W?(s????]P?s???]P0? ??(s?????0P@????J?????4???MHR????V????
?j???1V0??0??k????k??a{???W???W???W???W???W?(s?????RW???W????9Y?(s?????W?P@???u?L??W???R?????W???W???W???W???W???W?p?W?????Mu?kE?k??k??ThN?k??0!??T?k???k?%?YP@?????l????k??k?P??k?ez??@?M??ez????MxH???P??k????4?xH???P??k???K???k?hN?kjC???ej{?Pl??p??0?4?+4?k?P@???P??k?d?-0?4???@?K??I\ ??? ú?-???0!p??k????k?@?W????????0??k???0P@?k?h1?k?h?l??0!??k?0??4??0?rS?0??k?P@?k?W?L????????0?W?0??k??0!P??k? ú?A?k?
uV??&"??k???+x?>???A?k?E?IP@?k???+?A?k??0!?l$?Pfl??0!???????????k??A?k??A?k?&"???k?P?&&?&&W???k????k????k????k??L?A?k?h1?k????k?P@?k??0!?A?k?4J??k???&&??&&?,p?'??W?E>?W?a>?W?s>?W??>?W??>?W???W???W?/??W?7??W?B??W?S??W?g??W?|??W????W????W????W?!?W????d@@8 ?k? %?W
?$?W????W??$?W??!?C#?A)?_n??x86_64
```