Hello organizers,
Could you help me figure out how to set up the Docker container correctly for the Task 2 submission?
I tried both our model and the base model from Task2_Example.ipynb, following the instructions in this GitHub repo: https://github.com/Sage-Bionetworks-Challenges/sample-model-templates. The containers ran successfully locally but failed with an "ERROR" workflow status after submission. I checked the logs and suspect the lines below indicate the error. However, the code does contain `results.to_csv("/output/predictions.tsv", sep="\t", index=False)`, so it should have written the predictions to the output folder.
```
STDERR: 2023-07-14T03:18:55.906036931Z mounting volumes
STDERR: 2023-07-14T03:18:55.906039815Z checking for containers
STDERR: 2023-07-14T03:18:55.906042909Z running container
STDERR: 2023-07-14T03:18:55.906045665Z creating logfile
STDERR: 2023-07-14T03:18:55.906048665Z finished training
STDERR: 2023-07-14T03:18:55.906051339Z Traceback (most recent call last):
STDERR: 2023-07-14T03:18:55.906054213Z File "/var/lib/docker/volumes/workflow_orchestrator_shared/_data/8674d223-5f3a-41f5-8e67-543ee688a94e/node-033ce7cb-adfa-4c64-9c70-f2676d6a753d-1cc402dd0e11d5ae18db04a6de87223d/tmp913x51th/a8d5c382-d793-4b77-ba8e-1440e3888197/tlkg0vp8s/tmp-outwd7i08yy/run_docker.py", line 234, in <module>
STDERR: 2023-07-14T03:18:55.906057612Z main(syn, args)
STDERR: 2023-07-14T03:18:55.906060455Z File "/var/lib/docker/volumes/workflow_orchestrator_shared/_data/8674d223-5f3a-41f5-8e67-543ee688a94e/node-033ce7cb-adfa-4c64-9c70-f2676d6a753d-1cc402dd0e11d5ae18db04a6de87223d/tmp913x51th/a8d5c382-d793-4b77-ba8e-1440e3888197/tlkg0vp8s/tmp-outwd7i08yy/run_docker.py", line 207, in main
STDERR: 2023-07-14T03:18:55.906064326Z raise Exception("No 'predictions.tsv' file written to /output, "
STDERR: 2023-07-14T03:18:55.906068059Z Exception: No 'predictions.tsv' file written to /output, please check inference docker
```
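For reference, this is roughly how I check the container output locally (the image name, host paths, and commands are placeholders for our actual setup, not the official template):
```python
# Rough local sanity check, run after something like:
#   docker run --rm -v "$PWD/test_input:/input:ro" -v "$PWD/test_output:/output" my-task2-image
# (image name and host paths are placeholders for our local setup)
import pandas as pd
from pathlib import Path

out_file = Path("test_output/predictions.tsv")
assert out_file.exists(), "predictions.tsv was not written to the mounted output folder"

preds = pd.read_csv(out_file, sep="\t")
print(preds.shape)
print(preds.head())
```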
Is there anything I can do to debug this issue locally, or could you help me check whether any other bugs are causing this error?
Thank you very much for your time and help!
Best,
Kaiwen
@dengkw ,
No worries. I'll still drop your error here so you have it.
```
[5221 rows x 2455 columns]
NameError: name 'LabelEncoder' is not defined
```
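In case it is still useful: that NameError usually just means the import is missing. A minimal sketch, assuming the script intends scikit-learn's LabelEncoder:
```python
# The NameError goes away once LabelEncoder is imported before use.
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()

# e.g., map string labels to integer codes for training
y = label_encoder.fit_transform(["case", "control", "case"])
print(y)                       # [0 1 0]
print(label_encoder.classes_)  # ['case' 'control']
```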
Hi @mdsage1
EDIT: I figured out the bugs. Sorry for disturbing you.
Kaiwen
Hi @dengkw
Your log has the following error:
"
TypeError: ufunc 'divide' not supported for the input types, and the inputs
could not be safely coerced to any supported types according to the casting rule
''safe''
"
I hope that helps! Thank you!
Hi @vchung
Sorry to disturb you again. Can I request the logs of submission 9737099?
Thank you very much for your time and help!
@dengkw ,
Yes, the workflow has been updated to make it clear that Docker logs will not be shared for this challenge, as the data is sensitive. We understand that this creates an inconvenience when errors arise, but we will do our best to make the debugging process as seamless as possible!
Below are the logs for your requested invalid submissions:
**9737021**
```python
Traceback (most recent call last):
File "run_model.py", line 240, in
main()
File "run_model.py", line 216, in main
input_data, input_gt, label_encoder = process_raw_data(input_dir)
File "run_model.py", line 137, in process_raw_data
cshq = cshq.groupby(["Participant_ID"]).agg(cshq_agg).reset_index()
File "/usr/local/lib/python3.8/site-packages/pandas/core/frame.py", line 8252, in groupby
return DataFrameGroupBy(
File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 931, in __init__
grouper, exclusions, obj = get_grouper(
File "/usr/local/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 985, in get_grouper
raise KeyError(gpr)
KeyError: 'Participant_ID'
```
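As a hint, the KeyError means 'Participant_ID' is not among the columns at the point of the groupby, for example because it was moved into the index or named differently in the evaluation copy of the data. A small defensive check, sketched here rather than taken from the actual evaluation code, might look like:
```python
import pandas as pd

def group_by_participant(cshq: pd.DataFrame) -> pd.DataFrame:
    """Group by Participant_ID, tolerating the case where it is the index."""
    if "Participant_ID" not in cshq.columns:
        # it may have been moved into the index earlier in the pipeline
        cshq = cshq.reset_index()
    print(cshq.columns.tolist())  # inspect what is actually available
    return cshq.groupby("Participant_ID").mean(numeric_only=True).reset_index()
```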
**9737026**
```python
ValueError: DataFrame.dtypes for data must be int, float or bool.
Did not expect the data types in the following fields: entity_Loss_Survey_id,
Loss_Relationship_To_Participant,
Loss_Relationship_To_Participant_Other_Curated,
Loss_Participant_Age_Passed_Away, Loss_Date_Participant_Month,
Loss_Date_Participant_Day, Loss_Date_Participant_Year, Loss_Autopsy_YN,
Loss_BioSamples_Given_YN, Loss_Connect_OtherData_RX,
Loss_Patient_Community_Connection
```
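This one looks like the error XGBoost raises when the training frame still contains object-dtype columns. Dropping or encoding the listed ID and free-text fields before fitting is one way around it; a rough sketch (not your actual pipeline):
```python
import pandas as pd

def to_numeric_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where every non-numeric column is integer-encoded."""
    out = df.copy()
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            # integer-encode categorical/free-text fields; ID columns could
            # also simply be dropped instead
            out[col] = out[col].astype("category").cat.codes
    return out
```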
**9737036**
```python
KeyError: 'Participant_ID'
```
Hi @vchung
The Docker evaluation system seems to have changed after today's office hour. The returned message said, "Error encountered while running your Docker container; logs cannot be shared for this challenge, so please contact the organizers in the Discussion Forum for help."
Can I have the logs for submissions 9737021, 9737026, and 9737036? Since the Docker logs only show the errors from the evaluation program, can I also have the logs showing why my models stopped during training?
Thank you very much!
@dengkw ,
Yes, we can keep the submission queues for the Leaderboard Round open during the Final Round. However, we cannot be held accountable for any models submitted to the wrong queue, i.e., if you intended to submit to the Leaderboard Round queue but submitted to the Final Round by mistake. Please be mindful when making your submissions once the Final Round begins.
Hi @vchung,
Are we still able to make leaderboard-round submissions during the final round? If not, is it possible to extend the leaderboard round?
Thank you very much!
@dengkw ,
Great question. For the final round, you will have a total submission limit of 3, where we will take the _best-performing_ model for final ranking.
Hope this helps!
Hi @vchung
Thank you very much for helping me solve this issue. I found that the base model submission worked, and the failed submissions reported their errors.
The failed ones were our test submissions: we got the "prediction not found" error and tried directly copying a predictions.tsv into /output for debugging.
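For context, the debugging container basically just did something like this instead of running inference (the source path is a placeholder, not our real filename):
```python
# debugging-only workaround: copy a previously generated predictions file
# into /output instead of running the model (source path is a placeholder)
import shutil

shutil.copy("/app/saved_predictions.tsv", "/output/predictions.tsv")
```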
Btw, do we have similar submission limits in the final round of Task 2? Or will the submission queue be different from the leaderboard round?
Thank you!
@dengkw ,
First, thank you so much for your patience as we try to address the issue!
There was an internal error on our end, and a fix has been made. I will re-run some of your earlier submissions to see whether that particular issue has been resolved.
EDIT: typo fix