The status of my submissions is "INVALID". I got emails with the following invalid reason: "Error encountered while running your Docker container; contact the Challenge Organizers in the Discussion Board for more info." I also checked the log files, but I could not find any clues to solve the issue - my code is R code, but the logs contain no information about R. There is no issue when I run the Docker image on my own PC or server. I would really appreciate it if someone could help me. Best Regards, Gabe

Created by Gabriel Kim (@GabeKim)
Sorry for the late reply, Gabe. This one fell off our radar for a bit. It seems that there might just be an issue with parsing the CSV. Are you using the default delimiters?

| Parameter | default | unix |
| --- | --- | --- |
| delimiter | `,` | `,` |
| doublequote | True | True |
| escapechar | None | None |
| lineterminator | `\r\n` | `\n` |
| quotechar | `"` | `"` |
| quoting | `csv.QUOTE_MINIMAL` | `csv.QUOTE_ALL` |
| skipinitialspace | True | False |
| strict | False | False |
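For R users, here is a minimal sketch of how the unix dialect (every field quoted, `\n` line endings) reads with base R; the file name and contents are hypothetical:

```r
# Hypothetical file written in the unix dialect: every field quoted,
# "\n" line endings, comma-delimited (the file name is a placeholder).
writeLines(c('"specimen","alpha","nreads"',
             '"S001","0.25","1200"',
             '"S002","0.75","3400"'),
           "example_unix.csv")

# read.csv strips quote characters while scanning (quote = "\"" is its
# default) and then type-converts, so quoted numbers like "0.25" still
# come back as numeric columns.
df <- read.csv("example_unix.csv", quote = "\"", stringsAsFactors = FALSE)
str(df)
```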
@vchung, @BlueT, @jgolob, First, thanks again for all your helpful replies.

What I figured out is that the test input CSV files contain quoted values, while the training files do not (the last error I got was related to this quoting issue). I suspect the same applies to the column names in the first header line. I know this challenge is a totally blind, black-box test, so I understand that you do not provide the column names of the taxonomy files either. However, participants' code for handling the unknown columns can hit unexpected errors, and those errors can also lead to false predictions. For that reason, I would like to ask you to provide just the column names. You don't have to share any other values, including outcomes. The challenge will still be meaningful, or even more meaningful, by reducing falsely predicting models.

Actually, I was planning to spend my third submission just to check whether the R code that reads the input files in my second submission is correct: I used different reading code in the first submission and got a higher score for it, even though I had prepared a better prediction model for the second submission. If a third submission that only swaps the second submission's reading code for the first submission's scores higher, I will know that the second submission's code has an issue reading the input files. However, if you provide the column names of the input files, I can save that submission and my energy and use them to build better prediction models. Thank you. Gabe
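(As an aside, one way to compare the two reading routines locally, without spending a submission, is to run both on the same training file and diff the results. A minimal sketch, where `read_v1()` and `read_v2()` are placeholders for the first and second submissions' reading code:)

```r
# Placeholders for the two submissions' input-reading routines.
read_v1 <- function(path) read.csv(path, stringsAsFactors = FALSE)
read_v2 <- function(path) read.table(path, sep = ",", header = TRUE,
                                     quote = "\"", stringsAsFactors = FALSE)

path <- "phylotypes/phylotype_relabd.1e0.csv"  # any training feature table

a <- read_v1(path)
b <- read_v2(path)

all.equal(a, b)                # TRUE, or a description of the differences
identical(names(a), names(b))  # do both readers agree on the column names?
```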
@vchung, I fixed it. Thank you.
@HuNBiome_Gabe, Your recent submission (ID 9723488) has an error related to the `read.table` function in your code - it is currently failing because it expects all rows to have the same number of elements. Hope this helps!
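That error typically reads "line N did not have M elements" and is often caused by a quoted field containing the delimiter being split into extra fields. A minimal sketch of two common workarounds (the file name is a placeholder):

```r
# 1. Quoted fields containing the delimiter: make sure quote handling is on
#    (read.table's default quote set differs from read.csv's).
df <- read.table("input.csv", sep = ",", header = TRUE, quote = "\"")

# 2. Genuinely ragged rows: fill = TRUE pads short rows with NA instead of
#    raising an error (use with care -- it can mask a real parsing problem).
df <- read.table("input.csv", sep = ",", header = TRUE, fill = TRUE)
```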
Hi @jgolob, Thanks for the helpful explanation, but I think the error log will not be identical to the previous one. Can you please share the error log from the R code?
Greetings! Great question. The only feature tables (in wide format, with one row per specimen) whose feature columns should be identical between the training and validation data sets are:

- `alpha_diversity/alpha_diversity.csv`
- `phylotypes/phylotype_nreads.1e0.csv`
- `phylotypes/phylotype_nreads.1e_1.csv`
- `phylotypes/phylotype_nreads.5e_1.csv`
- `phylotypes/phylotype_relabd.1e0.csv`
- `phylotypes/phylotype_relabd.1e_1.csv`
- `phylotypes/phylotype_relabd.5e_1.csv`
- `community_state_types/cst_valencia.csv`

These are the harmonized data features, sufficiently consistent across studies after [placement on phylogenetic scaffolding](https://www.biorxiv.org/content/10.1101/2022.07.26.501561v1) to be shared entirely between the validation and training sets.

The remainder of the wide-format data are not guaranteed to have all columns shared between the training and test data sets. This includes _all_ of the taxonomy-containing tables. There may be 'missing' columns present in the training data that are not in the validation data (which can be inferred to be zero) and additional columns in the validation data not found in the training set. [Taxonomic annotation in microbiome data is fraught](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1690-0), and thus cannot be fully harmonized across studies. A model relying upon it will need to address this limitation of this feature type; one possible approach is sketched below.
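One way to handle this - not the only one - is to align the validation table to the training feature space before predicting. A minimal sketch, where `train_df` and `test_df` are hypothetical wide-format taxonomy tables loaded as data frames:

```r
# Hypothetical alignment of a test feature table to the training columns:
# columns seen only in training are assumed zero in the test data, and
# columns seen only in the test data are dropped.
align_features <- function(test_df, train_cols) {
  # add training columns missing from the test set, filled with 0
  missing <- setdiff(train_cols, names(test_df))
  if (length(missing)) test_df[missing] <- 0
  # keep only the training columns, in the training order
  test_df[, train_cols, drop = FALSE]
}

# usage:
# test_aligned <- align_features(test_df, names(train_df))
```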
Hi @vchung and @BlueT, Thanks for the replies. They were helpful, and the first submission was successfully validated. I edited the code, submitted the second one, and got the invalid error again. I am sorry to bother you, but can you please share the error log of the second submission? Best Regards, Gabe
Hi @vchung, Can you please confirm that all column names of the feature tables in the training set have corresponding columns in the test set? Thank you.
Hi @HuNBiome_Gabe , For this Challenge, we are not sharing any Docker logs, hence the error message to reach out to us for more information. Apologies if this was not made clear! I looked into your two submissions and they both received the same error, that is: ```r Error in predict.xgb.Booster(modelFit, newdata) : Feature names stored in `object` and `newdata` are different! Calls: predict ... probFunction -> -> predict -> predict.xgb.Booster ``` Hopefully this helps!
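For other participants hitting this: `predict.xgb.Booster` raises that error when the column names of `newdata` differ, in membership or order, from the feature names stored in the booster. A minimal sketch of one way to reconcile them, assuming `newdata` is a data frame and the trained model exposes `feature_names` (the variable names simply echo the error trace; this is not the Organizers' code):

```r
library(xgboost)

# Hypothetical repair for the feature-name mismatch above.
feats <- modelFit$feature_names              # names the booster was trained on

missing <- setdiff(feats, colnames(newdata))
if (length(missing)) newdata[missing] <- 0   # features unseen at test time -> 0
newdata <- newdata[, feats, drop = FALSE]    # drop extras, match the order

pred <- predict(modelFit, as.matrix(newdata))
```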
