Dear AMP-AD maintainers,

We are preparing a manuscript for submission to the Journal of Statistical Software (https://www.jstatsoft.org/). The manuscript describes our solver for ordinal regression problems (https://github.com/labsyspharm/ordinalRidge). We would like to show its application to real data, and predicting Braak stage from RNA-seq expression seems like a fantastic setting for evaluating an ordinal regression solver.

The journal requires that "All figures, tables, and other output presented in the manuscript must be fully and exactly reproducible on at least one platform. [...] Typically, reproducibility is demonstrated by a standalone replication script." We can compose a script that automatically pulls the relevant data from Synapse to generate the manuscript figures, but unfortunately a reviewer would not be able to execute this script unless they had signed the corresponding data usage agreements.

One possible workaround may be to scramble gene and sample identities (e.g., shuffle the rows and label them Gene1, Gene2, etc., then do the same for the columns). Such a scrambled matrix would not be useful for any biological discovery, but it would preserve the overall statistical properties of the dataset. Since our manuscript is focused on software rather than biology, the scrambled matrix would be enough to demonstrate the needed functionality.

My question is whether releasing such a scrambled matrix alongside a script that reproduces the manuscript figures would violate the data usage agreements. We will, of course, cite the original source of the data with all relevant acknowledgments.

If releasing a scrambled matrix would violate the DUA, do you have any suggestions for how we can use AMP-AD data in a way that allows complete reproducibility with an automated script, even when the script is executed by somebody who may not have access to the data (such as a reviewer)?

Please let me know if you need any additional details.

-Artem

Created by Artem Sokolov (@ArtemSokolov)
@ArtemSokolov, that sounds great! Please let us know when you hear back from the journal.
Hi @elang, thanks for following up. Unfortunately, no, it's not just for anonymous reviewers. The journal [instructions for authors](https://www.jstatsoft.org/authors) state that:

> The replication materials must enable reproducibility of all results from the manuscript (see also Review Process). In case the attachments are too large for upload, this should be explained in the submission and a link should be provided.

These replication scripts are also released alongside the published articles (e.g., https://www.jstatsoft.org/article/view/2980).

In the past, we have asked users of our scripts to sign the data usage agreements (https://github.com/labsyspharm/DRIAD#wrangling-amp-ad-datasets). We will ask the journal whether it's OK to do the same here while providing anonymous access for reviewers, which would allow them to run the script without waiting for DUA approval.

We are also not 100% committed to this journal. I noticed that many of its articles are published 1.5 to 2 years after submission, which is a bit too slow for us, so we may be looking at other journals as well. I will follow up here if we need anonymous access to the data. Thanks very much for reminding me about this feature!
@ArtemSokolov, thanks for reaching out. To confirm, is the scrambled data only needed for the purpose of an anonymous manuscript review? We actually have a process at Sage whereby we can grant reviewers anonymous access to data hosted on Synapse. I would recommend not scrambling the matrix data and instead going through this anonymous review workflow, so that we can give the reviewers direct access to the Synapse data. Please let me know if you'd like to move forward with that process, and I can review the next steps with you.
Hi Artem, sorry for the slow reply. Tagging @elang, who provides our human data governance support, to answer the question about the scrambled matrix.

Including a copy of scrambled gene expression data with a manuscript submission