OPFVTA Reexecution Study

Manuscript PDF Poster PDF

This repository contains code, data, and prose as required to:

  • re-execute analysis and production of the OPFVTA article in a container environment.
  • generate a meta-article, detailing the reexecution environment and including a reference example of how to inspect reproducibility across reexecutions.

How to clone this repository

This repository contains submodules/subdatasets, which should be installed alongside the parent repository:

datalad install -r https://gin.g-node.org/TheChymera/opfvta-reexecution.git
cd opfvta-reexecution

How to re-run

This reexecution system contains two targets: the original article, and the publishing documents for the “meta-article”. The latter can be executed in the absence of the former, as it draws on the cumulative reexecution record found in the outputs/ directory. Executing the former will produce your own reexecution record, which you can contribute to the repository, and which all publishing documents created on your system will take into account. For a graphical breakdown, see the following workflow figure:

workflow

I. Reexecuting the OPFVTA Article

Warnings:

  1. We estimate that the analysis requires more than 500 GB of storage, 400 GB of which will be placed in a scratch directory. This is ./scratch/ by default and can be configured with the SCRATCH_PATH variable.
  2. The analysis self-limits its RAM usage so that it can run on less powerful systems.
  3. Reexecuting the computation as well as the article is time-consuming and resource-intensive. We recommend using a tool such as tmux or screen to keep long-running processes alive.
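
Before starting, it may help to confirm that the scratch location has enough free space. The following is a minimal sketch and not part of this repository: `free_gb` and `check_scratch_space` are hypothetical helper names, and the threshold you pass in is your own choice (the 400 GB figure above is only an estimate).

```shell
# Hypothetical pre-flight helpers; not part of this repository's Makefile.

# Print the free space, in whole GB, of the filesystem holding directory $1.
free_gb() {
    # df -P prints one POSIX-format data line; -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeed if directory $1 has at least $2 GB free; warn on stderr otherwise.
check_scratch_space() {
    dir=$1
    need_gb=$2
    [ -d "$dir" ] || return 2
    have_gb=$(free_gb "$dir") || return 2
    if [ "$have_gb" -lt "$need_gb" ]; then
        echo "Only ${have_gb} GB free in ${dir}, ${need_gb} GB wanted" >&2
        return 1
    fi
    return 0
}
```

Assuming SCRATCH_PATH is set in your shell environment, a call such as `check_scratch_space "${SCRATCH_PATH:-./scratch}" 400 && make analysis-oci` would start the analysis only when the check passes.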

First, retrieve the data and other large files:

make submodule-data

Note that the outputs/ directory will have some content from other executions (organized by datetime) for comparison.

Once the required content has been fetched, you can reexecute the OPFVTA article via Singularity or OCI containers. This step generates intermediate results in the scratch directory, which are not preserved in this repository by default, as configured in scratch/.gitignore. The final result is a PDF article and its associated elements (mainly volumetric binary data, .nii.gz files), which will be stored in a datestamped and annotated directory under outputs/. Most large files, including the results, are stored and versioned via git-annex and are therefore present in this repository. After executing the analysis, outputs/ will contain a new entry for the data produced, which can be saved and recorded with datalad save.

For Apptainer/Singularity:

make analysis-singularity

Alternatively, with Docker or Podman, you can execute the analysis inside an OCI container:

make analysis-oci

The results of the analysis will be stored under outputs/<execution-environment>_<datestamp>
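
To look up the result directories of one execution environment, a small helper along these lines may be convenient. It is only a sketch: `list_runs` is a hypothetical name, and the environment prefix you pass in must match whatever actually appears under outputs/ on your system.

```shell
# Print directories under $1 whose names start with "$2_", one per line.
# Hypothetical helper; relies only on the <environment>_<datestamp> naming.
list_runs() {
    for d in "$1"/"$2"_*/; do
        # An unmatched glob stays literal, so keep only real directories.
        [ -d "$d" ] && echo "${d%/}"
    done
    return 0
}
```

Calling `list_runs outputs <execution-environment>` prints the matching directories in the shell's glob order.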

II. Reexecuting the Meta-Article

To avoid confusion, we use the term 'article' to refer to a version of the OPFVTA article, and 'meta-article' to refer to the paper regarding the reexecution process and findings.

Generation of the meta-article uses files generated by the OPFVTA analysis which are expected to be in the outputs/ directory. Prior to generating the meta-article, outputs/ must contain the data from previous analyses, which is not locally available by default.

Note: Regenerating the OPFVTA article will create an additional PDF, but the previous PDFs are still required for the comparison.

To fetch the OPFVTA analysis outputs:

datalad get outputs/*/article.pdf
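
To see how many of the article PDFs are actually present locally, one option is to test each path directly. This sketch assumes git-annex's usual behaviour of representing content that has not been fetched as dangling symbolic links, which does not hold for unlocked files or adjusted branches. `count_present` is a hypothetical helper, not part of this repository.

```shell
# Print "<present>/<total>" for the given paths.
# [ -e ] follows symlinks, so a dangling annex link counts as absent.
count_present() {
    present=0
    total=0
    for f in "$@"; do
        total=$((total + 1))
        if [ -e "$f" ]; then
            present=$((present + 1))
        fi
    done
    echo "${present}/${total}"
}
```

For example, `count_present outputs/*/article.pdf` reports how many of the matched PDFs have their content available. If the glob matches nothing, the literal pattern is counted as one absent path.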

With these files in place, new graphical elements are generated and the text is compiled via LaTeX into a new meta-article PDF.

The meta-article can then be generated by a container with all of the dependencies preinstalled using:

make container-article

Alternatively, if you prefer to run the generation outside of a container, you will need to install the following dependencies (we suggest using your distribution's package manager; the names below are Debian names):

- LaTeX
- biber
- datalad
- diff-pdf
- graphviz
- matplotlib
- pandas
- seaborn
- sklearn
- statsmodels
- yaml

You will also need to install the sourceserifpro font using tlmgr.
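
A quick way to spot missing command-line tools before building is sketched below. `missing_commands` is a hypothetical helper, and the executable names in the usage note are assumptions about how the packages above are exposed on PATH. The Python packages in the list are not covered by this check.

```shell
# Print each command name from the arguments that is not found on PATH.
# Returns non-zero if at least one is missing.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}
```

For instance, `missing_commands biber datalad diff-pdf dot tlmgr` prints whichever of those tools cannot be found (`dot` being the Graphviz layout program) and prints nothing when all are available.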

make article

In either of those cases you can then view the produced meta-article at TODO.

Cleaning up between runs

The steps are designed to be idempotent, so some dynamically generated components will not be regenerated on subsequent runs. If you are working on the figure-generating code, rather than merely producing a PDF to read or editing the human-readable text, it is advisable to always deep-clean the dynamic elements before re-making the article.

make article-clean && make article

Internal

We openly share all code and data via the Gin repository referenced above. This open infrastructure is, however, slow, which may be particularly inconvenient for prolonged development work. Trusted collaborators may instead prefer to use the smaug.dartmouth.edu remote.

To use this remote you should:

  1. Make sure you have SSH access to smaug.dartmouth.edu, and have configured the host via your config file; you can do so by running:
cat >> ~/.ssh/config << EOF
Host smaug
	Hostname smaug.dartmouth.edu
	AddKeysToAgent yes
	Port <secret_port>
	User <your_username_for_which_smaug_has_your_SSH_public_key>
EOF
  2. Add the remote to the Git repository, and make sure you are synced up:
cd path/to/your/repo
git remote add smaug smaug:/mnt/btrfs/datasets/incoming/con/opfvta-replication-2023.git
datalad get . -s smaug
  3. Optionally, and only if you are an internal contributor looking to push t, it makes sense to automatically keep our three remotes (gin, smaug, GitHub) in sync, you should make use