title: "CNS2023 Data Workflows Workshop"
author: "Michael Denker, Moritz Kern, Thomas Wachtler and Reema Gupta"
date: 2023-07-15
Data Workflows Tutorial – CNS 2023, Leipzig
This repository contains (or, for now, will contain) all files required for the CNS*2023 tutorial "T08: Using open tools to build efficient workflows for data access, management and analysis".
Schedule
The workshop will take place on the 15th of July 2023 in two parts:
Overview
Session I: 9:00 -- 10:10 am
- 9:00-9:10 Welcome, Introduction
- 9:10-9:30 Introduction to GIN
- 9:30-9:40 Introduction to the Dataset
- 9:40-10:10 Primer on Neo I
----- 10:10-10:40 COFFEE BREAK -----
Session II: 10:40 am -- 12:10 pm
- 10:40-11:00 Primer on Neo II
- 11:00-11:30 Data Analysis with Elephant
- 11:30-12:10 Data Organization and Storage with NIX
Requirements
To benefit from the workshop, you need some experience with the Python programming language. To follow the tutorial, you may use any one of the following options:
1. Working offline
Before attending the workshop, please make sure that the machine you are working on can run Jupyter notebooks and install Python packages.
Either download the contents of the CNS2023-Data-Workflows repository via the web interface, or clone it from the command line: git clone https://gin.g-node.org/CNS2023-Leipzig/CNS2023-Data-Workflows
If you clone the repository, please download the data file manually; it is managed by git-annex and is not fetched by git clone:
https://gin.g-node.org/CNS2023-Leipzig/CNS2023-Data-Workflows/raw/master/data/l101210-001_small_cut_60.0s.nix
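For those who prefer to script this step, the download can be sketched in Python using only the standard library. The URL is the one listed above; the `data` target directory (chosen to mirror the path in the URL) and the helper names `local_path_for` and `fetch_data_file` are illustrative assumptions, not part of the repository.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL of the annexed data file, as listed above.
DATA_URL = (
    "https://gin.g-node.org/CNS2023-Leipzig/CNS2023-Data-Workflows"
    "/raw/master/data/l101210-001_small_cut_60.0s.nix"
)


def local_path_for(url, data_dir="data"):
    """Return the local path under data_dir (an assumed location) for url."""
    filename = Path(urlparse(url).path).name
    return Path(data_dir) / filename


def fetch_data_file(url=DATA_URL, data_dir="data"):
    """Download the file unless it is already present; return its path."""
    target = local_path_for(url, data_dir)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(target))
    return target
```

Calling `fetch_data_file()` from the root of the cloned repository would place the file at `data/l101210-001_small_cut_60.0s.nix`; a second call does nothing if the file already exists.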
To make sure your machine is set up for the workshop, please install the Python requirements by running pip install -r requirements.txt, then start the requirements notebook (jupyter notebook requirements.ipynb) and run it before the workshop. We recommend using Anaconda to create a clean Python virtual environment for the workshop.
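A quick way to see whether the main packages are importable, independent of the requirements notebook, is sketched below. The package list is an assumption based on the tools named in this README; the requirements.txt file in the repository remains authoritative, and the helper names are illustrative.

```python
import importlib.util

# Assumed package list; requirements.txt in the repository is authoritative.
ASSUMED_PACKAGES = ["neo", "elephant", "nixio", "odml", "notebook"]


def check_packages(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report(status):
    """Format one human-readable line per package."""
    return [
        f"{name}: {'ok' if found else 'MISSING'}"
        for name, found in status.items()
    ]
```

Running `print("\n".join(report(check_packages(ASSUMED_PACKAGES))))` inside the workshop environment lists which of the assumed packages still need to be installed.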
2. EBRAINS Collaboratory
To interactively follow the tutorials online, we suggest creating a free EBRAINS account (https://www.ebrains.eu/page/sign-up) in advance. The tutorials are also available at tutorials.python-elephant.org under /Events/CNS_2023.
Dataset Used
The Reach-2-Grasp experiment
Full data manuscript and dataset
- Brochier, T., Zehl, L., Hao, Y., Duret, M., Sprenger, J., Denker, M., Grün, S. & Riehle, A. (2018). Massively parallel recordings in macaque motor cortex during an instructed delayed reach-to-grasp task, Scientific Data, 5, 180055. http://doi.org/10.1038/sdata.2018.55
- https://gin.g-node.org/INT/multielectrode_grasp
Tutorial Abstract
Neuroscientists today face challenges in managing the growing volume and complexity of data generated through rapid technological and methodological advancements and sophisticated experimental paradigms. Data management tools and methods provide indispensable solutions for researchers to efficiently handle, organize, and analyze datasets, facilitating model validation, refinement, and simulation, while fostering collaborations. This tutorial presents examples combining multiple tools synergistically into a complete digitized workflow, to help researchers manage and control data and analysis processes.
- odML (https://g-node.org/odml) is an open, lightweight and flexible format that provides a common schema (with implementations in XML, JSON, YAML) to collect, organize and share metadata in a human- and machine-readable way.
- NIX (https://g-node.org/nix) is a lean data model and file format for storing fully annotated scientific datasets, i.e. the data together with rich metadata (odML) and their relations in a consistent, comprehensive format.
- GIN (https://gin.g-node.org) is a platform for version-controlled (git and git-annex) data management and collaboration. It supports any file type and folder structure, provides both web and command-line access, offers an option for local installation, and includes services such as format validation and data publication (DOI).
- Neo (http://neuralensemble.org/neo) provides programmatic data objects for working with and representing electrophysiological data, and can read data from many proprietary formats. In combination with NIX, Neo makes electrophysiological data interoperable with generic analysis scripts, tools and services.
- Elephant (https://python-elephant.org) provides a large portfolio of standard and advanced methods for analyzing data from neuronal spike trains or time series data, such as LFPs. The Neo data model makes them easily accessible to scientists and applications.
- Alpaca (https://alpaca-prov.readthedocs.io) enables simple capture of human-readable provenance of the data processing workflow.
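As a rough illustration of how Neo, NIX and Elephant fit together, the sketch below reads a NIX file through Neo and computes a mean firing rate for each spike train with Elephant. It assumes that neo, nixio and elephant are installed and that the file holds at least one segment containing spike trains. The function name and the default path are illustrative, and this is not necessarily how the workshop notebooks proceed.

```python
def mean_rates_from_nix(path="data/l101210-001_small_cut_60.0s.nix"):
    """Return the mean firing rate of each spike train in the first segment."""
    # Imported lazily so this module still loads without the optional packages.
    import neo
    from elephant.statistics import mean_firing_rate

    # Open read-only so the data file is never modified.
    with neo.NixIO(path, mode="ro") as io:
        block = io.read_block()

    # Assumption: the first segment of the block contains the spike trains.
    segment = block.segments[0]
    return [mean_firing_rate(st) for st in segment.spiketrains]
```

Because Neo objects carry their units and time bounds, the rates come back as quantities with units rather than as bare numbers.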
Background reading:
- Grewe, J., Wachtler, T., Benda, J., 2011. A Bottom-up Approach to Data Annotation in Neurophysiology. Frontiers in Neuroinformatics 5, 16. https://doi.org/10.3389/fninf.2011.00016
- Zehl, L., Jaillet, F., Stoewer, A., Grewe, J., Sobolev, A., Wachtler, T., Brochier, T.G., Riehle, A., Denker, M., Grün, S., 2016. Handling Metadata in a Neurophysiology Laboratory. Frontiers in Neuroinformatics 10, 26. https://doi.org/10.3389/fninf.2016.00026
- Sprenger, J., Zehl, L., Pick, J., Sonntag, M., Grewe, J., Wachtler, T., Grün, S., Denker, M., 2019. odMLtables: A User-Friendly Approach for Managing Metadata of Neurophysiological Experiments. Frontiers in Neuroinformatics 13, 62. https://doi.org/10.3389/fninf.2019.00062
- Brochier, T., Zehl, L., Hao, Y., Duret, M., Sprenger, J., Denker, M., Grün, S., Riehle, A., 2018. Massively parallel recordings in macaque motor cortex during an instructed delayed reach-to-grasp task. Scientific Data 5, 180055. https://doi.org/10.1038/sdata.2018.55
- Denker, M., Grün, S., Wachtler, T., Scherberger, H., 2021. Reproducibility and efficiency in handling complex neurophysiological data. Neuroforum 27, 27–34. https://doi.org/10.1515/nf-2020-0041