Data Workflows Tutorial – CNS 2023, Leipzig



title: "CNS2023 Data Workflows Workshop"
author: "Michael Denker, Moritz Kern, Thomas Wachtler and Reema Gupta"
date: 2023-07-15


This repository contains (or will contain) all files required for the CNS*2023 tutorial "T08: Using open tools to build efficient workflows for data access, management and analysis".

Schedule

The workshop will take place on the 15th of July 2023 in two parts:

Overview

Session I: 9:00 -- 10:10 am

  • 9:00-9:10 Welcome, Introduction
  • 9:10-9:30 Introduction to GIN
  • 9:30-9:40 Introduction to the Dataset
  • 9:40-10:10 Primer on Neo I

----- 10:10-10:40 COFFEE BREAK -----

Session II: 10:40 am -- 12:10 pm

  • 10:40-11:00 Primer on Neo II
  • 11:00-11:30 Data Analysis with Elephant
  • 11:30-12:10 Data Organization and Storage with NIX

Requirements

To benefit from the workshop, you need some experience with the Python programming language. To follow the tutorial, you may use any one of the following options:

1. Working offline

Before attending the workshop, please make sure that the machine you are working on can run Jupyter notebooks and install Python packages.

Either download the contents of the CNS2023-Data-Workflows repository via the web interface, or clone it from the command line with git clone https://gin.g-node.org/CNS2023-Leipzig/CNS2023-Data-Workflows. In the latter case, please download the data file manually, because it is managed by git-annex and is not fetched by a plain git clone: https://gin.g-node.org/CNS2023-Leipzig/CNS2023-Data-Workflows/raw/master/data/l101210-001_small_cut_60.0s.nix
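For those who prefer to script the manual download, the following is a minimal sketch using only the Python standard library. The helper names local_target and fetch_data are hypothetical, and the sketch assumes the repository was cloned into the default directory CNS2023-Data-Workflows; only the URL is taken from the text above. It also assumes that a placeholder left by git-annex, if present, is a symbolic link that can safely be replaced.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL of the annexed data file, as given in this README.
DATA_URL = (
    "https://gin.g-node.org/CNS2023-Leipzig/CNS2023-Data-Workflows/"
    "raw/master/data/l101210-001_small_cut_60.0s.nix"
)


def local_target(url, repo_dir="CNS2023-Data-Workflows"):
    """Return the path inside the cloned repository where the file belongs."""
    filename = Path(urlparse(url).path).name
    return Path(repo_dir) / "data" / filename


def fetch_data(url=DATA_URL, repo_dir="CNS2023-Data-Workflows"):
    """Download the data file into <repo_dir>/data and return its path."""
    target = local_target(url, repo_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink():
        # Assumption: a symlink here is a dangling git-annex placeholder.
        target.unlink()
    urlretrieve(url, target)
    return target


# Uncomment to start the download (the file may be large):
# print(fetch_data())
```

Calling fetch_data() from the directory that contains the clone places the file where the notebooks would be expected to look for it, under data/.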

To make sure your machine is set up for the workshop, please install the Python requirements by running pip install -r requirements.txt, then start the requirements notebook (jupyter notebook requirements.ipynb) and run it before the workshop. We recommend using a dedicated Anaconda environment so that you run the workshop in a clean Python environment.
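As a quick complement to the requirements notebook, the sketch below reports which packages are installed in the active environment. The package list is an assumption based on the tools named in this README; requirements.txt in the repository remains authoritative. The function name installed_versions is hypothetical.

```python
from importlib import metadata

# Assumed package names; requirements.txt in the repository is authoritative.
ASSUMED_PACKAGES = ["neo", "elephant", "nixio", "notebook"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print a short report for the active Python environment.
for name, version in installed_versions(ASSUMED_PACKAGES).items():
    print(f"{name:10s} {version or 'MISSING'}")
```

Any line marked MISSING suggests that pip install -r requirements.txt has not yet been run in the environment that Jupyter uses.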

2. EBRAINS Collaboratory

To follow the tutorials interactively online, we suggest creating a free EBRAINS account (https://www.ebrains.eu/page/sign-up) in advance. The tutorials are also available at tutorials.python-elephant.org under /Events/CNS_2023.

Dataset Used

The Reach-2-Grasp experiment

Full data manuscript and dataset

Tutorial Abstract

Neuroscientists today face challenges in managing the growing volume and complexity of data generated through rapid technological and methodological advancements and sophisticated experimental paradigms. Data management tools and methods provide indispensable solutions for researchers to efficiently handle, organize, and analyze datasets, facilitating model validation, refinement, and simulation, while fostering collaborations. This tutorial presents examples combining multiple tools synergistically into a complete digitized workflow, to help researchers manage and control data and analysis processes.

  • odML (https://g-node.org/odml) is an open, lightweight and flexible format that provides a common schema (with implementations in XML, JSON, YAML) to collect, organize and share metadata in a human- and machine-readable way.
  • NIX (https://g-node.org/nix) is a lean data model and file format for storing fully annotated scientific datasets, i.e. the data together with rich metadata (odML) and their relations in a consistent, comprehensive format.
  • GIN (https://gin.g-node.org) is a platform for version-controlled (git and git-annex) data management and collaboration. It supports any file type and folder structure, provides both web and command-line access, offers an option for local installation, and includes services such as format validation and data publication (DOI).
  • Neo (http://neuralensemble.org/neo) provides programmatic data objects for representing and working with electrophysiological data, and can read data from many proprietary formats. In combination with NIX, Neo makes electrophysiological data interoperable with generic analysis scripts, tools and services.
  • Elephant (https://python-elephant.org) provides a large portfolio of standard and advanced methods for analyzing data from neuronal spike trains or time series data, such as LFPs. The Neo data model makes them easily accessible to scientists and applications.
  • Alpaca (https://alpaca-prov.readthedocs.io) enables simple capture of human-readable provenance of the data processing workflow.
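As a rough illustration of how Neo, NIX and Elephant fit together, the sketch below reads a NIX file through Neo and computes mean firing rates with Elephant. It is not taken from the tutorial notebooks. The function name mean_rates_from_nix is hypothetical, and the sketch assumes that the file holds at least one segment containing spike trains and that neo, nixio and elephant are installed.

```python
def mean_rates_from_nix(nix_path):
    """Return the mean firing rate (in Hz) of each spike train in the first segment."""
    # Imported lazily so the function can be defined without the packages installed.
    from neo.io import NixIO
    from elephant.statistics import mean_firing_rate

    io = NixIO(filename=nix_path, mode="ro")  # open the NIX file read-only
    try:
        block = io.read_block()  # load the first Neo Block from the file
    finally:
        io.close()

    # Assumption: the block holds at least one segment containing spike trains.
    segment = block.segments[0]
    return [mean_firing_rate(st).rescale("Hz") for st in segment.spiketrains]


# Example call, assuming the tutorial data file has been downloaded:
# rates = mean_rates_from_nix("data/l101210-001_small_cut_60.0s.nix")
```

Because Neo returns the same object types regardless of the original recording format, the Elephant call would stay the same if the data came from a different Neo-supported reader.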

Background reading:

  • Grewe, J., Wachtler, T., Benda, J., 2011. A Bottom-up Approach to Data Annotation in Neurophysiology. Frontiers in Neuroinformatics 5, 16. https://doi.org/10.3389/fninf.2011.00016
  • Zehl, L., Jaillet, F., Stoewer, A., Grewe, J., Sobolev, A., Wachtler, T., Brochier, T.G., Riehle, A., Denker, M., Grün, S., 2016. Handling Metadata in a Neurophysiology Laboratory. Frontiers in Neuroinformatics 10, 26. https://doi.org/10.3389/fninf.2016.00026
  • Sprenger, J., Zehl, L., Pick, J., Sonntag, M., Grewe, J., Wachtler, T., Grün, S., Denker, M., 2019. odMLtables: A User-Friendly Approach for Managing Metadata of Neurophysiological Experiments. Frontiers in Neuroinformatics 13, 62. https://doi.org/10.3389/fninf.2019.00062
  • Brochier, T., Zehl, L., Hao, Y., Duret, M., Sprenger, J., Denker, M., Grün, S., Riehle, A., 2018. Massively parallel recordings in macaque motor cortex during an instructed delayed reach-to-grasp task. Scientific Data 5, 180055. https://doi.org/10.1038/sdata.2018.55
  • Denker, M., Grün, S., Wachtler, T., Scherberger, H., 2021. Reproducibility and efficiency in handling complex neurophysiological data. Neuroforum 27, 27–34. https://doi.org/10.1515/nf-2020-0041