Data Workflows Tutorial – CNS 2023, Leipzig


title: "CNS2023 Data Workflows Workshop"
author: "Michael Denker, Moritz Kern, Thomas Wachtler and Reema Gupta"
date: 2023-07-15

This repository contains (or will soon contain) all files required for the CNS*2023 tutorial "T08: Using open tools to build efficient workflows for data access, management and analysis".

Schedule

The workshop will take place on the 15th of July 2023 in two parts:

Overview

Session I: 9:00 -- 10:10 am

  • 9:00-9:10 Welcome, Introduction
  • 9:10-9:30 Introduction to GIN
  • 9:30-9:40 Introduction to the Dataset
  • 9:40-10:10 Primer on Neo I

----- 10:10-10:40 COFFEE BREAK -----

Session II: 10:40 am -- 12:10 pm

  • 10:40-11:00 Primer on Neo II
  • 11:00-11:30 Data Analysis with Elephant
  • 11:30-12:10 Data Organization and Storage with NIX

Requirements

To benefit from the workshop you need to have some experience with the Python programming language. To follow the tutorial, you may use any one of the following three options:

1. Working offline

Before attending the workshop, please make sure that the machine you are working on can run Jupyter notebooks and install Python packages.

Either download the contents of the CNS2023-Data-Workflows repository via the web or use the command line to clone the repository using git clone https://gin.g-node.org/CNS2023-Leipzig/CNS2023-Data-Workflows.git.

To make sure your machine is set up for the workshop, please install the Python requirements by running pip install -r requirements.txt, then start the requirements notebook (jupyter notebook requirements.ipynb) and run it before the workshop. We recommend using a dedicated virtual environment, for example one created with Anaconda, so that you run the workshop in a clean Python environment.
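If you would like a quick check from Python in addition to the requirements notebook, the following is a minimal sketch that reports which packages from a requirements-style list are not installed. The helper names requirement_names and missing_packages and the example entries are hypothetical and used for illustration only; the actual package list is the one in requirements.txt.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare package names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        # The name is everything before the first version specifier or extra.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Hypothetical entries; the real list lives in requirements.txt.
example_lines = ["neo>=0.12  # data objects", "elephant", "", "nixio==1.0", "-r other.txt"]
print(requirement_names(example_lines))
```

To check the real file, one could pass its lines instead of example_lines, for example missing_packages(requirement_names(open("requirements.txt"))). An empty result suggests that all listed packages are present, although it does not verify their versions.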

2. EBRAINS Collaboratory

To interactively follow the tutorials online, we suggest creating a free EBRAINS account (https://www.ebrains.eu/page/sign-up) in advance.

3. Open Source Brain

TODO

Dataset Used

The Reach-2-Grasp experiment

Full data manuscript and dataset

Tutorial Abstract

Neuroscientists today face challenges in managing the growing volume and complexity of data generated through rapid technological and methodological advancements and sophisticated experimental paradigms. Data management tools and methods provide indispensable solutions for researchers to efficiently handle, organize, and analyze datasets, facilitating model validation, refinement, and simulation, while fostering collaborations. This tutorial presents examples combining multiple tools synergistically into a complete digitized workflow, to help researchers manage and control data and analysis processes.

  • odML (https://g-node.org/odml) is an open, lightweight and flexible format that provides a common schema (with implementations in XML, JSON, YAML) to collect, organize and share metadata in a human- and machine-readable way.
  • NIX (https://g-node.org/nix) is a lean data model and file format for storing fully annotated scientific datasets, i.e. the data together with rich metadata (odML) and their relations in a consistent, comprehensive format.
  • GIN (https://gin.g-node.org) is a platform for version-controlled (git and git-annex) data management and collaboration. It supports any file type and folder structure, provides both web and command-line access, offers an option for local installation, and includes services such as format validation and data publication (DOI).
  • Neo (http://neuralensemble.org/neo) provides programmatic data objects for working with and representing electrophysiological data, and can read data from many proprietary formats. In combination with NIX, Neo makes electrophysiological data interoperable with generic analysis scripts, tools and services.
  • Elephant (https://python-elephant.org) provides a large portfolio of standard and advanced methods for analyzing data from neuronal spike trains or time series data, such as LFPs. The Neo data model makes them easily accessible to scientists and applications.
  • Alpaca (https://alpaca-prov.readthedocs.io) enables simple capture of human-readable provenance of the data processing workflow.

Background reading:

  • Grewe, J., Wachtler, T., Benda, J., 2011. A Bottom-up Approach to Data Annotation in Neurophysiology. Frontiers in Neuroinformatics 5, 16. https://doi.org/10.3389/fninf.2011.00016
  • Zehl, L., Jaillet, F., Stoewer, A., Grewe, J., Sobolev, A., Wachtler, T., Brochier, T.G., Riehle, A., Denker, M., Grün, S., 2016. Handling Metadata in a Neurophysiology Laboratory. Frontiers in Neuroinformatics 10, 26. https://doi.org/10.3389/fninf.2016.00026
  • Sprenger, J., Zehl, L., Pick, J., Sonntag, M., Grewe, J., Wachtler, T., Grün, S., Denker, M., 2019. odMLtables: A User-Friendly Approach for Managing Metadata of Neurophysiological Experiments. Frontiers in Neuroinformatics 13, 62. https://doi.org/10.3389/fninf.2019.00062
  • Brochier, T., Zehl, L., Hao, Y., Duret, M., Sprenger, J., Denker, M., Grün, S., Riehle, A., 2018. Massively parallel recordings in macaque motor cortex during an instructed delayed reach-to-grasp task. Scientific Data 5, 180055. https://doi.org/10.1038/sdata.2018.55
  • Denker, M., Grün, S., Wachtler, T., Scherberger, H., 2021. Reproducibility and efficiency in handling complex neurophysiological data. Neuroforum 27, 27–34. https://doi.org/10.1515/nf-2020-0041