Nature Story Listening 3T fMRI Data

Summary

This dataset contains BOLD fMRI responses from human subjects listening to a set of natural autobiographic stories. The functional data were collected from eleven subjects, in two sessions on two separate days for each subject. Details of the experiment are described in the original publications [1], [2], [3], [4]. The source data used to generate all the figures in publication [4] are included. The code used to analyze the data in publication [4] is here.

[1] Huth, A. G., De Heer, W. A., Griffiths, T. L., Theunissen, F. E., & Gallant, J. L. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532, 453–458 (2016). https://doi.org/10.1038/nature17637

[2] de Heer, W. A., Huth, A. G., Griffiths, T. L., Gallant, J. L., & Theunissen, F. E. The hierarchical cortical organization of human speech processing. Journal of Neuroscience 37(27), 6539–6557 (2017). https://doi.org/10.1523/JNEUROSCI.3267-16.2017

[3] Deniz, F., Nunez-Elizalde, A. O., Huth, A. G., & Gallant, J. L. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. Journal of Neuroscience 39(39), 7722–7736 (2019). https://doi.org/10.1523/JNEUROSCI.0675-19.2019

[4] Gong, X., Huth, A. G., Deniz, F., Johnson, K., Gallant, J. L., & Theunissen, F. E. Phonemic segmentation of narrative speech in human cerebral cortex. Nature Communications (2023). https://doi.org/

If you publish any work using the dataset, please cite the original publications [2] and [4], and cite the dataset [1b] in the following recommended format:

[1b] Huth, A. G., De Heer, W. A., Deniz, F., Gong, X., Gallant, J. L., & Theunissen, F. E. Nature Story Listening 3T fMRI Data.

How to get started

With git and git-annex

To download the data with git-annex, run the commands:

# clone the repository, without the data files
git clone https://gin.g-node.org/gallantlab/story_listening
cd story_listening
# download one file (e.g. features/features_matrix.hdf)
git annex get features/features_matrix.hdf --from wasabi
# download all files
git annex get . --from wasabi

Two remotes are available for downloading the data, so you can pick the faster one. The first remote is GIN (--from origin), whose bandwidth might be limited. The second remote is Wasabi (--from wasabi), which has a larger bandwidth.

Dataset content

Data file organization

features/                    → feature spaces used for voxelwise modeling
    english1000.hdf          → semantic embeddings, as described in [1], [2], [3], [4]
    feature_basis.hdf        → all feature labels, as described in [1]
    feature_matrix.hdf       → all features, as described in [1]
mappers/                     → plotting mappers for each subject
    S01_mappers.hdf
    ...
    S11_mappers.hdf
responses/                   → functional responses for each subject
    S01_BOLD.hdf
    ...
    S11_BOLD.hdf
    simulation_BOLD.hdf      → simulated functional responses for simulation analysis
stimuli/                     → natural autobiographic story, for each fMRI run
    test.wav
    train_00.wav
    ...
    train_11.wav

Data format

The files in `features`, `mappers`, and `responses` are HDF5 files, with multiple arrays stored inside; the files in `stimuli` are wav audio files. The names, shapes, and descriptions of each array are listed below.


Each file in `features` contains:
    X_train: array of shape (3737, n_features)
        Training features.
    X_test: array of shape (291, n_features)
        Testing features.

    where (n_features = 448) for `spectral power`,
    (n_features = 1) for each of `number of phonemes` and `number of words`,
    (n_features = 39) for `single phoneme`,
    (n_features = 858) for `diphone`,
    (n_features = 4841) for `triphone`,
    and (n_features = 985) for `semantics`.
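
The arrays listed above can be read with any HDF5 library. The following is a minimal sketch using `h5py` and NumPy, not code from the publications. The helper name `load_hdf5_arrays` and the file `demo_features.hdf` are hypothetical: the demo writes a small synthetic file that only mimics the documented `X_train` / `X_test` layout, so it runs without downloading the dataset.

```python
import os
import tempfile

import h5py
import numpy as np


def load_hdf5_arrays(path, keys=None):
    """Read top-level datasets of an HDF5 file into a dict of NumPy arrays.

    Hypothetical helper for illustration; it is not part of this dataset.
    """
    with h5py.File(path, "r") as f:
        if keys is None:
            # Keep only datasets; groups (if any) are skipped.
            keys = [k for k in f.keys() if isinstance(f[k], h5py.Dataset)]
        return {key: f[key][()] for key in keys}


# Synthetic stand-in for a features file (zeros, documented shapes only).
demo_path = os.path.join(tempfile.mkdtemp(), "demo_features.hdf")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("X_train", data=np.zeros((3737, 39), dtype="float32"))
    f.create_dataset("X_test", data=np.zeros((291, 39), dtype="float32"))

arrays = load_hdf5_arrays(demo_path, keys=["X_train", "X_test"])
for name, array in arrays.items():
    print(name, array.shape)
```

To use it on the real data, pass the path of a downloaded file instead of `demo_path`; calling it with `keys=None` lists whatever arrays the file actually contains.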

Each file in `mappers` contains:
    voxel_to_flatmap: CSR sparse array of shape (n_pixels, n_voxels)
        Mapper from voxels to flatmap image. The sparse array is stored with
        four dense arrays: (data, indices, indptr, shape).
    voxel_to_fsaverage: CSR sparse array of shape (n_vertices, n_voxels)
        Mapper from voxels to FreeSurfer surface. The sparse array is stored
        with four dense arrays: (data, indices, indptr, shape).
    flatmap_mask: array of shape (width, height)
        Pixels of the flatmap image associated with a voxel.
    flatmap_rois: array of shape (width, height, 4)
        Transparent image with annotated ROIs (for subjects S01, S02, and S03).
    flatmap_curvature: array of shape (width, height)
        Transparent image with binarized curvature to locate sulci/gyri.
    roi_mask_xxx: array of shape (n_voxels, )
        Mask indicating which voxels are in the ROI `xxx`.
        The ROI list differs across subjects. S04 and S05 have no ROIs.
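
Since each mapper is stored as four dense arrays, it has to be reassembled before use. The sketch below shows one way to do this with `scipy.sparse.csr_matrix` and then project per-voxel values onto flatmap pixels. All numbers are made up, the helper name `build_csr` is hypothetical, and the names of the four arrays inside the HDF5 files are not given in this README, so the function simply takes them as arguments. The final step assumes that `n_pixels` equals the number of True entries in `flatmap_mask`; the description above suggests this but does not state it.

```python
import numpy as np
import scipy.sparse as sp


def build_csr(data, indices, indptr, shape):
    """Rebuild a CSR sparse matrix from its four dense component arrays."""
    return sp.csr_matrix(
        (data, indices, indptr), shape=tuple(int(s) for s in shape)
    )


# Tiny synthetic mapper: 4 flatmap pixels, 3 voxels (values are made up).
data = np.array([1.0, 0.5, 0.5, 1.0, 1.0])   # nonzero weights, row by row
indices = np.array([0, 0, 1, 2, 1])          # voxel (column) of each weight
indptr = np.array([0, 1, 3, 4, 5])           # row r uses data[indptr[r]:indptr[r+1]]
shape = np.array([4, 3])                     # (n_pixels, n_voxels)

voxel_to_flatmap = build_csr(data, indices, indptr, shape)

voxel_values = np.array([2.0, 4.0, 6.0])          # one value per voxel
pixel_values = voxel_to_flatmap @ voxel_values    # one value per pixel

# ASSUMPTION: pixels are ordered like the True entries of flatmap_mask.
flatmap_mask = np.array([[True, True, False], [True, False, True]])
image = np.full(flatmap_mask.shape, np.nan)
image[flatmap_mask] = pixel_values
print(image)
```

Pixels outside the mask stay NaN, which most plotting libraries render as transparent.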

Each file in `responses` contains:
    Y_train: array of shape (3737, n_voxels)
        Training responses.
    Y_test: array of shape (291, n_voxels)
        Testing responses.
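
The training and testing arrays in `features` and `responses` share their first dimension (3737 and 291 samples), so they can be paired row by row to fit and evaluate a voxelwise model. The following sketch illustrates the idea with a plain closed-form ridge regression on small synthetic arrays. It is an assumption for illustration, not the analysis pipeline of the publications, and it omits steps a real analysis would need, such as choosing `alpha` by cross-validation. The function names are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha * I)^-1 X'Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def columnwise_correlation(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = (A - A.mean(axis=0)) / A.std(axis=0)
    B = (B - B.mean(axis=0)) / B.std(axis=0)
    return (A * B).mean(axis=0)


# Synthetic stand-ins for X_train, X_test, Y_train, Y_test (much smaller).
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 5, 3
true_weights = rng.standard_normal((n_features, n_voxels))
X_train = rng.standard_normal((n_train, n_features))
X_test = rng.standard_normal((n_test, n_features))
Y_train = X_train @ true_weights + 0.1 * rng.standard_normal((n_train, n_voxels))
Y_test = X_test @ true_weights + 0.1 * rng.standard_normal((n_test, n_voxels))

weights = fit_ridge(X_train, Y_train, alpha=1.0)             # (n_features, n_voxels)
scores = columnwise_correlation(X_test @ weights, Y_test)    # one score per voxel
print(scores.shape)
```

With the real data, `X_train` / `X_test` would come from a features file and `Y_train` / `Y_test` from one subject's responses file.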

Each file in `stimuli` contains the raw audio waveform (wav) of one story.

Each file in `source_data` contains the data used to generate the figures in publication [4].
    `source_data_manuscript` contains the data for figures 4, 5, 6, and 7(c, d) of the main paper.
        The data for figure 5 are also used to generate supplementary table 3.
        The data for figure 7c are also used to generate supplementary figure 14.
    `source_data_performance` contains the data for all the flatmaps in both the main paper (figures 2, 3, 7(a, b)) and the supplements (supplementary figures 5, 6, 8, 11, 13).
    `source_data_supplements` contains the data for the supplementary figures not covered by `source_data_performance`.

datacite.yml
Title	Nature Story Listening 3T fMRI Data
Authors	Huth, Alexander G.; University of California, Berkeley; ORCID:0000-0002-7590-3525
	de Heer, Wendy A.; University of California, Berkeley
	Deniz, Fatma; University of California, Berkeley
	Gong, Xue L.; University of California, Berkeley; ORCID:0000-0001-7656-525X
Description	This dataset contains BOLD fMRI responses in human subjects listening to a set of natural autobiographic stories. The functional data were collected for eleven subjects, in two sessions over two separate days for each subject. Details of the experiment are described in the original publications.
License	Creative Commons CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
References	Huth, A. G., Lee, T., Nishimoto, S., Bilenko, N. Y., Vu, A. T., & Gallant, J. L. (2016). Decoding the semantic content of natural movies from human brain activity. Frontiers in systems neuroscience, 10, 81. [doi:10.3389/fnsys.2016.00081] (IsReferencedBy)
	de Heer, W. A., Huth, A. G., Griffiths, T. L., Gallant, J. L., & Theunissen, F. E. The hierarchical cortical organization of human speech processing. Journal of Neuroscience, 37(27), 6539-6557 (2017). [doi:10.1523/JNEUROSCI.3267-16.2017] (IsReferencedBy)
	Deniz, F., Nunez-Elizalde, A. O., Huth, A. G., & Gallant, J. L. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. Journal of Neuroscience, 39(39), 7722-7736 (2019). [doi:10.1523/JNEUROSCI.0675-19.2019] (IsReferencedBy)
	Gong, X., Huth, A. G., Johnson, K., Gallant, J. L., & Theunissen, F. E. Phonemic segmentation of narrative speech in human cerebral cortex. Nature Communications, (2023) [doi:tba] (IsSupplementTo)
	Gong, X., Theunissen, F. E. (2023) Phoneme Segmentation (Version 0.0.1) [Computer software]. [doi:10.5281/zenodo.7938599] (IsReferencedBy)
Funding	Weil Foundation grant
	Dingwall Foundation grant in Neurolinguistics
	NSF grant, 1912373
Keywords	Neuroscience; fMRI; Naturalistic stimuli; Voxelwise encoding models
Resource Type	Dataset
