Dataset Name

Suggested citation: XXX.

Data description

Data documentation

The dataset is structured according to the ChildProject package standards detailed here.

Participants & recordings

Number of participants, general info on them.

Number of recordings, specificities.

Available annotations

What sort of annotations are present.

We strongly recommend the use of the converted/ versions of the annotations, to avoid issues of time stamping and category assignment.

Automated annotations

Keep the relevant entries and add new ones if necessary.

  • alice: automated counts of phonemes, syllables, and words done by ALICE; all recordings have been analyzed with this system
  • its: automated analyses using the LENA software; only a small proportion of recordings (those gathered with a LENA device) have this annotation
  • vcm: automated analyses aimed at categorizing the key child's vocalizations into: canonical, non-canonical, crying, and other (which includes both "junk", i.e. not the child at all, and laughing, which occurred infrequently), done using VCM; all recordings have been analyzed with this system
  • vtc: automated analyses that distinguish key child, other children, male adult, female adult, using VTC; all recordings have been analyzed with this system

Manual annotations

Explain what each set of manual annotations is and how it was produced. Example from tsimane2017:

  • eaf_2021: A small number of recordings were selected for coding by two speech-and-language students who were unfamiliar with the language and with the families recorded. They annotated randomly sampled 15-second sections, skipping any that failed to contain speech by male adults or other children. They followed this coding manual, derived from the ACLEW Annotation Scheme, and did segmentation only.

Getting access to the data

To gain access to the data, please email XXXX? or Homebank or ... ?

Re-using the dataset

Requirements

You will first need to install the ChildProject package as well as DataLad. Instructions to install these packages can be found here.

Configuring your SSH key on GIN

This step only needs to be done once.

  1. Copy your SSH public key to your clipboard (usually located in ~/.ssh/id_rsa.pub). If you don't have one, please create one following these instructions.
  2. In your browser, go to GIN > Your Settings > SSH keys.
  3. Click on the blue "Add a key" button, then paste the content of your public key in the Content field, and submit.

Your key should now appear in your list of SSH keys - you can add as many as necessary.
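
As a rough sketch of step 1, the snippet below looks for an existing public key and prints it so that it can be copied. The function name find_public_key is hypothetical, and the list of file names assumes the default OpenSSH naming; adjust it if your key lives elsewhere.

```shell
# Print the path of the first public key found in an SSH directory.
# find_public_key is a hypothetical helper; the file names below are the
# OpenSSH defaults and are an assumption about your setup.
find_public_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    for name in id_ed25519.pub id_ecdsa.pub id_rsa.pub; do
        if [ -f "$ssh_dir/$name" ]; then
            printf '%s\n' "$ssh_dir/$name"
            return 0
        fi
    done
    return 1
}

if key=$(find_public_key); then
    cat "$key"    # copy this output into the Content field on GIN
else
    echo "No public key found; create one first, e.g. with ssh-keygen" >&2
fi
```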

Installing the dataset

The next step is to clone the dataset:

datalad install git@gin.g-node.org:/LAAC-LSCP/XXX.git
cd XXX
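
Once the clone has finished, a quick sanity check of the layout can catch an incomplete installation early. The sketch below is not part of ChildProject or DataLad: check_layout is a hypothetical helper, and the list of expected entries is an assumption based on the ChildProject standards referenced above.

```shell
# Report which expected entries are missing from a dataset directory.
# check_layout is a hypothetical helper; the entries listed are an
# assumption about what a ChildProject dataset contains.
check_layout() {
    root="${1:-.}"
    missing=0
    for entry in metadata/children.csv metadata/recordings.csv recordings annotations; do
        # -L also accepts annexed placeholders whose content is not fetched yet
        if [ ! -e "$root/$entry" ] && [ ! -L "$root/$entry" ]; then
            echo "missing: $entry"
            missing=1
        fi
    done
    return "$missing"
}

check_layout . || echo "layout incomplete (see entries above)"
```

The function returns a non-zero status when anything is missing, so it can also be used as a guard in a longer script.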

Getting data

You can get data from a dataset using the datalad get command, e.g.:

datalad get recordings/converted/ # download converted recordings
datalad get annotations/*/converted/ # get converted annotations

Or:

datalad get . # get everything

You can download many files in parallel using the -J (or --jobs) parameter:

datalad get . -J 4 # get everything, with 4 parallel transfers
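
To fetch only some annotation sets rather than everything, the paths can be built from the set names. The sketch below follows the annotations/*/converted/ pattern used above; converted_paths is a hypothetical helper, and vtc and alice are simply taken from the list of automated annotations as examples.

```shell
# Print the converted/ path of each annotation set given as an argument.
# converted_paths is a hypothetical helper, not part of DataLad or ChildProject.
converted_paths() {
    for set_name in "$@"; do
        printf 'annotations/%s/converted/\n' "$set_name"
    done
}

# List the paths for the VTC and ALICE annotation sets
converted_paths vtc alice
```

Inside the dataset, the output can then be handed to datalad get, for example: datalad get -J 4 $(converted_paths vtc alice)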

For more help with using DataLad, please refer to our cheatsheet or DataLad's own cheatsheet. If this is not enough, check DataLad's documentation and Handbook.

Fetching updates

If you are notified of changes to the data, please retrieve them by issuing the following commands:

datalad update --merge
datalad get .

Removing the data

It is important that you delete the data once your project is complete. This can be done with datalad remove:

datalad remove -r path/to/your/dataset

Maintainers

Maintainers should install the dataset from LAAC-LSCP and run the setup procedure as follows:

datalad install git@gin.g-node.org:/LAAC-LSCP/tsimane2017.git
cd tsimane2017
datalad run-procedure setup --public --confidential

Changes should be pushed to origin, which will in turn trigger a push to the other siblings:

datalad push

History

History of the dataset. Add an entry explaining what was done at each point in time. Example (from tsimane2017):

  • 2022-04 Alex Cristia processed files from Camila Scaff, resulting in data from XX children. These data came from copies of the Lacie back-up drives, hand-written notes, and re-listening to the audio files to determine date and child ID. Some of the processing was done by hand; the rest was done via a script called gen-recordings.R, which used to live in a Dropbox folder shared between Alex and Camila.