#82 more complex spikeglx folder

Merged
sprenger merged 8 commits from NeuralEnsemble/spikeglx_extended into NeuralEnsemble/master 1 year ago
Samuel Garcia commented 2 years ago

New dataset from Graham Findlay. See issue #81.

See also https://github.com/SpikeInterface/spikeinterface/issues/628

This is needed for a patch in neo.

sprenger commented 2 years ago
Owner

Hi @samuelgarcia, thanks for taking care of the upload. Some open questions / comments are:

  • `spikeglx/README.txt`: the sentence `* have a more in the sub folder complete README.md` needs to be completed
  • if spikeglx has internal format versioning, it would be good to use it as the main folder label instead of `sample_data_v2` (or maybe the spikeglx version v.20201103)
  • this dataset contains a lot (264) of tiny files summing up to 33 MB in total. Many of the tiny files seem to be duplicates, e.g. `/SpikeGLX/5-19-2022-CI5/5-19-2022-CI5_g0/` containing 8 `.bin` files. Is this duplication essential for the features you want to test, or would it be possible to
    • remove some of the duplicate files
    • make the files duplicates at the git-annex level (keeping different filenames that all link to the same content)
Samuel Garcia commented 2 years ago
Owner

Hi Julia, I will fix the naming and readme.

These little `.bin` files are not duplicates. They are 10 ms recordings covering several configurations of the acquisition system: single/multiple gates and single/multiple triggers, with or without overlapping chunks.

In neo, this will make the segment index a bit more complicated; I am working on it.

I know that this increases the size of the dataset, but I think it is needed.

@grahamfindlay: any comments?
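To make the gate/trigger cases concrete: SpikeGLX names its output files roughly as `<run>_g<gate>_t<trigger>.<stream>.bin` (the exact filenames inside this dataset are not listed here). A minimal sketch of how a reader could assign one segment per (gate, trigger) pair follows. The helper `segment_index` is hypothetical, it is not neo's actual implementation, and it ignores the overlapping-chunk case.

```python
import re
from pathlib import Path

# Assumed SpikeGLX naming: <run>_g<gate>_t<trigger>.<stream>.bin
_BIN_NAME = re.compile(
    r"^(?P<run>.+)_g(?P<gate>\d+)_t(?P<trigger>\d+)\.(?P<stream>.+)\.bin$"
)


def segment_index(filenames):
    """Hypothetical helper: one segment per (gate, trigger) pair, in sorted order.

    Files from different streams (nidq, imec0.ap, ...) that share a gate and
    trigger get the same segment index. Overlapping chunks are not handled.
    """
    pair_of = {}
    for name in filenames:
        match = _BIN_NAME.match(Path(name).name)
        if match is None:
            raise ValueError(f"not a <run>_g<N>_t<M>.<stream>.bin name: {name}")
        pair_of[name] = (int(match["gate"]), int(match["trigger"]))

    # Sort unique (gate, trigger) pairs numerically so t10 comes after t2.
    ordered_pairs = sorted(set(pair_of.values()))
    seg_of_pair = {pair: i for i, pair in enumerate(ordered_pairs)}
    return {name: seg_of_pair[pair] for name, pair in pair_of.items()}
```

Sorting the integer pairs rather than the raw filenames avoids `t10` sorting before `t2`.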

sprenger commented 2 years ago
Owner

@samuelgarcia Ok, but for the bin files that have exactly the same size you don't really care about the values of the samples in there as these only contain signal samples and no metadata, right? So I could replace the content of all bin files of identical size with the content of a single file.
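The point that `.bin` files carry no header can be illustrated with a minimal sketch, assuming the usual SpikeGLX layout: little-endian int16 samples interleaved across `nSavedChans` channels (a value read from the matching `.meta` file). Under that assumption the number of sample frames depends only on the file size, so two files of equal size and channel count look the same to a reader's shape computation. The helper names are hypothetical.

```python
import os
import sys
from array import array

BYTES_PER_SAMPLE = 2  # assumed int16 samples


def frame_count(bin_path, n_saved_chans):
    """Number of sample frames, derived from the file size alone."""
    size = os.path.getsize(bin_path)
    frame_bytes = BYTES_PER_SAMPLE * n_saved_chans
    if size % frame_bytes:
        raise ValueError(
            f"{size} bytes is not a whole number of {n_saved_chans}-channel frames"
        )
    return size // frame_bytes


def read_channel(bin_path, n_saved_chans, chan):
    """Return one channel from an interleaved, headerless int16 file."""
    samples = array("h")
    with open(bin_path, "rb") as f:
        samples.frombytes(f.read())
    if sys.byteorder == "big":
        samples.byteswap()  # file is assumed little-endian
    return samples[chan::n_saved_chans]
```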

sprenger commented 2 years ago
Owner

Note: I added a commit to lock the files.

Samuel Garcia commented 2 years ago
Owner

You mean with a symbolic link?

sprenger commented 2 years ago
Owner

With a symbolic link when the files are locked, but when unlocked the files will be independent, just containing the identical content.

Samuel Garcia commented 2 years ago
Owner

How can we do that in GIN?

Graham Findlay commented 2 years ago

Yes, if you don't care about the content of the .bin files, it would be fine to replace their values with the content of a single file.

Caveats:

  • .meta files cannot be consolidated in this way.
  • You may care about the contents of the .bin files if you wish to write tests confirming that they were concatenated/loaded properly, especially in the case of overlapping t-segments.
  • .meta files contain information like hashes for the .bin files, which will obviously no longer be accurate.
  • Although I requested that the acquisition system give me files of consistent duration, there may be some variability in the actual number of samples per file. If you truly make all these .bin filenames point to the same underlying data, meta fields like fileTimeSecs and fileSizeBytes may be inaccurate.
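The caveats about `.meta` fields can be checked mechanically. The sketch below assumes a `.meta` file made of plain `key=value` lines and uses the keys `fileSizeBytes`, `fileTimeSecs`, `nSavedChans`, `imSampRate`/`niSampRate` and `fileSHA1` (the hash key name is an assumption about what "hashes" refers to here). After a `.bin` file's content has been replaced, it reports which of those fields no longer match. The function names are hypothetical.

```python
import hashlib
import os


def read_meta(meta_path):
    """Parse a SpikeGLX-style .meta file of key=value lines into a dict of strings."""
    meta = {}
    with open(meta_path, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.strip().partition("=")
            if sep:
                meta[key] = value
    return meta


def sha1_of(path, chunk_size=1 << 20):
    """SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def stale_meta_fields(bin_path, meta_path):
    """List the size/duration/hash fields of the .meta that no longer match the .bin."""
    meta = read_meta(meta_path)
    size = os.path.getsize(bin_path)
    stale = []
    if "fileSizeBytes" in meta and int(meta["fileSizeBytes"]) != size:
        stale.append("fileSizeBytes")
    rate = meta.get("imSampRate") or meta.get("niSampRate")
    if rate and "fileTimeSecs" in meta and "nSavedChans" in meta:
        rate = float(rate)
        # Assumes int16 samples: 2 bytes per channel per frame.
        expected_secs = size / (2 * int(meta["nSavedChans"]) * rate)
        # Allow half a sample of rounding in the stored duration.
        if abs(expected_secs - float(meta["fileTimeSecs"])) > 0.5 / rate:
            stale.append("fileTimeSecs")
    if "fileSHA1" in meta and sha1_of(bin_path) != meta["fileSHA1"].lower():
        stale.append("fileSHA1")
    return stale
```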
sprenger commented 2 years ago
Owner

@samuelgarcia: if two files have identical content, git-annex automatically stores that content only once. So you could (e.g. using gin-cli):

  • unlock all bin files
  • replace the content of all files of identical size with a single version
  • commit the files again
  • lock the files again
  • upload the locked version of the files
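A minimal sketch of the "replace same-size files with a single version" step. It assumes the `.bin` files are already unlocked (regular files, not annex symlinks) and simply copies the first file of each size over the others. `annex_style_key` only approximates git-annex's default `SHA256E` key format (`SHA256E-s<size>--<sha256><ext>`) to show why identical content ends up stored once. Both helper names are hypothetical; unlocking, committing, locking and uploading would still be done with gin-cli as listed above.

```python
import hashlib
import shutil
from collections import defaultdict
from pathlib import Path


def annex_style_key(path):
    """Approximation of a git-annex SHA256E key: SHA256E-s<size>--<sha256><ext>."""
    data = Path(path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    return f"SHA256E-s{len(data)}--{digest}{Path(path).suffix}"


def unify_same_size_bins(root):
    """Overwrite each .bin under root with the first .bin (sorted by path) of the same size.

    Must run on unlocked files: a locked annex file is a symlink into
    .git/annex/objects, and copying onto it would try to modify the annexed object.
    Returns the list of files whose content was replaced.
    """
    by_size = defaultdict(list)
    for path in sorted(Path(root).rglob("*.bin")):
        by_size[path.stat().st_size].append(path)

    replaced = []
    for paths in by_size.values():
        reference = paths[0]
        for other in paths[1:]:
            if other.is_symlink():
                raise RuntimeError(f"{other} is still locked; unlock it first")
            shutil.copyfile(reference, other)
            replaced.append(other)
    return replaced
```

Because every file in a size group then holds the same bytes, they map to the same key and git-annex keeps one copy. The matching `.meta` files are untouched, so their hash field no longer describes the replaced `.bin` content.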
Samuel Garcia commented 1 year ago
Owner

@sprenger: can we merge this? I already merged your PR into that branch.

sprenger closed this 1 year ago
sprenger commented 1 year ago
Owner

@samuelgarcia: It's merged. Can you confirm again that the merged version works for your tests?

Samuel Garcia commented 1 year ago
Owner
Tests seem to pass!! https://github.com/NeuralEnsemble/python-neo/pull/1125
Merge completed successfully!
No labels
No milestone
No assignees
3 participants