#62 Presumably corrupted git-annex branches

Open
opened 1 year ago by adswa · 0 comments

Hi! First and foremost a huge thank you for Gin! It is an immeasurably useful infrastructure for science.

I've recently noticed what I presume to be a corruption of the git-annex branch after pushing to Gin, and reported it originally at https://github.com/datalad/datalad-gooey/issues/349.

The issue presents as follows: At the moment, pushing a DataLad dataset/git-annex repo severs the git-annex branch, so that my local git-annex branch and the remote one on Gin diverge completely. This happens with datasets I previously pushed successfully (small datasets I often use for demonstrations or ad-hoc testing).

An example is this dataset: https://gin.g-node.org/adswa/mlbooksmoretests (you might see different Gin repos in the errors below, as I tried to pin this down to parametrization or operating system, but the errors were identical across scenarios). It is originally from https://github.com/datalad-datasets/machinelearning-books and contains PDFs that have a web special remote registered (i.e., the files came from a git annex addurl call). If I add a new Gin repository as a remote and push to it using datalad push, the push succeeds for the default branch but fails with a non-fast-forward error for the git-annex branch, similar to the one below:

*	refs/heads/master:refs/heads/master	[new branch]
!	refs/heads/git-annex:refs/heads/git-annex	[rejected] (non-fast-forward)
Done'] [err: 'Delta compression using up to 16 threads
Total 422 (delta 198), reused 149 (delta 33), pack-reused 0
error: failed to push some refs to 'gin.g-node.org:/adswa/ml-books-only-ssh.git'
hint: Updates were rejected because a pushed branch tip is behind its remote
hint: counterpart. Check out this branch and integrate the remote changes
hint: (e.g. 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.']
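
For anyone trying to reproduce this, the rejected refs can be pulled out of the push output mechanically. The following is only a minimal sketch: it assumes the tab-separated line format that `git push --porcelain` emits (the format of the first two lines above), and `rejected_refs` is a hypothetical helper name, not part of git, git-annex, or DataLad.

```shell
# Print "<remote-ref> <reason>" for every ref rejected in `git push --porcelain` output.
# Assumed input format, one ref per line, tab-separated:
#   <flag> TAB <from>:<to> TAB <summary> (<reason>)
# rejected_refs is a hypothetical helper name used for illustration only.
rejected_refs() {
    awk -F '\t' '
        $1 == "!" {
            split($2, refs, ":")          # refs[1] is the local ref, refs[2] the remote ref
            reason = $3
            sub(/^[^(]*\(/, "", reason)   # drop everything up to the opening parenthesis
            sub(/\).*$/, "", reason)      # drop the closing parenthesis and anything after it
            print refs[2], reason
        }'
}
```

Assuming the Gin remote is named `gin`, it could be used as `git push --porcelain gin master git-annex | rejected_refs`; lines that do not start with the `!` flag (accepted refs, the trailing `Done`) are ignored.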

Investigating the remote git-annex branch on Gin shows that it has apparently been re-created from scratch by a committer called "Gogs": https://gin.g-node.org/adswa/mlbooksmoretests/src/git-annex. The local git-annex branch shows commits indicating that the branch was rewritten or otherwise vastly changed:

(gooyey) C:\Users\adina\Desktop\ml-books2>git log git-annex
commit 4e226892a69de8989b56cef5f41c49f138aee09e (git-annex)
Author: Adina Wagner <adina.wagner@t-online.de>
Date:   Fri Oct 14 09:22:57 2022 +0200

    continuing transition ["forget git history"]

commit 38be5a7d07b019e2a7e42c8dff0734926c276f7d
Author: Adina Wagner <adina.wagner@t-online.de>
Date:   Fri Oct 14 09:17:56 2022 +0200

    update

commit 72cd967f9648209aab5c55aebf5b60f1aea41099 (origin/git-annex)
Author: Adina Wagner <adina.wagner@t-online.de>
Date:   Tue Apr 19 13:29:07 2022 +0200

    update

A manual pull fails locally:

❱ git pull gin git-annex
From https://gin.g-node.org/adswa/mlbooksmoretests
 * branch            git-annex  -> FETCH_HEAD
fatal: refusing to merge unrelated histories
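
The "unrelated histories" message means the two branch tips share no common ancestor. A minimal sketch for checking this directly, rather than inferring it from a failed pull, is shown here. It relies only on `git rev-parse`, `git merge-base`, and `git rev-list --left-right --count`; `compare_refs` is a hypothetical helper name, and the ref names in the usage note assume a remote called `gin` that has already been fetched.

```shell
# Report whether two refs share any history and, if so, how far they have diverged.
# Usage: compare_refs <ref-a> <ref-b>   (run inside a git repository)
# compare_refs is a hypothetical helper name used for illustration only.
compare_refs() {
    ref_a=$1
    ref_b=$2
    # Fail early on unknown refs so they are not mistaken for unrelated histories.
    for ref in "$ref_a" "$ref_b"; do
        git rev-parse --verify --quiet "${ref}^{commit}" >/dev/null || {
            echo "unknown ref: $ref" >&2
            return 2
        }
    done
    # git merge-base exits non-zero when the two commits have no common ancestor.
    if git merge-base "$ref_a" "$ref_b" >/dev/null 2>&1; then
        # Output is "<commits only in ref_a> TAB <commits only in ref_b>".
        counts=$(git rev-list --left-right --count "$ref_a...$ref_b")
        set -- $counts
        echo "related: $1 commit(s) only in $ref_a, $2 only in $ref_b"
    else
        echo "unrelated: $ref_a and $ref_b share no common ancestor"
    fi
}
```

After `git fetch gin`, running `compare_refs git-annex gin/git-annex` in the affected clone would be expected to print the "unrelated" line if the remote branch really was re-created from scratch.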

And annexed data that should be readily available from the web special remote can't be retrieved after cloning the repository.

(gooey) adina@muninn in /tmp/mlbooksmoretests on git:master
❱ git-annex whereis A.Shashua-Introduction_to_Machine_Learning.pdf          1 !
whereis A.Shashua-Introduction_to_Machine_Learning.pdf (0 copies) failed
whereis: 1 failed
(gooey) adina@muninn in /tmp/mlbooksmoretests on git:master
❱ git annex get A.Shashua-Introduction_to_Machine_Learning.pdf            130 !
get A.Shashua-Introduction_to_Machine_Learning.pdf (not available) 
  No other repository is known to contain the file.
failed
get: 1 failed
(gooey) adina@mun

I have seen this on Linux and Windows with different versions of git-annex, both when using DataLad and when using only plain git push and git annex sync commands. I also reproduced this with several datasets I previously pushed successfully, whose data is available from web special remotes, from other types of special remotes, or only locally. Can you advise what might be wrong?
