Fix check-added-large-files --enforce-all to correctly consider all git-lfs files.#674
Fix check-added-large-files --enforce-all to correctly consider all git-lfs files.#674asottile merged 1 commit intopre-commit:masterfrom amartani:master
Conversation
| try: | ||
| # This also includes all empty files in the repository: | ||
| # https://github.com/git-lfs/git-lfs/issues/4660 | ||
| lfs_ret = cmd_output('git', 'lfs', 'ls-files', '--name-only') |
There was a problem hiding this comment.
this output does not work for files with special names -- typically you need structured output (json) or null delimeters
There was a problem hiding this comment.
I believe git-lfs ls-files doesn't have any option to output json. There's an open issue about it: git-lfs/git-lfs#4679
Do you think it's reasonable to add a TODO here to switch to another command when the related issue is fixed?
There was a problem hiding this comment.
I'd rather not add the feature with a known shortcoming. I wonder if we can bypass git-lfs entirely and query git's attributes directly? (something something git check-attr filter?)
There was a problem hiding this comment.
Sorry for the delay. I've been investigating alternatives here, but I couldn't come up with a good one.
The main issue that I'm concerned with is that it is relatively easy to end up with a file that is marked with filter=lfs in gitattributes but is not actually using git-lfs:
- If you don't have git-lfs installed, then git will just ignore the lfs filter and will commit as a regular file.
- If you have it installed, but you edit gitattributes directly instead of using
git lfs trackand you don'tgit add --renormalize, then the files are also committed as regular files.
As an aside - it would be great to have a pre-commit check that catches these cases, checking that all files that have filter=lfs in gitattributes are actually being tracked by git-lfs. If you know a good way of enforcing that, then it would be easier to switch this to something else based on git check-attr
There was a problem hiding this comment.
hmmm, yeah that's tricky -- given git-lfs can't produce a parseable output I don't think we can use that. I think the attributes would be a good compromise though, despite the ways to get around it.
There was a problem hiding this comment.
Sounds good; re-implemented using check-attr instead. PTAL
Do you think it is worth keeping the old lfs_files function, or should I switch both paths (with and without --enforce-all) to use the new approach?
There was a problem hiding this comment.
yeah let's switch both of them to be consistent
|
|
||
| return set(json.loads(lfs_ret)['files']) | ||
| check_attr_ret = cmd_output( | ||
| 'git', 'check-attr', 'filter', '-z', '--', *filenames, |
There was a problem hiding this comment.
this could exceed commandline lengths -- this should probably use --stdin
There was a problem hiding this comment.
Done. Since pre_commit_hooks.util.cmd_output doesn't allow passing an input, I switched to use subprocess.run().
…it-lfs files. `git lfs status` only outputs status for files that are pending some git-lfs related operation. For usage with --enforce-all, we need the list of all files that are tracked, which can be achived by `git lfs ls-files`. Fixes: #560

git lfs statusonly outputs status for files that are pending some git-lfs related operation.For usage with --enforce-all, we need the list of all files that are tracked. Ideally, this would
be done using
git lfs ls-files; however, that doesn't provide a parseable output option(git-lfs/git-lfs#4679). Therefore, this uses
git check-attrand usesthe attribute
filter=lfsto determine which files to consider as git-lfs files.Fixes: #560