Revision - 4f22b10 - do not stream large files to pack when filters are in use

Revision 4f22b1015d4203ccdf2b66f27ee5946504342ace authored by Jeff King on 24 February 2012, 22:10:17 UTC, committed by Junio C Hamano on 24 February 2012, 22:18:20 UTC

do not stream large files to pack when filters are in use

Because git's object format requires us to specify the
number of bytes in the object in its header, we must know
the size before streaming a blob into the object database.
This is not a problem when adding a regular file, as we can
get the size from stat(). However, when filters are in use
(such as autocrlf, or the ident, filter, or eol
gitattributes), we have no idea what the ultimate size will
be.
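
To illustrate why the size has to be known up front, here is a minimal sketch of building an object header of the form "<type> <size>" followed by a NUL byte. The function name format_object_header is hypothetical and is not taken from git's source; it only shows that the header cannot be written until the final byte count is available.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): write "<type> <size>" plus a
 * terminating NUL into buf. Returns the header length including the
 * NUL, or -1 if buf is too small.
 */
static int format_object_header(char *buf, size_t buflen,
				const char *type, unsigned long size)
{
	int n = snprintf(buf, buflen, "%s %lu", type, size);

	if (n < 0 || (size_t)n >= buflen)
		return -1;
	return n + 1; /* count the NUL that terminates the header */
}
```

For a regular file the size argument could come from st_size after a stat() call. Once a filter rewrites the content, that number no longer describes the bytes that will be stored.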

The current code just punts on the whole issue and ignores
filter configuration entirely for files larger than
core.bigfilethreshold. This can generate confusing results
if you use filters for large binary files, as the filter
will suddenly stop working as the file goes over a certain
size.  Rather than try to handle unknown input sizes with
streaming, this patch just turns off the streaming
optimization when filters are in use.
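
The decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: struct add_request, its fields, and should_stream are hypothetical names, and big_file_threshold merely stands in for the core.bigfilethreshold setting.

```c
#include <stddef.h>

/* Hypothetical inputs; the real code consults git's config and attributes. */
struct add_request {
	size_t size;               /* size reported by stat() */
	size_t big_file_threshold; /* stand-in for core.bigfilethreshold */
	int may_need_conversion;   /* non-zero if any filter could apply */
};

/*
 * Return 1 if the blob may be streamed straight into a pack, 0 if it
 * must go through the regular in-memory path.
 */
static int should_stream(const struct add_request *req)
{
	/* With filters in play the final size is unknown, so never stream. */
	if (req->may_need_conversion)
		return 0;
	/* Otherwise stream only files larger than the threshold. */
	return req->size > req->big_file_threshold;
}
```

Checking the filter condition first means a large file with filters configured takes the slower path but still gets converted, which is the correctness trade-off the message describes.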

This has a slight performance regression in a very specific
case: if you have autocrlf on, but no gitattributes, a large
binary file will avoid the streaming code path because we
don't know beforehand whether it will need conversion or
not. But if you are handling large binary files, you should
be marking them as such via attributes (or at least not
using autocrlf, and instead marking your text files as
such). And the flip side is that if you have a large
_non_-binary file, there is a correctness improvement;
before we did not apply the conversion at all.

The first half of the new t1051 script covers these failures
on input. The second half tests the matching output code
paths. These already work correctly, and do not need any
adjustment.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

1 parent 4c3b57b

branch.h

#ifndef BRANCH_H
#define BRANCH_H

/* Functions for acting on the information about branches. */

/*
 * Creates a new branch, where head is the branch currently checked
 * out, name is the new branch name, start_name is the name of the
 * existing branch that the new branch should start from, force
 * enables overwriting an existing (non-head) branch, reflog creates a
 * reflog for the branch, and track causes the new branch to be
 * configured to merge the remote branch that start_name is a tracking
 * branch for (if any).
 */
void create_branch(const char *head, const char *name, const char *start_name,
		   int force, int reflog,
		   int clobber_head, enum branch_track track);

/*
 * Validates that the requested branch may be created, returning the
 * interpreted ref in ref. force indicates whether (non-head) branches
 * may be overwritten. A non-zero return value indicates that the force
 * parameter was non-zero and the branch already exists.
 *
 * Contrary to all of the above, when attr_only is 1, the caller is
 * not interested in verifying if it is OK to update the named
 * branch to point at a potentially different commit. It is merely
 * asking if it is OK to change some attribute for the named branch
 * (e.g. tracking upstream).
 *
 * NEEDSWORK: This needs to be split into two separate functions in the
 * longer run for sanity.
 *
 */
int validate_new_branchname(const char *name, struct strbuf *ref, int force, int attr_only);

/*
 * Remove information about the state of working on the current
 * branch. (E.g., MERGE_HEAD)
 */
void remove_branch_state(void);

/*
 * Configure local branch "local" as downstream to branch "remote"
 * from remote "origin".  Used by git branch --set-upstream.
 */
#define BRANCH_CONFIG_VERBOSE 01
extern void install_branch_config(int flag, const char *local, const char *origin, const char *remote);

#endif
