[cvsnt] Forcing binary behavior
Glen Starrett
glen.starrett at march-hare.com
Thu Feb 15 22:19:32 GMT 2007
Jim Hyslop wrote:
> You raise an interesting point, though, which maybe Arthur or one of the
> others from March Hare can comment on: which algorithm is more efficient
> at storing text files, the standard RCS diff or the binary algorithm
> CVSNT uses?
That depends on the nature of the file and the changes...
CVSNT (and CVS) use a line-based diff on text files *and* binary
files by default. That is, it looks for a CRLF and, if the preceding
characters are not exactly the same as in the previous revision, it
records the whole line as a change in the RCS file. Binary files are
stored inefficiently this way because the "lines" can be extremely long
(there is normally no pattern to how far you'll have to look to find
the next CRLF).
CVSNT adds 'binary deltas' (the 'B' keyword), which is a more efficient
algorithm for binaries; with the 'z' keyword it will also compress
those deltas.
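To illustrate the general idea of a byte-level delta whose stored data is
then compressed, here is a rough Python sketch. It is NOT CVSNT's actual
binary delta algorithm or storage format (neither is described in this
post); the function names byte_delta, apply_delta and literal_bytes are
made up for the example, and difflib and zlib merely stand in for
whatever CVSNT really uses.

```python
import difflib
import random
import zlib


def byte_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copy/insert operations against `old` (illustrative only)."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Reuse a run of bytes that already exists in the old revision.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Literal bytes that have to be stored in the delta.
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no entry: those old bytes are simply never copied.
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new revision from the old one plus the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list) -> bytes:
    """All literal data carried by the delta, concatenated."""
    return b"".join(op[1] for op in ops if op[0] == "insert")


# A pseudo-random "binary" blob with 300 bytes inserted in the middle.
rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(1024))
new = old[:512] + b"A" * 300 + old[512:]

ops = byte_delta(old, new)
raw = literal_bytes(ops)
compressed = zlib.compress(raw)
print(len(new), len(raw), len(compressed))
```

Only the inserted bytes need to be stored, no matter where (or whether)
a line ending happens to fall, and compressing them shrinks the delta
further when the new data is repetitive.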
If you have a text file that has very few, really long lines, it will
be stored as inefficiently as a binary. Likewise, if you have a normal
text file with as little as one change on every line, it'll store the
entire thing again.
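The cost of line-based storage in these cases can be sketched in Python.
This is only an approximation: it uses difflib rather than the real RCS
diff, counts only the bytes of changed lines (ignoring RCS metadata), and
the helper name line_delta_bytes is made up for the example.

```python
import difflib


def line_delta_bytes(old: bytes, new: bytes) -> int:
    """Bytes of new-revision lines that a line-based diff has to store."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    stored = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            # Any changed line is stored whole, however small the change.
            stored += sum(len(line) for line in new_lines[j1:j2])
    return stored


# Case 1: many short lines, one line changed -> only that line is stored.
short_old = b"".join(b"line %d\n" % i for i in range(200))
short_new = short_old.replace(b"line 100\n", b"LINE 100\n")
cost_short = line_delta_bytes(short_old, short_new)

# Case 2: two very long lines, one byte changed -> a whole long line is stored.
long_old = b"a" * 5000 + b"\n" + b"b" * 5000 + b"\n"
long_new = b"a" * 2500 + b"X" + b"a" * 2499 + b"\n" + b"b" * 5000 + b"\n"
cost_long = line_delta_bytes(long_old, long_new)

# Case 3: short lines, but every line changed -> the entire file is stored.
every_new = b"".join(b"line %d!\n" % i for i in range(200))
cost_every = line_delta_bytes(short_old, every_new)

print(cost_short, cost_long, cost_every, len(every_new))
```

A one-byte change costs a few bytes in case 1 but about half the file in
case 2, and in case 3 the "delta" is as large as the new file itself.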
Metadata overhead is about the same for both binaries and text.
Don't forget to consider the advantages of text -- the ability to merge,
automatic translation of line endings based on the client OS, etc.
Regards,
--
Glen Starrett
Technical Account Manager, North America
March Hare Software, LLC
http://march-hare.com/cvspro/