[cvsnt] Re: Performance problems
Tony Hoyle
tmh at nodomain.org
Tue Dec 28 04:34:49 GMT 2004
Nitzan Shaked wrote:
> Secondly -- I would increase repository size, but not by much at all! I
> propose a hierarchical, or logarithmic, storage of the full versions. So the
> difference in the number of times the full version would be saved between
> 10,000 revisions and 20,000 is... 1. Not terrible at all.
If you're only saving 1 extra revision there isn't any point in the
complexity of trying to handle something like that.
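To put a number on the "1" above, here is a small sketch. It assumes one possible reading of the proposal, namely that a full version is kept at revisions 1, 2, 4, 8 and so on; the proposal doesn't spell out the scheme, and the function name is made up for illustration.

```python
def full_copies(revisions):
    """Count the full versions kept if one is stored at revisions 1, 2, 4, 8, ...

    Assumed scheme for illustration only; this is not how CVS stores files.
    """
    if revisions < 1:
        return 0
    # For n >= 1, n.bit_length() equals floor(log2(n)) + 1, which is the
    # number of powers of two that are <= n.
    return revisions.bit_length()


if __name__ == "__main__":
    for n in (10_000, 20_000):
        print(n, full_copies(n))
```

Under that assumption a 10,000 revision file keeps 14 full versions and a 20,000 revision file keeps 15, so doubling the history adds exactly one.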
The diff rebuild in cvs is *very* fast. Checking out rev. 1.1 of a
10,000 revision file would probably cause a rebuild of less than a
second. Those are the extremes. In the real world most people work on
revisions much nearer to HEAD than that - often just a couple of
revisions away.
The CVSNT_2_0_x branch is close to a worst case (since it's been
branched for longer than most branches normally would be) and the
slowdown there isn't noticeable.
> there is a user on the other side, and commits are not done on hundreds of
> files usually, the small time per file will not be noticed by the user who
Commits are often done on thousands or tens of thousands of files in
large repositories.
> Tagging at a file level is important, at least for me. But isn't there a way
> to do so without writing to each file? I suppose you could store the tags in
> a linked list in the file, so that adding a tag to the file won't have to
How do you suggest doing this?
> re-write the whole file. Or: could be stored in a different file altogether
You still have to rewrite the file. CVS *never* just modifies a file -
that would be unsafe on disk failure/powercut etc. It builds a
completely new file (mostly by doing a copy of the unchanged elements
and patching the new ones in) then at the last moment does a (hopefully)
atomic rename of the file on top of the old one.
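The general write-then-rename pattern looks roughly like the sketch below. This is not the CVS code: the function name and the temporary-file naming are invented for illustration, and it reads the whole file into memory where CVS copies the unchanged parts across.

```python
import os
import tempfile


def rewrite_atomically(path, transform):
    """Build a new copy of `path` and rename it over the original.

    `transform` takes the old contents (bytes) and returns the new contents.
    Hypothetical helper illustrating the pattern; not CVS's implementation.
    """
    with open(path, "rb") as src:
        old = src.read()
    new = transform(old)

    # The temporary file must be in the same directory (same filesystem)
    # as the target, otherwise the rename cannot be atomic.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=",tmp.")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new)
            tmp.flush()
            os.fsync(tmp.fileno())  # get the data on disk before renaming
        # Readers see either the complete old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure the original file is left untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that even a one-line change, such as adding a tag, costs a full write of the new file under this pattern.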
> As an idea: the client knows which revisions of which files it is currently
> holding. Just send that information to the server (recurse over all
> client-side directories) and call that a tag. Put in a file of its own.
> Scalable, and quite fast.
Not really... you're still having to write the file, which is the slow
part. You're saving little or nothing on the current scheme, unless the
RCS files are *really* big, and in that case other factors are already
slowing you down.
> What's the idea behind hierarchical tags? And slightly related: is there a
> place I can read about ,v structure?
With a hierarchical tag, you don't recurse down directories on rtag, you
just tag the directory with the exact moment of the tag (this requires
high-granularity timers in the files... per-second isn't nearly good
enough). Every file/directory below that is deemed to have a tag that
is the current version at that moment, unless overridden by a lower down
tag (on a subdirectory or on the file).
You have to be careful with branches done this way... it's not as easy
as I make it sound.
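A minimal sketch of the lookup for the simple, non-branch case might look like this. The data layout (a dict of tags per path, a sorted commit history per file) and all names are assumptions made up for illustration, not cvsnt structures.

```python
import posixpath
from bisect import bisect_right


def resolve_tag(path, tag, tags, history):
    """Return the revision of `path` that `tag` refers to, or None.

    tags:    {path: {tag: ("time", timestamp) or ("rev", revision)}}
             with "" standing for the repository root (assumed layout).
    history: {file: [(commit_time, revision), ...]} sorted by commit_time.
    """
    node = path
    while True:
        entry = tags.get(node, {}).get(tag)
        if entry is not None:
            kind, value = entry
            if kind == "rev":
                return value  # explicit tag on the file itself
            # Directory tag: take the newest revision committed at or
            # before the moment the directory was tagged.
            times = [t for t, _ in history[path]]
            i = bisect_right(times, value)
            return history[path][i - 1][1] if i else None
        if node == "":
            return None  # reached the root without finding the tag
        node = posixpath.dirname(node)  # walk up; the nearest tag wins


if __name__ == "__main__":
    tags = {
        "": {"REL_1": ("time", 100.5)},
        "src/lib": {"REL_1": ("time", 50.0)},
        "src/main.c": {"REL_1": ("rev", "1.2")},
    }
    history = {
        "src/main.c": [(10.0, "1.1"), (20.0, "1.2"), (90.0, "1.3")],
        "src/lib/a.c": [(5.0, "1.1"), (60.0, "1.2")],
        "doc/readme": [(1.0, "1.1"), (100.0, "1.2"), (101.0, "1.3")],
    }
    for f in history:
        print(f, resolve_tag(f, "REL_1", tags, history))
```

Tagging a directory is then a single write, and the per-file work moves to lookup time. The timestamps are fractional because two commits within the same second would otherwise be indistinguishable.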
Doing 'man rcsfile' on a Unix system gives you the basic structure.
cvsnt extends this somewhat but is still mostly compatible.
Tony