[nflug] copy (cp) large number of files
joshj at linuxmail.org
Tue Apr 25 14:39:58 EDT 2006
THE RESULTS
After seeing all the different ways to copy huge numbers of files I just
had to try 'em all out and race 'em. I made two directories: 'olddir' and
'newdir'. 'olddir' contained 30,000 empty files (file1 - file30000).
Each time, I copied the files from 'olddir' to 'newdir', checked that
all the files made it (ls newdir/ | wc -l), and then deleted them from
newdir. I was pretty surprised by the speed of some of these.
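For anyone who wants to repeat the race, here is a rough sketch of the setup described above. The loop and the COUNT variable are my own assumptions about how to build the test tree, not the exact commands that were run.

```shell
#!/bin/sh
# Sketch only: recreate the test layout (olddir with N empty files, empty newdir).
# COUNT is a hypothetical knob; it defaults to the 30,000 files used in the test.
set -eu
count="${COUNT:-30000}"

mkdir -p olddir newdir

# Create empty files file1 .. file$count in olddir.
i=1
while [ "$i" -le "$count" ]; do
    : > "olddir/file$i"
    i=$((i + 1))
done

# Same check used after each copy, here applied to the source directory.
ls olddir/ | wc -l
```

Between runs, newdir can be emptied and the same 'ls | wc -l' check pointed at it.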
time cp -R . ../newdir
real 0m17.185s
user 0m2.432s
sys 0m14.605s
time for file in *; do cp $file ../newdir/; done
real 7m3.556s
user 2m4.120s
sys 4m58.983s
# does not maintain directory structure
time find . -type f -exec cp {} ../newdir/ \;
real 4m44.564s
user 1m37.750s
sys 2m59.355s
# this created 'olddir' in 'newdir'
time tar -cf - olddir | (cd newdir; tar -xpBf -)
real 0m25.429s
user 0m8.041s
sys 0m16.905s
# This creates newdir
time cp -R olddir/ newdir
real 0m15.352s
user 0m2.092s
sys 0m13.117s
# This creates olddir in newdir even if newdir doesn't exist.
time rsync -av olddir newdir >/dev/null
real 1m37.080s
user 0m27.834s
sys 1m8.472s
The 'find' method flattened the directory structure, so if you have
multiple files with the same name in different directories you
will lose data. Only the first two methods did not require the creation
of new directories (which I think was a prerequisite of the original
post since it was a mounted directory). I think that the tar method can
do this too but I couldn't figure it out.
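One possible way to get the tar pipe to copy into an existing directory without creating 'olddir' inside it is to archive "." from inside the source directory instead of naming the directory itself. This is a sketch under that assumption and was not part of the timed runs above; the scratch directory and the 'sub' subdirectory are made up for the demo.

```shell
#!/bin/sh
# Sketch only: tar pipe that copies the *contents* of olddir into newdir.
set -eu

# Demo setup (hypothetical): a scratch area with a small nested tree.
work=$(mktemp -d)
mkdir -p "$work/olddir/sub" "$work/newdir"
: > "$work/olddir/file1"
: > "$work/olddir/sub/file1"    # same name, different directory

cd "$work"

# Archive "." from inside olddir rather than olddir itself, so the
# extracting tar never sees a leading 'olddir/' path component.
(cd olddir && tar -cf - .) | (cd newdir && tar -xpBf -)

# Both file1 copies should survive, each in its own directory.
find newdir -type f | sort
```

Unlike the 'find' method, this keeps subdirectories intact, so same-named files in different directories do not overwrite each other.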
-Josh
Thus spake Ken Smith on Tue, 25 Apr 2006
> On Tue, 2006-04-25 at 10:43 -0400, Jason Lasker wrote:
>> I ran into this problem too and used a tar pipe to copy files...
>>
>> tar -cf - directory | (cd parent; tar -xf -)
>>
>> Check your tar options for the proper switches/parameters
>>
>
> I usually use "-xpBf" on the tar doing the extracting (second one). The
> "p" means preserve everything possible (owner/group/perms/timestamps)
> and the "B" locks the blocking factor at something sane for a pipe (tar
> still thinks it's talking to a tape drive by default and if for some
> reason the first read is a bit short it can decide to use a small block
> size which makes it pretty inefficient).
>
> And for those of you with multiple machines and data that needs to move
> amongst them this can work as well:
>
> tar -cf - directory | ssh target-machine "(cd parent; tar -xpBf -)"
>
> :-)
>
> --
> Ken Smith
> - From there to here, from here to | kensmith at cse.buffalo.edu
> there, funny things are everywhere. |
> - Theodore Geisel |
>
>
> _______________________________________________
> nflug mailing list
> nflug at nflug.org
> http://www.nflug.org/mailman/listinfo/nflug
>