[nflug] copy (cp) large number of files

joshj at linuxmail.org joshj at linuxmail.org
Tue Apr 25 14:39:58 EDT 2006


THE RESULTS

After seeing all the different ways to copy huge numbers of files, I just
had to try 'em all out and race 'em. I made two directories, 'olddir' and
'newdir'. 'olddir' contained 30,000 empty files (file1 - file30000).
Each time, I copied the files from 'olddir' to 'newdir', checked that
all the files made it (ls newdir/ | wc -l), and then deleted them from
newdir. I was pretty surprised by the speed of some of these.
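
For anyone who wants to rerun the race, the setup described above can be
sketched roughly as follows. The scratch directory from mktemp and the
variable names are assumptions for illustration, not the exact commands
used for the timings.

```shell
#!/bin/sh
# Sketch of the test setup; 'work' and 'copied' are made-up names.
set -eu
work=$(mktemp -d)          # scratch area (assumption: any empty dir will do)
cd "$work"
mkdir olddir newdir

# Create 30,000 empty files named file1 .. file30000 in olddir.
i=1
while [ "$i" -le 30000 ]; do
    : > "olddir/file$i"
    i=$((i + 1))
done

# One round: copy, count what arrived, then empty newdir for the next method.
(cd olddir && cp -R . ../newdir)
copied=$(ls newdir/ | wc -l | tr -d ' ')
echo "files in newdir: $copied"
rm -rf newdir && mkdir newdir
```

Swap the cp line for whichever method is being timed and put 'time' in
front of it.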



# run from inside 'olddir'
time cp -R . ../newdir

real    0m17.185s
user    0m2.432s
sys     0m14.605s

time for file in *; do cp "$file" ../newdir/; done

real    7m3.556s
user    2m4.120s
sys     4m58.983s

# does not maintain directory structure
time find . -type f -exec cp {} ../newdir/ \;

real    4m44.564s
user    1m37.750s
sys     2m59.355s

# this created 'olddir' in 'newdir'
time tar -cf - olddir | (cd newdir; tar -xpBf -)

real    0m25.429s
user    0m8.041s
sys     0m16.905s

# This creates newdir
time cp -R olddir/ newdir

real    0m15.352s
user    0m2.092s
sys     0m13.117s

# This creates olddir in newdir even if newdir doesn't exist.
time rsync -av olddir newdir >/dev/null

real    1m37.080s
user    0m27.834s
sys     1m8.472s




The 'find' method flattens the directory structure, so if you have
multiple files with the same name in different directories, later copies
overwrite earlier ones and you lose data. Of the methods that keep the
structure, only the first two copied straight into the existing 'newdir'
without creating a new directory (which I think was a requirement of the
original post, since the target was a mounted directory). I think the
tar method can do this too, but I couldn't figure it out.
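
On the tar question, one approach that should work is to run the first
tar from inside 'olddir' and archive '.' instead of 'olddir', so the
contents land directly in the existing 'newdir'. This is a small sketch,
not one of the timed runs; the scratch directory and the sample file
names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: 'demo' and the sample files are hypothetical.
set -eu
demo=$(mktemp -d)
cd "$demo"
mkdir -p olddir/sub newdir
: > olddir/file1
: > olddir/sub/file1       # same name, different directory

# Archive the contents of olddir ('.'), not olddir itself, and unpack
# inside the already-existing newdir. No newdir/olddir gets created.
(cd olddir && tar -cf - .) | (cd newdir && tar -xpBf -)

ls -R newdir
```

Because the subdirectory is kept, the two files named file1 do not
collide the way they would with the 'find' method.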

-Josh

Thus spake Ken Smith on Tue, 25 Apr 2006

> On Tue, 2006-04-25 at 10:43 -0400, Jason Lasker wrote:
>> I ran into this problem too and used a tar pipe to copy files...
>>
>> tar -cf - directory | (cd parent; tar -xf -)
>>
>> Check your tar options for the proper switches/parameters
>>
>
> I usually use "-xpBf" on the tar doing the extracting (second one).  The
> "p" means preserve everything possible (owner/group/perms/timestamps)
> and the "B" locks the blocking factor at something sane for a pipe (tar
> still thinks it's talking to a tape drive by default and if for some
> reason the first read is a bit short it can decide to use a small block
> size which makes it pretty inefficient).
>
> And for those of you with multiple machines and data that needs to move
> amongst them this can work as well:
>
> tar -cf - directory | ssh target-machine "(cd parent; tar -xpBf -)"
>
> :-)
>
> --
>                                                Ken Smith
> - From there to here, from here to      |       kensmith at cse.buffalo.edu
>  there, funny things are everywhere.   |
>                      - Theodore Geisel |
>
>
> _______________________________________________
> nflug mailing list
> nflug at nflug.org
> http://www.nflug.org/mailman/listinfo/nflug
>


