bogdan romocea
2004-Oct-15 19:58 UTC
[R] combine many .csv files into a single file/data frame
Dear R users,

I have a few hundred .csv files which I need to put together (full outer joins on a common variable) to do a factor analysis. Each file may contain anywhere from a few hundred to a few thousand rows. What would be the most efficient way to do this in R? Please include some sample code if applicable.

Thank you,
b.
(Ted Harding)
2004-Oct-15 20:41 UTC
[R] combine many .csv files into a single file/data frame
On 15-Oct-04 bogdan romocea wrote:
> Dear R users,
>
> I have a few hundred .csv files which I need to put
> together (full outer joins on a common variable) to do a
> factor analysis. Each file may contain anywhere from a few
> hundred to a few thousand rows. What would be the most
> efficient way to do this in R? Please include some sample
> code if applicable.

If you're using Linux/Unix, you should consider using the 'join' command. Simple example (joining files "j1" and "j2"):

  $ cat j1
  1 a
  1 b
  2 c
  2 d
  2 e
  3 f
  3 g
  4 h

  $ cat j2
  1 A
  1 B
  2 C
  3 D
  3 E
  3 F

  $ join j1 j2
  1 a A
  1 a B
  1 b A
  1 b B
  2 c C
  2 d C
  2 e C
  3 f D
  3 f E
  3 f F
  3 g D
  3 g E
  3 g F

See 'man join' for details of options which you can use to adapt the command to your needs.

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 15-Oct-04                                       Time: 21:41:44
------------------------------ XFMail ------------------------------
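The example above works because "j1" and "j2" are already sorted on the first field, which is what 'join' expects. For files that are not, a minimal sketch is to sort each input on the key first. The function name `sorted_join` and the use of `mktemp` for scratch files are assumptions made for illustration, not something from the original post.

```shell
# Sketch: inner join of two whitespace-delimited files on field 1,
# sorting each input on that field first (join needs sorted input).
# sorted_join is a hypothetical helper name.
sorted_join() {
    sj_a=$(mktemp) && sj_b=$(mktemp) || return 1
    # "-k 1b,1" sorts on field 1 only, ignoring leading blanks;
    # LC_ALL=C makes sort and join agree on the ordering.
    LC_ALL=C sort -k 1b,1 "$1" > "$sj_a"
    LC_ALL=C sort -k 1b,1 "$2" > "$sj_b"
    LC_ALL=C join "$sj_a" "$sj_b"
    sj_rc=$?
    rm -f "$sj_a" "$sj_b"
    return $sj_rc
}
```

Called as `sorted_join j1 j2`, this should print the same lines as `join j1 j2` when the inputs were sorted to begin with.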
(Ted Harding)
2004-Oct-16 09:39 UTC
[R] combine many .csv files into a single file/data frame
On 15-Oct-04 bogdan romocea wrote:
> I have a few hundred .csv files which I need to put
> together (full outer joins on a common variable) to do a
> factor analysis. Each file may contain anywhere from a few
> hundred to a few thousand rows. What would be the most
> efficient way to do this in R? Please include some sample
> code if applicable.

Sorry, the example I posted last night was written too hastily and does not illustrate your precise query well. Here is a better one, with explanation.

  $ cat j1
  1,a
  1,b
  2,c
  2,d
  2,e
  3,f
  3,g
  4,h
  4,i
  5,j

  $ cat j2
  1,A
  1,B
  1,C
  2,D
  2,E
  4,F
  4,G
  5,H
  6,I
  6,J

  $ join -t , -a 1 -a 2 -o 0,1.2,2.2 j1 j2
  1,a,A
  1,a,B
  1,a,C
  1,b,A
  1,b,B
  1,b,C
  2,c,D
  2,c,E
  2,d,D
  2,d,E
  2,e,D
  2,e,E
  3,f,
  3,g,
  4,h,F
  4,h,G
  4,i,F
  4,i,G
  5,j,H
  6,,I
  6,,J

Explanation of options:

  "-t ,"          Input and output field separator is "," (for CSV)
  "-a 1"          Output a line for every line of j1 not matched in j2
  "-a 2"          Output a line for every line of j2 not matched in j1
  "-o 0,1.2,2.2"  Output field format specification:
                    0   denotes the match (join) field (needed when using "-a")
                    1.2 denotes field 2 from file 1 ("j1")
                    2.2 denotes field 2 from file 2 ("j2")
  "j1" and "j2"   are of course the two files to be joined.

Using the two "-a" options together gives you the full outer join which you want. Note that join expects both files to be sorted on the join field, as j1 and j2 are here.

This command only works for two files at a time (and you must give two). To join several files you would have to loop through them, on the lines of

  $ join -t , -a 1 -a 2 -o 0,1.2,2.2 j1 j2 > J

which creates a file "J" which is the full outer join of "j1" and "j2". Then

  $ join -t , -a 1 -a 2 -o 0,1.2,1.3,2.2 J j3 > J.tmp && mv J.tmp J

and so on through j4, j5, ... Two points to note: the result has to go to a temporary file first, because "> J" would empty J before join had read it; and J now has three fields, so the "-o" list must name 1.3 as well, gaining one more entry for each file already merged.

For your "few hundred files" this is best done with a loop like

  $ fmt=0,1.2,1.3 ; k=3
  $ for i in * ; do
      join -t , -a 1 -a 2 -o "$fmt,2.2" J "$i" > J.tmp && mv J.tmp J
      k=$((k+1)) ; fmt="$fmt,1.$k"
    done

having first done it for j1, j2 as above and put these two files out of sight in a different directory. J itself also needs to be out of sight, otherwise it will get joined to itself at some stage. E.g. where "J" is written above it could in fact be written as "../joins/J", and similarly for "../joins/j1" and "../joins/j2". E.g. first move j1 & j2 to ../joins and then do

  $ join -t , -a 1 -a 2 -o 0,1.2,2.2 ../joins/j1 ../joins/j2 > ../joins/J

and then

  $ fmt=0,1.2,1.3 ; k=3
  $ for i in * ; do
      join -t , -a 1 -a 2 -o "$fmt,2.2" ../joins/J "$i" > ../joins/J.tmp &&
        mv ../joins/J.tmp ../joins/J
      k=$((k+1)) ; fmt="$fmt,1.$k"
    done

Sorry for the previous sloppy response, and hoping the above helps.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 16-Oct-04                                       Time: 10:39:05
------------------------------ XFMail ------------------------------
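The looping approach described above can be wrapped up as one shell function. What follows is only a sketch under stated assumptions: every input is a two-field "key,value" file with no header line and no quoted or embedded commas, and the name `outer_join_all`, the `oj_` variable names and the use of `mktemp` are illustrative choices rather than anything from the thread. It sorts each input on the key, writes each pass to a scratch file, and lengthens the "-o" list by one entry per merged file.

```shell
# Sketch: full outer join of many "key,value" CSV files on field 1.
# Assumes two fields per file, no headers, no quoted commas.
# usage: outer_join_all OUTPUT FILE1 FILE2 [FILE3 ...]
outer_join_all() {
    oj_out=$1; shift
    oj_acc=$(mktemp) && oj_tmp=$(mktemp) && oj_srt=$(mktemp) || return 1

    # Seed the accumulator with the first file, sorted on the key.
    LC_ALL=C sort -t , -k 1,1 "$1" > "$oj_acc" || return 1
    shift

    oj_fmt=0,1.2   # key plus every value column accumulated so far
    oj_k=2         # number of fields currently in the accumulator
    for oj_f in "$@"; do
        LC_ALL=C sort -t , -k 1,1 "$oj_f" > "$oj_srt" || return 1
        # Write to a scratch file: redirecting onto the accumulator
        # itself would truncate it before join had read it.
        LC_ALL=C join -t , -a 1 -a 2 -o "$oj_fmt,2.2" \
            "$oj_acc" "$oj_srt" > "$oj_tmp" || return 1
        mv "$oj_tmp" "$oj_acc"
        oj_k=$((oj_k + 1))
        oj_fmt="$oj_fmt,1.$oj_k"   # the accumulator gained one column
    done

    cat "$oj_acc" > "$oj_out"
    rm -f "$oj_acc" "$oj_tmp" "$oj_srt"
}
```

A call such as `outer_join_all ../joins/J.csv *.csv` keeps the output out of the directory being globbed, in the spirit of the "../joins" suggestion. Keys absent from a file leave an empty field, as in the "3,f," lines of the example, and the result could then be read into R with `read.csv("../joins/J.csv", header = FALSE)`.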