Denis Chabot
2010-Aug-12 17:57 UTC
[R] reading fixed width format data with 2 types of lines
Hi,

I know how to read fixed width format data with read.fwf, but suddenly I need to read in a large number of old fwf files with 2 types of lines. Lines that begin with "3" in the first column carry one set of variables, and lines that begin with "4" carry another set, like this:

3A00206546L070049016090045 99 1015002 001001008010004002004007003 001
3A00206546L070049006090030 99 1029001002001001006014002
3A00206546L070049002290004 99 1015 001001
3A00206546L070049001692559049033 1015 018036024
3A00206546L070049002290004 99 1001 002
4A00176546L068047090010111000606516400150010000001501063 065914
4A00176546L06804709001011100040761600000000 1092 095614
4A00196546L098000100010111001706214400005010000000051062 065914
4A00176546L06804709001011100050591300000000 1062 065914
4A00196546L098000100010111002604721400020010000000201042 046114
4A00196546L098000100010111002504221400005012000000051042 046114
4A00196546L098000100010111002903721400050012200000501032 036214

I have searched for tricks to do this, but I must not have used the right keywords: I found nothing.

I suppose I could read the entire file as a single character variable for each line, then subset the lines that begin with 3 and save them in an ASCII file that would then be reopened with a read.fwf call, and do the same with the lines that begin with 4. But this does not seem very elegant or efficient. Is there a better method?

Thanks in advance,

Denis Chabot
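The idea of reading every line and dispatching on the first character does not strictly need intermediate files. Below is a minimal sketch of that one-pass idea, written in awk rather than R. The field widths (w3, w4), the helper name fwsplit, and the file names mixed.dat, type3.csv and type4.csv are all hypothetical placeholders, not the real layout of these records; only the two sample lines come from the post above.

```shell
#!/bin/sh
# One-pass parse of a fixed-width file that mixes two record types.
# The widths in w3/w4 are HYPOTHETICAL; replace them with the real layouts.

# Tiny sample input: one line of each type, copied from the post.
cat > mixed.dat <<'EOF'
3A00206546L070049016090045 99 1015002 001001008010004002004007003 001
4A00176546L068047090010111000606516400150010000001501063 065914
EOF

awk -v w3="1 9 1 6 9" -v w4="1 9 1 6 4 6" '
# fwsplit: cut "line" into consecutive fields whose widths are listed
# in "spec" (space separated) and join them with commas.
function fwsplit(line, spec,    n, w, i, pos, sep, out) {
  n = split(spec, w, " ")
  pos = 1
  out = ""
  for (i = 1; i <= n; i++) {
    sep = (i > 1) ? "," : ""
    out = out sep substr(line, pos, w[i])
    pos += w[i]
  }
  return out
}
/^3/ { print fwsplit($0, w3) > "type3.csv"; next }   # type-3 layout
/^4/ { print fwsplit($0, w4) > "type4.csv"; next }   # type-4 layout
{ print "unrecognised record on line " NR > "/dev/stderr" }
' mixed.dat
```

With these placeholder widths, type3.csv holds `3,A00206546,L,070049,016090045` and type4.csv holds `4,A00176546,L,068047,0900,101110`. Staying inside R, the same split can be done in memory with readLines(), selecting lines by their first character, and passing each subset to read.fwf() through a textConnection() with its own widths.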
Tim Gruene
2010-Aug-12 20:01 UTC
I don't know if it's elegant enough for you, but you could split the file into two files with 'grep "^3" file > file_3' and 'grep "^4" file > file_4', and then read them in separately.

Tim
-- 
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A
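A minimal sketch of the grep split suggested in the reply, with a sanity check added so that lines starting with neither "3" nor "4" are not silently dropped. The file names file, file_3 and file_4 follow the reply; the sample lines are a subset of those in the original post; the check itself is an illustrative addition, not part of the suggestion.

```shell
#!/bin/sh
# Split a mixed fixed-width file by the record type in column 1.

# Sample input: lines copied from the original post (2 of type 3, 3 of type 4).
cat > file <<'EOF'
3A00206546L070049016090045 99 1015002 001001008010004002004007003 001
3A00206546L070049006090030 99 1029001002001001006014002
4A00176546L068047090010111000606516400150010000001501063 065914
4A00176546L06804709001011100040761600000000 1092 095614
4A00196546L098000100010111001706214400005010000000051062 065914
EOF

grep '^3' file > file_3
grep '^4' file > file_4

# Sanity check: every input line should land in exactly one output file.
# $(( )) normalises any padding that some wc implementations print.
total=$(( $(wc -l < file) ))
split_total=$(( $(wc -l < file_3) + $(wc -l < file_4) ))
missing=$(( total - split_total ))
if [ "$missing" -ne 0 ]; then
  echo "warning: $missing line(s) start with neither 3 nor 4" >&2
fi
```

Each of file_3 and file_4 can then be read with its own read.fwf call and its own widths.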