j.joshua thomas
2007-Mar-10 05:45 UTC
[R] read a irregular text file data into dataframe()
I am using R 2.4.1 to read a text file that contains the data structure shown below. When I read the file into R using

    tData <- read.table("c:\\test.txt")

it gives an error saying that the data set has irregular columns. However, I need to use data of this type. Is there any alternative in R?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

0010 0028 0061 0088
0010 0042 0084
0004 0010 0055
0010 0018 0040 0042
0010 0046 0059
0010 0016 0042 0055
0010 0012 0018 0054
0010 0034 0042 0102
0081
0001 0076 0085
0080 0086
0017 0032 0081
0004 0010 0055
0010 0042 0061 0080
0010 0017 0078 0084
0006 0010 0040 0042
0075 0080
0005 0028 0032
0006 0010 0040 0061

--
Lecturer J. Joshua Thomas
KDU College Penang Campus
Research Student, University Sains Malaysia
I don't know of any canned function to do this, but you can write your own function (see below) to:

(1) open a file connection
(2) count the number of fields on each line
(3) create an empty matrix with the number of rows and the maximum number of columns of your data
(4) rewind to the beginning of the file
(5) scan line by line and fill the matrix
(6) close the file connection
(7) convert the matrix to a data frame
(8) use the function type.convert() to automatically convert numerical columns to mode numeric (since scan(), as I've specified it, reads everything in as mode character, which changes the holding matrix's mode from its default of logical to character).

The function below will work for your example data set. To make it more general, you can add arguments such as 'what' to scan() and 'sep' to both count.fields() and scan(). Depending on whether you have column names, you can modify it accordingly as well.

    # call function with this line
    df <- read.irregular("c:\\test.txt")

    # this is the function
    read.irregular <- function(filenm) {
      # (1) open the file in text mode for reading
      fileID <- file(filenm, open = "rt")
      # (2) number of fields on each line
      nFields <- count.fields(fileID)
      # (3) empty (logical NA) matrix sized for the widest line
      mat <- matrix(nrow = length(nFields), ncol = max(nFields))
      # (4) rewind to the start of the file
      invisible(seek(fileID, where = 0, origin = "start", rw = "read"))
      # (5) read one line at a time as character and fill the row
      for (i in seq_len(nrow(mat))) {
        mat[i, seq_len(nFields[i])] <- scan(fileID, what = "", nlines = 1, quiet = TRUE)
      }
      # (6) close the connection
      close(fileID)
      # (7) convert to a data frame, (8) then convert column types
      df <- as.data.frame(mat)
      df[] <- lapply(df, type.convert, as.is = TRUE)
      return(df)
    }

Hope this helps.
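For comparison, the same steps can be sketched without rewinding the connection by reading all lines at once and padding the short ones. This is only an illustration, not the function posted above: the name read_ragged is made up here, it assumes whitespace-separated fields and a file that fits in memory, and it uses trimws() and lengths(), which need a much newer R than 2.4.1.

```r
# Hypothetical helper (not part of base R): read a whitespace-separated
# file with a varying number of fields per line into a data frame.
read_ragged <- function(path) {
  lines <- trimws(readLines(path))
  lines <- lines[nzchar(lines)]                 # skip blank lines
  fields <- strsplit(lines, "[[:space:]]+")     # one character vector per line
  width <- max(lengths(fields))                 # widest line sets the column count
  # Pad every line with NA up to the common width
  padded <- lapply(fields, function(x) c(x, rep(NA_character_, width - length(x))))
  mat <- do.call(rbind, padded)                 # character matrix, one row per line
  df <- as.data.frame(mat, stringsAsFactors = FALSE)
  df[] <- lapply(df, type.convert, as.is = TRUE)  # "0010" becomes integer 10
  df
}

# Usage on a small temporary file with ragged rows
tmp <- tempfile(fileext = ".txt")
writeLines(c("0010 0028 0061 0088", "0010 0042 0084", "0081"), tmp)
ragged <- read_ragged(tmp)
```

Note that, as with the function above, type.convert() turns the zero-padded codes into integers, so the leading zeros are lost. Skipping that final conversion step keeps them as character strings.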
Petr Klasterecky
2007-Mar-10 07:42 UTC
[R] read a irregular text file data into dataframe()
read.table("c:\\test.txt", fill=TRUE)

Petr

--
Petr Klasterecky
Dept. of Probability and Statistics
Charles University in Prague
Czech Republic
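One point worth keeping in mind with fill=TRUE: when col.names is not given, read.table() decides the number of columns from the first five lines of input only. In the example data the first line is already one of the widest, so the call above is enough. For files where a wider line may appear later, a sketch along the following lines (the temporary file and the V1, V2, ... column names are just for illustration) passes the column count explicitly:

```r
# Small ragged file whose widest line (4 fields) is the sixth line
tmp <- tempfile(fileext = ".txt")
writeLines(c("0081",
             "0080 0086",
             "0010 0042 0084",
             "0075 0080",
             "0005 0028 0032",
             "0010 0028 0061 0088"), tmp)

# Count fields on every line, then tell read.table how many columns to expect
n_cols <- max(count.fields(tmp))
filled <- read.table(tmp, fill = TRUE,
                     col.names = paste0("V", seq_len(n_cols)))
```

Short rows are padded with NA, and the columns are read as integers, so the leading zeros in the codes are dropped.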