Farley, Robert
2008-Nov-05 23:51 UTC
[R] How do I read a text (.csv) file to match a matrix/cross tab? (Object confusion??)
I'm having a problem reading data to set control totals for a dataframe. I want to adjust a dataframe based on a 2-d table of values, which I get by using :> CurrentX1Sums <- as.matrix(xtabs(~tripid_nu+lineon, data=SurveyData))> CurrentX2Sums <- apply(CurrentX1Sums, 1, sum)I've created a .csv file with new (target) sums that looks like this: tripid_nu Warner Center De Soto Pierce College Tampa Reseda Balboa Woodley Sepulveda Van Nuys Woodman Valley College Laurel Canyon North Hollywood 9011880 5 2 2 2 2 2 2 2 2 2 6 4 1 9011890 1 1 1 1 1 1 2 1 1 1 1 2 1 9011960 2 2 2 1 2 2 1 2 3 2 2 1 1 9011970 1 1 1 1 2 1 1 2 6 2 2 2 24 9012040 2 2 2 3 2 7 2 2 2 1 1 1 1 9012050 1 1 1 1 1 1 1 2 2 2 1 1 5 ...{More}... I'm trying to read/process it like this:> NewTargetData <- read.table("C:/Data/R/NewTarget.csv", header=TRUE,sep=",", na.strings="NA", dec=".")> NewTargetX1Sums <- as.matrix(NewTargetData)> NewTargetX2Sums <- apply(NewTargetX1Sums, 1, sum)The structures of CurrentX1Sums and NewTargetX1Sums are different:> str(CurrentX1Sums)xtabs [1:55, 1:13] 1 0 1 0 1 0 0 0 0 1 ... - attr(*, "dimnames")=List of 2 ..$ tripid_nu: chr [1:55] "9011880" "9011890" "9011960" "9011970" ... ..$ lineon : chr [1:13] "Warner Center" "De Soto" "Pierce College" "Tampa" ... - attr(*, "class")= chr [1:2] "xtabs" "table" - attr(*, "call")= language xtabs(formula = ~tripid_nu + lineon, data SurveyData)> str(NewTargetX1Sums)int [1:55, 1:14] 9011880 9011890 9011960 9011970 9012040 9012050 9012130 9012280 9012290 9012720 ... - attr(*, "dimnames")=List of 2 ..$ : NULL ..$ : chr [1:14] "tripid_nu" "Warner.Center" "De.Soto" "Pierce.College" ...>Question 1) The structures (CurrentX1Sums , NewTargetX1Sums) are different. One way is in the dimension of the rownames. Instead of line numbers, I want tripid_nu. How do I do that? What's the appropriate "structure" for both? Question 2) Why do the labels in NewTargetData have dots in place of spaces? Will that be a problem later when I try to match them with SurveyData? Question 3) Ultimately, I want to create a variable in the original dataframe like: SurveyData$NewX1 = TargetX1Sums/ CurrentX1Sums { for each tripid_nu, lineon combination} Am I on the right track to do so? Any hints on what THAT syntax will look like? Thanks in advance, ######################################################################## ###################################### #My work to date:> SurveyData <- read.spss("C:/Data/R/orange_delivery.sav",use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)> NewTargetData <- read.table("C:/Data/R/NewTarget.csv", header=TRUE,sep=",", na.strings="NA", dec=".")>#----------------------------------------------------------------------- --------> temp <- sub(' +$', '', SurveyData$direction_) # Remove spacesfrom variable names> SurveyData$direction_ <- temp>#----------------------------------------------------------------------- --------> SurveyData$StnNum=as.numeric(SurveyData$lineon)> CurrentX1Sums <- as.matrix(xtabs(~tripid_nu+lineon, data=SurveyData))> CurrentX2Sums <- apply(CurrentX1Sums, 1, sum)> NewTargetX1Sums <- as.matrix(NewTargetData)> NewTargetX2Sums <- apply(NewTargetX1Sums, 1, sum)>> str(CurrentX1Sums)xtabs [1:55, 1:13] 1 0 1 0 1 0 0 0 0 1 ... - attr(*, "dimnames")=List of 2 ..$ tripid_nu: chr [1:55] "9011880" "9011890" "9011960" "9011970" ... ..$ lineon : chr [1:13] "Warner Center" "De Soto" "Pierce College" "Tampa" ... - attr(*, "class")= chr [1:2] "xtabs" "table" - attr(*, "call")= language xtabs(formula = ~tripid_nu + lineon, data SurveyData)> str(NewTargetX1Sums)int [1:55, 1:14] 9011880 9011890 9011960 9011970 9012040 9012050 9012130 9012280 9012290 9012720 ... - attr(*, "dimnames")=List of 2 ..$ : NULL ..$ : chr [1:14] "tripid_nu" "Warner.Center" "De.Soto" "Pierce.College" ...>> CurrentX1Sumslineon tripid_nu Warner Center De Soto Pierce College Tampa Reseda Balboa Woodley Sepulveda Van Nuys Woodman Valley College Laurel Canyon North Hollywood 9011880 1 0 2 1 0 2 1 0 0 0 1 0 0 9011890 0 0 0 0 0 0 1 0 0 0 0 1 0 9011960 1 1 2 0 1 1 0 1 3 2 1 0 0 9011970 0 0 0 0 1 0 0 1 6 1 1 1 14 ...{More}...> NewTargetX1Sumstripid_nu Warner.Center De.Soto Pierce.College Tampa Reseda Balboa Woodley Sepulveda Van.Nuys Woodman Valley.College Laurel.Canyon North.Hollywood [1,] 9011880 5 2 2 2 2 2 2 2 2 2 6 4 1 [2,] 9011890 1 1 1 1 1 1 2 1 1 1 1 2 1 [3,] 9011960 2 2 2 1 2 2 1 2 3 2 2 1 1 [4,] 9011970 1 1 1 1 2 1 1 2 6 2 2 2 24 ...{More}... ######################################################################## ###################################### Robert Farley Metro 1 Gateway Plaza Mail Stop 99-23-7 Los Angeles, CA 90012-2952 Voice: (213)922-2532 Fax: (213)922-2868 www.Metro.net [[alternative HTML version deleted]]
Kellie Wills
2008-Nov-06 00:41 UTC
[R] How do I read a text (.csv) file to match a matrix/cross tab? (Object confusion??)
read.table doesn't realize the first column should be row names. Try read.table("C:/Data/R/NewTarget.csv", header=TRUE, sep=",", na.strings="NA", dec=".", row.names="tripid_nu") Kellie Wills Engineering Service Manager REvolution Computing kellie at revolution-computing.com On Nov 5, 2008, at 3:51 PM, Farley, Robert wrote:> I'm having a problem reading data to set control totals for a > dataframe. > I want to adjust a dataframe based on a 2-d table of values, which I > get > by using : > >> CurrentX1Sums <- as.matrix(xtabs(~tripid_nu+lineon, data=SurveyData)) > > >> CurrentX2Sums <- apply(CurrentX1Sums, 1, sum) > > > > I've created a .csv file with new (target) sums that looks like this: > > tripid_nu Warner Center De Soto Pierce College Tampa > Reseda Balboa Woodley Sepulveda Van Nuys > Woodman > Valley College Laurel Canyon North Hollywood > > 9011880 5 2 2 2 2 2 2 > 2 2 2 6 4 1 > > 9011890 1 1 1 1 1 1 2 > 1 1 1 1 2 1 > > 9011960 2 2 2 1 2 2 1 > 2 3 2 2 1 1 > > 9011970 1 1 1 1 2 1 1 > 2 6 2 2 2 24 > > 9012040 2 2 2 3 2 7 2 > 2 2 1 1 1 1 > > 9012050 1 1 1 1 1 1 1 > 2 2 2 1 1 5 > > ...{More}... > > > > I'm trying to read/process it like this: > >> NewTargetData <- read.table("C:/Data/R/NewTarget.csv", header=TRUE, > sep=",", na.strings="NA", dec=".") > >> NewTargetX1Sums <- as.matrix(NewTargetData) > >> NewTargetX2Sums <- apply(NewTargetX1Sums, 1, sum) > > > > > > The structures of CurrentX1Sums and NewTargetX1Sums are different: > > > >> str(CurrentX1Sums) > > xtabs [1:55, 1:13] 1 0 1 0 1 0 0 0 0 1 ... > > - attr(*, "dimnames")=List of 2 > > ..$ tripid_nu: chr [1:55] "9011880" "9011890" "9011960" "9011970" ... > > ..$ lineon : chr [1:13] "Warner Center" "De Soto" "Pierce College" > "Tampa" ... > > - attr(*, "class")= chr [1:2] "xtabs" "table" > > - attr(*, "call")= language xtabs(formula = ~tripid_nu + lineon, > data > SurveyData) > > > >> str(NewTargetX1Sums) > > int [1:55, 1:14] 9011880 9011890 9011960 9011970 9012040 9012050 > 9012130 9012280 9012290 9012720 ... > > - attr(*, "dimnames")=List of 2 > > ..$ : NULL > > ..$ : chr [1:14] "tripid_nu" "Warner.Center" "De.Soto" > "Pierce.College" ... > >> > > > > > > > > > > Question 1) The structures (CurrentX1Sums , NewTargetX1Sums) are > different. One way is in the dimension of the rownames. Instead of > line numbers, I want tripid_nu. How do I do that? What's the > appropriate "structure" for both? > > > > > > Question 2) Why do the labels in NewTargetData have dots in place of > spaces? Will that be a problem later when I try to match them with > SurveyData? > > > > > > Question 3) Ultimately, I want to create a variable in the original > dataframe like: > > SurveyData$NewX1 = TargetX1Sums/ CurrentX1Sums { for each > tripid_nu, lineon combination} > > Am I on the right track to do so? Any hints on what THAT syntax will > look like? > > > > > > > > > > Thanks in advance, > > > > > > > > > > > > ######################################################################## > ###################################### > > #My work to date: > >> SurveyData <- read.spss("C:/Data/R/orange_delivery.sav", > use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE) > >> NewTargetData <- read.table("C:/Data/R/NewTarget.csv", header=TRUE, > sep=",", na.strings="NA", dec=".") > >> > #----------------------------------------------------------------------- > -------- > >> temp <- sub(' +$', '', SurveyData$direction_) # Remove spaces > from variable names > >> SurveyData$direction_ <- temp > >> > #----------------------------------------------------------------------- > -------- > >> SurveyData$StnNum=as.numeric(SurveyData$lineon) > >> CurrentX1Sums <- as.matrix(xtabs(~tripid_nu+lineon, data=SurveyData)) > > >> CurrentX2Sums <- apply(CurrentX1Sums, 1, sum) > >> NewTargetX1Sums <- as.matrix(NewTargetData) > >> NewTargetX2Sums <- apply(NewTargetX1Sums, 1, sum) > >> > >> str(CurrentX1Sums) > > xtabs [1:55, 1:13] 1 0 1 0 1 0 0 0 0 1 ... > > - attr(*, "dimnames")=List of 2 > > ..$ tripid_nu: chr [1:55] "9011880" "9011890" "9011960" "9011970" ... > > ..$ lineon : chr [1:13] "Warner Center" "De Soto" "Pierce College" > "Tampa" ... > > - attr(*, "class")= chr [1:2] "xtabs" "table" > > - attr(*, "call")= language xtabs(formula = ~tripid_nu + lineon, > data > SurveyData) > >> str(NewTargetX1Sums) > > int [1:55, 1:14] 9011880 9011890 9011960 9011970 9012040 9012050 > 9012130 9012280 9012290 9012720 ... > > - attr(*, "dimnames")=List of 2 > > ..$ : NULL > > ..$ : chr [1:14] "tripid_nu" "Warner.Center" "De.Soto" > "Pierce.College" ... > >> > > > >> CurrentX1Sums > > lineon > > tripid_nu Warner Center De Soto Pierce College Tampa Reseda Balboa > Woodley Sepulveda Van Nuys Woodman Valley College Laurel Canyon North > Hollywood > > 9011880 1 0 2 1 0 2 > 1 0 0 0 1 0 > 0 > > 9011890 0 0 0 0 0 0 > 1 0 0 0 0 1 > 0 > > 9011960 1 1 2 0 1 1 > 0 1 3 2 1 0 > 0 > > 9011970 0 0 0 0 1 0 > 0 1 6 1 1 1 > 14 > > ...{More}... > >> NewTargetX1Sums > > tripid_nu Warner.Center De.Soto Pierce.College Tampa Reseda > Balboa > Woodley Sepulveda Van.Nuys Woodman Valley.College Laurel.Canyon > North.Hollywood > > [1,] 9011880 5 2 2 2 > 2 2 > 2 2 2 2 6 4 > 1 > > [2,] 9011890 1 1 1 1 > 1 1 > 2 1 1 1 1 2 > 1 > > [3,] 9011960 2 2 2 1 > 2 2 > 1 2 3 2 2 1 > 1 > > [4,] 9011970 1 1 1 1 > 2 1 > 1 2 6 2 2 2 > 24 > > ...{More}... > > > > > > > > ######################################################################## > ###################################### > > > > > > > > > > Robert Farley > > Metro > > 1 Gateway Plaza > > Mail Stop 99-23-7 > > Los Angeles, CA 90012-2952 > > Voice: (213)922-2532 > > Fax: (213)922-2868 > > www.Metro.net > > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.