Hello, this is my first post. I have a large CSV file where I need to fill in the 1st and 2nd column with a Loan # and Account name that would be found in a line of text : like this: ,,Loan #:,ML-113-07,Account Name:, Quilting Boutique,,,,,,,,,,, I would like to place the Loan #: ML-113-07 in the first column and the account name quilting boutique in the second column. If possible I would also like to copy these details all the way down until the end of the record. But I would just be happy to know the best way to script the conditional logic that says something like if column 3 = "Loan #" then Column 1 eq Column 4 and Column 2 eq column 6. Here is a snap shot of the CSV file I am working with. Thanks Loan #,Account Name,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance ,,,,,,,,,,,,,,,,Page #: 11 ,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance ,,Loan #:,ML-123-07,Account Name:, Quilting Shop,,,,,,,,,,, ,,11/30/2009,Interest,2.36,0,0,2.36,0,0,0,0,0,0,0,0,1767.76 ,,12/24/2009,Payment: Regular,161,0,0,11.33,0,0,147.31,13.69,0,0,0,0,1620.45 ,,12/31/2009,Interest,3.03,0,0,3.03,0,0,0,0,0,0,0,0,1620.45 ,,01/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1620.45 ,,01/31/2010,Interest,13.42,0,0,13.42,0,0,0,0,0,0,0,0,1620.45 ,,02/09/2010,Payment: Regular,180,0,0,3.9,0,0,149.65,20.35,10,0,0,0,1470.8 ,,02/25/2010,Payment: Regular,170,0,0,6.29,0,0,163.71,6.29,0,0,0,0,1307.09 ,,02/28/2010,Interest,1.05,0,0,1.05,0,0,0,0,0,0,0,0,1307.09 ,,03/25/2010,Payment: Regular,180,0,0,8.73,0,0,170.22,9.78,0,0,0,0,1136.87 ,,03/31/2010,Interest,1.82,0,0,1.82,0,0,0,0,0,0,0,0,1136.87 ,,04/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1136.87 ,,04/26/2010,Payment: Regular,165,0,0,7.9,0,0,145.28,9.72,10,0,0,0,991.59 ,,04/30/2010,Interest,1.06,0,0,1.06,0,0,0,0,0,0,0,0,991.59 ,,05/27/2010,Payment: Regular,999.8,0,0,7.15,0,0,991.59,8.21,0,0,0,0,0 ,,,Loan Totals,,5000,0,795.58,172,0,5000,795.58,172,0,0,0,0 ,,,,,,,,,,,,,,,,Page #: 12 ,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance ,,Loan #:,ML-124-07,Account Name:,Tata Bird Farm,,,,,,,,,,, ,,10/26/2007,Commitment,5000,0,0,0,0,0,0,0,0,0,0,0,0 ,,10/26/2007,Advance: Principal,5000,5000,0,0,0,0,0,0,0,0,0,0,5000 ,,10/26/2007,Fee: Admin,50,0,0,0,50,0,0,0,0,0,0,0,5000 ,,10/26/2007,Payment: Customized,50,0,0,0,0,0,0,0,50,0,0,0,5000 -- View this message in context: http://r.789695.n4.nabble.com/Data-Cleaning-New-user-coming-from-SAS-tp4651348.html Sent from the R help mailing list archive at Nabble.com.
Other readers of this list may have better suggestions for how to read in data with interspersed header rows, but here's a work-around to do specifically what you requested ... # find the rows where "Loan" is in the Date column sel <- grep("Loan", dat$Date) # create a new vector with these row numbers as IDs ID <- rep(NA, dim(dat)[1]) ID[sel] <- sel # use the na.locf() function from the zoo package to # replace each missing ID with the most recent non-missing ID prior to it library(zoo) IDlong <- na.locf(ID, na.rm=FALSE) # use these filled in row numbers to assign values to # the Loan.. and Account.Name columns dat$Loan.. <- dat$Action[IDlong] dat$Account.Name <- dat$Disbursed[IDlong] Jean arum <arumkone@wrdf.org> wrote on 11/29/2012 02:25:06 PM:> > Hello, this is my first post. I have a large CSV file where I need tofill in> the 1st and 2nd column with a Loan # and Account name that would befound in> a line of text : like this: ,,Loan #:,ML-113-07,Account Name:, Quilting > Boutique,,,,,,,,,,, > I would like to place the Loan #: ML-113-07 in the first column and the > account name quilting boutique in the second column. If possible I would > also like to copy these details all the way down until the end of the > record. But I would just be happy to know the best way to script the > conditional logic that says something like if column 3 = "Loan #" then > Column 1 eq Column 4 and Column 2 eq column 6. Here is a snap shot ofthe> CSV file I am working with. Thanks > Loan #,Account >Name,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance> ,,,,,,,,,,,,,,,,Page #: 11 >,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance> ,,Loan #:,ML-123-07,Account Name:, Quilting Shop,,,,,,,,,,, > ,,11/30/2009,Interest,2.36,0,0,2.36,0,0,0,0,0,0,0,0,1767.76 > ,,12/24/2009,Payment:Regular,161,0,0,11.33,0,0,147.31,13.69,0,0,0,0,1620.45> ,,12/31/2009,Interest,3.03,0,0,3.03,0,0,0,0,0,0,0,0,1620.45 > ,,01/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1620.45 > ,,01/31/2010,Interest,13.42,0,0,13.42,0,0,0,0,0,0,0,0,1620.45 > ,,02/09/2010,Payment:Regular,180,0,0,3.9,0,0,149.65,20.35,10,0,0,0,1470.8> ,,02/25/2010,Payment:Regular,170,0,0,6.29,0,0,163.71,6.29,0,0,0,0,1307.09> ,,02/28/2010,Interest,1.05,0,0,1.05,0,0,0,0,0,0,0,0,1307.09 > ,,03/25/2010,Payment:Regular,180,0,0,8.73,0,0,170.22,9.78,0,0,0,0,1136.87> ,,03/31/2010,Interest,1.82,0,0,1.82,0,0,0,0,0,0,0,0,1136.87 > ,,04/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1136.87 > ,,04/26/2010,Payment:Regular,165,0,0,7.9,0,0,145.28,9.72,10,0,0,0,991.59> ,,04/30/2010,Interest,1.06,0,0,1.06,0,0,0,0,0,0,0,0,991.59 > ,,05/27/2010,Payment: Regular,999.8,0,0,7.15,0,0,991.59,8.21,0,0,0,0,0 > ,,,Loan Totals,,5000,0,795.58,172,0,5000,795.58,172,0,0,0,0 > ,,,,,,,,,,,,,,,,Page #: 12 >,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance> ,,Loan #:,ML-124-07,Account Name:,Tata Bird Farm,,,,,,,,,,, > ,,10/26/2007,Commitment,5000,0,0,0,0,0,0,0,0,0,0,0,0 > ,,10/26/2007,Advance: Principal,5000,5000,0,0,0,0,0,0,0,0,0,0,5000 > ,,10/26/2007,Fee: Admin,50,0,0,0,50,0,0,0,0,0,0,0,5000 > ,,10/26/2007,Payment: Customized,50,0,0,0,0,0,0,0,50,0,0,0,5000[[alternative HTML version deleted]]
HI Arum, May be this helps: #dat is the data ?res<-do.call(rbind,lapply(split(dat,cumsum(as.numeric(grepl("Loan",dat$Date)))),function(x){ x[,1]<-x[1,4];x[,2]<-x[1,6];return(x)})) ?row.names(res)<-1:nrow(res) ?res[,1:4] #????? Loan..?????? Account.Name?????? Date???????????? Action #1??????????????????????????????????????????????????????????? #2???????????????????????????????????? Date???????????? Action #3? ML-113-07? Quilting Boutique??? Loan #:????????? ML-113-07 #4? ML-113-07? Quilting Boutique 11/30/2009?????????? Interest #5? ML-113-07? Quilting Boutique 12/24/2009?? Payment: Regular #6? ML-113-07? Quilting Boutique????????????????????????????? #7? ML-113-07? Quilting Boutique?????? Date???????????? Action #8? ML-114-07????????? Tata Espr??? Loan #:????????? ML-114-07 #9? ML-114-07????????? Tata Espr 10/26/2007???????? Commitment #10 ML-114-07????????? Tata Espr 10/26/2007 Advance: Principal A.K. ----- Original Message ----- From: arum <arumkone at wrdf.org> To: r-help at r-project.org Cc: Sent: Thursday, November 29, 2012 3:25 PM Subject: [R] Data Cleaning -New user coming from SAS Hello, this is my first post. I have a large CSV file where I need to fill in the 1st and 2nd column with a Loan # and Account name that would be found in a line of text : like this: ,,Loan #:,ML-113-07,Account Name:, Quilting Boutique,,,,,,,,,,, I would like to place the Loan #: ML-113-07 in the first column and the account name quilting boutique in the second column. If possible I would also like to copy these details all the way down until the end of the record.? But I would just be happy to know the best way to script the conditional logic that says something like if column 3 = "Loan #" then Column 1 eq Column 4 and Column 2 eq column 6.? Here is a snap shot of the CSV file I am working with.? Thanks Loan #,Account Name,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance ,,,,,,,,,,,,,,,,Page #: 11 ,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance ,,Loan #:,ML-123-07,Account Name:, Quilting Shop,,,,,,,,,,, ,,11/30/2009,Interest,2.36,0,0,2.36,0,0,0,0,0,0,0,0,1767.76 ,,12/24/2009,Payment: Regular,161,0,0,11.33,0,0,147.31,13.69,0,0,0,0,1620.45 ,,12/31/2009,Interest,3.03,0,0,3.03,0,0,0,0,0,0,0,0,1620.45 ,,01/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1620.45 ,,01/31/2010,Interest,13.42,0,0,13.42,0,0,0,0,0,0,0,0,1620.45 ,,02/09/2010,Payment: Regular,180,0,0,3.9,0,0,149.65,20.35,10,0,0,0,1470.8 ,,02/25/2010,Payment: Regular,170,0,0,6.29,0,0,163.71,6.29,0,0,0,0,1307.09 ,,02/28/2010,Interest,1.05,0,0,1.05,0,0,0,0,0,0,0,0,1307.09 ,,03/25/2010,Payment: Regular,180,0,0,8.73,0,0,170.22,9.78,0,0,0,0,1136.87 ,,03/31/2010,Interest,1.82,0,0,1.82,0,0,0,0,0,0,0,0,1136.87 ,,04/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1136.87 ,,04/26/2010,Payment: Regular,165,0,0,7.9,0,0,145.28,9.72,10,0,0,0,991.59 ,,04/30/2010,Interest,1.06,0,0,1.06,0,0,0,0,0,0,0,0,991.59 ,,05/27/2010,Payment: Regular,999.8,0,0,7.15,0,0,991.59,8.21,0,0,0,0,0 ,,,Loan Totals,,5000,0,795.58,172,0,5000,795.58,172,0,0,0,0 ,,,,,,,,,,,,,,,,Page #: 12 ,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance ,,Loan #:,ML-124-07,Account Name:,Tata Bird Farm,,,,,,,,,,, ,,10/26/2007,Commitment,5000,0,0,0,0,0,0,0,0,0,0,0,0 ,,10/26/2007,Advance: Principal,5000,5000,0,0,0,0,0,0,0,0,0,0,5000 ,,10/26/2007,Fee: Admin,50,0,0,0,50,0,0,0,0,0,0,0,5000 ,,10/26/2007,Payment: Customized,50,0,0,0,0,0,0,0,50,0,0,0,5000 -- View this message in context: http://r.789695.n4.nabble.com/Data-Cleaning-New-user-coming-from-SAS-tp4651348.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.