Hello, this is my first post. I have a large CSV file where I need to fill in the 1st and 2nd column with a Loan # and Account name that would be found in a line of text : like this: ,,Loan #:,ML-113-07,Account Name:, Quilting Boutique,,,,,,,,,,, I would like to place the Loan #: ML-113-07 in the first column and the account name quilting boutique in the second column. If possible I would also like to copy these details all the way down until the end of the record. But I would just be happy to know the best way to script the conditional logic that says something like if column 3 = "Loan #" then Column 1 eq Column 4 and Column 2 eq column 6. Here is a snap shot of the CSV file I am working with. Thanks Loan #,Account Name,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance ,,,,,,,,,,,,,,,,Page #: 11 ,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance ,,Loan #:,ML-123-07,Account Name:, Quilting Shop,,,,,,,,,,, ,,11/30/2009,Interest,2.36,0,0,2.36,0,0,0,0,0,0,0,0,1767.76 ,,12/24/2009,Payment: Regular,161,0,0,11.33,0,0,147.31,13.69,0,0,0,0,1620.45 ,,12/31/2009,Interest,3.03,0,0,3.03,0,0,0,0,0,0,0,0,1620.45 ,,01/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1620.45 ,,01/31/2010,Interest,13.42,0,0,13.42,0,0,0,0,0,0,0,0,1620.45 ,,02/09/2010,Payment: Regular,180,0,0,3.9,0,0,149.65,20.35,10,0,0,0,1470.8 ,,02/25/2010,Payment: Regular,170,0,0,6.29,0,0,163.71,6.29,0,0,0,0,1307.09 ,,02/28/2010,Interest,1.05,0,0,1.05,0,0,0,0,0,0,0,0,1307.09 ,,03/25/2010,Payment: Regular,180,0,0,8.73,0,0,170.22,9.78,0,0,0,0,1136.87 ,,03/31/2010,Interest,1.82,0,0,1.82,0,0,0,0,0,0,0,0,1136.87 ,,04/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1136.87 ,,04/26/2010,Payment: Regular,165,0,0,7.9,0,0,145.28,9.72,10,0,0,0,991.59 ,,04/30/2010,Interest,1.06,0,0,1.06,0,0,0,0,0,0,0,0,991.59 ,,05/27/2010,Payment: Regular,999.8,0,0,7.15,0,0,991.59,8.21,0,0,0,0,0 ,,,Loan Totals,,5000,0,795.58,172,0,5000,795.58,172,0,0,0,0 ,,,,,,,,,,,,,,,,Page #: 12 ,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance ,,Loan #:,ML-124-07,Account Name:,Tata Bird Farm,,,,,,,,,,, ,,10/26/2007,Commitment,5000,0,0,0,0,0,0,0,0,0,0,0,0 ,,10/26/2007,Advance: Principal,5000,5000,0,0,0,0,0,0,0,0,0,0,5000 ,,10/26/2007,Fee: Admin,50,0,0,0,50,0,0,0,0,0,0,0,5000 ,,10/26/2007,Payment: Customized,50,0,0,0,0,0,0,0,50,0,0,0,5000 -- View this message in context: http://r.789695.n4.nabble.com/Data-Cleaning-New-user-coming-from-SAS-tp4651348.html Sent from the R help mailing list archive at Nabble.com.
Other readers of this list may have better suggestions for how to read in
data with interspersed header rows, but here's a work-around to do
specifically what you requested ...
# find the rows where "Loan" is in the Date column
sel <- grep("Loan", dat$Date)
# create a new vector with these row numbers as IDs
ID <- rep(NA, dim(dat)[1])
ID[sel] <- sel
# use the na.locf() function from the zoo package to
# replace each missing ID with the most recent non-missing ID prior to it
library(zoo)
IDlong <- na.locf(ID, na.rm=FALSE)
# use these filled in row numbers to assign values to
# the Loan.. and Account.Name columns
dat$Loan.. <- dat$Action[IDlong]
dat$Account.Name <- dat$Disbursed[IDlong]
Jean
arum <arumkone@wrdf.org> wrote on 11/29/2012 02:25:06
PM:>
> Hello, this is my first post. I have a large CSV file where I need to
fill in> the 1st and 2nd column with a Loan # and Account name that would be
found in> a line of text : like this: ,,Loan #:,ML-113-07,Account Name:, Quilting
> Boutique,,,,,,,,,,,
> I would like to place the Loan #: ML-113-07 in the first column and the
> account name quilting boutique in the second column. If possible I would
> also like to copy these details all the way down until the end of the
> record. But I would just be happy to know the best way to script the
> conditional logic that says something like if column 3 = "Loan #"
then
> Column 1 eq Column 4 and Column 2 eq column 6. Here is a snap shot of
the> CSV file I am working with. Thanks
> Loan #,Account
>
Name,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance> ,,,,,,,,,,,,,,,,Page #: 11
>
,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance> ,,Loan #:,ML-123-07,Account Name:, Quilting Shop,,,,,,,,,,,
> ,,11/30/2009,Interest,2.36,0,0,2.36,0,0,0,0,0,0,0,0,1767.76
> ,,12/24/2009,Payment:
Regular,161,0,0,11.33,0,0,147.31,13.69,0,0,0,0,1620.45> ,,12/31/2009,Interest,3.03,0,0,3.03,0,0,0,0,0,0,0,0,1620.45
> ,,01/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1620.45
> ,,01/31/2010,Interest,13.42,0,0,13.42,0,0,0,0,0,0,0,0,1620.45
> ,,02/09/2010,Payment:
Regular,180,0,0,3.9,0,0,149.65,20.35,10,0,0,0,1470.8> ,,02/25/2010,Payment:
Regular,170,0,0,6.29,0,0,163.71,6.29,0,0,0,0,1307.09> ,,02/28/2010,Interest,1.05,0,0,1.05,0,0,0,0,0,0,0,0,1307.09
> ,,03/25/2010,Payment:
Regular,180,0,0,8.73,0,0,170.22,9.78,0,0,0,0,1136.87> ,,03/31/2010,Interest,1.82,0,0,1.82,0,0,0,0,0,0,0,0,1136.87
> ,,04/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1136.87
> ,,04/26/2010,Payment:
Regular,165,0,0,7.9,0,0,145.28,9.72,10,0,0,0,991.59> ,,04/30/2010,Interest,1.06,0,0,1.06,0,0,0,0,0,0,0,0,991.59
> ,,05/27/2010,Payment: Regular,999.8,0,0,7.15,0,0,991.59,8.21,0,0,0,0,0
> ,,,Loan Totals,,5000,0,795.58,172,0,5000,795.58,172,0,0,0,0
> ,,,,,,,,,,,,,,,,Page #: 12
>
,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance> ,,Loan #:,ML-124-07,Account Name:,Tata Bird Farm,,,,,,,,,,,
> ,,10/26/2007,Commitment,5000,0,0,0,0,0,0,0,0,0,0,0,0
> ,,10/26/2007,Advance: Principal,5000,5000,0,0,0,0,0,0,0,0,0,0,5000
> ,,10/26/2007,Fee: Admin,50,0,0,0,50,0,0,0,0,0,0,0,5000
> ,,10/26/2007,Payment: Customized,50,0,0,0,0,0,0,0,50,0,0,0,5000
[[alternative HTML version deleted]]
HI Arum,
May be this helps:
#dat is the data
?res<-do.call(rbind,lapply(split(dat,cumsum(as.numeric(grepl("Loan",dat$Date)))),function(x){
x[,1]<-x[1,4];x[,2]<-x[1,6];return(x)}))
?row.names(res)<-1:nrow(res)
?res[,1:4]
#????? Loan..?????? Account.Name?????? Date???????????? Action
#1???????????????????????????????????????????????????????????
#2???????????????????????????????????? Date???????????? Action
#3? ML-113-07? Quilting Boutique??? Loan #:????????? ML-113-07
#4? ML-113-07? Quilting Boutique 11/30/2009?????????? Interest
#5? ML-113-07? Quilting Boutique 12/24/2009?? Payment: Regular
#6? ML-113-07? Quilting Boutique?????????????????????????????
#7? ML-113-07? Quilting Boutique?????? Date???????????? Action
#8? ML-114-07????????? Tata Espr??? Loan #:????????? ML-114-07
#9? ML-114-07????????? Tata Espr 10/26/2007???????? Commitment
#10 ML-114-07????????? Tata Espr 10/26/2007 Advance: Principal
A.K.
----- Original Message -----
From: arum <arumkone at wrdf.org>
To: r-help at r-project.org
Cc:
Sent: Thursday, November 29, 2012 3:25 PM
Subject: [R] Data Cleaning -New user coming from SAS
Hello, this is my first post. I have a large CSV file where I need to fill in
the 1st and 2nd column with a Loan # and Account name that would be found in
a line of text : like this: ,,Loan #:,ML-113-07,Account Name:, Quilting
Boutique,,,,,,,,,,,
I would like to place the Loan #: ML-113-07 in the first column and the
account name quilting boutique in the second column. If possible I would
also like to copy these details all the way down until the end of the
record.? But I would just be happy to know the best way to script the
conditional logic that says something like if column 3 = "Loan #" then
Column 1 eq Column 4 and Column 2 eq column 6.? Here is a snap shot of the
CSV file I am working with.? Thanks
Loan #,Account
Name,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance
,,,,,,,,,,,,,,,,Page #: 11
,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance
,,Loan #:,ML-123-07,Account Name:, Quilting Shop,,,,,,,,,,,
,,11/30/2009,Interest,2.36,0,0,2.36,0,0,0,0,0,0,0,0,1767.76
,,12/24/2009,Payment: Regular,161,0,0,11.33,0,0,147.31,13.69,0,0,0,0,1620.45
,,12/31/2009,Interest,3.03,0,0,3.03,0,0,0,0,0,0,0,0,1620.45
,,01/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1620.45
,,01/31/2010,Interest,13.42,0,0,13.42,0,0,0,0,0,0,0,0,1620.45
,,02/09/2010,Payment: Regular,180,0,0,3.9,0,0,149.65,20.35,10,0,0,0,1470.8
,,02/25/2010,Payment: Regular,170,0,0,6.29,0,0,163.71,6.29,0,0,0,0,1307.09
,,02/28/2010,Interest,1.05,0,0,1.05,0,0,0,0,0,0,0,0,1307.09
,,03/25/2010,Payment: Regular,180,0,0,8.73,0,0,170.22,9.78,0,0,0,0,1136.87
,,03/31/2010,Interest,1.82,0,0,1.82,0,0,0,0,0,0,0,0,1136.87
,,04/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1136.87
,,04/26/2010,Payment: Regular,165,0,0,7.9,0,0,145.28,9.72,10,0,0,0,991.59
,,04/30/2010,Interest,1.06,0,0,1.06,0,0,0,0,0,0,0,0,991.59
,,05/27/2010,Payment: Regular,999.8,0,0,7.15,0,0,991.59,8.21,0,0,0,0,0
,,,Loan Totals,,5000,0,795.58,172,0,5000,795.58,172,0,0,0,0
,,,,,,,,,,,,,,,,Page #: 12
,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance
,,Loan #:,ML-124-07,Account Name:,Tata Bird Farm,,,,,,,,,,,
,,10/26/2007,Commitment,5000,0,0,0,0,0,0,0,0,0,0,0,0
,,10/26/2007,Advance: Principal,5000,5000,0,0,0,0,0,0,0,0,0,0,5000
,,10/26/2007,Fee: Admin,50,0,0,0,50,0,0,0,0,0,0,0,5000
,,10/26/2007,Payment: Customized,50,0,0,0,0,0,0,0,50,0,0,0,5000
--
View this message in context:
http://r.789695.n4.nabble.com/Data-Cleaning-New-user-coming-from-SAS-tp4651348.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.