thr3ads.net - R help - [R] Data Cleaning -New user coming from SAS [Nov 2012]

If this information is useful, please help other people find it:
Share via:

arum

2012-Nov-29 20:25 UTC

[R] Data Cleaning -New user coming from SAS

Hello, this is my first post. I have a large CSV file where I need to fill in
the 1st and 2nd column with a Loan # and Account name that would be found in
a line of text : like this: ,,Loan #:,ML-113-07,Account Name:, Quilting
Boutique,,,,,,,,,,,
I would like to place the Loan #: ML-113-07 in the first column and the
account name quilting boutique in the second column. If possible I would
also like to copy these details all the way down until the end of the
record.  But I would just be happy to know the best way to script the
conditional logic that says something like if column 3 = "Loan #" then
Column 1 eq Column 4 and Column 2 eq column 6.  Here is a snap shot of the
CSV file I am working with.  Thanks
Loan #,Account
Name,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance
,,,,,,,,,,,,,,,,Page #: 11
,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance
,,Loan #:,ML-123-07,Account Name:, Quilting Shop,,,,,,,,,,,
,,11/30/2009,Interest,2.36,0,0,2.36,0,0,0,0,0,0,0,0,1767.76
,,12/24/2009,Payment: Regular,161,0,0,11.33,0,0,147.31,13.69,0,0,0,0,1620.45
,,12/31/2009,Interest,3.03,0,0,3.03,0,0,0,0,0,0,0,0,1620.45
,,01/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1620.45
,,01/31/2010,Interest,13.42,0,0,13.42,0,0,0,0,0,0,0,0,1620.45
,,02/09/2010,Payment: Regular,180,0,0,3.9,0,0,149.65,20.35,10,0,0,0,1470.8
,,02/25/2010,Payment: Regular,170,0,0,6.29,0,0,163.71,6.29,0,0,0,0,1307.09
,,02/28/2010,Interest,1.05,0,0,1.05,0,0,0,0,0,0,0,0,1307.09
,,03/25/2010,Payment: Regular,180,0,0,8.73,0,0,170.22,9.78,0,0,0,0,1136.87
,,03/31/2010,Interest,1.82,0,0,1.82,0,0,0,0,0,0,0,0,1136.87
,,04/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1136.87
,,04/26/2010,Payment: Regular,165,0,0,7.9,0,0,145.28,9.72,10,0,0,0,991.59
,,04/30/2010,Interest,1.06,0,0,1.06,0,0,0,0,0,0,0,0,991.59
,,05/27/2010,Payment: Regular,999.8,0,0,7.15,0,0,991.59,8.21,0,0,0,0,0
,,,Loan Totals,,5000,0,795.58,172,0,5000,795.58,172,0,0,0,0
,,,,,,,,,,,,,,,,Page #: 12
,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance
,,Loan #:,ML-124-07,Account Name:,Tata Bird Farm,,,,,,,,,,,
,,10/26/2007,Commitment,5000,0,0,0,0,0,0,0,0,0,0,0,0
,,10/26/2007,Advance: Principal,5000,5000,0,0,0,0,0,0,0,0,0,0,5000
,,10/26/2007,Fee: Admin,50,0,0,0,50,0,0,0,0,0,0,0,5000
,,10/26/2007,Payment: Customized,50,0,0,0,0,0,0,0,50,0,0,0,5000



--
View this message in context:
http://r.789695.n4.nabble.com/Data-Cleaning-New-user-coming-from-SAS-tp4651348.html
Sent from the R help mailing list archive at Nabble.com.

Jean V Adams

2012-Nov-29 23:12 UTC

head link

[R] Data Cleaning -New user coming from SAS

Other readers of this list may have better suggestions for how to read in 
data with interspersed header rows, but here's a work-around to do 
specifically what you requested ...

# find the rows where "Loan" is in the Date column
sel <- grep("Loan", dat$Date)

# create a new vector with these row numbers as IDs
ID <- rep(NA, dim(dat)[1])
ID[sel] <- sel

# use the na.locf() function from the zoo package to 
# replace each missing ID with the most recent non-missing ID prior to it
library(zoo)
IDlong <- na.locf(ID, na.rm=FALSE)

# use these filled in row numbers to assign values to 
# the Loan.. and Account.Name columns
dat$Loan.. <- dat$Action[IDlong]
dat$Account.Name <- dat$Disbursed[IDlong]

Jean



arum <arumkone@wrdf.org> wrote on 11/29/2012 02:25:06
PM:> 
> Hello, this is my first post. I have a large CSV file where I need to 
fill in> the 1st and 2nd column with a Loan # and Account name that would be 
found in> a line of text : like this: ,,Loan #:,ML-113-07,Account Name:, Quilting
> Boutique,,,,,,,,,,,
> I would like to place the Loan #: ML-113-07 in the first column and the
> account name quilting boutique in the second column. If possible I would
> also like to copy these details all the way down until the end of the
> record.  But I would just be happy to know the best way to script the
> conditional logic that says something like if column 3 = "Loan #"
then
> Column 1 eq Column 4 and Column 2 eq column 6.  Here is a snap shot of 
the> CSV file I am working with.  Thanks
> Loan #,Account
> 
Name,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance> ,,,,,,,,,,,,,,,,Page #: 11
> 
,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance> ,,Loan #:,ML-123-07,Account Name:, Quilting Shop,,,,,,,,,,,
> ,,11/30/2009,Interest,2.36,0,0,2.36,0,0,0,0,0,0,0,0,1767.76
> ,,12/24/2009,Payment: 
Regular,161,0,0,11.33,0,0,147.31,13.69,0,0,0,0,1620.45> ,,12/31/2009,Interest,3.03,0,0,3.03,0,0,0,0,0,0,0,0,1620.45
> ,,01/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1620.45
> ,,01/31/2010,Interest,13.42,0,0,13.42,0,0,0,0,0,0,0,0,1620.45
> ,,02/09/2010,Payment: 
Regular,180,0,0,3.9,0,0,149.65,20.35,10,0,0,0,1470.8> ,,02/25/2010,Payment: 
Regular,170,0,0,6.29,0,0,163.71,6.29,0,0,0,0,1307.09> ,,02/28/2010,Interest,1.05,0,0,1.05,0,0,0,0,0,0,0,0,1307.09
> ,,03/25/2010,Payment: 
Regular,180,0,0,8.73,0,0,170.22,9.78,0,0,0,0,1136.87> ,,03/31/2010,Interest,1.82,0,0,1.82,0,0,0,0,0,0,0,0,1136.87
> ,,04/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1136.87
> ,,04/26/2010,Payment: 
Regular,165,0,0,7.9,0,0,145.28,9.72,10,0,0,0,991.59> ,,04/30/2010,Interest,1.06,0,0,1.06,0,0,0,0,0,0,0,0,991.59
> ,,05/27/2010,Payment: Regular,999.8,0,0,7.15,0,0,991.59,8.21,0,0,0,0,0
> ,,,Loan Totals,,5000,0,795.58,172,0,5000,795.58,172,0,0,0,0
> ,,,,,,,,,,,,,,,,Page #: 12
> 
,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance> ,,Loan #:,ML-124-07,Account Name:,Tata Bird Farm,,,,,,,,,,,
> ,,10/26/2007,Commitment,5000,0,0,0,0,0,0,0,0,0,0,0,0
> ,,10/26/2007,Advance: Principal,5000,5000,0,0,0,0,0,0,0,0,0,0,5000
> ,,10/26/2007,Fee: Admin,50,0,0,0,50,0,0,0,0,0,0,0,5000
> ,,10/26/2007,Payment: Customized,50,0,0,0,0,0,0,0,50,0,0,0,5000
	[[alternative HTML version deleted]]

arun

2012-Nov-30 01:42 UTC

head link

[R] Data Cleaning -New user coming from SAS

HI Arum,

May be this helps:
#dat is the data

?res<-do.call(rbind,lapply(split(dat,cumsum(as.numeric(grepl("Loan",dat$Date)))),function(x){
x[,1]<-x[1,4];x[,2]<-x[1,6];return(x)}))
?row.names(res)<-1:nrow(res)

?res[,1:4]
#????? Loan..?????? Account.Name?????? Date???????????? Action
#1??????????????????????????????????????????????????????????? 
#2???????????????????????????????????? Date???????????? Action
#3? ML-113-07? Quilting Boutique??? Loan #:????????? ML-113-07
#4? ML-113-07? Quilting Boutique 11/30/2009?????????? Interest
#5? ML-113-07? Quilting Boutique 12/24/2009?? Payment: Regular
#6? ML-113-07? Quilting Boutique????????????????????????????? 
#7? ML-113-07? Quilting Boutique?????? Date???????????? Action
#8? ML-114-07????????? Tata Espr??? Loan #:????????? ML-114-07
#9? ML-114-07????????? Tata Espr 10/26/2007???????? Commitment
#10 ML-114-07????????? Tata Espr 10/26/2007 Advance: Principal
A.K.



----- Original Message -----
From: arum <arumkone at wrdf.org>
To: r-help at r-project.org
Cc: 
Sent: Thursday, November 29, 2012 3:25 PM
Subject: [R] Data Cleaning -New user coming from SAS

Hello, this is my first post. I have a large CSV file where I need to fill in
the 1st and 2nd column with a Loan # and Account name that would be found in
a line of text : like this: ,,Loan #:,ML-113-07,Account Name:, Quilting
Boutique,,,,,,,,,,,
I would like to place the Loan #: ML-113-07 in the first column and the
account name quilting boutique in the second column. If possible I would
also like to copy these details all the way down until the end of the
record.? But I would just be happy to know the best way to script the
conditional logic that says something like if column 3 = "Loan #" then
Column 1 eq Column 4 and Column 2 eq column 6.? Here is a snap shot of the
CSV file I am working with.? Thanks
Loan #,Account
Name,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance
,,,,,,,,,,,,,,,,Page #: 11
,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance
,,Loan #:,ML-123-07,Account Name:, Quilting Shop,,,,,,,,,,,
,,11/30/2009,Interest,2.36,0,0,2.36,0,0,0,0,0,0,0,0,1767.76
,,12/24/2009,Payment: Regular,161,0,0,11.33,0,0,147.31,13.69,0,0,0,0,1620.45
,,12/31/2009,Interest,3.03,0,0,3.03,0,0,0,0,0,0,0,0,1620.45
,,01/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1620.45
,,01/31/2010,Interest,13.42,0,0,13.42,0,0,0,0,0,0,0,0,1620.45
,,02/09/2010,Payment: Regular,180,0,0,3.9,0,0,149.65,20.35,10,0,0,0,1470.8
,,02/25/2010,Payment: Regular,170,0,0,6.29,0,0,163.71,6.29,0,0,0,0,1307.09
,,02/28/2010,Interest,1.05,0,0,1.05,0,0,0,0,0,0,0,0,1307.09
,,03/25/2010,Payment: Regular,180,0,0,8.73,0,0,170.22,9.78,0,0,0,0,1136.87
,,03/31/2010,Interest,1.82,0,0,1.82,0,0,0,0,0,0,0,0,1136.87
,,04/26/2010,Fee: Late,10,0,0,0,10,0,0,0,0,0,0,0,1136.87
,,04/26/2010,Payment: Regular,165,0,0,7.9,0,0,145.28,9.72,10,0,0,0,991.59
,,04/30/2010,Interest,1.06,0,0,1.06,0,0,0,0,0,0,0,0,991.59
,,05/27/2010,Payment: Regular,999.8,0,0,7.15,0,0,991.59,8.21,0,0,0,0,0
,,,Loan Totals,,5000,0,795.58,172,0,5000,795.58,172,0,0,0,0
,,,,,,,,,,,,,,,,Page #: 12
,,Date,Action,Amount,Disbursed,Capitalized,Interest,Fees,Insurance,Principal,Interest,Fees,Insurance,Writeoff,Recovery,Balance
,,Loan #:,ML-124-07,Account Name:,Tata Bird Farm,,,,,,,,,,,
,,10/26/2007,Commitment,5000,0,0,0,0,0,0,0,0,0,0,0,0
,,10/26/2007,Advance: Principal,5000,5000,0,0,0,0,0,0,0,0,0,0,5000
,,10/26/2007,Fee: Admin,50,0,0,0,50,0,0,0,0,0,0,0,5000
,,10/26/2007,Payment: Customized,50,0,0,0,0,0,0,0,50,0,0,0,5000



--
View this message in context:
http://r.789695.n4.nabble.com/Data-Cleaning-New-user-coming-from-SAS-tp4651348.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Possibly Parallel Threads

Search for more possibly parallel threads

R help - Nov 2012 - Data Cleaning -New user coming from SAS

[R] Data Cleaning -New user coming from SAS

[R] Data Cleaning -New user coming from SAS

[R] Data Cleaning -New user coming from SAS

Possibly Parallel Threads