Dear Contributors, thanks for collaboration. I am trying to reorganize data frame, that looks like this: n1.Index Date PX_LAST n2.Index Date.1 PX_LAST.1 n3.Index Date.2 PX_LAST.2 1 NA 04/02/07 1.34 NA 04/02/07 1.36 NA 04/02/07 1.33 2 NA 04/09/07 1.34 NA 04/09/07 1.36 NA 04/09/07 1.33 3 NA 04/16/07 1.34 NA 04/16/07 1.36 NA 04/16/07 1.33 4 NA 04/30/07 1.36 NA 04/30/07 1.40 NA 04/30/07 1.37 5 NA 05/07/07 1.36 NA 05/07/07 1.40 NA 05/07/07 1.37 6 NA 05/14/07 1.36 NA 05/14/07 1.40 NA 05/14/07 1.37 7 NA 05/22/07 1.36 NA 05/22/07 1.40 NA 05/22/07 1.37 While what I would like to obtain is: I would like to obtain stacked data as: n1.Index Date PX_LAST n1.Index 04/02/07 1.34 n1.Index 04/09/07 1.34 n1.Index 04/16/07 1.34 n1.Index 04/30/07 1.36 n1.Index 05/07/07 1.36 n1.Index 05/14/07 1.36 n1.Index 05/22/07 1.36 n2.Index 04/02/07 1.36 n2.Index 04/16/07 1.36 n2.Index 04/16/07 1.36 n2.Index 04/30/07 1.40 n2.Index 05/07/07 1.40 n2.Index 05/14/07 1.40 n2.Index 05/22/07 1.40 n3.Index 04/02/07 1.33 n3.Index 04/16/07 1.33 n3.Index 04/16/07 1.33 n3.Index 04/30/07 1.37 I have tried the function stack, but it uses only one argument. Then I have tested the melt function from the package reshape, but it seems not to be reproducing the correct organization of the data, as it takes date as the id values. PS: the n1 index names are not ordered in the original database, so I cannot fill in the NA with the names using a recursive formula. Thank you for any help you can provide. Francesca -- Francesca ---------------------------------- Francesca Pancotto, PhD Dipartimento di Economia Università di Bologna Piazza Scaravilli, 2 40126 Bologna Office: +39 051 2098135 Cell: +39 393 6019138 Web: http://www2.dse.unibo.it/francesca.pancotto/ ---------------------------------- [[alternative HTML version deleted]]
On 07/27/2011 06:28 PM, Francesca wrote:> Dear Contributors, > thanks for collaboration. > I am trying to reorganize data frame, that looks like this: > > n1.Index Date PX_LAST n2.Index Date.1 PX_LAST.1 > n3.Index Date.2 PX_LAST.2 > 1 NA 04/02/07 1.34 NA 04/02/07 1.36 > NA 04/02/07 1.33 > 2 NA 04/09/07 1.34 NA 04/09/07 > 1.36 NA 04/09/07 1.33 > 3 NA 04/16/07 1.34 NA 04/16/07 1.36 > NA 04/16/07 1.33 > 4 NA 04/30/07 1.36 NA 04/30/07 > 1.40 NA 04/30/07 1.37 > 5 NA 05/07/07 1.36 NA 05/07/07 > 1.40 NA 05/07/07 1.37 > 6 NA 05/14/07 1.36 NA 05/14/07 1.40 > NA 05/14/07 1.37 > 7 NA 05/22/07 1.36 NA 05/22/07 1.40 > NA 05/22/07 1.37 > > > While what I would like to obtain is: > I would like to obtain stacked data as: > > n1.Index Date PX_LAST > n1.Index 04/02/07 1.34 > n1.Index 04/09/07 1.34 > n1.Index 04/16/07 1.34 > n1.Index 04/30/07 1.36 > n1.Index 05/07/07 1.36 > n1.Index 05/14/07 1.36 > n1.Index 05/22/07 1.36 > n2.Index 04/02/07 1.36 > n2.Index 04/16/07 1.36 > n2.Index 04/16/07 1.36 > n2.Index 04/30/07 1.40 > n2.Index 05/07/07 1.40 > n2.Index 05/14/07 1.40 > n2.Index 05/22/07 1.40 > n3.Index 04/02/07 1.33 > n3.Index 04/16/07 1.33 > n3.Index 04/16/07 1.33 > n3.Index 04/30/07 1.37 > > I have tried the function stack, but it uses only one argument. Then I > have tested the melt function from the package reshape, but it > seems not to be reproducing the correct organization of the data, as > it takes date as the id values. > PS: the n1 index names are not ordered in the original database, so > I cannot fill in the NA with the names using a recursive formula.Hi Francesca, Oddly enough, I answered a similar question a few days ago. The function below turns one or more columns in a data frame into two columns, one a factor that defaults to the name(s) of the columns and the other the data that was in that column. It also "stretches" the remaining columns in the data frame to the same number of rows and sticks the two together. It doesn't do exactly what you show above, but it might be good enough. A bit of coding could get the factor levels the way you want. stretch.var<-function(data,to.stretch, stretch.names=c("newvar","scores")) { datadim<-dim(data) to.rep<-which(!(1:datadim[2] %in% to.stretch)) nrep<-length(to.rep) newDF<-data.frame(rep(data[,to.rep[1]],length(to.stretch))) if(nrep > 1) { for(repvar in 2:nrep) newDF[[repvar]]<-rep(data[[to.rep[repvar]]],length(to.stretch)) } newDF<-cbind(newDF,rep(names(data[,to.stretch]),each=datadim[1]), unlist(data[,to.stretch])) names(newDF)<-c(names(data[to.rep]),stretch.names) rownames(newDF)<-NULL return(newDF) } # read in the data fp<-read.table("fp.dat",header=TRUE) # pass only the columns that you want in the result stretch.var(fp[,c(2,3,6,9)],2:4,c("n1.index","PX_LAST")) Date n1.index PX_LAST 1 04/02/07 PX_LAST 1.34 2 04/09/07 PX_LAST 1.34 3 04/16/07 PX_LAST 1.34 4 04/30/07 PX_LAST 1.36 5 05/07/07 PX_LAST 1.36 6 05/14/07 PX_LAST 1.36 7 05/22/07 PX_LAST 1.36 8 04/02/07 PX_LAST.1 1.36 9 04/09/07 PX_LAST.1 1.36 10 04/16/07 PX_LAST.1 1.36 11 04/30/07 PX_LAST.1 1.40 12 05/07/07 PX_LAST.1 1.40 13 05/14/07 PX_LAST.1 1.40 14 05/22/07 PX_LAST.1 1.40 15 04/02/07 PX_LAST.2 1.33 16 04/09/07 PX_LAST.2 1.33 17 04/16/07 PX_LAST.2 1.33 18 04/30/07 PX_LAST.2 1.37 19 05/07/07 PX_LAST.2 1.37 20 05/14/07 PX_LAST.2 1.37 21 05/22/07 PX_LAST.2 1.37 Jim
David Winsemius
2011-Jul-27 12:24 UTC
[R] Reorganize(stack data) a dataframe inducing names
On Jul 27, 2011, at 4:28 AM, Francesca wrote:> Dear Contributors, > thanks for collaboration. > I am trying to reorganize data frame, that looks like this: > > n1.Index Date PX_LAST n2.Index Date.1 PX_LAST.1 > n3.Index Date.2 PX_LAST.2 > 1 NA 04/02/07 1.34 NA 04/02/07 > 1.36 > NA 04/02/07 1.33 > 2 NA 04/09/07 1.34 NA 04/09/07 > 1.36 NA 04/09/07 1.33 > 3 NA 04/16/07 1.34 NA 04/16/07 > 1.36 > NA 04/16/07 1.33 > 4 NA 04/30/07 1.36 NA 04/30/07 > 1.40 NA 04/30/07 1.37 > 5 NA 05/07/07 1.36 NA 05/07/07 > 1.40 NA 05/07/07 1.37 > 6 NA 05/14/07 1.36 NA 05/14/07 > 1.40 > NA 05/14/07 1.37 > 7 NA 05/22/07 1.36 NA 05/22/07 > 1.40 > NA 05/22/07 1.37 > > > While what I would like to obtain is: > I would like to obtain stacked data as: > > n1.Index Date PX_LAST > n1.Index 04/02/07 1.34 > n1.Index 04/09/07 1.34 > n1.Index 04/16/07 1.34 > n1.Index 04/30/07 1.36 > n1.Index 05/07/07 1.36 > n1.Index 05/14/07 1.36 > n1.Index 05/22/07 1.36 > n2.Index 04/02/07 1.36 > n2.Index 04/16/07 1.36 > n2.Index 04/16/07 1.36 > n2.Index 04/30/07 1.40 > n2.Index 05/07/07 1.40 > n2.Index 05/14/07 1.40 > n2.Index 05/22/07 1.40 > n3.Index 04/02/07 1.33 > n3.Index 04/16/07 1.33 > n3.Index 04/16/07 1.33 > n3.Index 04/30/07 1.37 > > I have tried the function stack, but it uses only one argument. Then I > have tested the melt function from the package reshape, but it > seems not to be reproducing the correct organization of the data, as > it takes date as the id values. > PS: the n1 index names are not ordered in the original database, so > I cannot fill in the NA with the names using a recursive formula. > Thank you for any help you can provide.(only on the last point, since you already have been offered a solution ...) You should read more rhelp questions and answers. This thread yesterday had three different ways that you could have replaced the values of those *.Index columns with their names: [R] Recoding Multiple Variables in a Data Frame in One Step Ehlers liked Dunlap's solution, but I thought those two were equally clever. Mine was clearly not the best.> Francesca > > -- > Francesca > > ---------------------------------- > Francesca Pancotto, PhD > Dipartimento di Economia > Universit? di Bologna > Piazza Scaravilli, 2 > 40126 Bologna > Office: +39 051 2098135 > Cell: +39 393 6019138 > Web: http://www2.dse.unibo.it/francesca.pancotto/ > ---------------------------------- > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD West Hartford, CT
Dear Contributors thanks for any help you can provide. I searched the threads but I could not find any query that satisfied my needs. This is my database: index time values 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43 I need to select only the rows of this database that correspond to each of the first occurrences of the string represented in column index. In the example shown I would like to obtain a new data.frame which is index time values 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 As you can see, it is not the whole string to change, rather a substring that is part of it. I want to select only the first values related to the row that presents for the first time the different part of the string(substring). I know how to select rows according to a substring condition on the index column, but I cannot use it here because the substring changes and moreover the number of occurrences per substring is variable. Thank you for any help you can provide. Francesca [[alternative HTML version deleted]]
Possibly Parallel Threads
- Calling Data frame objects with spaces in their names
- Computing stats on common parts of multiple dataframes
- Problem in using bdh function for Govt tickers
- How to calculate the derivatives at each data point?
- loess line predicting number where the line crosses zero twice