Hello,

What would be the best set of R functions to parse and transform a file?

My file looks as shown below. I would like to plot this data and I need to parse it into a single data frame that sort of "transposes the data" with the following structure:

> df <- data.frame(n=c(1,1,2,2),iter=c(1,2,1,2),step=as.factor(c('Step 1', 'Step2', 'Step 1', 'Step 2')),value=c(10, 10, 10, 10))
> str(df)
'data.frame':   4 obs. of  4 variables:
 $ n    : num  1 1 2 2
 $ iter : num  1 2 1 2
 $ step : Factor w/ 3 levels "Step 1","Step 2",..: 1 3 1 2
 $ value: num  10 10 10 10

n=extracted from the file name "logdet_two_moons_n>>>>10<<<<.txt"
iter=iter
step=column Step1, Step2, Step3, Step4
value=value of the specific Step column

And this is one possible data frame variation to be able to plot the time proportions for the different steps of my algorithm.

TIA,
Best regards,
Giovanni

Iteration Jmin Error Elapsed Corral Duality Gap Step1 Step2 Step3 Step4
2 2 0.000000e+00 1.912976e-03 1 0.000000e+00 1.779780e-03 7.214600e-05 1.243600e-05 2.246700e-05
../test/genmoons_data/logdet_two_moons_n10.txt,2,2,1.754115e-02,0.000000e+00,9.799000e+03,0.000000e+00,5.586293e-01,0.000000e+00
Iteration Jmin Error Elapsed Corral Duality Gap Step1 Step2 Step3 Step4
4 9 0.000000e+00 1.280841e-03 2 -7.105427e-15 9.557570e-04 2.301610e-04 1.571100e-05 2.177300e-05
../test/genmoons_data/logdet_two_moons_n20.txt,4,5,6.062756e-03,0.000000e+00,1.365970e+05,0.000000e+00,2.253051e+01,0.000000e+00
Iteration Jmin Error Elapsed Corral Duality Gap Step1 Step2 Step3 Step4
10 32 3.133476e-03 6.075853e-03 8 4.057531e-01 1.613035e-03 3.956920e-03 3.077200e-05 4.390900e-05
20 28 5.597685e-04 4.376530e-03 16 4.711146e-03 0.000000e+00 4.390998e-03 2.229600e-05 2.517100e-05
30 27 1.148159e-04 4.357923e-03 22 8.408166e-06 0.000000e+00 4.326610e-03 2.697700e-05 3.233200e-05
40 27 4.036778e-05 4.388260e-03 29 2.529294e-07 0.000000e+00 4.329726e-03 2.713000e-05 3.558400e-05
50 27 1.840383e-05 4.357373e-03 36 1.341111e-07 0.000000e+00 4.327526e-03 3.097000e-05 3.255700e-05
53 27 0.000000e+00 1.322382e-03 36 -2.842171e-14 0.000000e+00 1.327239e-03 1.420400e-05 2.043500e-05
../test/genmoons_data/logdet_two_moons_n64.txt,53,69,3.330987e-02,0.000000e+00,2.229830e+07,0.000000e+00,6.694201e+02,0.000000e+00
Iteration Jmin Error Elapsed Corral Duality Gap Step1 Step2 Step3 Step4
10 70 7.739525e-03 2.389529e-02 8 1.494829e+00 2.975209e-03 1.873082e-02 4.713600e-05 5.837200e-05
20 74 3.379192e-03 2.084753e-02 15 3.372041e-01 0.000000e+00 2.084637e-02 4.302400e-05 3.907800e-05
30 76 1.322821e-03 2.093204e-02 21 1.018845e-01 0.000000e+00 2.083170e-02 4.704100e-05 5.707100e-05
40 78 1.176950e-03 2.095179e-02 28 2.447970e-02 0.000000e+00 2.088284e-02 4.890700e-05 4.955100e-05
50 78 2.233669e-04 2.050571e-02 35 1.573952e-02 0.000000e+00 2.045954e-02 4.046600e-05 3.899000e-05
60 78 2.167956e-04 2.095130e-02 39 8.362982e-03 0.000000e+00 2.082586e-02 6.699700e-05 8.506400e-05
70 78 2.085968e-04 2.085355e-02 46 5.135190e-03 0.000000e+00 2.083204e-02 5.432900e-05 4.078600e-05
80 78 2.570800e-04 2.044932e-02 51 5.470225e-04 0.000000e+00 2.033571e-02 5.334200e-05 5.318400e-05
81 78 0.000000e+00 2.099610e-03 51 1.421085e-14 0.000000e+00 2.100072e-03 9.147000e-06 2.324800e-05
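The "n" part of the request (pulling the number out of a name such as logdet_two_moons_n10.txt) can be sketched on its own. The helper name extract_n is hypothetical, and the pattern assumes every summary line contains "_n", then digits, then ".txt", as in the lines shown in the post.

```r
# Two summary lines copied from the post; each one starts with the file path.
summary_lines <- c(
  "../test/genmoons_data/logdet_two_moons_n10.txt,2,2,1.754115e-02,0.000000e+00,9.799000e+03,0.000000e+00,5.586293e-01,0.000000e+00",
  "../test/genmoons_data/logdet_two_moons_n64.txt,53,69,3.330987e-02,0.000000e+00,2.229830e+07,0.000000e+00,6.694201e+02,0.000000e+00"
)

# Hypothetical helper: keep only the digits between "_n" and ".txt".
extract_n <- function(lines) {
  as.integer(sub("^.*_n([0-9]+)\\.txt.*$", "\\1", lines))
}

n <- extract_n(summary_lines)
```

Lines that do not match the pattern are returned unchanged by sub(), so as.integer() would turn them into NA with a warning; that may be a useful signal for malformed input.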
Hi,
I guess this helps you.
library(reshape)
dat1<-read.table(text="
Iteration Jmin Error Elapsed Corral Duality_Gap Step1 Step2 Step3 Step4
10 32 3.133476e-03 6.075853e-03 8 4.057531e-01 1.613035e-03 3.956920e-03 3.077200e-05 4.390900e-05
20 28 5.597685e-04 4.376530e-03 16 4.711146e-03 0.000000e+00 4.390998e-03 2.229600e-05 2.517100e-05
30 27 1.148159e-04 4.357923e-03 22 8.408166e-06 0.000000e+00 4.326610e-03 2.697700e-05 3.233200e-05
40 27 4.036778e-05 4.388260e-03 29 2.529294e-07 0.000000e+00 4.329726e-03 2.713000e-05 3.558400e-05
50 27 1.840383e-05 4.357373e-03 36 1.341111e-07 0.000000e+00 4.327526e-03 3.097000e-05 3.255700e-05
53 27 0.000000e+00 1.322382e-03 36 -2.842171e-14 0.000000e+00 1.327239e-03 1.420400e-05 2.043500e-05
",sep="",header=TRUE,stringsAsFactors=FALSE,fill=TRUE)
df2<-melt(dat1,"Iteration",c("Step1","Step2","Step3","Step4"))
df3<-df2[,c(2,1)]
df4<-data.frame(n=rep(c(1:6),times=4),iter=rep(1:4,each=6),df3)
colnames(df4)[c(3,4)]<-c("step","value")
head(df4)
  n iter  step value
1 1    1 Step1    10
2 2    1 Step1    20
3 3    1 Step1    30
4 4    1 Step1    40
5 5    1 Step1    50
6 6    1 Step1    53
A.K.
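
Note that in the data frame above the value column holds the iteration numbers (10, 20, ...), while n and iter are running indices. If value is meant to hold the step timings, as in the original request, a minimal base-R sketch could look like the following. It uses stack() instead of melt(), shows only two rows of the block for brevity, and takes n = 64 as an assumption from the matching file name logdet_two_moons_n64.txt.

```r
# Two rows of the n = 64 block, copied from the thread.
dat1 <- read.table(text = "Iteration Jmin Error Elapsed Corral Duality_Gap Step1 Step2 Step3 Step4
10 32 3.133476e-03 6.075853e-03 8 4.057531e-01 1.613035e-03 3.956920e-03 3.077200e-05 4.390900e-05
20 28 5.597685e-04 4.376530e-03 16 4.711146e-03 0.000000e+00 4.390998e-03 2.229600e-05 2.517100e-05",
                   header = TRUE)

step_cols <- c("Step1", "Step2", "Step3", "Step4")

# stack() puts the four step columns on top of each other:
# 'values' holds the timings, 'ind' the name of the source column.
long <- stack(dat1[step_cols])

df_long <- data.frame(
  n     = 64L,  # assumption: taken from the file name, not from dat1
  iter  = rep(dat1$Iteration, times = length(step_cols)),
  step  = long$ind,
  value = long$values
)
```

Because stack() works column by column, repeating the Iteration column once per step column keeps iter aligned with the stacked timings.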
----- Original Message -----
From: Giovanni Azua <bravegag at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Monday, August 27, 2012 6:24 AM
Subject: [R] simplest way (set of functions) to parse a file
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Try this:

> input <- readLines(textConnection("Iteration Jmin Error Elapsed Corral Duality Gap Step1 Step2 Step3 Step4
+ 2 2 0.000000e+00 1.912976e-03 1 0.000000e+00 1.779780e-03 7.214600e-05 1.243600e-05 2.246700e-05
+ ../test/genmoons_data/logdet_two_moons_n10.txt,2,2,1.754115e-02,0.000000e+00,9.799000e+03,0.000000e+00,5.586293e-01,0.000000e+00
+ Iteration Jmin Error Elapsed Corral Duality Gap Step1 Step2 Step3 Step4
+ 4 9 0.000000e+00 1.280841e-03 2 -7.105427e-15 9.557570e-04 2.301610e-04 1.571100e-05 2.177300e-05
+ ../test/genmoons_data/logdet_two_moons_n20.txt,4,5,6.062756e-03,0.000000e+00,1.365970e+05,0.000000e+00,2.253051e+01,0.000000e+00
+ Iteration Jmin Error Elapsed Corral Duality Gap Step1 Step2 Step3 Step4
+ 10 32 3.133476e-03 6.075853e-03 8 4.057531e-01 1.613035e-03 3.956920e-03 3.077200e-05 4.390900e-05
+ 20 28 5.597685e-04 4.376530e-03 16 4.711146e-03 0.000000e+00 4.390998e-03 2.229600e-05 2.517100e-05
+ 30 27 1.148159e-04 4.357923e-03 22 8.408166e-06 0.000000e+00 4.326610e-03 2.697700e-05 3.233200e-05
+ 40 27 4.036778e-05 4.388260e-03 29 2.529294e-07 0.000000e+00 4.329726e-03 2.713000e-05 3.558400e-05
+ 50 27 1.840383e-05 4.357373e-03 36 1.341111e-07 0.000000e+00 4.327526e-03 3.097000e-05 3.255700e-05
+ 53 27 0.000000e+00 1.322382e-03 36 -2.842171e-14 0.000000e+00 1.327239e-03 1.420400e-05 2.043500e-05
+ ../test/genmoons_data/logdet_two_moons_n64.txt,53,69,3.330987e-02,0.000000e+00,2.229830e+07,0.000000e+00,6.694201e+02,0.000000e+00
+ Iteration Jmin Error Elapsed Corral Duality Gap Step1 Step2 Step3 Step4
+ 10 70 7.739525e-03 2.389529e-02 8 1.494829e+00 2.975209e-03 1.873082e-02 4.713600e-05 5.837200e-05
+ 20 74 3.379192e-03 2.084753e-02 15 3.372041e-01 0.000000e+00 2.084637e-02 4.302400e-05 3.907800e-05
+ 30 76 1.322821e-03 2.093204e-02 21 1.018845e-01 0.000000e+00 2.083170e-02 4.704100e-05 5.707100e-05
+ 40 78 1.176950e-03 2.095179e-02 28 2.447970e-02 0.000000e+00 2.088284e-02 4.890700e-05 4.955100e-05
+ 50 78 2.233669e-04 2.050571e-02 35 1.573952e-02 0.000000e+00 2.045954e-02 4.046600e-05 3.899000e-05
+ 60 78 2.167956e-04 2.095130e-02 39 8.362982e-03 0.000000e+00 2.082586e-02 6.699700e-05 8.506400e-05
+ 70 78 2.085968e-04 2.085355e-02 46 5.135190e-03 0.000000e+00 2.083204e-02 5.432900e-05 4.078600e-05
+ 80 78 2.570800e-04 2.044932e-02 51 5.470225e-04 0.000000e+00 2.033571e-02 5.334200e-05 5.318400e-05
+ 81 78 0.000000e+00 2.099610e-03 51 1.421085e-14 0.000000e+00 2.100072e-03 9.147000e-06 2.324800e-05
+ ../test/genmoons_data/logdet_two_moons_n9999.txt,4,5,6.062756e-03,0.000000e+00,1.365970e+05,0.000000e+00,2.253051e+01,0.000000e+00"))
> closeAllConnections()
>
> # remove headers
> input <- input[!grepl("^Iteration", input)]
>
> # extract the file names
> files <- input[grepl("^\\.\\.", input)]
>
> # add separation based on file name: counts the file-name lines seen so far,
> # so every data row gets the (0-based) index of the file that follows it
> x <- cumsum(grepl("^\\.\\.", input))
>
> # prepend separation onto the input
> input <- paste(x, input)
>
> # remove the file names
> input <- input[!grepl("\\.\\.", input)]
>
> # extract 'n' from the file names
> n <- as.integer(sub(".*_n([0-9]+).*", "\\1", files))
>
> # read the modified data back in
> tempFile <- tempfile()
> writeLines(input, tempFile)
> newInput <- read.table(tempFile)
>
> # replace first column with 'n' (+ 1L because the block index starts at 0)
> newInput[[1]] <- n[newInput[[1]] + 1L]
>
> names(newInput) <- c('n', 'iter', 1,2,3,4,5,'Step1', 'Step2', 'Step3','Step4')
>
> require(reshape2)
> x <- melt(newInput, id = c('n', 'iter'), measure = c('Step1','Step2','Step3', 'Step4'))
> x
      n iter variable       value
1    10    2    Step1 0.001779780
2    20    4    Step1 0.000955757
3    64   10    Step1 0.001613035
4    64   20    Step1 0.000000000
5    64   30    Step1 0.000000000
6    64   40    Step1 0.000000000
7    64   50    Step1 0.000000000
8    64   53    Step1 0.000000000
9  9999   10    Step1 0.002975209
10 9999   20    Step1 0.000000000
11 9999   30    Step1 0.000000000
12 9999   40    Step1 0.000000000
13 9999   50    Step1 0.000000000
14 9999   60    Step1 0.000000000
15 9999   70    Step1 0.000000000
16 9999   80    Step1 0.000000000
17 9999   81    Step1 0.000000000
18   10    2    Step2 0.000072146
19   20    4    Step2 0.000230161
20   64   10    Step2 0.003956920
21   64   20    Step2 0.004390998
22   64   30    Step2 0.004326610
23   64   40    Step2 0.004329726
24   64   50    Step2 0.004327526
25   64   53    Step2 0.001327239
26 9999   10    Step2 0.018730820
27 9999   20    Step2 0.020846370
28 9999   30    Step2 0.020831700
29 9999   40    Step2 0.020882840
30 9999   50    Step2 0.020459540
31 9999   60    Step2 0.020825860
32 9999   70    Step2 0.020832040
33 9999   80    Step2 0.020335710
34 9999   81    Step2 0.002100072

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
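
For the plotting goal stated in the original question (time proportions for the different steps), one possible next step on a long data frame shaped like x is sketched here in base R. The data are the single logged iterations of the n = 10 and n = 20 runs from the post. Summing over the logged iterations and using a stacked barplot are assumptions about what the plot should show, not something the thread specifies.

```r
# Long-format timings for two runs from the thread (one logged iteration each).
x <- data.frame(
  n        = rep(c(10L, 20L), times = 4),
  iter     = rep(c(2L, 4L), times = 4),
  variable = factor(rep(c("Step1", "Step2", "Step3", "Step4"), each = 2)),
  value    = c(1.779780e-03, 9.557570e-04,   # Step1
               7.214600e-05, 2.301610e-04,   # Step2
               1.243600e-05, 1.571100e-05,   # Step3
               2.246700e-05, 2.177300e-05)   # Step4
)

# Total time per (n, step), summed over the logged iterations.
totals <- xtabs(value ~ n + variable, data = x)

# Turn each row (one value of n) into proportions that sum to 1.
props <- prop.table(totals, margin = 1)

# Stacked bars, one per n; only drawn in an interactive session.
if (interactive()) {
  barplot(t(props), legend.text = TRUE,
          xlab = "n", ylab = "proportion of time")
}
```

Since the log only prints a subset of the iterations, these proportions describe the logged iterations rather than the whole run.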