Dear useRs,

Below is a sample of my dataset (I have more rows and columns).

As you can see in the 2nd column, there are values, the name of the parameter ('Sq' in that case), some integer ('45' in that case) and the unit ('µm' or 'nm').
I know how to extract the rows of interest (those with values), but they are expressed in different units. All values following a line with the unit are expressed in that unit, but the number of lines is not constant. Sometimes each value is expressed in a different unit, so there is a new unit line before each value. Sometimes several consecutive values are expressed in the same unit, with no unit lines in between. I hope this is clear (it should be with the example provided).
This messy dataset comes from external software, so I have no means to change the way the data are collated. I have to find a way to deal with it in R.

What I would like to do is convert the values in nm to µm; I just need to multiply by 1000.

What I don't know is how to identify the values that are expressed in nm (all values that follow a line with 'nm' until there is a line with 'µm').

I don't even know what to search for online, because I don't know what this kind of operation is called.
Any help is appreciated.

Thank you in advance.
Ivan


my.data <- structure(list(V1 = c("2019/05/10", "#", "#", "#", "2019/05/10",
"2019/05/10", "2019/05/10", "#", "#", "#", "2019/05/10", "#", "#", "#",
"2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10"), V19 =
c("0.2012800083", "45", "Sq", "µm", "0.3634383236", "0.4360454777",
"0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", "µm",
"0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917")), row.names =
c(NA, 20L), class = "data.frame")

-- 
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
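A minimal sketch that makes the layout described above explicit by tagging each row of the sample. It assumes (as in the sample) that value rows are exactly those whose V1 is not "#", and that the unit labels are "nm" and the micro sign "µm" (written below as the escape "\u00b5m" to sidestep file-encoding problems). The names `um` and `row_kind` are illustrative only; the sketch does not yet decide which unit applies to which value.

```r
um <- "\u00b5m"  # micrometre label "µm", written as an escape to avoid encoding issues

my.data <- data.frame(
  V1 = c("2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10", "2019/05/10",
         "#", "#", "#", "2019/05/10", "#", "#", "#", "2019/05/10",
         "#", "#", "#", "2019/05/10", "2019/05/10"),
  V19 = c("0.2012800083", "45", "Sq", um, "0.3634383236", "0.4360454777",
          "0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", um,
          "0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917"),
  stringsAsFactors = FALSE
)

# Tag each row: "value" rows have a date in V1 (assumption: V1 != "#"),
# "unit" rows hold a unit label in V19, and the rest ("45", "Sq") are "other".
row_kind <- ifelse(my.data$V1 != "#", "value",
                   ifelse(my.data$V19 %in% c("nm", um), "unit", "other"))

table(row_kind)        # other: 8, unit: 4, value: 8
which(row_kind == "unit")
```

In the sample, each unit row sits after its "45"/"Sq" rows and before the values it applies to, so the unit of a value is simply the most recent unit row above it.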
From nm to micron, _divide_ by 1000.... (as you likely know)

What are the units of the first value? Looks like micron in your example, but is there a rule?

Basically, it is a "last observation carried forward" type problem, so something like this:

my.data <- structure(list(V1 = c("2019/05/10", "#", "#", "#", "2019/05/10",
"2019/05/10", "2019/05/10", "#", "#", "#", "2019/05/10", "#", "#", "#",
"2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10"), V19 =
c("0.2012800083", "45", "Sq", "µm", "0.3634383236", "0.4360454777",
"0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", "µm",
"0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917")), row.names =
c(NA, 20L), class = "data.frame")

y <- my.data$V19
u <- ifelse(y=="nm" | y=="µm", y, NA)
num <- my.data$V1 != "#"
uu <- zoo::na.locf(u, na.rm=FALSE)
data.frame(val = as.numeric(y[num]), units = uu[num])

giving

          val units
1   0.2012800  <NA>
2   0.3634383    µm
3   0.4360455    µm
4   0.3767734    µm
5 102.0130480    nm
6   0.1413840    µm
7  65.4459715    nm
8  46.4580292    nm

and you can surely take it from there.

-pd

> On 10 May 2019, at 13:54, Ivan Calandra <calandra at rgzm.de> wrote:
>
> What I would like to do is convert the values in nm to µm; I just need to
> multiply by 1000.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
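Peter's reply leaves the last step to the reader. A minimal base-R sketch of it, for the case where the zoo package is not installed. `cumsum()` counts how many unit rows have been seen so far, and that count indexes into the list of units, which carries the last unit forward in the same way as `zoo::na.locf(u, na.rm = FALSE)`. Values are then converted to µm by dividing nm values by 1000. The names `um`, `seen`, `to_um` and `res` are illustrative, and the micro sign is assumed to be "µm" (written as "\u00b5m").

```r
um <- "\u00b5m"  # micrometre label "µm"

my.data <- data.frame(
  V1 = c("2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10", "2019/05/10",
         "#", "#", "#", "2019/05/10", "#", "#", "#", "2019/05/10",
         "#", "#", "#", "2019/05/10", "2019/05/10"),
  V19 = c("0.2012800083", "45", "Sq", um, "0.3634383236", "0.4360454777",
          "0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", um,
          "0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917"),
  stringsAsFactors = FALSE
)

y   <- my.data$V19
u   <- ifelse(y %in% c("nm", um), y, NA)  # unit on unit rows, NA elsewhere
num <- my.data$V1 != "#"                   # TRUE on value rows

# Last observation carried forward in base R: seen[i] is the number of unit
# rows up to and including row i; 0 (no unit seen yet) maps to NA.
seen <- cumsum(!is.na(u))
uu   <- c(NA, u[!is.na(u)])[seen + 1]

res <- data.frame(val = as.numeric(y[num]), units = uu[num],
                  stringsAsFactors = FALSE)

# Conversion factors to µm, looked up by unit name (nm: divide by 1000).
# A value whose unit is still NA gets NA rather than a guessed conversion.
to_um <- setNames(c(1, 1 / 1000), c(um, "nm"))
res$val_um <- res$val * unname(to_um[res$units])
res
```

`stringsAsFactors = FALSE` matters here: if `units` were a factor, indexing `to_um` by it would use the integer codes instead of the unit names.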
This is my approach. It is based entirely on what you said (multiply by 1000 to convert from nm to µm, but I think it should be divide by). It assumes that the starting value is in µm. If the starting value is in nm, change the "factor=1" to "factor=1000". µm is micrometers (10^-6 m) and nm is nanometers (10^-9 m), so dividing would be correct.

factor=1;
for (i in 1:length(my.data$V19)) {
print("Start");print(factor);print(my.data[i,]);
if (my.data$V19[i] == "nm") { factor=1000; my.data$V19[i]="µm";print("nm");}
else if (my.data$V19[i] == "µm") {factor=1;};
if (suppressWarnings(!is.na(as.numeric(my.data$V19[i])))) {
my.data$V19[i] = as.character(as.numeric(my.data$V19[i]) * factor);
print("changed"); }
print(factor);print(my.data[i,]);print("End");

On Fri, May 10, 2019 at 6:54 AM Ivan Calandra <calandra at rgzm.de> wrote:

> What I would like to do is convert the values in nm to µm; I just need to
> multiply by 1000.

-- 
This is clearly another case of too many mad scientists, and not enough hunchbacks.

Maranatha! <><
John McKown
Dear John,

Thank you for your answer. However, it does not make sense to me, as it works only row by row on the data.frame, and I need something for "last observation carried forward" as Peter mentioned.
The script does not work as is either, probably due to typos with semicolons and "if... else" statements, so I cannot really test it.

Best,
Ivan

On May 10, 2019 at 3:47 PM John McKown <john.archie.mckown at gmail.com> wrote:

> This is my approach. It is based entirely on what you said (multiply by 1000
> to convert from nm to µm, but I think it is divide by). It assumes that the
> starting value is in µm.
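For comparison, a sketch of the loop approach from John's reply with the missing closing brace added, the debugging `print()` calls removed, and nm values divided by 1000 as agreed in the thread. Two assumptions belong to this sketch, not to John's original: only rows whose V1 is not "#" are rescaled (otherwise the "45" rows, which also parse as numbers, would be rescaled after an nm block), and values before the first unit row are taken to be in µm. The running `div` variable (a renamed, hypothetical stand-in for John's `factor`) is what carries the last unit forward from row to row.

```r
um <- "\u00b5m"  # micrometre label "µm"

my.data <- data.frame(
  V1 = c("2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10", "2019/05/10",
         "#", "#", "#", "2019/05/10", "#", "#", "#", "2019/05/10",
         "#", "#", "#", "2019/05/10", "2019/05/10"),
  V19 = c("0.2012800083", "45", "Sq", um, "0.3634383236", "0.4360454777",
          "0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", um,
          "0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917"),
  stringsAsFactors = FALSE
)

div <- 1  # assumption: values before the first unit row are already in µm
for (i in seq_len(nrow(my.data))) {
  v <- my.data$V19[i]
  if (v == "nm") {
    div <- 1000                 # following values are in nm: divide by 1000
    my.data$V19[i] <- um        # relabel the unit row, as the original loop did
  } else if (v == um) {
    div <- 1                    # following values are already in µm
  } else if (my.data$V1[i] != "#") {
    # value row: rescale using the most recent unit seen
    my.data$V19[i] <- as.character(as.numeric(v) / div)
  }
}

my.data$V19[my.data$V1 != "#"]
```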
Dear Peter,

Thank you for your answer; the function na.locf() is exactly what I needed!

I had started processing my dataset, so the first lines (used as headers) were not included in the sample I sent. But there is also a "unit" line before the first value.

And yes, of course, divide by 1000.

Best,
Ivan

On May 10, 2019 at 3:29 PM peter dalgaard <pdalgd at gmail.com> wrote:

> From nm to micron, _divide_ by 1000.... (as you likely know)
>
> What are the units of the first value? Looks like micron in your example, but
> is there a rule?
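Because the full file has a unit line before the first value, every value should end up with a unit. A minimal sketch of that final step follows. It assumes the leading unit is µm (the sample alone does not say which unit it is) and represents it by one hypothetical extra row at the top. The carry-forward here uses `Reduce(..., accumulate = TRUE)`, another base-R alternative to `zoo::na.locf()`. The `stopifnot()` call guards against any value left without a unit.

```r
um <- "\u00b5m"  # micrometre label "µm"

# Sample data with one hypothetical unit row prepended, standing in for the
# unit line that precedes the first value in the full file (assumed to be µm).
my.data <- data.frame(
  V1 = c("#", "2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10",
         "2019/05/10", "#", "#", "#", "2019/05/10", "#", "#", "#",
         "2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10"),
  V19 = c(um, "0.2012800083", "45", "Sq", um, "0.3634383236", "0.4360454777",
          "0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", um,
          "0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917"),
  stringsAsFactors = FALSE
)

u <- ifelse(my.data$V19 %in% c("nm", um), my.data$V19, NA)

# Carry the last unit forward: keep the previous unit whenever the current
# entry is NA, otherwise switch to the new unit.
uu <- Reduce(function(prev, cur) if (is.na(cur)) prev else cur, u,
             accumulate = TRUE)

num <- my.data$V1 != "#"
stopifnot(!anyNA(uu[num]))  # every value row now has a unit

val    <- as.numeric(my.data$V19[num])
val_um <- ifelse(uu[num] == "nm", val / 1000, val)  # nm -> µm: divide by 1000
data.frame(val_um = val_um, unit = um)
```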