Joachim Audenaert
2015-Apr-16 09:36 UTC
[R] melt function chooses wrong id variable with large datasets
Hello all,

I'm using a large dataset consisting of 2 groups of data: 2 columns in Excel, each with a header (the group name) and 15 000 rows of data. I would like to compare these groups, so I transform my dataset with the melt function to get 1 column of data and 1 column of ID variables; then I can apply different statistical tests.

With small datasets this works great: the melt function automatically chooses the name in row 1 as ID variable and melts the data, giving me a matrix with all ID variables in column 1 and the corresponding data in column 2. With this big dataset, however, it chooses the whole first column as ID variables instead of the first row. Is there a reason why this happens, and how can I make sure the first row is chosen as ID variable and the lower rows as data?

If I specify that I want the first row to be the id variable, I also get an error:

melt(dataset,id.vars=dataset[1,], na.rm=TRUE)
Error: id variables not found in data: norm, jaar

Are there alternative ways to create a good reshaped dataset?

With kind regards,
Joachim Audenaert
crop protection researcher
PCS | proefcentrum voor sierteelt - ornamental plant research
Schaessestraat 18, 9070 Destelbergen, België
T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95
E: joachim.audenaert at pcsierteelt.be | W: www.pcsierteelt.be
PIKAL Petr
2015-Apr-16 10:13 UTC
[R] melt function chooses wrong id variable with large datasets
Hi
There is something odd about your data and the way melt handles it.
AFAIK melt does not use the first row as id variables.
What is the result of
str(dataset)
Instead of
melt(dataset,id.vars=dataset[1,], na.rm=TRUE)
melt expects something like
melt(dataset, id.vars=c("norm", "jaar"), na.rm=TRUE)
If you want a more specific answer, you should show us part of your data,
preferably by copying the output of
dput(dataset[1:20,])
into your mail.
Cheers
Petr
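
The point about id.vars can be illustrated with a minimal sketch. It assumes the melt in question comes from the reshape2 package; the data frame below is a toy with invented values, and the `plot` column is a hypothetical identifier added only for illustration. id.vars takes column names (or positions), whereas dataset[1,] is a one-row data frame of cell values, which is presumably why melt reports those values as "not found in data".

```r
# Toy data frame with invented values; `plot` is a hypothetical id column.
dataset <- data.frame(
  plot = c("a", "b", "c"),
  norm = c(45.9, 24.0, 3.8),
  jaar = c(38.1, 32.4, 34.5),
  stringsAsFactors = FALSE
)

# Inspect how R stored each column (numeric, character, factor, ...).
str(dataset)

# Run the melt step only if reshape2 is installed.
if (requireNamespace("reshape2", quietly = TRUE)) {
  # id.vars names the identifier column; the remaining columns are measured.
  long <- reshape2::melt(dataset, id.vars = "plot", na.rm = TRUE)
  print(long)
}
```

The result has one row per combination of `plot` and measured column, with the former column names in `variable` and the numbers in `value`.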
Joachim Audenaert
2015-Apr-16 11:12 UTC
[R] melt function chooses wrong id variable with large datasets
Hello,
This is a part of my dataset:
structure(list(januari = c(38.1, 32.4, 34.5, 20.7, 21.5, 23.1,
29.7, 36.6, 36.1, 20.6, 20.4, 30.1, 38.7, 41.4, 37, 36, 37, 38,
23, 26.7), februari = c(31.5, 36.2, 38.2, 26.4, 20.9, 21.5, 30.2,
33.4, 32.6, 22.2, 21.7, 30, 35.7, 32.8, 39.3, 25.5, 23, 19.9,
21.3, 20.8), maart = c(34.2, 27, 24.2, 19.9, 19.7, 21.5, 30.6,
30, 19, 19.6, 20.6, 23.6, 17.9, 17.3, 21.4, 24.1, 20.9, 30.1,
32.6, 21.3), april = c(26.3, 29.6, 30.3, 23.6, 28.4, 20.7, 24.1,
27.3, 23.2, 18.3, 24.6, 27.4, 20.4, 18.1, 25.2, 19.8, 21, 23.7,
19.6, 18.1), mei = c(23.7, 24, 17.2, 23.2, 25.2, 17.2, 16, 15.6,
13.4, 16, 16.8, 14.6, 19.4, 21, 19.5, 18.5, 13.3, 13.7, 14.3,
14.1), juni = c(17.7, 14.2, 16.6, 15.7, 13.7, 14.7, 13.1, 12.9,
15.4, 11.9, 15.2, 15.3, 16.5, 16.1, 11.7, 11.2, 11.5, 10.8, 16.1,
14.8), juli = c(15.7, 14.5, 10.8, 10.5, 13.4, 12.2, 13.2, 13,
12.4, 13.1, 9.8, 10.5, 13.4, 11, 13.1, 15, 16.7, 16.1, 18.2,
15.7), augustus = c(12.9, 12.8, 15.2, 14.5, 17.2, 14.5, 14.4,
11, 13.1, 13.6, 14.6, 12.7, 13.6, 12.7, 15.5, 17.4, 15.2, 14.2,
17.7, 19.2), september = c(15.6, 15.5, 15.9, 15.1, 16, 19.4,
21.5, 23.7, 18.7, 23.8, 18, 16.2, 18.5, 20.6, 18.3, 22.5, 26.9,
19.4, 15.9, 20.5), oktober = c(21.4, 20.8, 14, 17, 23, 26.4,
19.6, 22.7, 26.9, 14.7, 15.2, 19.8, 26.9, 20.2, 14.3, 14.8, 18.5,
21.7, 21.4, 21.8), november = c(24.7, 26.2, 29, 21.6, 17.1, 16.9,
19.1, 24.7, 25.4, 19.8, 18.2, 16.3, 17, 17.7, 15.5, 14.7, 15.8,
19.9, 20.4, 23.3), december = c(19.8, 27, 21, 33, 22.6, 28.3,
21.1, 19, 17.3, 27, 30.2, 24.8, 17.9, 17.9, 20.7, 30.9, 36.2,
21, 20.2, 21.3), norm = c("45.8713463281901", "24.047250681782984",
"3.7533684144746324", "38.594241119279324", "26.391897460120358",
"61.746470001194638", "6.8321020448487992", "11.933109250115226",
"51.951891096493924", "37.424611852237945", "5.1587836676942374",
"36.552835044409434", "31.781209673851027", "29.09146215582853",
"4.856812959269508", "5.3982910143166514", "46.553976273304215",
"17.566272518985429", "20.552451905814117", "61.894775704479279"
)), .Names = c("januari", "februari", "maart", "april", "mei",
"juni", "juli", "augustus", "september", "oktober", "november",
"december", "norm"), row.names = c(NA, 20L), class = "data.frame")
I transform my dataset with the following script:
y <- melt(dataset,na.rm=TRUE)
variable <- y[,1]
value <- y[,2]
and can then perform a levene test as follows:
LEVENE <- leveneTest(value~variable,y)
When the dataset is small, let's say fewer than 100 values per column,
everything works great. I get the message:
No id variables; using all as measure variables
When the dataset is much bigger, I get the following message:
Using norm as id variables
Why does this function pick norm as id variable? And how can I tell R
that each column title is my variable?
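
One possible explanation, offered as a sketch rather than a confirmed diagnosis: in the dput() output above, norm is stored as character strings, not as numbers. When neither id.vars nor measure.vars is given, reshape2's melt treats character and factor columns as id variables, which would produce the "Using norm as id variables" message. The example below uses a small invented stand-in for the real data, converts the text column to numeric, and reshapes with base R's stack() as an alternative to melt.

```r
# Small stand-in for the real data (values invented): two numeric columns
# plus a column of numbers stored as text, as `norm` is in the dput() output.
dataset <- data.frame(
  januari  = c(38.1, 32.4, 34.5),
  februari = c(31.5, 36.2, 38.2),
  norm     = c("45.87", "24.05", "3.75"),
  stringsAsFactors = FALSE
)

# List the columns that are not numeric; melt() would pick these as ids.
non_numeric <- names(dataset)[!sapply(dataset, is.numeric)]
print(non_numeric)

# Convert text to numbers. Entries that are not valid numbers become NA
# (with a warning), which helps locate the offending cells.
dataset$norm <- as.numeric(dataset$norm)

# Base R alternative to melt(): stack() returns columns `values` and `ind`.
long <- stack(dataset)
names(long) <- c("value", "variable")

# Drop missing values, mirroring na.rm = TRUE in melt().
long <- long[!is.na(long$value), ]
print(long)
```

With the columns converted, a call such as leveneTest(value ~ variable, data = long) should see one group per original column. In the full dataset it may be worth checking which cells fail the numeric conversion, since a single non-numeric entry is enough to make the whole column character on import.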
With kind regards,
Joachim Audenaert