Without more details it is hard to answer, but it is suspicious that it
is dropping one of your predictors and that the standard errors of the
others are very large. This suggests you should investigate the joint
distribution of your predictors and the events.
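
For example, something along these lines (a minimal sketch, assuming the
column names shown in your test1 output below) would show whether the three
DO variables are nearly collinear and how the 58 events are spread across
them:

    ## sketch only: pairwise correlation of the predictors
    cor(test1[, c("DOLoomis", "DOI55", "DODamen")],
        use = "pairwise.complete.obs")
    ## and how the predictors look for events versus non-events
    by(test1[, c("DOLoomis", "DOI55", "DODamen")], test1$Depart, summary)

If one predictor is (almost) a linear combination of the others, or the
events all sit at one extreme of a predictor, you would expect exactly the
singularity and the huge standard errors shown below.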
Michael
On 02/05/2019 13:37, Haddison Mureithi wrote:
> Hello guys, this problem was never answered and I happened to come across
> the same problem, kindly help. This is a simple R program that I have been
> trying to run. I keep running into the "singular matrix" error. I end up
> with no sensible results. Can anyone suggest any changes or a way around
> this?
>
> I am a total rookie when working with R.
>
> Thanks,
> Haddison
>
>> library(survival)
> Loading required package: splines
>> args(coxph)
> function (formula, data, weights, subset, na.action, init, control,
>     method = c("efron", "breslow", "exact"), singular.ok = TRUE,
>     robust = FALSE, model = FALSE, x = FALSE, y = TRUE, tt, ...)
> NULL
>> test1<-read.table("S:/FISHDO/03_Phase_I_Field_Work/Data_6_28_2011/Working Folder/R_files/4SondesJuly24.csv", header=T, sep=",")
>> sondes<-coxph(Surv(Start, Stop, Depart)~DOLoomis + DOI55 + DODamen,
> data=test1)
> Warning messages:
> 1: In fitter(X, Y, strats, offset, init, control, weights = weights, :
> Loglik converged before variable 1,2 ; beta may be infinite.
> 2: In coxph(Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 + DODamen, :
> X matrix deemed to be singular; variable 3
>> summary(sondes)
> Call:
> coxph(formula = Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 +
> DODamen, data = test1)
>
> n= 1737, number of events= 58
> (1 observation deleted due to missingness)
>
> coef exp(coef) se(coef) z Pr(>|z|)
> DOLoomis -2.152e+00 1.163e-01 1.161e+05 0 1
> DOI55 4.560e-01 1.578e+00 3.755e+04 0 1
> DODamen NA NA 0.000e+00 NA NA
>
> exp(coef) exp(-coef) lower .95 upper .95
> DOLoomis 0.1163 8.5995 0 Inf
> DOI55 1.5777 0.6338 0 Inf
> DODamen NA NA NA NA
>
> Concordance= 0.5 (se = 0 )
> Rsquare= 0 (max possible= 0.01 )
> Likelihood ratio test= 0 on 2 df, p=1
> Wald test = 0 on 2 df, p=1
> Score (logrank) test = 0 on 2 df, p=1
>
> On Wed, 1 May 2019, 1:00 pm, <r-help-request at r-project.org> wrote:
>
>> Send R-help mailing list submissions to
>> r-help at r-project.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> or, via email, send a message with subject or body 'help' to
>> r-help-request at r-project.org
>>
>> You can reach the person managing the list at
>> r-help-owner at r-project.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of R-help digest..."
>>
>>
>> Today's Topics:
>>
>> 1. Re: Bug in R 3.6.0? (Martin Maechler)
>> 2. Re: Bug in R 3.6.0? (ocjt at free.fr)
>> 3. Time series (trend over time) for irregular sampling dates
>> and multiple sites (Catarina Serra Gonçalves)
>> 4. Re: Time series (trend over time) for irregular sampling
>> dates and multiple sites (Bert Gunter)
>> 5. Passing formula as parameter to `lm` within `sapply` causes
>> error [BUG?] (Jens Heumann)
>> 6. (no subject) (Haddison Mureithi)
>> 7. Help with loop for column means into new column by a subset
>> Factor w/131 levels (Bill Poling)
>> 8. Re: Help with loop for column means into new column by a
>> subset Factor w/131 levels (Bill Poling)
>> 9. transpose and split dataframe (Matthew)
>> 10. Re: transpose and split dataframe (David L Carlson)
>> 11. Re: Passing formula as parameter to `lm` within `sapply`
>> causes error [BUG?] (David Winsemius)
>> 12. Fwd: Re: transpose and split dataframe (Matthew)
>> 13. Re: transpose and split dataframe (Jim Lemon)
>> 14. Re: Time series (trend over time) for irregular sampling
>> dates and multiple sites (Abs Spurdle)
>> 15. Re: Fwd: Re: transpose and split dataframe (David L Carlson)
>> 16. Re: Passing formula as parameter to `lm` within `sapply`
>> causes error [BUG?] (Duncan Murdoch)
>> 17. Re: Time series (trend over time) for irregular sampling
>> dates and multiple sites (Abs Spurdle)
>> 18. Re: Time series (trend over time) for irregular sampling
>> dates and multiple sites (Abs Spurdle)
>> 19. Re: Passing formula as parameter to `lm` within `sapply`
>> causes error [BUG?] (Jens Heumann)
>> 20. Re: Passing formula as parameter to `lm` within `sapply`
>> causes error [BUG?] (peter dalgaard)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 30 Apr 2019 16:54:10 +0200
>> From: Martin Maechler <maechler at stat.math.ethz.ch>
>> To: Morgan Morgan <morgan.emailbox at gmail.com>
>> Cc: <r-help at r-project.org>
>> Subject: Re: [R] Bug in R 3.6.0?
>> Message-ID: <23752.24978.45927.96764 at stat.math.ethz.ch>
>> Content-Type: text/plain; charset="utf-8"
>>
>>>>>>> Morgan Morgan
>>>>>>> on Mon, 29 Apr 2019 21:42:36 +0100 writes:
>>
>> > Hi,
>> > I am using R 3.6.0 on Windows. The issue that I report below does
>> > not exist with previous versions of R.
>> > In order to reproduce the error you must install a package of your
>> > choice from source (tar.gz).
>>
>> > -Create a .Rprofile file with the following command in it:
>> > setwd("D:/")
>> > -Close your R session and re-open it. Your working directory must
>> > now be set to D:
>> > -Install a package of your choice from source, example:
>> > install.packages("data.table", type="source")
>>
>> > In my case the package fails to install and I get the following
>> > error message:
>>
>> > ** R
>> > ** inst
>> > ** byte-compile and prepare package for lazy loading
>> > Error in tools:::.read_description(file) :
>> >   file 'DESCRIPTION' does not exist
>> > Calls: suppressPackageStartupMessages ... withCallingHandlers ->
>> >   .getRequiredPackages -> <Anonymous> -> <Anonymous>
>> > Execution halted
>> > ERROR: lazy loading failed for package 'data.table'
>> > * removing 'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
>> > * restoring previous
>> >   'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
>> > Warning in install.packages :
>> >   installation of package 'data.table' had non-zero exit status
>>
>> > Now remove the .Rprofile file, restart your R session and try to
>> > install the package with the same command.
>> > In that case everything should be installed just fine.
>>
>> > FYI the issue happens on macOS as well and I suspect it also does
>> > on all Linux systems.
>>
>> > My question: Is this expected or is it a bug?
>>
>> It is a bug, thank you very much for reporting it.
>>
>> I've been told privately by Ömer An (thank you!) who's been
>> affected as well, that this problem seems to affect others, and
>> that there's a thread about this over at the RStudio support site
>>
>> https://support.rstudio.com/hc/en-us/community/posts/200704708-Build-tool-does-not-recognize-DESCRIPTION-file
>>
>> There, users mention that (all?) packages are affected which
>> have a multiline 'Description:' field in their DESCRIPTION file.
>> Of course, many if not most packages have this property.
>>
>> Indeed, I can reproduce the problem (e.g. with my 'sfsmisc'
>> package) if I ("silly enough to") add a setwd() call to my
>> Rprofile file (the one I set via the env. var. R_PROFILE or
>> R_PROFILE_USER).
>>
>> This is clearly a bug, and indeed a bad one.
>>
>> It seems all of R Core (and the other expert R users who tried the
>> 3.6.0 alpha, beta, and RC versions) have *not* seen the bug, as they
>> are intuitively smart enough not to mess with R's working directory
>> in a global R profile file ...
>>
>> For now you definitely have to work around it by not doing the thing
>> that causes the problem: do *NOT* setwd() in your ~/.Rprofile or
>> other such R init files.
>>
>> Best,
>> Martin Maechler
>> ETH Zurich and R Core Team
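
If the working directory really must change for interactive work, one
possible stop-gap (my own sketch, not something suggested in this thread)
is to guard the call so that non-interactive child R processes, such as the
one R starts while installing a package, are left alone:

    ## sketch for ~/.Rprofile: only touch the working directory in
    ## interactive sessions
    if (interactive()) setwd("D:/")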
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Tue, 30 Apr 2019 16:15:46 +0200
>> From: <ocjt at free.fr>
>> To: "'Morgan Morgan'" <morgan.emailbox at
gmail.com>,
>> <r-help at r-project.org>
>> Subject: Re: [R] Bug in R 3.6.0?
>> Message-ID: <002d01d4ff5f$34816be0$9d8443a0$@free.fr>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hello,
>>
>> I have exactly the same problem when I install one of my own packages:
>>
>> Error in tools:::.read_description(file) :
>> file 'DESCRIPTION' does not exist
>> Calls: suppressPackageStartupMessages ... withCallingHandlers ->
>> .getRequiredPackages -> <Anonymous> -> <Anonymous>
>> Exécution arrêtée
>> ERROR: lazy loading failed for package 'RRegArch'
>>
>> Best,
>> Ollivier
>>
>>
>> -----Original Message-----
>> From: R-help <r-help-bounces at r-project.org> On Behalf Of Morgan Morgan
>> Sent: Monday, 29 April 2019 22:43
>> To: r-help at r-project.org
>> Subject: [R] Bug in R 3.6.0?
>>
>> Hi,
>>
>> I am using R 3.6.0 on Windows. The issue that I report below does not
>> exist with previous versions of R.
>> In order to reproduce the error you must install a package of your
>> choice from source (tar.gz).
>>
>> -Create a .Rprofile file with the following command in it: setwd("D:/")
>> -Close your R session and re-open it. Your working directory must now
>> be set to D:
>> -Install a package of your choice from source, example:
>> install.packages("data.table", type="source")
>>
>> In my case the package fails to install and I get the following error
>> message:
>>
>> ** R
>> ** inst
>> ** byte-compile and prepare package for lazy loading
>> Error in tools:::.read_description(file) :
>>   file 'DESCRIPTION' does not exist
>> Calls: suppressPackageStartupMessages ... withCallingHandlers ->
>>   .getRequiredPackages -> <Anonymous> -> <Anonymous>
>> Execution halted
>> ERROR: lazy loading failed for package 'data.table'
>> * removing 'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
>> * restoring previous
>>   'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
>> Warning in install.packages :
>>   installation of package 'data.table' had non-zero exit status
>>
>> Now remove the .Rprofile file, restart your R session and try to
>> install the package with the same command.
>> In that case everything should be installed just fine.
>>
>> FYI the issue happens on macOS as well and I suspect it also does on
>> all Linux systems.
>>
>> My question: Is this expected or is it a bug?
>>
>> Thank you
>> Best regards,
>> Morgan
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Wed, 1 May 2019 00:57:43 +1000
>> From: Catarina Serra Gonçalves <catarinasg at gmail.com>
>> To: r-help at r-project.org
>> Subject: [R] Time series (trend over time) for irregular sampling
>> dates and multiple sites
>> Message-ID:
>> <
>> CAOQWJbvY+JKy80sksmfC8tu-C+5qq-tzwAd21XbyGvJAyYjQPQ at
mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I have a dataset of marine debris items (number of items standardized
>> per effort: Items/(number of volunteers*Hours*Lenght)) taken from 2 main
>> locations (WA and Queensland) in Australia (8 Sub Sites in total: 4 in
>> WA and 4 in Queensland) at irregular sampling intervals over a period of
>> 15 years.
>>
>> I want to test if there is a change over the years in the amount of
>> debris in these locations and, more specifically, a change after the
>> implementation of a mitigation strategy (in 2013).
>> Here's the head of the data: <https://i.stack.imgur.com/VNIpb.png>
>> Description of each one of the variables in the dataframe:
>>
>> *eventid* = each sampling (clean-up) event
>> *Location* = Queensland and New South Wales
>> *Sites* = all the 9 sampling beaches
>>
>> *Date* = specific dates for the clean-up events (day-month-year)
>>
>> *Date1* = specific dates for the clean-up events (day-month-year) in
>> POSIXct format
>> *Year* = Year of sampling event (2004 to 2018)
>>
>> *Month* = Month of the sampling event (jan to dec)
>>
>> *nMonth* = a number assigned to the respective month of the sampling
>> event (1 to 12)
>>
>> *Day* = Day of sampling (1 to 31)
>> *Days* = Days since the first date of clean up = just another way of
>> using the dates
>>
>> *MARPOL* = before and after implementation (factor with 2 levels)
>>
>> *DaysC* = days between sampling events for the same sites = number of
>> days since the previous clean-up event
>>
>> *DaysI* = Days since intervention; all the dates before implementation
>> are zero, and after it we count the number of days since the
>> implementation date (1 Jan 2013)
>>
>> *DaysIa* = same as DaysI but instead of zero before the intervention we
>> have negative values (days)
>>
>> *Items* = number of fishing and shipping items counted in each clean-up
>> event
>>
>> *Hours* = hours spent by all volunteers together at each clean-up event
>>
>> *Lenght* = length of beach sampled by all volunteers together at each
>> clean-up event
>> *volunteers* = all volunteers at each clean-up event
>>
>> *HoursVolunteer* = hours spent by each volunteer at each clean-up event
>> (Hours/volunteers)
>>
>> *Ieffort* = the items standardized by the effort (hours, volunteers and
>> lenght)
>>
>> *GrossWeight* & *GrossTotal* are not relevant
>> ------------------------------
>> Problems:
>>
>> My data has a few problems: (1) I think I will need to fix the effects
>> of seasonal variation (Monthly) and (2) of possible spatial correlation
>> (probability of finding an item is higher after finding one since they
>> can come from the same ship). (3) How do I handle the fact that the
>> measurements were not taken at a regular interval?
>>
>> I was trying to use GAMs to analyse the data and see the trends over
>> time. The model I came across is the following:
>>
>> m4 <- gamm(Ieffort ~ s(DaysIa) + MARPOL + s(nMonth, bs = "ps", k = 12),
>>            random = list(Site=~1, Location=~1), data = d)
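
For the irregular spacing (problem 3), one possible direction, offered only
as a sketch and not something discussed in this thread, is a
continuous-time AR(1) residual correlation via nlme's corCAR1(), which does
not require equally spaced observations (here using the Days and Site
columns described above, and simplifying the random effects to Site only):

    library(mgcv)
    library(nlme)
    ## sketch: corCAR1() allows unevenly spaced times within each Site;
    ## Days must be unique within a Site for this to work
    m5 <- gamm(Ieffort ~ s(DaysIa) + MARPOL + s(nMonth, bs = "ps", k = 12),
               random = list(Site = ~1),
               correlation = corCAR1(form = ~ Days | Site),
               data = d)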
>>
>> *thank you in advance.*
>> -
>> *Catarina Serra Gonçalves *
>> PhD candidate
>>
>> Adrift Lab <https://adriftlab.org>
>> University of Tasmania <http://www.utas.edu.au/> | Institute for
Marine
>> and
>> Antarctic Studies <http://www.imas.utas.edu.au/>
>> Launceston, TAS | Australia
>>
>> Personal website <https://catarinasg.wixsite.com/acserra>
>> <https://catarinasg.wixsite.com/acserra>| E-mail <acserra at
utas.edu.au> |
>> Twitter <https://twitter.com/CatarinaSerraG>
>> Research Gate
>> <https://www.researchgate.net/profile/Catarina_Serra_Goncalves> |
Google
>> Scholar
<https://scholar.google.pt/citations?user=8nBrRFwAAAAJ&hl=en>
>>
>> [[alternative HTML version deleted]]
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Tue, 30 Apr 2019 08:28:37 -0700
>> From: Bert Gunter <bgunter.4567 at gmail.com>
>> To: Catarina Serra Gonçalves <catarinasg at gmail.com>
>> Cc: R-help <r-help at r-project.org>
>> Subject: Re: [R] Time series (trend over time) for irregular sampling
>> dates and multiple sites
>> Message-ID:
>> <CAGxFJbT2YSB1xcs0MajpeqtHbbn4T1ycYoSOBEFvMucFme1tg at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I have 0 expertise, but I suggest that you check out the SpatioTemporal
>> task view on CRAN (or possibly others, like Environmetrics). You might
>> also want to move this to the R-SIG-Geo list, where you are probably
>> more likely to find relevant expertise.
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming
along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic
strip )
>>
>>
>> On Tue, Apr 30, 2019 at 8:13 AM Catarina Serra Gonçalves <
>> catarinasg at gmail.com> wrote:
>>
>>> I have a dataset of marine debris items (number of items
standardized per
>>> effort: Items/(number of volunteers*Hours*Lenght)) taken from 2
main
>>> locations (WA and Queensland) in Australia (8 Sub Sites in total: 4
in WA
>>> and 4 in Queensland) at irregular sampling intervals over a period
15
>>> years.
>>>
>>> I want to test if there is a change over the years on the amount of
>> debris
>>> in these locations and more specifically a change after the
>> implementation
>>> of a mitigation strategy (in 2013).
>>> Here's the head of the data: <https://i.stack.imgur.com/VNIpb.png>
>>> Description of each one of the variables in the dataframe:
>>>
>>> *eventid *= each sampling (clean-up) event Location = Queensland
and New
>>> South Wales Sites = all the 9 sampling beaches
>>>
>>> *Date *= specific dates for the clean-up events (day-month-year)
>>>
>>> *Date1 *= specific dates for the clean-up events (day-month-year) in
>>> POSIXct format. Year = Year of sampling event (2004 to 2018)
>>>
>>> *Month*= Month of the sampling event (jan to dec)
>>>
>>> *nMonth*= a number was determined to the respective month of the
sampling
>>> event (1 to 12)
>>>
>>> *Day*= Day of sampling (1 to 31) Days = Days since the first date
of
>> clean
>>> up = just another way of using the dates
>>>
>>> *MARPOL *= before and after implementation (factor with 2 levels)
>>>
>>> *DaysC *= days between sampling events for the same sites = number
of
>> days
>>> since the previous clean-up event
>>>
>>> *DaysI *= Days since intervention, all the dates before
implementation
>> are
>>> zero, and after we count the number of days since the
implementation date
>>> (1 jan 2013)
>>>
>>> *DaysIa*= same as DayI but instead of zero for before the
intervention we
>>> have negative values (days)
>>>
>>> *Items *= number of fishing and shipping items counted in each
clean-up
>>> event
>>>
>>> *Hours *= hours spent by all volunteers together at each clean up
event
>>>
>>> *Lenght *= Lenght of beach sampled by all volunteers together at
each
>> clean
>>> up event volunteers = all volunteers at each clean up event
>>>
>>> *HoursVolunteer *= hours spent by each volunteer at each clean up event
>>> (Hours/volunteers)
>>>
>>> *Ieffort *= the items standardized by the effort (hours, volunteers
and
>>> lenght)
>>>
>>> *GrossWeight & **GrossTotal are not relevant *
>>> ------------------------------
>>> Problems:
>>>
>>> My data has a few problems: (1) I think I will need to fix the
effects of
>>> seasonal variation (Monthly) and (2) of possible spatial
correlation
>>> (probability of finding an item is higher after finding one since
they
>> can
>>> come from the same ship). (3) How do I handle the fact that the
>>> measurements were not taken at a regular interval?
>>>
>>> I was trying to use GAMs to analyse the data and see the trends
over
>> time.
>>> The model I came across is the following:
>>>
>>> m4<- gamm(Ieffort ~ s(DaysIa)+MARPOL+ s(nMonth, bs =
"ps", k = 12),
>>> random=list(Site=~1,Location=~1),data = d)
>>>
>>> *thank you in advance.*
>>> -
>>> *Catarina Serra Gonçalves *
>>> PhD candidate
>>>
>>> Adrift Lab <https://adriftlab.org>
>>> University of Tasmania <http://www.utas.edu.au/> | Institute
for Marine
>>> and
>>> Antarctic Studies <http://www.imas.utas.edu.au/>
>>> Launceston, TAS | Australia
>>>
>>> Personal website <https://catarinasg.wixsite.com/acserra>
>>> <https://catarinasg.wixsite.com/acserra>| E-mail <acserra
at utas.edu.au>
>> |
>>> Twitter <https://twitter.com/CatarinaSerraG>
>>> Research Gate
>>>
<https://www.researchgate.net/profile/Catarina_Serra_Goncalves> | Google
>>> Scholar
<https://scholar.google.pt/citations?user=8nBrRFwAAAAJ&hl=en>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> [[alternative HTML version deleted]]
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Tue, 30 Apr 2019 17:24:33 +0200
>> From: Jens Heumann <jens.heumann at students.unibe.ch>
>> To: <r-help at r-project.org>
>> Subject: [R] Passing formula as parameter to `lm` within `sapply`
>> causes error [BUG?]
>> Message-ID: <75abba2b-c528-460e-df92-08f8479ba399 at
students.unibe.ch>
>> Content-Type: text/plain; charset="utf-8";
Format="flowed"
>>
>> Hi,
>>
>> `lm` won't take formula as a parameter when it is within a
`sapply`; see
>> example below. Please, could anyone either point me to a syntax error
or
>> confirm that this might be a bug?
>>
>> Best,
>> Jens
>>
>> [Disclaimer: This is my first post here, following advice of how to
>> proceed with possible bugs from here:
https://www.r-project.org/bugs.html]
>>
>>
>> SUMMARY
>>
>> While `lm` alone accepts the formula parameter `FO` fine, the same call
>> within a `sapply` causes an error. When everything except the formula
>> `FO` is passed as a parameter, it still works, though. All parameters
>> work fine within a similar `for` loop.
>>
>>
>> MCVE (see data / R-version at bottom)
>>
>> > summary(lm(y ~ x, df1, df1[["z"]] == 1,
df1[["w"]]))$coef[1, ]
>> Estimate Std. Error t value Pr(>|t|)
>> 1.6269038 0.9042738 1.7991275 0.3229600
>> > summary(lm(FO, data, data[[st]] == st1, data[[ws]]))$coef[1, ]
>> Estimate Std. Error t value Pr(>|t|)
>> 1.6269038 0.9042738 1.7991275 0.3229600
>> > sapply(unique(df1$z), function(s)
>> + summary(lm(y ~ x, df1, df1[["z"]] == s,
df1[[ws]]))$coef[1, ])
>> [,1] [,2] [,3]
>> Estimate 1.6269038 -0.1404174 -0.010338774
>> Std. Error 0.9042738 0.4577001 1.858138516
>> t value 1.7991275 -0.3067890 -0.005564049
>> Pr(>|t|) 0.3229600 0.8104951 0.996457853
>> > sapply(unique(data[[st]]), function(s)
>> + summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ]) #
!!!
>> Error in eval(substitute(subset), data, env) : object 's' not
found
>> > sapply(unique(data[[st]]), function(s)
>> + summary(lm(y ~ x, data, data[[st]] == s, data[[ws]]))$coef[1, ])
>> [,1] [,2] [,3]
>> Estimate 1.6269038 -0.1404174 -0.010338774
>> Std. Error 0.9042738 0.4577001 1.858138516
>> t value 1.7991275 -0.3067890 -0.005564049
>> Pr(>|t|) 0.3229600 0.8104951 0.996457853
>> > m <- matrix(NA, 4, length(unique(data[[st]])))
>> > for (s in unique(data[[st]])) {
>> + m[, s] <- summary(lm(FO, data, data[[st]] == s,
data[[ws]]))$coef[1, ]
>> + }
>> > m
>> [,1] [,2] [,3]
>> [1,] 1.6269038 -0.1404174 -0.010338774
>> [2,] 0.9042738 0.4577001 1.858138516
>> [3,] 1.7991275 -0.3067890 -0.005564049
>> [4,] 0.3229600 0.8104951 0.996457853
>>
>> # DATA
#################################################################
>>
>> df1 <- structure(list(x = c(1.37095844714667, -0.564698171396089,
>> 0.363128411337339,
>> 0.63286260496104, 0.404268323140999, -0.106124516091484,
1.51152199743894,
>> -0.0946590384130976, 2.01842371387704), y = c(1.30824434809425,
>> 0.740171482827397, 2.64977380403845, -0.755998096151299,
0.125479556323628,
>> -0.239445852485142, 2.14747239550901, -0.37891195982917,
-0.638031707027734
>> ), z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), w = c(0.7, 0.8,
>> 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)), class = "data.frame",
row.names = c(NA,
>> -9L))
>>
>> FO <- y ~ x; data <- df1; st <- "z"; ws <-
"w"; st1 <- 1
>>
>>
########################################################################
>>
>> > R.version
>> _
>> platform x86_64-w64-mingw32
>> arch x86_64
>> os mingw32
>> system x86_64, mingw32
>> status
>> major 3
>> minor 6.0
>> year 2019
>> month 04
>> day 26
>> svn rev 76424
>> language R
>> version.string R version 3.6.0 (2019-04-26)
>> nickname Planting of a Tree
>>
>>
#########################################################################
>>
>> NOTE: Question on SO two days ago
>> (
>>
https://stackoverflow.com/questions/55893189/passing-formula-as-parameter-to-lm-within-sapply-causes-error-bug-confirmation)
>>
>> brought many views but neither answer nor bug confirmation.
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 6
>> Date: Mon, 29 Apr 2019 21:38:00 +0300
>> From: Haddison Mureithi <mureithihaddison at gmail.com>
>> To: r-help at r-project.org
>> Subject: [R] (no subject)
>> Message-ID:
>> <CABVwvn6y_M2M1o41HryKYpLQcbsajdtginyw_RPVf81o4BmqQ at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hello guys, this problem was never answered and I happened to come
>> across the same problem, kindly help. This is a simple R program that I
>> have been trying to run. I keep running into the "singular matrix"
>> error. I end up with no sensible results. Can anyone suggest any changes
>> or a way around this?
>>
>> I am a total rookie when working with R.
>>
>> Thanks,
>> Rasika
>>
>>> library(survival)
>> Loading required package: splines
>>> args(coxph)
>> function (formula, data, weights, subset, na.action, init, control,
>>     method = c("efron", "breslow", "exact"), singular.ok = TRUE,
>>     robust = FALSE, model = FALSE, x = FALSE, y = TRUE, tt, ...)
>> NULL
>>> test1<-read.table("S:/FISHDO/03_Phase_I_Field_Work/Data_6_28_2011/Working Folder/R_files/4SondesJuly24.csv", header=T, sep=",")
>>> sondes<-coxph(Surv(Start, Stop, Depart)~DOLoomis + DOI55 + DODamen,
>> data=test1)
>> Warning messages:
>> 1: In fitter(X, Y, strats, offset, init, control, weights = weights, :
>> Loglik converged before variable 1,2 ; beta may be infinite.
>> 2: In coxph(Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 + DODamen, :
>> X matrix deemed to be singular; variable 3
>>> summary(sondes)
>> Call:
>> coxph(formula = Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 +
>> DODamen, data = test1)
>>
>> n= 1737, number of events= 58
>> (1 observation deleted due to missingness)
>>
>> coef exp(coef) se(coef) z Pr(>|z|)
>> DOLoomis -2.152e+00 1.163e-01 1.161e+05 0 1
>> DOI55 4.560e-01 1.578e+00 3.755e+04 0 1
>> DODamen NA NA 0.000e+00 NA NA
>>
>> exp(coef) exp(-coef) lower .95 upper .95
>> DOLoomis 0.1163 8.5995 0 Inf
>> DOI55 1.5777 0.6338 0 Inf
>> DODamen NA NA NA NA
>>
>> Concordance= 0.5 (se = 0 )
>> Rsquare= 0 (max possible= 0.01 )
>> Likelihood ratio test= 0 on 2 df, p=1
>> Wald test = 0 on 2 df, p=1
>> Score (logrank) test = 0 on 2 df, p=1
>>
>> [[alternative HTML version deleted]]
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 7
>> Date: Tue, 30 Apr 2019 16:50:48 +0000
>> From: Bill Poling <Bill.Poling at zelis.com>
>> To: "r-help (r-help at r-project.org)" <r-help at
r-project.org>
>> Subject: [R] Help with loop for column means into new column by a
>> subset Factor w/131 levels
>> Message-ID:
>> <
>> BN7PR02MB50737455E93F882B58EAA4F4EA3A0 at
BN7PR02MB5073.namprd02.prod.outlook.com
>>>
>>
>> Content-Type: text/plain; charset="windows-1252"
>>
>> Good afternoon.
>>
>> #RStudio Version 1.1.456
>> sessionInfo()
>> #R version 3.5.3 (2019-03-11)
>> #Platform: x86_64-w64-mingw32/x64 (64-bit)
>> #Running under: Windows >= 8 x64 (build 9200)
>>
>>
>>
>> #I have a DF of 8 columns and 14025 rows
>>
>> str(hcd2tmp2)
>>
>> # 'data.frame':14025 obs. of 8 variables:
>> # $ Submitted_Charge: num 21021 15360 40561 29495 7904 ...
>> # $ Allowed_Amt : num 18393 6254 40561 29495 7904 ...
>> # $ Submitted_Units : num 60 240 420 45 120 215 215 15 57 2 ...
>> # $ Procedure_Code1 : Factor w/ 131 levels
"A9606","J0129",..: 43 113 117
>> 125 24 85 85 90 86 25 ...
>> # $ AllowByLimit : num 4.268 0.949 7.913 6.124 3.524 ...
>> # $ UnitsByDose : num 600 240 420 450 120 215 215 750 570 500 ...
>> # $ LimitByUnits : num 4310 6591 5126 4816 2243 ...
>> # $ HCPCSCodeDose1 : num 10 1 1 10 1 1 1 50 10 250 ...
>>
>> #I would like to create four additional columns that are the mean of
four
>> current columns in the DF.
>> #Current columns
>> #Allowed_Amt
>> #LimitByUnits
>> #AllowByLimit
>> #UnitsByDose
>>
>> #The goal is to be able to identify rows where (for instance)
Allowed_Amt
>> is greater than the average (aka outliers).
>>
>> #The trick is I want the means of those columns based on a Factor value
>> #The Factor is:
>> #Procedure_Code1 : Factor w/ 131 levels
"A9606","J0129"
>>
>> #So each of my four new columns will have 131 distinct values based on
the
>> mean for the specific Procedure_Code1 grouping
>>
>> #In SQL it would look something like this:
>>
>> #SELECT *,
>> # NewCol1 = mean(Allowed_Amt) OVER (PARTITION BY Procedure_Code1),
>> # NewCol2 = mean(LimitByUnits) OVER (PARTITION BY Procedure_Code1),
>> # NewCol3 = mean(AllowByLimit) OVER (PARTITION BY Procedure_Code1),
>> # NewCol4 = mean(UnitsByDose) OVER (PARTITION BY Procedure_Code1)
>> #INTO NewTable
>> #FROM Oldtable
>>
>> #Here are some sample data
>>
>> head(hcd2tmp2, n=40)
>> # Submitted_Charge Allowed_Amt Submitted_Units Procedure_Code1
>> AllowByLimit UnitsByDose LimitByUnits HCPCSCodeDose1
>> # 1 21020.70 18393.12 60 J1745
>> 4.2679810 600 4309.56 10
>> # 2 15360.00 6254.40 240 J9299
>> 0.9488785 240 6591.36 1
>> # 3 40561.32 40561.32 420 J9306
>> 7.9133539 420 5125.68 1
>> # 4 29495.25 29495.25 45 J9355
>> 6.1244417 450 4815.99 10
>> # 5 7904.30 7904.30 120 J0897
>> 3.5243000 120 2242.80 1
>> # 6 15331.95 10614.31 215 J9034
>> 2.0586686 215 5155.91 1
>> # 7 15331.95 10614.31 215 J9034
>> 2.0586686 215 5155.91 1
>> # 8 461.90 0.00 15 J9045
>> 0.0000000 750 46.38 50
>> # 9 27340.96 15092.21 57 J9035
>> 3.2600227 570 4629.48 10
>> # 10 768.00 576.00 2 J1190
>> 1.3617343 500 422.99 250
>> # 11 101.00 38.38 5 J2250
>> 59.9687500 5 0.64 1
>> # 12 17458.40 0.00 200 J9033
>> 0.0000000 200 5990.00 1
>> # 13 7885.10 7569.70 1 J1745
>> 105.3835445 10 71.83 10
>> # 14 2015.00 1155.78 4 J2785
>> 5.0051100 0 230.92 0
>> # 15 443.72 443.72 12 J9045
>> 11.9601078 600 37.10 50
>> # 16 113750.00 113750.00 600 J2350
>> 3.3025003 600 34443.60 1
>> # 17 3582.85 3582.85 10 J2469
>> 30.5573561 250 117.25 25
>> # 18 5152.65 5152.65 50 J2796
>> 1.4362988 500 3587.45 10
>> # 19 5152.65 5152.65 50 J2796
>> 1.4362988 500 3587.45 10
>> # 20 39664.09 0.00 74 J9355
>> 0.0000000 740 7919.63 10
>> # 21 166.71 102.53 9 J9045
>> 3.6841538 450 27.83 50
>> # 22 13823.61 9676.53 1 J2505
>> 2.0785247 6 4655.48 6
>> # 23 90954.00 26436.53 360 J1786
>> 1.7443775 3600 15155.28 10
>> # 24 4800.00 3494.40 800 J3262
>> 0.8861838 800 3943.20 1
>> # 25 216.00 105.84 4 J0696
>> 42.3360000 1000 2.50 250
>> # 26 5300.00 4770.00 1 J0178
>> 4.9677151 1 960.20 1
>> # 27 35203.00 35203.00 200 J9271
>> 3.5772498 200 9840.80 1
>> # 28 17589.15 17589.15 300 J3380
>> 2.9696855 300 5922.90 1
>> # 29 18394.64 17842.79 1 J9355
>> 166.7238834 10 107.02 10
>> # 30 770.00 731.50 10 J2469
>> 6.2388060 250 117.25 25
>> # 31 461.90 0.00 15 J9045
>> 0.0000000 750 46.38 50
>> # 32 8160.00 3342.40 80 J1459
>> 1.0260818 40000 3257.44 500
>> # 33 1653.48 314.16 6 J9305
>> 0.7661505 60 410.05 10
>> # 34 13036.50 0.00 194 J9034
>> 0.0000000 194 4652.31 1
>> # 35 10486.87 0.00 156 J9034
>> 0.0000000 156 3741.04 1
>> # 36 15360.00 6254.40 240 J9299
>> 0.9488785 240 6591.36 1
>> # 37 1616.83 1616.83 150 J1453
>> 5.2528590 150 307.80 1
>> # 38 80685.74 34772.43 96 J9035
>> 4.4597077 960 7797.02 10
>> # 39 85220.58 35925.13 287 J9299
>> 4.5577715 287 7882.17 1
>> # 40 3860.17 1627.27 13 J9299
>> 4.5577963 13 357.03 1
>>
>>
>> #I hope this is enough information to warrant your support
>> #Thank you
>> #WHP
>>
>>
>>
>> Confidentiality Notice This message is sent from Zelis.
...{{dropped:13}}
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 8
>> Date: Tue, 30 Apr 2019 18:45:40 +0000
>> From: Bill Poling <Bill.Poling at zelis.com>
>> To: "r-help (r-help at r-project.org)" <r-help at
r-project.org>
>> Subject: Re: [R] Help with loop for column means into new column by a
>> subset Factor w/131 levels
>> Message-ID:
>> <
>> BN7PR02MB5073D732498AB265872F5750EA3A0 at
BN7PR02MB5073.namprd02.prod.outlook.com
>>>
>>
>> Content-Type: text/plain; charset="windows-1252"
>>
>> I ran this routine but I was thinking there must be a more elegant way
>> of doing this.
>>
>>
>> #
>>
https://community.rstudio.com/t/how-to-average-mean-variables-in-r-based-on-the-level-of-another-variable-and-save-this-as-a-new-variable/8764/8
>>
>> hcd2tmp2_summmary <- hcd2tmp2 %>%
>> select(.) %>%
>> group_by(Procedure_Code1) %>%
>> summarize(average = mean(Allowed_Amt))
>> # A tibble: 131 x 2
>> # Procedure_Code1 average
>> # <fct> <dbl>
>> # 1 A9606 57785.
>> # 2 J0129 5420.
>> # 3 J0178 4700.
>> # 4 J0180 13392.
>> # 5 J0202 56328.
>> # 6 J0256 17366.
>> # 7 J0257 7563.
>> # 8 J0485 2450.
>> # 9 J0490 6398.
>> # 10 J0585 4492.
>> # ... with 121 more rows
>>
>> hcd2tmp2 <- hcd2tmp %>%
>> group_by(Procedure_Code1) %>%
>> summarise(Avg_Allowed_Amt = mean(Allowed_Amt))
>>
>> view(hcd2tmp2)
>>
>>
>> hcd2tmp3 <- hcd2tmp %>%
>> group_by(Procedure_Code1) %>%
>> summarise(Avg_AllowByLimit = mean(AllowByLimit))
>>
>> view(hcd2tmp3)
>>
>>
>> hcd2tmp4 <- hcd2tmp %>%
>> group_by(Procedure_Code1) %>%
>> summarise(Avg_UnitsByDose = mean(UnitsByDose))
>>
>> view(hcd2tmp4)
>>
>> hcd2tmp5 <- hcd2tmp %>%
>> group_by(Procedure_Code1) %>%
>> summarise(Avg_LimitByUnits = mean(LimitByUnits))
>>
>> view(hcd2tmp5)
>>
>> #Joins----
>>
>>
>> hcd2tmp <- left_join(hcd2tmp2, hcd2tmp, by = c("Procedure_Code1"="Procedure_Code1"))
>> hcd2tmp <- left_join(hcd2tmp3, hcd2tmp, by = c("Procedure_Code1"="Procedure_Code1"))
>> hcd2tmp <- left_join(hcd2tmp4, hcd2tmp, by = c("Procedure_Code1"="Procedure_Code1"))
>> hcd2tmp <- left_join(hcd2tmp5, hcd2tmp, by = c("Procedure_Code1"="Procedure_Code1"))
>>
>> view(hcd2tmp)
>>
>> hcd2tmp$Avg_LimitByUnits <- round(hcd2tmp$Avg_LimitByUnits, digits = 2)
>> hcd2tmp$Avg_Allowed_Amt <- round(hcd2tmp$Avg_Allowed_Amt, digits = 2)
>> hcd2tmp$Avg_AllowByLimit <- round(hcd2tmp$Avg_AllowByLimit, digits = 2)
>> hcd2tmp$Avg_UnitsByDose <- round(hcd2tmp$Avg_UnitsByDose, digits = 2)
>>
>> view(hcd2tmp)
>>
>> #Over under columns----
>> hcd2tmp$AllowByLimitFlag <- hcd2tmp$AllowByLimit > hcd2tmp$Avg_AllowByLimit
>> hcd2tmp$LimitByUnitsFlag <- hcd2tmp$LimitByUnits > hcd2tmp$Avg_LimitByUnits
>> hcd2tmp$Allowed_AmtFlag <- hcd2tmp$Allowed_Amt > hcd2tmp$Avg_Allowed_Amt
>> hcd2tmp$UnitsByDoseFlag <- hcd2tmp$UnitsByDose > hcd2tmp$Avg_UnitsByDose
>>
>> view(hcd2tmp)
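
For what it's worth, the whole thing can probably be collapsed into one
grouped mutate (just a sketch, reusing the column names from your str()
output), which keeps all 14025 rows and adds the group means and flags in a
single pass, much like the SQL PARTITION BY version:

    library(dplyr)
    ## sketch: per-Procedure_Code1 means as new columns, plus one flag
    hcd2tmp2 <- hcd2tmp2 %>%
      group_by(Procedure_Code1) %>%
      mutate(Avg_Allowed_Amt  = mean(Allowed_Amt),
             Avg_LimitByUnits = mean(LimitByUnits),
             Avg_AllowByLimit = mean(AllowByLimit),
             Avg_UnitsByDose  = mean(UnitsByDose),
             Allowed_AmtFlag  = Allowed_Amt > Avg_Allowed_Amt) %>%
      ungroup()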
>>
>>
>> -----Original Message-----
>> From: Bill Poling
>> Sent: Tuesday, April 30, 2019 12:51 PM
>> To: r-help (r-help at r-project.org) <r-help at r-project.org>
>> Cc: Bill Poling <Bill.Poling at zelis.com>
>> Subject: Help with loop for column means into new column by a subset
>> Factor w/131 levels
>>
>> Good afternoon.
>>
>> #RStudio Version 1.1.456
>> sessionInfo()
>> #R version 3.5.3 (2019-03-11)
>> #Platform: x86_64-w64-mingw32/x64 (64-bit) #Running under: Windows
>= 8
>> x64 (build 9200)
>>
>>
>>
>> #I have a DF of 8 columns and 14025 rows
>>
>> str(hcd2tmp2)
>>
>> # 'data.frame':14025 obs. of 8 variables:
>> # $ Submitted_Charge: num 21021 15360 40561 29495 7904 ...
>> # $ Allowed_Amt : num 18393 6254 40561 29495 7904 ...
>> # $ Submitted_Units : num 60 240 420 45 120 215 215 15 57 2 ...
>> # $ Procedure_Code1 : Factor w/ 131 levels
"A9606","J0129",..: 43 113 117
>> 125 24 85 85 90 86 25 ...
>> # $ AllowByLimit : num 4.268 0.949 7.913 6.124 3.524 ...
>> # $ UnitsByDose : num 600 240 420 450 120 215 215 750 570 500 ...
>> # $ LimitByUnits : num 4310 6591 5126 4816 2243 ...
>> # $ HCPCSCodeDose1 : num 10 1 1 10 1 1 1 50 10 250 ...
>>
>> #I would like to create four additional columns that are the mean of
four
>> current columns in the DF.
>> #Current columns
>> #Allowed_Amt
>> #LimitByUnits
>> #AllowByLimit
>> #UnitsByDose
>>
>> #The goal is to be able to identify rows where (for instance)
Allowed_Amt
>> is greater than the average (aka outliers).
>>
>> #The trick is I want the means of those columns based on a Factor value
>> #The Factor is:
>> #Procedure_Code1 : Factor w/ 131 levels
"A9606","J0129"
>>
>> #So each of my four new columns will have 131 distinct values based on
the
>> mean for the specific Procedure_Code1 grouping
>>
>> #In SQL it would look something like this:
>>
>> #SELECT *,
>> # NewCol1 = mean(Allowed_Amt) OVER (PARTITION BY Procedure_Code1),
>> # NewCol2 = mean(LimitByUnits) OVER (PARTITION BY Procedure_Code1),
>> # NewCol3 = mean(AllowByLimit) OVER (PARTITION BY Procedure_Code1),
>> # NewCol4 = mean(UnitsByDose) OVER (PARTITION BY Procedure_Code1)
>> #INTO NewTable
>> #FROM Oldtable
>>
>> #Here are some sample data
>>
>> head(hcd2tmp2, n=40)
>> # Submitted_Charge Allowed_Amt Submitted_Units Procedure_Code1
>> AllowByLimit UnitsByDose LimitByUnits HCPCSCodeDose1
>> # 1 21020.70 18393.12 60 J1745
>> 4.2679810 600 4309.56 10
>> # 2 15360.00 6254.40 240 J9299
>> 0.9488785 240 6591.36 1
>> # 3 40561.32 40561.32 420 J9306
>> 7.9133539 420 5125.68 1
>> # 4 29495.25 29495.25 45 J9355
>> 6.1244417 450 4815.99 10
>> # 5 7904.30 7904.30 120 J0897
>> 3.5243000 120 2242.80 1
>> # 6 15331.95 10614.31 215 J9034
>> 2.0586686 215 5155.91 1
>> # 7 15331.95 10614.31 215 J9034
>> 2.0586686 215 5155.91 1
>> # 8 461.90 0.00 15 J9045
>> 0.0000000 750 46.38 50
>> # 9 27340.96 15092.21 57 J9035
>> 3.2600227 570 4629.48 10
>> # 10 768.00 576.00 2 J1190
>> 1.3617343 500 422.99 250
>> # 11 101.00 38.38 5 J2250
>> 59.9687500 5 0.64 1
>> # 12 17458.40 0.00 200 J9033
>> 0.0000000 200 5990.00 1
>> # 13 7885.10 7569.70 1 J1745
>> 105.3835445 10 71.83 10
>> # 14 2015.00 1155.78 4 J2785
>> 5.0051100 0 230.92 0
>> # 15 443.72 443.72 12 J9045
>> 11.9601078 600 37.10 50
>> # 16 113750.00 113750.00 600 J2350
>> 3.3025003 600 34443.60 1
>> # 17 3582.85 3582.85 10 J2469
>> 30.5573561 250 117.25 25
>> # 18 5152.65 5152.65 50 J2796
>> 1.4362988 500 3587.45 10
>> # 19 5152.65 5152.65 50 J2796
>> 1.4362988 500 3587.45 10
>> # 20 39664.09 0.00 74 J9355
>> 0.0000000 740 7919.63 10
>> # 21 166.71 102.53 9 J9045
>> 3.6841538 450 27.83 50
>> # 22 13823.61 9676.53 1 J2505
>> 2.0785247 6 4655.48 6
>> # 23 90954.00 26436.53 360 J1786
>> 1.7443775 3600 15155.28 10
>> # 24 4800.00 3494.40 800 J3262
>> 0.8861838 800 3943.20 1
>> # 25 216.00 105.84 4 J0696
>> 42.3360000 1000 2.50 250
>> # 26 5300.00 4770.00 1 J0178
>> 4.9677151 1 960.20 1
>> # 27 35203.00 35203.00 200 J9271
>> 3.5772498 200 9840.80 1
>> # 28 17589.15 17589.15 300 J3380
>> 2.9696855 300 5922.90 1
>> # 29 18394.64 17842.79 1 J9355
>> 166.7238834 10 107.02 10
>> # 30 770.00 731.50 10 J2469
>> 6.2388060 250 117.25 25
>> # 31 461.90 0.00 15 J9045
>> 0.0000000 750 46.38 50
>> # 32 8160.00 3342.40 80 J1459
>> 1.0260818 40000 3257.44 500
>> # 33 1653.48 314.16 6 J9305
>> 0.7661505 60 410.05 10
>> # 34 13036.50 0.00 194 J9034
>> 0.0000000 194 4652.31 1
>> # 35 10486.87 0.00 156 J9034
>> 0.0000000 156 3741.04 1
>> # 36 15360.00 6254.40 240 J9299
>> 0.9488785 240 6591.36 1
>> # 37 1616.83 1616.83 150 J1453
>> 5.2528590 150 307.80 1
>> # 38 80685.74 34772.43 96 J9035
>> 4.4597077 960 7797.02 10
>> # 39 85220.58 35925.13 287 J9299
>> 4.5577715 287 7882.17 1
>> # 40 3860.17 1627.27 13 J9299
>> 4.5577963 13 357.03 1
>>
>>
>> #I hope this is enough information to warrant your support
>> #Thank you
>> #WHP
>>
>>
>>
>> Confidentiality Notice This message is sent from Zelis.
...{{dropped:13}}
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 9
>> Date: Tue, 30 Apr 2019 15:24:57 -0400
>> From: Matthew <mccormack at molbio.mgh.harvard.edu>
>> To: "r-help (r-help at r-project.org)" <r-help at
r-project.org>
>> Subject: [R] transpose and split dataframe
>> Message-ID:
>> <0d6ac524-4291-ab03-6bcb-592b3996cc74 at
molbio.mgh.harvard.edu>
>> Content-Type: text/plain; charset="utf-8";
Format="flowed"
>>
>> I have a data frame that is a lot bigger but for simplicity sake we can
>> say it looks like this:
>>
>> Regulator hits
>> AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675
>> AT2G55980 AT2G85403,AT4G89223
>>
>> In other words:
>>
>> data.frame : 2 obs. of 2 variables
>> $Regulator: Factor w/ 2 levels
>> $hits : Factor w/ 6 levels
>>
>> I want to transpose it so that Regulator is now the column headings
>> and each of the AGI numbers now separated by commas is a row. So,
>> AT1G69490 is now the header of the first column and AT4G31950 is row 1
>> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header
of
>> column 2 and AT2G85403 is row 1 of column 2, etc.
>>
>> I have tried playing around with strsplit(TF2list[2:2]) and
>> strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
>>
>> Matthew
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 10
>> Date: Tue, 30 Apr 2019 21:04:50 +0000
>> From: David L Carlson <dcarlson at tamu.edu>
>> To: "r-help at r-project.org" <r-help at
r-project.org>, Matthew
>> <mccormack at molbio.mgh.harvard.edu>
>> Subject: Re: [R] transpose and split dataframe
>> Message-ID: <db8cede89a724defb691cea72a25b092 at tamu.edu>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I neglected to copy this to the list:
>>
>> I think we need more information. Can you give us the structure of the
>> data with str(YourDataFrame). Alternatively you could copy a small
piece
>> into your email message by copying and pasting the results of the
following
>> code:
>>
>> dput(head(YourDataFrame))
>>
>> The data frame you present could not be a data frame since you say
>> "hits" is a factor with a variable number of elements. If each value of
>> "hits" was a single character string, it would only have 2 factor levels
>> not 6 and your efforts to parse the string would make more sense.
>> Transposing to a data frame would only be possible if each column was
>> padded with NAs to make them equal in length. Since your example tries
>> to use the name TF2list, it is possible that you do not have a data
>> frame but a list and you have no factor levels, just character vectors.
>>
>> If you are not familiar with R, it may be helpful to tell us what your
>> overall goal is rather than an intermediate step. Very likely R can
easily
>> handle what you want by doing things a different way.
>>
>> ----------------------------------------
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77843-4352
>>
>>
>>
>> -----Original Message-----
>> From: R-help <r-help-bounces at r-project.org> On Behalf Of
Matthew
>> Sent: Tuesday, April 30, 2019 2:25 PM
>> To: r-help (r-help at r-project.org) <r-help at r-project.org>
>> Subject: [R] transpose and split dataframe
>>
>> I have a data frame that is a lot bigger but for simplicity sake we can
>> say it looks like this:
>>
>> Regulator hits
>> AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675
>> AT2G55980 AT2G85403,AT4G89223
>>
>> In other words:
>>
>> data.frame : 2 obs. of 2 variables
>> $Regulator: Factor w/ 2 levels
>> $hits : Factor w/ 6 levels
>>
>> I want to transpose it so that Regulator is now the column headings
>> and each of the AGI numbers now separated by commas is a row. So,
>> AT1G69490 is now the header of the first column and AT4G31950 is row 1
>> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header
of
>> column 2 and AT2G85403 is row 1 of column 2, etc.
>>
>> I have tried playing around with strsplit(TF2list[2:2]) and
>> strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
>>
>> Matthew
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> ------------------------------
>>
>> Message: 11
>> Date: Tue, 30 Apr 2019 15:03:09 -0600
>> From: David Winsemius <dwinsemius at comcast.net>
>> To: Jens Heumann <jens.heumann at students.unibe.ch>
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Passing formula as parameter to `lm` within `sapply`
>> causes error [BUG?]
>> Message-ID: <924255D4-912E-4C24-8E85-6E313EC50203 at comcast.net>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Try using do.call
>>
>> --
>> David
>>
>> Sent from my iPhone
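
Something along these lines might work; this is only a sketch, untested,
reusing the FO, data, st and ws objects from the original post. Because
do.call() evaluates the subset and weights arguments before the call is
built, lm() never has to look up the loop variable `s` by name:

    sapply(unique(data[[st]]), function(s)
      summary(do.call(lm, list(FO, data = data,
                               subset = data[[st]] == s,
                               weights = data[[ws]])))$coef[1, ])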
>>
>>> On Apr 30, 2019, at 9:24 AM, Jens Heumann <
>> jens.heumann at students.unibe.ch> wrote:
>>>
>>> Hi,
>>>
>>> `lm` won't take formula as a parameter when it is within a
`sapply`; see
>> example below. Please, could anyone either point me to a syntax error
or
>> confirm that this might be a bug?
>>>
>>> Best,
>>> Jens
>>>
>>> [Disclaimer: This is my first post here, following advice of how to
>> proceed with possible bugs from here:
https://www.r-project.org/bugs.html]
>>>
>>>
>>> SUMMARY
>>>
>>> While `lm` alone accepts formula parameter `FO` well, the same
within a
>> `sapply` causes an error. When putting everything as parameter but
formula
>> `FO`, it's still working, though. All parameters work fine within a
similar
>> `for` loop.
>>>
>>>
>>> MCVE (see data / R-version at bottom)
>>>
>>>> summary(lm(y ~ x, df1, df1[["z"]] == 1,
df1[["w"]]))$coef[1, ]
>>> Estimate Std. Error t value Pr(>|t|)
>>> 1.6269038 0.9042738 1.7991275 0.3229600
>>>> summary(lm(FO, data, data[[st]] == st1, data[[ws]]))$coef[1, ]
>>> Estimate Std. Error t value Pr(>|t|)
>>> 1.6269038 0.9042738 1.7991275 0.3229600
>>>> sapply(unique(df1$z), function(s)
>>> + summary(lm(y ~ x, df1, df1[["z"]] == s,
df1[[ws]]))$coef[1, ])
>>> [,1] [,2] [,3]
>>> Estimate 1.6269038 -0.1404174 -0.010338774
>>> Std. Error 0.9042738 0.4577001 1.858138516
>>> t value 1.7991275 -0.3067890 -0.005564049
>>> Pr(>|t|) 0.3229600 0.8104951 0.996457853
>>>> sapply(unique(data[[st]]), function(s)
>>> + summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ])
# !!!
>>> Error in eval(substitute(subset), data, env) : object 's'
not found
>>>> sapply(unique(data[[st]]), function(s)
>>> + summary(lm(y ~ x, data, data[[st]] == s, data[[ws]]))$coef[1,
])
>>> [,1] [,2] [,3]
>>> Estimate 1.6269038 -0.1404174 -0.010338774
>>> Std. Error 0.9042738 0.4577001 1.858138516
>>> t value 1.7991275 -0.3067890 -0.005564049
>>> Pr(>|t|) 0.3229600 0.8104951 0.996457853
>>>> m <- matrix(NA, 4, length(unique(data[[st]])))
>>>> for (s in unique(data[[st]])) {
>>> + m[, s] <- summary(lm(FO, data, data[[st]] == s,
data[[ws]]))$coef[1,
>> ]
>>> + }
>>>> m
>>> [,1] [,2] [,3]
>>> [1,] 1.6269038 -0.1404174 -0.010338774
>>> [2,] 0.9042738 0.4577001 1.858138516
>>> [3,] 1.7991275 -0.3067890 -0.005564049
>>> [4,] 0.3229600 0.8104951 0.996457853
>>>
>>> # DATA
#################################################################
>>>
>>> df1 <- structure(list(x = c(1.37095844714667,
-0.564698171396089,
>> 0.363128411337339,
>>> 0.63286260496104, 0.404268323140999, -0.106124516091484,
>> 1.51152199743894,
>>> -0.0946590384130976, 2.01842371387704), y = c(1.30824434809425,
>>> 0.740171482827397, 2.64977380403845, -0.755998096151299,
>> 0.125479556323628,
>>> -0.239445852485142, 2.14747239550901, -0.37891195982917,
>> -0.638031707027734
>>> ), z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), w = c(0.7, 0.8,
>>> 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)), class = "data.frame",
row.names = c(NA,
>>> -9L))
>>>
>>> FO <- y ~ x; data <- df1; st <- "z"; ws <-
"w"; st1 <- 1
>>>
>>>
########################################################################
>>>
>>>> R.version
>>> _
>>> platform x86_64-w64-mingw32
>>> arch x86_64
>>> os mingw32
>>> system x86_64, mingw32
>>> status
>>> major 3
>>> minor 6.0
>>> year 2019
>>> month 04
>>> day 26
>>> svn rev 76424
>>> language R
>>> version.string R version 3.6.0 (2019-04-26)
>>> nickname Planting of a Tree
>>>
>>>
#########################################################################
>>>
>>> NOTE: Question on SO two days ago (
>>
https://stackoverflow.com/questions/55893189/passing-formula-as-parameter-to-lm-within-sapply-causes-error-bug-confirmation)
>> brought many views but neither answer nor bug confirmation.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 12
>> Date: Tue, 30 Apr 2019 17:31:28 -0400
>> From: Matthew <mccormack at molbio.mgh.harvard.edu>
>> To: "r-help at r-project.org" <r-help at r-project.org>
>> Subject: [R] Fwd: Re: transpose and split dataframe
>> Message-ID:
>> <e4a9e321-b437-eed6-344b-472319e85fec at
molbio.mgh.harvard.edu>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Thanks for your reply. I was trying to simplify it a little, but must
>> have got it wrong. Here is the real dataframe, TF2list:
>>
>> str(TF2list)
>> 'data.frame': 152 obs. of 2 variables:
>> $ Regulator: Factor w/ 87 levels
"AT1G02065","AT1G13960",..: 17 6 6 54
>> 54 82 82 82 82 82 ...
>> $ hits : Factor w/ 97 levels
>>
"AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"|
>>
>> __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
>>
>> And the first few lines resulting from dput(head(TF2list)):
>>
>> dput(head(TF2list))
>> structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
>> 82L), .Label = c("AT1G02065", "AT1G13960",
"AT1G18860", "AT1G23380",
>> "AT1G29280", "AT1G29860", "AT1G30650",
"AT1G55600", "AT1G62300",
>> "AT1G62990", "AT1G64000", "AT1G66550",
"AT1G66560", "AT1G66600",
>> "AT1G68150", "AT1G69310", "AT1G69490",
"AT1G69810", "AT1G70510", ...
>>
>> This is another way of looking at the first 4 entries (Regulator is
>> tab-separated from hits):
>>
>> Regulator
>> hits
>> 1
>> AT1G69490
>>
>>
AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
>> 2
>> AT1G29860
>>
>>
AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
>>
>> 3
>> AT1G2986
>>
>>
AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830
>>
>> So, the goal would be to
>>
>> first: Transpose the existing dataframe so that the factor Regulator
>> becomes a column name (column 1 name = AT1G69490, column2 name
>> AT1G29860, etc.) and the hits associated with each Regulator become
>> rows. Hits is a comma separated 'list' ( I do not not know if
>> technically it is an R list.), so it would have to be comma
>> 'unseparated' with each entry becoming a row (col 1 row 1 = AT4G31950,
>> col 1 row 2 = AT5G24410, etc.); like this:
>>
>> AT1G69490
>> AT4G31950
>> AT5G24110
>> AT1G05675
>> AT5G64905
>>
>> ... I did not include all the rows)
>>
>> I think it would be best to actually make the first entry a separate
>> dataframe ( 1 column with name = AT1G69490 and number of rows depending
>> on the number of hits), then make the second column (column name
>> AT1G29860, and number of rows depending on the number of hits) into a
>> new dataframe and do a full join of of the two dataframes; continue by
>> making the third column (column name = AT1G2986) into a dataframe and
>> full join it with the previous; continue for the 152 observations so
>> that then end result is a dataframe with 152 columns and number of rows
>> depending on the entry with the greatest number of hits. The full joins
>> I can do with dplyr, but getting up to that point seems rather
>> difficult.
>>
>> This would get me to my ultimate goal: each Regulator is a
>> column name (152 columns) and a given row has either NA or the same
>> hit.
>>
>> This seems very difficult to me, but I appreciate any attempt.
>>
>> Matthew
>>
>> On 4/30/2019 4:34 PM, David L Carlson wrote:
>>> External Email - Use Caution
>>>
>>> I think we need more information. Can you give us the structure of
the
>> data with str(YourDataFrame). Alternatively you could copy a small
piece
>> into your email message by copying and pasting the results of the
following
>> code:
>>>
>>> dput(head(YourDataFrame))
>>>
>>> The data frame you present could not be a data frame since you say
>> "hits" is a factor with a variable number of elements. If
each value of
>> "hits" was a single character string, it would only have 2
factor levels
>> not 6 and your efforts to parse the string would make more sense.
>> Transposing to a data frame would only be possible if each column was
>> padded with NAs to make them equal in length. Since your example tries
use
>> the name TF2list, it is possible that you do not have a data frame but
a
>> list and you have no factor levels, just character vectors.
>>>
>>> If you are not familiar with R, it may be helpful to tell us what
your
>> overall goal is rather than an intermediate step. Very likely R can
easily
>> handle what you want by doing things a different way.
>>>
>>> ----------------------------------------
>>> David L Carlson
>>> Department of Anthropology
>>> Texas A&M University
>>> College Station, TX 77843-4352
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: R-help<r-help-bounces at r-project.org> On Behalf Of
Matthew
>>> Sent: Tuesday, April 30, 2019 2:25 PM
>>> To: r-help (r-help at r-project.org)<r-help at r-project.org>
>>> Subject: [R] transpose and split dataframe
>>>
>>> I have a data frame that is a lot bigger but for simplicity sake we
can
>>> say it looks like this:
>>>
>>> Regulator hits
>>> AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675
>>> AT2G55980 AT2G85403,AT4G89223
>>>
>>> In other words:
>>>
>>> data.frame : 2 obs. of 2 variables
>>> $Regulator: Factor w/ 2 levels
>>> $hits : Factor w/ 6 levels
>>>
>>> I want to transpose it so that Regulator is now the column
headings
>>> and each of the AGI numbers now separated by commas is a row. So,
>>> AT1G69490 is now the header of the first column and AT4G31950 is
row 1
>>> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is
header of
>>> column 2 and AT2G85403 is row 1 of column 2, etc.
>>>
>>> I have tried playing around with strsplit(TF2list[2:2]) and
>>> strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
>>>
>>> Matthew
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> [[alternative HTML version deleted]]
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 13
>> Date: Wed, 1 May 2019 07:46:32 +1000
>> From: Jim Lemon <drjimlemon at gmail.com>
>> To: Matthew <mccormack at molbio.mgh.harvard.edu>
>> Cc: "r-help (r-help at r-project.org)" <r-help at
r-project.org>
>> Subject: Re: [R] transpose and split dataframe
>> Message-ID:
>> <CA+8X3fUjv3APbUcsNQAD61pmOSbvoYBFsW3caZW7p11eD7umg at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi Matthew,
>> Is this what you are trying to do?
>>
>> mmdf<-read.table(text="Regulator hits
>> AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675
>> AT2G55980 AT2G85403,AT4G89223",header=TRUE,
>> stringsAsFactors=FALSE)
>> # split the second column at the commas
>> hitsplit<-strsplit(mmdf$hits,",")
>> # define a function that will fill with NAs
>> NAfill<-function(x,n) return(x[1:n])
>> # get the maximum length of hits
>> maxlen<-max(unlist(lapply(hitsplit,length)))
>> # fill the list with NAs
>> hitsplit<-lapply(hitsplit,NAfill,maxlen)
>> # change the names of the list
>> names(hitsplit)<-mmdf$Regulator
>> # convert to a data frame
>> tmmdf<-as.data.frame(hitsplit)
>>
>> Jim
>>
>> On Wed, May 1, 2019 at 5:25 AM Matthew <mccormack at
molbio.mgh.harvard.edu>
>> wrote:
>>>
>>> I have a data frame that is a lot bigger but for simplicity sake we
can
>>> say it looks like this:
>>>
>>> Regulator hits
>>> AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675
>>> AT2G55980 AT2G85403,AT4G89223
>>>
>>> In other words:
>>>
>>> data.frame : 2 obs. of 2 variables
>>> $Regulator: Factor w/ 2 levels
>>> $hits : Factor w/ 6 levels
>>>
>>> I want to transpose it so that Regulator is now the column
headings
>>> and each of the AGI numbers now separated by commas is a row. So,
>>> AT1G69490 is now the header of the first column and AT4G31950 is
row 1
>>> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is
header of
>>> column 2 and AT2G85403 is row 1 of column 2, etc.
>>>
>>> I have tried playing around with strsplit(TF2list[2:2]) and
>>> strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
>>>
>>> Matthew
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 14
>> Date: Wed, 1 May 2019 09:58:34 +1200
>> From: Abs Spurdle <spurdle.a at gmail.com>
>> To: Catarina Serra Gonçalves <catarinasg at gmail.com>
>> Cc: r-help <r-help at r-project.org>
>> Subject: Re: [R] Time series (trend over time) for irregular sampling
>> dates and multiple sites
>> Message-ID:
>> <
>> CAB8pepxHYbCXQPX5CaUQ868kMAp80z+zSXH7LHak+xDabJOjKg at
mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>>> My data has a few problems: (1) I think I will need to fix the
effects of
>>> seasonal variation (Monthly) and (2) of possible spatial
correlation
>>> (probability of finding an item is higher after finding one since
they
>> can
>>> come from the same ship). (3) How do I handle the fact that the
>>> measurements were not taken at a regular interval?
>>
>> Can I ask two questions:
>> (1) Is the data autocorrelated (or "Seasonal") over time?
>> If not then this problem is a lot simpler.
>> (2) Can you expand on the following statement?
>> "possible spatial correlation (probability of finding an item is
higher
>> after finding one since they can come from the same ship"
>>
>> [[alternative HTML version deleted]]
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 15
>> Date: Tue, 30 Apr 2019 22:29:24 +0000
>> From: David L Carlson <dcarlson at tamu.edu>
>> To: Matthew <mccormack at molbio.mgh.harvard.edu>, "r-help
at r-project.org"
>> <r-help at r-project.org>
>> Subject: Re: [R] Fwd: Re: transpose and split dataframe
>> Message-ID: <1d59b3c0584a40c1b322b0efd5de7646 at tamu.edu>
>> Content-Type: text/plain; charset="utf-8"
>>
>> If you read the data frame with read.csv() or one of the other read()
>> functions, use the as.is=TRUE argument to prevent conversion to factors.
>> If not, do the conversion first:
>>
>> # Convert factors to characters
>> DataMatrix <- sapply(TF2list, as.character)
>> # Split the vector of hits
>> DataList <- sapply(DataMatrix[, 2], strsplit, split=",")
>> # Use the values in Regulator to name the parts of the list
>> names(DataList) <- DataMatrix[,"Regulator"]
>>
>> # Now create a data frame
>> # How long is the longest list of hits?
>> mx <- max(sapply(DataList, length))
>> # Now add NAs to vectors shorter than mx
>> DataList2 <- lapply(DataList, function(x) c(x, rep(NA, mx - length(x))))
>> # Finally convert back to a data frame
>> TF2list2 <- do.call(data.frame, DataList2)
>>
>> Try this on a portion of the list, say 25 lines, and print each object
>> to see what is happening.
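
A long (rather than wide) layout might also serve the ultimate goal of
matching hits across Regulators. A sketch only, assuming the tidyr package
is available:

    library(tidyr)
    ## sketch: one row per Regulator/hit pair instead of a padded wide table
    TF2list$hits <- as.character(TF2list$hits)
    TF2long <- separate_rows(TF2list, hits, sep = ",")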
>>
>> ----------------------------------------
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77843-4352
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: R-help <r-help-bounces at r-project.org> On Behalf Of
Matthew
>> Sent: Tuesday, April 30, 2019 4:31 PM
>> To: r-help at r-project.org
>> Subject: [R] Fwd: Re: transpose and split dataframe
>>
>> Thanks for your reply. I was trying to simplify it a little, but must
>> have got it wrong. Here is the real dataframe, TF2list:
>>
>> str(TF2list)
>> 'data.frame': 152 obs. of 2 variables:
>> $ Regulator: Factor w/ 87 levels
"AT1G02065","AT1G13960",..: 17 6 6 54
>> 54 82 82 82 82 82 ...
>> $ hits : Factor w/ 97 levels
>>
"AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"|
>>
>> __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
>>
>> And the first few lines resulting from dput(head(TF2list)):
>>
>> dput(head(TF2list))
>> structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
--
Michael
http://www.dewey.myzen.co.uk/home.html