Displaying 20 results from an estimated 300 matches similar to: "dealing with a messy dataset"
2017 Oct 05
0
dealing with a messy dataset
It looks like fixed width. I just used the last position of each
field to get the size and used the 'readr' package;
> input <- "And XVIII 000214.5+450520 0.69 17 9 0.00
-8.7 26.8 6.44 6.78 < 6.65 -44 0.5 MESSIER031 0.6
1.54
+ PAndAS-03 000356.4+405319 0.10 17 0.00 -3.6 27.8
4.38 2.8 MESSIER031
2017 Oct 05
3
dealing with a messy dataset
dear Jim,
Thanks for your reply and your proposition.
I forgot to provide the header of the dataframe, here it is:
================================================================================
Byte-by-byte Description of file: lvg_table2.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
2017 Oct 05
0
dealing with a messy dataset
You should be able to use that header information to create the
correct parameters to the read_fwf function to read in the data.
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Thu, Oct 5, 2017 at 11:02 AM, jean-philippe
<jeanphilippe.fontaine at gssi.infn.it> wrote:
> dear Jim,
>
> Thanks
2017 Oct 05
0
dealing with a messy dataset
Is this a fixed width format?
If so, read.fwf() in base, or read_fwf() in the readr package will solve the problem. You may need to trim trailing spaces though.
B.
> On Oct 5, 2017, at 10:12 AM, jean-philippe <jeanphilippe.fontaine at gssi.infn.it> wrote:
>
> dear R-users,
>
>
> I am facing a quite regular and basic problem when it comes to dealing with datasets,
2017 Oct 05
1
dealing with a messy dataset
dear Jim,
Yes, I fixed the problem. Thanks again to all of you for your contributions!
This worked:
library(readr)
# Start position of each field; the last value marks the end of the final field.
start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 70, 76, 78, 83, 88,
           93, 114, 122, 127)
data1 <- read_fwf("lvg_table2.txt", skip = 70, fwf_widths(diff(start)))
Well now I know how to deal with fixed-width files :)
Cheers
Jean-Philippe
On 05/10/2017 18:42, jim
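The same start-position trick can be tried with base R's read.fwf(), which was also suggested in this thread. The following is only a minimal sketch on a made-up three-column sample, not the real lvg_table2.txt: the column names, field widths, and start positions are assumptions chosen for illustration.

```r
# Hypothetical sample: three fields padded to fixed widths of 12, 17 and 4.
name  <- c("And XVIII", "PAndAS-03")
coord <- c("000214.5+450520", "000356.4+405319")
value <- c("0.69", "0.10")
lines <- sprintf("%-12s%-17s%4s", name, coord, value)

path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Start position of each field, plus one position past the end of the last
# field, so that diff() yields one width per column.
start <- c(1, 13, 30, 34)

dat <- read.fwf(path, widths = diff(start),
                col.names = c("name", "coord", "value"),
                strip.white = TRUE,          # trim the padding spaces
                stringsAsFactors = FALSE)
print(dat)
```

With n start positions, diff(start) gives n - 1 widths, so the final entry of start has to be one past the last character of the last field.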
2011 May 04
2
first occurrence of a value?
Hello,
A simple question perhaps, but how do I, within each row, find the first
occurrence of the number 1 in the df below? I want to use this position to
programmatically create the variable 'year'. I've come up with a solution, but I
find it downright ugly. Is there a simpler way? I was hoping for a useful
built-in function that I don't yet know about.
df <-
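The data frame from the original post is cut off here, so the following is only a sketch on a hypothetical df whose columns y2001 to y2003 hold 0/1 flags. It shows one possible way to find the first 1 in each row and turn that position into a 'year' variable; it is not necessarily the solution the thread settled on.

```r
# Hypothetical data: one 0/1 indicator column per year (names are assumptions).
df <- data.frame(y2001 = c(0, 1, 0),
                 y2002 = c(1, 1, 0),
                 y2003 = c(1, 0, 0))

# Column index of the first 1 in each row; NA when a row contains no 1.
first <- unname(apply(df == 1, 1, function(r) which(r)[1]))

# Map the column position back to the year encoded in the column name.
df$year <- as.integer(sub("^y", "", names(df)))[first]
print(df)
```

max.col(df == 1, ties.method = "first") is a shorter alternative, but it returns 1 for rows with no match at all, so those rows would need separate handling.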
2004 Sep 27
1
Peer Review - Linuxfest Presentation Outline
Hello all,
I've been invited to do a presentation on Asterisk for the Ohio
Linuxfest in Columbus this weekend (http://www.ohiolinux.org). Rough
estimates are that nearly 500 people will be attending. I've been working
on an outline for a couple of weeks and I would like to have some peer
review of the information presented.
I am going to have to cut down the content to make it fit in
2010 Jul 13
6
permutation-based FDR
Hello everyone,
I have a small problem...
I have about 9000 variables that I have tested against one particular variable
with the Wilcoxon test. I calculated the p-values and wanted to correct them with
permutation-based FDR. I found an R function, comp.fdr(), that does
this correction, but it asks you to supply the variables with their
observations and runs the test itself (as far as I understand). I only want
2019 Aug 29
2
Feature request: non-dropping regmatches/strextract
Thank you! I greatly appreciate your consideration, though of course it is up to you. I think many people switch to stringr/stringi simply because functions in those packages have some consistent design choices, for example, they do not drop empty/missing matches, which facilitates array-based programming. For example, in the cases where one needs to make a new column in a data.frame (data.table,
2015 Oct 07
2
Buildbot Noise
On 7 October 2015 at 22:44, Eric Christopher <echristo at gmail.com> wrote:
> I think this is a poor analogy. You're also ignoring the solution I gave you
> in my previous mail for slow bots.
I'm not ignoring it, I'm acting upon it. But it takes time. I don't
have infinite resources.
> If you can't give some basic stability guarantees then the bot
> is only
2013 Nov 08
3
Date handling in R is hard to understand
Dear All,
I usually work with time series data. The data may come in AM/PM (12-hour)
format or in 24-hour format. R cannot tell the two apart
automatically, at least in my experience; I have to tell R explicitly which
time format the data uses. It seems that Pandas knows how to handle dates
without being told the format. The problem arises when I try to shift a time
by a certain amount. Say
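As a rough illustration of the point about telling R the format, the sketch below parses the same instant from a 12-hour and a 24-hour string and then shifts it. The timestamps are made up, and the %p conversion is assumed to run in an English or C locale, where "AM"/"PM" are recognized.

```r
# The same instant written in 12-hour and in 24-hour notation (made-up values).
t12 <- as.POSIXct("2013-11-08 02:30:00 PM",
                  format = "%Y-%m-%d %I:%M:%S %p", tz = "UTC")
t24 <- as.POSIXct("2013-11-08 14:30:00",
                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Both parse to the same point in time once the format is given explicitly.
print(t12 == t24)

# POSIXct values are counted in seconds, so a shift is plain arithmetic.
shifted <- t24 + 90 * 60          # 90 minutes later
print(format(shifted, "%H:%M"))
```

%I with %p reads a 12-hour clock, while %H reads a 24-hour clock; mixing them up is a common source of silently wrong or NA results.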
2017 Nov 21
3
Best way to study internals of R ( mix of C, C++, Fortran, and R itself)?
How difficult is it to get a good feel for the internals of R, if you want
to learn the general code base, but also the CPU-intensive stuff (much of
it in C or Fortran?) and the ways in which the general code and the
CPU-intensive stuff are connected together?
R has a very large audience, but my understanding is that only a small
group have a good understanding of the internals (and some of those
2019 Aug 29
2
Feature request: non-dropping regmatches/strextract
Thank you, I am aware that there are packages that can accomplish this. I mentioned stringr::str_extract as a function that does not drop empty matches. I think that the behavior of regmatches(..., regexpr(...)) in base R should permit an option to prevent dropping of empty matches both for the sake of consistency with the rest of the language (missing data does not yield a dropped index in other
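To make the dropping behavior concrete, here is a small sketch with made-up input strings. It shows how regmatches(x, regexpr(...)) returns a shorter vector when some elements do not match, along with one common base R workaround that keeps the result aligned with the input. The workaround is an illustration, not the option requested in this thread.

```r
# Made-up input: the second element contains no digits.
x <- c("Groucho 1890", "Chico", "Harpo 1888")
m <- regexpr("[0-9]+", x)

# Non-matching elements are dropped, so the result is shorter than x.
dropped <- regmatches(x, m)
print(dropped)

# Workaround: pre-fill with NA and assign only where a match was found.
hit <- !is.na(m) & as.integer(m) != -1L
out <- rep(NA_character_, length(x))
out[hit] <- regmatches(x, m)
print(out)
```

Because out has the same length as x, it can be assigned directly as a new column of a data frame.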
2014 Dec 11
2
[LLVMdev] Debugging on unavailable hardware
Hi Renato,
Thank you very much for the directions, I am going to recommit my fix.
What hardware is used in the buildbots? Are these common boards like
PandaBoard or something special? How much RAM is installed?
Thanks,
--Serge
2014-12-11 2:36 GMT+06:00 Renato Golin <renato.golin at linaro.org>:
> On 10 December 2014 at 19:06, Serge Pavlov <sepavloff at gmail.com> wrote:
> > In
2019 Aug 15
1
Feature request: non-dropping regmatches/strextract
Using a non-capturing group, "(?:...)" instead of "(...)", simplifies my
example a bit
> x <- c("Groucho <groucho at marx.com>", "<chico at marx.com>", "Harpo")
> strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
proto=data.frame(Name=character(), Address=character(),
2008 Jun 05
2
Nouveau on GeForce 7950GTX (NV49)
Hi everyone,
I just finished installing the current git of the nouveau driver for
my NV49 based 7950GTX. I attempted this because I've been continually
frustrated by the constantly degrading 2D performance of X under the
nvidia and nv drivers. Webpages that do any kind of compositing in
Mozilla will burn CPU all day long.
I'm happy to say that I'm writing because I didn't have
2015 Jun 01
2
My R script is very slow
Hi Carlos,
well, to be honest my question was a fairly general one: when you have not used
data.table, it does not seem very intuitive to move from the style of programming
you are most used to (loops, matrix notation...) to that other one. I don't yet
have a specific complex calculation, but I will have to do one... I only wanted
to know whether it can be done, and it seems it can, so it will be a matter of soaking up a
bit of
2024 Jul 15
2
reticulate + virtual environments
Hi,
I am using reticulate and a virtual environment (not conda) to run
Python scripts from RStudio. However, when I try to use my own
(existing) virtual environment, reticulate does not use it. If I run my
scripts, the installed modules (e.g., py_install("pandas",
"mmstat4.hu.data")) are not found. I believe this happens because
reticulate is using r-reticulate instead of
2024 Jul 15
1
reticulate + virtual environments
Have you tried https://rstudio.github.io/reticulate/ ?
Generally speaking, complex, nonstandard, package-specific questions
such as yours rarely get a reply here -- there are 20,000+ packages
(and counting) after all! As reticulate was created by and integrated
with RStudio/Posit, I would think their site and help resources might
be a better venue. Of course, if you don't use RStudio, you may
2017 Nov 21
0
Best way to study internals of R ( mix of C, C++, Fortran, and R itself)?
1) What is easy for one person may be very hard for another, so your question is really unanswerable. You do need to know C and Fortran to get through the source code. Get started soon reading the R Internals document if it sounds interesting to you... you are bound to learn something even if you don't stick with it. If you have questions about the internals though, you should read the Posting