thr3ads.net - similar to: "gsub does not support \b?"

Displaying 20 results from an estimated 20000 matches similar to: "gsub does not support \b?"

gsub/strsplit with multiple patterns/splits

2012 May 30

Hi, I have a vector like this: DF <- c("Aetna, Inc.", "Alexander's Inc.", "Allegheny Energy, Inc") For each element in the vector I would like to remove the "incorporated" info, so that my vector looks like this: DF <- c("Aetna", "Alexander's", "Allegheny Energy") That means that I have to strip: strip <-

data frame select max group by like function

2010 Mar 09

Hi, I have a data frame with 3 columns: ID, year and score. How can I select, for each unique ID, the year that has the max score? For example, for the data frame
ID, year, score
tom, 1995, 88
rick, 1994, 90
mary, 2000, 97
tom, 1998, 60
mary, 1998, 100
I should get
ID, year, score
tom, 1995, 88
rick, 1994, 90
mary, 1998, 100
Thanks, Richard
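
One possible base-R approach to the question above, as a minimal sketch: `ave()` computes each ID's maximum score aligned to the rows, and only the rows whose score equals that maximum are kept. The data frame name `scores` is an assumption made for illustration, and a tie within an ID would return more than one row for that ID.

```r
# Hypothetical data frame rebuilt from the example in the question
scores <- data.frame(
  ID    = c("tom", "rick", "mary", "tom", "mary"),
  year  = c(1995, 1994, 2000, 1998, 1998),
  score = c(88, 90, 97, 60, 100)
)

# ave() returns, for every row, the maximum score within that row's ID
group_max <- ave(scores$score, scores$ID, FUN = max)

# Keep only the rows that hold their group's maximum
best <- scores[scores$score == group_max, ]
print(best)
```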

toupper does not work in sub + regex

2009 Apr 13

Hi, I don't know what I am doing wrong, but toupper does not seem to work in sub + regex. The following returns 's', not the upper-case 'S' I expect: sub("q_([a-z])[a-zA-Z]*",toupper('\\1'),"q_sviRaw") Can someone tell me what I did wrong? Thanks, Richard
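
A likely explanation, offered as a sketch rather than the thread's actual answer: R evaluates `toupper('\\1')` before `sub()` runs, and upper-casing a backslash and a digit changes nothing, so `sub()` receives the plain backreference. With `perl = TRUE`, the replacement string may contain `\\U`, which upper-cases the text that follows it.

```r
# toupper() runs first and returns the backreference string unchanged
unchanged <- identical(toupper("\\1"), "\\1")
print(unchanged)

# With perl = TRUE, \\U upper-cases the captured group in the replacement
result <- sub("q_([a-z])[a-zA-Z]*", "\\U\\1", "q_sviRaw", perl = TRUE)
print(result)  # "S"
```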

get top 50 correlated item from a correlation matrix for each item

2009 Feb 12

Hi, I have a correlation matrix of about 3000 items, i.e., a 3000*3000 matrix. For each of the 3000 items, I want to get the top 50 items that have the highest correlation with it (excluding itself) and generate a data frame with 3 columns like ("ID", "ID2", "cor"), where ID is those 3000 items, each repeated 50 times, and ID2 is the top 50 correlated items with ID,

aggregate text column by a few rows

2010 Oct 07

Hi, R function aggregate can only take summary stats functions, can I aggregate text columns? For example, for the dataframe below, > a <- rbind(data.frame(id=1, name='Tom', hobby='fishing'),data.frame(id=1, name='Tom', hobby='reading'),data.frame(id=2, name='Mary', hobby='reading'),data.frame(id=3, name='John',
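
As a hedged sketch: `aggregate()` accepts any function, not only summary statistics, so text can be collapsed with `paste()`. The data frame `hobbies` below is hypothetical and uses only the three complete rows visible in the truncated excerpt.

```r
# Hypothetical data built from the complete rows shown in the excerpt
hobbies <- data.frame(
  id    = c(1, 1, 2),
  name  = c("Tom", "Tom", "Mary"),
  hobby = c("fishing", "reading", "reading")
)

# Collapse the hobby text within each id/name group into one string
combined <- aggregate(hobby ~ id + name, data = hobbies,
                      FUN = function(h) paste(h, collapse = ", "))
print(combined)
```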

get top n rows group by a column from a dataframe

2010 Sep 16

Hi, is there an R function like SQL's TOP keyword? I have a dataframe that has 3 columns: company, person, salary. How do I get the 5 highest-paid people for each company, and if a company has fewer than 5 people, just return all of them? Thanks, Richard
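
One way this could be done in base R, sketched under assumptions: split the data by company, sort each piece by salary, and let `head()` keep at most `n` rows, which also covers companies with fewer than `n` people. The helper `top_n_by_group` and the data frame `pay` are hypothetical names introduced for illustration.

```r
# Hypothetical data: company "A" has 6 people, company "B" only 2
pay <- data.frame(
  company = c(rep("A", 6), rep("B", 2)),
  person  = c("p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"),
  salary  = c(50, 80, 60, 90, 70, 40, 30, 55)
)

# Hypothetical helper: top n rows of df per group, ranked by a value column
top_n_by_group <- function(df, group, value, n = 5) {
  pieces <- split(df, df[[group]])
  top <- lapply(pieces, function(g) {
    # head() returns all rows when the group has fewer than n
    head(g[order(g[[value]], decreasing = TRUE), ], n)
  })
  out <- do.call(rbind, top)
  rownames(out) <- NULL
  out
}

top5 <- top_n_by_group(pay, "company", "salary", n = 5)
print(top5)
```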

Regex question to find a string that contains 5-9 alpha-numeric characters, at least one of which is a number

2009 Jun 08

Hi, This is not exactly an R question but I am trying to use gsub to replace a string that contains 5-9 alpha-numeric characters, at least one of which is a number. Is there a good way to write it in a one line regex? Thanks, Richard
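
A possible one-line pattern, sketched on the assumption that Perl-compatible regular expressions are acceptable (`perl = TRUE`): a lookahead demands a digit inside the token, and `\\b` anchors keep the match to whole tokens of 5 to 9 characters. The input string and the replacement "X" are placeholders.

```r
# (?=[A-Za-z]*[0-9]) is a lookahead requiring a digit within the token;
# [A-Za-z0-9]{5,9} between \\b anchors limits the token to 5-9 characters
pattern <- "\\b(?=[A-Za-z]*[0-9])[A-Za-z0-9]{5,9}\\b"

# Hypothetical input and placeholder replacement
x <- "abc12 hello abcdefghij1 a1b2c3"
cleaned <- gsub(pattern, "X", x, perl = TRUE)
print(cleaned)  # "X hello abcdefghij1 X"
```

Here "hello" is left alone because it has no digit, and "abcdefghij1" because it is longer than 9 characters.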

combinatorial programming problem

2006 May 26

Hello! I am programming a class (S3) "symarray" for storing the results of functions symmetric in their k arguments. Intended use is for association indices for more than two variables, for instance coresistivity against antibiotics. There is one programming problem I haven't solved: making an inverse of the index function indx() (see code below). It could for instance return the

Extract cell of many values from dataframe cells and sample from them.

2012 Nov 08

Hi, First, my apologies for a non-working piece of code in a previous submission; I have corrected this error. What I'm doing is individual-based modelling of a pathogen and its host. The way I've thought of doing this is with two dataframes, one of the pathogen and its genes and effector genes, and one of the host and its resistance genes. During the simulation, these things

Nearest Neighbor Algorithm in R -- again.

2004 Feb 02

Several of the methods I use for analyzing large data sets, such as WinGamma (determining the level of noise in data) and Relief-F (estimating the influence of variables), depend on finding the k nearest neighbors of a point in a data frame or matrix efficiently. (For large data sets it is not feasible to compute the 'dist' matrix anyway.) Seeing the proposed solution to "[R] distance

gsub and regex to tidy comma-limited values

2009 Mar 14

I am cleaning up comma-limited values, so that only one comma separates each value. Using the example below, as much as I try with regex, I can't remove the last comma. I hope to have a one-liner solution, if possible.
gsub("^,*|,*$|(,)*", "\\1", ",,,apple,,orange,,,,,lemon,strawberry,,,,")
[1] "apple,orange,lemon,strawberry,"
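
One way to get the desired result in a single line, as a sketch rather than the answer given in the thread: nest two `gsub()` calls, first collapsing each run of commas to one comma, then removing the single comma left at either end.

```r
x <- ",,,apple,,orange,,,,,lemon,strawberry,,,,"

# Inner call: collapse every run of commas into a single comma
# Outer call: drop the comma left at the start or end of the string
tidy <- gsub("^,|,$", "", gsub(",+", ",", x))
print(tidy)  # "apple,orange,lemon,strawberry"
```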

multiple plots in same graph window

2009 Apr 21

Hi, I'm trying to make multiple plots in the same graph window in R. The multiple graphs are showing up in the right positions on the window, but the graphics window is refreshed every time a new plot is drawn, so that I end up with only the last graph; the previous ones are all erased. If I try to print in a .eps file directly, then I end up

aggregate a Date column does not work?

2010 Nov 22

Hi, I am trying to take the max of a Date column with aggregate but get a weird result. How do I fix this?
> a <- rbind(
+ data.frame(name='Tom', payday=as.Date('1999-01-01')),
+ data.frame(name='Tom', payday=as.Date('2000-01-01')),
+ data.frame(name='Pete', payday=as.Date('1998-01-01')),
+ data.frame(name='Pete',

pattern in history

2006 Apr 11

Hi, Sometimes I need to consult the history of commands that match a regex, so I modified the utils::history function for that purpose. I found it useful. I append the code (I only added the two lines with #**). Romain. history2 <- function (pattern="", max.show = 25, reverse = FALSE, unique = pattern!="", ...) { file1 <- tempfile("Rrawhist")

sample from list

2012 Nov 06

Hi all, I have a list of genes present in 500 individuals, the individuals are the elements: Genes <- lapply(1:nrow(inds),function(x) sample(1:10000,inds$No_of_Genes,replace=TRUE)) (This was later written to a dataframe as well as kept as the list object: inds2 <- data.frame(inds,Genes=I(Genes))) I also have a vector of how many of those genes are expressed in the individuals, this can

Efficiency Question - Nested lapply or nested for loop

2010 Oct 08

My data looks like this:
> data
  name G_hat_0_0 G_hat_1_0 G_hat_2_0 G_0 G_hat_0_1 G_hat_1_1 G_hat_2_1 G_1
1  rs0  0.488000  0.448625  0.063375   1  0.480875  0.454500  0.064625   1
2  rs1  0.002375  0.955375  0.042250   1  0.000000  0.062875  0.937125   2
3  rs2  0.050375  0.835875  0.113750   1  0.877250  0.115875  0.006875   0
4  rs3  0.000000  0.074750  0.925250   2  0.897750  0.102000

followup -- deficiencies in readline capability

2002 Apr 30

Why would R lack history capability? Someone in a private electronic mail message suggested the possibility that I was running R in a non-writable directory. This is not the case, as the following logfile shows (where "$ " is my shell prompt):
$ ls -ld `pwd`
drwxrwxrwx 15 sys sys 2560 Apr 30 08:10 /tmp
$ R --vanilla
R : Copyright 2002, The R Development Core Team

search for string inside a string

2009 Mar 13

Hi, sorry if this is too stupid a question, but how do I do a string search in R? I have a dataframe A with A$test like: test1 bcdtestblabla2.1bla cdtestblablabla3.88blabla and I want to search for a string that starts with 'dtest' and ends with a number, and return the location of that substring and the number, so the end result would be: NA NA 3 2.1 2 3.88 I find grep can

arithmetic problem

2009 May 30

Hello list, I have a problem with a dataset (see toy example below) where I am trying to find the difference between two (or more) numbers and discard those observations which fall outside a set interval. An example and further explanation:
  values    ind
1   2655    7A5
2   3028    7A5
3    689 ABBA-1
4   1336 ABBA-1
5   1560 ABBA-1
6   2820 ABLIM1
7   3339 ABLIM1
8

Hmisc label function applied to data frame

2010 Dec 02

Hello, I'm attempting to create a data frame with correlations between every pair of variables in a data frame, so that I can then sort by the value of the correlation coefficient and see which pairs of variables are most strongly correlated. The sm2vec function in the corpcor library works very nicely as shown here: library(Hmisc) library(corpcor) # Create example data x1 = runif(50) x2 =
