Hi,

I have a data frame with a rather complicated descriptive column (V9):

> test3[(1:3), ]
   V1     V4     V5
10  1   4559   7173
17  1  58954  59871
19  1 357522 358458
   V9
10 ID=NM_182905.1;Name=NM_182905;Alias=FLJ00038;Note=hypothetical protein LOC375690
17 ID=NM_001005484;Alias=OR4F5;Note=olfactory receptor%2C family 4%2C subfamily F
19 ID=NM_001005224.1;Name=NM_001005224;Alias=OR4F3;Note=olfactory receptor%2C family 4%2C subfamily F

I am having trouble extracting two strings from this column (V9). First I need the "ID=..." value and second I need the "Alias=..." value, each in a separate column. I tried substr(), but because the fields have different lengths and substr() does not allow wildcards, it did not work.

I would be glad for any help! Thanks in advance.

--
View this message in context: http://r.789695.n4.nabble.com/Substring-of-a-character-column-tp2313210p2313210.html
Sent from the R help mailing list archive at Nabble.com.
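Because the fields vary in length, a regular expression with a backreference fits this better than substr(). The sketch below is one possible approach under stated assumptions: it rebuilds the three printed rows as a small test3 data frame (the real object is assumed to hold V9 as text), fields are assumed to be separated by ";" with each key appearing at most once per row, and extract_field() is a hypothetical helper name. Rows that lack the key get NA.

```r
# Hypothetical reconstruction of the three rows printed above
test3 <- data.frame(
  V1 = c(1, 1, 1),
  V4 = c(4559, 58954, 357522),
  V5 = c(7173, 59871, 358458),
  V9 = c("ID=NM_182905.1;Name=NM_182905;Alias=FLJ00038;Note=hypothetical protein LOC375690",
         "ID=NM_001005484;Alias=OR4F5;Note=olfactory receptor%2C family 4%2C subfamily F",
         "ID=NM_001005224.1;Name=NM_001005224;Alias=OR4F3;Note=olfactory receptor%2C family 4%2C subfamily F"),
  row.names = c("10", "17", "19"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the value of 'key' from each
# "key=value;key=value" string, or NA where the key is absent.
# "(^|;)" makes sure we match a whole key (e.g. "ID", not "GeneID").
extract_field <- function(x, key) {
  pattern <- paste0("(^|;)", key, "=([^;]*)")
  out <- rep(NA_character_, length(x))
  hit <- grepl(pattern, x, perl = TRUE)
  # Replace the whole string by the captured value (group 2)
  out[hit] <- sub(paste0(".*", pattern, ".*"), "\\2", x[hit], perl = TRUE)
  out
}

test3$ID    <- extract_field(test3$V9, "ID")
test3$Alias <- extract_field(test3$V9, "Alias")
print(test3[, c("ID", "Alias")])
```

The same helper works for any other key in V9 (for example "Name"), and it returns NA rather than failing when a row, like row 17, has no such field.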
On Wed, Aug 4, 2010 at 6:00 AM, LogLord <nils.schoof at web.de> wrote:
> I am having trouble extracting two strings from this column (V9). First I need
> the "ID=..." value and second I need the "Alias=..." value, each in a separate column.

A similar question was asked last month. See:

http://permalink.gmane.org/gmane.comp.lang.r.general/197059

and the other posts in the same thread for other solutions.
Hi,

# Example strings (without the "+" continuation prompts from the console)
a <- c("ID=NM_182905.1;Name=NM_182905;Alias=FLJ00038;Note=hypothetical protein LOC375690",
       "ID=NM_001005484;Alias=OR4F5;Note=olfactory receptor%2C family 4%2C subfamily F",
       "ID=NM_001005224.1;Name=NM_001005224;Alias=OR4F3;Note=olfactory receptor%2C family 4%2C subfamily F")

# Split each string on ";" and keep the field(s) matching 'string'
fonction <- function(data, string) {
  liste <- strsplit(data, ";")
  # value = TRUE returns the matching fields themselves, not their positions
  lapply(liste, function(x) grep(string, x, value = TRUE))
}

fonction(a, "ID=")
fonction(a, "Alias=")

HTH,
Alain

On 04-Aug-10 12:00, LogLord wrote:
> I am having trouble extracting two strings from this column (V9). First I need
> the "ID=..." value and second I need the "Alias=..." value, each in a separate column.

--
Alain Guillet
Statistician and Computer Scientist
SMCS - IMMAQ - Université catholique de Louvain
Bureau c.316
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium
tel: +32 10 47 30 50
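Splitting on ";" returns each matching field as a list element such as "ID=NM_182905.1". To get plain values in separate columns, one option is to split each field once more on the first "=" and look the keys up by name. This is a sketch, not a definitive implementation; parse_attributes() and get_key() are hypothetical helper names, and it assumes every field has the form key=value with ";" between fields.

```r
a <- c("ID=NM_182905.1;Name=NM_182905;Alias=FLJ00038;Note=hypothetical protein LOC375690",
       "ID=NM_001005484;Alias=OR4F5;Note=olfactory receptor%2C family 4%2C subfamily F",
       "ID=NM_001005224.1;Name=NM_001005224;Alias=OR4F3;Note=olfactory receptor%2C family 4%2C subfamily F")

# Hypothetical helper: turn one "key=value;key=value" string
# into a named character vector c(key = value, ...)
parse_attributes <- function(s) {
  fields <- strsplit(s, ";", fixed = TRUE)[[1]]
  keys   <- sub("=.*$", "", fields)     # text before the first "="
  values <- sub("^[^=]*=", "", fields)  # text after the first "="
  setNames(values, keys)
}

parsed <- lapply(a, parse_attributes)

# Hypothetical helper: pull one key from every parsed record,
# using NA when a record does not contain that key
get_key <- function(records, key) {
  vapply(records,
         function(r) if (key %in% names(r)) r[[key]] else NA_character_,
         character(1))
}

result <- data.frame(ID    = get_key(parsed, "ID"),
                     Alias = get_key(parsed, "Alias"),
                     stringsAsFactors = FALSE)
print(result)
```

The two columns can then be attached to the original data frame with cbind(), provided the rows are in the same order as V9.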