roslinazairimah zakaria
2017-Mar-15 13:49 UTC
[R] Extract student ID that match certain criteria
Hi Caitlin,

I tried many of the suggested approaches without success, and I realise that I need to filter the student ID together with the CGPA. When I convert the ID to character I lose the CGPA value. This is easy to do in Excel but a bit time-consuming, so I am trying to do it in R.

I have these data:

> dput(dt_all2)
structure(list(FAC_CODE = structure(c(2L, 2L, 2L, 4L, 1L, 1L,
4L, 7L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 2L, 5L, 6L), .Label = c("FKASA",
"FKEE", "FKKSA", "FKM", "FKP", "FSKKP", "FTK"), class = "factor"),
    STUDENT_ID = structure(c(9L, 6L, 7L, 17L, 2L, 3L, 18L, 19L,
    13L, 12L, 14L, 15L, 16L, 10L, 8L, 1L, 5L, 11L, 4L), .Label = c("AA14068",
    "AB15103", "AB15124", "CC14107", "EA13043", "EB14059", "EB14073",
    "EB14101", "EC14021", "EC15063", "FB14085", "KA13142", "KA13143",
    "KA13156", "KE13034", "KE13046", "MA14071", "MA14115", "PA13048"
    ), class = "factor"), PROGRAM = structure(c(2L, 1L, 1L, 2L,
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L
    ), .Label = c("DIPLOMA", "IJAZAH SARJANA MUDA"), class = "factor"),
    CGPA = c(2.42, 3.27, 1.98, 2.85, 2.24, 3.01, 3.31, 2.88,
    3.61, 3.69, 3.2, 3.85, 3.63, 2.67, 2.35, 2.74, 1.96, 2.89,
    2.59)), .Names = c("FAC_CODE", "STUDENT_ID", "PROGRAM", "CGPA"
), class = "data.frame", row.names = c(NA, -19L))

and I want to filter my data as follows:

> dput(dt_all3)
structure(list(FAC_CODE = structure(c(2L, 2L, 4L, 4L, 5L, 1L,
6L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("FKASA", "FKEE", "FKKSA",
"FKM", "FKP", "FTK"), class = "factor"), STUDENT_ID = structure(c(4L,
3L, 11L, 12L, 5L, 1L, 13L, 7L, 6L, 8L, 9L, 10L, 2L), .Label = c("AA14068",
"EA13043", "EC14021", "EC15063", "FB14085", "KA13142", "KA13143",
"KA13156", "KE13034", "KE13046", "MA14071", "MA14115", "PA13048"
), class = "factor"), PROGRAM = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "IJAZAH SARJANA MUDA", class = "factor"),
    CGPA = c(2.67, 2.42, 2.85, 3.31, 2.89, 2.74, 2.88, 3.61,
    3.69, 3.2, 3.85, 3.63, 1.96)), .Names = c("FAC_CODE", "STUDENT_ID",
"PROGRAM", "CGPA"), class = "data.frame", row.names = c(NA, -13L
))

I would like to select the student IDs by their third and fourth characters, which represent the year of registration (e.g. AA15..., AE14...), and I would also like to keep their CGPA values.

Thank you.

On Mon, Mar 13, 2017 at 2:26 PM, roslinazairimah zakaria <roslinaump at gmail.com> wrote:
> Thank you so much for your help.
>
> On Mon, Mar 13, 2017 at 1:52 PM, bioprogrammer <bioprogrammer at gmail.com> wrote:
>> Hi.
>>
>> I would use the "substr" function:
>>
>> https://stat.ethz.ch/R-manual/R-devel/library/base/html/substr.html
>>
>> ...assuming you're working with character data.
>>
>> Another useful skill involves working with regular expressions.
>>
>> http://www.endmemo.com/program/R/grep.php
>>
>> http://regular-expressions.mobi/tutorial.html
>>
>> Hope these help :)
>>
>> ~Caitlin
>>
>> -------- Original message --------
>> From: roslinazairimah zakaria <roslinaump at gmail.com>
>> Date: 03/12/2017 10:18 PM (GMT-07:00)
>> To: Bert Gunter <bgunter.4567 at gmail.com>
>> Cc: r-help mailing list <r-help at r-project.org>
>> Subject: Re: [R] Extract student ID that match certain criteria
>>
>> Another question,
>>
>> How do I extract IDs based on the third and fourth characters?
>>
>> I have, for example, AA14004, AB15035, CB14024, PA14009, PA14009 etc.
>>
>> I would like to extract the IDs of the form AB14..., CB14..., PA14...
>>
>> On Mon, Mar 13, 2017 at 12:37 PM, roslinazairimah zakaria <roslinaump at gmail.com> wrote:
>>> Hi Bert,
>>>
>>> Thank you so much for your help. However, I am not really sure what the
>>> y values are for. Can we do without them?
>>>
>>> x <- as.character(FKASA$STUDENT_ID)
>>> y <- c(1,786)
>>> My.Data <- data.frame (x,y)
>>>
>>> My.Data[grep("^AA14", My.Data$x), ]
>>>
>>> I got the following data:
>>>
>>>          x   y
>>> 1  AA14068   1
>>> 7  AA14090   1
>>> 11 AA14099   1
>>> 14 AA14012 786
>>> 15 AA14039   1
>>> 22 AA14251 786
>>>
>>> On Mon, Mar 13, 2017 at 11:51 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>>> 1. Your code is incorrect. All entries are character strings and must
>>>> be quoted.
>>>>
>>>> 2. See ?grep and note in particular (in the "Value" section):
>>>>
>>>> "grep(value = TRUE) returns a character vector containing the selected
>>>> elements of x (after coercion, preserving names but no other
>>>> attributes)."
>>>>
>>>> 3. While the fixed = TRUE option will work here, you may wish to learn
>>>> about "regular expressions", which can come in very handy for
>>>> character string manipulation tasks. ?regex in R has a terse, but I
>>>> have found comprehensible, discussion. There are many good gentler
>>>> tutorials on the web, also.
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>> Bert Gunter
>>>>
>>>> "The trouble with having an open mind is that people keep coming along
>>>> and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>
>>>> On Sun, Mar 12, 2017 at 8:32 PM, roslinazairimah zakaria <roslinaump at gmail.com> wrote:
>>>>> Dear r-users,
>>>>>
>>>>> I have this list of student IDs,
>>>>>
>>>>> dt <- c(AA14068, AA13194, AE11054, AA12251, AA13228, AA13286, AA14090,
>>>>> AA13256, AA13260, AA13291, AA14099, AA15071, AA13143, AA14012, AA14039,
>>>>> AA15018, AA13234, AA13149, AA13282, AA13218)
>>>>>
>>>>> and I would like to extract only the students whose ID starts with AA14.
>>>>>
>>>>> I searched and tried substr, subset and select, but they failed.
>>>>>
>>>>> > substr(FKASA$STUDENT_ID, 2, nchar(string1))
>>>>> Error in nchar(string1) : 'nchar()' requires a character vector
>>>>> > subset(FKASA, STUDENT_ID=="AA14" )
>>>>>  [1] FAC_CODE FACULTY STUDENT_ID NAME PROGRAM KURSUS
>>>>>      CGPA ACT_SS ACT_VAL ACT_CS ACT_LED ACT_PS ACT_IM
>>>>> [14] ACT_ENT ACT_CRE ACT_UNI ACT_VOL...
>>>>>
>>>>> Thank you so much for your help.
>>>>>
>>>>> How do I do it?
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.

--
Roslinazairimah Zakaria
Tel: +609-5492370; Fax. No.+609-5492766
Email: roslinazairimah at ump.edu.my; roslinaump at gmail.com
Faculty of Industrial Sciences & Technology
University Malaysia Pahang
Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia
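The worry in the message above, that converting the ID to character loses the CGPA, can be avoided by building a logical index from the ID column and using it to subset the rows of the whole data frame. The following is a minimal sketch, not the approach any poster in the thread proposed: the data frame name `dt`, the five rows (copied from the dput of dt_all2), and the pattern `^[A-Z]{2}14` (any two capital letters followed by 14) are assumptions chosen for illustration.

```r
# Hypothetical five-row subset of dt_all2; values copied from the dput above.
dt <- data.frame(
  STUDENT_ID = factor(c("EC14021", "EB14059", "AB15103", "MA14115", "KA13143")),
  PROGRAM    = c("IJAZAH SARJANA MUDA", "DIPLOMA", "DIPLOMA",
                 "IJAZAH SARJANA MUDA", "IJAZAH SARJANA MUDA"),
  CGPA       = c(2.42, 3.27, 2.24, 3.31, 3.61)
)

# Logical index: IDs that start with two capital letters followed by "14".
# as.character() is applied only inside the test; dt itself is not modified.
keep <- grepl("^[A-Z]{2}14", as.character(dt$STUDENT_ID))

# Subsetting rows of the data frame keeps the other columns aligned,
# so each selected ID still carries its CGPA.
res <- dt[keep, c("STUDENT_ID", "CGPA")]
print(res)
```

Because the index is applied to the rows of `dt` rather than to the ID vector on its own, no separate helper column (such as the `y` in the earlier example) is needed.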
Hello,

I find your request a bit confusing: you say you want to filter by student ID, but dt_all3 contains several years and only one program ("IJAZAH SARJANA MUDA"). So I've written two simple functions, one to filter by year and the other by program.

# Keep the rows whose STUDENT_ID has `year` as its 3rd and 4th characters.
fun1 <- function(x, year){
    inx <- substr(x[["STUDENT_ID"]], 3, 4) == as.character(year)
    x[inx, ]
}

# Keep the rows whose PROGRAM equals `program`.
fun2 <- function(x, program){
    inx <- x[["PROGRAM"]] == program
    x[inx, ]
}

fun1(dt_all2, 14)   # filter by year = 14
fun2(dt_all2, "IJAZAH SARJANA MUDA")   # filter by program

Hope this helps,

Rui Barradas
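Since both of Rui's functions take a data frame and return a data frame, they can be chained to apply the year and program filters together. The sketch below repeats the two functions so that it runs on its own. The data frame `dt`, the result name `both`, and the final `droplevels()` call are additions for illustration and not part of Rui's reply. `droplevels()` is included because subsetting a data frame keeps unused factor levels, whereas the desired dt_all3 lists only the levels that actually occur.

```r
# Rui's two filters, repeated here so the block is self-contained.
fun1 <- function(x, year) {
  inx <- substr(x[["STUDENT_ID"]], 3, 4) == as.character(year)
  x[inx, ]
}
fun2 <- function(x, program) {
  inx <- x[["PROGRAM"]] == program
  x[inx, ]
}

# Hypothetical five-row subset of dt_all2, with factor columns as in the dput.
dt <- data.frame(
  FAC_CODE   = factor(c("FKEE", "FKEE", "FKASA", "FKM", "FKKSA")),
  STUDENT_ID = factor(c("EC14021", "EB14059", "AB15103", "MA14115", "KA13143")),
  PROGRAM    = factor(c("IJAZAH SARJANA MUDA", "DIPLOMA", "DIPLOMA",
                        "IJAZAH SARJANA MUDA", "IJAZAH SARJANA MUDA")),
  CGPA       = c(2.42, 3.27, 2.24, 3.31, 3.61)
)

# Year 14 first, then the degree program; drop factor levels no longer used.
both <- droplevels(fun2(fun1(dt, 14), "IJAZAH SARJANA MUDA"))
print(both)
print(levels(both$PROGRAM))
```

The order of the two filters does not matter here, since each one only removes rows.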
roslinazairimah zakaria
2017-Mar-15 22:34 UTC
[R] Extract student ID that match certain criteria
Hi Rui,

Both functions work beautifully. I really appreciate your help, and everyone else's, very much.

Thank you.