roslinazairimah zakaria
2017-Mar-15 13:49 UTC
[R] Extract student ID that match certain criteria
Hi Caitlin,

I tried the approaches you suggested, without success, and I realise that I
need to filter the student IDs together with their CGPA: if I convert the IDs to
character, I lose the CGPA values. This is easy to do in Excel, but it is
time-consuming, so I am trying to do it in R.
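One likely reason the CGPA values go missing is that the ID column gets copied out of the data frame into a separate vector, as in x <- as.character(FKASA$STUDENT_ID) later in this thread. The sketch below builds a small hypothetical data frame, students, from four rows of dt_all2. Converting the column in place keeps each ID on the same row as its CGPA.

```r
# Hypothetical sample: four rows of dt_all2, with STUDENT_ID stored as a factor
students <- data.frame(
  STUDENT_ID = factor(c("EC14021", "EB14059", "AB15103", "PA13048")),
  CGPA       = c(2.42, 3.27, 2.24, 2.88)
)

# Convert the ID column in place rather than copying it into a new vector,
# so every ID stays on the same row as its CGPA
students$STUDENT_ID <- as.character(students$STUDENT_ID)

str(students)
```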
I have these data:

> dput(dt_all2)
structure(list(FAC_CODE = structure(c(2L, 2L, 2L, 4L, 1L, 1L,
4L, 7L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 2L, 5L, 6L), .Label = c("FKASA",
"FKEE", "FKKSA", "FKM", "FKP", "FSKKP", "FTK"), class = "factor"),
    STUDENT_ID = structure(c(9L, 6L, 7L, 17L, 2L, 3L, 18L, 19L,
    13L, 12L, 14L, 15L, 16L, 10L, 8L, 1L, 5L, 11L, 4L), .Label = c("AA14068",
    "AB15103", "AB15124", "CC14107", "EA13043", "EB14059", "EB14073",
    "EB14101", "EC14021", "EC15063", "FB14085", "KA13142", "KA13143",
    "KA13156", "KE13034", "KE13046", "MA14071", "MA14115", "PA13048"
    ), class = "factor"), PROGRAM = structure(c(2L, 1L, 1L, 2L,
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L
    ), .Label = c("DIPLOMA", "IJAZAH SARJANA MUDA"), class = "factor"),
    CGPA = c(2.42, 3.27, 1.98, 2.85, 2.24, 3.01, 3.31, 2.88,
    3.61, 3.69, 3.2, 3.85, 3.63, 2.67, 2.35, 2.74, 1.96, 2.89,
    2.59)), .Names = c("FAC_CODE", "STUDENT_ID", "PROGRAM", "CGPA"
), class = "data.frame", row.names = c(NA, -19L))

and I want the filtered result to look like this:

> dput(dt_all3)
structure(list(FAC_CODE = structure(c(2L, 2L, 4L, 4L, 5L, 1L,
6L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("FKASA", "FKEE", "FKKSA",
"FKM", "FKP", "FTK"), class = "factor"), STUDENT_ID = structure(c(4L,
3L, 11L, 12L, 5L, 1L, 13L, 7L, 6L, 8L, 9L, 10L, 2L), .Label = c("AA14068",
"EA13043", "EC14021", "EC15063", "FB14085", "KA13142", "KA13143",
"KA13156", "KE13034", "KE13046", "MA14071", "MA14115", "PA13048"
), class = "factor"), PROGRAM = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "IJAZAH SARJANA MUDA", class = "factor"),
    CGPA = c(2.67, 2.42, 2.85, 3.31, 2.89, 2.74, 2.88, 3.61,
    3.69, 3.2, 3.85, 3.63, 1.96)), .Names = c("FAC_CODE", "STUDENT_ID",
"PROGRAM", "CGPA"), class = "data.frame", row.names = c(NA, -13L
))

I would like to select the student IDs by their third and fourth characters,
which encode the year of registration (e.g. AA15..., AE14...), and I would
also like to keep their CGPA values.
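Here is a minimal sketch of the selection described above. It assumes a hypothetical data frame, dt, holding the IDs and CGPA from the first eight rows of dt_all2. substr() extracts characters 3 to 4 as the year, and the resulting logical vector indexes whole rows, so the matching CGPA values come along.

```r
# Hypothetical sample: IDs and CGPA from the first eight rows of dt_all2
dt <- data.frame(
  STUDENT_ID = factor(c("EC14021", "EB14059", "EB14073", "MA14071",
                        "AB15103", "AB15124", "MA14115", "PA13048")),
  CGPA = c(2.42, 3.27, 1.98, 2.85, 2.24, 3.01, 3.31, 2.88)
)

# Characters 3-4 of each ID encode the registration year
year <- substr(as.character(dt$STUDENT_ID), 3, 4)

# Logical row index: keep the 2014 rows, and only the two wanted columns
year14 <- dt[year == "14", c("STUDENT_ID", "CGPA")]
year14
```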
Thank you.

On Mon, Mar 13, 2017 at 2:26 PM, roslinazairimah zakaria <roslinaump at gmail.com> wrote:
> Thank you so much for your help.
>
> On Mon, Mar 13, 2017 at 1:52 PM, bioprogrammer <bioprogrammer at gmail.com> wrote:
>
>> Hi.
>>
>> I would use the "substr" function:
>>
>> https://stat.ethz.ch/R-manual/R-devel/library/base/html/substr.html
>>
>> ...assuming you're working with character data.
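The IDs in this thread are stored as a factor (see the dput() output above). The nchar() error in the original question also points to a non-character input, so converting with as.character() first is the safe route. Here is a minimal sketch with a hypothetical ids vector:

```r
# Hypothetical IDs stored as a factor, as read.csv() did by default before R 4.0.0
ids <- factor(c("AA14068", "AB15103", "CC14107", "PA13048"))

# Convert factor labels to plain strings before using string functions
ids_chr <- as.character(ids)

# substr(x, start, stop): characters 3 to 4 hold the two-digit year
years <- substr(ids_chr, 3, 4)
years
```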
>>
>> Another useful skill involves working with regular expressions.
>>
>> http://www.endmemo.com/program/R/grep.php
>>
>> http://regular-expressions.mobi/tutorial.html
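Here is a minimal regular-expression sketch of the same idea. It assumes every ID is two capital letters followed by a two-digit year, which is based only on the IDs seen in this thread. grepl() gives a logical index, and grep(value = TRUE) gives the matching IDs.

```r
ids <- c("AA14068", "AB15103", "CC14107", "PA13048", "FB14085")

# Pattern: start of string, two capital letters, then the year "14"
pattern <- "^[A-Z]{2}14"

# grepl() returns TRUE/FALSE per element, handy for indexing data frame rows;
# grep(value = TRUE) returns the matching strings themselves
grepl(pattern, ids)
grep(pattern, ids, value = TRUE)
```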
>>
>> Hope these help :)
>>
>> ~Caitlin
>>
>> -------- Original message --------
>> From: roslinazairimah zakaria <roslinaump at gmail.com>
>> Date:03/12/2017 10:18 PM (GMT-07:00)
>> To: Bert Gunter <bgunter.4567 at gmail.com>
>> Cc: r-help mailing list <r-help at r-project.org>
>> Subject: Re: [R] Extract student ID that match certain criteria
>>
>> Another question:
>>
>> How do I extract IDs based on their third and fourth characters?
>>
>> For example, I have AA14004, AB15035, CB14024, PA14009, PA14009, etc.
>>
>> I would like to extract the IDs AB14..., CB14..., PA14...
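Here is a minimal sketch for this question, using the example IDs above. Comparing characters 3 to 4 selects year 14 under any prefix. Comparing the first four characters against a list selects only the listed prefix and year combinations; the wanted vector is illustrative.

```r
ids <- c("AA14004", "AB15035", "CB14024", "PA14009", "PA14009")

# Any prefix, year 14: compare characters 3-4 only
any14 <- ids[substr(ids, 3, 4) == "14"]

# Only specific prefix/year combinations: compare the first four characters
wanted <- c("AB14", "CB14", "PA14")
listed <- ids[substr(ids, 1, 4) %in% wanted]

any14
listed
```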
>>
>> On Mon, Mar 13, 2017 at 12:37 PM, roslinazairimah zakaria <roslinaump at gmail.com> wrote:
>>
>> > Hi Bert,
>> >
>> > Thank you so much for your help. However, I am not really sure what the
>> > y values are for. Can we do without them?
>> >
>> > x <- as.character(FKASA$STUDENT_ID)
>> > y <- c(1,786)
>> > My.Data <- data.frame (x,y)
>> >
>> > My.Data[grep("^AA14", My.Data$x), ]
>> >
>> > I got the following data:
>> >
>> >          x   y
>> > 1  AA14068   1
>> > 7  AA14090   1
>> > 11 AA14099   1
>> > 14 AA14012 786
>> > 15 AA14039   1
>> > 22 AA14251 786
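The y column plays no part in the matching, so it can be dropped. Here is a minimal sketch using a few of the IDs from this thread. grep(value = TRUE) returns the matching IDs directly, and plain grep() returns their positions.

```r
x <- c("AA14068", "AA13194", "AA14090", "AA13256", "AA14099")

# Matching IDs as a character vector; no helper column needed
grep("^AA14", x, value = TRUE)

# Positions of the matches, if they are needed to index something else
grep("^AA14", x)
```

To keep other columns such as CGPA, use the same pattern to index the original data frame instead, e.g. FKASA[grepl("^AA14", FKASA$STUDENT_ID), ]. grepl() coerces a factor column with as.character() before matching.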
>> >
>> > On Mon, Mar 13, 2017 at 11:51 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> >
>> >> 1. Your code is incorrect. All entries are character strings and must be
>> >> quoted.
>> >>
>> >> 2. See ?grep and note in particular (in the "Value" section):
>> >>
>> >> "grep(value = TRUE) returns a character vector containing the selected
>> >> elements of x (after coercion, preserving names but no other
>> >> attributes)."
>> >>
>> >>
>> >> 3. While the fixed = TRUE option will work here, you may wish to learn
>> >> about "regular expressions", which can come in very handy for
>> >> character string manipulation tasks. ?regex in R has a terse, but I
>> >> have found comprehensible, discussion. There are many good, gentler
>> >> tutorials on the web, also.
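Bert's three points can be sketched together using the IDs from the original question. The strings are quoted, and grep(value = TRUE) returns the matches themselves. Here the fixed = TRUE search gives the same result as the anchored regular expression ^AA14; the two would differ only if "AA14" could also occur later in an ID.

```r
# Point 1: the IDs are strings, so each one must be quoted
dt <- c("AA14068", "AA13194", "AE11054", "AA12251", "AA13228", "AA13286",
        "AA14090", "AA13256", "AA13260", "AA13291", "AA14099", "AA15071",
        "AA13143", "AA14012", "AA14039", "AA15018", "AA13234", "AA13149",
        "AA13282", "AA13218")

# fixed = TRUE: "AA14" is matched literally, anywhere in the string
by_fixed <- grep("AA14", dt, fixed = TRUE, value = TRUE)

# Regular expression: "^" anchors the match to the start of the ID
by_regex <- grep("^AA14", dt, value = TRUE)

by_fixed
identical(by_fixed, by_regex)
```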
>> >>
>> >>
>> >> Cheers,
>> >> Bert
>> >>
>> >> Bert Gunter
>> >>
>> >> "The trouble with having an open mind is that people keep coming along
>> >> and sticking things into it."
>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >>
>> >>
>> >> On Sun, Mar 12, 2017 at 8:32 PM, roslinazairimah zakaria
>> >> <roslinaump at gmail.com> wrote:
>> >> > Dear r-users,
>> >> >
>> >> > I have this list of student ID,
>> >> >
>> >> > dt <- c(AA14068, AA13194, AE11054, AA12251, AA13228, AA13286, AA14090,
>> >> > AA13256, AA13260, AA13291, AA14099, AA15071, AA13143, AA14012, AA14039,
>> >> > AA15018, AA13234, AA13149, AA13282, AA13218)
>> >> >
>> >> > and I would like to extract only the students whose IDs start with AA14.
>> >> >
>> >> > I searched and tried substr, subset and select, but they all failed.
>> >> >
>> >> > substr(FKASA$STUDENT_ID, 2, nchar(string1))
>> >> > Error in nchar(string1) : 'nchar()' requires a character vector
>> >> > > subset(FKASA, STUDENT_ID=="AA14" )
>> >> > [1] FAC_CODE FACULTY STUDENT_ID NAME PROGRAM KURSUS CGPA ACT_SS ACT_VAL ACT_CS ACT_LED ACT_PS ACT_IM
>> >> > [14] ACT_ENT ACT_CRE ACT_UNI ACT_VOL...
>> >> >
>> >> > Thank you so much for your help.
>> >> >
>> >> > How do I do it?
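subset(FKASA, STUDENT_ID == "AA14") returns zero rows because == tests whole-string equality, and no ID is exactly "AA14". The sketch below uses base R's startsWith() (available since R 3.3.0) on a hypothetical stand-in for FKASA, made of four rows of dt_all2.

```r
# Hypothetical stand-in for FKASA: four rows of dt_all2
fkasa <- data.frame(
  STUDENT_ID = factor(c("AA14068", "AB15103", "AB15124", "EA13043")),
  CGPA       = c(2.74, 2.24, 3.01, 1.96)
)

# == compares whole strings, so no ID equals "AA14" and no rows are returned
nrow(subset(fkasa, STUDENT_ID == "AA14"))

# startsWith() tests a prefix; it needs a character vector, not a factor
aa14 <- subset(fkasa, startsWith(as.character(STUDENT_ID), "AA14"))
aa14
```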
>> >> > --
>> >> > *Roslinazairimah Zakaria*
>> >> > *Tel: +609-5492370; Fax. No. +609-5492766*
>> >> >
>> >> > *Email: roslinazairimah at ump.edu.my; roslinaump at gmail.com*
>> >> > Faculty of Industrial Sciences & Technology
>> >> > University Malaysia Pahang
>> >> > Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia
>> >> >
>> >> > ______________________________________________
>> >> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> > and provide commented, minimal, self-contained, reproducible code.

Rui Barradas
2017-Mar-15
[R] Extract student ID that match certain criteria
Hello,
I believe your request is a bit confusing, since you say you want to
filter the student IDs, but dt_all3 contains many years and only
one program ("IJAZAH SARJANA MUDA"). So I've written two simple
functions, one to filter by year and the other by program.
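# fun1: keep the rows whose STUDENT_ID has the given year in characters 3-4
fun1 <- function(x, year){
    inx <- substr(x[["STUDENT_ID"]], 3, 4) == as.character(year)
    x[inx, ]
}

# fun2: keep the rows whose PROGRAM equals the given program name
fun2 <- function(x, program){
    inx <- x[["PROGRAM"]] == program
    x[inx, ]
}

fun1(dt_all2, 14)  # filter by year = 14
fun2(dt_all2, "IJAZAH SARJANA MUDA")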
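The two functions can be chained, for example to get the 2014 bachelor's-degree (IJAZAH SARJANA MUDA) students with their CGPA. One caveat: as.character(5) is "5", which never equals a two-character substring such as "05". The sketch below uses a hypothetical variant, fun1_padded(), that zero-pads the year with sprintf(), and a sample of five rows from dt_all2. fun2 is copied from above.

```r
# Sample: rows 1, 2, 5, 7 and 8 of dt_all2
dt <- data.frame(
  STUDENT_ID = factor(c("EC14021", "EB14059", "AB15103", "MA14115", "PA13048")),
  PROGRAM    = factor(c("IJAZAH SARJANA MUDA", "DIPLOMA", "DIPLOMA",
                        "IJAZAH SARJANA MUDA", "IJAZAH SARJANA MUDA")),
  CGPA       = c(2.42, 3.27, 2.24, 3.31, 2.88)
)

# Hypothetical variant of fun1: zero-pads the year, so 5 becomes "05"
fun1_padded <- function(x, year){
    inx <- substr(x[["STUDENT_ID"]], 3, 4) == sprintf("%02d", as.integer(year))
    x[inx, ]
}

# fun2 as defined above
fun2 <- function(x, program){
    inx <- x[["PROGRAM"]] == program
    x[inx, ]
}

# Bachelor's-degree students registered in 2014, with their CGPA
res <- fun1_padded(fun2(dt, "IJAZAH SARJANA MUDA"), 14)[, c("STUDENT_ID", "CGPA")]
res
```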
Hope this helps,
Rui Barradas
roslinazairimah zakaria
2017-Mar-15 22:34 UTC
[R] Extract student ID that match certain criteria
Hi Rui,

Both functions work beautifully. I really appreciate your help, and everyone
else's, very much.

Thank you

On Wed, Mar 15, 2017 at 10:46 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> So I've written two simple functions, one to filter by year and the other by program.