thr3ads.net - R help - [R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display [Dec 2014]


bcrombie

2014-Dec-18 03:15 UTC

[R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display

# I have a dataframe that contains 2 columns:
CaseID  <- c('1015285',
'1005317',
'1012281',
'1015285',
'1015285',
'1007183',
'1008833',
'1015315',
'1015322',
'1015285')

Primary.Viol.Type <- c('AS.Age',
'HS.Hours',
'HS.Hours',
'HS.Hours',
'RK.Records_CL',
'OT.Overtime',
'OT.Overtime',
'OT.Overtime',
'V.Poster_Other',
'V.Poster_Other')

PViol.Type.Per.Case.Original <- data.frame(CaseID,Primary.Viol.Type)

# CaseID's can be repeated because there can be up to 14 Primary.Viol.Type's
# per CaseID.

# I want to transform this dataframe into one that has 15 columns, where the
# first column is CaseID, and the rest are the 14 primary viol. types.  The
# CaseID column will contain a list of the unique CaseID's (no replicates) and
# for each of their rows, there will be a "1" under a column corresponding to
# a primary violation type recorded for that CaseID.  So, technically, there
# could be zero to 14 "1"s in a CaseID's row.

# For example, the row for CaseID '1015285' above would have a "1" under
# "AS.Age", "HS.Hours", "RK.Records_CL", and "V.Poster_Other", but have "NA"
# under the rest of the columns.

PViol.Type <- c("CaseID",
                "BW.BackWages",
           "LD.Liquid_Damages",
           "MW.Minimum_Wage",
           "OT.Overtime",
           "RK.Records_FLSA",
           "V.Poster_Other",
           "AS.Age",
           "BW.WHMIS_BackWages",
           "HS.Hours",
           "OA.HazOccupationAg",
           "ON.HazOccupationNonAg",
           "R3.Reg3AgeOccupation",
           "RK.Records_CL",
           "V.Other")

PViol.Type.Columns <- t(data.frame(PViol.Type))

# What is the best way to do this in R?




--
View this message in context:
http://r.789695.n4.nabble.com/Make-2nd-col-of-2-col-df-into-header-row-of-same-df-then-adjust-col1-data-display-tp4700878.html
Sent from the R help mailing list archive at Nabble.com.

Boris Steipe

2014-Dec-18 10:28 UTC

What you are describing sounds like a very spreadsheet-y thing. 

- The information is already IN your dataframe, and easy to get out by
subsetting. Depending on your use case, that may actually be the "best".

- If the number of CaseIDs is large, I would use a hash of lists (if the data is
sparse), or hash of named vectors if it's not sparse. Lookup is O(1) so that
may be the best. (Cf package hash, and explanations there).

- If it must be the spreadsheet-y thing, you could make a matrix with rownames
and colnames taken from unique() of the respective columns of your dataframe.
Instead of 1 and NA I probably would use TRUE/FALSE.

- If it takes less time to wait for the results than to look up how apply()
works, you can write a simple loop to populate your matrix. Otherwise apply() is
much faster.

- You could even use a loop to build the data structure, checking for every
cbind() whether the value in column 1 already exists in the table - but
that's terrible and would make a kitten die somewhere on every iteration.
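
The matrix option in the list above can be sketched in a few lines of base R. This is only an illustration under assumptions: the data frame `df` is a shortened, made-up stand-in for the poster's data, and the names `ids`, `types` and `m` are hypothetical.

```r
# Toy data mirroring the structure in the original post (values shortened).
df <- data.frame(
  CaseID = c("1015285", "1005317", "1015285", "1007183"),
  Primary.Viol.Type = c("AS.Age", "HS.Hours", "HS.Hours", "OT.Overtime"),
  stringsAsFactors = FALSE
)

ids   <- unique(df$CaseID)
types <- unique(df$Primary.Viol.Type)

# Start with an all-FALSE logical matrix: one row per case, one column per type.
m <- matrix(FALSE, nrow = length(ids), ncol = length(types),
            dimnames = list(ids, types))

# Indexing by a two-column character matrix sets one cell per row of df,
# matched against the row names and column names of m.
m[cbind(df$CaseID, df$Primary.Viol.Type)] <- TRUE

m
```

No explicit loop or apply() call is needed here, because matrix indexing handles all (case, type) pairs at once.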

All of these are possible, and you haven't told us enough about what you
want to achieve to figure out what the "best" is. If you choose one of
the options and need help with the code, let us know.

Cheers,
B.






Crombie, Burnette N

2014-Dec-18 13:09 UTC

I want to achieve a table that looks like a grid of 1's for all cases in a
survey.  I'm an R beginner and don't have a clue how to do all the
things you just suggested.  I really appreciate the time you took to explain all
of those options, though.  -- BNC


John Kane

2014-Dec-18 15:05 UTC

Of course, but why? As Boris S says, you have not given us enough information
to know exactly what you are after.

Have a look at https://github.com/hadley/devtools/wiki/Reproducibility or
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
for some information on how to form a question for the list.

It is good that you provided some data but it is better to use dput() (see links
above or ?dput) to supply the data as different R users have different settings
on their systems and may not read that data in the same way.
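
As a rough illustration of the dput() advice above, the sketch below builds a small made-up data frame (the values and the names `dat`, `txt` and `dat2` are hypothetical), prints it with dput(), and then rebuilds it from the printed text to show that nothing is lost on the way.

```r
# A small data frame standing in for the poster's data (hypothetical values).
dat <- data.frame(
  id   = c("1015285", "1005317", "1015285"),
  type = c("AS.Age", "HS.Hours", "HS.Hours"),
  stringsAsFactors = TRUE
)

# dput() prints R code that rebuilds the object, including factor levels,
# so a reader can paste it into their own session.
dput(dat)

# Capture that text and evaluate it to rebuild the object from the code alone.
txt  <- paste(capture.output(dput(dat)), collapse = "\n")
dat2 <- eval(parse(text = txt))
all.equal(dat, dat2)
```

Pasting the dput() output into a post avoids differences caused by each reader's import settings.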

Note that I have simplified your incredibly verbose names and put everything
into lower case (see ?tolower) just to make life easier. Because R is
case-sensitive, it is usually easier to keep to lower case as much as possible,
particularly when posting to the list, and to use simple variable names, since
the actual variables are likely to be meaningless to the reader and long upper
case names just make for more typing.

In any case here is a quick and dirty semi-solution using the reshape2 package,
which I imagine you will have to install using
install.packages("reshape2").

Depending on exactly what you need to know there may be, as Boris S says, many
different and better approaches. While we really don't need the actual
variable names, we need an overall idea of what you are doing in substantive
terms and what the final results should be.

Anyway welcome to the R-help list

#======== start code ========
library(reshape2)
dat1 <- structure(list(
  id = structure(c(5L, 1L, 4L, 5L, 5L, 2L, 3L, 6L, 7L, 5L),
    .Label = c("1005317", "1007183", "1008833", "1012281",
               "1015285", "1015315", "1015322"),
    class = "factor"),
  type = structure(c(1L, 2L, 2L, 2L, 4L, 3L, 3L, 3L, 5L, 5L),
    .Label = c("as.age", "hs.hours", "ot.overtime",
               "rk.records_cl", "v.poster_other"),
    class = "factor")),
  .Names = c("id", "type"), row.names = c(NA, -10L), class = "data.frame")

# one row per id, one column per type; combinations not in the data are NA
dcast(dat1, id ~ type)

#======== end code ========
John Kane
Kingston ON Canada


Jeff Newmiller

2014-Dec-18 16:02 UTC

No guarantees on "best"... but one way using base R could be:

# Note that "CaseID" is actually not a valid PViol.Type as you had it
PViol.Type <- c( "BW.BackWages"
                , "LD.Liquid_Damages"
                , "MW.Minimum_Wage"
                , "OT.Overtime"
                , "RK.Records_FLSA"
                , "V.Poster_Other"
                , "AS.Age"
                , "BW.WHMIS_BackWages"
                , "HS.Hours"
                , "OA.HazOccupationAg"
                , "ON.HazOccupationNonAg"
                , "R3.Reg3AgeOccupation"
                , "RK.Records_CL"
                , "V.Other" )

# explicitly specifying all levels to the factor ensures a complete
# set of column outputs regardless of what is in the input
PViol.Type.Per.Case.Original <-
     data.frame( CaseID
               , Primary.Viol.Type=factor( Primary.Viol.Type
                                         , levels=PViol.Type ) )

tmp <- table( PViol.Type.Per.Case.Original )
ans <- data.frame( CaseID=rownames( tmp )
                  , as.data.frame( ifelse( 0==tmp, NA, 1 ) )
                  )
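
To see why the explicit levels matter, here is a minimal sketch on made-up data. The vectors `id`, `type_seen` and `all_types` are hypothetical and the list of types is shortened to four entries.

```r
type_seen <- c("AS.Age", "HS.Hours", "HS.Hours")
all_types <- c("AS.Age", "HS.Hours", "OT.Overtime", "V.Other")  # shortened list
id        <- c("1015285", "1005317", "1015285")

# Without explicit levels, only the types present in the data become columns.
tab_seen <- table(id, factor(type_seen))
ncol(tab_seen)

# With explicit levels, every type gets a column, even if its count is zero.
tab_all <- table(id, factor(type_seen, levels = all_types))
ncol(tab_all)
colnames(tab_all)
```

The zero-count columns in `tab_all` are the ones that end up as all-NA columns after the ifelse() step.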


---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k

Chel Hee Lee

2014-Dec-18 19:43 UTC

I like the approach presented by Jeff Newmiller in the previous post (I
really like his way).  As he suggested, it is good to start with a 'factor',
since you know all the possible values of 'Primary.Viol.Type'.  You could
also use the 'split()' function to create the table that you wish to build.
Please see below (I hope this helps):

 > PViol.Type.Per.Case.Original$Primary.Viol.Type <-
 +     factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
 >
 > tmp <- split(PViol.Type.Per.Case.Original,
 +              PViol.Type.Per.Case.Original$CaseID)
 > ans <- ifelse(do.call(rbind, lapply(tmp, function(x)
 +     table(x$Primary.Viol.Type))), 1, NA)
 > ans
         CaseID BW.BackWages LD.Liquid_Damages MW.Minimum_Wage OT.Overtime
1005317     NA           NA                NA              NA          NA
1007183     NA           NA                NA              NA           1
1008833     NA           NA                NA              NA           1
1012281     NA           NA                NA              NA          NA
1015285     NA           NA                NA              NA          NA
1015315     NA           NA                NA              NA           1
1015322     NA           NA                NA              NA          NA
         RK.Records_FLSA V.Poster_Other AS.Age BW.WHMIS_BackWages HS.Hours
1005317              NA             NA     NA                 NA        1
1007183              NA             NA     NA                 NA       NA
1008833              NA             NA     NA                 NA       NA
1012281              NA             NA     NA                 NA        1
1015285              NA              1      1                 NA        1
1015315              NA             NA     NA                 NA       NA
1015322              NA              1     NA                 NA       NA
         OA.HazOccupationAg ON.HazOccupationNonAg R3.Reg3AgeOccupation
1005317                 NA                    NA                   NA
1007183                 NA                    NA                   NA
1008833                 NA                    NA                   NA
1012281                 NA                    NA                   NA
1015285                 NA                    NA                   NA
1015315                 NA                    NA                   NA
1015322                 NA                    NA                   NA
         RK.Records_CL V.Other
1005317            NA      NA
1007183            NA      NA
1008833            NA      NA
1012281            NA      NA
1015285             1      NA
1015315            NA      NA
1015322            NA      NA
 >
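
For comparison, the following sketch checks on made-up data that the split() route gives the same counts as a direct call to table(). The data frame `df` and the names `by_case`, `via_split` and `via_table` are hypothetical, and the list of types is shortened.

```r
df <- data.frame(
  CaseID = c("1015285", "1005317", "1015285", "1007183"),
  Primary.Viol.Type = factor(
    c("AS.Age", "HS.Hours", "HS.Hours", "OT.Overtime"),
    levels = c("AS.Age", "HS.Hours", "OT.Overtime", "V.Other")  # shortened list
  )
)

# split() approach: one count vector per case, stacked into a matrix by rbind.
by_case   <- split(df, df$CaseID)
via_split <- do.call(rbind,
                     lapply(by_case, function(x) table(x$Primary.Viol.Type)))

# table() approach: cross-tabulate the two columns directly.
via_table <- unclass(table(df$CaseID, df$Primary.Viol.Type))

# Both give the same counts, with rows sorted by CaseID in each case.
all(via_split == via_table)
```

Either matrix can then be passed through ifelse() to turn zero counts into NA.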

Chel Hee Lee


Jeff Newmiller

2014-Dec-19 04:21 UTC

Please keep the list in the loop.

Take a look at my code again... the factor function can accept a vector of all
levels you want it to include.
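
Applied to data read from a file, the idea might look like the sketch below. The CSV contents, the object name `real` and the shortened `all_types` vector are made up for illustration; the real file and its full list of 14 types are not reproduced here.

```r
# Stand-in for the real CSV file (hypothetical contents); text= avoids a file.
csv_text <- "CaseID,Primary.Viol.Type
1015285,AS.Age
1005317,HS.Hours
1015285,HS.Hours"
real <- read.csv(text = csv_text, stringsAsFactors = TRUE)

all_types <- c("AS.Age", "HS.Hours", "OT.Overtime", "V.Other")  # shortened list

# read.csv() only creates levels for values that occur in the file ...
nlevels(real$Primary.Viol.Type)

# ... so rebuild the factor with the full set of levels before tabulating.
real$Primary.Viol.Type <- factor(real$Primary.Viol.Type, levels = all_types)
real$CaseID <- factor(real$CaseID)

tmp <- table(real)
colnames(tmp)
```

The key point is that factor() is applied to the column of the imported data frame, not to a free-standing vector.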

On December 18, 2014 7:35:28 PM PST, "Crombie, Burnette N"
<bcrombie at utk.edu> wrote:
>Jeff, your code works fabulously on the dataset I submitted with my
>question, but I can't get it to retain all 14 of the PViol.Type's with
>my real dataset which is imported as a csv.  Do you have any ideas how
>to fix this?
>
>##########
>MERGE_PViol.Detail.Per.Case <-
>read.csv("~/FOIA_FLSA/MERGE_PViol.Detail.Per.Case_for_rtf10.csv",
>stringsAsFactors=TRUE)
>
>###select only certain columns from dataset
>PViol.Type.Per.Case <- MERGE_PViol.Detail.Per.Case[,c("CaseID",
>"Primary.Viol.Type")]
>PViol.Type.Per.Case$CaseID <- factor(PViol.Type.Per.Case$CaseID)
>str(PViol.Type.Per.Case)
>### 'data.frame':	13 obs. of  2 variables:
>###  $ CaseID           : Factor w/ 8 levels "1005317","1007183",..: 5 1 4 5 5 2 3 6 7 8 ...
>###  $ Primary.Viol.Type: Factor w/ 5 levels "AS.Age","HS.Hours",..: 1 2 2 2 4 3 3 3 3 3 ...
>
>
>PViol.Type <- c("BW.BackWages",
>                "LD.Liquid_Damages",
>                "MW.Minimum_Wage",
>                "OT.Overtime",
>                "RK.Records_FLSA",
>                "V.Poster_Other",
>                "AS.Age",
>                "BW.WHMIS_BackWages",
>                "HS.Hours",
>                "OA.HazOccupationAg",
>                "ON.HazOccupationNonAg",
>                "R3.Reg3AgeOccupation",
>                "RK.Records_CL",
>                "V.Other")
>
>###### Jeff Newmiller (RHelp)
>#PViol.Type.Per.Case <- data.frame( CaseID,
>#    Primary.Viol.Type=factor( Primary.Viol.Type, levels=PViol.Type ) )
>
>tmp <- table( PViol.Type.Per.Case )
>ans <- data.frame( CaseID=rownames( tmp ),
>                   as.data.frame( ifelse( 0==tmp, NA, 1 ) ) )
>##########
>
>
>
>-----Original Message-----
>From: Crombie, Burnette N 
>Sent: Thursday, December 18, 2014 11:17 AM
>To: 'Jeff Newmiller'
>Subject: RE: [R] Make 2nd col of 2-col df into header row of same df
>then adjust col1 data display
>
>Thanks so much for your time.  I will reply as soon as possible.  I've
>been pulled away from my desk -- BNC
>

Chel Hee Lee

2014-Dec-19 05:35 UTC

Please take another look at my code.  The error message says that the object
'Primary.Viol.Type' was not found.  Did you ever create a standalone object
named 'Primary.Viol.Type'?  In your imported data it exists only as a column
of the data frame, so the code should work if you replace
'Primary.Viol.Type' with
'PViol.Type.Per.Case.Original$Primary.Viol.Type'
in the call to 'factor()'.  I hope this helps.

Chel Hee Lee
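
A minimal sketch of the fix described above.  It assumes a small stand-in
data frame in place of the imported CSV (which is not available here) and a
shortened, purely illustrative list of levels.  After read.csv(),
'Primary.Viol.Type' exists only as a column of the data frame, so it has to
be reached with '$':

```r
# Stand-in for the imported data; the real CSV from the thread is not available.
PViol.Type.Per.Case.Original <- data.frame(
  CaseID            = c("1015285", "1005317", "1015285"),
  Primary.Viol.Type = c("AS.Age", "HS.Hours", "HS.Hours"),
  stringsAsFactors  = TRUE
)

# Shortened list of levels, for illustration only.
PViol.Type <- c("AS.Age", "HS.Hours", "OT.Overtime")

# Refer to the column through the data frame when building the factor;
# a bare 'Primary.Viol.Type' would be looked up in the workspace instead.
PViol.Type.Per.Case.Original$Primary.Viol.Type <-
  factor(PViol.Type.Per.Case.Original$Primary.Viol.Type, levels = PViol.Type)

# Remaining steps as in the split() approach quoted below.
tmp <- split(PViol.Type.Per.Case.Original, PViol.Type.Per.Case.Original$CaseID)
ans <- ifelse(do.call(rbind, lapply(tmp, function(x) table(x$Primary.Viol.Type))), 1, NA)
ans
```

Each row of 'ans' is one CaseID; a level that never occurs for a case (here
'OT.Overtime') stays NA.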

On 12/18/2014 08:57 PM, Crombie, Burnette N wrote:
> Chel, your solution is fantastic on the dataset I submitted in my question
> but it is not working when I import my real dataset into R.  Do I need to
> vectorize the columns in my real dataset after importing?  I tried a few
> things (###) but not making progress:
>
> MERGE_PViol.Detail.Per.Case <-
>   read.csv("~/FOIA_FLSA/MERGE_PViol.Detail.Per.Case_for_rtf10.csv",
>            stringsAsFactors=TRUE)
>
> ### select only certain columns
> PViol.Type.Per.Case.Original <-
>   MERGE_PViol.Detail.Per.Case[,c("CaseID", "Primary.Viol.Type")]
>
> ### write.csv(PViol.Type.Per.Case,file="PViol.Type.Per.Case.Select.csv")
> ### PViol.Type.Per.Case.Original <-
> ###   read.csv("~/FOIA_FLSA/PViol.Type.Per.Case.Select.csv")
> ### PViol.Type.Per.Case.Original$X <- NULL
> ### PViol.Type.Per.Case.Original[] <-
> ###   lapply(PViol.Type.Per.Case.Original, as.character)
>
> PViol.Type <- c("CaseID",
>                  "BW.BackWages",
>                  "LD.Liquid_Damages",
>                  "MW.Minimum_Wage",
>                  "OT.Overtime",
>                  "RK.Records_FLSA",
>                  "V.Poster_Other",
>                  "AS.Age",
>                  "BW.WHMIS_BackWages",
>                  "HS.Hours",
>                  "OA.HazOccupationAg",
>                  "ON.HazOccupationNonAg",
>                  "R3.Reg3AgeOccupation",
>                  "RK.Records_CL",
>                  "V.Other")
>
> PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>   factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>
> ### Error in factor(Primary.Viol.Type, levels = PViol.Type, labels =
> ###   PViol.Type) :  object 'Primary.Viol.Type' not found
>
> tmp <-
>   split(PViol.Type.Per.Case.Original,PViol.Type.Per.Case.Original$CaseID)
> ans <- ifelse(do.call(rbind, lapply(tmp,
>   function(x)table(x$Primary.Viol.Type))), 1, NA)
>
>
>
> -----Original Message-----
> From: Crombie, Burnette N
> Sent: Thursday, December 18, 2014 3:01 PM
> To: 'Chel Hee Lee'
> Subject: RE: [R] Make 2nd col of 2-col df into header row of same df then
>   adjust col1 data display
>
> Thanks for taking the time to review this, Chel.  I've got to step away
> from my desk, but will reply more substantially as soon as possible. -- BNC
>
> -----Original Message-----
> From: Chel Hee Lee [mailto:chl948 at mail.usask.ca]
> Sent: Thursday, December 18, 2014 2:43 PM
> To: Jeff Newmiller; Crombie, Burnette N
> Cc: r-help at r-project.org
> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df then
>   adjust col1 data display
>
> I like the approach presented by Jeff Newmiller in the previous post (I
> really like his way).  As he suggested, it is good to start with a
> 'factor', since you know all the possible values of 'Primary.Viol.Type'.
> You may try the 'split()' function to create the table that you wish to
> build.  Please see below (I hope this helps):
>
>   > PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>       factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>   > tmp <- split(PViol.Type.Per.Case.Original,
>                  PViol.Type.Per.Case.Original$CaseID)
>   > ans <- ifelse(do.call(rbind, lapply(tmp, function(x)
>       table(x$Primary.Viol.Type))), 1, NA)
>   > ans
>           CaseID BW.BackWages LD.Liquid_Damages MW.Minimum_Wage OT.Overtime
> 1005317     NA           NA                NA              NA          NA
> 1007183     NA           NA                NA              NA           1
> 1008833     NA           NA                NA              NA           1
> 1012281     NA           NA                NA              NA          NA
> 1015285     NA           NA                NA              NA          NA
> 1015315     NA           NA                NA              NA           1
> 1015322     NA           NA                NA              NA          NA
>           RK.Records_FLSA V.Poster_Other AS.Age BW.WHMIS_BackWages HS.Hours
> 1005317              NA             NA     NA                 NA        1
> 1007183              NA             NA     NA                 NA       NA
> 1008833              NA             NA     NA                 NA       NA
> 1012281              NA             NA     NA                 NA        1
> 1015285              NA              1      1                 NA        1
> 1015315              NA             NA     NA                 NA       NA
> 1015322              NA              1     NA                 NA       NA
>           OA.HazOccupationAg ON.HazOccupationNonAg R3.Reg3AgeOccupation
> 1005317                 NA                    NA                   NA
> 1007183                 NA                    NA                   NA
> 1008833                 NA                    NA                   NA
> 1012281                 NA                    NA                   NA
> 1015285                 NA                    NA                   NA
> 1015315                 NA                    NA                   NA
> 1015322                 NA                    NA                   NA
>           RK.Records_CL V.Other
> 1005317            NA      NA
> 1007183            NA      NA
> 1008833            NA      NA
> 1012281            NA      NA
> 1015285             1      NA
> 1015315            NA      NA
> 1015322            NA      NA
>   >
>
> Chel Hee Lee
>
> On 12/18/2014 10:02 AM, Jeff Newmiller wrote:
>> No guarantees on "best"... but one way using base R could be:
>>
>> # Note that "CaseID" is actually not a valid PViol.Type as you had it
>> PViol.Type <- c( "BW.BackWages"
>>                  , "LD.Liquid_Damages"
>>                  , "MW.Minimum_Wage"
>>                  , "OT.Overtime"
>>                  , "RK.Records_FLSA"
>>                  , "V.Poster_Other"
>>                  , "AS.Age"
>>                  , "BW.WHMIS_BackWages"
>>                  , "HS.Hours"
>>                  , "OA.HazOccupationAg"
>>                  , "ON.HazOccupationNonAg"
>>                  , "R3.Reg3AgeOccupation"
>>                  , "RK.Records_CL"
>>                  , "V.Other" )
>>
>> # explicitly specifying all levels to the factor insures a complete
>> # set of column outputs regardless of what is in the input
>> PViol.Type.Per.Case.Original <-
>>       data.frame( CaseID
>>                 , Primary.Viol.Type=factor( Primary.Viol.Type
>>                                           , levels=PViol.Type ) )
>>
>> tmp <- table( PViol.Type.Per.Case.Original )
>> ans <- data.frame( CaseID=rownames( tmp )
>>                    , as.data.frame( ifelse( 0==tmp, NA, 1 ) )
>>                    )
>>
>>

Sven E. Templer

2014-Dec-19 09:13 UTC

Another solution:

CaseID <- c("1015285", "1005317", "1012281",
"1015285", "1015285", "1007183",
"1008833", "1015315", "1015322",
"1015285")
Primary.Viol.Type <- c("AS.Age", "HS.Hours",
"HS.Hours", "HS.Hours",
"RK.Records_CL",
"OT.Overtime", "OT.Overtime", "OT.Overtime",
"V.Poster_Other",
"V.Poster_Other")

library(reshape2)
dcast(data.frame(CaseID, Primary.Viol.Type), CaseID~Primary.Viol.Type, length)

# result:

Using Primary.Viol.Type as value column: use value.var to override.
   CaseID AS.Age HS.Hours OT.Overtime RK.Records_CL V.Poster_Other
1 1005317      0        1           0             0              0
2 1007183      0        0           1             0              0
3 1008833      0        0           1             0              0
4 1012281      0        1           0             0              0
5 1015285      1        1           0             1              1
6 1015315      0        0           1             0              0
7 1015322      0        0           0             0              1


best, s.
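
The dcast() result above shows only the five violation types present in the
sample and uses 0 where the original question asked for NA.  A possible
base-R sketch that returns all 14 columns with 1/NA follows; it reuses the
idea from earlier in the thread of fixing the factor levels up front.  The
names 'viol.levels', 'counts' and 'wide' are illustrative and not from the
thread:

```r
CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285",
            "1007183", "1008833", "1015315", "1015322", "1015285")
Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
                       "RK.Records_CL", "OT.Overtime", "OT.Overtime",
                       "OT.Overtime", "V.Poster_Other", "V.Poster_Other")

# All 14 violation types (the list from the thread, without "CaseID").
viol.levels <- c("BW.BackWages", "LD.Liquid_Damages", "MW.Minimum_Wage",
                 "OT.Overtime", "RK.Records_FLSA", "V.Poster_Other",
                 "AS.Age", "BW.WHMIS_BackWages", "HS.Hours",
                 "OA.HazOccupationAg", "ON.HazOccupationNonAg",
                 "R3.Reg3AgeOccupation", "RK.Records_CL", "V.Other")

# Count each CaseID/type pair; unused factor levels still get a column.
# unclass() leaves a plain integer matrix with row and column names.
counts <- unclass(table(CaseID, factor(Primary.Viol.Type, levels = viol.levels)))

# 1 where at least one record exists, NA otherwise.
wide <- ifelse(counts > 0, 1, NA)

# One row per unique CaseID, followed by the 14 type columns.
ans <- data.frame(CaseID = rownames(wide), wide)
rownames(ans) <- NULL
ans
```

Because the levels are fixed before counting, the column set does not depend
on which violation types happen to occur in the input.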


John Kane

2014-Dec-19 13:44 UTC

Very pretty.
I could have saved myself about 1/2 hour of mucking about if I had thought of
"length".

John Kane
Kingston ON Canada

> -----Original Message-----
> From: sven.templer at gmail.com
> Sent: Fri, 19 Dec 2014 10:13:55 +0100
> To: chl948 at mail.usask.ca
> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df then
> adjust col1 data display
> 
> Another solution:
> 
> CaseID <- c("1015285", "1005317",
"1012281", "1015285", "1015285",
> "1007183",
> "1008833", "1015315", "1015322",
"1015285")
> Primary.Viol.Type <- c("AS.Age", "HS.Hours",
"HS.Hours", "HS.Hours",
> "RK.Records_CL",
> "OT.Overtime", "OT.Overtime", "OT.Overtime",
"V.Poster_Other",
> "V.Poster_Other")
> 
> library(reshape2)
> dcast(data.frame(CaseID, Primary.Viol.Type), CaseID~Primary.Viol.Type,
> length)
> 
> # result:
> 
> Using Primary.Viol.Type as value column: use value.var to override.
>    CaseID AS.Age HS.Hours OT.Overtime RK.Records_CL V.Poster_Other
> 1 1005317      0        1           0             0              0
> 2 1007183      0        0           1             0              0
> 3 1008833      0        0           1             0              0
> 4 1012281      0        1           0             0              0
> 5 1015285      1        1           0             1              1
> 6 1015315      0        0           1             0              0
> 7 1015322      0        0           0             0              1
> 
> 
> best, s.
> 
> On 19 December 2014 at 06:35, Chel Hee Lee <chl948 at mail.usask.ca>
wrote:
>> Please take a look at my code again.  The error message says that
object
>> 'Primary.Viol.Type' not found.  Have you ever created the
object
>> 'Primary.Viol.Type'?   It will be working if you replace
>> 'Primary.Viol.Type'
>> by 'PViol.Type.Per.Case.Original$Primary.Viol.Type' where
'factor()' is
>> used.  I hope this helps.
>> 
>> Chel Hee Lee
>> 
>> On 12/18/2014 08:57 PM, Crombie, Burnette N wrote:
>>> 
>>> Chel, your solution is fantastic on the dataset I submitted in my
>>> question
>>> but it is not working when I import my real dataset into R.  Do I
need
>>> to
>>> vectorize the columns in my real dataset after importing?  I tried
a
>>> few
>>> things (###) but not making progress:
>>> 
>>> MERGE_PViol.Detail.Per.Case <-
>>>
read.csv("~/FOIA_FLSA/MERGE_PViol.Detail.Per.Case_for_rtf10.csv",
>>> stringsAsFactors=TRUE)
>>> 
>>> ### select only certain columns
>>> PViol.Type.Per.Case.Original <-
>>> MERGE_PViol.Detail.Per.Case[,c("CaseID",
>>> "Primary.Viol.Type")]
>>> 
>>> ###
>>>
write.csv(PViol.Type.Per.Case,file="PViol.Type.Per.Case.Select.csv")
>>> ### PViol.Type.Per.Case.Original <-
>>> read.csv("~/FOIA_FLSA/PViol.Type.Per.Case.Select.csv")
>>> ### PViol.Type.Per.Case.Original$X <- NULL
>>> ###PViol.Type.Per.Case.Original[] <-
>>> lapply(PViol.Type.Per.Case.Original,
>>> as.character)
>>> 
>>> PViol.Type <- c("CaseID",
>>>                  "BW.BackWages",
>>>                  "LD.Liquid_Damages",
>>>                  "MW.Minimum_Wage",
>>>                  "OT.Overtime",
>>>                  "RK.Records_FLSA",
>>>                  "V.Poster_Other",
>>>                  "AS.Age",
>>>                  "BW.WHMIS_BackWages",
>>>                  "HS.Hours",
>>>                  "OA.HazOccupationAg",
>>>                  "ON.HazOccupationNonAg",
>>>                  "R3.Reg3AgeOccupation",
>>>                  "RK.Records_CL",
>>>                  "V.Other")
>>> 
>>> PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>>> factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>> 
>>> ### Error in factor(Primary.Viol.Type, levels = PViol.Type, labels
>>> PViol.Type) :  object 'Primary.Viol.Type' not found
>>> 
>>> tmp <-
>>>
split(PViol.Type.Per.Case.Original,PViol.Type.Per.Case.Original$CaseID)
>>> ans <- ifelse(do.call(rbind, lapply(tmp,
>>> function(x)table(x$Primary.Viol.Type))), 1, NA)
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Crombie, Burnette N
>>> Sent: Thursday, December 18, 2014 3:01 PM
>>> To: 'Chel Hee Lee'
>>> Subject: RE: [R] Make 2nd col of 2-col df into header row of same
df
>>> then
>>> adjust col1 data display
>>> 
>>> Thanks for taking the time to review this, Chel.  I've got to
step away
>>> from my desk, but will reply more substantially as soon as
possible. --
>>> BNC
>>> 
>>> -----Original Message-----
>>> From: Chel Hee Lee [mailto:chl948 at mail.usask.ca]
>>> Sent: Thursday, December 18, 2014 2:43 PM
>>> To: Jeff Newmiller; Crombie, Burnette N
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Make 2nd col of 2-col df into header row of same
df
>>> then
>>> adjust col1 data display
>>> 
>>> I like the approach presented by Jeff Newmiller as shown in the
>>> previous
>>> post (I really like his way).  As he suggested, it would be good to
>>> start
>>> with 'factor' since you have all values of
'Primary.Viol.Type'.
>>> You may try to use 'split()' function for creating table
that you wish
>>> to
>>> build.  Please see the below (I hope this helps):
>>> 
>>>   > PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>>> factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type) 
>  >
>>> tmp <-
>>> split(PViol.Type.Per.Case.Original,
>>> PViol.Type.Per.Case.Original$CaseID)
>>>   > ans <- ifelse(do.call(rbind, lapply(tmp, function(x)
>>> table(x$Primary.Viol.Type))), 1, NA)  > ans
>>>           CaseID BW.BackWages LD.Liquid_Damages MW.Minimum_Wage
>>> OT.Overtime
>>> 1005317     NA           NA                NA              NA
>>> NA
>>> 1007183     NA           NA                NA              NA
>>> 1
>>> 1008833     NA           NA                NA              NA
>>> 1
>>> 1012281     NA           NA                NA              NA
>>> NA
>>> 1015285     NA           NA                NA              NA
>>> NA
>>> 1015315     NA           NA                NA              NA
>>> 1
>>> 1015322     NA           NA                NA              NA
>>> NA
>>>           RK.Records_FLSA V.Poster_Other AS.Age BW.WHMIS_BackWages
>>> HS.Hours
>>> 1005317              NA             NA     NA                 NA
>>> 1
>>> 1007183              NA             NA     NA                 NA
>>> NA
>>> 1008833              NA             NA     NA                 NA
>>> NA
>>> 1012281              NA             NA     NA                 NA
>>> 1
>>> 1015285              NA              1      1                 NA
>>> 1
>>> 1015315              NA             NA     NA                 NA
>>> NA
>>> 1015322              NA              1     NA                 NA
>>> NA
>>>           OA.HazOccupationAg ON.HazOccupationNonAg
R3.Reg3AgeOccupation
>>> 1005317                 NA                    NA                  
NA
>>> 1007183                 NA                    NA                  
NA
>>> 1008833                 NA                    NA                  
NA
>>> 1012281                 NA                    NA                  
NA
>>> 1015285                 NA                    NA                  
NA
>>> 1015315                 NA                    NA                  
NA
>>> 1015322                 NA                    NA                  
NA
>>>           RK.Records_CL V.Other
>>> 1005317            NA      NA
>>> 1007183            NA      NA
>>> 1008833            NA      NA
>>> 1012281            NA      NA
>>> 1015285             1      NA
>>> 1015315            NA      NA
>>> 1015322            NA      NA
>>>   >
>>> 
>>> Chel Hee Lee
>>> 
>>> On 12/18/2014 10:02 AM, Jeff Newmiller wrote:
>>>> 
>>>> No guarantees on "best"... but one way using base R
could be:
>>>> 
>>>> # Note that "CaseID" is actually not a valid
PViol.Type as you had it
>>>> PViol.Type <- c( "BW.BackWages"
>>>>                  , "LD.Liquid_Damages"
>>>>                  , "MW.Minimum_Wage"
>>>>                  , "OT.Overtime"
>>>>                  , "RK.Records_FLSA"
>>>>                  , "V.Poster_Other"
>>>>                  , "AS.Age"
>>>>                  , "BW.WHMIS_BackWages"
>>>>                  , "HS.Hours"
>>>>                  , "OA.HazOccupationAg"
>>>>                  , "ON.HazOccupationNonAg"
>>>>                  , "R3.Reg3AgeOccupation"
>>>>                  , "RK.Records_CL"
>>>>                  , "V.Other" )
>>>> 
>>>> # explicitly specifying all levels to the factor insures a complete
>>>> # set of column outputs regardless of what is in the input
>>>> PViol.Type.Per.Case.Original <-
>>>>       data.frame( CaseID
>>>>                 , Primary.Viol.Type=factor( Primary.Viol.Type
>>>>                                           , levels=PViol.Type ) )
>>>> 
>>>> tmp <- table( PViol.Type.Per.Case.Original )
>>>> ans <- data.frame( CaseID=rownames( tmp )
>>>>                    , as.data.frame( ifelse( 0==tmp, NA, 1 ) )
>>>>                    )
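
As a minimal sketch of why the explicit levels matter: table() produces one column per factor level, including levels that never occur, whereas a plain character vector only yields the values that are present. The toy values below (id, type, all_types) are made up for illustration and are not from the thread.

```r
# Toy data (hypothetical): three records, only two of three possible types occur
id        <- c("1", "2", "2")
type      <- c("a", "b", "a")
all_types <- c("a", "b", "c")

# Character input: the unobserved type "c" gets no column
tab_chr <- table(id, type)

# Factor with explicit levels: "c" gets a column of zero counts
tab_fac <- table(id, factor(type, levels = all_types))

print(colnames(tab_chr))
print(colnames(tab_fac))
```

The zero counts in the extra column are what the ifelse() step in the quoted code turns into NA.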
>>>> 
>>>> 
>>>> On Wed, 17 Dec 2014, bcrombie wrote:
>>>> 
>>>>> # I have a dataframe that contains 2 columns:
>>>>> CaseID  <- c('1015285',
>>>>> '1005317',
>>>>> '1012281',
>>>>> '1015285',
>>>>> '1015285',
>>>>> '1007183',
>>>>> '1008833',
>>>>> '1015315',
>>>>> '1015322',
>>>>> '1015285')
>>>>> 
>>>>> Primary.Viol.Type <- c('AS.Age',
>>>>> 'HS.Hours',
>>>>> 'HS.Hours',
>>>>> 'HS.Hours',
>>>>> 'RK.Records_CL',
>>>>> 'OT.Overtime',
>>>>> 'OT.Overtime',
>>>>> 'OT.Overtime',
>>>>> 'V.Poster_Other',
>>>>> 'V.Poster_Other')
>>>>> 
>>>>> PViol.Type.Per.Case.Original <- data.frame(CaseID,Primary.Viol.Type)
>>>>> 
>>>>> # CaseID's can be repeated because there can be up to 14
>>>>> Primary.Viol.Type's per CaseID.
>>>>> 
>>>>> # I want to transform this dataframe into one that has 15 columns,
>>>>> where the first column is CaseID, and the rest are the 14 primary
>>>>> viol. types.  The CaseID column will contain a list of the unique
>>>>> CaseID's (no replicates) and for each of their rows, there will be a
>>>>> '1' under a column corresponding to a primary violation type recorded
>>>>> for that CaseID.  So, technically, there could be zero to 14 '1's in
>>>>> a CaseID's row.
>>>>> 
>>>>> # For example, the row for CaseID '1015285' above would have a '1'
>>>>> under 'AS.Age', 'HS.Hours', 'RK.Records_CL', and 'V.Poster_Other',
>>>>> but have "NA" under the rest of the columns.
>>>>> 
>>>>> PViol.Type <- c("CaseID",
>>>>>                 "BW.BackWages",
>>>>>            "LD.Liquid_Damages",
>>>>>            "MW.Minimum_Wage",
>>>>>            "OT.Overtime",
>>>>>            "RK.Records_FLSA",
>>>>>            "V.Poster_Other",
>>>>>            "AS.Age",
>>>>>            "BW.WHMIS_BackWages",
>>>>>            "HS.Hours",
>>>>>            "OA.HazOccupationAg",
>>>>>            "ON.HazOccupationNonAg",
>>>>>            "R3.Reg3AgeOccupation",
>>>>>            "RK.Records_CL",
>>>>>            "V.Other")
>>>>> 
>>>>> PViol.Type.Columns <- t(data.frame(PViol.Type))
>>>>> 
>>>>> # What is the best way to do this in R?
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> View this message in context:
>>>>> http://r.789695.n4.nabble.com/Make-2nd-col-of-2-col-df-into-header-row-of-same-df-then-adjust-col1-data-display-tp4700878.html
>>>>> 
>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------------
>>>> Jeff Newmiller                        The     .....       .....  Go Live...
>>>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>>>                                         Live:   OO#.. Dead: OO#..  Playing
>>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>>>> 

Crombie, Burnette N

2014-Dec-19 13:52 UTC

[R] Make 2nd col of 2-col df into header row of same df then adjust col1 data display

That is the solution I had tried first (yes, it's nice!), but it doesn't
provide the other PViol.Type's that aren't necessarily in my dataset. 
That's where my problem is.  I'm closer to the cure, though, and think
I've thought of a solution as soon as I have time.  I'll update everyone
then. -- BNC

-----Original Message-----
From: John Kane [mailto:jrkrideau at inbox.com] 
Sent: Friday, December 19, 2014 8:44 AM
To: Sven E. Templer; Chel Hee Lee
Cc: R Help List; Crombie, Burnette N
Subject: Re: [R] Make 2nd col of 2-col df into header row of same df then adjust
col1 data display

Very pretty. 
I could have saved myself about 1/2 hour of mucking about if I had thought of
"length".

John Kane
Kingston ON Canada

> -----Original Message-----
> From: sven.templer at gmail.com
> Sent: Fri, 19 Dec 2014 10:13:55 +0100
> To: chl948 at mail.usask.ca
> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df 
> then adjust col1 data display
> 
> Another solution:
> 
> CaseID <- c("1015285", "1005317", "1012281", "1015285", "1015285",
>             "1007183", "1008833", "1015315", "1015322", "1015285")
> Primary.Viol.Type <- c("AS.Age", "HS.Hours", "HS.Hours", "HS.Hours",
>                        "RK.Records_CL", "OT.Overtime", "OT.Overtime",
>                        "OT.Overtime", "V.Poster_Other", "V.Poster_Other")
> 
> library(reshape2)
> dcast(data.frame(CaseID, Primary.Viol.Type), CaseID~Primary.Viol.Type,
>       length)
> 
> # result:
> 
> Using Primary.Viol.Type as value column: use value.var to override.
>    CaseID AS.Age HS.Hours OT.Overtime RK.Records_CL V.Poster_Other
> 1 1005317      0        1           0             0              0
> 2 1007183      0        0           1             0              0
> 3 1008833      0        0           1             0              0
> 4 1012281      0        1           0             0              0
> 5 1015285      1        1           0             1              1
> 6 1015315      0        0           1             0              0
> 7 1015322      0        0           0             0              1
> 
> 
> best, s.
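
One possible way to get columns for the violation types that never occur, starting from a wide 0/1 result like the one printed above, is to add them afterwards. The sketch below is base R only; it assumes the wide table is typed in by hand to match that printed output (so reshape2 is not needed to run it), and `all_types` and `missing_types` are names introduced here for illustration.

```r
# Wide 0/1 result, entered by hand to match the dcast() output above
wide <- data.frame(
  CaseID         = c("1005317", "1007183", "1008833", "1012281",
                     "1015285", "1015315", "1015322"),
  AS.Age         = c(0, 0, 0, 0, 1, 0, 0),
  HS.Hours       = c(1, 0, 0, 1, 1, 0, 0),
  OT.Overtime    = c(0, 1, 1, 0, 0, 1, 0),
  RK.Records_CL  = c(0, 0, 0, 0, 1, 0, 0),
  V.Poster_Other = c(0, 0, 0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# The 14 violation types from the original post ("CaseID" is not one of them)
all_types <- c("BW.BackWages", "LD.Liquid_Damages", "MW.Minimum_Wage",
               "OT.Overtime", "RK.Records_FLSA", "V.Poster_Other",
               "AS.Age", "BW.WHMIS_BackWages", "HS.Hours",
               "OA.HazOccupationAg", "ON.HazOccupationNonAg",
               "R3.Reg3AgeOccupation", "RK.Records_CL", "V.Other")

# Add every type that is absent from the data as an all-NA column
missing_types <- setdiff(all_types, names(wide))
wide[missing_types] <- NA

# Convert counts to the requested flags: 1 where present, NA otherwise
wide[all_types] <- lapply(wide[all_types],
                          function(x) ifelse(!is.na(x) & x > 0, 1, NA))

# CaseID first, then the 14 types in the order listed in the question
wide <- wide[c("CaseID", all_types)]
print(dim(wide))
```

Padding after the reshape keeps the reshaping call unchanged; setting the factor levels before tabulating, as in Jeff Newmiller's base R answer, reaches the same 15-column layout from the other direction.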
> 
> On 19 December 2014 at 06:35, Chel Hee Lee <chl948 at mail.usask.ca> wrote:
>> Please take a look at my code again.  The error message says that
>> object 'Primary.Viol.Type' not found.  Have you ever created the object
>> 'Primary.Viol.Type'?   It will be working if you replace
>> 'Primary.Viol.Type'
>> by 'PViol.Type.Per.Case.Original$Primary.Viol.Type' where 'factor()'
>> is used.  I hope this helps.
>> 
>> Chel Hee Lee
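
The point about the missing object can be illustrated with a small sketch. The data frame `df` and its column `Type` are hypothetical stand-ins, not the real dataset: a column that exists only inside a data frame is not a variable in the workspace, so a bare reference to it fails, while a reference through the data frame works.

```r
# Toy data frame (hypothetical names); the column lives inside df only
df <- data.frame(CaseID = c("1", "2"), Type = c("a", "b"),
                 stringsAsFactors = FALSE)
all_types <- c("a", "b", "c")

# 'Type' is not an object in the workspace, so the bare name raises an error;
# tryCatch() captures the error message instead of stopping the script
bare <- tryCatch(factor(Type, levels = all_types),
                 error = function(e) conditionMessage(e))
print(bare)

# Referring to the column through the data frame works
df$Type <- factor(df$Type, levels = all_types)
print(levels(df$Type))
```

In the toy example from the original question the bare name worked only because a separate vector called Primary.Viol.Type had been created before the data frame was built.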
>> 
>> On 12/18/2014 08:57 PM, Crombie, Burnette N wrote:
>>> 
>>> Chel, your solution is fantastic on the dataset I submitted in my
>>> question but it is not working when I import my real dataset into R.
>>> Do I need to vectorize the columns in my real dataset after
>>> importing?  I tried a few things (###) but not making progress:
>>> 
>>> MERGE_PViol.Detail.Per.Case <-
>>>   read.csv("~/FOIA_FLSA/MERGE_PViol.Detail.Per.Case_for_rtf10.csv",
>>>            stringsAsFactors=TRUE)
>>> 
>>> ### select only certain columns
>>> PViol.Type.Per.Case.Original <-
>>>   MERGE_PViol.Detail.Per.Case[,c("CaseID", "Primary.Viol.Type")]
>>> 
>>> ### write.csv(PViol.Type.Per.Case,file="PViol.Type.Per.Case.Select.csv")
>>> ### PViol.Type.Per.Case.Original <-
>>> ###   read.csv("~/FOIA_FLSA/PViol.Type.Per.Case.Select.csv")
>>> ### PViol.Type.Per.Case.Original$X <- NULL
>>> ### PViol.Type.Per.Case.Original[] <-
>>> ###   lapply(PViol.Type.Per.Case.Original, as.character)
>>> 
>>> PViol.Type <- c("CaseID",
>>>                  "BW.BackWages",
>>>                  "LD.Liquid_Damages",
>>>                  "MW.Minimum_Wage",
>>>                  "OT.Overtime",
>>>                  "RK.Records_FLSA",
>>>                  "V.Poster_Other",
>>>                  "AS.Age",
>>>                  "BW.WHMIS_BackWages",
>>>                  "HS.Hours",
>>>                  "OA.HazOccupationAg",
>>>                  "ON.HazOccupationNonAg",
>>>                  "R3.Reg3AgeOccupation",
>>>                  "RK.Records_CL",
>>>                  "V.Other")
>>> 
>>> PViol.Type.Per.Case.Original$Primary.Viol.Type <- 
>>> factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>> 
>>> ### Error in factor(Primary.Viol.Type, levels = PViol.Type, labels =
>>> ### PViol.Type) :  object 'Primary.Viol.Type' not found
>>> 
>>> tmp <- split(PViol.Type.Per.Case.Original,
>>>              PViol.Type.Per.Case.Original$CaseID)
>>> ans <- ifelse(do.call(rbind, lapply(tmp,
>>>               function(x) table(x$Primary.Viol.Type))), 1, NA)
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Crombie, Burnette N
>>> Sent: Thursday, December 18, 2014 3:01 PM
>>> To: 'Chel Hee Lee'
>>> Subject: RE: [R] Make 2nd col of 2-col df into header row of same df
>>> then adjust col1 data display
>>> 
>>> Thanks for taking the time to review this, Chel.  I've got to step
>>> away from my desk, but will reply more substantially as soon as
>>> possible. -- BNC
>>> 
>>> -----Original Message-----
>>> From: Chel Hee Lee [mailto:chl948 at mail.usask.ca]
>>> Sent: Thursday, December 18, 2014 2:43 PM
>>> To: Jeff Newmiller; Crombie, Burnette N
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Make 2nd col of 2-col df into header row of same df
>>> then adjust col1 data display
>>> 
>>> I like the approach presented by Jeff Newmiller as shown in the
>>> previous post (I really like his way).  As he suggested, it would be
>>> good to start with 'factor' since you have all values of
>>> 'Primary.Viol.Type'.
>>> You may try to use 'split()' function for creating table that you
>>> wish to build.  Please see the below (I hope this helps):
>>> 
>>>   > PViol.Type.Per.Case.Original$Primary.Viol.Type <-
>>>       factor(Primary.Viol.Type, levels=PViol.Type, labels=PViol.Type)
>>>   > tmp <- split(PViol.Type.Per.Case.Original,
>>>                  PViol.Type.Per.Case.Original$CaseID)
>>>   > ans <- ifelse(do.call(rbind, lapply(tmp, function(x)
>>>                   table(x$Primary.Viol.Type))), 1, NA)
>>>   > ans
>>>         CaseID BW.BackWages LD.Liquid_Damages MW.Minimum_Wage OT.Overtime
>>> 1005317     NA           NA                NA              NA          NA
>>> 1007183     NA           NA                NA              NA           1
>>> 1008833     NA           NA                NA              NA           1
>>> 1012281     NA           NA                NA              NA          NA
>>> 1015285     NA           NA                NA              NA          NA
>>> 1015315     NA           NA                NA              NA           1
>>> 1015322     NA           NA                NA              NA          NA
>>>         RK.Records_FLSA V.Poster_Other AS.Age BW.WHMIS_BackWages HS.Hours
>>> 1005317              NA             NA     NA                 NA        1
>>> 1007183              NA             NA     NA                 NA       NA
>>> 1008833              NA             NA     NA                 NA       NA
>>> 1012281              NA             NA     NA                 NA        1
>>> 1015285              NA              1      1                 NA        1
>>> 1015315              NA             NA     NA                 NA       NA
>>> 1015322              NA              1     NA                 NA       NA
>>>         OA.HazOccupationAg ON.HazOccupationNonAg R3.Reg3AgeOccupation
>>> 1005317                 NA                    NA                   NA
>>> 1007183                 NA                    NA                   NA
>>> 1008833                 NA                    NA                   NA
>>> 1012281                 NA                    NA                   NA
>>> 1015285                 NA                    NA                   NA
>>> 1015315                 NA                    NA                   NA
>>> 1015322                 NA                    NA                   NA
>>>         RK.Records_CL V.Other
>>> 1005317            NA      NA
>>> 1007183            NA      NA
>>> 1008833            NA      NA
>>> 1012281            NA      NA
>>> 1015285             1      NA
>>> 1015315            NA      NA
>>> 1015322            NA      NA
>>>   >
>>> 
>>> Chel Hee Lee
>>> 
>>> On 12/18/2014 10:02 AM, Jeff Newmiller wrote:
>>>> 
>>>> No guarantees on "best"... but one way using base R could be:
>>>> 
>>>> # Note that "CaseID" is actually not a valid PViol.Type as you had it
>>>> PViol.Type <- c( "BW.BackWages"
>>>>                  , "LD.Liquid_Damages"
>>>>                  , "MW.Minimum_Wage"
>>>>                  , "OT.Overtime"
>>>>                  , "RK.Records_FLSA"
>>>>                  , "V.Poster_Other"
>>>>                  , "AS.Age"
>>>>                  , "BW.WHMIS_BackWages"
>>>>                  , "HS.Hours"
>>>>                  , "OA.HazOccupationAg"
>>>>                  , "ON.HazOccupationNonAg"
>>>>                  , "R3.Reg3AgeOccupation"
>>>>                  , "RK.Records_CL"
>>>>                  , "V.Other" )
>>>> 
>>>> # explicitly specifying all levels to the factor insures a complete
>>>> # set of column outputs regardless of what is in the input
>>>> PViol.Type.Per.Case.Original <-
>>>>       data.frame( CaseID
>>>>                 , Primary.Viol.Type=factor( Primary.Viol.Type
>>>>                                           , levels=PViol.Type ) )
>>>> 
>>>> tmp <- table( PViol.Type.Per.Case.Original )
>>>> ans <- data.frame( CaseID=rownames( tmp )
>>>>                    , as.data.frame( ifelse( 0==tmp, NA, 1 ) )
>>>>                    )
>>>> 
>>>> 
>>>> On Wed, 17 Dec 2014, bcrombie wrote:
>>>> 
>>>>> # I have a dataframe that contains 2 columns:
>>>>> CaseID  <- c('1015285',
>>>>> '1005317',
>>>>> '1012281',
>>>>> '1015285',
>>>>> '1015285',
>>>>> '1007183',
>>>>> '1008833',
>>>>> '1015315',
>>>>> '1015322',
>>>>> '1015285')
>>>>> 
>>>>> Primary.Viol.Type <- c('AS.Age',
>>>>> 'HS.Hours',
>>>>> 'HS.Hours',
>>>>> 'HS.Hours',
>>>>> 'RK.Records_CL',
>>>>> 'OT.Overtime',
>>>>> 'OT.Overtime',
>>>>> 'OT.Overtime',
>>>>> 'V.Poster_Other',
>>>>> 'V.Poster_Other')
>>>>> 
>>>>> PViol.Type.Per.Case.Original <- 
>>>>> data.frame(CaseID,Primary.Viol.Type)
>>>>> 
>>>>> # CaseID's can be repeated because there can be up to 14
>>>>> Primary.Viol.Type's per CaseID.
>>>>> 
>>>>> # I want to transform this dataframe into one that has 15 columns,
>>>>> where the first column is CaseID, and the rest are the 14 primary
>>>>> viol. types.  The CaseID column will contain a list of the unique
>>>>> CaseID's (no replicates) and for each of their rows, there will be a
>>>>> '1' under a column corresponding to a primary violation type recorded
>>>>> for that CaseID.  So, technically, there could be zero to 14 '1's in
>>>>> a CaseID's row.
>>>>> 
>>>>> # For example, the row for CaseID '1015285' above would have a '1'
>>>>> under 'AS.Age', 'HS.Hours', 'RK.Records_CL', and 'V.Poster_Other',
>>>>> but have "NA" under the rest of the columns.
>>>>> 
>>>>> PViol.Type <- c("CaseID",
>>>>>                 "BW.BackWages",
>>>>>            "LD.Liquid_Damages",
>>>>>            "MW.Minimum_Wage",
>>>>>            "OT.Overtime",
>>>>>            "RK.Records_FLSA",
>>>>>            "V.Poster_Other",
>>>>>            "AS.Age",
>>>>>            "BW.WHMIS_BackWages",
>>>>>            "HS.Hours",
>>>>>            "OA.HazOccupationAg",
>>>>>            "ON.HazOccupationNonAg",
>>>>>            "R3.Reg3AgeOccupation",
>>>>>            "RK.Records_CL",
>>>>>            "V.Other")
>>>>> 
>>>>> PViol.Type.Columns <- t(data.frame(PViol.Type))
>>>>> 
>>>>> # What is the best way to do this in R?
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> View this message in context:
>>>>> http://r.789695.n4.nabble.com/Make-2nd-col-of-2-col-df-into-header-row-of-same-df-then-adjust-col1-data-display-tp4700878.html
>>>>> 
>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------------
>>>> Jeff Newmiller                        The     .....       .....  Go Live...
>>>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>>>                                         Live:   OO#.. Dead: OO#..  Playing
>>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>>>> 
