thr3ads.net - R help - [R] Reading Multi-value data fields for descriptive analysis [Jul 2008]

If this information is useful, please help other people find it:
Share via:

Hohm, Dale

2008-Jul-12 16:37 UTC

[R] Reading Multi-value data fields for descriptive analysis

Hello,

I'm looking for help on the best approach to get "multi-value"
data fields into R for simple descriptive analysis.

-------------------------------------

I am new to this list and new to R, but I really want to get over the hump and
get productive with it.  Some help with how to best get the following data into
R would be greatly appreciated.  I have programming experience and stale
experience with SPSS.

I am trying to do some simple descriptive analysis (frequencies, cross-tabs) of
data stored in a Microsoft SharePoint list.  The data can be accessed with ODBC
or it can readily be extracted into an Excel or CSV format.  One of the
challenges with the data is that it uses several "multi-value" fields
(Microsoft Access provides the same data-type).

By "multi-value" I mean that multiple responses are packed into a
single data column; the data input form presents a question with several
checkboxes and a free-format write-in response.  The individual values within
the data field are separated with the two characters ";#".  So, the
data would be of the following format (in CSV form with column headers and a
tilde as the field separator):

Column1single~Column2multi~Column3multi
a sample value~C2 a multi one;#C2 a multi two~C3 a multi one;#C3 a multi two;#C3
a free-form answer


The first approach that comes to mind is to explode the multi-value fields into
unique bi-variate data columns and then assign a 0 or 1 to these new columns in
each record based on whether that specific value was present.  This approach is
complicated by the free-form answer as the unique columns could grow very large
in number - it might be better to figure out how to indicate the presence of the
free-form value in a data column called "Other" (or "C2
Other") and then hold the free-form value in a separate column.

The data would then look like this...

Column1single: a sample value
C2 a multi one: 1
C2 a multi two: 1
C2 a multi three: 0
C3 a multi one: 1
C3 a multi two: 1
C3 a free-form answer: 1
C3 another free-form answer: 0


Or in the second scenario...

Column1single: a sample value
C2 a multi one: 1
C2 a multi two: 1
C2 a multi three: 0
C3 a multi one: 1
C3 a multi two: 1
C3 Other: 1
C3 Other Text: a free-form answer


I am uncertain help to read this data into R in this format, so suggestions and
examples would help me greatly.

This is a pretty common data packing scenario, so perhaps there are better
approaches to reading this data and better ways in R to analyze it than what I
have presented.  Suggestions greatly appreciated.


Thanks,

Dale Hohm

	[[alternative HTML version deleted]]

jim holtman

2008-Jul-12 17:31 UTC

head link

[R] Reading Multi-value data fields for descriptive analysis

Can you provide a more complete example (say 10 lines) of what the
input is like. Does each line have a unique index that can be related
to it?  Do you want to summarize all the multi1-n values of Col2?  Do
you want to know the percentage of input lines that have a
Col3/multi-value4 on them?  You could read in the data as you have
indicated below and add a column that is the record number and
therefore you would have have to worry about trying to say if it
existed or not.  For example, you might have:

Rec#|col#|value
1|1|single
1|2|multi1
1|2|multi2
1|3|multi1
2|1|single
3|1|single
3|2|multi1
....

There are a number of potential ways of representing the data, but a
lot depends on what you want to do with it, so a more extensive
example of the input, along with the type of output you would like
will help in providing an answer.

On Sat, Jul 12, 2008 at 12:37 PM, Hohm, Dale <dale.hohm at hp.com>
wrote:> Hello,
>
> I'm looking for help on the best approach to get
"multi-value" data fields into R for simple descriptive analysis.
>
> -------------------------------------
>
> I am new to this list and new to R, but I really want to get over the hump
and get productive with it.  Some help with how to best get the following data
into R would be greatly appreciated.  I have programming experience and stale
experience with SPSS.
>
> I am trying to do some simple descriptive analysis (frequencies,
cross-tabs) of data stored in a Microsoft SharePoint list.  The data can be
accessed with ODBC or it can readily be extracted into an Excel or CSV format. 
One of the challenges with the data is that it uses several
"multi-value" fields (Microsoft Access provides the same data-type).
>
> By "multi-value" I mean that multiple responses are packed into a
single data column; the data input form presents a question with several
checkboxes and a free-format write-in response.  The individual values within
the data field are separated with the two characters ";#".  So, the
data would be of the following format (in CSV form with column headers and a
tilde as the field separator):
>
> Column1single~Column2multi~Column3multi
> a sample value~C2 a multi one;#C2 a multi two~C3 a multi one;#C3 a multi
two;#C3 a free-form answer
>
>
> The first approach that comes to mind is to explode the multi-value fields
into unique bi-variate data columns and then assign a 0 or 1 to these new
columns in each record based on whether that specific value was present.  This
approach is complicated by the free-form answer as the unique columns could grow
very large in number - it might be better to figure out how to indicate the
presence of the free-form value in a data column called "Other" (or
"C2 Other") and then hold the free-form value in a separate column.
>
> The data would then look like this...
>
> Column1single: a sample value
> C2 a multi one: 1
> C2 a multi two: 1
> C2 a multi three: 0
> C3 a multi one: 1
> C3 a multi two: 1
> C3 a free-form answer: 1
> C3 another free-form answer: 0
>
>
> Or in the second scenario...
>
> Column1single: a sample value
> C2 a multi one: 1
> C2 a multi two: 1
> C2 a multi three: 0
> C3 a multi one: 1
> C3 a multi two: 1
> C3 Other: 1
> C3 Other Text: a free-form answer
>
>
> I am uncertain help to read this data into R in this format, so suggestions
and examples would help me greatly.
>
> This is a pretty common data packing scenario, so perhaps there are better
approaches to reading this data and better ways in R to analyze it than what I
have presented.  Suggestions greatly appreciated.
>
>
> Thanks,
>
> Dale Hohm
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Possibly Parallel Threads

Search for more possibly parallel threads

R help - Jul 2008 - Reading Multi-value data fields for descriptive analysis

[R] Reading Multi-value data fields for descriptive analysis

[R] Reading Multi-value data fields for descriptive analysis

Possibly Parallel Threads