thr3ads.net - R help - [R] 'R' Software Output Plagiarism [Sep 2015]

Marc Schwartz

2015-Sep-22 16:24 UTC

[R] 'R' Software Output Plagiarism

Hi,

With the usual caveat that I Am Not A Lawyer....and that I am not speaking on
behalf of any organization...

My guess is that they are claiming that the output of R, simply being copied and
pasted verbatim into your thesis constitutes the use of copyrighted output from
the software.

It is not clear to me that R's output is copyrighted by the R Foundation (or
by other parties for CRAN packages), although the source code underlying R is,
along with that of other copyright owners as appropriate. There is some case
law to support the notion that the output alone is not protected in a similar
manner, but that may be country specific.

Did you provide any credit to R (see the output of citation() ) in your thesis
and indicate that your analyses were performed using R?

If R is uncredited, I could see them raising the issue.
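
[Editor's note] A minimal sketch of how the credit mentioned above can be pulled from R itself. `citation()` and `toBibtex()` are standard functions in R's utils package; the `stats` package is used below only as an illustration and is not taken from the thesis.

```r
# Recommended citation for R itself.
r_cite <- citation()
print(r_cite)

# The same entry as BibTeX, ready to paste into a .bib file.
r_bib <- toBibtex(r_cite)
print(r_bib)

# Add-on packages carry their own citation entries; "stats" ships with R
# and stands in here for whichever packages an analysis actually used.
pkg_cite <- citation("stats")
print(pkg_cite)

# Version string worth quoting next to the citation.
print(R.version.string)
```

Quoting the version string alongside the citation makes it clear which release produced the pasted output.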

You might check with your institution's legal/policy folks to see if there
is any guidance provided for students regarding the crediting of software used
in this manner, especially if that guidance is at no cost to you.

Regards,

Marc Schwartz

> On Sep 22, 2015, at 11:01 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> 
> 1. It is highly unlikely that we could be of help (unless someone else
> has experienced this and knows what happened). You will have to
> contact the Urkund people and ask them why their algorithms raised the
> flags.
> 
> 2. But of course, the regression methodology is not "your own" -- it's
> just a standard tool that you used in your work, which is entirely
> legitimate of course.
> 
> Cheers,
> Bert
> 
> 
> Bert Gunter
> 
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>   -- Clifford Stoll
> 
> 
> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
> <oliver.barrett at skema.edu> wrote:
>> 
>> Dear 'R' community support,
>> 
>> 
>> I am a student at Skema business school and I have recently submitted my MSc
>> thesis/dissertation. This has been passed on to an external plagiarism
>> service provider, Urkund, who have scanned my document and returned a
>> plagiarism report to my professor having detected 32% plagiarism.
>> 
>> 
>> I have contacted Urkund regarding this issue having committed no such
>> plagiarism and they have told me that all the plagiarism detected in my
>> document comes from the last 25% which consists only of 'R' regressions like
>> the one I have pasted below:
>> 
>> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>>    Fed.t.4., data = OLS_CAR, x = TRUE)
>> 
>> Residuals:
>>      Min        1Q    Median        3Q       Max
>> -0.154587 -0.015961  0.001429  0.017196  0.110907
>> 
>> Coefficients:
>>             Estimate Std. Error t value Pr(>|t|)
>> (Intercept) -0.001630   0.001763  -0.925   0.3559
>> Fed         -0.121595   0.165359  -0.735   0.4627
>> Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
>> Fed.t.2.     0.026529   0.143648   0.185   0.8536
>> Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
>> Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>> 
>> Residual standard error: 0.0293 on 304 degrees of freedom
>>  (20 observations deleted due to missingness)
>> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
>> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
>> 
>> I have produced all of these regressions myself and pasted them directly
>> from the 'R' software package. My regression methodology is entirely my own,
>> along with the sourcing and preparation of the data used to produce these
>> statistics.
>> 
>> I would be very grateful if you could provide me with some clarity as to why
>> this output from 'R' is reading as plagiarism.
>> 
>> I would like to thank you in advance,
>> 
>> Kind regards,
>> 
>> Oliver Barrett
>> (+44) 7341 834 217
>> 
>>        [[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

peter dalgaard

2015-Sep-22 20:06 UTC

[R] 'R' Software Output Plagiarism

Marc,

I don't think copyright/intellectual property issues factor into this.
Urkund and similar tools are to my knowledge entirely about plagiarism. So the
issue would seem to be that the R output is considered identical or nearly
identical to R output in other published or otherwise submitted material.

What puzzles me (except for how a document can be deemed 32% plagiarized in 25%
of the text) is whether this includes the numbers and variable names. If those
are somehow factored out, then any R regression could be pretty much identical
to any other R regression. However, two analyses with similar variable names
could happen if they are based on the same cookbook recipe and analyses with
similar numerical output come from analyzing the same standard data. Such
situations would not necessarily be considered plagiarism (I mean: If you claim
that you are analyzing data from experiments that you yourself have performed,
and your numbers are exactly identical to something that has been previously
published, then it would be suspect. If you analyze something from public
sources, someone else might well have done the same thing.).
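
[Editor's note] The "factored out" scenario above can be sketched in a few lines. This is a rough illustration only: Urkund's actual matching algorithm is not described anywhere in this thread, so the word-shingle Jaccard measure, the helper names (`shingles`, `jaccard`, `mask_numbers`) and the two built-in datasets are all assumptions chosen for the demonstration.

```r
# Word-level shingles: every run of n consecutive tokens in the text.
shingles <- function(lines, n = 3) {
  tokens <- unlist(strsplit(paste(lines, collapse = " "), "[[:space:]]+"))
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) < n) return(character(0))
  idx <- seq_len(length(tokens) - n + 1)
  unique(vapply(idx,
                function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
                character(1)))
}

# Jaccard similarity: shared shingles divided by all distinct shingles.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Replace every number with "#", mimicking a matcher that ignores values.
mask_numbers <- function(lines) {
  gsub("-?[0-9]+(\\.[0-9]+)?([eE][-+]?[0-9]+)?", "#", lines)
}

# Two unrelated regressions on datasets that ship with R.
out_a <- capture.output(print(summary(lm(mpg ~ wt + hp, data = mtcars))))
out_b <- capture.output(print(summary(lm(dist ~ speed, data = cars))))

raw_sim    <- jaccard(shingles(out_a), shingles(out_b))
masked_sim <- jaccard(shingles(mask_numbers(out_a)),
                      shingles(mask_numbers(out_b)))

cat(sprintf("similarity, raw text:       %.2f\n", raw_sim))
cat(sprintf("similarity, numbers masked: %.2f\n", masked_sim))
```

Under these assumptions the masked score comes out higher than the raw one, because once the numbers are hidden most of what remains is the fixed wording that `print.summary.lm` emits for every model.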

Similarly to John Kane, I think it is necessary to know exactly what sources the
text is claimed to be plagiarized from and/or which parts of the text are
being matched by Urkund. If it turns out that Urkund is generating false
positives, then this needs to be pointed out to them and to the people basing
decisions on it.

-pd
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Mitchell Maltenfort

2015-Sep-22 20:18 UTC

[R] 'R' Software Output Plagiarism

Isn't plagiarism detection based on overlaps with sentence structure?
That way, it would catch plagiarism if someone simply did a
find-and-replace. But that would also catch regressions with the same
output format.

How long was the original thesis?  If 25% of it was all regression
output, sounds like a lot of regressions.




Marc Schwartz

2015-Sep-22 20:27 UTC

[R] 'R' Software Output Plagiarism

Peter,

Great distinction. 

I was leaning in the direction that the "look and feel" of the output
(standard wording, table structure, column headings, significance stars and so
forth in the output) is similar to whatever Urkund is using as the basis for the
comparison and less so on an exact replication (covariates, coefficients, etc.),
or nearly so, of prior work.
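
[Editor's note] One way to see the "look and feel" elements is to list the printed lines that two unrelated regressions have in common. A minimal sketch, assuming two datasets that ship with R; the `normalise` helper is hypothetical and simply collapses runs of spaces so that column padding does not hide a match.

```r
# Collapse whitespace so differing column widths do not hide identical lines.
normalise <- function(x) trimws(gsub("[[:space:]]+", " ", x))

fit_a <- lm(mpg ~ wt + hp, data = mtcars)
fit_b <- lm(dist ~ speed, data = cars)

lines_a <- normalise(capture.output(print(summary(fit_a))))
lines_b <- normalise(capture.output(print(summary(fit_b))))

# Lines printed identically for both models, ignoring blank lines.
shared <- setdiff(intersect(lines_a, lines_b), "")
print(shared)
```

The shared lines are the section headings and table scaffolding that R prints for any `lm` summary, which is the kind of fixed wording a text matcher could pick up regardless of the data.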

Thanks,

Marc


Duncan Murdoch

2015-Sep-23 00:33 UTC

[R] 'R' Software Output Plagiarism

On 22/09/2015 4:06 PM, peter dalgaard wrote:
> Marc,
> 
> I don't think copyright/intellectual property issues factor into this.
> Urkund and similar tools are to my knowledge entirely about plagiarism. So the
> issue would seem to be that the R output is considered identical or nearly
> identical to R output in other published or otherwise submitted material.
> 
> What puzzles me (except for how a document can be deemed 32% plagiarized in
> 25% of the text) is whether this includes the numbers and variable names. If
> those are somehow factored out, then any R regression could be pretty much
> identical to any other R regression. However, two analyses with similar variable
> names could happen if they are based on the same cookbook recipe and analyses
> with similar numerical output come from analyzing the same standard data. Such
> situations would not necessarily be considered plagiarism (I mean: If you claim
> that you are analyzing data from experiments that you yourself have performed,
> and your numbers are exactly identical to something that has been previously
> published, then it would be suspect. If you analyze something from public
> sources, someone else might well have done the same thing.).

I don't see why this puzzles you.  A simple explanation is that Urkund
is incompetent.

Many companies that sell software to university administrations are
incompetent, because the buyers have been promoted so far beyond their
competence that they'll buy anything if it is expensive enough.

This isn't uncommon.

Duncan Murdoch
