Displaying 12 results from an estimated 12 matches for "pcopy".
2014 Nov 20
2
Second copy engine on GF116
Hello,
There's a long-standing bug on nouveau (this is a sample bug, but the
issue has been around for a while:
https://bugs.freedesktop.org/show_bug.cgi?id=85465) whereby we attempt
to use the second PCOPY engine on GF116, and it sometimes does
nothing, despite mmio register 22500 saying that it's not disabled
(0x22500 == 0 for this user). In the bug you can see a dump from
22400..22600, and all values after 22440 are read as 0. The issue
appears to be more common on mobile GF116s, but I...
2014 Nov 21
3
Second copy engine on GF116
...t is probably easiest to just ignore it. You can distinguish this
> decompress engine from normal copy engine by looking at the CE capability
> register on falcon (0x00000650). If bit 2 is '1', then the falcon is
> a decompress engine.
I presume you mean a +0x650 register on the pcopy engines (0x104000
and 0x105000). I only have access to the GF108 right now, which
returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
0x104000 for copy on the GF108...
From my admittedly limited understanding, both 0x104000 and 0x105000
appear to be falcon engines, where t...
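
The bit test described in the quoted text can be sketched in C as follows. This is only an illustration, not nouveau code: the helper names are hypothetical, the function takes a value that has already been read from the engine's +0x650 register, and it assumes that "bit 2" is counted from zero (mask 0x4). Under that assumption the GF108 values reported above (3 and 4) classify 0x104000 as a normal copy engine and 0x105000 as a decompress engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "bit 2" in the thread is zero-indexed, i.e. mask 0x4. */
#define CE_CAP_DECOMPRESS (UINT32_C(1) << 2)

/* Hypothetical helper: 'cap' is the value read from <engine base> + 0x650. */
bool ce_is_decompress(uint32_t cap)
{
    return (cap & CE_CAP_DECOMPRESS) != 0;
}

/* Applies the test to the two GF108 readings quoted in the thread. */
void check_gf108_values(void)
{
    assert(!ce_is_decompress(3)); /* value reported for 0x104650 */
    assert(ce_is_decompress(4));  /* value reported for 0x105650 */
}
```

If the bit numbering were one-based instead, the mask would be 0x2 and the classification of the value 3 would change, so the numbering convention is worth confirming before relying on this check.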
2016 Dec 31
2
Automatic Insertion of OpenACC/OpenMP directives
...both arrays, 'x' and 'y':
float saxpy(float a, float *x, float *y, int n) {
int j = 0;
long long int AI1[6];
AI1[0] = n + -1;
AI1[1] = 4 * AI1[0];
AI1[2] = AI1[1] + 4;
AI1[3] = AI1[2] / 4;
AI1[4] = (AI1[3] > 0);
AI1[5] = (AI1[4] ? AI1[3] : 0);
#pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]])
#pragma acc kernels
for (int i = 0; i < n; ++i) {
y[j] = a * x[i] + y[j];
++j;
}
}
Regards,
Fernando
On Sat, Dec 31, 2016 at 3:58 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> Hi,
>
>> On Dec 31, 2016, at 8:33 AM, Fernando Magno Q...
2016 Dec 31
3
Automatic Insertion of OpenACC/OpenMP directives
...= a*x[i] + y[i];
}
Then dawn-cc produces the code below:
void saxpy(float a, float *x, float *y, int n) {
long long int AI1[6];
AI1[0] = n - 1;
AI1[1] = 4 * AI1[0];
AI1[2] = AI1[1] + 4;
AI1[3] = AI1[2] / 4;
AI1[4] = (AI1[3] > 0);
AI1[5] = (AI1[4] ? AI1[3] : 0);
#pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]])
#pragma acc kernels
for (int i = 0; i < n; ++i)
y[i] = a * x[i] + y[i];
}
I was wondering if we could add a link to dawn-cc in the LLVM's
project page (http://llvm.org/ProjectsWithLLVM/). There are a number
of papers that describe what dawn-cc does. The main...
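
The AI1[] arithmetic in the generated code can be traced step by step. The sketch below mirrors those six steps in a hypothetical function `dawn_extent` and compares the result with a plain clamp, `n > 0 ? n : 0`. It assumes the constant 4 stands for the element size in bytes of `float`; the snippet itself does not say so.

```c
#include <assert.h>

/* Mirrors the AI1[] steps from the generated code (element size 4 assumed). */
long long dawn_extent(int n)
{
    long long ai[6];
    ai[0] = (long long)n - 1;   /* index of the last element accessed */
    ai[1] = 4 * ai[0];          /* byte offset of the last element */
    ai[2] = ai[1] + 4;          /* one past the last byte accessed */
    ai[3] = ai[2] / 4;          /* back to a count of elements */
    ai[4] = (ai[3] > 0);        /* is the range non-empty? */
    ai[5] = ai[4] ? ai[3] : 0;  /* clamp negative extents to zero */
    return ai[5];
}

/* The same bound written directly. */
long long clamped_extent(int n)
{
    return n > 0 ? n : 0;
}

/* Compares both forms over a small range, including non-positive n. */
void check_extent(void)
{
    for (int n = -3; n <= 8; ++n)
        assert(dawn_extent(n) == clamped_extent(n));
}
```

For this loop the copied range `x[0:AI1[5]]` therefore covers n elements when n is positive and is empty otherwise.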
2014 Nov 25
0
Second copy engine on GF116
...just ignore it. You can distinguish this
> > decompress engine from normal copy engine by looking at the CE capability
> > register on falcon (0x00000650). If bit 2 is '1', then the falcon is
> > a decompress engine.
>
> I presume you mean a +0x650 register on the pcopy engines (0x104000
> and 0x105000). I only have access to the GF108 right now, which
> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
> 0x104000 for copy on the GF108...
Yes, 0x104650 and 0x105650 are the right addresses, from what I can tell.
FWIW, the other cap...
2016 Dec 31
0
Automatic Insertion of OpenACC/OpenMP directives
...the code below:
>
> void saxpy(float a, float *x, float *y, int n) {
> long long int AI1[6];
> AI1[0] = n - 1;
> AI1[1] = 4 * AI1[0];
> AI1[2] = AI1[1] + 4;
> AI1[3] = AI1[2] / 4;
> AI1[4] = (AI1[3] > 0);
> AI1[5] = (AI1[4] ? AI1[3] : 0);
> #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]])
> #pragma acc kernels
> for (int i = 0; i < n; ++i)
> y[i] = a * x[i] + y[i];
> }
That’s nice!
I’m wondering how you would handle the issue that retrofitting the pragma onto the original code from the optimized version can be fragile. For example if I us...
2014 Nov 21
0
Second copy engine on GF116
...On Thu, Nov 20, 2014 at 02:18:02PM -0500, Ilia Mirkin wrote:
> Hello,
>
> There's a long-standing bug on nouveau (this is a sample bug, but the
> issue has been around for a while:
> https://bugs.freedesktop.org/show_bug.cgi?id=85465) whereby we attempt
> to use the second PCOPY engine on GF116, and it sometimes does
> nothing, despite mmio register 22500 saying that it's not disabled
> (0x22500 == 0 for this user). In the bug you can see a dump from
> 22400..22600, and all values after 22440 are read as 0. The issue
> appears to be more common on mobile...
2017 Jan 03
2
Automatic Insertion of OpenACC/OpenMP directives
...y, int n) {
>> int j = 0;
>>
>> long long int AI1[6];
>> AI1[0] = n + -1;
>> AI1[1] = 4 * AI1[0];
>> AI1[2] = AI1[1] + 4;
>> AI1[3] = AI1[2] / 4;
>> AI1[4] = (AI1[3] > 0);
>> AI1[5] = (AI1[4] ? AI1[3] : 0);
>> #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]])
>> #pragma acc kernels
>> for (int i = 0; i < n; ++i) {
>> y[j] = a * x[i] + y[j];
>> ++j;
>> }
>
> I'm not familiar with OpenACC, but doesn't this still have a loop carried dependence on j, and therefore isn't co...
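
One way to look at the question about `j`: in the quoted loop, `j` starts at 0 and is incremented exactly once per iteration, so it equals `i` at every access. The sketch below (hypothetical function names, not dawn-cc output) runs the loop in both forms on the same input and checks that the results match. It says nothing about how dawn-cc or an OpenACC compiler actually treats the variable.

```c
#include <assert.h>

/* Loop as quoted: j is a separate counter incremented every iteration. */
void saxpy_with_j(float a, const float *x, float *y, int n)
{
    int j = 0;
    for (int i = 0; i < n; ++i) {
        y[j] = a * x[i] + y[j];
        ++j;
    }
}

/* Same loop with j replaced by i, so no scalar is carried across iterations. */
void saxpy_with_i(float a, const float *x, float *y, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Runs both forms on identical data and compares element by element. */
void check_same_result(void)
{
    float x[4]  = {1.0f, 2.0f, 3.0f, 4.0f};
    float y1[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float y2[4] = {10.0f, 20.0f, 30.0f, 40.0f};

    saxpy_with_j(2.0f, x, y1, 4);
    saxpy_with_i(2.0f, x, y2, 4);

    for (int k = 0; k < 4; ++k)
        assert(y1[k] == y2[k]);
}
```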
2014 Nov 25
3
Second copy engine on GF116
...u can distinguish this
>> > decompress engine from normal copy engine by looking at the CE capability
>> > register on falcon (0x00000650). If bit 2 is '1', then the falcon is
>> > a decompress engine.
>>
>> I presume you mean a +0x650 register on the pcopy engines (0x104000
>> and 0x105000). I only have access to the GF108 right now, which
>> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
>> 0x104000 for copy on the GF108...
>
> Yes, 0x104650 and 0x105650 are the right addresses, from what I can tell....
2014 Jan 17
19
[Bug 73744] New: Constant lock ups with NVIDIA GeForce 8400 GS
https://bugs.freedesktop.org/show_bug.cgi?id=73744
Priority: medium
Bug ID: 73744
Assignee: nouveau at lists.freedesktop.org
Summary: Constant lock ups with NVIDIA GeForce 8400 GS
QA Contact: xorg-team at lists.x.org
Severity: blocker
Classification: Unclassified
OS: Linux (All)
Reporter: grave_123 at
2014 Nov 25
0
Second copy engine on GF116
...> >> > decompress engine from normal copy engine by looking at the CE capability
> >> > register on falcon (0x00000650). If bit 2 is '1', then the falcon is
> >> > a decompress engine.
> >>
> >> I presume you mean a +0x650 register on the pcopy engines (0x104000
> >> and 0x105000). I only have access to the GF108 right now, which
> >> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
> >> 0x104000 for copy on the GF108...
> >
> > Yes, 0x104650 and 0x105650 are the right address...
2014 Nov 26
1
Second copy engine on GF116
...>> decompress engine from normal copy engine by looking at the CE capability
>>>>> register on falcon (0x00000650). If bit 2 is '1', then the falcon is
>>>>> a decompress engine.
>>>>
>>>> I presume you mean a +0x650 register on the pcopy engines (0x104000
>>>> and 0x105000). I only have access to the GF108 right now, which
>>>> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
>>>> 0x104000 for copy on the GF108...
>>>
>>> Yes, 0x104650 and 0x105650 are th...