search for: pcopy

Displaying 12 results from an estimated 12 matches for "pcopy".

2014 Nov 20
2
Second copy engine on GF116
Hello, There's a long-standing bug on nouveau (this is a sample bug, but the issue has been around for a while: https://bugs.freedesktop.org/show_bug.cgi?id=85465) whereby we attempt to use the second PCOPY engine on GF116, and it sometimes does nothing, despite mmio register 22500 saying that it's not disabled (0x22500 == 0 for this user). In the bug you can see a dump from 22400..22600, and all values after 22440 are read as 0. The issue appears to be more common on mobile GF116's, but I...
2014 Nov 21
3
Second copy engine on GF116
...t is probably easiest to just ignore it. You can distinguish this > decompress engine from normal copy engine by looking at the CE capability > register on falcon (0x00000650). If bit 2 is '1', then the falcon is > a decompress engine. I presume you mean a +0x650 register on the pcopy engines (0x104000 and 0x105000). I only have access to the GF108 right now, which returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at 0x104000 for copy on the GF108... From my admittedly limited understanding, both 0x104000 and 0x105000 appear to be falcon engines, where t...
2016 Dec 31
2
Automatic Insertion of OpenACC/OpenMP directives
...both arrays, 'x' and 'y': float saxpy(float a, float *x, float *y, int n) { int j = 0; long long int AI1[6]; AI1[0] = n + -1; AI1[1] = 4 * AI1[0]; AI1[2] = AI1[1] + 4; AI1[3] = AI1[2] / 4; AI1[4] = (AI1[3] > 0); AI1[5] = (AI1[4] ? AI1[3] : 0); #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]]) #pragma acc kernels for (int i = 0; i < n; ++i) { y[j] = a * x[i] + y[j]; ++j; } } Regards, Fernando On Sat, Dec 31, 2016 at 3:58 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > Hi, > >> On Dec 31, 2016, at 8:33 AM, Fernando Magno Q...
2016 Dec 31
3
Automatic Insertion of OpenACC/OpenMP directives
...= a*x[i] + y[i]; } Then dawn-cc produces the code below: void saxpy(float a, float *x, float *y, int n) { long long int AI1[6]; AI1[0] = n - 1; AI1[1] = 4 * AI1[0]; AI1[2] = AI1[1] + 4; AI1[3] = AI1[2] / 4; AI1[4] = (AI1[3] > 0); AI1[5] = (AI1[4] ? AI1[3] : 0); #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]]) #pragma acc kernels for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i]; } I was wondering if we could add a link to dawn-cc in the LLVM's project page (http://llvm.org/ProjectsWithLLVM/). There are a number of papers that describe what dawn-cc does. The main...
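As a reading aid for the generated code quoted in this result, here is a minimal sketch that isolates the `AI1[]` arithmetic feeding the `pcopy` array sections. The function name `pcopy_extent` is hypothetical, and reading the constant 4 as `sizeof(float)` is an assumption; the snippet itself does not say so.

```c
/* Hypothetical helper: mirrors the AI1[] steps from the quoted saxpy
 * snippet and returns the extent used in x[0:AI1[5]] and y[0:AI1[5]]. */
static long long pcopy_extent(int n)
{
    long long AI1[6];
    AI1[0] = n - 1;                  /* last index touched by the loop */
    AI1[1] = 4 * AI1[0];             /* byte offset of that element (assumes 4-byte float) */
    AI1[2] = AI1[1] + 4;             /* one past the last byte accessed */
    AI1[3] = AI1[2] / 4;             /* back to an element count */
    AI1[4] = (AI1[3] > 0);           /* is the count positive? */
    AI1[5] = (AI1[4] ? AI1[3] : 0);  /* clamp non-positive counts to 0 */
    return AI1[5];
}
```

Under these assumptions the chain appears to reduce to max(n, 0) elements for ordinary values of `n`, so `pcopy_extent(10)` yields 10 and a zero or negative `n` yields 0.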
2014 Nov 25
0
Second copy engine on GF116
...just ignore it. You can distinguish this > > decompress engine from normal copy engine by looking at the CE capability > > register on falcon (0x00000650). If bit 2 is '1', then the falcon is > > a decompress engine. > > I presume you mean a +0x650 register on the pcopy engines (0x104000 > and 0x105000). I only have access to the GF108 right now, which > returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at > 0x104000 for copy on the GF108... Yes, 0x104650 and 0x105650 are the right addresses, from what I can tell. FWIW, the other cap...
2016 Dec 31
0
Automatic Insertion of OpenACC/OpenMP directives
...the code below: > > void saxpy(float a, float *x, float *y, int n) { > long long int AI1[6]; > AI1[0] = n - 1; > AI1[1] = 4 * AI1[0]; > AI1[2] = AI1[1] + 4; > AI1[3] = AI1[2] / 4; > AI1[4] = (AI1[3] > 0); > AI1[5] = (AI1[4] ? AI1[3] : 0); > #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]]) > #pragma acc kernels > for (int i = 0; i < n; ++i) > y[i] = a * x[i] + y[i]; > } That’s nice! I’m wondering about how would you handle the issue that retro-fitting the pragma on the original code from the optimized can be fragile. For example if I us...
2014 Nov 21
0
Second copy engine on GF116
...On Thu, Nov 20, 2014 at 02:18:02PM -0500, Ilia Mirkin wrote: > Hello, > > There's a long-standing bug on nouveau (this is a sample bug, but the > issue has been around for a while: > https://bugs.freedesktop.org/show_bug.cgi?id=85465) whereby we attempt > to use the second PCOPY engine on GF116, and it sometimes does > nothing, despite mmio register 22500 saying that it's not disabled > (0x22500 == 0 for this user). In the bug you can see a dump from > 22400..22600, and all values after 22440 are read as 0. The issue > appears to be more common on mobile...
2017 Jan 03
2
Automatic Insertion of OpenACC/OpenMP directives
...y, int n) { >> int j = 0; >> >> long long int AI1[6]; >> AI1[0] = n + -1; >> AI1[1] = 4 * AI1[0]; >> AI1[2] = AI1[1] + 4; >> AI1[3] = AI1[2] / 4; >> AI1[4] = (AI1[3] > 0); >> AI1[5] = (AI1[4] ? AI1[3] : 0); >> #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]]) >> #pragma acc kernels >> for (int i = 0; i < n; ++i) { >> y[j] = a * x[i] + y[j]; >> ++j; >> } > > I'm not familiar with OpenACC, but doesn't this still have a loop carried dependence on j, and therefore isn't co...
2014 Nov 25
3
Second copy engine on GF116
...u can distinguish this >> > decompress engine from normal copy engine by looking at the CE capability >> > register on falcon (0x00000650). If bit 2 is '1', then the falcon is >> > a decompress engine. >> >> I presume you mean a +0x650 register on the pcopy engines (0x104000 >> and 0x105000). I only have access to the GF108 right now, which >> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at >> 0x104000 for copy on the GF108... > > Yes, 0x104650 and 0x105650 are the right addresses, from what I can tell....
2014 Jan 17
19
[Bug 73744] New: Constant lock ups with NVIDIA GeForce 8400 GS
https://bugs.freedesktop.org/show_bug.cgi?id=73744 Priority: medium Bug ID: 73744 Assignee: nouveau at lists.freedesktop.org Summary: Constant lock ups with NVIDIA GeForce 8400 GS QA Contact: xorg-team at lists.x.org Severity: blocker Classification: Unclassified OS: Linux (All) Reporter: grave_123 at
2014 Nov 25
0
Second copy engine on GF116
...> >> > decompress engine from normal copy engine by looking at the CE capability > >> > register on falcon (0x00000650). If bit 2 is '1', then the falcon is > >> > a decompress engine. > >> > >> I presume you mean a +0x650 register on the pcopy engines (0x104000 > >> and 0x105000). I only have access to the GF108 right now, which > >> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at > >> 0x104000 for copy on the GF108... > > > > Yes, 0x104650 and 0x105650 are the right address...
2014 Nov 26
1
Second copy engine on GF116
...>> decompress engine from normal copy engine by looking at the CE capability >>>>> register on falcon (0x00000650). If bit 2 is '1', then the falcon is >>>>> a decompress engine. >>>> >>>> I presume you mean a +0x650 register on the pcopy engines (0x104000 >>>> and 0x105000). I only have access to the GF108 right now, which >>>> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at >>>> 0x104000 for copy on the GF108... >>> >>> Yes, 0x104650 and 0x105650 are th...
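The capability check described in the "Second copy engine on GF116" results can be sketched roughly as follows. This is not nouveau code: the macro and function names are hypothetical, and "bit 2" is assumed to be zero-indexed (mask 0x4). Only the +0x650 offset, the engine bases 0x104000 and 0x105000, and the GF108 readings 3 and 4 come from the quoted messages.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the offset and bit position follow the thread. */
#define CE_CAP_OFFSET      0x650u     /* capability register, relative to the engine base */
#define CE_CAP_DECOMPRESS  (1u << 2)  /* "bit 2", assumed zero-indexed */

/* MMIO address of the capability register for the engine at engine_base. */
static inline uint32_t ce_cap_addr(uint32_t engine_base)
{
    return engine_base + CE_CAP_OFFSET;
}

/* True if a capability value marks the falcon as a decompress engine. */
static inline bool ce_is_decompress(uint32_t cap)
{
    return (cap & CE_CAP_DECOMPRESS) != 0;
}
```

With the GF108 values reported in the thread, `ce_is_decompress(3)` is false for the engine at 0x104000 and `ce_is_decompress(4)` is true for the one at 0x105000, which would be consistent with 0x104000 being the engine used for copy there.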