I have been trying to determine why scp runs slower on T1000 and T2000 systems than on other SPARC systems. I have been using the profile provider to help me determine which instructions run "hot" on the T1000 I have available for testing. Here is the script I am using:

    #!/usr/sbin/dtrace -s

    int cnter;

    profile-997
    /arg1 && pid == $target/
    {
            @aes[ustack(1)] = count();
            cnter++;
    }

    profile-997
    /cnter >= 40000/
    {
            exit(0);
    }

I then use this data to find the hot functions and the hot instructions within those functions. Looking at the data, the hottest function is AES_encrypt (no surprise there). It has 50% more hits on the T1000 than on a V220 I am testing on. This is interesting because, according to timex, scp takes about 50% longer overall on the T1000 than on the V220, even though the T2000 has a clock of 1GHz and the V220 has only 450MHz.

Even more interesting, the AES_encrypt function shows fewer hotspots on the T1000 than on the V220. Here is a bit of output from the listing of AES_encrypt from both systems. The numbers are the number of hits from 5 runs of the profile script above. Blanks are "no hits".

V220:

    311  AES_encrypt+0x110:  e0 02 c0 01  ld [%o3 + %g1], %l0
      2  AES_encrypt+0x114:  97 34 60 0e  srl %l1, 0xe, %o3
         AES_encrypt+0x118:  84 04 fc 00  add %l3, -0x400, %g2
    342  AES_encrypt+0x11c:  c6 02 80 16  ld [%o2 + %l6], %g3
      1  AES_encrypt+0x120:  90 0a 63 fc  and %o1, 0x3fc, %o0
      4  AES_encrypt+0x124:  88 0d 20 ff  and %l4, 0xff, %g4
    338  AES_encrypt+0x128:  b1 29 20 02  sll %g4, 0x2, %i0
         AES_encrypt+0x12c:  f8 02 00 02  ld [%o0 + %g2], %i4
         AES_encrypt+0x130:  9e 0b 7f fc  and %o5, -0x4, %o7
    314  AES_encrypt+0x134:  93 35 20 06  srl %l4, 0x6, %o1
         AES_encrypt+0x138:  fa 06 00 13  ld [%i0 + %l3], %i5
         AES_encrypt+0x13c:  b6 1c 00 03  xor %l0, %g3, %i3
    317  AES_encrypt+0x140:  c6 03 c0 01  ld [%o7 + %g1], %g3
     11  AES_encrypt+0x144:  94 0a e3 fc  and %o3, 0x3fc, %o2
         AES_encrypt+0x148:  98 1e c0 1c  xor %i3, %i4, %o4

T1000:

    123  AES_encrypt+0x110:  e0 02 c0 01  ld [%o3 + %g1], %l0
    257  AES_encrypt+0x114:  97 34 60 0e  srl %l1, 0xe, %o3
    109  AES_encrypt+0x118:  84 04 fc 00  add %l3, -0x400, %g2
    123  AES_encrypt+0x11c:  c6 02 80 16  ld [%o2 + %l6], %g3
    285  AES_encrypt+0x120:  90 0a 63 fc  and %o1, 0x3fc, %o0
    137  AES_encrypt+0x124:  88 0d 20 ff  and %l4, 0xff, %g4
    116  AES_encrypt+0x128:  b1 29 20 02  sll %g4, 0x2, %i0
    122  AES_encrypt+0x12c:  f8 02 00 02  ld [%o0 + %g2], %i4
    257  AES_encrypt+0x130:  9e 0b 7f fc  and %o5, -0x4, %o7
    123  AES_encrypt+0x134:  93 35 20 06  srl %l4, 0x6, %o1
    109  AES_encrypt+0x138:  fa 06 00 13  ld [%i0 + %l3], %i5
    309  AES_encrypt+0x13c:  b6 1c 00 03  xor %l0, %g3, %i3
    136  AES_encrypt+0x140:  c6 03 c0 01  ld [%o7 + %g1], %g3
    196  AES_encrypt+0x144:  94 0a e3 fc  and %o3, 0x3fc, %o2
    116  AES_encrypt+0x148:  98 1e c0 1c  xor %i3, %i4, %o4

Notice that on the V220 particular instructions take a while, and then the next couple have no hits. On the T1000 every instruction is hit, roughly in the range of 100 to 300. I suspect this has something to do with CMT: where a cache miss on the V220 stalls the instruction, on the T1000 it causes a thread switch, and when the profile probe then fires, the script I have above won't record it.
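
As a rough cross-check of the excerpt, the counts in the listing can be totaled with a few lines of Python. This is only a sketch: the two lists are transcribed by hand from the listing (AES_encrypt+0x110 through +0x148), blanks are entered as 0, and the 100-hit cutoff for calling an instruction "hot" is an arbitrary value chosen for illustration.

```python
# Hit counts per instruction, AES_encrypt+0x110 .. +0x148, transcribed from
# the listing. Blank entries ("no hits") are entered as 0.
v220 = [311, 2, 0, 342, 1, 4, 338, 0, 0, 314, 0, 0, 317, 11, 0]
t1000 = [123, 257, 109, 123, 285, 137, 116, 122, 257, 123, 109, 309, 136, 196, 116]


def summarize(hits, hot_threshold=100):
    """Return (total hits, number of instructions at or above hot_threshold).

    hot_threshold is an illustrative cutoff, not something from the profile data.
    """
    total = sum(hits)
    hot = sum(1 for h in hits if h >= hot_threshold)
    return total, hot


v220_total, v220_hot = summarize(v220)
t1000_total, t1000_hot = summarize(t1000)
ratio = t1000_total / v220_total

print("V220 :", v220_total, "hits,", v220_hot, "of", len(v220), "instructions hot")
print("T1000:", t1000_total, "hits,", t1000_hot, "of", len(t1000), "instructions hot")
print("T1000/V220 ratio: %.2f" % ratio)
```

For this 15-instruction window the totals come to 1640 hits on the V220 and 2518 on the T1000, a ratio of about 1.5, which is in line with the 50% difference for the function as a whole. The V220 concentrates its hits in 5 instructions, while all 15 are hit on the T1000.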

Does the profile provider fire for the virtual CPUs? Anyway, I am at a loss. Is there any way to cause the profile provider to record the PC of a process even if that process is not on the CPU? Perhaps the tick provider would be better? Any ideas how to figure this out?

-- blu

There are two rules in life:
Rule 1- Don't tell people everything you know
----------------------------------------------------------------------
Brian Utterback - Solaris RPE, Sun Microsystems, Inc.
Ph:877-259-7345, Em:brian.utterback-at-ess-you-enn-dot-kom