We've been talking a lot recently about failure rates and types of failures. As you may know, I do look at field data and generally don't ask the group for more data. But this time, for various reasons (I might have found a bug or deficiency), I'm soliciting more data from the list at large.

What I'd like to gather is the error rate per bytes transferred. This data is collected in kstats, but the counters are reset when you reboot. One of the features of my vast collection of field data is that it is often collected rather soon after a reboot. Thus, there aren't very many bytes transferred yet, and the corresponding error rates tend to be small (often 0). A perfect collection would be from a machine connected to lots of busy disks which has been up for a very long time.

Can you help? It is really simple. Just email me the output of:

    kstat -pc disk
    kstat -pc device_error

for systems which have been up a while and, preferably, have lots of disks. From this data I (or you) can calculate error rates per I/O operation or per byte. I'll be doing statistical analysis, so the more samples, the better.

Note: the context of the data is somewhat imprecise regarding specific failure rates. kstats are usually just counters. Detailed per-failure analysis requires different telemetry than kstats. However, overall rates, even simple counts, should be a leading indicator.

Please e-mail the output directly to me. Thanks.
 -- richard
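For anyone who wants to compute the rates locally, here is a minimal Python sketch of the calculation described above. It is not Richard's analysis, only an illustration. It assumes the usual `kstat -p` line layout of `module:instance:name:statistic` followed by whitespace and a value, that the disk class exposes `nread`, `nwritten`, `reads` and `writes`, and that the device_error class uses names like `sd0,err` with `Hard Errors`, `Soft Errors` and `Transport Errors`. The function names are hypothetical, and statistic names may differ by driver.

```python
from collections import defaultdict

# Assumed error counters in the device_error class; adjust for your drivers.
ERROR_STATS = ("Hard Errors", "Soft Errors", "Transport Errors")


def parse_kstat(text):
    """Parse `kstat -p` output into {device: {statistic: numeric value}}."""
    devices = defaultdict(dict)
    for line in text.splitlines():
        # The value is the last whitespace-separated token; statistic
        # names such as "Hard Errors" may themselves contain spaces.
        parts = line.rstrip().rsplit(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts
        fields = key.split(":", 3)
        if len(fields) != 4:
            continue
        _module, _instance, name, stat = fields
        device = name.split(",")[0]  # "sd0,err" -> "sd0"
        try:
            devices[device][stat] = float(value)
        except ValueError:
            continue  # skip non-numeric values such as class or vendor
    return dict(devices)


def error_rates(devices, denominators=("nread", "nwritten")):
    """Return {device: errors per unit}, where the unit is the sum of the
    given counters (bytes by default; pass ("reads", "writes") for I/Os)."""
    rates = {}
    for device, stats in devices.items():
        total = sum(stats.get(name, 0.0) for name in denominators)
        errors = sum(stats.get(name, 0.0) for name in ERROR_STATS)
        if total > 0:  # devices with no traffic have no meaningful rate
            rates[device] = errors / total
    return rates
```

To use it, concatenate the output of both kstat commands into one string (or file), pass it to `parse_kstat`, and feed the result to `error_rates`. Devices that have transferred nothing since boot are left out rather than reported as zero, which matches the concern above about samples taken soon after a reboot.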
Hello Richard,

Friday, January 26, 2007, 11:36:07 PM, you wrote:

RE> We've been talking a lot recently about failure rates and types of
RE> failures. As you may know, I do look at field data and generally don't
RE> ask the group for more data. But this time, for various reasons (I
RE> might have found a bug or deficiency) I'm soliciting for more data at
RE> large.

RE> What I'd like to gather is the error rates per bytes transferred. This
RE> data is collected in kstats, but is reset when you reboot. One of the
RE> features of my vast collection of field data is that it is often collected
RE> rather soon after a reboot. Thus, there aren't very many bytes transferred
RE> yet, and the corresponding error rates tend to be small (often 0). A perfect
RE> collection would be from a machine connected to lots of busy disks which
RE> has been up for a very long time.

RE> Can you help? It is real simple. Just email me the output of:

I've sent it to you off list.

Will those results (total statistics, not site specific) be publicly provided by you (here?)?

--
Best regards,
 Robert
 mailto:rmilkowski at task.gda.pl
 http://milek.blogspot.com
Robert Milkowski wrote:
> Hello Richard,
>
> Friday, January 26, 2007, 11:36:07 PM, you wrote:
>
> RE> We've been talking a lot recently about failure rates and types of
> RE> failures. As you may know, I do look at field data and generally don't
> RE> ask the group for more data. But this time, for various reasons (I
> RE> might have found a bug or deficiency) I'm soliciting for more data at
> RE> large.
>
> RE> What I'd like to gather is the error rates per bytes transferred. This
> RE> data is collected in kstats, but is reset when you reboot. One of the
> RE> features of my vast collection of field data is that it is often collected
> RE> rather soon after a reboot. Thus, there aren't very many bytes transferred
> RE> yet, and the corresponding error rates tend to be small (often 0). A perfect
> RE> collection would be from a machine connected to lots of busy disks which
> RE> has been up for a very long time.
>
> RE> Can you help? It is real simple. Just email me the output of:
>
> I've sent you off list.

Thanks.

> Will those results (total statistics, not site specific) be publicly
> provided by you (here?)?

Sure.
 -- richard