zhihui yan
2004-Nov-29 08:13 UTC
[Ocfs-users] Re: TAR 4182518.999: OCFS problems encoutered during support Intel project
Hi Dean,

Thanks for your kind help. I couldn't use VPN at Intel and it was late by the time I got home, so I didn't get a chance to update the TAR on Metalink. Tomorrow I will go to CDC and update it accordingly.

We think all the problems are caused by OCFS itself. There are two outstanding problems for now.

First, we get the error message below when we issue the "srvconfig -init" command to initialize OCR.

oracle.ops.mgmt.rawdevice.RawDeviceException: PRKR-1064 : General Exception in OCR
        at oracle.ops.mgmt.rawdevice.RawDeviceUtil.<init>(RawDeviceUtil.java:136)
        at oracle.ops.mgmt.rawdevice.RawDeviceUtil.main(RawDeviceUtil.java:2071)

I think this is a bug in OCFS on IA64 machines. I will ask Colin to confirm whether a fix or patch is available for this problem for OCFSv1 on IA64.

Second, DBCA hangs at 37% when we use it to create the database. We found that it hangs when it begins to create the redo log files. We also need to ask the OCFS team for help with this.

I tried two versions of OCFS for IA64: 1.0.13-11 and 1.0.11-1. Of course, we also wanted to exclude a possible mismatch between the Linux kernel and the kernel required by the OCFS driver. (We first used RHEL3 Update 4, where the kernel version is 2.4.21-20 while the OCFS driver wants 2.4.21-4. Then I reinstalled the OS with Update 1, where the kernel version is 2.4.21-20, and we found the OCFS driver needs 2.4.21-4.102 or something like that.) I think that requirement may be too strict for the OCFS driver.

Please give me your suggestions.

Best regards,
Zhihui.

Dean Tan wrote:
> Now that you should have some idea of how the problem occurs, here are a few
> suggestions.
> - Do you have all the required OS and Oracle s/w versions, and patches?
> - Search ITS/WebIV/Metalink for similar cases
> - Maybe set the owner of the TAR back to the Oracle Support person who
>   was trying to help us earlier, and work with him.
>   http://webiv.oraclecorp.com/cgi-bin/webiv/do.pl/Get?WwwID=tar:4182518.999
>
> I've seen no update on this TAR, any problem you're having?
>
> regards,
> dean.
>
> xiaohui he said the following on 11/28/2004 10:39 PM:
>
>> Zhihui wonders if there is some mismatch between the ocfs and os
>> currently used. So we're confirming it with ocfs team. In the
>> meantime, the customer is finding the update 1 OS so zhihui can do a
>> testing on it.
>> xiaohui he wrote:
>>
>>> zhihui called me that such error disappears if putting them on raw
>>> devices but still hangs at the step when creating redo log.
>>> xiaohui he wrote:
>>>
>>>> How about setting two raw devices for quorum and ocr instead of putting
>>>> them on ocfs partition?
>>>> zhihui yan wrote:
>>>>
>>>>> Hi Takeshi,
>>>>>
>>>>> Sorry for the late reply since it is difficult for me to check
>>>>> mail in Intel.
>>>>>
>>>>> The error numbers are PRKR-1064 and PRKR-1005 when I executed DBCA to
>>>>> create cluster database. I think all of them result from the
>>>>> problem that we can't create OCR successfully using
>>>>> "srvconfig -init". In this case, gsd couldn't work normally.
>>>>> These two PRKR errors are caused by this problem.
>>>>>
>>>>> In this stage, I think there is one possible reason. That is this
>>>>> version of OCFS for Itanium has one bug so that it couldn't
>>>>> support OCR successfully. Also maybe this problem could be
>>>>> related to bug2400133. I had reset up one fresh environment
>>>>> for our continuous analyzation last week. Hope we could get one
>>>>> reasonable and feasible fix this week.
>>>>>
>>>>> Thanks and best regards,
>>>>>
>>>>> Zhihui.
>>>>>
>>>>> Takeshi Watanabe wrote:
>>>>>
>>>>>> Zhihui,
>>>>>>
>>>>>>> All those problems are resulted from PRKR or PRKC error.
>>>>>>
>>>>>> What's the exact error number?
>>>>>>
>>>>>> If it's same as Intel's case, is it PRKR-1064, right?
>>>>>> Note:212631.1 mentions several reasons for this. Did you check?
>>>>>> But PRKC error doesn't exist in the log Xihua sent.
>>>>>>
>>>>>> And I'd like to know test server's ip/username/passwd in CDC.
>>>>>>
>>>>>> Regards
>>>>>> Takeshi
Wim Coekaerts
2004-Nov-29 10:08 UTC
[Ocfs-users] Re: TAR 4182518.999: OCFS problems encoutered during support Intel project
> First, we will get below error message when we issue "srvconfig -init"
> command to initialize OCR.

check the bug database. this is an srv bug, been around for a long time, there are patches at least for ia32. not an ocfs problem.

> Second, it will hang at 37% when we issue DBCA to create database. We found
> it hangs when it begin to create redo log files. Also need to ask OCFS
> team for help.

hard to tell - we have this used in production so I suggest you do some more debugging. good exercise to check stack dumps and see if dbca is hanging inside dbca or inside a system call. take a simple kernel stack dump, or run ps aux and see if processes are (and remain) in D state.
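
The "are (and remain) in D state" check can be done by hand, but a small script makes it easier to compare two snapshots. The following is only a rough sketch in Python, not something from the thread: the function names are made up for illustration, and it assumes the usual `ps aux` column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND), which may differ on other platforms.

```python
import subprocess


def d_state_processes(ps_aux_output):
    """Return {pid: command} for rows of `ps aux` text whose STAT starts with 'D'.

    Assumes the standard 11-column `ps aux` layout, with STAT as the 8th column.
    """
    blocked = {}
    for line in ps_aux_output.splitlines():
        fields = line.split(None, 10)
        # Skip the header row and anything that is not a full process row.
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        pid, stat, command = int(fields[1]), fields[7], fields[10].strip()
        if stat.startswith("D"):  # D = uninterruptible sleep, usually waiting on I/O
            blocked[pid] = command
    return blocked


def still_blocked(first_snapshot, second_snapshot):
    """Return {pid: command} for processes in D state in both snapshots."""
    first = d_state_processes(first_snapshot)
    second = d_state_processes(second_snapshot)
    return {pid: cmd for pid, cmd in second.items() if pid in first}


def take_snapshot():
    """Capture the current `ps aux` output as text (hypothetical helper)."""
    return subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
```

To use it, call `take_snapshot()` while DBCA appears hung, wait a short while, call it again, and pass both results to `still_blocked()`. A process that shows up in both snapshots is a candidate for being stuck inside a system call rather than inside dbca itself; an empty result points back at the application.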