Damon Miller
2009-May-15 17:30 UTC
[Ocfs2-users] Debugging help / Guidance on architecture
Hi all. This will be my first post to the mailing list, so I apologize in advance if I omit relevant configuration/setup details. Please let me know what additional information is needed and I'll gladly supply it.

We're running a 3-node OCFS2 1.2.9 cluster with a 5-TB iSCSI block device as the backing store. All machines are running CentOS, with the iSCSI target running CentOS 5.2 and the initiators running CentOS 4.7. The purpose of the cluster is to evaluate alternatives to our current solution for replicating audio files which are generated from multiple PBX servers running Asterisk.

We currently use Unison for file-level replication to and from a dedicated machine such that there are multiple copies of the audio tree--one per PBX server. This allows us to quickly and easily move customers among our servers for load-balancing and disaster-recovery purposes. Unfortunately, we're encountering scalability problems with the Unison-based approach, e.g. conflicts, slow propagation time, etc.

The hope was that moving to a clustered filesystem would improve propagation time, reduce conflicts, and allow us to scale more effectively. I chose OCFS2 because it seemed the simplest solution architecturally and because of its certification by Oracle for use with the database product. (My thought was that Oracle's certification requirements would likely supersede those of a general-purpose filesystem, though please correct me if this was naïve or misguided.)

Having said all that, this morning around 7:00am EDT we began seeing OCFS2-related errors in one of our servers' syslog.
Specifically:

--

May 15 07:08:00 cam-c6 kernel: o2net: no longer connected to node cam-p1 (num 1) at 10.10.89.110:7777
May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_broadcast_vote:731 ERROR: status = -112
May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_do_request_vote:804 ERROR: status = -112
May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_rename:1207 ERROR: status = -112
May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_broadcast_vote:731 ERROR: status = -107
May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_do_request_vote:804 ERROR: status = -107
May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_rename:1103 ERROR: status = -107

[last message repeated many times]

May 15 07:08:30 cam-c6 kernel: (4335,0):o2net_connect_expired:1585 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.

...

May 15 09:22:29 cam-c6 kernel: (4335,0):o2net_connect_expired:1585 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.

--

This continued until 9:22am EDT, at which point one of our engineers manually rebooted the machine in an attempt to remedy the voicemail problems, after Asterisk began complaining of read/write problems with its voicemail tree.

I was surprised OCFS2 didn't panic the kernel and automatically reboot the machine after the 30-second timeout. I thought this was the default behavior, and in fact I forced this condition by manually stopping the iSCSI daemon during preliminary testing. Instead, the kernel complained for over two hours before someone manually rebooted the machine, at which point the cluster reconnected and resumed operation. Is this expected?

According to the relevant switch (a managed Cisco) there was no interruption in network connectivity between these two machines. Neither server logged anything related to a network link failure, so the only real information I have is from OCFS2.
Frankly, I'm not sure how to proceed from here, but I obviously want to address the reliability concerns this problem raises, since we're considering OCFS2 to replace our existing solution throughout our datacenters.

I tried to map the numerical error codes -112 and -107 to specific problems based on the code ('tcp.c' and 'vote.c' in particular) but was unsuccessful.

In general, I suppose I'm curious whether anyone has high-level feedback on the planned use of OCFS2 in this scenario. Am I overcomplicating things? Assuming the pilot works, we do plan to roll out a dedicated storage network which will include redundant switching, NICs, iSCSI targets with multiple paths to the physical storage, etc. I just need to validate the basic approach at present.

Thanks in advance for any information you can provide. I've attached our 'cluster.conf' file to this message. At present, only nodes 0, 1, and 7 are connected to the cluster. I included the other nodes in the config file so we could easily add them if we confirmed reliable operation through the pilot. In this configuration, 'cam-s1' is the iSCSI target while 'cam-p1' and 'cam-c6' are the connected nodes in the cluster.
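On the -112 and -107 codes mentioned above: kernel functions such as ocfs2_broadcast_vote report failures as negated errno values, so the symbolic name can be recovered from the host's errno table rather than from the OCFS2 sources. The following is a minimal Python sketch, assuming it is run on a Linux host (errno numbering is platform-specific); `decode_status` is a hypothetical helper, not an OCFS2 tool.

```python
import errno
import os


def decode_status(status: int) -> str:
    """Translate a kernel status such as -112 into 'NAME: description'.

    Kernel code returns errors as negated errno values, so the sign is
    dropped before the lookup. The mapping comes from the errno table of
    the platform this runs on; it matches the kernel's only on Linux.
    """
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # Status values copied from the cam-c6 syslog excerpt.
    for status in (-112, -107):
        print(status, decode_status(status))
```

On a Linux host this should report EHOSTDOWN ("Host is down") for -112 and ENOTCONN ("Transport endpoint is not connected") for -107, i.e. both codes describe the o2net link to the peer node being unavailable rather than a disk or filesystem error.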
Here is output from 'df', 'mounted.ocfs2', and 'iscsi-ls':

[root@cam-c6 ~]# df -H
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      294G   66G  213G  24% /
/dev/sda1             104M   22M   78M  22% /boot
none                  7.5G     0  7.5G   0% /dev/shm
none                  8.6G     0  8.6G   0% /mnt/ramdisk
/dev/sdc1             5.0T   82G  4.9T   2% /store1
[root@cam-c6 ~]# mounted.ocfs2 -d
Device      FS     UUID                                  Label
/dev/sdc1   ocfs2  52415cf6-22e8-4a2c-a090-0f0448366e63  store1
[root@cam-c6 ~]# iscsi-ls
*******************************************************************************
SFNet iSCSI Driver Version ...4:0.1.11-7(14-Apr-2008)
*******************************************************************************
TARGET NAME     : iqn.2009-01.com.thinkingphones:iscsi-tgt1:store1
TARGET ALIAS    :
HOST ID         : 3
BUS ID          : 0
TARGET ID       : 0
TARGET ADDRESS  : 10.10.89.105:3260,1
SESSION STATUS  : ESTABLISHED AT Fri May 15 09:28:05 EDT 2009
SESSION ID      : ISID 00023d000001 TSIH f00
*******************************************************************************

Regards,
Damon

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: application/octet-stream
Size: 864 bytes
Desc: cluster.conf
Url : http://oss.oracle.com/pipermail/ocfs2-users/attachments/20090515/5d645d60/attachment.obj
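Because the only evidence of the outage is the OCFS2 syslog output on cam-c6, it can help to measure how long the errors persisted and which calls failed with which status. The following is a rough Python sketch, assuming the classic syslog line format shown in Damon's excerpt (timestamp without a year, then the kernel message) and that lines arrive in chronological order; `summarize`, `SAMPLE`, and the regular expressions are hypothetical illustrations, not part of any OCFS2 tool.

```python
import re
from collections import Counter
from datetime import datetime

# Leading syslog timestamp, e.g. "May 15 07:08:00" (syslog omits the year).
TS_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})")
# OCFS2 error lines of the form "(pid,cpu):function:line ERROR: status = -N".
STATUS_RE = re.compile(r"\):(\w+):\d+ ERROR: status = (-\d+)")


def summarize(lines, year=2009):
    """Return (first, last, counts) for o2net/ocfs2 kernel lines.

    first/last are the timestamps of the first and last matching lines
    (lines are assumed to be in chronological order); counts maps
    (function, status) pairs to the number of times they were logged.
    """
    first = last = None
    counts = Counter()
    for line in lines:
        if "o2net" not in line and "ocfs2_" not in line:
            continue
        m = TS_RE.match(line)
        if m is None:
            continue
        # The year is not in the log line, so the caller supplies it.
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        if first is None:
            first = ts
        last = ts
        s = STATUS_RE.search(line)
        if s:
            counts[(s.group(1), int(s.group(2)))] += 1
    return first, last, counts


# Three lines copied from the cam-c6 excerpt (the real log is much longer).
SAMPLE = [
    "May 15 07:08:00 cam-c6 kernel: o2net: no longer connected to node cam-p1 (num 1) at 10.10.89.110:7777",
    "May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_broadcast_vote:731 ERROR: status = -112",
    "May 15 09:22:29 cam-c6 kernel: (4335,0):o2net_connect_expired:1585 ERROR: no connection "
    "established with node 1 after 30.0 seconds, giving up and returning errors.",
]

if __name__ == "__main__":
    first, last, counts = summarize(SAMPLE)
    print("window:", first, "->", last, "duration:", last - first)  # duration is 2:14:29 here
    for (func, status), n in counts.most_common():
        print(f"{func} status={status}: {n}")
```

On a CentOS node the same function could be fed the open syslog file (for example `/var/log/messages`) instead of `SAMPLE`; note that "last message repeated" lines are not expanded, so the counts are lower bounds.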
Sunil Mushran
2009-May-16 01:52 UTC
Damon Miller wrote:
> We're running a 3-node OCFS2 1.2.9 cluster with a 5-TB iSCSI block device as the backing store. All machines are running CentOS, with the iSCSI target running CentOS 5.2 and the initiators running CentOS 4.7. The purpose of the cluster is to evaluate alternatives to our current solution for replicating audio files which are generated from multiple PBX servers running Asterisk.
>
> We currently use Unison for file-level replication to and from a dedicated machine such that there are multiple copies of the audio tree--one per PBX server. This allows us to quickly and easily move customers among our servers for load-balancing and disaster recovery purposes. Unfortunately, we're encountering scalability problems with the Unison-based approach, e.g. conflicts, slow propagation time, etc.
>
> The hope was that moving to a clustered filesystem would improve propagation time, reduce conflicts, and allow us to scale more effectively. I chose OCFS2 because it seemed the simplest solution architecturally and because of its certification by Oracle for use with the database product. (My thought was that Oracle's certification requirements would likely supersede those of a general-purpose filesystem, though please correct me if this was naïve or misguided.)
>

Oracle cert requirements are based on the Oracle db workload. General purpose is all-encompassing. There is no one certification that can be used for general purpose, as it is hard to capture the essence of all possible workloads. Having said that, we have many users who have been using it in many different environments for many years now. So you are not breaking any new ground.

> Having said all that, this morning around 7:00am EDT we began seeing OCFS2-related errors in one of our servers' syslog.
> Specifically:
>
> --
>
> May 15 07:08:00 cam-c6 kernel: o2net: no longer connected to node cam-p1 (num 1) at 10.10.89.110:7777
> May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_broadcast_vote:731 ERROR: status = -112
> May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_do_request_vote:804 ERROR: status = -112
> May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_rename:1207 ERROR: status = -112
> May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_broadcast_vote:731 ERROR: status = -107
> May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_do_request_vote:804 ERROR: status = -107
> May 15 07:08:00 cam-c6 kernel: (17170,0):ocfs2_rename:1103 ERROR: status = -107
>
> [last message repeated many times]
>
> May 15 07:08:30 cam-c6 kernel: (4335,0):o2net_connect_expired:1585 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.
>
> ...
>
> May 15 09:22:29 cam-c6 kernel: (4335,0):o2net_connect_expired:1585 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.
>
> --
>
> This continued until 9:22am EDT, at which point one of our engineers manually rebooted the machine in an attempt to remedy the voicemail problems in response to Asterisk complaining of read/write problems to its voicemail tree.
>
> I was surprised OCFS2 didn't panic the kernel and automatically reboot the machine after the 30-second timeout. I thought this was the default behavior and in fact I forced this condition by manually stopping the iSCSI daemon during preliminary testing. Instead, the kernel complained for over two hours before someone manually rebooted the machine, at which point the cluster reconnected and resumed operation. Is this expected?
>

A connection between two nodes can snap. But its not reconnecting is strange. One would think that 30 secs would be more than adequate for two nodes to make a TCP connection.
Do you have any firewalls in between that could be interfering?

> According to the relevant switch (a managed Cisco) there was no interruption in network connectivity between these two machines. Neither server logged anything related to a network link failure so the only real information I have is from OCFS2. Frankly I'm not sure how to proceed from here but I obviously want to address the reliability concerns this problem raises since we're considering OCFS2 for replacing our existing solution throughout our datacenters.
>

If a firewall (iptables) is responsible, then it will not show up as a link failure.

> I tried to map the numerical error codes -112 and -107 to specific problems based on the code ('tcp.c' and 'vote.c' in particular) but I was unsuccessful.
>

-107 is ENOTCONN (transport endpoint is not connected) and -112 is EHOSTDOWN (host is down).

> In general, I suppose I'm curious if anyone has high-level feedback on the planned use of OCFS2 in this scenario. Am I overcomplicating things? Assuming the pilot works, we do plan to roll out a dedicated storage network which will include redundant switching, NICs, iSCSI targets with multiple paths to the physical storage, etc. I just need to validate the basic approach at present.
>

Your email does not actually say how you are using the fs. You have mentioned the older replication method; I would imagine that is not your concern now. The question is: how are the nodes accessing the fs now? How many files do you have in a dir? Are all nodes creating files in one dir? What other types of contention are there?

OCFS2 can handle contention. The thing to remember is that contention affects performance even on a single node; it just has a greater effect in a clustered setup.
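Whether two nodes can make a plain TCP connection, and whether something like an iptables rule is silently dropping packets, can be checked directly from one node. The following is a minimal Python sketch, assuming it is pointed at the peer's o2net address from cluster.conf (cam-p1 at 10.10.89.110:7777 in the log above); `can_connect` is a hypothetical helper, not an OCFS2 tool. Since o2net expects a handshake after connecting, it will likely log and drop this probe, so treat it as a one-off diagnostic.

```python
import socket


def can_connect(host: str, port: int = 7777, timeout: float = 5.0):
    """Attempt a single TCP connect to host:port.

    Returns (True, None) on success, or (False, reason) with the error text.
    A timeout usually means packets are dropped silently (e.g. an iptables
    DROP rule); "Connection refused" means the host answered but nothing
    accepted on that port, or a REJECT rule replied on its behalf.
    """
    try:
        # The connection is closed immediately; only reachability matters.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # socket.timeout is a subclass of OSError
        return False, str(exc)
```

For example, running `can_connect("10.10.89.110")` on cam-c6 while the errors are being logged would distinguish a filtered port (timeout) from a closed one (refused), neither of which a switch would report as a link failure.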