Displaying 20 results from an estimated 3000 matches similar to: "Semi-OT: torque, pbs_mom, cpuset, loglevel"
2008 Sep 30
1
Broken pipe, x86_64 CentOS 5.2
Hi All,
I have a problem with Torque (OpenPBS) on x86_64 CentOS 5.2. To add context, there is
no problem on 32-bit CentOS 5.2 or on 64-bit Ubuntu 8.04.
The problem is that pbs_mom's child quits without writing any error logs.
[root@frodo9 torque-2.3.3]# strace -f pbs_mom
.
.
.
bind(6, {sa_family=AF_INET, sin_port=htons(15002),
sin_addr=inet_addr("0.0.0.0")}, 16) = 0
time(NULL)
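A minimal sketch of how I'd capture the child's exit reason (my addition, not from the
original post; the pbs_mom path and the trace file name are assumptions):
strace -f -tt -o /tmp/pbs_mom.trace /usr/sbin/pbs_mom
# after the child quits, look for the signal or exit status it died with
grep -E 'exit_group|killed by|SIG' /tmp/pbs_mom.trace | tail -n 20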
2012 Jun 26
0
abrtd problems
We're trying to debug a problem: a server that reboots spontaneously when
one user's large, multithreaded program is running. Sometimes it won't do
it for hours, other times it's literally every 10 minutes. I've run iostat and
netstat, kept top running, and had a tail -f /var/log/dmesg going: *nada*. Nothing out of
the ordinary.
One thing that's constant: as the system's coming back up,
2008 Apr 26
1
Xen and Torque
Dear Xen users.
Has anyone tried to integrate Xen with the Torque resource management system?
Could you please advise me on a system I'm developing that
relies on Torque?
Let me describe the system first.
The part of the system that talks to Torque should request a certain
number of nodes from a cluster and launch a virtual machine instance on each of them
(one VM instance per host).
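A rough sketch of how such a job could look (my illustration only, not from the post;
it assumes the classic xm toolstack and a guest config /etc/xen/guest.cfg present on
every node):
#PBS -l nodes=4
#PBS -N start-vms
for host in $(sort -u "$PBS_NODEFILE"); do
    ssh "$host" "xm create /etc/xen/guest.cfg"   # one VM instance per allocated host
done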
2008 Jul 07
1
SIGPIPE in assorted apps after "yum update"
Hello,
I have several systems which I recently updated with
yum -y update
to all the latest packages. These systems use yum-priorities with the CentOS
(priority 1), EPEL (priority 5), and rpmforge (priority 10) repositories.
After the updates, dhcpd stopped working with a SIGPIPE
error which occurs shortly after it attempts to fork into the
background. I worked around that problem by building
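For reference, the yum-priorities setup described above boils down to priority= lines
in the repo definitions under /etc/yum.repos.d/ (an illustrative sketch; the section
names vary by repo file):
[base]
priority=1
[epel]
priority=5
[rpmforge]
priority=10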
2015 May 27
0
serious problem with torque
Mark, You might really want to compile torque from source (into an RPM
if you'd like) and redistribute that. Every version is a little wonky
and those of us that use(d) it often will poke around until we find a
version / patch-set that makes us happy and stick with that for a bit.
It's not an exact science and newer / higher versions are not always better.
As for the downgrade comment:
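A sketch of that build-from-source route (my wording, and it assumes the Torque release
tarball ships a spec file, which releases of that era did):
rpmbuild -ta torque-<version>.tar.gz     # builds binary RPMs straight from the tarball
# the packages end up under ~/rpmbuild/RPMS/ and can be pushed to a local repo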
2015 May 27
1
serious problem with torque
On Wed, May 27, 2015 10:55 am, Zachary Giles wrote:
> Mark, You might really want to compile torque from source (into an RPM
> if you'd like) and redistribute that. Every version is a little wonky
> and those of us that use(d) it often will poke around until we find a
> version / patch-set that makes us happy and stick with that for a bit.
> It's not an exact science and
2015 May 27
0
serious problem with torque
On 05/27/2015 09:07 AM, m.roth at 5-cent.us wrote:
> Hi, folks,
>
> The other admin updated torque without testing it on one machine, and
> we had Issues. The first I knew was when a user reported qstat
> returning
> socket_connect_unix failed: 15137
> socket_connect_unix failed: 15137
> socket_connect_unix failed: 15137
> qstat: cannot connect to server (null)
2015 May 27
0
serious problem with torque
On Wed, May 27, 2015 9:46 am, m.roth at 5-cent.us wrote:
> Johnny Hughes wrote:
>> On 05/27/2015 09:07 AM, m.roth at 5-cent.us wrote:
>>> Hi, folks,
>>>
>>> The other admin updated torque without testing it on one machine,
>>> and
>>> we had Issues. The first I knew was when a user reported qstat
>>> returning
>>>
2015 Feb 19
0
Anyone using torque/pbs/munge?
CentOS 6.6
I've got two servers, server1 and hbs (honkin' big server). Both are
running munge, and torque... *separately*. My problem is that I've got
users who want to be able to submit from server1 to hbs. I see that munged
can be pointed to an alternate keyfile... but is there any way to tell
qsub what to use?
(And yes, I got on the torque users' list, and I'm trying
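A couple of things I'd try (a sketch, not a confirmed fix): Torque's qsub accepts a
queue@server destination, and the client's default server can be set via PBS_DEFAULT;
for munge itself, both hosts normally need to share the same key (munged --key-file=PATH
selects which file the daemon reads).
qsub -q batch@hbs job.sh        # 'batch' is a placeholder queue name on hbs
export PBS_DEFAULT=hbs          # or make hbs the default server for this client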
2015 May 27
2
serious problem with torque
Johnny Hughes wrote:
> On 05/27/2015 09:07 AM, m.roth at 5-cent.us wrote:
>> Hi, folks,
>>
>> The other admin updated torque without testing it on one machine, and
>> we had Issues. The first I knew was when a user reported qstat
>> returning
>> socket_connect_unix failed: 15137
>> socket_connect_unix failed: 15137
>> socket_connect_unix
2019 Aug 29
0
[libvirtd] qemu_process: reset CPU affinity to all enabled CPUs, when runs in custom cpuset
Hello All,
Since version 4.5.0-23.el7 (Red Hat 7.7), when I launch a pinned VM,
libvirtd resets the CPU affinity to all CPUs enabled on the host if it runs in
a custom cpuset.
I can't reproduce this behavior with 4.5.0-10.el7_6.12 on the same
kernel version (Red Hat 7.7).
Libvirtd runs in a custom cpuset, 'libvirt', where the set of
available CPUs is restricted to 0,2,4,6,8.
And this
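For context, a custom cpuset like that is typically built on a cgroup-v1 host roughly
like this (a sketch based on the description above; paths assume the usual
/sys/fs/cgroup/cpuset mount):
mkdir /sys/fs/cgroup/cpuset/libvirt
echo 0,2,4,6,8 > /sys/fs/cgroup/cpuset/libvirt/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/libvirt/cpuset.mems
echo $(pgrep -x libvirtd) > /sys/fs/cgroup/cpuset/libvirt/tasks   # one PID per write
# after starting the pinned guest, compare what the vCPU threads actually got
# (the process name may be qemu-kvm or qemu-system-x86_64 depending on the build):
taskset -cp $(pgrep -f qemu-kvm | head -n1)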
2008 Oct 22
1
torque/pbs & snow library
Hello all;
I'm trying to execute parallel jobs through the snow library on a cluster built
with Torque/PBS. I'm successfully obtaining the cluster with:
>system("cat $PBS_NODEFILE > cluster.txt")
>mycluster <- scan(file="cluster.txt",what="character")
>cl <- makeSOCKcluster(mycluster)
The only problem, at the moment, is that if I use
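A self-contained variant of the same idea (my sketch, not the poster's code) reads the
node file directly and checks that every worker answers; it assumes snow is installed
on all nodes:
Rscript -e 'library(snow);
  nodes <- readLines(Sys.getenv("PBS_NODEFILE"));
  cl <- makeSOCKcluster(nodes);
  print(clusterCall(cl, function() Sys.info()[["nodename"]]));
  stopCluster(cl)'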
2007 Dec 29
2
OpenMPI not compiled with Torque support
The OpenMPI package that ships with CentOS 5.1 does not seem to be
compiled with torque support. It does, however, seem to be compiled
with gridengine and slurm support. Would it be possible to get this
changed?
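An easy way to check what a given build supports is ompi_info, which lists the compiled
MCA components (tm is the Torque/PBS component):
ompi_info | grep -E 'tm|gridengine|slurm'
# with Torque support built in you would see lines such as "MCA ras: tm" and "MCA plm: tm"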
2015 May 27
5
serious problem with torque
Hi, folks,
The other admin updated torque without testing it on one machine, and
we had Issues. The first I knew was when a user reported qstat
returning
socket_connect_unix failed: 15137
socket_connect_unix failed: 15137
socket_connect_unix failed: 15137
qstat: cannot connect to server (null) (errno=15137) could not connect to
trqauthd
Attempting to restart the pbs_server did the same.
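What I would check first (a sketch, not a verified fix): the message means the client
cannot reach trqauthd, the authorization daemon that Torque 4.x and later rely on, so it
has to be running before pbs_server and qstat will work:
pgrep -l trqauthd || trqauthd      # is it running? if not, start it
service pbs_server restart
qstat -B                           # should now report the server status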
2020 Apr 17
0
HPC question: torques replacement
Hey Valeri -
IIRC, midway (and maybe midway2?) use slurm for job scheduling. I don't know how many of your faculty use both your nodes and midway, but maybe consolidating on to a single scheduler would be easier for them?
(also, it's been a while ... hi!)
Richard
-----Original Message-----
From: CentOS <centos-bounces at centos.org> On Behalf Of Valeri Galtsev
Sent: Friday,
2020 Apr 17
4
HPC question: torques replacement
Dear Experts,
I know there are many HPC (high performance computing) experts on this
list. I'd like to ask your advice.
Almost two decades ago I chose to go with OpenPBS (turned down condor
and other alternatives for whatever reason) for clusters and number
crunchers I support for the Department at the university. It turned out
to be a not-bad, long-lived choice. At some point I smoothly
2015 May 27
1
was, Re: serious problem with torque, is firefox
Valeri Galtsev wrote:
> On Wed, May 27, 2015 9:46 am, m.roth at 5-cent.us wrote:
>> Johnny Hughes wrote:
>>> On 05/27/2015 09:07 AM, m.roth at 5-cent.us wrote:
<snip>
>>
>> Thanks, Johnny. I *just* posted an apology, that I realized it was an
>> EPEL issue.... Talk about an "upgrade disaster"! I think the other admin -
>> he's been here
2013 May 24
0
Problem After adding Bricks
Hello, I have run into some performance issues after adding bricks to
a 3.3.1 volume. Basically I am seeing very high CPU usage and
extremely degraded performance. I started a rebalance but stopped it
after a couple of days. The logs have a lot of entries for split-brain as
well as "Non Blocking entrylks failed for". For some of the
directories on the client, doing an ls will show multiple
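The commands I'd reach for to see where things stand (a sketch; VOLNAME is a
placeholder for the real volume name):
gluster volume rebalance VOLNAME status        # state of the stopped rebalance
gluster volume heal VOLNAME info split-brain   # the entries the logs complain about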
2012 Oct 16
1
cpuset not affecting real pid placement
Hi,
At least on 0.10.2, setting a cpuset doesn't match the real process
placement - the VM still consumes all available cores.
VM config:
.snip.
<vcpu placement='static' cpuset='0-5,12-17'>12</vcpu>
.snip.
for cpuset in $(find /cgroup/cpuset/libvirt/qemu/vmid/ -name
cpuset.cpus) ; do grep 0-5 $cpuset ; done
got: empty response (i.e. cpuset.cpus shows 0-23 in my setup)
expected: at least
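A way to cross-check the pinning from outside the cgroup tree (a sketch; 'vmid' is the
domain name used in the config above):
virsh vcpupin vmid                                        # the affinity libvirt reports per vCPU
grep Cpus_allowed_list /proc/$(pgrep -f 'qemu.*vmid' | head -n1)/status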
2014 Jan 15
2
Does libvirt lxc driver support "cpuset" attribute?
Dear all
I allocate only one vcpu to the container with the following statement; that is, I want to pin the vcpu to physical core "2".
<vcpu placement='static' cpuset="2" >1</vcpu>
My host has 4 physical cores. Before the test, all 4 cores are idle. After I run 4 processes in the container, I find all 4 cores on the host are 100% used. That is, the
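A quick way to see the affinity a container process actually got (a sketch; PID stands
for one of the busy processes observed inside the container):
taskset -cp PID                            # e.g. "current affinity list: 2" if the pin took effect
grep Cpus_allowed_list /proc/PID/status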