Displaying 20 results from an estimated 3000 matches similar to: "Semi-OT: torque, pbs_mom, cpuset, loglevel"
2008 Sep 30
1
Broken pipe, x86_64 CentOS 5.2
Hi All,
I have a problem with Torque (OpenPBS) on x86_64 CentOS 5.2. To add context, there is
no problem on 32-bit CentOS 5.2 or on 64-bit Ubuntu 8.04.
The problem is that pbs_mom's child quits without writing any error logs.
[root@frodo9 torque-2.3.3]# strace -f pbs_mom
.
.
.
bind(6, {sa_family=AF_INET, sin_port=htons(15002),
sin_addr=inet_addr("0.0.0.0")}, 16) = 0
time(NULL)
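A minimal sketch of how I'd capture the child's exit reason (my addition, not from the
original post; the pbs_mom path and the trace file name are assumptions):
strace -f -tt -o /tmp/pbs_mom.trace /usr/sbin/pbs_mom
# after the child quits, look for the signal or exit status it died with
grep -E 'exit_group|killed by|SIG' /tmp/pbs_mom.trace | tail -n 20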
2012 Jun 26
0
abrtd problems
We're trying to debug a problem: a server that reboots spontaneously when
one user's large, multithreaded program is running. Sometimes it won't do
it for hours, other times it's literally every 10 minutes. I've run iostat and
netstat, kept top running, and had a tail -f /var/log/dmesg going: *nada*. Nothing out of
the ordinary.
One thing that's constant: as the system's coming back up,
2008 Apr 26
1
Xen and Torque
Dear Xen users.
Has anyone tried to integrate Xen with the Torque resource management system?
Could you please advise me on a system I'm developing that
relies on Torque?
Let me describe the system first.
The part of the system that talks to Torque should request a certain
number of nodes from a cluster and launch a virtual machine instance on each of them
(one VM instance per host).
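A rough sketch of how such a job could look (my illustration only, not from the post;
it assumes the classic xm toolstack and a guest config /etc/xen/guest.cfg present on
every node):
#PBS -l nodes=4
#PBS -N start-vms
for host in $(sort -u "$PBS_NODEFILE"); do
    ssh "$host" "xm create /etc/xen/guest.cfg"   # one VM instance per allocated host
done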
2008 Jul 07
1
SIGPIPE in assorted apps after "yum update"
Hello,
I have several systems which I recently updated with
yum -y update
to all the latest packages. These systems use yum-priorities with the CentOS
(priority 1), EPEL (priority 5), and rpmforge (priority 10) repositories.
After the updates, dhcpd stopped working with a SIGPIPE
error which occurs shortly after it attempts to fork into the
background. I worked around that problem by building
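For reference, the yum-priorities setup described above boils down to priority= lines
in the repo definitions under /etc/yum.repos.d/ (an illustrative sketch; the section
names vary by repo file):
[base]
priority=1
[epel]
priority=5
[rpmforge]
priority=10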
2015 May 27
0
serious problem with torque
Mark, You might really want to compile torque from source (into an RPM
if you'd like) and redistribute that. Every version is a little wonky
and those of us that use(d) it often will poke around until we find a
version / patch-set that makes us happy and stick with that for a bit.
It's not an exact science and newer / higher versions are not always better.
As for the downgrade comment:
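A sketch of that build-from-source route (my wording, and it assumes the Torque release
tarball ships a spec file, which releases of that era did):
rpmbuild -ta torque-<version>.tar.gz     # builds binary RPMs straight from the tarball
# the packages end up under ~/rpmbuild/RPMS/ and can be pushed to a local repo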
2015 May 27
1
serious problem with torque
On Wed, May 27, 2015 10:55 am, Zachary Giles wrote:
> Mark, You might really want to compile torque from source (into an RPM
> if you'd like) and redistribute that. Every version is a little wonky
> and those of us that use(d) it often will poke around until we find a
> version / patch-set that makes us happy and stick with that for a bit.
> It's not an exact science and
2015 May 27
0
serious problem with torque
On 05/27/2015 09:07 AM, m.roth at 5-cent.us wrote:
> Hi, folks,
>
> The other admin updated torque without testing it on one machine, and
> we had Issues. The first I knew was when a user reported qstat
> returning
> socket_connect_unix failed: 15137
> socket_connect_unix failed: 15137
> socket_connect_unix failed: 15137
> qstat: cannot connect to server (null)
2015 May 27
0
serious problem with torque
On Wed, May 27, 2015 9:46 am, m.roth at 5-cent.us wrote:
> Johnny Hughes wrote:
>> On 05/27/2015 09:07 AM, m.roth at 5-cent.us wrote:
>>> Hi, folks,
>>>
>>> The other admin updated torque without testing it on one machine,
>>> and
>>> we had Issues. The first I knew was when a user reported qstat
>>> returning
>>>
2015 Feb 19
0
Anyone using torque/pbs/munge?
CentOS 6.6
I've got two servers, server1 and hbs (honkin' big server). Both are
running munge, and torque... *separately*. My problem is that I've got
users who want to be able to submit from server1 to hbs. I see that munged
can be pointed to an alternate keyfile... but is there any way to tell
qsub what to use?
(And yes, I got on the torque users' list, and I'm trying
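A couple of things I'd try (a sketch, not a confirmed fix): Torque's qsub accepts a
queue@server destination, and the client's default server can be set via PBS_DEFAULT;
for munge itself, both hosts normally need to share the same key (munged --key-file=PATH
selects which file the daemon reads).
qsub -q batch@hbs job.sh        # 'batch' is a placeholder queue name on hbs
export PBS_DEFAULT=hbs          # or make hbs the default server for this client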
2015 May 27
2
serious problem with torque
Johnny Hughes wrote:
> On 05/27/2015 09:07 AM, m.roth at 5-cent.us wrote:
>> Hi, folks,
>>
>> The other admin updated torque without testing it on one machine, and
>> we had Issues. The first I knew was when a user reported qstat
>> returning
>> socket_connect_unix failed: 15137
>> socket_connect_unix failed: 15137
>> socket_connect_unix
2019 Aug 29
0
[libvirtd] qemu_process: reset CPU affinity to all enabled CPUs, when runs in custom cpuset
Hello All,
Since version 4.5.0-23.el7 (Red Hat 7.7), when I launch a pinned VM,
libvirtd resets the CPU affinity to all CPUs enabled on the host if it runs in
a custom cpuset.
I can't reproduce this behavior with 4.5.0-10.el7_6.12 on the same
kernel version (Red Hat 7.7).
Libvirtd runs in a custom cpuset, 'libvirt', where the set of
available CPUs is restricted to 0,2,4,6,8.
And this
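For context, a custom cpuset like that is typically built on a cgroup-v1 host roughly
like this (a sketch based on the description above; paths assume the usual
/sys/fs/cgroup/cpuset mount):
mkdir /sys/fs/cgroup/cpuset/libvirt
echo 0,2,4,6,8 > /sys/fs/cgroup/cpuset/libvirt/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/libvirt/cpuset.mems
echo $(pgrep -x libvirtd) > /sys/fs/cgroup/cpuset/libvirt/tasks   # one PID per write
# after starting the pinned guest, compare what the vCPU threads actually got
# (the process name may be qemu-kvm or qemu-system-x86_64 depending on the build):
taskset -cp $(pgrep -f qemu-kvm | head -n1)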
2008 Oct 22
1
torque/pbs & snow library
Hello all;
I'm trying to execute parallel jobs through the snow library on a cluster built
with Torque/PBS. I'm successfully obtaining the cluster with:
>system("cat $PBS_NODEFILE > cluster.txt")
>mycluster <- scan(file="cluster.txt",what="character")
>cl <- makeSOCKcluster(mycluster)
The only problem, at the moment, is that if I use
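A self-contained variant of the same idea (my sketch, not the poster's code) reads the
node file directly and checks that every worker answers; it assumes snow is installed
on all nodes:
Rscript -e 'library(snow);
  nodes <- readLines(Sys.getenv("PBS_NODEFILE"));
  cl <- makeSOCKcluster(nodes);
  print(clusterCall(cl, function() Sys.info()[["nodename"]]));
  stopCluster(cl)'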
2007 Dec 29
2
OpenMPI not compiled with Torque support
The OpenMPI package that ships with CentOS 5.1 does not seem to be
compiled with torque support. It does, however, seem to be compiled
with gridengine and slurm support. Would it be possible to get this
changed?
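An easy way to check what a given build supports is ompi_info, which lists the compiled
MCA components (tm is the Torque/PBS component):
ompi_info | grep -E 'tm|gridengine|slurm'
# with Torque support built in you would see lines such as "MCA ras: tm" and "MCA plm: tm"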
2015 May 27
5
serious problem with torque
Hi, folks,
The other admin updated torque without testing it on one machine, and
we had Issues. The first I knew was when a user reported qstat
returning
socket_connect_unix failed: 15137
socket_connect_unix failed: 15137
socket_connect_unix failed: 15137
qstat: cannot connect to server (null) (errno=15137) could not connect to
trqauthd
Attempting to restart the pbs_server did the same.
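What I would check first (a sketch, not a verified fix): the message means the client
cannot reach trqauthd, the authorization daemon that Torque 4.x and later rely on, so it
has to be running before pbs_server and qstat will work:
pgrep -l trqauthd || trqauthd      # is it running? if not, start it
service pbs_server restart
qstat -B                           # should now report the server status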
2020 Apr 17
0
HPC question: torques replacement
Hey Valeri -
IIRC, midway (and maybe midway2?) use slurm for job scheduling. I don't know how many of your faculty use both your nodes and midway, but maybe consolidating on to a single scheduler would be easier for them?
(also, it's been a while ... hi!)
Richard
-----Original Message-----
From: CentOS <centos-bounces at centos.org> On Behalf Of Valeri Galtsev
Sent: Friday,
2020 Apr 17
4
HPC question: torques replacement
Dear Experts,
I know there are many HPC (high performance computing) experts on this
list. I'd like to ask your advice.
Almost two decades ago I chose to go with OpenPBS (turned down condor
and other alternatives for whatever reason) for clusters and number
crunchers I support for the Department at the university. It turned out
to be a not-bad, long-lived choice. At some point I smoothly
2015 May 27
1
was, Re: serious problem with torque, is firefox
Valeri Galtsev wrote:
> On Wed, May 27, 2015 9:46 am, m.roth at 5-cent.us wrote:
>> Johnny Hughes wrote:
>>> On 05/27/2015 09:07 AM, m.roth at 5-cent.us wrote:
<snip>
>>
>> Thanks, Johnny. I *just* posted an apology, that I realized it was an
>> EPEL issue.... Talk about an "upgrade disaster"! I think the other admin -
>> he's been here
2013 May 24
0
Problem After adding Bricks
Hello, I have run into some performance issues after adding bricks to
a 3.3.1 volume. Basically I am seeing very high CPU usage and
extremely degraded performance. I started a rebalance but stopped it
after a couple of days. The logs have a lot of entries for split-brain as
well as "Non Blocking entrylks failed for". For some of the
directories on the client, doing an ls will show multiple
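The commands I'd reach for to see where things stand (a sketch; VOLNAME is a
placeholder for the real volume name):
gluster volume rebalance VOLNAME status        # state of the stopped rebalance
gluster volume heal VOLNAME info split-brain   # the entries the logs complain about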
2012 Oct 16
1
cpuset not affecting real pid placement
Hi,
At least on 0.10.2, setting a cpuset doesn't match the real process
placement - the VM still consumes all available cores.
VM config:
.snip.
<vcpu placement='static' cpuset='0-5,12-17'>12</vcpu>
.snip.
for cpuset in $(find /cgroup/cpuset/libvirt/qemu/vmid/ -name
cpuset.cpus) ; do grep 0-5 $cpuset ; done
got: empty response (i.e. cpuset.cpus shows 0-23 in my setup)
expected: at least
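A way to cross-check the pinning from outside the cgroup tree (a sketch; 'vmid' is the
domain name used in the config above):
virsh vcpupin vmid                                        # the affinity libvirt reports per vCPU
grep Cpus_allowed_list /proc/$(pgrep -f 'qemu.*vmid' | head -n1)/status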
2014 Jan 15
2
Does libvirt lxc driver support "cpuset" attribute?
Dear all
I allocate only one vcpu to the container with the following statement; that is, I want to pin the vcpu to physical core "2".
<vcpu placement='static' cpuset="2" >1</vcpu>
My host has 4 physical cores. Before the test, all 4 cores are idle. After I run 4 processes in the container, I find all 4 cores on the host are 100% used. That is, the
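A quick way to see the affinity a container process actually got (a sketch; PID stands
for one of the busy processes observed inside the container):
taskset -cp PID                            # e.g. "current affinity list: 2" if the pin took effect
grep Cpus_allowed_list /proc/PID/status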