Installing the Torque PBS System

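I helped a classmate install the Torque PBS system on a small cluster and jotted down some of the steps. I was in a hurry at the time and did not record many of the details, so treat this as a rough note only.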

NFS

yum -y install nfs-utils
systemctl enable nfs
systemctl restart nfs
showmount -e 192.168.0.115

Edit /etc/fstab:
192.168.0.115:/home/customer /home/customer nfs defaults,nfsvers=3 0 0
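
When the same setup is repeated on several nodes, it can help to add the fstab entry only if it is not already present. The following is a minimal sketch: the helper name add_fstab_entry is hypothetical (not part of any tool), and it only compares whole lines, so an entry that differs in spacing would be added again.

```shell
#!/bin/sh
# Append an fstab entry only if the exact same line is not already in the file.
# $1: path to the fstab file, $2: the complete entry line
add_fstab_entry() {
    fstab_file="$1"
    entry="$2"
    if ! grep -qxF -- "$entry" "$fstab_file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$fstab_file"
    fi
}

# Example (run as root on a compute node):
# add_fstab_entry /etc/fstab "192.168.0.115:/home/customer /home/customer nfs defaults,nfsvers=3 0 0"
```

Running the function twice leaves a single entry, so it is safe to include in a per-node setup script.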

Mount the share:

cd /home
mkdir customer
mount -a
mount

Edit /etc/hosts:

vim /etc/hosts
# 192.168.0.125 node06

NIS

yum -y install ypserv ypbind yp-tools rpcbind

On the compute node:
vim /etc/sysconfig/network   # add: NISDOMAIN=licluster
vim /etc/rc.local            # add: nisdomainname licluster
vim /etc/yp.conf             # add: domain licluster server 192.168.0.115
nisdomainname licluster      # set the NIS domain for the running system

vim /etc/nsswitch.conf
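
This note does not record what was changed in /etc/nsswitch.conf. A common NIS client setup adds nis as a lookup source on the passwd, shadow and group lines; that is an assumption here, not something taken from the original configuration. The sketch below (the function name add_nis_source is made up for illustration) reads the file on stdin and writes a modified copy to stdout, so the result can be reviewed before it replaces the real file.

```shell
#!/bin/sh
# Read an nsswitch.conf on stdin and print a copy in which "nis" is appended
# to the passwd, shadow and group lines that do not already list it.
# Assumption: these three databases are the ones that should consult NIS.
add_nis_source() {
    sed -E '/^(passwd|shadow|group):/{
/[[:space:]]nis([[:space:]]|$)/!s/$/ nis/
}'
}

# Example:
# add_nis_source < /etc/nsswitch.conf > /tmp/nsswitch.conf.new
# diff /etc/nsswitch.conf /tmp/nsswitch.conf.new
```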

systemctl start ypbind.service
systemctl start rpcbind.service
systemctl enable ypbind.service
systemctl enable rpcbind.service

Check that the NIS client is bound to a server: ypwhich
Test the information for the user customer: id customer
Check the synchronized maps: yptest

Passwordless SSH access: key-based
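
The note gives no commands for this step. One common way to set it up is to create a key pair without a passphrase and copy the public key to each node with ssh-copy-id. In the sketch below, the helper name ensure_ssh_key, the ed25519 key type and the node names in the usage comment are my own choices for illustration, not the original setup.

```shell
#!/bin/sh
# Create a passphrase-less ed25519 key pair unless the key file already exists.
# $1: path of the private key file (the public key gets the suffix .pub)
ensure_ssh_key() {
    key_file="$1"
    if [ ! -f "$key_file" ]; then
        mkdir -p "$(dirname "$key_file")"
        ssh-keygen -q -t ed25519 -N "" -f "$key_file"
    fi
}

# Example, run as the cluster user on the head node:
# ensure_ssh_key "$HOME/.ssh/id_ed25519"
# for node in node01 node02; do
#     ssh-copy-id -i "$HOME/.ssh/id_ed25519.pub" "$node"
# done
```

Afterwards, ssh node01 hostname should return without asking for a password.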

NTP client

yum -y install ntp ntpdate
ntpdate master

Cluster

Installation on the head node

./configure --prefix=/usr/local/torque --with-default-server=$HOSTNAME
make
make install

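pbs_mom, the PBS MOM daemon, monitors the local machine and executes jobs; it runs on every compute node.

pbs_sched, the PBS scheduler daemon, schedules jobs; it runs on the server node.

pbs_server, the PBS server daemon, accepts job submissions; it runs on the server node.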

  • Add the systemd services

    cp contrib/systemd/trqauthd.service /usr/lib/systemd/system/
    cp contrib/systemd/pbs_server.service /usr/lib/systemd/system/
    cp contrib/systemd/pbs_sched.service /usr/lib/systemd/system/
    cp contrib/systemd/pbs_mom.service /usr/lib/systemd/system/

  • Enable the services at boot

    systemctl enable pbs_server
    systemctl enable pbs_sched
    systemctl enable pbs_mom
    systemctl enable trqauthd

  • Configure Torque on the head node

    echo [correct_hostname] > /var/spool/torque/server_name
    echo "/usr/local/torque/lib" > /etc/ld.so.conf.d/torque.conf   # must match the --prefix passed to ./configure
    ldconfig

    Add the compute nodes to /var/spool/torque/server_priv/nodes:

    node01 np=1
    node02 np=1
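
For more than a handful of nodes, the nodes file can be generated rather than typed. This is a small sketch that assumes the nodes follow the two-digit naming shown above (node01, node02, ...) and all have the same core count; the function name gen_nodes is hypothetical.

```shell
#!/bin/sh
# Print one "name np=N" line per node.
# $1: name prefix, $2: first index, $3: last index, $4: np value for every node
gen_nodes() {
    prefix="$1"
    i="$2"
    last="$3"
    np="$4"
    while [ "$i" -le "$last" ]; do
        printf '%s%02d np=%s\n' "$prefix" "$i" "$np"
        i=$((i + 1))
    done
}

# Example (as root on the head node):
# gen_nodes node 1 2 1 > /var/spool/torque/server_priv/nodes
```

The np value should normally be the number of cores each node offers to the scheduler; nodes with different core counts need separate calls.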

  • Initialize serverdb:

    ./torque.setup root
    qterm

Start the services

systemctl start pbs_sched
systemctl start pbs_mom
systemctl start trqauthd
systemctl start pbs_server

Installation on the compute nodes

Run make packages in the extracted source directory /opt/src/torque-6.1.2, then install the generated packages on each compute node:

cd /home/customer/software/torque-6.1.2/
./torque-package-clients-linux-x86_64.sh --install
./torque-package-devel-linux-x86_64.sh --install
./torque-package-doc-linux-x86_64.sh --install
./torque-package-mom-linux-x86_64.sh --install
./torque-package-server-linux-x86_64.sh --install

Compute node configuration

Create a file named server_name under /var/spool/torque containing the hostname of the server node.
Edit /var/spool/torque/mom_priv/config:

$pbsserver master  # hostname of the server node
$logevent 255  # log event mask
$usecp master:/home/customer /home/customer  # use cp instead of scp for files in the NFS-shared directory
$spool_as_final_name true

Enable and start pbs_mom:

systemctl enable pbs_mom.service
systemctl start pbs_mom.service
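
Once pbs_mom is running on the compute nodes, the head node should report them as free. The sketch below counts the free nodes in the output of pbsnodes -a, assuming the usual layout in which each node name is followed by an indented "state = ..." line. The helper name count_free_nodes is hypothetical.

```shell
#!/bin/sh
# Read `pbsnodes -a` output on stdin and print how many nodes are in state "free".
# grep -c prints 0 but exits non-zero when nothing matches, hence the `|| true`.
count_free_nodes() {
    grep -c '^[[:space:]]*state = free' || true
}

# Example (on the head node):
# pbsnodes -a | count_free_nodes
# qstat -q        # list the queues known to pbs_server
```

If a node shows up as down instead, checking that pbs_mom is running there and that the hostnames in /etc/hosts match the nodes file is a reasonable first step.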

Firewall

Get the firewalld state: firewall-cmd --state
List the settings of all zones: firewall-cmd --list-all-zones
Open the ports (the firewall must be reloaded afterwards for the change to take effect):

firewall-cmd --zone=public --add-port=2049/tcp --permanent
firewall-cmd --zone=public --add-port=2049/udp --permanent
firewall-cmd --zone=public --add-port=15003/tcp --permanent
firewall-cmd --zone=public --add-port=15003/udp --permanent
firewall-cmd --zone=public --add-port=15002/tcp --permanent
firewall-cmd --zone=public --add-port=15002/udp --permanent

Reload the firewall: firewall-cmd --reload
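
The six nearly identical commands above can be produced by a loop. As a cautious sketch, the hypothetical helper firewall_port_cmds below only prints the commands, so they can be reviewed first and then piped to sh as root.

```shell
#!/bin/sh
# Print the firewall-cmd calls that open each given port for TCP and UDP
# in the public zone, followed by the reload that makes them effective.
firewall_port_cmds() {
    for port in "$@"; do
        for proto in tcp udp; do
            printf 'firewall-cmd --zone=public --add-port=%s/%s --permanent\n' "$port" "$proto"
        done
    done
    printf 'firewall-cmd --reload\n'
}

# Review the commands:
# firewall_port_cmds 2049 15002 15003
# Apply them (as root):
# firewall_port_cmds 2049 15002 15003 | sh
```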

change the uid


https://quantum-cyborg.github.io/2021/04/03/CS/cluster/Torque的安装/
Author: 碳基机器
Published: April 3, 2021