1) Please open grand in Slurm.
2) Two prod32 jobs hung with messages like the ones below.
Could something be wrong with t1n031?
[tishika@blue-fe 20180511]$ grep Broke slurm-11*
slurm-1175.out:[t1n031:14761] [[1393,0],0]->[[1393,0],8] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 17]
slurm-1175.out:[t1n031:14761] [[1393,0],0]->[[1393,0],4] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 18]
slurm-1175.out:[t1n031:14761] [[1393,0],0]->[[1393,0],2] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 23]
slurm-1175.out:[t1n031:14761] [[1393,0],0]->[[1393,0],1] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 31]
slurm-1175.out:[t1n031:14761] [[1393,0],0]->[[1393,0],8] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 17]
slurm-1175.out:[t1n031:14761] [[1393,0],0]->[[1393,0],16] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 19]
slurm-1178.out:[t1n031:23694] [[24662,0],0]->[[24662,0],1] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 34]
slurm-1178.out:[t1n031:23694] [[24662,0],0]->[[24662,0],8] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 14]
slurm-1178.out:[t1n031:23694] [[24662,0],0]->[[24662,0],4] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 18]
slurm-1178.out:[t1n031:23694] [[24662,0],0]->[[24662,0],2] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 20]
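A minimal sketch for summarizing output like the above. It groups the Broken pipe lines by Slurm output file and reporting host, and lists which remote peers each job's process failed to write to. It assumes the usual Open MPI (ORTE) naming, where `[[family,0],N]` denotes daemon N of a job. The regex and the `summarize`/`main` helpers are hypothetical and based only on the lines shown here. It reports which connections failed. It does not by itself show which node is at fault.

```python
import re
import sys
from collections import defaultdict

# Matches one line of `grep Broke slurm-11*` output (hypothetical pattern), e.g.
# slurm-1175.out:[t1n031:14761] [[1393,0],0]->[[1393,0],8] mca_oob_tcp_msg_send_bytes: write failed: Broken pipe (32) [sd = 17]
LINE_RE = re.compile(
    r"^(?P<file>[^:]+):\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+"
    r"\[\[(?P<family>\d+),\d+\],(?P<src>\d+)\]"
    r"->\[\[\d+,\d+\],(?P<dst>\d+)\]"
    r".*Broken pipe"
)


def summarize(lines):
    """Return {(file, host): sorted unique destination ranks} for Broken pipe lines."""
    peers = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # lines that do not match the pattern are ignored
            peers[(m["file"], m["host"])].add(int(m["dst"]))
    return {key: sorted(dsts) for key, dsts in peers.items()}


def main(paths):
    lines = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            # Prefix each line with its file name, as grep does for multiple files.
            lines.extend(f"{path}:{line}" for line in f)
    for (fname, host), dsts in sorted(summarize(lines).items()):
        print(f"{fname} [{host}]: write to peer(s) {dsts} failed with EPIPE")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage (hypothetical script name): `python3 summarize_broken_pipe.py slurm-11*.out`. Because it works per output file, it shows at a glance whether every failing job reports from the same host, as both jobs above do from t1n031.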
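Dr. Ishikawa,
This is Hirahara from Exa.
Sorry for the late reply,
and my apologies for the delay.
1) grand has been opened.
2) I will escalate this to the development team.
Best regards.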