Thunderhead Engineering Forum

News:

Forum moved to https://forum.thunderheadeng.com

Topic: RUN Cluster

bvsrl

RUN Cluster
« on: September 30, 2015, 06:22:03 am »

This is my configuration for the Run Cluster test:

PC1 - Computer name AABB1122
Win 8.1 64 bit
User account XXZZ, with the same password as on PC2
FDS 6.2.0
SMV 6.2.2
PyroSim 2015 2 0604 x64 with license active
MPI for Win 8
PyrosimCluster 2014-4-1105-x64 installed

The PyroSim file is in the shared folder C:\abc\abc.psm

PC2 - Computer name TTWW8899
Win 7 64 bit
User account XXZZ (same as on PC1), with the same password as on PC1
FDS 6.2.0
SMV 6.2.2
PyroSim 2015 2 0604 x64 installation only
MPI MPICH2 1.4.1p1 x86-64
PyrosimCluster 2014-4-1105-x64 installed

Questions:

1) When I start Run Cluster, it opens a white window with the words
   "Starting fds: fds.exe....". The elapsed time advances, but the progress time remains at zero.
   How can I check why the calculation does not start?

2) Must the MPI version be the same on all PCs?

3) We followed the available guides. Is there a reference configuration we can use as a model?

   Any suggestions are appreciated.


« Last Edit: September 30, 2015, 08:51:51 am by bvsrl »

Charlie Thornton

Re: RUN Cluster
« Reply #1 on: October 01, 2015, 08:43:35 am »

1. Unfortunately, some MPI errors don't produce any output during the FDS run. If other approaches don't work, the usual approach is to invoke various combinations of mpiexec.exe from the command line and look for results.

2. The version of PyroSim you are using is the last version to use Argonne's MPICH2 library for MPI. To work, it needs to find the version of MPI that PyroSim installs running on the non-default port where PyroSim installs it. If you think the MPI installed by PyroSim may have been removed or otherwise wrecked by other MPI installers, you can use the cluster installer for your version - which you appear to have done. I guess you could run it again if it's been a while.

3. Not really. The most basic procedure is to install PyroSim on two computers within the same domain and then use the Run Cluster option. You don't even need the cluster installer.

I'll attach my list of commands helpful for troubleshooting - just in case you want to really dig in with some console debugging. This might produce a more satisfying lead than <no output>.
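
Because a failing MPI launch can hang without printing anything, it may help to wrap each console attempt in a timeout so that a hang is reported instead of waiting forever. The following is only a minimal sketch in Python, not part of PyroSim or MPI: the function name run_with_timeout and the 30-second default are assumptions chosen for illustration.

```python
import subprocess
import sys


def _as_text(data):
    """Return captured output as str, whether it arrived as bytes, str, or None."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode(errors="replace")
    return data


def run_with_timeout(cmd, timeout_s=30):
    """Run cmd (a list of strings) and return (status, stdout, stderr).

    status is one of:
      "ok"        - the command exited with code 0
      "failed"    - the command exited with a non-zero code
      "timeout"   - the command did not exit within timeout_s seconds
      "not found" - the executable could not be started
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # The child is killed by subprocess.run; keep whatever it printed.
        return "timeout", _as_text(exc.stdout), _as_text(exc.stderr)
    except FileNotFoundError:
        return "not found", "", ""
    status = "ok" if done.returncode == 0 else "failed"
    return status, done.stdout, done.stderr
```

For example, run_with_timeout(["mpiexec", "-validate", "-port", "52700"]) would run the validate command quoted later in this thread and report "timeout" if it never returns. It assumes mpiexec is on the PATH, and a command that waits for a password prompt will also show up as a timeout.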

bvsrl

Re: RUN Cluster
« Reply #2 on: October 02, 2015, 06:54:05 am »

Thank you for the reply.

After the suggested test, when I run the FDS cluster, the window shows the message in the attached txt file.

Note: in the Cluster FDS Parameters window, only the PC where PyroSim is installed but not activated is listed. Is that correct?

Charlie Thornton

Re: RUN Cluster
« Reply #3 on: October 05, 2015, 10:55:28 am »

PyroSim's license does not need to be activated on the remote machine. In this case, the installation process is only used to install the SMPD service and add a firewall exception.

It looks like the SMPD service on tow7prosbv106 is either unreachable or not responding. You can test this using the first two commands shown in the PDF I sent you. In particular, running:

smpd -port 52700 -status tow7prosbv106

from a console window should show whether SMPD is properly installed and running on that machine. Smpd.exe is located in one of the fds* folders of your PyroSim install folder.
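
If the status command itself gives no useful answer, a plain TCP connection test can at least separate "nothing is listening or a firewall is blocking" from "the port is reachable". This is a rough sketch in Python, assuming SMPD accepts TCP connections on port 52700 as configured in this thread. The helper name port_is_open is made up for illustration, and a successful connection only shows that something is listening on the port, not that it is SMPD.

```python
import socket


def port_is_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Name-resolution failures, refused connections, and timeouts all
    raise OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

Calling port_is_open("tow7prosbv106", 52700) from the main PC would then test the remote machine named above. A False result points at name resolution, the firewall exception, or the service not running, rather than at MPI credentials.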

bvsrl

Re: RUN Cluster
« Reply #4 on: October 08, 2015, 08:51:31 am »

Update 001
Thanks for the support. After some progress, this is the general status:

With the NetStalker SterJob program it is possible to monitor processes and ports.
This is the service status after a complete check on both machines (W8 and W7 Pro):
smpd service - listening status -  on port 52700
port validate 52700 running

C:\...\Pyrosim fds Cluster node\fds 6.1.2\ test_mpi
Hello world   running on

mpiexec runs only on the main PyroSim machine (Win 8); the file is not present on the Win 7 PC
hydra_service runs only on the main PyroSim machine (Win 8)
After checking the Hydra service with the command suggested on this web page
https://software.intel.com/en-us/node/528873
I get this message:
Hydra_service running on PC win 8
C:\programmi\pyrosim 2015\hydra_service -status utente locale
...1hydra\utils\sock\sock.c (224):
unable to get host address for utente (11001)
no hydra service running on utente

When I run the FDS cluster command in PyroSim,
I get this message:

Starting FDS: fds.exe...

Credentials for utente locale\mpi_user rejected connecting to tow8sbv50
read from stdin failed, error 9.
[mpiexec@TOW8SBV50] ..\hydra\tools\demux\demux_select.c (78): select error (No such file or directory)
[mpiexec@TOW8SBV50] ..\hydra\pm\pmiserv\pmiserv_pmci.c (501): error waiting for event
[mpiexec@TOW8SBV50] ..\hydra\ui\mpich\mpiexec.c (1059): process manager error waiting for completion

Questions:
- Please confirm that the port number is 52700 for the 64-bit version.
- Some online documents give the port as 52500.
- If possible, please confirm the procedure for mpiexec -register,
- and whether it is correct to type the password.
Any help is appreciated.

Charlie Thornton

Re: RUN Cluster
« Reply #5 on: October 08, 2015, 09:18:05 am »

Quote
Credentials for utente locale\mpi_user rejected connecting to tow8sbv50

It looks like you are having an authentication problem. The user name "mpi_user" is the example used in the troubleshooting section of the manual to instruct MPI to authenticate using an alternate account. If that account doesn't exist, with the same password, on every machine where MPI is used, you will receive this authentication error.

Unless your normal account does not exist on all machines of your cluster or your normal account is not password protected, I recommend you repeat the mpiexec remove/register/validate steps using your normal account.

bvsrl

Re: RUN Cluster
« Reply #6 on: October 09, 2015, 09:31:06 am »

update 02

After repeating the mpiexec remove/register/validate steps,
we set the shortcut target to ..\Pyrosim 2015\Pyrosim.exe -j-Ddebugcluster
and got a log file of the FDS cluster run: see attached file.

No error is reported. For the test, the simulation is set to only 10 seconds.
While the elapsed time advances, the simulation time remains at zero. The window does not close,
and the calculation time keeps counting indefinitely.
Smokeview does not run.
Questions:
- Is the calculation in progress?
- Is it correct that the simulation time stays at zero forever?
- Does the fds.exe process run only on the main PC, or also on the parallel PC?

Charlie Thornton

Re: RUN Cluster
« Reply #7 on: October 12, 2015, 10:12:41 am »

One other user reported this error through support email. For whatever reason, Intel's HYDRA process scheduler is just hanging and not producing output - error or otherwise. I haven't been able to reproduce the issue in the office. Unfortunately, the other user needed to roll back to a previous version of PyroSim to get their cluster working. My wild guess is that the problem is some sort of install issue or an interaction with other services on one of the cluster PCs.

To your specific questions: I do not think the calculation is in progress. Since the job is hanging, I don't think the elapsed time will ever change from zero. For cluster execution, fds.exe is run on each cluster node, once for each mesh.


bvsrl

Re: RUN Cluster
« Reply #8 on: October 14, 2015, 08:00:44 am »

To start the FDS cluster run in PyroSim without any error, you need to stop the smpd process on port 52700.
We stopped the smpd process on port 52700, and the command ran without any error message.
After following the mpiexec validation procedure and validating port 52700 for mpiexec:
mpiexec -validate -port 52700 - Success
The FDS cluster run command in PyroSim then runs with no error, but the port used by mpiexec remains 8679.
This test was done on the Win 8 PC.

Note: after stopping the smpd service on port 52700, the service restarts when the system is rebooted.
We deliberately did not exclude the smpd process from running at boot.

« Last Edit: October 15, 2015, 02:19:48 am by bvsrl »

bvsrl

RUN Cluster - WIN 8 - Configuration OK
« Reply #9 on: December 01, 2015, 07:59:04 am »

After a lot of tests with 2 PCs (Win 8 to Win 8),
the cluster calculation in PyroSim 2015 runs perfectly.

The test with Win 8 to Win 7 Pro is on standby.

See the attached files for the configuration data.

bvsrl

Re: RUN Cluster - win 8 to win 8
« Reply #10 on: December 02, 2015, 02:04:45 am »

Attached file: UDP/TCP processes, services, and ports

bvsrl

Re: RUN Cluster - log file
« Reply #11 on: December 02, 2015, 02:07:59 am »

Attached file: log file of the cluster simulation

bvsrl

Re: RUN Cluster - New Problem with Cluster Calculation
« Reply #12 on: January 12, 2016, 10:17:21 am »

New Problem with Cluster Calculation

After updating PyroSim to the new version 2015.4 (2015-4-1214-x64-en.msi),
the cluster calculation fails to start.
See the attached log file
and the file describing the update operation.

Before updating the PC (PyroSim files only), the cluster ran perfectly.

Error message:
Starting FDS: fds.exe...

Error connecting to the Service[mpiexec@TOW8SBV50] ..\hydra\utils\sock\sock.c (270): unable to connect from "TOW8SBV50" to "tow8sbv54" (No error)

read from stdin failed, error 6.
[mpiexec@TOW8SBV50] ..\hydra\tools\demux\demux_select.c (78): select error (No such file or directory)
[mpiexec@TOW8SBV50] ..\hydra\pm\pmiserv\pmiserv_pmci.c (500): error waiting for event
[mpiexec@TOW8SBV50] ..\hydra\ui\mpich\mpiexec.c (1119): process manager error waiting for completion
.....

Charlie Thornton

Re: RUN Cluster
« Reply #13 on: January 12, 2016, 11:39:41 am »

All machines need the same version of PyroSim installed to ensure that the MPI libraries and FDS version are in sync. Did all machines receive the PyroSim update to 2015.4?
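
One way to make this check systematic is to write down the file version of each MPI component on each machine and compare them. The sketch below is only an illustration in Python: it assumes the version strings have already been collected by hand (for example from each file's properties dialog), and the function name find_version_mismatches and the "<missing>" marker are invented for this example.

```python
from collections import defaultdict


def find_version_mismatches(versions_by_host):
    """Report components whose version differs between hosts.

    versions_by_host maps host name -> {component name: version string}.
    Returns {component: {version: [hosts]}} for every component that
    either has more than one version or is absent on some host. Hosts
    that lack the component are listed under the key "<missing>".
    """
    seen = defaultdict(lambda: defaultdict(list))
    for host, components in versions_by_host.items():
        for component, version in components.items():
            seen[component][version].append(host)

    all_hosts = set(versions_by_host)
    mismatches = {}
    for component, by_version in seen.items():
        reporting = {h for hosts in by_version.values() for h in hosts}
        missing = all_hosts - reporting
        if len(by_version) > 1 or missing:
            entry = {v: sorted(hosts) for v, hosts in by_version.items()}
            if missing:
                entry["<missing>"] = sorted(missing)
            mismatches[component] = entry
    return mismatches
```

An empty result means every listed component has the same version on every machine. Any entry names the machines that disagree, which is the situation this reply is asking about.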

bvsrl

Re: RUN Cluster
« Reply #14 on: January 13, 2016, 07:43:23 am »

We checked all PCs (PC1-SBV50 and PC2-SBV54).
PyroSim was updated on both (PC1-SBV50 and PC2-SBV54) to the new version 2015.4 (2015-4-1214-x64-en.msi).
The folder \\pyrosim 2015\fds includes the files:

Hydra_service    version 5.1.1.0 5.1 update 1  ( previous version pyrosim 2015.3 file Hydra_service 5.0.3.0 )
mpiexec            version 5.1.1.0 5.1 update 1  ( previous version pyrosim 2015.3 file mpiexec 5.0.3.0 )
pmi_proxy         version 5.1.1.0 5.1 update 1  ( previous version pyrosim 2015.3 file pmi_proxy 5.0.3.0 )

With PyroSim active, the files Hydra_service, mpiexec, and pmi_proxy are running.

This is the error log:
Error connecting to the Service[mpiexec@TOW8SBV50] ..\hydra\utils\sock\sock.c (270): unable to connect from "TOW8SBV50" to "tow8sbv54" (No error)

read from stdin failed, error 6.
[mpiexec@TOW8SBV50] ..\hydra\tools\demux\demux_select.c (78): select error (No such file or directory)
[mpiexec@TOW8SBV50] ..\hydra\pm\pmiserv\pmiserv_pmci.c (500): error waiting for event
[mpiexec@TOW8SBV50] ..\hydra\ui\mpich\mpiexec.c (1119): process manager error waiting for completion
.....

What is the problem?