Question: What should I do if I get this error when running distributed jobs across nodes on my Windows HPC cluster?

Error:
    Fatal error in MPI_Comm_create: Other MPI error, error stack:
    MPI_Comm_create(MPI_COMM_WORLD, group=0x88000001, new_comm=0x000000E071458E90) failed
    unable to connect to on port #####, no endpoint matches the netmask

Jobs requiring a single node run without issues.
Tagged: 19.2, HPC Pack, HPC/Parallel, Installation/Licensing/Systems, IP, Mechanical - SYS, mpi, N/A, netmask, Windows HPC
January 25, 2023 at 7:28 am
FAQ
Answer: The issue is caused either by the bind order of the network interfaces or by an incorrectly set MPI netmask.

A typical error may look like this:

    unable to connect to 10.0.0.12 node12 on port 52935, no endpoint matches the netmask 10.0.1.0/255.255.255.0

Note the difference in subnets: the node address is on 10.0.0.*, while the netmask specifies 10.0.1.*.

Please have your cluster or network administrator review the suggestions below.

a) Check how many network interfaces the compute nodes have. If there are multiple interfaces, make sure that the bind order is set correctly.

b) If there is only one interface and the error still occurs, the MPI netmask may need to be configured correctly. For this example it needs to be set to the 10.0.0.* subnet, so the command will look like this:

    cluscfg setenvs CCP_MPI_NETMASK=10.0.0.0/255.255.255.0

Additional information

If RSM is used to submit the job to the cluster, the RSM job log may show errors like the example below:

    Running Solver : C:\Program Files\ANSYS Inc\v192\ansys\bin\winx64\ANSYS192.exe -b nolist -s noread -p ansys -i remote.dat -o solve.out -dis -mpi msmpi -np 12 -dir "C:/scratch/n3r39eoc.i2n"
    job aborted:
    [ranks] message
    [0] fatal error
    Fatal error in MPI_Comm_create: Other MPI error, error stack:
    MPI_Comm_create(MPI_COMM_WORLD, group=0x88000001, new_comm=0x000000E071458E90) failed
    [ch3:sock] rank 0 unable to connect to rank 8 using business card
    unable to connect to 10.0.0.12 node12 on port 52935, no endpoint matches the netmask 10.0.1.0/255.255.255.0
    [1-11] terminated
    ---- error analysis -----
    [0] on node01
    mpi has detected a fatal error and aborted C:\Program Files\ANSYS Inc\v192\ANSYS\bin\winx64\ANSYS.EXE
    ---- error analysis -----
    . . .
    Command Exit Code: -4
    ClusterJobs Exiting with code: -4
    Individual Command Exit Codes are: [-4]
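To confirm which case applies before changing the cluster configuration, it can help to check whether the address reported in the error actually falls inside the configured netmask. The following is a minimal sketch in Python using only the standard library; the function name `endpoint_matches` is hypothetical, and the addresses are the ones from the example error above. It illustrates the subnet comparison only and is not how MS-MPI performs the check internally.

```python
import ipaddress


def endpoint_matches(address: str, netmask: str) -> bool:
    """Return True if `address` lies inside the subnet given as 'network/mask'.

    `netmask` uses the same 'a.b.c.d/m.m.m.m' form as CCP_MPI_NETMASK.
    """
    # strict=False tolerates host bits being set in the network part.
    network = ipaddress.ip_network(netmask, strict=False)
    return ipaddress.ip_address(address) in network


# Values taken from the example error message.
node_address = "10.0.0.12"
failing_mask = "10.0.1.0/255.255.255.0"    # netmask reported in the error
corrected_mask = "10.0.0.0/255.255.255.0"  # netmask suggested in step b)

print(endpoint_matches(node_address, failing_mask))
print(endpoint_matches(node_address, corrected_mask))
```

If the first check fails and the second succeeds for the addresses on your nodes, the netmask setting from step b) is the likely fix; if the node reports an address from an unexpected interface, review the bind order from step a) first.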
© 2023 Copyright ANSYS, Inc. All rights reserved.