For a while now, we’ve been unable to pass the “-nolocal” flag to the mpirun command, which we need in order to execute our simulation across the slave nodes without also treating the master as a slave. When I ran:

$ mpirun -nolocal -np 7 ./simulation 

I would see an error like the following:

p0_10460: p4_error: Could not gethostbyname for host hostname; may be invalid name

Being lazy, I spent quite a bit of time trawling the intertubes looking for a canned solution and found many dead ends talking about the resolver, nsswitch.conf, /etc/hosts, etc. So I finally sighed, reluctantly pulled out my programming hat, and wrote the following:

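#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
  /* Ask the system for this machine's hostname. calloc zeroes the
     buffer, and we pass one byte less than its size, so the result
     is always NUL-terminated. */
  char *user_host = calloc(32, sizeof(char));
  if (!user_host)
    return 1;
  int error = gethostname(user_host, 31);
  if (error != 0) {
    printf("gethostname failed with error %d\n", error);
    free(user_host);
    return error;
  } else
    printf("user_host = %s\n", user_host);

  /* Resolve that name through getaddrinfo(). */
  struct addrinfo *res0 = NULL, hint;
  memset(&hint, 0, sizeof(hint));
  hint.ai_family = PF_UNSPEC;
  hint.ai_flags = AI_CANONNAME;
  error = getaddrinfo(user_host, NULL, &hint, &res0);
  if (error != 0) {
    printf("getaddrinfo failed with error %d (%s).\n", error, gai_strerror(error));
  } else
    freeaddrinfo(res0);

  /* Resolve the same name through gethostbyname(), the call named
     in the p4_error message. */
  struct hostent *he;
  he = gethostbyname(user_host);
  if (!he)
    printf("gethostbyname(gethostname()) failed\n");
  else
    printf("gethostbyname(gethostname()) = %s\n", he->h_name);

  free(user_host);

  /* Repeat the lookup with the name typed in by hand, for comparison.
     "hostname" here is a placeholder for the machine's actual name. */
  const char *hardcoded_host = "hostname";
  he = gethostbyname(hardcoded_host);
  if (!he)
    printf("gethostbyname(/*hardcoded*/ \"hostname\") failed\n");
  else
    printf("gethostbyname(/*hardcoded*/ \"hostname\") = %s\n", he->h_name);
  return 0;
}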

And that showed me that the string returned by gethostname() was flawed in some way... some insidiously evil and occult way that doesn't show up on the terminal screen. Redirecting the output to a file and editing that file with vi showed me that there was a SPACE on the end of the string returned by gethostname(). Ugh.

Other symptoms you might see if you have cruft in your /etc/hostname entry:

sudo: unable to resolve host
hostname => hostname
hostname -f => Unknown host
hostname -a => Unknown host