HPC cluster performance metrics

2007-05-17T20:22:59Z
I am an HPC developer looking for input from system administrators of HPC clusters. I would like to hear opinions and ideas about which network performance metrics are useful for managing and debugging a cluster. For example, is a metric like "the maximum utilization of this link (cable) during the last 24 hours" useful? What about "the set of links with the top 10% of activity during the last hour"? I'd like to get a handle on the specific measurements that HPC customers would want to see for the HPC interconnect, as opposed to the management LAN or control network of a cluster.
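
To make the two example metrics concrete, here is a rough Python sketch of how they might be computed. Everything in it is an assumption for illustration: the sample format `(timestamp_seconds, link_id, utilization_fraction)`, the per-link byte-count dictionary, and the function names `max_utilization` and `top_links_by_activity` are hypothetical and not tied to any particular interconnect or monitoring tool. It also reads "top 10% of activity" as "the top 10% of links when ranked by bytes moved", which is only one possible interpretation.

```python
import math

def max_utilization(samples, link_id, now, window_s=24 * 3600):
    """Peak utilization of one link within the trailing window.

    samples: iterable of (timestamp_seconds, link_id, utilization_fraction)
             tuples (hypothetical format).
    Returns None when the link has no samples in the window.
    """
    values = [
        util
        for ts, link, util in samples
        if link == link_id and now - window_s <= ts <= now
    ]
    return max(values) if values else None

def top_links_by_activity(byte_counts, fraction=0.10):
    """Links in the top `fraction` when ranked by bytes moved.

    byte_counts: dict mapping link_id -> bytes transferred during the
                 period of interest (hypothetical format).
    Always returns at least one link when any links are present.
    """
    if not byte_counts:
        return []
    k = max(1, math.ceil(len(byte_counts) * fraction))
    ranked = sorted(byte_counts, key=byte_counts.get, reverse=True)
    return ranked[:k]
```

For the second metric, `byte_counts` would hold one hour of traffic per link, so the result is the busiest tenth of the fabric's links for that hour.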