
I want to compare the performance of a specific application on two different CPU servers without actually running it on both servers. I can obtain scores for both machines under different workloads using common benchmarks (such as SPECint), but I am more concerned about the performance of the actual application. Is there a way to relate the performance of the actual application to that of certain benchmark workloads, using metrics collected from the PMU (or other information)?

The model can be described as follows. Let the two machines be M1 and M2, let the benchmark suite contain N workloads {T1, ..., TN}, and let the actual application be A. For each workload Tx, the metrics and the corresponding throughput are available on both M1 and M2. For application A, metrics and throughput are available only on M1. Can we estimate the throughput of application A on M2?

.----.---------------.------------------.---------------.------------------.
|    | metrics on M1 | throughput on M1 | metrics on M2 | throughput on M2 |
............................................................................
| T1 | available     | available        | available     | available        |
| T2 | available     | available        | available     | available        |
  ...
| Tx | available     | available        | available     | available        |
| A  | available     | available        | not available | target           |
.----.---------------.------------------.---------------.------------------.
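To make the table concrete, here is a minimal sketch of one possible baseline, not a validated method. It assumes that workloads with similar M1 metrics see a similar M2/M1 throughput ratio, which may not hold in practice. The function name, the choice of metrics, and the `kernel_width` parameter are all hypothetical. It predicts the ratio for A as a similarity-weighted geometric mean of the benchmark ratios, then multiplies by A's throughput on M1.

```python
import math

def estimate_m2_throughput(bench_metrics_m1, bench_thr_m1, bench_thr_m2,
                           app_metrics_m1, app_thr_m1, kernel_width=1.0):
    """Estimate A's throughput on M2 from benchmark rows (hypothetical helper).

    bench_metrics_m1: one list of M1 metrics per workload Tx
    bench_thr_m1/m2:  throughput of each Tx on M1 and on M2
    app_metrics_m1:   the same metrics measured for A on M1
    app_thr_m1:       throughput of A on M1
    """
    n = len(bench_metrics_m1)
    k = len(app_metrics_m1)

    # Standardize each metric so that no single counter dominates the distance.
    means = [sum(row[j] for row in bench_metrics_m1) / n for j in range(k)]
    stds = []
    for j in range(k):
        var = sum((row[j] - means[j]) ** 2 for row in bench_metrics_m1) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero

    def scale(row):
        return [(row[j] - means[j]) / stds[j] for j in range(k)]

    app_z = scale(app_metrics_m1)
    dist2 = [sum((a - b) ** 2 for a, b in zip(app_z, scale(row)))
             for row in bench_metrics_m1]

    # Subtract the smallest distance so the nearest workload gets weight 1
    # and the weights cannot all underflow to zero.
    d_min = min(dist2)
    weights = [math.exp(-(d - d_min) / (2.0 * kernel_width ** 2)) for d in dist2]

    # Average in log space, i.e. a weighted geometric mean of the M2/M1 ratios.
    log_ratio = sum(w * math.log(t2 / t1)
                    for w, t1, t2 in zip(weights, bench_thr_m1, bench_thr_m2))
    log_ratio /= sum(weights)
    return app_thr_m1 * math.exp(log_ratio)
```

With only a handful of workloads the estimate is only as good as the nearest benchmark, so the spread of the benchmark ratios gives a rough idea of the error to expect.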

I found that this paper proposes an approach based on a recommender system, but it needs many different machine types to work well. Is there a method that is effective in a small setting with only two machines?

  • I don't think this is very possible for most workloads unless you're willing to accept a pretty wide error margin, or if your microbenchmark profiling is detailed enough to let you build a cycle-accurate model of the machine to simulate your workload on. Some HPC workloads might more or less bottleneck on memory bandwidth or on FLOPS, in which case you just need to know those numbers. Commented 7 hours ago
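The bandwidth-or-FLOPS case mentioned in the comment can be sketched as simple ratio scaling. This is an illustration under the assumption that the workload is limited by exactly one resource on both machines. The function name, the dictionary keys, and the peak numbers are hypothetical placeholders, not measurements.

```python
def estimate_if_bound_by(resource, thr_m1, peaks_m1, peaks_m2):
    """Scale M1 throughput by the ratio of the limiting resource (hypothetical helper)."""
    return thr_m1 * peaks_m2[resource] / peaks_m1[resource]

# Placeholder peak figures for the two machines (made-up values).
peaks_m1 = {"mem_bw_gbs": 100.0, "gflops": 1000.0}
peaks_m2 = {"mem_bw_gbs": 200.0, "gflops": 1500.0}

# The two estimates bracket what to expect if the bottleneck is unknown.
bandwidth_bound = estimate_if_bound_by("mem_bw_gbs", 50.0, peaks_m1, peaks_m2)
compute_bound = estimate_if_bound_by("gflops", 50.0, peaks_m1, peaks_m2)
```

If the two estimates differ a lot, the PMU metrics on M1 (for example memory stall cycles) are what would indicate which one is closer.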
