==================================================================================
Scaling results of SMDEP on two different architectures using different compilers.
P is the fraction of SMDEP that has been parallelized, estimated from Amdahl's law
relative to the 1 cpu run. The last P of each block is the mean over its runs.

28 Sept 2008.
==================================================================================

1000 particles
HARPERTOWN: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
icc 9.1 -fast

serial => 127 s
 1 cpu => 280 s
 2 cpu => 146 s, P=0.95714
 3 cpu =>  99 s, P=0.96964
 4 cpu =>  76 s, P=0.97143
 5 cpu =>  62 s, P=0.97321
 6 cpu =>  53 s, P=0.97286
 7 cpu =>  46 s, P=0.97500
 8 cpu =>  42 s, P=0.97143

mean P=0.970102

--------------------------------------

1000 particles
HARPERTOWN: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
gcc 4.1.2 -O3 -funroll-loops -march=core2

serial => 599 s
 1 cpu => 676 s
 2 cpu => 360 s, P=0.93491
 4 cpu => 198 s, P=0.94280
 8 cpu => 118 s, P=0.94336

mean P=0.94036

--------------------------------------

1000 particles
HUYGENS: POWER6 (architected), altivec supported @ 4.704GHz
gcc 4.2.1 -O3 -funroll-loops -mpowerpc64

serial => 1000 s
 1 cpu => 1026 s
 2 cpu =>  557 s, P=0.91423
 4 cpu =>  308 s, P=0.93307
 8 cpu =>  185 s, P=0.93679
16 cpu =>  128 s, P=0.93359
32 cpu =>   97 s, P=0.93467
64 cpu =>   90 s, P=0.92676

mean P=0.92985

--------------------------------------

1000 particles
HUYGENS: POWER6 (architected), altivec supported @ 4.704GHz
xlc_r 9.0 -qarch=auto -qcache=auto -qtune=auto -O5 -qhot -qnostrict -qmaxmem=-1

serial => 546 s
 1 cpu => 580 s
 2 cpu => 295 s, P=0.98276
 3 cpu => 197 s, P=0.99052
 4 cpu => 151 s, P=0.98621
 5 cpu => 122 s, P=0.98707
 6 cpu => 103 s, P=0.98690
 7 cpu =>  89 s, P=0.96749
 8 cpu =>  80 s, P=0.98522
 9 cpu =>  84 s, P=0.96207
10 cpu =>  66 s, P=0.98467
11 cpu =>  61 s, P=0.98431
12 cpu =>  58 s, P=0.98182
13 cpu =>  62 s, P=0.96753
14 cpu =>  58 s, P=0.96923
15 cpu =>  48 s, P=0.98276
16 cpu =>  46 s, P=0.98207
17 cpu =>  43 s, P=0.98373
18 cpu =>  46 s, P=0.97485
19 cpu =>  45 s, P=0.97366
20 cpu =>  39 s, P=0.98185
21 cpu =>  38 s, P=0.98121
22 cpu =>  37 s, P=0.98079
23 cpu =>  37 s, P=0.97876
24 cpu =>  35 s, P=0.98051
28 cpu =>  33 s, P=0.97803
32 cpu =>  33 s, P=0.97353
64 cpu =>  35 s, P=0.95457

mean P=0.97854
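
--------------------------------------

How P can be reproduced from the timings (a sketch, not the original script).
The listed values are consistent with inverting Amdahl's law,
S = 1 / ((1 - P) + P/N), using the 1 cpu run (not the serial build) as the
baseline, which gives P = (1 - 1/S) / (1 - 1/N) with S = t(1 cpu) / t(N cpu).
The function name parallel_fraction and the variable names are hypothetical;
the timings are the HARPERTOWN icc 9.1 numbers listed above.

```python
def parallel_fraction(t1, tn, n):
    """Estimate Amdahl's parallel fraction P.

    t1: wall time on 1 cpu (s), tn: wall time on n cpus (s), n: cpu count.
    Assumes the 1 cpu run of the parallel build is the baseline.
    """
    if n < 2:
        raise ValueError("P is only defined for n >= 2")
    speedup = t1 / tn
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# HARPERTOWN, icc 9.1 -fast: cpu count -> wall time in seconds
t1 = 280.0
runs = {2: 146.0, 3: 99.0, 4: 76.0, 5: 62.0, 6: 53.0, 7: 46.0, 8: 42.0}

fractions = {n: parallel_fraction(t1, tn, n) for n, tn in runs.items()}
mean_p = sum(fractions.values()) / len(fractions)

for n, p in fractions.items():
    print(f"{n} cpu => P={p:.5f}")
print(f"mean P={mean_p:.6f}")
```

For this block the per-run values and their mean agree with the figures in
the listing, which suggests the final P of each block is a plain average of
the per-run estimates.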