In terms of time versus throughput, the best trade-off for me was 4 WUs at once with HT disabled: 1418.13 iter/s. Running 8 WUs at once would be more efficient, but each one would take longer and I'm not sure about the timing. It's very close to the end of the tournament, and I'd rather hand in 4 WUs for certain and then 8 almost certainly than have all 8 only almost certain. The highest throughput is of course 1 WU per core, but that can't be used here. In any case, if you were crunching this project over the longer term, no parallelization pays off, at least not on AMD. HT doesn't help in this case either: the application needs the AVX unit, and there is only one per two threads.

Prime95 64-bit version 29.2, RdtscTiming=1
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 1 worker): 1.36 ms. Throughput: 736.88 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 2 workers): 1.78, 1.85 ms. Throughput: 1103.42 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 4 workers): 2.87, 2.76, 2.87, 2.78 ms. Throughput: 1418.13 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 6 workers): 4.37, 4.94, 4.39, 4.64, 3.93, 3.65 ms. Throughput: 1403.04 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 8 workers): 5.22, 5.66, 5.20, 5.29, 5.62, 5.59, 5.33, 5.23 ms. Throughput: 1485.24 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 12 workers): 10.27, 10.59, 10.65, 10.82, 7.04, 7.65, 7.68, 7.16, 7.09, 7.52, 7.35, 6.84 ms. Throughput: 1476.96 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 16 workers): 10.45, 10.21, 11.07, 10.99, 10.08, 10.18, 10.17, 10.07, 10.21, 10.62, 10.25, 10.27, 10.13, 10.45, 10.05, 10.08 ms. Throughput: 1550.08 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 20 workers): 20.19, 19.45, 19.72, 19.50, 19.99, 21.15, 20.85, 21.78, 11.04, 11.70, 12.59, 12.99, 10.33, 10.65, 10.96, 10.44, 11.77, 11.05, 12.71, 12.71 ms. Throughput: 1437.00 iter/sec.
[Fri Sep 21 10:19:54 2018]
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 24 workers): 20.78, 19.53, 19.43, 19.55, 19.90, 19.80, 22.77, 19.95, 19.53, 19.65, 19.46, 19.51, 19.53, 19.53, 19.52, 19.52, 10.19, 10.28, 10.56, 10.45, 10.32, 10.33, 10.33, 10.55 ms. Throughput: 1577.40 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 28 workers): 19.76, 19.71, 20.25, 19.64, 22.05, 20.75, 20.82, 20.07, 19.75, 19.81, 19.74, 19.67, 19.83, 19.75, 19.74, 19.74, 20.24, 20.09, 20.10, 20.10, 20.28, 20.08, 20.13, 20.53, 10.62, 10.82, 10.46, 10.53 ms. Throughput: 1571.42 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 32 workers): 19.84, 20.61, 20.18, 21.15, 19.99, 20.30, 20.42, 20.43, 19.70, 19.68, 19.68, 19.68, 19.71, 19.65, 19.66, 19.65, 19.83, 19.68, 19.76, 19.73, 21.62, 20.82, 19.97, 19.64, 19.72, 20.27, 19.68, 19.78, 19.64, 19.63, 19.81, 19.63 ms. Throughput: 1602.07 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores hyperthreaded, 1 worker): 2.10 ms. Throughput: 476.54 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores hyperthreaded, 2 workers): 1.93, 1.85 ms. Throughput: 1057.37 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores hyperthreaded, 4 workers): 3.01, 2.87, 2.86, 2.87 ms. Throughput: 1379.00 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores hyperthreaded, 6 workers): 4.72, 4.90, 4.53, 4.62, 3.89, 3.79 ms. Throughput: 1374.08 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores hyperthreaded, 8 workers): 5.44, 5.84, 5.40, 5.41, 5.42, 5.58, 5.43, 5.38 ms. Throughput: 1459.00 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores hyperthreaded, 12 workers): 10.63, 10.62, 10.80, 10.94, 7.23, 7.57, 7.75, 7.45, 7.82, 8.51, 7.84, 8.14 ms. Throughput: 1401.80 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores hyperthreaded, 16 workers): 10.85, 10.68, 10.89, 10.96, 10.75, 11.11, 10.68, 10.66, 10.81, 10.89, 10.69, 10.65, 10.81, 10.66, 10.83, 10.90 ms. Throughput: 1481.52 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores hyperthreaded, 20 workers): 21.13, 21.15, 21.20, 21.25, 21.12, 21.26, 22.06, 21.14, 11.00, 10.68, 10.68, 10.68, 10.80, 10.87, 10.76, 10.75, 11.00, 11.12, 10.65, 10.63 ms. Throughput: 1487.06 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores hyperthreaded, 24 workers): 21.25, 21.26, 21.26, 21.30, 21.18, 21.38, 21.57, 21.21, 21.31, 21.30, 21.29, 21.29, 21.23, 21.25, 21.28, 21.29, 10.97, 11.22, 10.70, 10.77, 10.77, 10.84, 10.75, 10.73 ms. Throughput: 1489.42 iter/sec.
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores hyperthreaded, 28 workers): 21.17, 21.20, 21.19, 21.15, 21.16, 21.44, 21.37, 21.17, 21.17, 21.14, 21.13, 21.15, 21.20, 21.24, 21.19, 21.26, 21.19, 21.25, 21.22, 21.23, 21.21, 21.25, 21.15, 21.12, 11.29, 10.69, 10.72, 10.71 ms. Throughput: 1500.58 iter/sec.
[Fri Sep 21 10:25:34 2018]
FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores hyperthreaded, 32 workers): 21.20, 21.59, 21.25, 21.55, 21.20, 21.31, 21.22, 21.26, 21.18, 21.18, 21.20, 21.18, 21.26, 21.22, 21.13, 21.19, 21.22, 21.17, 21.27, 21.20, 21.18, 21.17, 21.26, 21.13, 21.32, 21.16, 21.31, 21.25, 21.16, 21.17, 21.17, 21.13 ms. Throughput: 1507.28 iter/sec.
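For anyone who wants to compare the configurations without reading the log by eye, here is a minimal sketch that parses lines like the ones above. It assumes the reported throughput is roughly the sum of 1000 / ms over all workers, which matches the numbers in this log to within a fraction of a percent but is my reading of the output, not something Prime95 documents here. The names `parse_line` and `estimated_throughput` are made up for this example.

```python
import re

# Pattern for one benchmark line; written against the log format shown above.
LINE_RE = re.compile(
    r"\((\d+) cores( hyperthreaded)?, (\d+) workers?\): "
    r"([\d., ]+) ms\. Throughput: ([\d.]+) iter/sec\."
)

def parse_line(line):
    """Return a dict with the fields of one benchmark line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "cores": int(m.group(1)),
        "hyperthreaded": m.group(2) is not None,
        "workers": int(m.group(3)),
        "times_ms": [float(t) for t in m.group(4).split(",")],
        "reported": float(m.group(5)),
    }

def estimated_throughput(times_ms):
    """Assumed model: every worker contributes 1000 / (ms per iteration) iter/sec."""
    return sum(1000.0 / t for t in times_ms)

# Three lines copied verbatim from the log above.
SAMPLE = [
    "FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 1 worker): 1.36 ms. Throughput: 736.88 iter/sec.",
    "FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 2 workers): 1.78, 1.85 ms. Throughput: 1103.42 iter/sec.",
    "FFTlen=1728K all-complex, Type=3, Arch=4, Pass1=384, Pass2=4608, clm=4 (32 cores, 4 workers): 2.87, 2.76, 2.87, 2.78 ms. Throughput: 1418.13 iter/sec.",
]

if __name__ == "__main__":
    for line in SAMPLE:
        rec = parse_line(line)
        est = estimated_throughput(rec["times_ms"])
        print(rec["workers"], "workers:", rec["reported"], "reported,", round(est, 2), "estimated")
```

Feeding the whole log through `parse_line` and sorting by `reported` makes it easy to see that the 32-worker run without HT comes out on top, while the 4-worker run gives much shorter per-WU times.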
As for the low performance, 3000:1600, keep in mind that it's 32 cores vs. 44 cores with a higher clock and a much more powerful AVX unit, so it doesn't strike me as all that surprising. More importantly, those 32 cores draw 255 W in total under load, so the picture looks a bit brighter once you also factor in the watts-per-point ratio.
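To put a rough number on the efficiency argument, here is a small sketch that divides throughput figures from the log by the 255 W total. It assumes the 255 W draw applies equally to every configuration, which the post does not actually state, and it uses iter/sec from the benchmark rather than project points. The function name `iter_per_sec_per_watt` is hypothetical.

```python
def iter_per_sec_per_watt(throughput_iter_s, power_watts):
    """Throughput per watt; power_watts must be positive."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return throughput_iter_s / power_watts

# 255 W is the total load figure quoted in the post (assumed constant across runs).
POWER_W = 255.0

# Throughput values taken from the non-HT runs in the log above.
RUNS = {
    "4 workers": 1418.13,
    "32 workers": 1602.07,
}

if __name__ == "__main__":
    for name, throughput in RUNS.items():
        print(name, round(iter_per_sec_per_watt(throughput, POWER_W), 2), "iter/s per W")
```

The same function works for a points-per-watt comparison with the 44-core machine once its power draw is known.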