What's New in BenchmarkDotNet v0.13.0

Background

While fetching the BenchmarkDotNet source code, I noticed that v0.13.0 has been officially released, so let's look at the main features it adds.

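New features in v0.13.0:
  1. Support for .NET 5 (mainly the target framework moniker changing from netcoreapp5.0 to net5.0) and .NET 6 (net6.0)
  2. Support for single-file publishing
  3. Support for Xamarin applications (Android and iOS)
  4. Support for WASM (WebAssembly), which here mainly means Blazor client-side apps
  5. Support for Mono AOT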

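Features added in BenchmarkDotNet v0.13.0
Improvements:
  1. Memory randomization
  2. Method-specific job attributes (I have not worked out what these are yet, so I leave the name as-is)
  3. Parameter columns can be ordered (by setting a priority)
  4. Support for a custom display style for the Ratio column
  5. Improved CoreRun and CoreRT support
  6. Improved hardware counters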

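First, I reuse the code from my earlier post on optimizing code while coding, with .NET 5 as the baseline compared against .NET 6 (Preview 3).
BenchmarkDotNet v0.13.0: heap allocation benchmark comparing .NET 5 and .NET 6
The results show that .NET 6 still brings small improvements for heap allocation and memory pooling.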

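Of the new features, I have only used .NET 6 support so far, so I will not say more about the others. Instead, here are the improvements I need right now.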

Memory randomization:
using BenchmarkDotNet.Attributes;

namespace dotnet_perf
{
    [MemoryDiagnoser]
    [DisassemblyDiagnoser]
    public class IntroMemoryRandomization
    {
        [Params(512 * 4)]
        public int Size;

        private int[] _array;
        private int[] _destination;

        [GlobalSetup]
        public void Setup()
        {
            _array = new int[Size];
            _destination = new int[Size];
        }

        [Benchmark]
        public void Array() => System.Array.Copy(_array, _destination, Size);
    }
}
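# Pass --memoryRandomization true to enable memory randomization.
# Arguments after "--" go to BenchmarkDotNet; the filter is quoted so the shell does not expand it.
dotnet run -c Release -f net5.0 -- --runtimes net50 net60 --filter '*' --memoryRandomization true --join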

BenchmarkDotNet v0.13.0 memory randomization benchmark

These results suggest that, with memory randomization enabled, .NET 6 Preview 3 does not yet perform as well as .NET 5.

The source code that implements MemoryRandomization (BenchmarkDotNet/src/BenchmarkDotNet/Engines/Engine.cs):
public Measurement RunIteration(IterationData data)
{
    // Initialization
    long invokeCount = data.InvokeCount;
    int unrollFactor = data.UnrollFactor;
    long totalOperations = invokeCount * OperationsPerInvoke;
    bool isOverhead = data.IterationMode == IterationMode.Overhead;
    bool randomizeMemory = !isOverhead && MemoryRandomization;  // read the memory randomization setting
    var action = isOverhead ? OverheadAction : WorkloadAction;

    if (!isOverhead)
        IterationSetupAction();

    GcCollect();

    if (EngineEventSource.Log.IsEnabled())
        EngineEventSource.Log.IterationStart(data.IterationMode, data.IterationStage, totalOperations);

    Span<byte> stackMemory = randomizeMemory ? stackalloc byte[random.Next(32)] : Span<byte>.Empty;

    // Measure
    var clock = Clock.Start();
    action(invokeCount / unrollFactor);
    var clockSpan = clock.GetElapsed();

    if (EngineEventSource.Log.IsEnabled())
        EngineEventSource.Log.IterationStop(data.IterationMode, data.IterationStage, totalOperations);

    if (!isOverhead)
        IterationCleanupAction();

    if (randomizeMemory)
        RandomizeManagedHeapMemory();  // the managed-heap part of memory randomization

    GcCollect();

    // Results
    var measurement = new Measurement(0, data.IterationMode, data.IterationStage, data.Index, totalOperations, clockSpan.GetNanoseconds());
    WriteLine(measurement.ToString());

    Consume(stackMemory);

    return measurement;
}

private (GcStats, ThreadingStats) GetExtraStats(IterationData data)
{
    // we enable monitoring after main target run, for this single iteration which is executed at the end
    // so even if we enable AppDomain monitoring in separate process
    // it does not matter, because we have already obtained the results!
    EnableMonitoring();

    IterationSetupAction(); // we run iteration setup first, so even if it allocates, it is not included in the results

    var initialThreadingStats = ThreadingStats.ReadInitial(); // this method might allocate
    var initialGcStats = GcStats.ReadInitial();

    WorkloadAction(data.InvokeCount / data.UnrollFactor);

    var finalGcStats = GcStats.ReadFinal();
    var finalThreadingStats = ThreadingStats.ReadFinal();

    IterationCleanupAction(); // we run iteration cleanup after collecting GC stats

    GcStats gcStats = (finalGcStats - initialGcStats).WithTotalOperations(data.InvokeCount * OperationsPerInvoke);
    ThreadingStats threadingStats = (finalThreadingStats - initialThreadingStats).WithTotalOperations(data.InvokeCount * OperationsPerInvoke);

    return (gcStats, threadingStats);
}

// Empty body; NoInlining stops the JIT from inlining the call, so stackMemory stays in use
[MethodImpl(MethodImplOptions.NoInlining)]
private void Consume(in Span<byte> _) { }

private void RandomizeManagedHeapMemory()
{
    // invoke global cleanup before global setup
    GlobalCleanupAction?.Invoke();

    var gen0object = new byte[random.Next(32)];  // small object, allocated in GC generation 0
    var lohObject = new byte[85 * 1024 + random.Next(32)];   // large object, allocated on the large object heap (LOH)

    // we expect the key allocations to happen in global setup (not ctor)
    // so we call it while keeping the random-size objects alive
    GlobalSetupAction?.Invoke();

    GC.KeepAlive(gen0object);
    GC.KeepAlive(lohObject);

    // we don't enforce GC.Collects here as engine does it later anyway
}
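
The two comments on gen0object and lohObject can be checked with a small standalone program. This is only a sketch of the allocation pattern, not BenchmarkDotNet code: the class and method names are hypothetical, and it assumes the usual .NET behavior that arrays of 85,000 bytes or more go to the large object heap, which GC.GetGeneration reports as the highest generation.

```csharp
using System;

public static class HeapRandomizationSketch
{
    // Hypothetical helper: allocates one small and one large random-size array,
    // mirroring RandomizeManagedHeapMemory, and reports the GC generation of each.
    public static (int SmallGeneration, int LargeGeneration) AllocateAndInspect(Random random)
    {
        var gen0Object = new byte[random.Next(32)];             // 0..31 bytes, small object heap
        var lohObject = new byte[85 * 1024 + random.Next(32)];  // at least 87,040 bytes, above the LOH threshold

        int smallGeneration = GC.GetGeneration(gen0Object);
        int largeGeneration = GC.GetGeneration(lohObject);

        // Keep both objects alive until the generations have been read.
        GC.KeepAlive(gen0Object);
        GC.KeepAlive(lohObject);

        return (smallGeneration, largeGeneration);
    }

    public static void Main()
    {
        var (small, large) = AllocateAndInspect(new Random());
        Console.WriteLine($"small object generation: {small}");
        Console.WriteLine($"large object generation: {large} (GC.MaxGeneration = {GC.MaxGeneration})");
    }
}
```

Because both sizes are random, whatever GlobalSetup allocates while these objects are alive tends to land at a slightly different address on each iteration, which is the point of the feature.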

Parameter column ordering:

using System.Threading;
using BenchmarkDotNet.Attributes;

namespace dotnet_perf
{
    public class IntroParamsPriority
    {
        [Params(100)]
        public int A { get; set; }

        // Priority defaults to 0 and ranges from int.MinValue to int.MaxValue.
        // The lower the Priority, the earlier the column is displayed.
        [Params(10, Priority = -100)]  
        public int B { get; set; }

        [Benchmark]
        public void Benchmark() => Thread.Sleep(A + B + 5);
    }
}

BenchmarkDotNet v0.13.0 supports ordering the displayed parameter columns
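
The ordering rule (lower Priority is displayed first) can be sketched in a few lines. The ParamColumn type and OrderColumns method are hypothetical and only illustrate the rule. BenchmarkDotNet's own implementation may differ, for example in how it breaks ties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParamsPrioritySketch
{
    // Hypothetical stand-in for a parameter definition: a name plus its Priority.
    public readonly struct ParamColumn
    {
        public ParamColumn(string name, int priority)
        {
            Name = name;
            Priority = priority;
        }

        public string Name { get; }
        public int Priority { get; }
    }

    // Lower Priority comes first. OrderBy is a stable sort,
    // so in this sketch ties keep their input order.
    public static IReadOnlyList<string> OrderColumns(IEnumerable<ParamColumn> columns) =>
        columns.OrderBy(c => c.Priority).Select(c => c.Name).ToList();

    public static void Main()
    {
        var columns = new[]
        {
            new ParamColumn("A", 0),     // [Params(100)], default Priority
            new ParamColumn("B", -100),  // [Params(10, Priority = -100)]
        };

        Console.WriteLine(string.Join(", ", OrderColumns(columns))); // prints "B, A"
    }
}
```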

Custom ratio display (as a multiplier):

using System.Threading;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Reports;

namespace dotnet_perf
{
    [Config(typeof(Config))]
    public class IntroRatioStyle
    {
        [Benchmark(Baseline = true)]
        public void Baseline() => Thread.Sleep(1000);

        [Benchmark]
        public void Bar() => Thread.Sleep(150);

        [Benchmark]
        public void Foo() => Thread.Sleep(1150);

        private class Config : ManualConfig
        {
            public Config()
            {
                SummaryStyle = SummaryStyle.Default.WithRatioStyle(RatioStyle.Trend);
            }
        }
    }
}

BenchmarkDotNet v0.13.0 baseline comparison, with ratios displayed as multipliers
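
To make the multiplier display concrete, here is a minimal sketch of how a trend-style label could be derived from two mean times. RatioTrendSketch.Format is a hypothetical helper, not BenchmarkDotNet's formatter. It uses the nominal sleep durations from the benchmark above rather than measured values, so real output will differ slightly.

```csharp
using System;
using System.Globalization;

public static class RatioTrendSketch
{
    // Hypothetical helper: describes a mean time relative to the baseline mean.
    public static string Format(double baselineMean, double mean)
    {
        if (baselineMean <= 0 || mean <= 0)
            throw new ArgumentOutOfRangeException(nameof(mean), "Means must be positive.");

        // In this sketch, equal means are treated as the baseline row.
        if (mean == baselineMean)
            return "baseline";

        // A smaller mean is faster, so report how many times faster or slower it is.
        return mean < baselineMean
            ? (baselineMean / mean).ToString("F2", CultureInfo.InvariantCulture) + "x faster"
            : (mean / baselineMean).ToString("F2", CultureInfo.InvariantCulture) + "x slower";
    }

    public static void Main()
    {
        Console.WriteLine(Format(1000, 1000)); // prints "baseline"
        Console.WriteLine(Format(1000, 150));  // prints "6.67x faster"
        Console.WriteLine(Format(1000, 1150)); // prints "1.15x slower"
    }
}
```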

秋风 2021-05-20