Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: get rid of MemoryStream in KeyRingBasedDataProtector #59322

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

DeagleGross
Copy link
Contributor

While taking a look at allocations in DataProtection layer executed on the linux machine in benchmarks (see #59287), I found it is using a MemoryStream, which i decided to improve (get rid of). Benchmark code

Method Mean Error StdDev Median Gen0 Allocated
MemoryStream 76.86 ns 6.342 ns 17.145 ns 68.12 ns 0.0086 448 B
Manual 44.82 ns 3.043 ns 8.974 ns 48.49 ns 0.0008 64 B

New code is set to be executed only under NET10_OR_GREATER because `Span<>'ish APIs are not supported for netstandard2.0 \ netFx.

Notes:

  1. The code is not on a hot-path. It is executed during the DataProtector construction, therefore it is not showing the perf boost in for example this benchmark code when I was running it with crank:
_dataProtector= new ServiceCollection()
    .AddDataProtection()
    .Services.BuildServiceProvider()
    .GetDataProtector("SamplePurpose");
        
// and somewhere later
_dataProtector.Protect(data);
  1. It is performant in both cases where string[] purposes is having multiple items where string values are of a "big" length, and when it is a single purpose with lets say "SamplePurpose" value. Here are the results when there are 9 purposes:
Method Mean Error StdDev Median Gen0 Gen1 Gen2 Allocated
MemoryStream 304.2 ns 23.93 ns 70.57 ns 267.5 ns 0.0067 - - 512 B
Manual 192.1 ns 10.76 ns 31.74 ns 206.6 ns 0.0021 0.0001 0.0001 -

big thanks to @mgravell for help

Related to #59287

@DeagleGross DeagleGross self-assigned this Dec 4, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-dataprotection Includes: DataProtection label Dec 4, 2024
var keySize = sizeof(Guid);
int totalPurposeLen = 4 + keySize + 4;

var purposeLengthsPool = ArrayPool<int>.Shared.Rent(purposes.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you measure if it was worth renting an array only to prevent an extra call to SecureUtf8Encoding.GetByteCount and Measure7BitEncodedUIntLength

Copy link
Member

@mgravell mgravell Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean just pay the cost twice,? Yeah, should probably measure it.

Side thought: I'm guessing we don't actually expect many purposes. Another thing we could do here, if you're concerned about ArrayPool overhead, is use the stack - something like:

int[]? lease = null;
Span<int> lengths = purposes.Length <= 32 ? stackalloc int[purposes.Length] : (lease = ArrayPool<int>.Shared.Rent(purposes.Length)).AsSpan(0, purposes.Length);
...
if (lease is not null)
{
    ArrayPool<int>.Shared.Return(lease);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark code

Method Mean Error StdDev Gen0 Gen1 Gen2 Allocated
MemoryStream 141.4 ns 2.62 ns 3.31 ns 0.0067 - - 512 B
Manual 101.8 ns 0.98 ns 0.92 ns 0.0023 0.0002 0.0002 320 B
Manual_CalculateByteCountEveryTime 124.7 ns 1.62 ns 1.52 ns 0.0017 - - 128 B
Manual_StackAllocForSmallPurposeArrays 122.4 ns 0.70 ns 0.54 ns 0.0014 - - 128 B

It seems like that last option Marc suggested is the most balanced - we indeed dont expect many purposes (I think), and this solution does not contest for the ArrayPool rent.

What do you think?

Copy link
Member

@mgravell mgravell Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the last two doing the same work as Manual? it seems odd that the perf is worse than Manual - we're doing less work (by not touching the pool at all) - why is it taking longer? question: can you try this with [SkipLocalsInit] on the method? or the entire type for fairness (this is applied globally in aspnetcore, I believe, but I don't know whether it'll apply to your benchmark which looks to be custom).

I also don't know why the memory is less if it is doing the same work - the only non-amortized allocation here is the result byte[]; so I assume this is a different data test than Manual?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you try this with [SkipLocalsInit] on the method? or the entire type for fairness (this is applied globally in aspnetcore, I believe

Don't think it is. #26586

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have talked with Marc: he launched the benchmarks on his machine and there is no crazy allocs which are confusing. I think we should merge with the current state of code (stackalloc for under 32 dynamic length of purposes)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you try this with [SkipLocalsInit] on the method? or the entire type for fairness (this is applied globally in aspnetcore, I believe

Don't think it is. #26586

We should revisit this, IMHO.

@@ -367,7 +367,8 @@ internal static byte[] BuildAadTemplateBytes(string[] purposes)
var keySize = sizeof(Guid);
int totalPurposeLen = 4 + keySize + 4;

var purposeLengthsPool = ArrayPool<int>.Shared.Rent(purposes.Length);
int[]? lease = null;
Span<int> purposeLengthsPool = purposes.Length <= 32 ? stackalloc int[purposes.Length] : (lease = ArrayPool<int>.Shared.Rent(purposes.Length)).AsSpan(0, purposes.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from me, interested in @sebastienros thoughts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing we don't actually expect many purposes

I don't know exactly where this DP is used, in the case of antiforgery we saw a single purpose. But assuming this is the default in any ASP.NET it may be used for way more things. Would be good to get this data for the benchmarks, because we might just screw other scenarios. At least we could add a [Parameter] to the benchmark with other numbers of purposes. (1, 8, 64) for starter? But it might also be ever better for other scenarios where the allocations would be worse and this would make even more sense than just for AF.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there shouldn't really be any allocation here - or a least, it is amortized over the pool usage; we aren't borrowing it for an extended period of time or anything, so: the only time we can really get into problems here is if the pool itself is already totally borked and drowning, in which case we'd be the victim, not the perpetrator

@sebastienros
Copy link
Member

Since the data protector is supposed to be reused, and it is for AF in aspnetcore (https://github.com/dotnet/aspnetcore/blob/74cdf3811fc59b29c07b49a949c5e335da870beb/src/Antiforgery/src/Internal/DefaultAntiforgeryTokenSerializer.cs is registered as a singleton), there shouldn't be any measurable advantage in improving the creation of this "template" field. It shows in the micro benchmark, but in practice this shouldn't in an actual app (unless users create this protector again and again). We should just be saving a single allocation for the first request.

If we are seeing these streams allocated in a web benchmarks then maybe we should find the source, and change it to reuse the data protector.

@mgravell
Copy link
Member

mgravell commented Dec 5, 2024

Yup, already discussed offline. If we're happy that this is functionally correct, suggest we get this merged, and move on to look deeper at the per-data protect/unprotect paths.

@dotnet-policy-service dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Dec 13, 2024
@DeagleGross
Copy link
Contributor Author

/azp run

@dotnet-policy-service dotnet-policy-service bot removed the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Dec 13, 2024
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-dataprotection Includes: DataProtection Perf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants