-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DRAFT] perf: improve ManagedAuthenticatedEncryptor.Decrypt flow #59424
base: main
Are you sure you want to change the base?
[DRAFT] perf: improve ManagedAuthenticatedEncryptor.Decrypt flow #59424
Conversation
prfFactory: _kdkPrfFactory, | ||
output: new ArraySegment<byte>(derivedKeysBuffer)); | ||
|
||
Buffer.BlockCopy(derivedKeysBuffer, 0, decryptionSubkey, 0, decryptionSubkey.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious; a quick test with BDN for a buffer size 1024 shows span as 0.83
relative time; nice; however, you should be able to use .AsSpan(0, decryptionSubkey.Length)
and .AsSpan(decryptionSubkey.Length, validationSubkey.Length)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the benchmark I used to check the copy approaches - https://github.com/DeagleGross/Baraholka/blob/main/Benchmarks/Benchmarks/CopyBytesBenchmark.cs
| Method | N | Mean | Error | StdDev | Allocated |
|----------------- |----- |----------:|----------:|----------:|----------:|
| Array_Copy | 10 | 2.402 ns | 0.0181 ns | 0.0142 ns | - |
| Buffer_BlockCopy | 10 | 3.475 ns | 0.0232 ns | 0.0217 ns | - |
| Span_CopyTo | 10 | 1.126 ns | 0.0139 ns | 0.0124 ns | - |
| Array_Copy | 100 | 3.755 ns | 0.0313 ns | 0.0277 ns | - |
| Buffer_BlockCopy | 100 | 4.188 ns | 0.0419 ns | 0.0392 ns | - |
| Span_CopyTo | 100 | 2.660 ns | 0.0196 ns | 0.0183 ns | - |
| Array_Copy | 1000 | 10.202 ns | 0.1154 ns | 0.0964 ns | - |
| Buffer_BlockCopy | 1000 | 12.023 ns | 0.1411 ns | 0.1251 ns | - |
| Span_CopyTo | 1000 | 8.986 ns | 0.1847 ns | 0.1814 ns | - |
@@ -161,6 +166,8 @@ public byte[] Decrypt(ArraySegment<byte> protectedPayload, ArraySegment<byte> ad | |||
protectedPayload.Validate(); | |||
additionalAuthenticatedData.Validate(); | |||
|
|||
var protectedPayloadSpan = protectedPayload.AsSpan(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused?
@@ -111,7 +111,7 @@ internal static TimeSpan MaxServerClockSkew | |||
/// <remarks> | |||
/// Settable for testing. | |||
/// </remarks> | |||
internal TimeSpan DefaultKeyResolverRetryDelay { get; set; } = TimeSpan.FromMilliseconds(200); | |||
internal TimeSpan DefaultKeyResolverRetryDelay { get; set; } = TimeSpan.FromMilliseconds(200.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: an odd thing to include in a PR; was this to make a warning happy?
return symmetricAlgorithm.DecryptCbc(ciphertext, iv); | ||
#else | ||
var iv = new byte[_symmetricAlgorithmBlockSizeInBytes]; | ||
Buffer.BlockCopy(protectedPayload.Array!, ivOffset, iv, 0, iv.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe still use span copy here?
@@ -187,8 +193,6 @@ public byte[] Decrypt(ArraySegment<byte> protectedPayload, ArraySegment<byte> ad | |||
} | |||
|
|||
ArraySegment<byte> keyModifier = new ArraySegment<byte>(protectedPayload.Array!, keyModifierOffset, ivOffset - keyModifierOffset); | |||
var iv = new byte[_symmetricAlgorithmBlockSizeInBytes]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just want to check: this is the main reduction (on suitable TFMs), yes?
cryptoStream.Write(protectedPayload.Array!, ciphertextOffset, macOffset - ciphertextOffset); | ||
cryptoStream.FlushFinalBlock(); | ||
// note: here protectedPayload.Array is taken without an offset (cant use AsSpan() on ArraySegment) | ||
var ciphertext = protectedPayload.Array.AsSpan().Slice(ciphertextOffset, macOffset - ciphertextOffset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move range to AsSpan()
?
// At this point, outputStream := { plaintext }, and we're done! | ||
return outputStream.ToArray(); | ||
} | ||
return symmetricAlgorithm.DecryptCbc(ciphertext, iv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we using CBC mode? I genuinely don't know - mostly after validation / explanation of why this choice when migrating - if this is what CryptoStream
would use, maybe link?
Also, note that since this is a private method, we might also wonder whether we can remove the decrypted buffer by using the overload that takes a Span<byte>
input, and moving the allocation (maybe as a lease) to the consuming code site; if the input is an oversized array, then note that MemoryStream
allows an array to be passed in via the .ctor, which might mean it works fine in the fallback mode too, although that is usually intended for read buffers - just make sure you use the writable: true
param and check the length written.
I can't tell you how the size of cipher-bytes and plain-bytes relate; is it 1:1? 🤷 don't know!
but, presumably, removing this result array would have a significant impact on the remaining allocs
if (!CryptoUtil.TimeConstantBuffersAreEqual(correctHash, 0, correctHash.Length, protectedPayload.Array!, macOffset, eofOffset - macOffset)) | ||
{ | ||
throw Error.CryptCommon_PayloadInvalid(); // integrity check failure | ||
} | ||
|
||
// Step 5: Decipher the ciphertext and return it to the caller. | ||
#if NET10_0_OR_GREATER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFM choice: do we support down-level? if not, just burn the old code; if we do: note that many of these APIs exist from net6 onwards, IIRC.
@@ -206,16 +210,16 @@ public byte[] Decrypt(ArraySegment<byte> protectedPayload, ArraySegment<byte> ad | |||
try | |||
{ | |||
_keyDerivationKey.WriteSecretIntoBuffer(new ArraySegment<byte>(decryptedKdk)); | |||
ManagedSP800_108_CTR_HMACSHA512.DeriveKeysWithContextHeader( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any clues on the difference between these two?
prfInput[3] = (byte)(i); | ||
|
||
// Run the PRF and copy the results to the output buffer | ||
var prfOutput = prf.ComputeHash(prfInput); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this HashAlgorithm
(or sublass) ? If so, note that there is a ComputeHash(byte[], int, int)
from .NET 9 onwards which would allo prfInput
to be a pooled+oversized array, constrained i.e.
// See SP800-108, Sec. 5.1 for the format of the input to the PRF routine.
var prfLength = checked(sizeof(uint) /* [i]_2 */ + label.Length + 1 /* 0x00 */ + contextSharedLength + sizeof(uint) /* [K]_2 */)
#if NET9_0_OR_GREATER // check exact token
var prfInput = ArrayPool<byte>.Shared.Rent(prfLength);
#else
var prfInput = new byte[prfLength];
#endif
// ...
#if NET9_0_OR_GREATER
var prfOutput = prf.ComputeHash(prfInput, 0, prfLength);
ArrayPool<byte>.Shared.Return(prfInput, clearArray: true); // clearing because security-related
#else
var prfOutput = prf.ComputeHash(prfInput);
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh, there's also TryComputeHash
that allows a buffer to be passed in; if we can do that and move the alloc to a call-site lease, the TFM fallback path would need to do an extra copy, but: I'm fine with that
(Encrypt yet to be improved)
I looked into
Unprotect
method forManagedAuthenticatedEncryptor
and spottedMemoryStream
usage and multipleBuffer.BlockCopy
usages. Also I saw that there is some shuffling ofbyte[]
data, which I think can be skipped and performed in such a way, that some allocations are skipped.Related to #59287