Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] perf: improve ManagedAuthenticatedEncryptor.Decrypt flow #59424

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

DeagleGross
Copy link
Contributor

@DeagleGross DeagleGross commented Dec 10, 2024

(Encrypt yet to be improved)

I looked into Unprotect method for ManagedAuthenticatedEncryptor and spotted MemoryStream usage and multiple Buffer.BlockCopy usages. Also I saw that there is some shuffling of byte[] data, which I think can be skipped and performed in such a way, that some allocations are skipped.

code Method Mean Error StdDev Gen0 Allocated
main branch Unprotect 9.706 µs 0.0143 µs 0.0134 µs 0.1068 3.34 KB
changed Unprotect 8.950 µs 0.0256 µs 0.0239 µs 0.4272 2.64 KB

Related to #59287

prfFactory: _kdkPrfFactory,
output: new ArraySegment<byte>(derivedKeysBuffer));

Buffer.BlockCopy(derivedKeysBuffer, 0, decryptionSubkey, 0, decryptionSubkey.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious; a quick test with BDN for a buffer size 1024 shows span as 0.83 relative time; nice; however, you should be able to use .AsSpan(0, decryptionSubkey.Length) and .AsSpan(decryptionSubkey.Length, validationSubkey.Length) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the benchmark I used to check the copy approaches - https://github.com/DeagleGross/Baraholka/blob/main/Benchmarks/Benchmarks/CopyBytesBenchmark.cs

    | Method           | N    | Mean      | Error     | StdDev    | Allocated |
    |----------------- |----- |----------:|----------:|----------:|----------:|
    | Array_Copy       | 10   |  2.402 ns | 0.0181 ns | 0.0142 ns |         - |
    | Buffer_BlockCopy | 10   |  3.475 ns | 0.0232 ns | 0.0217 ns |         - |
    | Span_CopyTo      | 10   |  1.126 ns | 0.0139 ns | 0.0124 ns |         - |
    | Array_Copy       | 100  |  3.755 ns | 0.0313 ns | 0.0277 ns |         - |
    | Buffer_BlockCopy | 100  |  4.188 ns | 0.0419 ns | 0.0392 ns |         - |
    | Span_CopyTo      | 100  |  2.660 ns | 0.0196 ns | 0.0183 ns |         - |
    | Array_Copy       | 1000 | 10.202 ns | 0.1154 ns | 0.0964 ns |         - |
    | Buffer_BlockCopy | 1000 | 12.023 ns | 0.1411 ns | 0.1251 ns |         - |
    | Span_CopyTo      | 1000 |  8.986 ns | 0.1847 ns | 0.1814 ns |         - |

@@ -161,6 +166,8 @@ public byte[] Decrypt(ArraySegment<byte> protectedPayload, ArraySegment<byte> ad
protectedPayload.Validate();
additionalAuthenticatedData.Validate();

var protectedPayloadSpan = protectedPayload.AsSpan();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused?

@@ -111,7 +111,7 @@ internal static TimeSpan MaxServerClockSkew
/// <remarks>
/// Settable for testing.
/// </remarks>
internal TimeSpan DefaultKeyResolverRetryDelay { get; set; } = TimeSpan.FromMilliseconds(200);
internal TimeSpan DefaultKeyResolverRetryDelay { get; set; } = TimeSpan.FromMilliseconds(200.0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: an odd thing to include in a PR; was this to make a warning happy?

return symmetricAlgorithm.DecryptCbc(ciphertext, iv);
#else
var iv = new byte[_symmetricAlgorithmBlockSizeInBytes];
Buffer.BlockCopy(protectedPayload.Array!, ivOffset, iv, 0, iv.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe still use span copy here?

@@ -187,8 +193,6 @@ public byte[] Decrypt(ArraySegment<byte> protectedPayload, ArraySegment<byte> ad
}

ArraySegment<byte> keyModifier = new ArraySegment<byte>(protectedPayload.Array!, keyModifierOffset, ivOffset - keyModifierOffset);
var iv = new byte[_symmetricAlgorithmBlockSizeInBytes];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just want to check: this is the main reduction (on suitable TFMs), yes?

cryptoStream.Write(protectedPayload.Array!, ciphertextOffset, macOffset - ciphertextOffset);
cryptoStream.FlushFinalBlock();
// note: here protectedPayload.Array is taken without an offset (cant use AsSpan() on ArraySegment)
var ciphertext = protectedPayload.Array.AsSpan().Slice(ciphertextOffset, macOffset - ciphertextOffset);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move range to AsSpan() ?

// At this point, outputStream := { plaintext }, and we're done!
return outputStream.ToArray();
}
return symmetricAlgorithm.DecryptCbc(ciphertext, iv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we using CBC mode? I genuinely don't know - mostly after validation / explanation of why this choice when migrating - if this is what CryptoStream would use, maybe link?

Also, note that since this is a private method, we might also wonder whether we can remove the decrypted buffer by using the overload that takes a Span<byte> input, and moving the allocation (maybe as a lease) to the consuming code site; if the input is an oversized array, then note that MemoryStream allows an array to be passed in via the .ctor, which might mean it works fine in the fallback mode too, although that is usually intended for read buffers - just make sure you use the writable: true param and check the length written.

I can't tell you how the size of cipher-bytes and plain-bytes relate; is it 1:1? 🤷 don't know!

but, presumably, removing this result array would have a significant impact on the remaining allocs

if (!CryptoUtil.TimeConstantBuffersAreEqual(correctHash, 0, correctHash.Length, protectedPayload.Array!, macOffset, eofOffset - macOffset))
{
throw Error.CryptCommon_PayloadInvalid(); // integrity check failure
}

// Step 5: Decipher the ciphertext and return it to the caller.
#if NET10_0_OR_GREATER
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFM choice: do we support down-level? if not, just burn the old code; if we do: note that many of these APIs exist from net6 onwards, IIRC.

@@ -206,16 +210,16 @@ public byte[] Decrypt(ArraySegment<byte> protectedPayload, ArraySegment<byte> ad
try
{
_keyDerivationKey.WriteSecretIntoBuffer(new ArraySegment<byte>(decryptedKdk));
ManagedSP800_108_CTR_HMACSHA512.DeriveKeysWithContextHeader(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any clues on the difference between these two?

prfInput[3] = (byte)(i);

// Run the PRF and copy the results to the output buffer
var prfOutput = prf.ComputeHash(prfInput);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this HashAlgorithm (or sublass) ? If so, note that there is a ComputeHash(byte[], int, int) from .NET 9 onwards which would allo prfInput to be a pooled+oversized array, constrained i.e.

// See SP800-108, Sec. 5.1 for the format of the input to the PRF routine.
var prfLength = checked(sizeof(uint) /* [i]_2 */ + label.Length + 1 /* 0x00 */ + contextSharedLength + sizeof(uint) /* [K]_2 */)
#if NET9_0_OR_GREATER // check exact token
var prfInput = ArrayPool<byte>.Shared.Rent(prfLength);
#else
var prfInput = new byte[prfLength];
#endif

// ...

#if NET9_0_OR_GREATER
var prfOutput = prf.ComputeHash(prfInput, 0, prfLength);
ArrayPool<byte>.Shared.Return(prfInput, clearArray: true); // clearing because security-related
#else
var prfOutput = prf.ComputeHash(prfInput);
#endif

Copy link
Member

@mgravell mgravell Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, there's also TryComputeHash that allows a buffer to be passed in; if we can do that and move the alloc to a call-site lease, the TFM fallback path would need to do an extra copy, but: I'm fine with that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-dataprotection Includes: DataProtection
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants