WIP: fortify: swap to _chk-ish intrinsics #980

Open
wants to merge 1 commit into master
Conversation

gburgessiv (Member):

This is a first pass at converting the FORTIFY magics in the kernel to use _chk intrinsics, rather than having all checks inline. Mostly throwing this up to facilitate discussion about #979.

This has a lot of reviewer-only artifacts (e.g., the NOTEs). If we think that upstream would be willing to accept this or some subset of it, happy to tidy it up, add docs about the optimizations we're relying on here, actually test it with anything but make under gcc, etc etc.

Commit message:

First pass at converting the FORTIFY magics in the kernel to use _chk
intrinsics, rather than having all size checks inline.

Notes are included for which `_chk` functions clang/gcc (since 5.1)
recognize. If a compiler recognizes `__foo_chk`, the compiler is capable
of optimizing it to `foo` if the `__foo_chk` variant would just call
`foo()` anyway.

For clang in particular, these transformations are a win. For GCC, it's
_potentially_ better, but less certain than for clang.
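
(For illustration, a minimal sketch of the _chk pattern the commit message describes. This is not part of the patch; the helper name copy is hypothetical:)

#include <stddef.h>

/* Both clang and gcc (5.1+) recognize this well-known FORTIFY entry
 * point; its definition lives out of line. */
void *__memcpy_chk(void *dst, const void *src, size_t len, size_t dst_size);

void *copy(void *dst, const void *src, size_t len)
{
	/* __builtin_object_size(dst, 0) is (size_t)-1 ("unknown") here, so
	 * a compiler that knows __memcpy_chk's semantics can lower the
	 * call straight to plain memcpy: the check can never fire. */
	return __memcpy_chk(dst, src, len, __builtin_object_size(dst, 0));
}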
nathanchance (Member) left a comment:

Just a couple of style/organization comments; the overall idea seems relatively sane to me, but I am farrrrrr from an expert in this area :)

cc @kees (context: #979)

* Presented is the 'best' middle ground I think we can reach for both
* compilers WRT optimizability.
*/
#ifdef __clang__
nathanchance (Member):

CONFIG_CC_IS_CLANG is used more than __clang__ in the kernel (it actually looks like there is some clean up opportunity there...).
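
(For illustration, a trivial sketch of the preferred spelling; the macro name is hypothetical:)

#ifdef CONFIG_CC_IS_CLANG	/* set by Kconfig when CC is clang */
#define FORTIFY_CC_IS_CLANG 1	/* hypothetical helper, for illustration */
#else
#define FORTIFY_CC_IS_CLANG 0
#endif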

size_t p_size = __builtin_object_size(p, 0);

#ifdef __clang__
nathanchance (Member):

This #ifdef block is an eyesore but I do not really have another solution to offer :/

In general, the kernel prefers IS_ENABLED(CONFIG_...) to #ifdef (https://elixir.bootlin.com/linux/v5.6.2/source/include/linux/kconfig.h#L71). I have no idea if that is possible here.

Another possible solution would be hoisting this into compiler-gcc.h and compiler-clang.h?
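
(For illustration, a sketch of what the IS_ENABLED() form could look like here. The helper name is hypothetical, and this relies on gcc also accepting __builtin___memcpy_chk, which it does:)

static inline void *fortified_memcpy(void *p, const void *q, size_t size)
{
	size_t p_size = __builtin_object_size(p, 0);

	/* IS_ENABLED() keeps both branches visible to the compiler, so
	 * each must at least parse under both gcc and clang. */
	if (IS_ENABLED(CONFIG_CC_IS_CLANG))
		return __builtin___memcpy_chk(p, q, size, p_size);
	return __builtin_memcpy(p, q, size);
}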

Reviewer:

Err, if __strlen_chk() is implemented below, can't this be unconditional?

gburgessiv (Member Author):

I'll try to answer both of your questions together. :)

This CL's intent is to use code that's as optimizable as possible for clang + GCC. It does so by leaning heavily on clang's and GCC's special knowledge of well-known FORTIFY functions spelled __*_chk.

For example, Clang has special knowledge of __strlen_chk, so it's capable of e.g., folding it into a constant or deferring to a non-FORTIFY'ed strlen if doing so is obviously safe, whereas GCC isn't: https://godbolt.org/z/BBjEDH . The hope is to produce ~equivalent code to having the safety checks inline, for cases where the safety checks can be trivially elided (importantly, including __builtin_object_size(x, n) == -1).
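
(Spelling out the __strlen_chk example from that godbolt link as a minimal sketch:)

#include <stddef.h>

/* Out-of-line checking variant; clang knows its semantics. */
size_t __strlen_chk(const char *s, size_t object_size);

size_t five(void)
{
	/* clang can fold this whole call to the constant 5 (and, where the
	 * object size is unknown, defer to a plain strlen call); gcc keeps
	 * the call to __strlen_chk. */
	return __strlen_chk("hello", __builtin_object_size("hello", 0));
}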

The issue this is all trying to work around is that the sort of inherent "recursion" in many inline FORTIFY functions apparently forces LLVM to consider the FORTIFY'ed version to not be a builtin. I haven't deeply understood the implications of trying to fix that, but I know we've had many issues in the past of the optimizer being 'too smart' with FORTIFY'ed inline functions, so I'm not optimistic that it'll be a straightforward fix.

FWIW, there's a way cleaner approach we can take here (it'd also let clang emit nice compile-time diagnostics for trivial bugs. Not as well as GCC, to be fair, but something is better than nothing), but it'd require a fair amount of code in an #ifdef clang block. IDK how kernel people feel about those these days. Happy to whip up a minimal example of what that may look like, if you're interested.

fortify_panic(__func__);
return __builtin_memmove(dest, src, count);
}
EXPORT_SYMBOL(__memmove_chk);
Reviewer:

The original intent was to have all checking inline. Pushing these out to external function calls seems not great. Why can't these all just be inline too?

Reviewer:

This seems to be the least invasive change that appears to work in both GCC and Clang, if my testing is to be believed:

diff --git a/include/linux/string.h b/include/linux/string.h
index 6dfbb2efa815..2388383921bd 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -366,6 +366,16 @@ __FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
        return __builtin_memset(p, c, size);
 }
 
+#ifdef CONFIG_CC_IS_CLANG
+static inline void *__memcpy_chk(void *p, const void *q, size_t size, size_t p_size)
+{
+       return __builtin_memcpy(p, q, size);
+}
+#define __fortified_memcpy(p, q, size, p_size) __builtin___memcpy_chk(p, q, size, p_size)
+#else
+#define __fortified_memcpy(p, q, size, p_size) __builtin_memcpy(p, q, size)
+#endif
+
 __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
 {
        size_t p_size = __builtin_object_size(p, 0);
@@ -378,7 +388,7 @@ __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
        }
        if (p_size < size || q_size < size)
                fortify_panic(__func__);
-       return __builtin_memcpy(p, q, size);
+       return __fortified_memcpy(p, q, size, p_size);
 }
 
 __FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size)

gburgessiv (Member Author):

I don't have a great mental model for how clang codegens __builtin_memcpy. Namely, whether there are any guarantees that that will always go to a memcpy defined outside of the current TU.

In the past, we've had issues with LLVM seeing things like the above as mutual recursion, and optimizing libfuncs like that to basically nop. Maybe clang has grown guarantees in the interim, but I haven't been paying close enough attention, so I can't say for sure offhand. :)

FWIW, if we don't care about &memcpy == &memcpy or __builtin_object_size(x, 1) being equivalent to __builtin_object_size(x, 0) ISTM we can do

#define __FORTIFY_INLINE_BASE inline __attribute__((always_inline))
#ifdef CONFIG_CC_IS_CLANG
#define __FORTIFY_INLINE static __FORTIFY_INLINE_BASE __attribute__((overloadable)) __attribute__((enable_if(1, "")))
#else 
#define __FORTIFY_INLINE extern __FORTIFY_INLINE_BASE __attribute__((gnu_inline))
#endif 

And everything else should Just Work. This has clang emit our FORTIFY'ed memcpy as a C++-style overload for memcpy: https://godbolt.org/z/qqBP8P .
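
(For concreteness, a sketch of a FORTIFY'ed function under those macros. The body is illustrative, assuming the kernel's usual fortify_panic() and __kernel_size_t declarations are in scope:)

/* Under clang, this defines an overload of memcpy that wins overload
 * resolution for direct calls (via enable_if), while &memcpy still
 * resolves to the real out-of-line symbol. */
__FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
{
	size_t p_size = __builtin_object_size(p, 0);

	if (p_size < size)
		fortify_panic(__func__);
	return __builtin_memcpy(p, q, size);
}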

If we do care about either (or both) of the constraints above, we can declare victory by sprinkling __attribute__((pass_object_size(N)))s around on the params of these functions. It's super simple to do on its own, though admittedly noisy:

/* N.B. -- this also requires the overloadable change in the snippet above */

#ifdef CONFIG_CC_IS_CLANG
#define __pass_object_size const __attribute__((pass_object_size(0)))
#else 
#define __pass_object_size
#endif 

__FORTIFY_INLINE void *memcpy(void * __pass_object_size p, const void *__pass_object_size q, __kernel_size_t size)
{
  ...
}

Reviewer:

Unfortunately, I need to keep __builtin_object_size(x, 0) for memcpy() because the kernel somewhat regularly likes to use the address of a struct member to find where to start writing and then proceeds to intentionally copy across several members. :( The more common flaw is array bounds being overrun with direct access (not with memcpy()), and I'm also trying to get that addressed with UBSan's bounds checking (KSPP#25).
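
(A minimal sketch of that idiom, with a hypothetical struct:)

#include <string.h>

struct pkt_hdr {
	unsigned int magic;
	unsigned int seq;
	unsigned int len;
	unsigned int csum;
};

void fill_tail(struct pkt_hdr *h, const void *src)
{
	/* Intentionally copies across seq, len, and csum starting at
	 * &h->seq.  __builtin_object_size(&h->seq, 1) is sizeof(h->seq),
	 * just the member, so a type-1 check would reject this; with
	 * type 0 the size covers the rest of the enclosing object (or is
	 * unknown), so the check passes, as intended here. */
	memcpy(&h->seq, src, 3 * sizeof(unsigned int));
}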

I do want to switch to __builtin_object_size(x, 1) for the str*() family, though, but that's also a separate effort (KSPP#6).

Another issue was the inline requirement, which exists to avoid making function calls. So, while __pass_object_size could absolutely be helpful, I'm not entirely certain it's the right thing for here. I'll need to try it out and see what the resulting kernel image looks like.

My first goal right now is to build testing for the fortify behaviors so any given compiler's output can be validated. I had avoided this in the past because of needing to invent a way to have a build failure be seen as a success by the kernel's build system (i.e. to verify compile-time checks are triggering) in addition to the run-time tests (which I'd add to the kernel's LKDTM subsystem that checks for kernel-faulting style failure conditions).

gburgessiv (Member Author):

> Unfortunately, I need to keep __builtin_object_size(x, 0) for memcpy

Yeah, Bionic and glibc need to do similar things. It's sad, but I still think it's the best trade-off.

> Another issue was the inline requirement, which exists to avoid making function calls.

Totally understandable. To clarify about the following though:

> So, while __pass_object_size could absolutely be helpful, I'm not entirely certain it's the right thing for here. I'll need to try it out and see what the resulting kernel image looks like.

FWIW, __pass_object_size doesn't affect inlinability at all when always_inline is present. It only lifts the call to __builtin_object_size() up to the caller: https://godbolt.org/z/LV5TJT . Because of how clang is architected, this helps improve results with __builtin_object_size(x, 1) in some cases. (And it can be used for cross-TU __builtin_object_size, but I know of 0 users of that in practice ;) ).

It comes with the side-benefit of making it impossible to take the address of the function that pass_object_size is on, since we're inserting "invisible" params with that attribute. So all &memcpys, for example, will grab the address of the not-FORTIFY'ed wrapper.
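
(Illustrating that with a one-line sketch, assuming the overloadable FORTIFY'ed memcpy from above is in scope:)

/* pass_object_size makes the fortified overload un-addressable, so
 * this function pointer binds to the plain memcpy symbol instead. */
void *(*copy_fn)(void *, const void *, __kernel_size_t) = memcpy;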

> I had avoided this in the past because of needing to invent a way to have a build failure be seen as a success by the kernel's build system (i.e. to verify compile-time checks are triggering) in addition to the run-time tests (which I'd add to the kernel's LKDTM subsystem that checks for kernel-faulting style failure conditions).

Fair fair. If it helps spark inspiration, Bionic wraps diagnostic & run-time tests for FORTIFY all up in a single file: https://android.googlesource.com/platform/bionic/+/refs/heads/master/tests/clang_fortify_tests.cpp#600

The diagnostic testing in particular uses clang's internal -verify option, which clang uses for its own diagnostic tests.
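
(For reference, a minimal sketch of how a -verify style test reads. The file name, function, and exact diagnostic text here are illustrative:)

/* Build with: clang -fsyntax-only -Xclang -verify fortify_test.c */

void wants_int(int x);	/* hypothetical */

void f(void)
{
	/* -verify fails unless clang emits a diagnostic containing the
	 * quoted text on this exact line. */
	wants_int("oops");	/* expected-warning {{incompatible pointer to integer conversion}} */
}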
