WIP: fortify: swap to _chk-ish intrinsics #980
base: master
Conversation
First pass at converting the FORTIFY magics in the kernel to use _chk intrinsics, rather than having all size checks inline. Notes are included for which `_chk` functions clang/gcc (since 5.1) recognize. If a compiler recognizes `__foo_chk`, the compiler is capable of optimizing it to `foo` if the `__foo_chk` variant would just call `foo()` anyway. For clang in particular, these transformations are a win. For GCC, it's _potentially_ better, but less certain than for clang.
---
```c
 * Presented is the 'best' middle ground I think we can reach for both
 * compilers WRT optimizability.
 */
#ifdef __clang__
```
---
`CONFIG_CC_IS_CLANG` is used more than `__clang__` in the kernel (it actually looks like there is some clean up opportunity there...).
```c
	size_t p_size = __builtin_object_size(p, 0);

#ifdef __clang__
```
---
This `#ifdef` block is an eyesore, but I do not really have another solution to offer :/

In general, the kernel prefers `IS_ENABLED(CONFIG_...)` to `#ifdef` (https://elixir.bootlin.com/linux/v5.6.2/source/include/linux/kconfig.h#L71). I have no idea if that is possible here.

Another possible solution would be hoisting this into `compiler-gcc.h` and `compiler-clang.h`?
---
Err, if `__strlen_chk()` is implemented below, can't this be unconditional?
---
I'll try to answer both of your questions together. :)

This CL's intent is to use code that's as optimizable as possible for clang + GCC. It does so by leaning heavily on clang's and GCC's special knowledge of well-known FORTIFY functions spelled `__*_chk`.

For example, clang has special knowledge of `__strlen_chk`, so it's capable of e.g. folding it into a constant, or deferring to a non-FORTIFY'ed `strlen` if doing so is obviously safe, whereas GCC isn't: https://godbolt.org/z/BBjEDH . The hope is to produce ~equivalent code to having the safety checks inline, for cases where the safety checks can be trivially elided (importantly, including `__builtin_object_size(x, n) == -1`).

The issue this is all trying to work around is that the sort of inherent "recursion" in many `inline` FORTIFY functions apparently forces LLVM to consider the FORTIFY'ed version to not be a builtin. I haven't deeply understood the implications of trying to fix that, but I know we've had many issues in the past of the optimizer being 'too smart' with FORTIFY'ed inline functions, so I'm not optimistic that it'll be a straightforward fix.

FWIW, there's a way cleaner approach we can take here (that'd also give clang the power to emit nice compile-time diagnostics for trivial bugs. Not as well as GCC, to be fair, but something is better than nothing), but it'd require a fair amount of code in an `#ifdef __clang__` block. IDK how kernel people feel about those these days. Happy to whip up a minimal example of what that may look like, if you're interested.
```c
	fortify_panic(__func__);
	return __builtin_memmove(dest, src, count);
}
EXPORT_SYMBOL(__memmove_chk);
```
---
The original intent was to have all checking inline. Pushing these out to external function calls seems not great. Why can't these just all be inline too?
---
This seems to be the least invasive change that appears to work in both GCC and Clang, if my testing is to be believed:

```diff
diff --git a/include/linux/string.h b/include/linux/string.h
index 6dfbb2efa815..2388383921bd 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -366,6 +366,16 @@ __FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
 	return __builtin_memset(p, c, size);
 }
 
+#ifdef CONFIG_CC_IS_CLANG
+static inline void *__memcpy_chk(void *p, const void *q, size_t size, size_t p_size)
+{
+	return __builtin_memcpy(p, q, size);
+}
+#define __fortified_memcpy(p, q, size, p_size) __builtin___memcpy_chk(p, q, size, p_size)
+#else
+#define __fortified_memcpy(p, q, size, p_size) __builtin_memcpy(p, q, size)
+#endif
+
 __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
 {
 	size_t p_size = __builtin_object_size(p, 0);
@@ -378,7 +388,7 @@ __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
 	}
 	if (p_size < size || q_size < size)
 		fortify_panic(__func__);
-	return __builtin_memcpy(p, q, size);
+	return __fortified_memcpy(p, q, size, p_size);
 }
 
 __FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size)
```
---
I don't have a great mental model for how clang codegens `__builtin_memcpy`. Namely, whether there are any guarantees that it will always go to a `memcpy` defined outside of the current TU.

In the past, we've had issues with LLVM seeing things like the above as mutual recursion, and optimizing libfuncs like that to basically `nop`. Maybe clang has grown guarantees in the interim, but I haven't been paying close enough attention, so I can't say for sure offhand. :)

FWIW, if we don't care about `&memcpy == &memcpy`, or about `__builtin_object_size(x, 1)` being equivalent to `__builtin_object_size(x, 0)`, ISTM we can do

```c
#define __FORTIFY_INLINE_BASE inline __attribute__((always_inline))
#ifdef CONFIG_CC_IS_CLANG
#define __FORTIFY_INLINE static __FORTIFY_INLINE_BASE __attribute__((overloadable)) __attribute__((enable_if(1, "")))
#else
#define __FORTIFY_INLINE extern __FORTIFY_INLINE_BASE __attribute__((gnu_inline))
#endif
```

and everything else should Just Work. This has clang emit our FORTIFY'ed `memcpy` as a C++-style overload for `memcpy`: https://godbolt.org/z/qqBP8P .
If we do care about either (or both) of the constraints above, we can declare victory by sprinkling `__attribute__((pass_object_size(N)))`s around on the params of these functions. It's super simple to do on its own, though admittedly noisy:

```c
/* N.B. -- this also requires the overloadable change in the snippet above */
#ifdef CONFIG_CC_IS_CLANG
#define __pass_object_size const __attribute__((pass_object_size(0)))
#else
#define __pass_object_size
#endif

__FORTIFY_INLINE void *memcpy(void * __pass_object_size p, const void * __pass_object_size q, __kernel_size_t size)
{
	...
}
```
---
Unfortunately, I need to keep __builtin_object_size(x, 0) for memcpy() because the kernel somewhat regularly likes to use the address of a struct member to find where to start writing and then proceeds to intentionally copy across several members. :( The more common flaw is array bounds being overrun with direct access (not with memcpy()), and I'm also trying to get that addressed with UBSan's bounds checking (KSPP#25).
I do want to switch to __builtin_object_size(x, 1) for the str*() family, though, but that's also a separate effort (KSPP#6).
Another issue was the inline requirement being wanted to avoid making function calls. So, while __pass_object_size could absolutely be helpful, I'm not entirely certain it's the right thing for here. I'll need to try it out and see what the resulting kernel image looks like.
My first goal right now is to build testing for the fortify behaviors so any given compiler's output can be validated. I had avoided this in the past because of needing to invent a way to have a build failure be seen as a success by the kernel's build system (i.e. to verify compile-time checks are triggering) in addition to the run-time tests (which I'd add to the kernel's LKDTM subsystem that checks for kernel-faulting style failure conditions).
---
> Unfortunately, I need to keep __builtin_object_size(x, 0) for memcpy

Yeah, Bionic and glibc need to do similar things. It's sad, but still I think the best trade-off.

> Another issue was the inline requirement being wanted to avoid making function calls.

Totally understandable. To clarify about the following, though:

> So, while __pass_object_size could absolutely be helpful, I'm not entirely certain it's the right thing for here. I'll need to try it out and see what the resulting kernel image looks like.

FWIW, `__pass_object_size` doesn't affect inlinability at all when `always_inline` is present. It only lifts the call to `__builtin_object_size()` up to the caller: https://godbolt.org/z/LV5TJT . Because of how clang is architected, this helps improve results with `__builtin_object_size(x, 1)` in some cases. (And it can be used for cross-TU `__builtin_object_size`, but I know of 0 users of that in practice ;) ).

It comes with the side-benefit of making it impossible to take the address of the function that `pass_object_size` is on, since we're inserting "invisible" params with that attribute. So all `&memcpy`s, for example, will grab the address of the not-FORTIFY'ed wrapper.

> I had avoided this in the past because of needing to invent a way to have a build failure be seen as a success by the kernel's build system (i.e. to verify compile-time checks are triggering) in addition to the run-time tests (which I'd add to the kernel's LKDTM subsystem that checks for kernel-faulting style failure conditions).

Fair fair. If it helps spark inspiration, Bionic wraps diagnostic & run-time tests for FORTIFY all up in a single file: https://android.googlesource.com/platform/bionic/+/refs/heads/master/tests/clang_fortify_tests.cpp#600

The diagnostic testing in particular uses clang's internal `-verify` option, which clang uses for its own diagnostic tests.
This is a first pass at converting the FORTIFY magics in the kernel to use _chk intrinsics, rather than having all checks inline. Mostly throwing this up to facilitate discussion about #979.

This has a lot of reviewer-only artifacts (e.g., the `NOTE`s). If we think that upstream would be willing to accept this or some subset of it, I'm happy to tidy it up, add docs about the optimizations we're relying on here, actually test it with anything but `make` under gcc, etc. etc.