WIP: fortify: swap to _chk-ish intrinsics #980

Open
wants to merge 1 commit into master
Conversation

gburgessiv (Member):

This is a first pass at converting the FORTIFY magics in the kernel to use _chk intrinsics, rather than having all checks inline. Mostly throwing this up to facilitate discussion about #979.

This has a lot of reviewer-only artifacts (e.g., the NOTEs). If we think that upstream would be willing to accept this or some subset of it, happy to tidy it up, add docs about the optimizations we're relying on here, actually test it with anything but make under gcc, etc etc.

Commit message:

First pass at converting the FORTIFY magics in the kernel to use _chk
intrinsics, rather than having all size checks inline.

Notes are included for which `_chk` functions clang/gcc (since 5.1)
recognize. If a compiler recognizes `__foo_chk`, the compiler is capable
of optimizing it to `foo` if the `__foo_chk` variant would just call
`foo()` anyway.

For clang in particular, these transformations are a win. For GCC, it's
_potentially_ better, but less certain than for clang.
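
(For illustration, a minimal sketch of the _chk pattern the commit message describes. This is not part of the patch; the helper name copy is hypothetical:)

#include <stddef.h>

/* Both clang and gcc (5.1+) recognize this well-known FORTIFY entry
 * point; its definition lives out of line. */
void *__memcpy_chk(void *dst, const void *src, size_t len, size_t dst_size);

void *copy(void *dst, const void *src, size_t len)
{
	/* __builtin_object_size(dst, 0) is (size_t)-1 ("unknown") here, so
	 * a compiler that knows __memcpy_chk's semantics can lower the
	 * call straight to plain memcpy: the check can never fire. */
	return __memcpy_chk(dst, src, len, __builtin_object_size(dst, 0));
}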
nathanchance (Member) left a comment:

Just a couple of style/organization comments; the overall idea seems relatively sane to me, but I am farrrrrr from an expert in this area :)

cc @kees (context: #979)

* Presented is the 'best' middle ground I think we can reach for both
* compilers WRT optimizability.
*/
#ifdef __clang__
nathanchance (Member):

CONFIG_CC_IS_CLANG is used more than __clang__ in the kernel (it actually looks like there is some clean up opportunity there...).
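
(For illustration, a trivial sketch of the preferred spelling; the macro name is hypothetical:)

#ifdef CONFIG_CC_IS_CLANG	/* set by Kconfig when CC is clang */
#define FORTIFY_CC_IS_CLANG 1	/* hypothetical helper, for illustration */
#else
#define FORTIFY_CC_IS_CLANG 0
#endif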

size_t p_size = __builtin_object_size(p, 0);

#ifdef __clang__
nathanchance (Member):

This #ifdef block is an eyesore but I do not really have another solution to offer :/

In general, the kernel prefers IS_ENABLED(CONFIG_...) to #ifdef (https://elixir.bootlin.com/linux/v5.6.2/source/include/linux/kconfig.h#L71). I have no idea if that is possible here.

Another possible solution would be hoisting this into compiler-gcc.h and compiler-clang.h?
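
(For illustration, a sketch of what the IS_ENABLED() form could look like here. The helper name is hypothetical, and this relies on gcc also accepting __builtin___memcpy_chk, which it does:)

static inline void *fortified_memcpy(void *p, const void *q, size_t size)
{
	size_t p_size = __builtin_object_size(p, 0);

	/* IS_ENABLED() keeps both branches visible to the compiler, so
	 * each must at least parse under both gcc and clang. */
	if (IS_ENABLED(CONFIG_CC_IS_CLANG))
		return __builtin___memcpy_chk(p, q, size, p_size);
	return __builtin_memcpy(p, q, size);
}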

Reviewer:

Err, if __strlen_chk() is implemented below, can't this be unconditional?

gburgessiv (Member Author):

I'll try to answer both of your questions together. :)

This CL's intent is to use code that's as optimizable as possible for clang + GCC. It does so by leaning heavily on clang's and GCC's special knowledge of well-known FORTIFY functions spelled __*_chk.

For example, Clang has special knowledge of __strlen_chk, so it's capable of e.g., folding it into a constant or deferring to a non-FORTIFY'ed strlen if doing so is obviously safe, whereas GCC isn't: https://godbolt.org/z/BBjEDH . The hope is to produce ~equivalent code to having the safety checks inline, for cases where the safety checks can be trivially elided (importantly, including __builtin_object_size(x, n) == -1).
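
(Spelling out the __strlen_chk example from that godbolt link as a minimal sketch:)

#include <stddef.h>

/* Out-of-line checking variant; clang knows its semantics. */
size_t __strlen_chk(const char *s, size_t object_size);

size_t five(void)
{
	/* clang can fold this whole call to the constant 5 (and, where the
	 * object size is unknown, defer to a plain strlen call); gcc keeps
	 * the call to __strlen_chk. */
	return __strlen_chk("hello", __builtin_object_size("hello", 0));
}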

The issue this is all trying to work around is that the sort of inherent "recursion" in many inline FORTIFY functions apparently forces LLVM to consider the FORTIFY'ed version to not be a builtin. I haven't deeply understood the implications of trying to fix that, but I know we've had many issues in the past of the optimizer being 'too smart' with FORTIFY'ed inline functions, so I'm not optimistic that it'll be a straightforward fix.

FWIW, there's a way cleaner approach we can take here (it'd also let clang emit nice compile-time diagnostics for trivial bugs. Not as well as GCC, to be fair, but something is better than nothing), but it'd require a fair amount of code in an #ifdef clang block. IDK how kernel people feel about those these days. Happy to whip up a minimal example of what that may look like, if you're interested.

fortify_panic(__func__);
return __builtin_memmove(dest, src, count);
}
EXPORT_SYMBOL(__memmove_chk);
Reviewer:

The original intent was to have all checking inline. Pushing these out to external function calls seems not great. Why can't these all just be inline too?

Reviewer:

This seems to be the least invasive change that appears to work in both GCC and Clang, if my testing is to be believed:

diff --git a/include/linux/string.h b/include/linux/string.h
index 6dfbb2efa815..2388383921bd 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -366,6 +366,16 @@ __FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
        return __builtin_memset(p, c, size);
 }
 
+#ifdef CONFIG_CC_IS_CLANG
+static inline void *__memcpy_chk(void *p, const void *q, size_t size, size_t p_size)
+{
+       return __builtin_memcpy(p, q, size);
+}
+#define __fortified_memcpy(p, q, size, p_size) __builtin___memcpy_chk(p, q, size, p_size)
+#else
+#define __fortified_memcpy(p, q, size, p_size) __builtin_memcpy(p, q, size)
+#endif
+
 __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
 {
        size_t p_size = __builtin_object_size(p, 0);
@@ -378,7 +388,7 @@ __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
        }
        if (p_size < size || q_size < size)
                fortify_panic(__func__);
-       return __builtin_memcpy(p, q, size);
+       return __fortified_memcpy(p, q, size, p_size);
 }
 
 __FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size)

gburgessiv (Member Author):

I don't have a great mental model for how clang codegens __builtin_memcpy. Namely, whether there are any guarantees that that will always go to a memcpy defined outside of the current TU.

In the past, we've had issues with LLVM seeing things like the above as mutual recursion, and optimizing libfuncs like that to basically nop. Maybe clang has grown guarantees in the interim, but I haven't been paying close enough attention, so I can't say for sure offhand. :)

FWIW, if we don't care about &memcpy == &memcpy or __builtin_object_size(x, 1) being equivalent to __builtin_object_size(x, 0) ISTM we can do

#define __FORTIFY_INLINE_BASE inline __attribute__((always_inline))
#ifdef CONFIG_CC_IS_CLANG
#define __FORTIFY_INLINE static __FORTIFY_INLINE_BASE __attribute__((overloadable)) __attribute__((enable_if(1, "")))
#else 
#define __FORTIFY_INLINE extern __FORTIFY_INLINE_BASE __attribute__((gnu_inline))
#endif 

And everything else should Just Work. This has clang emit our FORTIFY'ed memcpy as a C++-style overload for memcpy: https://godbolt.org/z/qqBP8P .
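
(For concreteness, a sketch of a FORTIFY'ed function under those macros. The body is illustrative, assuming the kernel's usual fortify_panic() and __kernel_size_t declarations are in scope:)

/* Under clang, this defines an overload of memcpy that wins overload
 * resolution for direct calls (via enable_if), while &memcpy still
 * resolves to the real out-of-line symbol. */
__FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
{
	size_t p_size = __builtin_object_size(p, 0);

	if (p_size < size)
		fortify_panic(__func__);
	return __builtin_memcpy(p, q, size);
}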

If we do care about either (or both) of the constraints above, we can declare victory by sprinkling __attribute__((pass_object_size(N)))s around on the params of these functions. It's super simple to do on its own, though admittedly noisy:

/* N.B. -- this also requires the overloadable change in the snippet above */

#ifdef CONFIG_CC_IS_CLANG
#define __pass_object_size const __attribute__((pass_object_size(0)))
#else 
#define __pass_object_size
#endif 

__FORTIFY_INLINE void *memcpy(void * __pass_object_size p, const void *__pass_object_size q, __kernel_size_t size)
{
  ...
}

Reviewer:

Unfortunately, I need to keep __builtin_object_size(x, 0) for memcpy() because the kernel somewhat regularly likes to use the address of a struct member to find where to start writing and then proceeds to intentionally copy across several members. :( The more common flaw is array bounds being overrun with direct access (not with memcpy()), and I'm also trying to get that addressed with UBSan's bounds checking (KSPP#25).
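
(A minimal sketch of that idiom, with a hypothetical struct:)

#include <string.h>

struct pkt_hdr {
	unsigned int magic;
	unsigned int seq;
	unsigned int len;
	unsigned int csum;
};

void fill_tail(struct pkt_hdr *h, const void *src)
{
	/* Intentionally copies across seq, len, and csum starting at
	 * &h->seq.  __builtin_object_size(&h->seq, 1) is sizeof(h->seq),
	 * just the member, so a type-1 check would reject this; with
	 * type 0 the size covers the rest of the enclosing object (or is
	 * unknown), so the check passes, as intended here. */
	memcpy(&h->seq, src, 3 * sizeof(unsigned int));
}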

I do want to switch to __builtin_object_size(x, 1) for the str*() family, though, but that's also a separate effort (KSPP#6).

Another issue was the inline requirement, which exists to avoid making function calls. So, while __pass_object_size could absolutely be helpful, I'm not entirely certain it's the right thing for here. I'll need to try it out and see what the resulting kernel image looks like.

My first goal right now is to build testing for the fortify behaviors so any given compiler's output can be validated. I had avoided this in the past because of needing to invent a way to have a build failure be seen as a success by the kernel's build system (i.e. to verify compile-time checks are triggering) in addition to the run-time tests (which I'd add to the kernel's LKDTM subsystem that checks for kernel-faulting style failure conditions).

gburgessiv (Member Author):

> Unfortunately, I need to keep __builtin_object_size(x, 0) for memcpy

Yeah, Bionic and glibc need to do similar things. It's sad, but I still think it's the best trade-off.

> Another issue was the inline requirement, which exists to avoid making function calls.

Totally understandable. To clarify about the following though:

> So, while __pass_object_size could absolutely be helpful, I'm not entirely certain it's the right thing for here. I'll need to try it out and see what the resulting kernel image looks like.

FWIW, __pass_object_size doesn't affect inlinability at all when always_inline is present. It only lifts the call to __builtin_object_size() up to the caller: https://godbolt.org/z/LV5TJT . Because of how clang is architected, this helps improve results with __builtin_object_size(x, 1) in some cases. (And it can be used for cross-TU __builtin_object_size, but I know of 0 users of that in practice ;) ).

It comes with the side-benefit of making it impossible to take the address of the function that pass_object_size is on, since we're inserting "invisible" params with that attribute. So all &memcpys, for example, will grab the address of the not-FORTIFY'ed wrapper.
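
(Illustrating that with a one-line sketch, assuming the overloadable FORTIFY'ed memcpy from above is in scope:)

/* pass_object_size makes the fortified overload un-addressable, so
 * this function pointer binds to the plain memcpy symbol instead. */
void *(*copy_fn)(void *, const void *, __kernel_size_t) = memcpy;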

> I had avoided this in the past because of needing to invent a way to have a build failure be seen as a success by the kernel's build system (i.e. to verify compile-time checks are triggering) in addition to the run-time tests (which I'd add to the kernel's LKDTM subsystem that checks for kernel-faulting style failure conditions).

Fair fair. If it helps spark inspiration, Bionic wraps diagnostic & run-time tests for FORTIFY all up in a single file: https://android.googlesource.com/platform/bionic/+/refs/heads/master/tests/clang_fortify_tests.cpp#600

The diagnostic testing in particular uses clang's internal -verify option, which clang uses for its own diagnostic tests.
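
(For reference, a minimal sketch of how a -verify style test reads. The file name, function, and exact diagnostic text here are illustrative:)

/* Build with: clang -fsyntax-only -Xclang -verify fortify_test.c */

void wants_int(int x);	/* hypothetical */

void f(void)
{
	/* -verify fails unless clang emits a diagnostic containing the
	 * quoted text on this exact line. */
	wants_int("oops");	/* expected-warning {{incompatible pointer to integer conversion}} */
}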
