Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stage 4 & stage 5 #37

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

stage 4 & stage 5 #37

wants to merge 4 commits into from

Conversation

nickdesaulniers
Copy link
Member

No description provided.

This is a build of LLVM instrumented with PGO. The two relevant cmake
defines for this are:
- LLVM_BUILD_INSTRUMENTED
- LLVM_VP_COUNTERS_PER_SITE

I was hoping to be able to use AutoFDO instead, but I couldn't get the
AutoFDO perf data post-processing tool (create_llvm_prof) to work.

Some interesting env var's to use to test the resulting binary are
- LLVM_PROFILE_FILE
- LLVM_PROFILE_VERBOSE

Otherwise this instrumented version of clang will try to write profile
files to a path that exists in the container, but not outside of it:

  LLVM Profile Error: Failed to write file
  "/llvm-project/llvm/build/profiles/*.profraw": No such file or directory
A few issues encountered, will file TODO issues.
- Enabling LTO causes the build of llvm-tblgen to segfault. Not sure if
  this is a ulimit issue related to pids?
- The resulting clang is only a few seconds faster, with memory fns and
  str fns from musl dominating the top twenty functions of a profile.
@nickdesaulniers nickdesaulniers changed the title Stage4 stage 4 & stage 5 Jun 17, 2022
Comment on lines +285 to +300
#RUN ninja -C ${LLVM_BUILD_DIR} install-clang install-lld
#RUN ninja -C ${LLVM_BUILD_DIR} install-clang-resource-headers
#RUN ninja -C ${LLVM_BUILD_DIR} \
#install-llvm-ar \
#install-llvm-nm \
#install-llvm-objcopy \
#install-llvm-objdump \
#install-llvm-ranlib \
#install-llvm-readelf \
#install-llvm-strip

#RUN apk del cmake ninja python3

# Final test
#RUN llvm-readelf -p .comment $(which clang) | grep -e clang -e LLD
#RUN llvm-readelf -p .comment $(which clang) | grep -v GCC
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably want to keep these, probably also want to think about exporting the whole suit of tools...

@nickdesaulniers nickdesaulniers marked this pull request as draft June 17, 2022 23:36
@nathanchance
Copy link
Member

I know this is WIP but does this need to be a whole new epoch? We should be able to just add stage 4 and stage 5 into the epoch 3 Dockerfile at this point, right?

I'll review this more on Tuesday.

RUN clang ${SYSROOT} hello.c && ./a.out && \
clang ${SYSROOT} hello.c -static && ./a.out && \
clang++ ${SYSROOT} hello.cpp && ./a.out && \
clang++ ${SYSROOT} hello.cpp -static -lc++abi && ./a.out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the SYSROOT ARG?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not "install" the sysroot we built to the standard /usr/local/. I think I need to play around with --prefix when configuring.

I somewhat think it's interesting to keep the sysroot separate, but the kernel has some hermiticity issues with host utilities that make building it more difficult.

Specifically, I'm using ARG to shorten repeated use of long command line options; not for someone to override at docker build time.

Should I install the sysroot to /usr/local/ and drop the explicit --sysroot= flags?

### Zlib
COPY --from=source zlib-1.2.12.tar.gz .
RUN tar xf zlib-1.2.12.tar.gz
ARG ZLIB_DIR=zlib-1.2.12/build
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the value of this. Do you intend for it to be controllable by the invoker?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, just making it shorter to refer to multiple times.

RUN apk del make

### LLVM
COPY --from=source llvm-project-14.0.1.src.tar.xz .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the value of the COPY instead of just downloading it? If you are trying to save space by avoiding the layer, it seems like building and copying artifacts is what you want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not trying to optimize space of the docker image at all; rather I'm trying to optimize development time. It's faster to copy the tarball between local images than to keep redownloading it.

@nickdesaulniers
Copy link
Member Author

I know this is WIP but does this need to be a whole new epoch? We should be able to just add stage 4 and stage 5 into the epoch 3 Dockerfile at this point, right?

Hmm...I guess we could. Let me think about that question more and play with making such a change. I don't immediately see any reason why not to do so.

Comment on lines +12 to +13
LLVM_PROFILE_FILE=foo.profraw
LLVM_PROFILE_VERBOSE=1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As shellcheck points out, these should be exported if we are going to keep them around (more on that below).

RUN find ${LLVM_BUILD_DIR} -name \*.profraw | xargs rm

### Start doing kernel builds
# TODO: objtool needs libelf.h and gelf.h? elfutils-dev is the alpine package.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +228 to +267
### Linux
COPY --from=source linux-5.18-rc6.tar.gz .
RUN tar xf linux-5.18-rc6.tar.gz
RUN apk add make musl-dev rsync
RUN make -C linux-5.18-rc6 INSTALL_HDR_PATH=/sysroot/usr LLVM=1 -j$(nproc) headers_install
RUN apk del rsync musl-dev make

### Musl
COPY --from=source musl-1.2.3.tar.gz .
RUN tar xf musl-1.2.3.tar.gz
ARG MUSL_DIR=musl-1.2.3/build
RUN mkdir -p ${MUSL_DIR}
RUN cd ${MUSL_DIR} && \
CC=clang AR=llvm-ar RANLIB=llvm-ranlib \
../configure --prefix=/usr --syslibdir=/usr/lib
RUN apk add make
RUN make -C ${MUSL_DIR} -j$(nproc)
RUN make -C ${MUSL_DIR} -j$(nproc) DESTDIR=/sysroot install-headers
RUN make -C ${MUSL_DIR} -j$(nproc) DESTDIR=/sysroot install-libs
RUN apk del make

# Pause for a quick sanity check
COPY hello.c hello.cpp /
ARG SYSROOT=--sysroot=/sysroot
RUN clang ${SYSROOT} hello.c && ./a.out && \
clang ${SYSROOT} hello.c -static && ./a.out && \
clang++ ${SYSROOT} hello.cpp && ./a.out && \
clang++ ${SYSROOT} hello.cpp -static -lc++abi && ./a.out

### Zlib
COPY --from=source zlib-1.2.12.tar.gz .
RUN tar xf zlib-1.2.12.tar.gz
ARG ZLIB_DIR=zlib-1.2.12/build
RUN mkdir -p ${ZLIB_DIR}
RUN cd ${ZLIB_DIR} && \
CC="clang ${SYSROOT}" AR=llvm-ar ../configure --prefix=/sysroot/usr
RUN apk add make
RUN make -C ${ZLIB_DIR} -j$(nproc)
RUN make -C ${ZLIB_DIR} -j$(nproc) install
RUN apk del make
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we build the sysroot twice? Wouldn't it be better to build it once in its own image then copy it as we need it?

Comment on lines +104 to +106
llvm-profdata merge --output=${arch}.profdata.prof \
$(find ${LLVM_BUILD_DIR} -name \*.profraw) && \
find ${LLVM_BUILD_DIR} -name \*.profraw | xargs rm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning behind merging the profiles after each kernel build then merging them all at the end? Wouldn't be be simpler to merge them all at the end?

Additionally, if these remain around, I think you can avoid the find call, as the .profraw files should be in ${LLVM_BUILD_DIR}/profiles.

rm ${LLVM_BUILD_DIR}/profiles/*.profraw

ln -s /usr/lib/libcrypto.so /sysroot/usr/lib/.
# bpf/resolve_btfids doesn't respect HOSTCFLAGS
RUN cp -r /sysroot/usr/include/ /usr/local/
# TODO: something in the s390 build isn't respecting HOSTCC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't set HOSTCC in the make command below, is that the problem?

$(find ${LLVM_BUILD_DIR} -name \*.profraw) && \
find ${LLVM_BUILD_DIR} -name \*.profraw | xargs rm

# Mips needs bash if building all targets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation/process/changes.rst was recently updated to include bash as one of the required tools for building the kernel so this comment seems superfluous.

$(find ${LLVM_BUILD_DIR} -name \*.profraw) && \
find ${LLVM_BUILD_DIR} -name \*.profraw | xargs rm

# riscv requires perl to build
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several perl scripts in the kernel and it is listed as required in Documentation/process/changes.rst so I think this can be dropped like the bash comment above.

#ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- make CC=clang -j72 defconfig
RUN ARCH=${arch} HOSTCFLAGS=${SYSROOT} HOSTLDFLAGS=${SYSROOT} \
CROSS_COMPILE=s390x-alpine-linux-musl- \
CC=clang LLVM_IAS=0 \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we mentioned offline, if we upgrade to a 5.19 prerelease, we can drop the LLVM_IAS=0 here.

RUN apk add make flex bison
ARG arch=arm
RUN ARCH=${arch} HOSTCFLAGS=${SYSROOT} HOSTLDFLAGS=${SYSROOT} \
make -C linux-5.18-rc6 LLVM=1 -s -j$(nproc) allnoconfig all && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should benchmark the performance gains between allnoconfig and defconfig. I worry that allnoconfig might not provide as much coverage.

nickdesaulniers added a commit that referenced this pull request Jul 1, 2022
The following symbols from musl's malloc implementation were appearing
in a perf profile of a kernel build (from stage 5, which isn't landed
yet, see #37).

Samples: 13M of event 'cycles:ppu', Event count (approx.): 9699067643159
  Overhead  Shared Object              Symbol
+    1.72%  clang                      [.] __libc_malloc_impl
     0.80%  clang                      [.] __libc_free
+    0.67%  clang                      [.] alloc_slot
+    0.65%  clang                      [.] get_meta

Replace musl's malloc with jemalloc.

Fixes #36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants