-
Unfortunately, the problem is that we generate a lot of artifacts in 'airflow' when we mix a local virtualenv with the dockerized one, and when we mount them into a container it can wreak havoc. One example is the .egg-info files generated automatically when you run 'pip install .'. This means that if you have a local virtualenv based on Python 3.6 and then enter Breeze with Python 3.7, you pick up egg-info generated by a different Python version and will sometimes get really, really weird errors. We already try to mitigate some other, similar kinds of errors while entering Breeze (for example by deleting all generated .pyc files and __pycache__ files), but *.egg-info in particular is something you should not really delete. The slowness problem, of course, only occurs on Mac, because the Mac filesystem for Docker is super slow. On Linux you see no difference whatsoever. I used to develop Airflow on Mac and moved to Linux quite some time ago, but I know the pain of the slow filesystem well enough. I will try to find the reason for such a speedup in this case, and maybe I can find another way (I will dust off my old MacBook and see what we can do).
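As an illustration of the bytecode cleanup mentioned above, here is a minimal Python sketch. It is not Breeze's actual implementation; the function name `clean_bytecode` is hypothetical. It removes `__pycache__` directories and stray `.pyc` files under a directory and deliberately leaves any `*.egg-info` content alone.

```python
import shutil
from pathlib import Path


def clean_bytecode(root: Path) -> int:
    """Delete __pycache__ directories and *.pyc files under root.

    Returns the number of entries removed. Hypothetical helper, not Breeze code.
    """
    removed = 0
    # Materialize each listing first so deletions do not disturb the traversal.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir() and not cache_dir.is_symlink():
            shutil.rmtree(cache_dir)
            removed += 1
    # Any .pyc files that lived outside a __pycache__ directory.
    for pyc in list(root.rglob("*.pyc")):
        pyc.unlink()
        removed += 1
    return removed
```

Calling `clean_bytecode(Path("."))` from the repository root before switching between a local virtualenv and the container would be one way to use it.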
-
Thanks for the thoughts, Jarek.
-
I want to highlight one thing: the speedup from reducing to a single bind mount is multiplied by the number of docker hooks you have to run. Commonly that will be pylint + mypy + flake8, and this can add up to nearly a minute.
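A rough back-of-the-envelope version of that multiplication, using the mypy timings reported in the original post (23-29 seconds before, 13 seconds after). It assumes, without measurement, that pylint and flake8 would save about as much per run as mypy did.

```python
def total_saving(before_s: float, after_s: float, hooks: int) -> float:
    """Seconds saved per pre-commit run if every hook saves the same amount."""
    return (before_s - after_s) * hooks


# The three docker-based hooks named in the comment.
HOOKS = ["pylint", "mypy", "flake8"]

# Assumption: each hook behaves like the measured mypy run.
low = total_saving(23, 13, len(HOOKS))   # 30 seconds
high = total_saving(29, 13, len(HOOKS))  # 48 seconds
```

Under that assumption the saving lands between 30 and 48 seconds per run.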
-
And just to comment on that: speeding this up is really important. I am fully aware that a lot of people use Macs, and it would be great to make it faster. I will look into it today. Thanks for checking it @dstandish!
-
Question, @potiuk: do pylint, mypy and flake8 really need to run in docker? Can they just use the same config?
-
FYI, I am in the middle of adding the 'mount all sources' option that would be enabled by default for Mac users (and trying to test it). It is needed for another ticket (providers versioning), so I want to make sure it works there.
Answering your question: technically not, but it is needed in order to have a consistent environment (and the same results as on CI). You could (if you want) run those checks yourself in your virtualenv, and they should work, but if your environments start to diverge (and people don't update their environments regularly) then you will start having:
Unfortunately, all three tools will sometimes produce different errors if the three prerequisites above are not exactly the same. That alone is enough (and it happened almost daily when the tests were not yet run in docker) for people to see different errors, either more or fewer than on CI. The end result was that people were not able to fix those errors and started opening issues saying "I have no idea how to fix it as I cannot reproduce it", and rightfully so.
So having a docker image (the exact same one that is used in CI tests) is the only way I see that we can provide consistent Pylint/MyPy/Flake8 output. Right now I can say "please rebase to latest master and rebuild breeze" if somebody has a different result. But if you are adventurous, you can simply do it yourself:
a) set up a venv with the "[devel_ci]" extra
The problem with keeping an airflow venv in sync is that it takes a lot of time to set up (460+ dependencies in total). While you could, theoretically, set it up in .pre-commit as the execution environment for pylint/mypy/flake8, every time setup.py changes you would have to wait at least 10 minutes for it to rebuild. The docker image provides a much better way of synchronizing this: it is highly optimized and has a built-in incremental upgrade in case any of the dependencies change.
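If you do go the local-venv route, one kind of drift that is cheap to check is the installed version of each linter. The sketch below is only an illustration: the helper names are hypothetical, the reference versions would have to come from whatever the CI image actually installs, and tool versions are just one possible source of divergence. It uses `importlib.metadata`, which requires Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence, Tuple

LINTERS = ("pylint", "mypy", "flake8")


def installed_versions(packages: Sequence[str] = LINTERS) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def diverging(
    local: Dict[str, Optional[str]], reference: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {package: (local_version, reference_version)} for every mismatch."""
    return {
        name: (local.get(name), ref)
        for name, ref in reference.items()
        if local.get(name) != ref
    }
```

An empty result from `diverging(installed_versions(), reference)` only says the tool versions match; it does not prove the rest of the environment is identical to CI.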
-
TLDR
Should we just mount the whole airflow repo into pre-commit containers, rather than mounting many subdirectories?
On Mac, it seems this may provide a 50% speedup when running the mypy check on a single changed file.
My odyssey
I was curious why mypy / flake8 / pylint were so slow.
In between jobs, I worked on an older Dell laptop running Ubuntu, and pre-commit seemed faster than I was used to on the much more powerful MacBook I had been using for work.
This suggested that file sharing, i.e. volumes, could be the issue.
I thought at first maybe it was _script_init.sh or build_images::prepare_ci_build or build_images::rebuild_ci_image_if_needed, but all of these are fast. It was the actual mypy command.
Strangely, I had observed while monitoring docker stats that the mypy container only ran for a small percentage of the total time. What could be taking up the rest of the time, I wondered.
I took a look at the exact docker command generated in mypy.
I found these mounts:
So, I tried removing them, and just running this:
In other words, I tried just mounting the whole airflow directory instead of many subdirectories.
It turns out, this reliably reduced the time to less than half (13 seconds down from 23-29 seconds).
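To make the two mounting strategies concrete, here is a small Python sketch that only builds the `docker run` argument list for each case. It does not reproduce the actual command or mount list from the pre-commit scripts: the image name, the host and container paths, and the subdirectory names are all placeholders chosen for illustration.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def bind_mount_args(
    host_root: str, container_root: str, subpaths: Sequence[str] = ()
) -> List[str]:
    """Build `-v host:container` arguments for `docker run`.

    With no subpaths the whole host_root is mounted once;
    otherwise every subpath gets its own bind mount.
    """
    if not subpaths:
        return ["-v", f"{host_root}:{container_root}"]
    args: List[str] = []
    for sub in subpaths:
        host = PurePosixPath(host_root) / sub
        container = PurePosixPath(container_root) / sub
        args += ["-v", f"{host}:{container}"]
    return args


def docker_run_argv(image: str, command: Sequence[str], mounts: Sequence[str]) -> List[str]:
    """Assemble a full `docker run --rm` argument vector (nothing is executed)."""
    return ["docker", "run", "--rm", *mounts, image, *command]


# Placeholder values, not the real Breeze configuration.
many = bind_mount_args("/home/me/airflow", "/opt/airflow", ["airflow", "tests"])
single = bind_mount_args("/home/me/airflow", "/opt/airflow")
```

Timing `docker_run_argv(...)` for both variants with the same image and command, for example via `subprocess.run` and `time.perf_counter`, would be one way to repeat the comparison on your own machine.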
cc breezemaster @potiuk :)