Causes returned by `_attempt_to_pin_criterion` are too broad (causing searching unnecessary parts of a dependency graph) #171

notatallshaw · 2024-10-24T23:21:08Z

In a complex resolution, the causes returned by `_attempt_to_pin_criterion` when rejecting a candidate are far too broad. Taking the example in pypa/pip#13037 (comment), I've created a log that prints out each rejection: pypa/pip#13037 (comment)

Looking at one of the rejections:

Rejecting thinc 8.3.1, due to conflict:
	The user requested numpy==1.21.5
	spacy 3.8.2 depends on numpy>=1.19.0; python_version >= "3.9"
	mlflow 2.17.0 depends on numpy<3
	matplotlib 3.8.4 depends on numpy>=1.21
	pandas 2.0.3 depends on numpy>=1.21.0; python_version >= "3.10"
	pyarrow 17.0.0 depends on numpy>=1.16.6
	scikit-learn 1.5.2 depends on numpy>=1.19.5
	scipy 1.10.1 depends on numpy<1.27.0 and >=1.19.5
	thinc 8.3.1 depends on numpy<2.1.0 and >=2.0.0; python_version >= "3.9"

Ideally, this should be narrowed to:

Rejecting thinc 8.3.1, due to conflict:
	The user requested numpy==1.21.5
	thinc 8.3.1 depends on numpy<2.1.0 and >=2.0.0; python_version >= "3.9"

This would let downstream libraries (like pip) that use the causes to decide what to prefer when backtracking home in on the real conflict, and it would give the user a much more focused message, especially when the resolution is impossible.

Because resolvelib is so generic, narrowing the causes is a little tricky, but I propose the following steps:

1. While there are more than 2 causes, loop through each of the causes.
2. Remove that cause and confirm there is still a conflict among the remaining causes.
3. Check that there is no conflict between the removed cause and any remaining cause.
4. If both checks pass, drop it from the cause list.
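The steps above could be sketched roughly as follows. This is a minimal illustration, not resolvelib code: `Cause`, `conflicts` and `narrow_causes` are hypothetical names, each cause is modelled as a label plus a predicate over version tuples, environment markers are ignored, and "conflict" is assumed to mean that no version in a given list of available versions satisfies every cause at once.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple

Version = Tuple[int, ...]


class Cause(NamedTuple):
    label: str                         # human-readable description of the requirement
    allows: Callable[[Version], bool]  # hypothetical: does this cause accept a version?


def conflicts(causes: Sequence[Cause], available: Sequence[Version]) -> bool:
    """Assumed definition: no available version satisfies every cause at once."""
    return not any(all(c.allows(v) for c in causes) for v in available)


def narrow_causes(causes: Sequence[Cause], available: Sequence[Version]) -> List[Cause]:
    narrowed = list(causes)
    i = 0
    # Step 1: keep looping only while more than two causes remain.
    while len(narrowed) > 2 and i < len(narrowed):
        removed = narrowed[i]
        remaining = narrowed[:i] + narrowed[i + 1:]
        # Step 2: the remaining causes must still conflict on their own.
        still_conflicts = conflicts(remaining, available)
        # Step 3: the removed cause must not conflict with any remaining cause.
        independent = not any(conflicts([removed, other], available) for other in remaining)
        if still_conflicts and independent:
            narrowed = remaining  # Step 4: drop it; index i now points at the next cause
        else:
            i += 1
    return narrowed


# Illustrative subset of numpy releases, and the causes from the thinc rejection above.
AVAILABLE = [(1, 16, 6), (1, 19, 5), (1, 21, 5), (1, 26, 4), (2, 0, 0), (2, 0, 2)]
causes = [
    Cause("user: numpy==1.21.5", lambda v: v == (1, 21, 5)),
    Cause("spacy: numpy>=1.19.0", lambda v: v >= (1, 19)),
    Cause("mlflow: numpy<3", lambda v: v < (3,)),
    Cause("matplotlib: numpy>=1.21", lambda v: v >= (1, 21)),
    Cause("pandas: numpy>=1.21.0", lambda v: v >= (1, 21)),
    Cause("pyarrow: numpy>=1.16.6", lambda v: v >= (1, 16, 6)),
    Cause("scikit-learn: numpy>=1.19.5", lambda v: v >= (1, 19, 5)),
    Cause("scipy: numpy<1.27.0,>=1.19.5", lambda v: (1, 19, 5) <= v < (1, 27)),
    Cause("thinc: numpy<2.1.0,>=2.0.0", lambda v: (2,) <= v < (2, 1)),
]
print([c.label for c in narrow_causes(causes, AVAILABLE)])
```

Note that, applied literally, these steps keep three causes rather than the two in the ideal message: scipy's `<1.27.0` cap is itself disjoint from thinc's `>=2.0.0,<2.1.0` range, so step 3 retains it alongside the user pin and thinc.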

I think this could be a huge speed-up for very problematic resolutions where the downstream library uses the causes to determine what to prefer. I will look to make a PR in the coming weeks.


notatallshaw · 2024-10-25T13:04:20Z

FYI, I'm not sure cause narrowing like this is possible in resolvelib's current design, but if it can happen at the resolvelib level, I think that would be best, since it benefits all clients and keeps resolution optimizations out of the downstream library. So I've opened this issue somewhat optimistically, and I'll report back on my findings once I try.

notatallshaw · 2024-10-26T01:05:02Z

Okay, so specifically, the issue is that the information included in the Criterion is too broad here: https://github.com/sarugaku/resolvelib/blob/1.1.0b1/src/resolvelib/resolvers/resolution.py#L138

But I'm not sure it's possible to fix this. It looks like you can't recreate the Criterion object with less information and rerun the `if not criterion.candidates:` check, because the provider state appears to have already moved on: rerunning the check always passes, since there are no candidates left.

And I'm not sure what other test can be done at the resolvelib level to see if a smaller information list still creates a conflict. 🙁

frostming · 2024-10-31T01:19:15Z

I think the main issue is how to detect conflicting version ranges. Look at the following conflicts:

	The user requested numpy==1.21.5
	spacy 3.8.2 depends on numpy>=1.19.0; python_version >= "3.9"
	mlflow 2.17.0 depends on numpy<3
	matplotlib 3.8.4 depends on numpy>=1.21
	pandas 2.0.3 depends on numpy>=1.21.0; python_version >= "3.10"
	pyarrow 17.0.0 depends on numpy>=1.16.6
	scikit-learn 1.5.2 depends on numpy>=1.19.5
	scipy 1.10.1 depends on numpy<1.27.0 and >=1.19.5
	thinc 8.3.1 depends on numpy<2.1.0 and >=2.0.0; python_version >= "3.9"

We can't even decide whether numpy==1.21.5 conflicts with numpy<3, at least not with the help of packaging. This needs a more complex version range calculator that supports superset/subset/disjoint operations.
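To illustrate the kind of range arithmetic involved, here is a minimal sketch of a disjointness check. It is not the API of packaging or of any existing range library; `parse_version`, `Interval`, `parse_specifier` and `disjoint` are hypothetical helpers, and the sketch only handles numeric release versions with `==`, `>=`, `>`, `<=` and `<` clauses (no `!=`, `~=`, wildcards, pre-releases or markers).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse "1.21.0" into (1, 21), dropping trailing zeros so 1.21 == 1.21.0."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


@dataclass(frozen=True)
class Interval:
    """A version range; a bound of None means unbounded on that side."""
    lo: Optional[Version] = None
    lo_inclusive: bool = True
    hi: Optional[Version] = None
    hi_inclusive: bool = True

    def intersect(self, other: "Interval") -> "Interval":
        lo, lo_inc = self.lo, self.lo_inclusive
        # Keep the tighter (larger) lower bound; on a tie, exclusive wins.
        if other.lo is not None and (
            lo is None or other.lo > lo or (other.lo == lo and not other.lo_inclusive)
        ):
            lo, lo_inc = other.lo, other.lo_inclusive
        hi, hi_inc = self.hi, self.hi_inclusive
        # Keep the tighter (smaller) upper bound; on a tie, exclusive wins.
        if other.hi is not None and (
            hi is None or other.hi < hi or (other.hi == hi and not other.hi_inclusive)
        ):
            hi, hi_inc = other.hi, other.hi_inclusive
        return Interval(lo, lo_inc, hi, hi_inc)

    def is_empty(self) -> bool:
        if self.lo is None or self.hi is None:
            return False
        if self.lo == self.hi:
            return not (self.lo_inclusive and self.hi_inclusive)
        return self.lo > self.hi


_OPERATORS = ("==", ">=", "<=", ">", "<")  # two-character operators are tried first


def parse_specifier(spec: str) -> Interval:
    """Turn e.g. "<2.1.0,>=2.0.0" into the Interval it allows."""
    result = Interval()
    for clause in spec.split(","):
        clause = clause.strip()
        op = next((o for o in _OPERATORS if clause.startswith(o)), None)
        if op is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        v = parse_version(clause[len(op):].strip())
        bound = {
            "==": Interval(v, True, v, True),
            ">=": Interval(lo=v, lo_inclusive=True),
            ">": Interval(lo=v, lo_inclusive=False),
            "<=": Interval(hi=v, hi_inclusive=True),
            "<": Interval(hi=v, hi_inclusive=False),
        }[op]
        result = result.intersect(bound)
    return result


def disjoint(a: str, b: str) -> bool:
    """True if no version can satisfy both specifiers."""
    return parse_specifier(a).intersect(parse_specifier(b)).is_empty()


print(disjoint("==1.21.5", "<3"))
print(disjoint("==1.21.5", "<2.1.0,>=2.0.0"))
```

Representing each specifier set as a single interval keeps intersection and emptiness checks trivial; supporting `!=` or `~=` would need a union of intervals instead, which is part of why a mature range library is a bigger undertaking.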

I developed a library for this purpose, but I don't think it's mature enough, compared with packaging, to integrate into resolvelib at present.

notatallshaw · 2024-10-31T02:19:31Z

Thanks, I'll take a look.

But isn't that too specific for resolvelib? It deals with conflicts generically and doesn't consider their source; that's left up to the provider, right?

Which means any narrowing of causes would need to be left up to the provider, either via existing APIs or a new one.

frostming · 2024-10-31T02:26:52Z

> Which means any narrowing of causes would need to be left up to the provider, either via existing APIs or a new one.

Makes sense; it would need another method for providers to implement.

notatallshaw · 2024-10-31T14:48:41Z

It should be possible to validate whether this works without a new API: the provider could narrow the causes any time they are passed from resolvelib and do whatever work it needs. I plan to make a PoC on the pip side to see if it's worthwhile.

The advantage of a dedicated API would be that resolvelib could reduce the amount of state it has to keep, and the provider wouldn't need to keep narrowing the same causes repeatedly. So if the PoC works out, I'll consider what that API should look like.
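Without a dedicated API, one way a provider-side PoC could limit the cost of re-narrowing the same causes is to memoize the result. A minimal sketch, assuming causes are hashable and the narrowing function is pure; `CachedNarrower` is a hypothetical name (not part of resolvelib or pip), and the lambda is only a stand-in for a real narrowing step.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List


class CachedNarrower:
    """Memoizes a narrowing function so each distinct set of causes is narrowed once."""

    def __init__(self, narrow: Callable[[List[Hashable]], List[Hashable]]) -> None:
        self._narrow = narrow
        self._cache: Dict[FrozenSet[Hashable], List[Hashable]] = {}
        self.misses = 0  # how many times the underlying narrowing actually ran

    def __call__(self, causes: Iterable[Hashable]) -> List[Hashable]:
        key = frozenset(causes)  # order-insensitive: the same causes hit the same entry
        if key not in self._cache:
            self.misses += 1
            # Sort by repr so the narrowing function sees a deterministic order.
            self._cache[key] = self._narrow(sorted(key, key=repr))
        return list(self._cache[key])  # return a copy so callers can't mutate the cache


# Stand-in narrowing function: keep only the user pin and thinc's requirement.
narrow = CachedNarrower(lambda cs: [c for c in cs if c.startswith(("user", "thinc"))])
first = narrow(["thinc: numpy<2.1.0,>=2.0.0", "user: numpy==1.21.5", "mlflow: numpy<3"])
second = narrow(["mlflow: numpy<3", "user: numpy==1.21.5", "thinc: numpy<2.1.0,>=2.0.0"])
print(first, narrow.misses)
```

Keying on a frozenset means the cache still hits when the resolver passes the same causes in a different order; the trade-off is that the provider now holds this extra state, which a dedicated resolvelib API could avoid.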

notatallshaw · 2024-11-05T03:23:26Z

I've hacked together a branch of pip that uses dep-logic to narrow the causes considered when backtracking: pypa/pip@main...notatallshaw:pip:speedy-resolve, and so far it looks pretty good. It definitely improved the wall-clock time of resolutions, even if it didn't particularly reduce the amount pip had to collect.

I'll clean it up, test it against other optimizations, and look at making a new API for resolvelib so the provider can pass the improvements directly into the resolution.

notatallshaw · 2024-12-10T01:59:06Z

Okay, after working on this for a bit, I realized there's a problem with this approach (or at least with implementing it in pip): there are basically two types of causes that lead to backtracking:

1. Logically disjoint, e.g. numpy<=1, numpy>=2
2. Impossible given the available versions, e.g. numpy>1000

So, if the two causes provided are numpy>1 and numpy>1000, they are not logically disjoint, but the provider must return numpy>1000 from this hypothetical API. Given the way pip and resolvelib interact, it's not clear to me how that can easily be checked from within a provider method. Maybe I'm missing something obvious, but I need to spend a bit more time on a pip implementation to prove to myself that it's possible and makes sense.
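One way to see the distinction: checking causes against the versions that actually exist catches both kinds of conflict, whereas a purely logical range check only catches the first. A minimal sketch, assuming the provider has the list of available versions to hand; `Cause`, `satisfiable`, `minimal_explanation` and `AVAILABLE` are hypothetical, and versions are plain integer tuples.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple

Version = Tuple[int, ...]


class Cause(NamedTuple):
    label: str
    allows: Callable[[Version], bool]  # hypothetical predicate over version tuples


def satisfiable(causes: Sequence[Cause], available: Sequence[Version]) -> bool:
    """True if at least one available version satisfies every cause."""
    return any(all(c.allows(v) for c in causes) for v in available)


def minimal_explanation(causes: Sequence[Cause], available: Sequence[Version]) -> List[Cause]:
    # Type 2: a single cause that no available version satisfies (e.g. numpy>1000).
    for cause in causes:
        if not satisfiable([cause], available):
            return [cause]
    # Type 1: a pair of causes that no available version satisfies together
    # (logically disjoint ranges such as numpy<=1 and numpy>=2 end up here).
    for i, a in enumerate(causes):
        for b in causes[i + 1:]:
            if not satisfiable([a, b], available):
                return [a, b]
    # Conflicts needing three or more causes are not narrowed by this sketch.
    return list(causes)


AVAILABLE: List[Version] = [(0, 9), (1, 21, 5), (2, 0, 0)]

gt_1 = Cause("numpy>1", lambda v: v > (1,))
gt_1000 = Cause("numpy>1000", lambda v: v > (1000,))
le_1 = Cause("numpy<=1", lambda v: v <= (1,))
ge_2 = Cause("numpy>=2", lambda v: v >= (2,))

print([c.label for c in minimal_explanation([gt_1, gt_1000], AVAILABLE)])
print([c.label for c in minimal_explanation([le_1, ge_2], AVAILABLE)])
```

The catch, as described above, is that this needs the set of available versions inside the narrowing step, which a provider method may not have cheap access to.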

notatallshaw mentioned this issue Oct 24, 2024: pip prefers old sdists that "obviously" can't work over recent wheels pypa/pip#13037 (Open)

notatallshaw linked a pull request Nov 10, 2024 that will close this issue: New disjoint method for provider #179 (Draft)

notatallshaw mentioned this issue Dec 10, 2024: Back jump doesn't do well in some cases #180 (Open)
