Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define models for set and multiset #5963

Merged
merged 65 commits into from
Dec 9, 2024

Conversation

RustanLeino
Copy link
Collaborator

@RustanLeino RustanLeino commented Dec 4, 2024

Description

This PR adds models for set and multiset and proves from those models the consistency of the set/multiset axioms in the Dafny prelude. This is accomplished using the extract-to-Boogie functionality that was added in #5621, which had also defined a model for seq.

In addition to adding and using these models, the PR

  • Improves Source/DafnyCore/Prelude/Makefile
    • Put generated files into Output folder
    • Add SourceModuleName parameter to dafny extract SourceModuleName TargetBoogieFile DafnyFiles
    • Run extraction during CI and confirm that the checked-in DafnyPrelude.bpl has been generated. (This was missing from before, and when I started working on this PR, I found that Dafny had got out of synch with the models and was crashing on them.)
  • Fixes several occurrences in DafnyPrelude.bpl and the BoogieGenerator where a Set#... symbol was used instead of ISet#.... (This was previously undetected, because the Set and ISet types were both synonyms for [Box]bool.
  • Makes small changes to a few test files to make their proofs more stable.

Details of the models

In order to prove the extensionality property of set and multiset, it's important to have a canonical representation of sets and multisets in the model. To do this, I postulated (without a supporting model) a total order on the type Box. (Such postulates are justified by the Axiom of Choice. That seems okay, since Dafny's logic already allows you to prove the Axiom of Choice.)

The aim was for the generated models to be exactly like the handwritten ones. This was not possible, for minor annoying reasons:

  • The exaction mechanism does not offer a way to output a Boogie type synonym (the handwritten prelude had contained type Set = [Box]bool;).
  • The handwritten declarations sometimes omitted parameter names and sometimes gave one : Type to two variables. Boogie allows these, but the extraction mechanism always generates x: Type for each parameter and bound variable.

If token-by-token equality with the handwritten prelude was non-negotiable, then these minor annoyances could have been fixed by embellishing the extractor. However, there is a good reason to change this part of the axiomatizations. Previously, both set and multiset had been defined in terms of Boogie's built-in maps. Thus, to really prove the models consistent, one would have to prove that the additional axioms are a conservative extension of Boogie's maps. This is an unnecessary proof burden. So, instead, the new set and multiset models generate Set and MultiSet as uninterpreted types.

There are three consequences of using uninterpreted types rather than Boogie's built-in map types:

  • This affects other places in the handwritten DafnyPrelude.bpl that assumed maps (for example, Map#Domain(m)[x] now is written Set#IsMember(Map#Domain(m), x)) and affects what BoogieGenerator emits. The PR includes these changes. They have been tested by running the test suite to make sure the correct symbols are emitted.
  • The multiset model used both Boogie's map-select (_[_]) operator and its map-store (_[_:=_]) operator. Thus, the "select of store" axioms had to be added to the model (where they are proved as lemmas).
  • The set model used only the map-select operator. However, the translation into Boogie also made use of Boogie's map comprehensions (lambda binders). This is convenient for BoogieGenerator, because Boogie performs lambda lifting on these lambda expressions. The right thing to do is to move this lambda lifting into Dafny instead of Boogie, and thus to not make use of Boogie's lambda expressions for Dafny set comprehensions. This PR does not do that and instead leaves this as a FIXME. Instead, this PR adds a function Set#FromBoogieMap function, which converts a Boogie [Box]bool map to a Set. This is not sound (indeed, this was not sound in the handwritten axioms, either, which I realized as I was going along). The reason is that one can construct a Boogie [Box]bool map that gives true for an infinite number of Boxes. As a temporary solution (before we implement the necessary set-comprehension lifting directly in Dafny), this seems alright. First, this is no worse than before, since lambda expressions were used before, too. Second, to discover the unsoundness, one needs induction, and neither Boogie nor the SMT solver knows about induciton.

Note that iset still uses the [Box]bool representation in Boogie, since iset is not part of this PR.

Missed opportunities

  • The parameters and bound variables don't follow a consistent naming convention. When I started writing the models, I tried to follow exactly what was in the handwritten DafnyPrelude.bpl, rather than doing such clean-up. However, now that we're changing a lot of things anyway, it would have been nice to have cleaned up those names.
  • In a similar way, I'm thinking it would be nice to change the capitalization of MultiSet to Multiset.
  • There's a predicate $IsGoodMultiSet in the model. I think it had been introduced in the handwritten DafnyPrelude.bpl to fix a soundness issue with the axioms. Surprisingly, I was able to define it as true. In other words, the predicate is effectively useless and the lemmas in the model all hold even if this predicate were removed. That would be nice, because it simplifies Dafny's axiomatization slightly. I did not think about this deeply (because I couldn't find the unsoundness Issue that I seem to recall that $IsGoodMultiSet was there to fix), but think the reason it's okay to leave it out now is that the new multiset axioms don't use Boogie's built-in map type and operators.

Note about reviewing this PR

The set and multiset models comprise more than 2000 lines of Dafny. However, they need not be reviewed line by line, since Dafny checks the proofs. It suffices to review that no cheating is going on (e.g., assume statements or body-less lemmas), and this is easy since the Makefile invokes dafny extract without --allow-warnings or the like. It would be nice if I could also say "The definition of the functions and the statement of the lemmas also don't have to be checked in details--it suffices to check that the extracted Boogie declarations are identical to the previous handwritten ones.". Alas, in addition to whitespace changes and the minor annoyances mentioned above, the new axioms use Set#IsMember, MultiSet#Multiplicity, and MultiSet#UpdateMultiplicity in lieu of _[_], _[_], and _[_:=_], respectively. Regrettably, these changes make it much harder to compare the new definitions with the old. If it's of any consolation, I made the whitespace changes are very carefully as I could to avoid errors and did the renamings only after I had made the whitespace changes. Plus, the test suite still passes after these changes.

A good thing to think about during the review is if it is really okay that $IsGoodMultiSet is defined to be true. It is not needed for proving the other axioms. So why had it been introduced in the first place? (I thought it was to fix a previously discovered unsoundness among the multiset axioms, but I didn't find the exact Issue and its PR fix.)

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

atomb
atomb previously approved these changes Dec 7, 2024
Copy link
Member

@atomb atomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very nice improvement, and a heroic amount of work! The review was definitely made easier by the fact that large swaths didn't need to be carefully inspected, even if, as you say, there are some unnecessary syntactic changes.

I think the one remaining test failure is just due to needing a rebuilt copy of the standard library checked in.


// ---------------------------------------------------------------
// -- Axiomatization of isets -------------------------------------
// ---------------------------------------------------------------

type ISet = [Box]bool;

function ISet#Empty(): Set;
function ISet#Empty(): ISet;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix. :) (And similar elsewhere.)

@atomb
Copy link
Member

atomb commented Dec 7, 2024

As to the previous soundness issues with MultiSet, this issue has a comment from you suggesting missing $IsGoodMultiset antecedents: #4400

We may want to run those same Vampire experiments on the new prelude.

@RustanLeino
Copy link
Collaborator Author

As to the previous soundness issues with MultiSet, this issue has a comment from you suggesting missing $IsGoodMultiset antecedents: #4400

@atomb Thanks for locating Issue #4400. The problem reported in #4400 was caused by the use of Boogie's built-in maps. This essentially made it possible to construct a multiset with a negative multiplicity of an element, namely by writing m[x := -10][x] < 0. This contradicts properties stated among the other axioms.

The new axiomatization uses its own functions for retrieving and updating the multiplicity. These two functions satisfy the "select of store" properties of Boogie's built-in maps under the proviso that the UpdateMultiplicity(m, o, n) is used with a non-negative n -- if n is negative, then the axioms don't promise anything about what the new multiplicity is, other than that it still is non-negative. In the model, UpdateMultiplicity(m, o, n) simply returns m if n is negative.

Thus, I do believe that the methodology of defining this multiset model is good, and thus that the new axioms are indeed sound to use (as the lemmas about them prove).

We may want to run those same Vampire experiments on the new prelude.

Yes, this would be good. @yizhou7 ?

I suggest we leave in the $IsGoodMultiSet function in this PR. We can do a separate PR that optimizes the encoding by removing this function. That would be a good additional time to rerun the experiment with Vampire.

@RustanLeino RustanLeino marked this pull request as ready for review December 7, 2024 02:07
@RustanLeino RustanLeino enabled auto-merge (squash) December 7, 2024 02:08
Copy link
Member

@atomb atomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about the multiset axioms. Having Dafny's collection types be purely axiomatized, rather than just mostly, seems better because we now know that the model agrees with the SMT solver's interpretation, and also because Z3 seems better at reasoning about purely quantified things, or purely built-in theories, but often struggles with combinations. And I also agree about leaving the removal of $IsGoodMultiset for a future optimization PR.

@RustanLeino RustanLeino merged commit 0da35e1 into dafny-lang:master Dec 9, 2024
22 checks passed
@RustanLeino RustanLeino deleted the set-model branch December 9, 2024 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants