Define models for set and multiset #5963

RustanLeino · 2024-12-04T20:05:09Z

Description

This PR adds models for set and multiset and proves from those models the consistency of the set/multiset axioms in the Dafny prelude. This is accomplished using the extract-to-Boogie functionality that was added in #5621, which had also defined a model for seq.

In addition to adding and using these models, the PR

Improves Source/DafnyCore/Prelude/Makefile
- Put generated files into Output folder
- Add SourceModuleName parameter to dafny extract SourceModuleName TargetBoogieFile DafnyFiles
- Run extraction during CI and confirm that the checked-in DafnyPrelude.bpl has been generated. (This was missing from before, and when I started working on this PR, I found that Dafny had got out of synch with the models and was crashing on them.)
Fixes several occurrences in DafnyPrelude.bpl and the BoogieGenerator where a Set#... symbol was used instead of ISet#.... (This was previously undetected, because the Set and ISet types were both synonyms for [Box]bool.
Makes small changes to a few test files to make their proofs more stable.

Details of the models

In order to prove the extensionality property of set and multiset, it's important to have a canonical representation of sets and multisets in the model. To do this, I postulated (without a supporting model) a total order on the type Box. (Such postulates are justified by the Axiom of Choice. That seems okay, since Dafny's logic already allows you to prove the Axiom of Choice.)

The aim was for the generated models to be exactly like the handwritten ones. This was not possible, for minor annoying reasons:

The exaction mechanism does not offer a way to output a Boogie type synonym (the handwritten prelude had contained type Set = [Box]bool;).
The handwritten declarations sometimes omitted parameter names and sometimes gave one : Type to two variables. Boogie allows these, but the extraction mechanism always generates x: Type for each parameter and bound variable.

If token-by-token equality with the handwritten prelude was non-negotiable, then these minor annoyances could have been fixed by embellishing the extractor. However, there is a good reason to change this part of the axiomatizations. Previously, both set and multiset had been defined in terms of Boogie's built-in maps. Thus, to really prove the models consistent, one would have to prove that the additional axioms are a conservative extension of Boogie's maps. This is an unnecessary proof burden. So, instead, the new set and multiset models generate Set and MultiSet as uninterpreted types.

There are three consequences of using uninterpreted types rather than Boogie's built-in map types:

This affects other places in the handwritten DafnyPrelude.bpl that assumed maps (for example, Map#Domain(m)[x] now is written Set#IsMember(Map#Domain(m), x)) and affects what BoogieGenerator emits. The PR includes these changes. They have been tested by running the test suite to make sure the correct symbols are emitted.
The multiset model used both Boogie's map-select (_[_]) operator and its map-store (_[_:=_]) operator. Thus, the "select of store" axioms had to be added to the model (where they are proved as lemmas).
The set model used only the map-select operator. However, the translation into Boogie also made use of Boogie's map comprehensions (lambda binders). This is convenient for BoogieGenerator, because Boogie performs lambda lifting on these lambda expressions. The right thing to do is to move this lambda lifting into Dafny instead of Boogie, and thus to not make use of Boogie's lambda expressions for Dafny set comprehensions. This PR does not do that and instead leaves this as a FIXME. Instead, this PR adds a function Set#FromBoogieMap function, which converts a Boogie [Box]bool map to a Set. This is not sound (indeed, this was not sound in the handwritten axioms, either, which I realized as I was going along). The reason is that one can construct a Boogie [Box]bool map that gives true for an infinite number of Boxes. As a temporary solution (before we implement the necessary set-comprehension lifting directly in Dafny), this seems alright. First, this is no worse than before, since lambda expressions were used before, too. Second, to discover the unsoundness, one needs induction, and neither Boogie nor the SMT solver knows about induciton.

Note that iset still uses the [Box]bool representation in Boogie, since iset is not part of this PR.

Missed opportunities

The parameters and bound variables don't follow a consistent naming convention. When I started writing the models, I tried to follow exactly what was in the handwritten DafnyPrelude.bpl, rather than doing such clean-up. However, now that we're changing a lot of things anyway, it would have been nice to have cleaned up those names.
In a similar way, I'm thinking it would be nice to change the capitalization of MultiSet to Multiset.
There's a predicate $IsGoodMultiSet in the model. I think it had been introduced in the handwritten DafnyPrelude.bpl to fix a soundness issue with the axioms. Surprisingly, I was able to define it as true. In other words, the predicate is effectively useless and the lemmas in the model all hold even if this predicate were removed. That would be nice, because it simplifies Dafny's axiomatization slightly. I did not think about this deeply (because I couldn't find the unsoundness Issue that I seem to recall that $IsGoodMultiSet was there to fix), but think the reason it's okay to leave it out now is that the new multiset axioms don't use Boogie's built-in map type and operators.

Note about reviewing this PR

The set and multiset models comprise more than 2000 lines of Dafny. However, they need not be reviewed line by line, since Dafny checks the proofs. It suffices to review that no cheating is going on (e.g., assume statements or body-less lemmas), and this is easy since the Makefile invokes dafny extract without --allow-warnings or the like. It would be nice if I could also say "The definition of the functions and the statement of the lemmas also don't have to be checked in details--it suffices to check that the extracted Boogie declarations are identical to the previous handwritten ones.". Alas, in addition to whitespace changes and the minor annoyances mentioned above, the new axioms use Set#IsMember, MultiSet#Multiplicity, and MultiSet#UpdateMultiplicity in lieu of _[_], _[_], and _[_:=_], respectively. Regrettably, these changes make it much harder to compare the new definitions with the old. If it's of any consolation, I made the whitespace changes are very carefully as I could to avoid errors and did the renamings only after I had made the whitespace changes. Plus, the test suite still passes after these changes.

A good thing to think about during the review is if it is really okay that $IsGoodMultiSet is defined to be true. It is not needed for proving the other axioms. So why had it been introduced in the first place? (I thought it was to fix a previously discovered unsoundness among the multiset axioms, but I didn't find the exact Issue and its PR fix.)

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

# Conflicts: # Source/DafnyCore/Verifier/BoogieGenerator.ExpressionTranslator.cs

atomb

This is a very nice improvement, and a heroic amount of work! The review was definitely made easier by the fact that large swaths didn't need to be carefully inspected, even if, as you say, there are some unnecessary syntactic changes.

I think the one remaining test failure is just due to needing a rebuilt copy of the standard library checked in.

atomb · 2024-12-07T00:01:52Z

Source/DafnyCore/DafnyPrelude.bpl


 // ---------------------------------------------------------------
 // -- Axiomatization of isets -------------------------------------
 // ---------------------------------------------------------------

 type ISet = [Box]bool;

-function ISet#Empty(): Set;
+function ISet#Empty(): ISet;


Nice fix. :) (And similar elsewhere.)

atomb · 2024-12-07T00:13:15Z

As to the previous soundness issues with MultiSet, this issue has a comment from you suggesting missing $IsGoodMultiset antecedents: #4400

We may want to run those same Vampire experiments on the new prelude.

RustanLeino · 2024-12-07T02:05:31Z

As to the previous soundness issues with MultiSet, this issue has a comment from you suggesting missing $IsGoodMultiset antecedents: #4400

@atomb Thanks for locating Issue #4400. The problem reported in #4400 was caused by the use of Boogie's built-in maps. This essentially made it possible to construct a multiset with a negative multiplicity of an element, namely by writing m[x := -10][x] < 0. This contradicts properties stated among the other axioms.

The new axiomatization uses its own functions for retrieving and updating the multiplicity. These two functions satisfy the "select of store" properties of Boogie's built-in maps under the proviso that the UpdateMultiplicity(m, o, n) is used with a non-negative n -- if n is negative, then the axioms don't promise anything about what the new multiplicity is, other than that it still is non-negative. In the model, UpdateMultiplicity(m, o, n) simply returns m if n is negative.

Thus, I do believe that the methodology of defining this multiset model is good, and thus that the new axioms are indeed sound to use (as the lemmas about them prove).

We may want to run those same Vampire experiments on the new prelude.

Yes, this would be good. @yizhou7 ?

I suggest we leave in the $IsGoodMultiSet function in this PR. We can do a separate PR that optimizes the encoding by removing this function. That would be a good additional time to rerun the experiment with Vampire.

atomb

I agree about the multiset axioms. Having Dafny's collection types be purely axiomatized, rather than just mostly, seems better because we now know that the model agrees with the SMT solver's interpretation, and also because Z3 seems better at reasoning about purely quantified things, or purely built-in theories, but often struggles with combinations. And I also agree about leaving the removal of $IsGoodMultiset for a future optimization PR.

RustanLeino added 30 commits November 30, 2024 17:09

Add math axioms

f5e642b

Add model for sets

b41ca12

Add model for multisets

a30671b

Combine some modules

87359ff

Clean up Sets.dfy

89f0d81

Fix export set

818b526

Rely on type inference

637aca9

Add all Multiset signatures

25ceda9

Add manual proofs (since auto induction finds no triggers)

fdfdf77

Fix assertion failure

cc776fc

Fix null dereference bug

7576d11

Add source location to extract error messages

705f15d

Extract Mul as a binary operator

b799b66

Add required extract_pattern’s

1176aa1

Update build scripts

2077177

Add SourceModule CLI argument to “extract”

2061bcb

Add model for Multiset

e692837

Update Makefile

33d9a69

Remove some types that are inferred

45d8bc8

Add build/check targets to Makefile

f29e7d0

Change whitespace in DafnyPrelude.bpl

a7fd41e

Change multiset whitespace in DafnyPrelude.bpl

3bc4e24

Make small adjustments in .dfy sources

570884e

Make more whitespace updates and corrections

ccee8c4

More gradual whitespace changes for sets

87b2e4c

More formatting changes

61d4434

Fix set model transcription

bf9b91b

Whitespace changes for multiset

2f1145d

More whitespace changes

73d1936

Fix model pattern

26c47a3

RustanLeino added 19 commits December 3, 2024 21:33

Normalize to ancestor type before calling .AsSetType

5be31d5

Distinguish set/iset for proper subset operator

14d90aa

Improve expected test output

5754f1c

Improve stability of proofs

eb5f17c

Split into two tests

ff1bbe8

Merge branch 'master' into set-model

3f4ad83

# Conflicts: # Source/DafnyCore/Verifier/BoogieGenerator.ExpressionTranslator.cs

Update resource counts

b2a8fef

Make sure DafnyPrelude.bpl generation is intact

5e432f4

Format Dafny

c0b3476

Move point where CI calls check-extract

5b82c72

Let counterexample logic know about Set#IsMember

8377aad

Rewrite proof to try to reduce brittleness

44a3f0c

Merge branch 'master' into set-model

5d65b10

Check for Set#IsMember but still check other Boogie-map selects

5017935

Attempt to reduce resource requirement of test

bbdd3b2

Use Set#IsMember for map-domain containment

8129993

Merge branch 'master' into set-model

f9433c1

Attempt to make proof go through

9e98061

Update stabiliy test expected result

d974951

atomb previously approved these changes Dec 7, 2024

View reviewed changes

Update standard libraries with the improved proofs

38e89c6

RustanLeino dismissed atomb’s stale review via 38e89c6 December 7, 2024 00:16

RustanLeino marked this pull request as ready for review December 7, 2024 02:07

RustanLeino enabled auto-merge (squash) December 7, 2024 02:08

atomb approved these changes Dec 9, 2024

View reviewed changes

Merge branch 'master' into set-model

710578d

RustanLeino merged commit 0da35e1 into dafny-lang:master Dec 9, 2024
22 checks passed

RustanLeino deleted the set-model branch December 9, 2024 21:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define models for set and multiset #5963

Define models for set and multiset #5963

RustanLeino commented Dec 4, 2024 •

edited

Loading

atomb left a comment

atomb Dec 7, 2024

atomb commented Dec 7, 2024

RustanLeino commented Dec 7, 2024

atomb left a comment

Define models for set and multiset #5963

Define models for set and multiset #5963

Conversation

RustanLeino commented Dec 4, 2024 • edited Loading

Description

Details of the models

Missed opportunities

Note about reviewing this PR

atomb left a comment

Choose a reason for hiding this comment

atomb Dec 7, 2024

Choose a reason for hiding this comment

atomb commented Dec 7, 2024

RustanLeino commented Dec 7, 2024

atomb left a comment

Choose a reason for hiding this comment

RustanLeino commented Dec 4, 2024 •

edited

Loading