* add fake_cvc5 script
* remove yet another list of "all" provers, replace by reference to
config file
* improve setting of default 100% baseline, allow for partial
baseline
* output all baseline violations instead of just one
Change-Id: Ib27f23f3046e9fa43dc2f32f74d6d71a624118fc
Commit scripts for a new testsuite that runs provers on all VCs for
a given test, using timeouts (and not steps). Criteria for success
are success percentages.
* Makefile:
entry for new testsuite, to be used by anod script
* benchmark_script: removed or moved scripts elsewhere
* testsuite/gnatprove/MANIFEST.bench: list of tests to be run in
this testsuite
* testsuite/gnatprove/bench: scripts for new testsuite
* testsuite/gnatprove/bench/create_benchmarks.py:
script to run generate all VCs for selected tests
* testsuite/gnatprove/bench/benchtests.py:
script to run provers on all VCs
* testsuite/gnatprove/bench/process_results.py:
script to collect results and produce testsuite-like output
* testsuite/gnatprove/bench/prover_stats.py:
moved from benchmark_script folder; fixed support for alt-ergo
* testsuite/gnatprove/lib/python/test_support.py:
don't use replay in benchmark mode
* testsuite/gnatprove/tests/*/bench.yaml: baseline files for new
testsuite, specify percentages for each prover
Change-Id: I936a928aaee3c7f94c7b8aefe5854f111b7c0763
cvc5 command line options and output changed from cvc4, so we add a
new class to this benchmarking script
Change-Id: I2cc09648c4f8326e612ad4abde1c2acb15c3695d
The prover stats script was reading from the why3server socket and
hoping to always get complete answers from the why3server. With
higher parallelism messages seem to be cut more often and we need to
deal with this by buffering the messages to find the newline.
Change-Id: I94a4d3e32347107021409bd3199c067a2d013886
Previously, if a check had several VCs associated with it, only the
first one was generated. We now generate all of them for each
prover.
Change-Id: I53485a8e63a079baa51ffbff85bf0a0da1929c4b
scripts to extract (and run provers on) VCs from the testsuite were
still in python2, and not up to date with the recent testsuite
changes.
Change-Id: I39d4f301c62982ed4713fb0eedac91ca458c10c2
Replace the static why3.conf file by a JSON file "gnatprove.conf" with
roughly the same contents. The why3.conf file is now dynamically
generated on each invocation of gnatprove dynamically. This will allow
for more flexibility when it comes to prover wrappers.
* why3.conf: removed
* gnatprove.conf: added
* why3.conf.fake: removed
* gnatprove.adb:
(Write_Why3_Conf_File): read gnatprove.conf file and write
why3.conf file in object directory
Change-Id: Ibd152fda35be56e460a945a8eb8c970fa1f49721
We now implement --benchmark mode by a why3.conf file that points to the
fake executables instead of the real ones. Gnatwhy3 will read that fake
file instead of the real one in benchmark mode.
I have added a blend between Claire's and my scripts; added a hidden
--benchmark option to gnatprove, and a corresponding option to the
testsuite and a bunch of scripts (and some docs) to make use of all this.