Commit Graph

  • 2f82ba0d2b Merge branch 'topic/Add-human-eval-problems' into 'main' main Rowan Walshe 2026-02-09 11:32:15 +00:00
  • 8f9cc4897a Start adding HumanEval problems Rowan Walshe 2026-02-09 11:32:15 +00:00
  • cca2da835c Merge branch 'topic/Add-ability-to-unpack-generated-and-evaluated-datasets' into 'main' Rowan Walshe 2026-01-29 13:54:45 +00:00
  • 9293f60bb2 Apply 1 suggestion(s) to 1 file(s) Rowan Walshe 2026-01-29 13:47:27 +00:00
  • 52a450c624 Address review comments Rowan Walshe 2026-01-28 17:28:36 +00:00
  • 80f1e39d1a Apply 9 suggestion(s) to 2 file(s) Rowan Walshe 2026-01-28 16:52:18 +00:00
  • 6338a6b92d Merge branch 'topic/Remove-proof_steps-metric' into 'main' Rowan Walshe 2026-01-28 11:50:52 +00:00
  • fd3e42bf1c Remove the proof_steps metric Rowan Walshe 2026-01-28 10:32:37 +00:00
  • 891654ba98 Remove data/evaluated and data/generated Rowan Walshe 2026-01-28 10:32:12 +00:00
  • 88bd0c3f18 Make it possible to unpack generated and evaluated datasets Rowan Walshe 2026-01-27 17:53:00 +00:00
  • abb670fd40 Merge branch 'topic/Update-CI' into 'main' Rowan Walshe 2026-01-27 11:58:05 +00:00
  • 46b1d70e4c Add tox.ini and update CI to test using the oldest and newest supported Python Rowan Walshe 2026-01-27 11:58:04 +00:00
  • 4e470a3fcb Merge branch 'topic/Fix-issues-from-non-utf-8-files-in-generated-solution' into 'main' Rowan Walshe 2026-01-26 17:17:13 +00:00
  • 80a1c293d1 Migrate dataset file contents to base64 encoding Rowan Walshe 2026-01-26 17:17:13 +00:00
  • 580cf7b68e Merge branch 'topic/Add-license' into 'main' Rowan Walshe 2025-11-21 16:47:28 +00:00
  • cc52701b27 Merge branch 'mr/recommend-alire-setup' into 'main' Sebastian M'Caw 2025-11-21 16:21:39 +00:00
  • a1b18426b4 Add LICENSE Rowan Walshe 2025-11-21 16:18:21 +00:00
  • 8ea540daf9 Update contents Seb M'Caw 2025-11-21 16:15:26 +00:00
  • adae40b523 Apply 1 suggestion(s) to 1 file(s) Sebastian M'Caw 2025-11-21 16:14:45 +00:00
  • 0299f594dc Recommend using Alire for toolchain installation Seb M'Caw 2025-11-21 10:18:11 +00:00
  • 0a1dd8c9d9 Remove references to GitLab package registry Seb M'Caw 2025-11-21 10:05:39 +00:00
  • cd2e8b3909 Merge branch 'mr/fix-mypy-ci' into 'main' Sebastian M'Caw 2025-11-20 16:54:24 +00:00
  • 2a4a1b7ab0 Treat mypy errors as CI failure Seb M'Caw 2025-11-20 15:53:26 +00:00
  • 361a3c808f Merge branch 'mr/report-cmd' into 'main' Sebastian M'Caw 2025-11-20 16:34:48 +00:00
  • 2fbfb81b34 Merge branch 'mr/use-public-pypi' into 'main' Sebastian M'Caw 2025-11-20 16:03:10 +00:00
  • 8f8d709ed5 Apply 1 suggestion(s) to 1 file(s) Sebastian M'Caw 2025-11-20 15:43:53 +00:00
  • 005d8d4d06 Fix mypy errors Seb M'Caw 2025-11-20 13:44:55 +00:00
  • aaa70d6762 Use public PyPI Seb M'Caw 2025-11-20 12:01:36 +00:00
  • e555264b9b Self-review Seb M'Caw 2025-11-19 14:35:45 +00:00
  • 19cf9928f5 Report proof step count relative to canonical Seb M'Caw 2025-11-19 13:11:50 +00:00
  • 219459064a Remove dead code Seb M'Caw 2025-11-19 11:11:23 +00:00
  • f12abcb92a Revert unnecessary sort_dict() change Seb M'Caw 2025-11-19 11:10:10 +00:00
  • f1cbfa49fb Update README Seb M'Caw 2025-11-19 11:06:43 +00:00
  • fd575871d4 Simplify table printing Seb M'Caw 2025-11-19 10:46:07 +00:00
  • f7ff60aa92 Rename test_report_evaluation_results.py -> test_report.py Seb M'Caw 2025-11-19 10:10:29 +00:00
  • d052851156 Restore signature of report_evaluation_results() Seb M'Caw 2025-11-19 10:08:23 +00:00
  • 3befcbf6cd Simplify Metric constructors Seb M'Caw 2025-11-18 16:16:27 +00:00
  • 8b44444510 Consolidate section value into primary_metric Seb M'Caw 2025-11-18 11:08:58 +00:00
  • 7bc5f63271 Add unproved checks count Seb M'Caw 2025-11-18 09:58:14 +00:00
  • 902a67c419 Add tests Seb M'Caw 2025-11-17 18:44:24 +00:00
  • 1a66cc5f19 Add docstrings Seb M'Caw 2025-11-11 16:39:24 +00:00
  • 19b1353ce9 Add --list-samples option Seb M'Caw 2025-11-11 15:00:20 +00:00
  • 379b42ac9b Add limited option to filter by result Seb M'Caw 2025-11-11 14:18:09 +00:00
  • 1d9384c25f Add option to filter on dataset kind Seb M'Caw 2025-11-11 11:59:26 +00:00
  • 71a0f5d681 Fix zero samples case Seb M'Caw 2025-11-11 11:51:40 +00:00
  • 2cd04698c7 Add metrics from generation Seb M'Caw 2025-11-11 11:24:14 +00:00
  • b2ca9dc4c6 Display more details for proof steps Seb M'Caw 2025-11-11 10:34:55 +00:00
  • da17b12892 Add when kwarg to constructors Seb M'Caw 2025-11-11 10:19:11 +00:00
  • 97cde2ea35 Add basic report command Seb M'Caw 2025-11-06 19:11:54 +00:00
  • 19739be8a6 Merge branch 'mr/dataset-check-cmd' into 'main' Sebastian M'Caw 2025-11-05 16:37:03 +00:00
  • 0d74ca4a3d Re-organise comments Seb M'Caw 2025-11-04 14:22:14 +00:00
  • 89cd53accf Use conftest.py for fixtures Seb M'Caw 2025-11-04 14:00:33 +00:00
  • 6c8621bed2 Fix log message test Seb M'Caw 2025-11-04 14:17:58 +00:00
  • 5eb35ec123 Apply 3 suggestion(s) to 2 file(s) Sebastian M'Caw 2025-11-04 14:15:01 +00:00
  • 9ce92aa7d2 Add to Makefile Seb M'Caw 2025-10-30 11:20:31 +00:00
  • 2a0cceae15 Update README Seb M'Caw 2025-10-30 11:17:18 +00:00
  • 987fc8a56c Self-review Seb M'Caw 2025-10-30 11:12:18 +00:00
  • 489d1c9caf Define EvaluationStatsInvalid Seb M'Caw 2025-10-29 11:06:39 +00:00
  • 89cc863b3a Rationalise exceptions Seb M'Caw 2025-10-29 10:59:18 +00:00
  • 7bac797bd9 of dataset -> in dataset in exception messages Seb M'Caw 2025-10-27 18:12:02 +00:00
  • 972f8721f9 Rename subcommand to check-datasets Seb M'Caw 2025-10-27 18:00:05 +00:00
  • a57d09141f Add CLI to check other dataset paths Seb M'Caw 2025-10-27 17:52:08 +00:00
  • 64856abaec Add test for diff_dicts() and diff_sequences() Seb M'Caw 2025-10-27 16:07:13 +00:00
  • 9421c7a4ed Add explain sample to test Seb M'Caw 2025-10-27 15:22:24 +00:00
  • 59fe03932b Fix handling of samples compatible with no evals Seb M'Caw 2025-10-27 15:05:18 +00:00
  • 8a41ae8a70 Fix ruff warning Seb M'Caw 2025-10-24 16:56:42 +01:00
  • 69be909dca Check that initial sources are not already solved Seb M'Caw 2025-10-24 16:15:52 +01:00
  • 39d080fb2e Install eval tools in check-base-datasets job Seb M'Caw 2025-10-23 16:06:53 +01:00
  • d2e92fd3d3 Check accuracy of canonical results Seb M'Caw 2025-10-23 11:11:45 +01:00
  • 2900718503 Check for failing canonical evaluation results Seb M'Caw 2025-10-22 13:34:50 +01:00
  • 1ebdd45777 Add CI job Seb M'Caw 2025-10-21 16:54:00 +01:00
  • 84a7e7289d Check compacted and expanded datasets match Seb M'Caw 2025-10-21 16:46:23 +01:00
  • 7475e69c27 Merge branch 'mr/gnatprove-in-ci' into 'main' Sebastian M'Caw 2025-10-23 15:57:29 +01:00
  • 03f0735207 Remove test_prove_ci() Seb M'Caw 2025-10-23 13:19:32 +01:00
  • 1f035977cd Download binary from package registry in CI Seb M'Caw 2025-10-23 12:10:23 +01:00
  • fc8fcbcda3 Merge branch 'mr/improve-prove-eval' into 'main' Sebastian M'Caw 2025-10-21 09:57:51 +01:00
  • 3643792c3b Call gprbuild before gprls Seb M'Caw 2025-10-20 18:12:06 +01:00
  • 2178692973 Resolve macOS's /private symlinks when unpacking DirectoryContents Seb M'Caw 2025-10-20 17:14:25 +01:00
  • cea04a022e Self-review Seb M'Caw 2025-10-16 17:09:06 +01:00
  • a1516f44ef Remove old EvaluationStats format Seb M'Caw 2025-10-16 14:43:00 +01:00
  • 50de6e63b7 Update datasets to new EvaluationStats format Seb M'Caw 2025-10-16 13:48:46 +01:00
  • 1b63c12be4 Include missing ProofChecks in the EvaluationStats Seb M'Caw 2025-10-16 13:16:06 +01:00
  • dd108a234c Remove redundant test sample Seb M'Caw 2025-10-16 11:19:16 +01:00
  • 45a1393fe3 Fix src_pattern matching in subprojects. Seb M'Caw 2025-10-16 10:33:14 +01:00
  • a2467c978c Simplify delegated_fails sample Seb M'Caw 2025-10-16 10:14:20 +01:00
  • 78f05f3d38 Detect when SPARK_Mode is set to Off Seb M'Caw 2025-10-15 12:13:33 +01:00
  • 43041c4113 Generalise src_pattern matching Seb M'Caw 2025-10-14 18:03:40 +01:00
  • 37486bb0c6 Fix alignment Seb M'Caw 2025-10-13 17:21:52 +01:00
  • 67216d74da Add test for type_checked() Seb M'Caw 2025-10-13 17:20:00 +01:00
  • a7a397f38d Add test_index_from_line_and_col() Seb M'Caw 2025-10-03 16:49:43 +01:00
  • 154dfd1510 Update README Seb M'Caw 2025-10-03 16:40:26 +01:00
  • ff7cd0d0cd Add "required_checks" to loader tests Seb M'Caw 2025-10-03 16:34:41 +01:00
  • 3dbe2df1b3 Update datasets Seb M'Caw 2025-10-03 16:03:15 +01:00
  • f05bb560c1 Permit filtering "required_checks" by entity Seb M'Caw 2025-10-03 15:32:48 +01:00
  • 4b763b2113 Add "entities" to expected_spark_files Seb M'Caw 2025-10-03 14:29:56 +01:00
  • c991ce19b4 Call gnatprove without --limit-subp Seb M'Caw 2025-10-03 12:41:13 +01:00
  • e329908105 Fix test_prove_ci() Seb M'Caw 2025-10-03 11:47:55 +01:00
  • d056776522 Add delegated_fails sample Seb M'Caw 2025-10-02 17:19:05 +01:00
  • e828c9ea8b Add "required_checks" to README Seb M'Caw 2025-10-02 11:58:50 +01:00
  • 8d42535391 Run make pack-dataset Seb M'Caw 2025-10-02 10:17:21 +01:00