115 Commits

Author SHA1 Message Date
Rowan Walshe
cca2da835c Merge branch 'topic/Add-ability-to-unpack-generated-and-evaluated-datasets' into 'main'
Make it possible to unpack generated and evaluated datasets

See merge request eng/ai/ada-eval!20
2026-01-29 13:54:45 +00:00
Rowan Walshe
9293f60bb2 Apply 1 suggestion(s) to 1 file(s)
Co-authored-by: Sebastian M'Caw <mcaw@adacore.com>
2026-01-29 13:47:27 +00:00
Rowan Walshe
52a450c624 Address review comments 2026-01-28 17:28:46 +00:00
Rowan Walshe
80f1e39d1a Apply 9 suggestion(s) to 2 file(s)
Co-authored-by: Sebastian M'Caw <mcaw@adacore.com>
2026-01-28 16:52:18 +00:00
Rowan Walshe
fd3e42bf1c Remove the proof_steps metric 2026-01-28 10:32:37 +00:00
Rowan Walshe
88bd0c3f18 Make it possible to unpack generated and evaluated datasets 2026-01-27 17:53:00 +00:00
Rowan Walshe
80a1c293d1 Migrate dataset file contents to base64 encoding 2026-01-26 17:17:13 +00:00
Sebastian M'Caw
361a3c808f Merge branch 'mr/report-cmd' into 'main'
Add `report` command

See merge request eng/ai/ada-eval!10
2025-11-20 16:34:48 +00:00
Seb M'Caw
005d8d4d06 Fix mypy errors 2025-11-20 13:44:55 +00:00
Seb M'Caw
e555264b9b Self-review 2025-11-19 14:35:45 +00:00
Seb M'Caw
19cf9928f5 Report proof step count relative to canonical 2025-11-19 13:11:50 +00:00
Seb M'Caw
f7ff60aa92 Rename test_report_evaluation_results.py -> test_report.py 2025-11-19 10:10:29 +00:00
Seb M'Caw
d052851156 Restore signature of report_evaluation_results() 2025-11-19 10:08:23 +00:00
Seb M'Caw
3befcbf6cd Simplify Metric constructors 2025-11-18 16:16:27 +00:00
Seb M'Caw
8b44444510 Consolidate section value into primary_metric 2025-11-18 11:08:58 +00:00
Seb M'Caw
7bc5f63271 Add unproved checks count 2025-11-18 09:58:14 +00:00
Seb M'Caw
902a67c419 Add tests 2025-11-17 18:44:24 +00:00
Seb M'Caw
97cde2ea35 Add basic report command 2025-11-06 19:13:20 +00:00
Seb M'Caw
0d74ca4a3d Re-organise comments 2025-11-04 14:22:14 +00:00
Seb M'Caw
89cd53accf Use conftest.py for fixtures 2025-11-04 14:18:21 +00:00
Seb M'Caw
6c8621bed2 Fix log message test 2025-11-04 14:18:21 +00:00
Seb M'Caw
987fc8a56c Self-review 2025-10-30 11:12:18 +00:00
Seb M'Caw
89cc863b3a Rationalise exceptions 2025-10-29 10:59:18 +00:00
Seb M'Caw
7bac797bd9 of dataset -> in dataset in exception messages 2025-10-27 18:18:09 +00:00
Seb M'Caw
972f8721f9 Rename subcommand to check-datasets 2025-10-27 18:18:09 +00:00