| Author | Commit | Message | Date |
|---|---|---|---|
| Rowan Walshe | cca2da835c | Merge branch 'topic/Add-ability-to-unpack-generated-and-evaluated-datasets' into 'main'; Make it possible to unpack generated and evaluated datasets; See merge request eng/ai/ada-eval!20 | 2026-01-29 13:54:45 +00:00 |
| Rowan Walshe | 9293f60bb2 | Apply 1 suggestion(s) to 1 file(s); Co-authored-by: Sebastian M'Caw <mcaw@adacore.com> | 2026-01-29 13:47:27 +00:00 |
| Rowan Walshe | 52a450c624 | Address review comments | 2026-01-28 17:28:46 +00:00 |
| Rowan Walshe | 80f1e39d1a | Apply 9 suggestion(s) to 2 file(s); Co-authored-by: Sebastian M'Caw <mcaw@adacore.com> | 2026-01-28 16:52:18 +00:00 |
| Rowan Walshe | fd3e42bf1c | Remove the proof_steps metric | 2026-01-28 10:32:37 +00:00 |
| Rowan Walshe | 88bd0c3f18 | Make it possible to unpack generated and evaluated datasets | 2026-01-27 17:53:00 +00:00 |
| Rowan Walshe | 80a1c293d1 | Migrate dataset file contents to base64 encoding | 2026-01-26 17:17:13 +00:00 |
| Sebastian M'Caw | 361a3c808f | Merge branch 'mr/report-cmd' into 'main'; Add `report` command; See merge request eng/ai/ada-eval!10 | 2025-11-20 16:34:48 +00:00 |
| Seb M'Caw | 005d8d4d06 | Fix mypy errors | 2025-11-20 13:44:55 +00:00 |
| Seb M'Caw | e555264b9b | Self-review | 2025-11-19 14:35:45 +00:00 |
| Seb M'Caw | 19cf9928f5 | Report proof step count relative to canonical | 2025-11-19 13:11:50 +00:00 |
| Seb M'Caw | f7ff60aa92 | Rename test_report_evaluation_results.py -> test_report.py | 2025-11-19 10:10:29 +00:00 |
| Seb M'Caw | d052851156 | Restore signature of report_evaluation_results() | 2025-11-19 10:08:23 +00:00 |
| Seb M'Caw | 3befcbf6cd | Simplify Metric constructors | 2025-11-18 16:16:27 +00:00 |
| Seb M'Caw | 8b44444510 | Consolidate section value into primary_metric | 2025-11-18 11:08:58 +00:00 |
| Seb M'Caw | 7bc5f63271 | Add unproved checks count | 2025-11-18 09:58:14 +00:00 |
| Seb M'Caw | 902a67c419 | Add tests | 2025-11-17 18:44:24 +00:00 |
| Seb M'Caw | 97cde2ea35 | Add basic report command | 2025-11-06 19:13:20 +00:00 |
| Seb M'Caw | 0d74ca4a3d | Re-organise comments | 2025-11-04 14:22:14 +00:00 |
| Seb M'Caw | 89cd53accf | Use conftest.py for fixtures | 2025-11-04 14:18:21 +00:00 |
| Seb M'Caw | 6c8621bed2 | Fix log message test | 2025-11-04 14:18:21 +00:00 |
| Seb M'Caw | 987fc8a56c | Self-review | 2025-10-30 11:12:18 +00:00 |
| Seb M'Caw | 89cc863b3a | Rationalise exceptions | 2025-10-29 10:59:18 +00:00 |
| Seb M'Caw | 7bac797bd9 | of dataset -> in dataset in exception messages | 2025-10-27 18:18:09 +00:00 |
| Seb M'Caw | 972f8721f9 | Rename subcommand to check-datasets | 2025-10-27 18:18:09 +00:00 |