52 Commits

Author SHA1 Message Date
Rowan Walshe
8f9cc4897a Start adding HumanEval problems 2026-02-09 11:32:15 +00:00
Rowan Walshe
fd3e42bf1c Remove the proof_steps metric 2026-01-28 10:32:37 +00:00
Rowan Walshe
891654ba98 Remove data/evaluated and data/generated 2026-01-28 10:32:12 +00:00
Rowan Walshe
80a1c293d1 Migrate dataset file contents to base64 encoding 2026-01-26 17:17:13 +00:00
Seb M'Caw
50de6e63b7 Update datasets to new EvaluationStats format 2025-10-16 13:48:46 +01:00
Seb M'Caw
3dbe2df1b3 Update datasets 2025-10-03 16:03:15 +01:00
Seb M'Caw
8d42535391 Run make pack-dataset 2025-10-02 10:17:21 +01:00
Seb M'Caw
8501aeb5c9 Add "required_checks" where appropriate 2025-09-26 17:36:25 +01:00
Seb M'Caw
4a9609b4d0 Remove redundant Loop_Invariant from char_count_2 2025-09-25 17:46:45 +01:00
Rowan Walshe
75fcb8902b Add simple unit tests 2025-09-25 08:33:47 +01:00
Seb M'Caw
0786cba6e5 Also fix parameter order warning in base files of swap_depends 2025-09-08 12:33:32 +01:00
Seb M'Caw
b3028e3b8f Update prompt for absolute_value to mention overflow warning 2025-09-08 12:26:08 +01:00
Seb M'Caw
9275c65f8f Run make generate-dummy and make evaluate 2025-09-02 14:28:12 +01:00
Seb M'Caw
dc588f9d9f Rename build stats attributes 2025-09-02 09:33:06 +01:00
Seb M'Caw
f0b5da214a Cleaner fix for swap_depends compiler warning 2025-08-27 09:32:45 +01:00
Seb M'Caw
e6f71a156e Reorder evals 2025-08-22 17:21:41 +01:00
Seb M'Caw
3f4e934b44 Fix serialisation of EvaluationStats.eval_name 2025-08-22 17:12:23 +01:00
Seb M'Caw
003e02d256 Shorten eval names 2025-08-22 16:15:00 +01:00
Seb M'Caw
b3a3e1bb89 Initial implementation of gprbuild eval 2025-08-22 13:29:46 +01:00
Seb M'Caw
817606e01f Rerun dummy generation/evaluation 2025-08-20 10:46:13 +01:00
Seb M'Caw
d6c8fbc997 Detect missing subprograms explicitly 2025-08-07 12:39:14 +01:00
Seb M'Caw
c6bd319a6a Restrict analysis with --limit-name 2025-08-07 11:42:31 +01:00
Seb M'Caw
4be79ccdb2 Fix absolute_value sample 2025-08-06 17:11:56 +01:00
Seb M'Caw
eb553d1042 Merge branch 'main' into mr/mcaw/implement-eval 2025-08-06 16:41:52 +01:00
Seb M'Caw
e7d2631607 Fix packing/unpacking with canonical_evaluation_results 2025-08-06 14:29:32 +01:00