26 Commits

Author        SHA1        Message                                                         Date
Rowan Walshe  8f9cc4897a  Start adding HumanEval problems                                 2026-02-09 11:32:15 +00:00
Rowan Walshe  fd3e42bf1c  Remove the proof_steps metric                                   2026-01-28 10:32:37 +00:00
Seb M'Caw     50de6e63b7  Update datasets to new EvaluationStats format                   2025-10-16 13:48:46 +01:00
Seb M'Caw     3dbe2df1b3  Update datasets                                                 2025-10-03 16:03:15 +01:00
Seb M'Caw     8501aeb5c9  Add "required_checks" where appropriate                         2025-09-26 17:36:25 +01:00
Seb M'Caw     4a9609b4d0  Remove redundant Loop_Invariant from char_count_2               2025-09-25 17:46:45 +01:00
Rowan Walshe  75fcb8902b  Add simple unit tests                                           2025-09-25 08:33:47 +01:00
Seb M'Caw     0786cba6e5  Also fix parameter order warning in base files of swap_depends  2025-09-08 12:33:32 +01:00
Seb M'Caw     b3028e3b8f  Update prompt for absolute_value to mention overflow warning    2025-09-08 12:26:08 +01:00
Seb M'Caw     dc588f9d9f  Rename build stats attributes                                   2025-09-02 09:33:06 +01:00
Seb M'Caw     f0b5da214a  Cleaner fix for swap_depends compiler warning                   2025-08-27 09:32:45 +01:00
Seb M'Caw     e6f71a156e  Reorder evals                                                   2025-08-22 17:21:41 +01:00
Seb M'Caw     3f4e934b44  Fix serialisation of EvaluationStats.eval_name                  2025-08-22 17:12:23 +01:00
Seb M'Caw     003e02d256  Shorten eval names                                              2025-08-22 16:15:00 +01:00
Seb M'Caw     b3a3e1bb89  Initial implementation of gprbuild eval                         2025-08-22 13:29:46 +01:00
Seb M'Caw     d6c8fbc997  Detect missing subprograms explicitly                           2025-08-07 12:39:14 +01:00
Seb M'Caw     c6bd319a6a  Restrict analysis with --limit-name                             2025-08-07 11:42:31 +01:00
Seb M'Caw     4be79ccdb2  Fix absolute_value sample                                       2025-08-06 17:11:56 +01:00
Seb M'Caw     eb553d1042  Merge branch 'main' into mr/mcaw/implement-eval                 2025-08-06 16:41:52 +01:00
Seb M'Caw     e7d2631607  Fix packing/unpacking with canonical_evaluation_results         2025-08-06 14:29:32 +01:00
Rowan Walshe  48669d0c68  Include more problems                                           2025-07-29 14:10:46 +01:00
Rowan Walshe  54e2a00d50  Simplify dataset                                                2025-07-23 18:38:38 +01:00
Rowan Walshe  eee5de308f  Add two more problems to the dataset                            2025-07-23 17:48:25 +01:00
Rowan Walshe  46f6226c18  Add prompts for ineffective statement examples                  2025-07-23 16:16:00 +01:00
Rowan Walshe  0559165777  Update dataset formatting                                       2025-07-23 16:13:28 +01:00