Rowan Walshe | 8f9cc4897a | Start adding HumanEval problems | 2026-02-09 11:32:15 +00:00
Rowan Walshe | fd3e42bf1c | Remove the proof_steps metric | 2026-01-28 10:32:37 +00:00
Rowan Walshe | 80a1c293d1 | Migrate dataset file contents to base64 encoding | 2026-01-26 17:17:13 +00:00
Seb M'Caw | 50de6e63b7 | Update datasets to new EvaluationStats format | 2025-10-16 13:48:46 +01:00
Seb M'Caw | 3dbe2df1b3 | Update datasets | 2025-10-03 16:03:15 +01:00
Seb M'Caw | 8d42535391 | Run make pack-dataset | 2025-10-02 10:17:21 +01:00
Seb M'Caw | 8501aeb5c9 | Add "required_checks" where appropriate | 2025-09-26 17:36:25 +01:00
Seb M'Caw | 4a9609b4d0 | Remove redundant Loop_Invariant from char_count_2 | 2025-09-25 17:46:45 +01:00
Rowan Walshe | 75fcb8902b | Add simple unit tests | 2025-09-25 08:33:47 +01:00
Seb M'Caw | 0786cba6e5 | Also fix parameter order warning in base files of swap_depends | 2025-09-08 12:33:32 +01:00
Seb M'Caw | b3028e3b8f | Update prompt for absolute_value to mention overflow warning | 2025-09-08 12:26:08 +01:00
Seb M'Caw | dc588f9d9f | Rename build stats attributes | 2025-09-02 09:33:06 +01:00
Seb M'Caw | f0b5da214a | Cleaner fix for swap_depends compiler warning | 2025-08-27 09:32:45 +01:00
Seb M'Caw | e6f71a156e | Reorder evals | 2025-08-22 17:21:41 +01:00
Seb M'Caw | 3f4e934b44 | Fix serialisation of EvaluationStats.eval_name | 2025-08-22 17:12:23 +01:00
Seb M'Caw | 003e02d256 | Shorten eval names | 2025-08-22 16:15:00 +01:00
Seb M'Caw | b3a3e1bb89 | Initial implementation of gprbuild eval | 2025-08-22 13:29:46 +01:00
Seb M'Caw | d6c8fbc997 | Detect missing subprograms explicitly | 2025-08-07 12:39:14 +01:00
Seb M'Caw | c6bd319a6a | Restrict analysis with --limit-name | 2025-08-07 11:42:31 +01:00
Seb M'Caw | 4be79ccdb2 | Fix absolute_value sample | 2025-08-06 17:11:56 +01:00
Seb M'Caw | eb553d1042 | Merge branch 'main' into mr/mcaw/implement-eval | 2025-08-06 16:41:52 +01:00
Seb M'Caw | e7d2631607 | Fix packing/unpacking with canonical_evaluation_results | 2025-08-06 14:29:32 +01:00
Seb M'Caw | 7849926649 | Record only timeouts in EvaluationStats | 2025-08-06 11:37:09 +01:00
Seb M'Caw | 45913654d0 | Remove redundant nesting from serialised DirectoryContents | 2025-08-06 10:18:22 +01:00
Rowan Walshe | 48669d0c68 | Include more problems | 2025-07-29 14:10:46 +01:00