Add a new public PyObject_CallNoArgs() function to the C API: call a
callable Python object without any arguments.
It is the most efficient way to call a callback without any argument.
On x86-64, for example, PyObject_CallFunctionObjArgs(func, NULL)
allocates 960 bytes on the stack per call, whereas
PyObject_CallNoArgs(func) only allocates 624 bytes per call.
It is excluded from stable ABI 3.8.
Replace private _PyObject_CallNoArg() with public
PyObject_CallNoArgs() in C extensions: _asyncio, _datetime,
_elementtree, _pickle, _tkinter and readline.
GH-14039: allow (no more than) one wholly empty arena on the usable_arenas list.
This prevents thrashing in some easily-provoked simple cases that could end up creating and destroying an arena on each loop iteration in client code. Intuitively, if the only arena on the list becomes empty, it makes scant sense to give it back to the system unless we know we'll never need another free pool again before another arena frees a pool. If the latter obtains, then - yes - this will "waste" an arena.
When inheriting a heap subclass from a vectorcall class that sets
`.tp_call=PyVectorcall_Call` (as recommended in PEP 590), the subclass does
not inherit `_Py_TPFLAGS_HAVE_VECTORCALL`, and thus `PyVectorcall_Call` does
not work for it.
This attempts to solve the issue by:
* always inheriting `tp_vectorcall_offset` unless `tp_call` is overridden
in the subclass
* inheriting _Py_TPFLAGS_HAVE_VECTORCALL for static types, unless `tp_call`
is overridden
* making `PyVectorcall_Call` ignore `_Py_TPFLAGS_HAVE_VECTORCALL`
This means it'll be ever more important to only call `PyVectorcall_Call`
on classes that support vectorcall. In `PyVectorcall_Call`'s intended role
as `tp_call` filler, that's not a problem.