The code now uses direct pointer access from C code to write the colors to the destination texture instead of trying to force them back up into an __m128i and a single write call. This is what produces the major speed-up.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6761 8ced0084-cf51-0410-be5f-012b33b47a6e
GX_TF_RGBA8 texture decoder optimized with SSE2 producing a ~68% speed increase over reference C implementation.
TABified the entire document per NeoBrainX. :)
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6706 8ced0084-cf51-0410-be5f-012b33b47a6e
~25% speed improvement in decoding GX_TF_RGB5A3 textures which aren't used very much. I thought it would help for movie playback but I misled myself. Video playback has nothing to do with this texture format.
Next I'll see if I can knock out some of these other texture decoders. Byte swizzling I'm sure can somehow be accomplished using _mm_unpacklo_epi8 trickery, so that'd be another big win I hope.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6698 8ced0084-cf51-0410-be5f-012b33b47a6e
TextureDecoder.cpp: merged two functionally identical decode5A3RGBA and decode5A3rgba methods.
OpcodeDecoding.cpp and DLCache.cpp: optimization for GX_LOAD_XF_REG. The PSUHFB solution sounds better for SSSE3, but this is a small win for the default case.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6692 8ced0084-cf51-0410-be5f-012b33b47a6e
use plain vertex arrays instead of VBOs to render in Opengl plugin as the nature of the data make VBOs slower. This must bring, depending on the implementation, a good speedup in opengl.
in my system now opengl and d3d9 have a difference of 1 to 5 fps depending of the game.
some cleanup and a little work pointing to future improvements in the way of rendering.
please test and check for any errors.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6139 8ced0084-cf51-0410-be5f-012b33b47a6e
The GCC model for extended instructions like these is that you compile
with -msse3 etc. These affect code generation for whole compilation units,
so the idea is that you have a separate .c file for each instruction set
class and then indirect to the desired one at runtime.
Without e.g. -msse4.1, the GCC built-ins used by <foointrin.h> are not
available. However, in our specific case of compiling with -msse2 and
wanting to use SSE3.1 code, enough built-ins are available that we only
need to provide a little hack for pshufb.
Upgrading this to also use SSE4.1 instructions doesn't appear feasible
without a lot of undesirable duplication of GCC built-in functions and
headers, so we'd probably have to move to the GCC model of separate
source files for that.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@6014 8ced0084-cf51-0410-be5f-012b33b47a6e
Fix the decoder codepath when OpenCL is enabled and the DX11 plugin is used.
Added the DX11 plugin to the Dolphin project dependencies.
git-svn-id: https://dolphin-emu.googlecode.com/svn/trunk@5764 8ced0084-cf51-0410-be5f-012b33b47a6e