Implement Precompiled Shader Archive
When using the GLES2 backend, every Fixed Pipeline function (so OpenGL 1.x) can lead to the creation of a new shader program. Some can take a bit of time to compile and link (for example when a lot of lights are involved), causing some "hiccups" in a game. That long loading time can be seen when launching Foobillard++ or Neverball, for example.
Because the Fixed Pipeline Emulator always generates the same shader programs, those could be saved for later use: by creating an archive containing previously built FPE programs, and using the GL_OES_get_program_binary extension to save/load the program binaries and skip the compile and link steps.
The PSA will be available only if the extension is present, and if it supports at least one format for binary programs.
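For reference, a minimal sketch of that availability check, assuming a current GLES2 context and plain GLES2 headers (an illustration only, not the actual gl4es init code):

#include <string.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

// Returns 1 if precompiled program binaries can be used on this driver.
static int psa_supported(void)
{
    const char *exts = (const char *)glGetString(GL_EXTENSIONS);
    if (!exts || !strstr(exts, "GL_OES_get_program_binary"))
        return 0;                          // extension not advertised
    GLint num_formats = 0;
    glGetIntegerv(GL_NUM_PROGRAM_BINARY_FORMATS_OES, &num_formats);
    return num_formats > 0;                // need at least one binary format
}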
On Linux, the archive will be saved in the HOME folder, as a hidden file (named .gl4es.psa).
On AmigaOS4, it will be in PROGDIR: as a hidden file (same name as on Linux).
TODO: Where to put the archive on Android.
TODO: Where to put the archive on Emscripten.
Implemented in commit 601184b1b7e63cd1f85b5360207bb763efa8fe82 and fixed with 5a189a1f30ae73a4849a2e4f3f18259e6484ec55. Disabled by default for now.
While Daniel is working on adding the necessary functions to the AmigaOS4 driver, I still tested how it all behaves currently by setting LIBGL_NOPSA to 0.
So, I just ran Neverball and exited. As a result, it created a .gl4es.psa file in the root of the game (good!), which is 43 bytes in size (so just a header saying GL4ES PrecompiledShaderArchive plus some bytes of structure format or something). When I exit the game I also get at the end "LIBGL: Saved a PSA with 0 Precompiled Programs" (which of course is expected, as the functions are not implemented yet).
Though, shouldn't there be an error or a log message when running, something like "LIBGL: Forcing to use PSA, but functions didn't work", or something? Or does that make no sense?
Mmm, yes, the archive should not be created. I'll check that !
@kas1e : with commit 37d3b629580f2289703d5ee83f701c38cabf08ef it should now ignore the PSA if the extension is not present (you will have to delete the previous empty one yourself).
Btw, did you test the PSA on the Pandora already? I.e. with Friking Shark while playing, and with the loading of Neverball for example. It would be interesting to know if everything works as expected now on your side with the Pandora's GLES2.
I tested quickly on the Pandora, and it seems promising yes. Starting time for Neverball is much faster now. I need to do more testing, but that looks good (I haven't tried Friking Shark yet).
Does the Pandora's GLES2 driver also need to assemble those precompiled shaders from SPIR-V, or are they already machine code (and not SPIR-V) saved in the .gl4es.psa file?
No SPIR-V on the Pandora, it's already machine code specific to the PowerVR it uses.
So if we take into account that Hans will not do it in Nova, then maybe asking Daniel to save the PSA not as SPIR-V but as Nova's assembly code would make sense... (if Nova exposes such conversion functions publicly).
yep
But note that I don't think it's possible to access the Nova-compiled program shader if Hans doesn't write the equivalent of glProgramBinary(...) and glGetProgramBinary(...) into Nova. That's why Daniel proposed to do it the SPIR-V way.
But imho the problem with those "micro-pause hiccups" happens because things need to be converted from one format into card-specific code in realtime, and with precompiled shaders (if they are saved as machine code) there will be no need to convert anything, just send them when needed. So, imho, if only OGLES2 has those functions but the shaders are stored as machine code, then everything should be fine already, no? :)
But there is another issue: machine-specific code will probably also be specific to different gfx cards. In the case of the Pandora, as I understand it, there is just one single gfx card all the time, so precompiled shaders on the Pandora will always work on all Pandoras, while in the case of AmigaOS4 (or any other desktop OS where we can change cards) we have different gfx cards, which means different machine code, which means that in the end we can't save machine-specific code, right? And on the Pandora it works because there is one single card everywhere?
On the other hand, it really doesn't matter if it's different for different cards: it only means that we can't ship a release with an included .gl4es.psa, but the one generated on the user's machine will then be used later with no problems; the only drawback is that the first time will have to be played with hiccups...
Yes, the .gl4es.psa is supposed to be built on the target computer, not shipped with the app. That means the 1st run will still get the hiccups, but later runs will not. It's the same on the Pandora: there are 3 different models with slightly different hardware, and there are a lot of driver versions the user can choose from, so I don't plan to put any pre-baked gl4es.psa in games; it will be built by itself.
Aha, got it. So let's wait and see if the implementation Daniel is doing with SPIR-V on top of Nova gives enough of a boost or not. If it's enough, then our version will even be portable across different gfx cards, but if not, then we will probably need to ask Hans to provide a public API to generate machine-ready code.
Hi :)
So Daniel sent me the first version with his implementation of the necessary functions; this is what the readme says (a rough usage sketch of these new calls follows right after the list):
- Implemented the OES_get_program_binary extension as requested by Kasie Kasovich (kas1e). Because Nova doesn't provide any support here, this implementation works with the internal SPIR-V code of all shaders that make up the respective program. So the binaries here are not true machine code but intermediate code, packed into a proprietary format. Providing such binaries essentially skips the vertex and fragment shader's GLSL compilation steps. Anyway, this means:
- new function GetProgramBinaryOES to extract such a binary representation of the respective successfully linked shader program.
- new function ProgramBinaryOES to supply the GL with such a cached binary.
- support for the glGet parameter GL_NUM_PROGRAM_BINARY_FORMATS_OES - which returns 1 now.
- support for the glGet parameter GL_PROGRAM_BINARY_FORMATS_OES, which returns one single format ID identifying my binary format (0xC00DA675)
- support for the glGetProgramiv parameter GL_PROGRAM_BINARY_LENGTH_OES which gives you the buffer size required to store the binary for the respective program.
- consequently, added GL_OES_get_program_binary to the extension string.
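For context, this is roughly how a client would use the calls listed above to extract a binary once a program has linked (a generic OES_get_program_binary usage sketch, assuming the extension entry points are resolved by the GL library; it is not gl4es code):

#define GL_GLEXT_PROTOTYPES 1   /* expose the OES prototypes from gl2ext.h */
#include <stdlib.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

// Returns a malloc'ed blob with the program binary (or NULL), and reports
// its size and format id (here that would be Daniel's 0xC00DA675 format).
static void *save_program_binary(GLuint prog, GLenum *format, GLsizei *length)
{
    GLint needed = 0;
    glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH_OES, &needed);
    if (needed <= 0)
        return NULL;
    void *blob = malloc(needed);
    glGetProgramBinaryOES(prog, needed, length, format, blob);
    return blob;   // the caller can write this into the .gl4es.psa archive
}

Loading it back later is the reverse: create a program object, hand the blob to glProgramBinaryOES(prog, format, blob, length), then check GL_LINK_STATUS.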
So, I ran Neverball with the new binary, and gl4es tells me in the output:
LIBGL: Extension GL_OES_get_program detected
LIBGL: Number of supported Program Binary Format: 1
And nothing more related to it.
So, after I exit Neverball, I don't have any .gl4es.psa file created. I ran a DOS tracer (to see if it tries to create that file) and nope, the file isn't even attempted to be created.
Any clue what to check next? :)
Do you see "LIBGL: Shuting down" at the end of the program at least?
The function close_gl4es() from src/gl/init.c, at the end of the file and declared as a "destructor", should print it.
Yeah, of course I see that one; I just quoted only the info relevant to the PSA, but here is the full log (SuperTuxKart 0.6.2a, latest gl4es):
4/0.Work:games/supertuxkart> supertuxkart_gl4es_1916
LIBGL: Initialising gl4es
LIBGL: v1.1.1 built on Jul 22 2019 23:35:10
LIBGL: Using GLES 2.0 backend
LIBGL: Using Warp3DNova.library v1 revision 65
LIBGL: Using OGLES2.library v2 revision 9
LIBGL: OGLES2 Library and Interface open successfuly
LIBGL: Targeting OpenGL 2.0
LIBGL: Forcing NPOT support by disabling MIPMAP support for NPOT textures
LIBGL: Not trying to batch small subsequent glDrawXXXX
LIBGL: Current folder is:/Work/games/supertuxkart
Data files will be fetched from: '.'
Highscores will be saved in './.supertuxkart/highscore.data'.
LIBGL: Hardware test on current Context...
LIBGL: Hardware Limited NPOT detected and used
LIBGL: Extension GL_EXT_blend_minmax detected and used
LIBGL: FBO are in core, and so used
LIBGL: PointSprite are in core, and so used
LIBGL: CubeMap are in core, and so used
LIBGL: BlendColor is in core, and so used
LIBGL: Blend Substract is in core, and so used
LIBGL: Blend Function and Equation Separation is in core, and so used
LIBGL: Texture Mirrored Repeat is in core, and so used
LIBGL: Extension GL_OES_mapbuffer detected
LIBGL: Extension GL_OES_element_index_uint detected and used
LIBGL: Extension GL_OES_packed_depth_stencil detected and used
LIBGL: Extension GL_EXT_texture_format_BGRA8888 detected and used
LIBGL: Extension GL_OES_texture_float detected and used
LIBGL: high precision float in fragment shader available and used
LIBGL: Extension GL_EXT_frag_depth detected and used
LIBGL: Max vertex attrib: 16
LIBGL: Extension GL_OES_get_program detected
LIBGL: Number of supported Program Binary Format: 1
LIBGL: Max texture size: 16384
LIBGL: Max Varying Vector: 32
LIBGL: Texture Units: 8(8), Max lights: 8, Max planes: 6
LIBGL: Extension GL_EXT_texture_filter_anisotropic detected and used
LIBGL: Max Anisotropic filtering: 16
LIBGL: Hardware vendor is A-EON Technology Ltd. Written by Daniel 'Daytona675x' Müßener @ GoldenCode.eu
LIBGL: OGLES2 Library and Interface closed
LIBGL: Shuting down
I also asked Daniel to recheck the new stuff; he says that glGetProgramiv with GL_PROGRAM_BINARY_LENGTH_OES works, and glGetProgramBinaryOES apparently works too.
EDIT: and glProgramBinaryOES works too
Btw, measuring things with some test examples on OS4 for now, we found that of course all the cached binaries should be preloaded right at startup, because loading binaries from disk during play can slow things down just like before.
But I assume that with gl4es, at the beginning it scans for .gl4es.psa, and if it is found and the functions/extensions work, it then preloads the binaries into memory, right?
Check in https://github.com/ptitSeb/gl4es/blob/master/src/gl/init.c#L569 if the name of the PSA is correct, using some printf
Yes. Loading of the PSA file is done at init of gl4es. Then, when an fpe shader needs to be created, it first checks the PSA archive (in memory) and creates the shader directly from the binary if present. If not present, the shader is created in the traditional way and is added to the in-memory PSA (which is then flagged "dirty"). At shutdown of gl4es, the PSA archive is written back to disk if it's flagged dirty.
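To make that flow concrete, here is a small sketch of the lookup path. psa_find and psa_add are hypothetical placeholders (not the real gl4es functions); the in-memory archive and the "dirty" flag would live behind them:

#define GL_GLEXT_PROTOTYPES 1   /* expose glProgramBinaryOES from gl2ext.h */
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

// Hypothetical in-memory PSA helpers (placeholders, not gl4es' actual API):
//   psa_find - look up a cached binary for this fpe state key
//   psa_add  - store a freshly linked program's binary and mark the PSA dirty
extern const void *psa_find(unsigned long key, GLenum *format, GLint *length);
extern void        psa_add(unsigned long key, GLuint program);

// Get the program for one fpe state: restore it from the PSA if cached,
// otherwise build it the traditional way and remember it for next time.
static GLuint fpe_get_or_build_program(unsigned long key)
{
    GLenum fmt; GLint len;
    const void *bin = psa_find(key, &fmt, &len);

    GLuint prog = glCreateProgram();
    if (bin) {
        glProgramBinaryOES(prog, fmt, bin, len);   // cache hit: no compile/link
    } else {
        // cache miss: compile vertex/fragment shaders, attach, glLinkProgram(prog)...
        psa_add(key, prog);   // ...then stash its binary (via glGetProgramBinaryOES)
    }
    return prog;
}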
Something is wrong with the whole "if" starting from the if(hardext.prgbin_n>0) { line. Maybe there's a missing } or something somewhere, because none of the printfs in there are reached.
As I see it for now, all the file-creation stuff is inside the if(globals4es.nopsa==0) { } block, which is probably not what it should be?
Look closely: it's nopsa, so it should be 0 to use the PSA.
This is how it looks for me now:
if (getcwd(cwd, sizeof(cwd))!= NULL)
    SHUT(LOGD("LIBGL: Current folder is:%s\n", cwd));
printf("hardext.prgbin_n = %d\n", hardext.prgbin_n);
printf("uuuuuuuuuuuuuu\n");
SHUT(LOGD("OMEGA: 2232323232323ooooo\n"));
if(hardext.prgbin_n>0) {
    env(LIBGL_NOPSA, globals4es.nopsa, "Don't use PrecompiledShaderArchive");
    printf("eeeeeeeeee\n");
    if(globals4es.nopsa==0) {
        printf("erewrerrrrrr\n");
        cwd[0]='\0';
        // TODO: What to do on ANDROID and EMSCRIPTEN?
#ifdef __linux__
        const char* home = getenv("HOME");
        if(home)
            strcpy(cwd, home);
        if(cwd[strlen(cwd)]!='/')
            strcat(cwd, "/");
#elif defined AMIGAOS4
        strcpy(cwd, "PROGDIR:");
#endif
        SHUT(LOGD("OMEGA: ooooooooooooo\n"));
        if(strlen(cwd)) {
            SHUT(LOGD("OMEGA: bbbbbbbbbbbbbb\n"));
            strcat(cwd, ".gl4es.psa");
            printf("cwd = %s\n", cwd);
            fpe_InitPSA(cwd);
            fpe_readPSA();
        }
    }
}
}
And in the log I can see:
hardext.prgbin_n = 0
uuuuuuuuuuuuuu
OMEGA: 2232323232323ooooo
and no other printfs.
Mmmm, wait, on AmigaOS there are no pre-tests, so prgbin_n is 0 by default. What is the version of GLES2 that has this extension available?
The one where those new functions were added? 2.9.
Or do you mean the GLES2 standard?
Yes, that one, the driver version where the new functions were added. Because for now I'll probably need to hardcode prgbin_n = 1 on AMIGA if the OGLES2 driver is >= 2.9 (until I find a way to create an offscreen context on Amiga so I can run some tests at the start of gl4es).
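Something like this, presumably (a sketch only; the version/revision parameters are assumptions, and hardext.prgbin_n is the field gl4es checks at init):

// Sketch of the proposed workaround, not actual gl4es code: without an
// offscreen pre-test context on AmigaOS4, just trust that OGLES2 >= 2.9
// exposes GL_OES_get_program_binary with one binary format.
static int amiga_prgbin_formats(int lib_version, int lib_revision)
{
    if (lib_version > 2 || (lib_version == 2 && lib_revision >= 9))
        return 1;   // value that would go into hardext.prgbin_n
    return 0;       // older driver: keep the PSA disabled
}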
Yeah, let's do it that way. Or you could simply assume that OGLES2 2.9 is the minimum for gl4es at all.
Mmm, wait, in src/agl/agl.c I have:
#define MIN_OGLES2_LIB_VERSION 1
#define MIN_OGLES2_LIB_REVISION 22
So, 1.9 doesn't work...