BUG assigning file variable to field_tree in index
When _codes_index_add_file is invoked from codes_index_add_file, a new file structure variable (newfile) is created and this variable is added to the list of index files. However, later when the field_tree struct is generated, the file variable obtained from file pool (grib_file_open) is assigned instead. The ID of file variable and newfile variable may not match. Later, this can produce errors when index file is read.
To reproduce an error:
Download and uncompress tar file examples.tar.gz:
tar xvfz examples.tar.gz
Compile both programs:
gcc -leccodes readindex.c -o readindex
gcc -leccodes writeindex.c -o writeindex
First execute:
./writeindex
Finally execute:
./readindex
Thank you for your contribution. I tried your change locally and several tests failed:
130 - eccodes_f_grib_index (Failed)
160 - eccodes_p_grib_index_test (Failed)
173 - eccodes_p_high_level_api_test (Failed)
For example the Fortran test examples/F90/grib_index.sh fails with
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f393fbce94f in ???
#1 0x7f393fc159b4 in ???
#2 0x7f393fc0f3a9 in ???
#3 0x7f393fc04d5c in ???
#4 0x7f393fc0a924 in ???
#5 0x7f39420da584 in codes_index_get_handle
at /tmp/eccodes_develop/src/grib_index.c:1666
#6 0x7f39420dbcc4 in codes_new_from_index
at /tmp/eccodes_develop/src/grib_index.c:1926
#7 0x7f39420db3fd in grib_handle_new_from_index
at /tmp/eccodes_develop/src/grib_index.c:1857
#8 0x7f394295e0be in grib_f_new_from_index_
at /tmp/eccodes_develop/fortran/grib_fortran.c:1930
#9 0x7f39429727d2 in __grib_api_MOD_grib_new_from_index
at /tmp/cmake_build/build-eccodes/fortran/grib_f90.f90:787
#10 0x7f394297cb49 in __eccodes_MOD_codes_new_from_index
at /tmp/cmake_build/build-eccodes/fortran/eccodes_f90.f90:695
#11 0x401ca3 in index
at /tmp/eccodes_develop/examples/F90/grib_index.f90:84
#12 0x402156 in main
at /tmp/eccodes_develop/examples/F90/grib_index.f90:16
/tmp/eccodes_develop/examples/F90/grib_index.sh: line 14:
67663 Segmentation fault
(core dumped) ${examples_dir}/eccodes_f_grib_index > index_f90.out
And also I ran valgrind on the C example:
valgrind --log-file=vlog tests/grib_indexing data/index.grib
and many errors were issued. e.g.
==80468== Invalid read of size 4
==80468== at 0x69F18AC: fseek (in /lib64/libc-2.22.so)
==80468== by 0x53EF584: codes_index_get_handle (grib_index.c:1666)
==80468== by 0x53F0CC4: codes_new_from_index (grib_index.c:1926)
==80468== by 0x53F03FD: grib_handle_new_from_index (grib_index.c:1857)
==80468== by 0x401A63: main (grib_indexing.c:120)
==80468== Address 0x8964f90 is 0 bytes inside a block of size 552 free'd
==80468== at 0x4C2A55B: free (vg_replace_malloc.c:540)
==80468== by 0x69E99B8: fclose@@GLIBC_2.2.5 (in /lib64/libc-2.22.so)
==80468== by 0x5482981: grib_file_close (grib_filepool.c:330)
==80468== by 0x53EDCBF: _codes_index_add_file (grib_index.c:1253)
==80468== by 0x53EC250: grib_index_add_file (grib_index.c:1064)
==80468== by 0x4010F9: main (grib_indexing.c:45)
==80468== Block was alloc'd at
==80468== at 0x4C2932F: malloc (vg_replace_malloc.c:309)
==80468== by 0x69EA1DC: __fopen_internal (in /lib64/libc-2.22.so)
==80468== by 0x5482269: grib_file_open (grib_filepool.c:249)
==80468== by 0x53EC3C1: _codes_index_add_file (grib_index.c:1099)
==80468== by 0x53EC250: grib_index_add_file (grib_index.c:1064)
==80468== by 0x4010F9: main (grib_indexing.c:45)
It appears to me that this PR is fine but triggers a latent bug in the underlying code set. A file pointer gets duplicated by the assignment 'newfile->hanlde = file->handle' but I don't see any code that tries to avoid closing them individually. Once file->handle is closed, then newfile->handle is not valid any more. The file pointer should not be closed until all copies of the pointer are gone.
It appears to me that this PR is fine but triggers a latent bug in the underlying code set. A file pointer gets duplicated by the assignment 'newfile->hanlde = file->handle' but I don't see any code that tries to avoid closing them individually. Once file->handle is closed, then newfile->handle is not valid any more. The file pointer should not be closed until all copies of the pointer are gone.
Dear Shinji, Do you have a fix in mind? we would welcome your valued contribution
I'm thiking about adding 'grib_file * dependents' member to the grib_file structure and queue 'newfile' in the chain so that grib_file_close_file (and grib_file_delete file too?) can avoid closing the file pointer as long as some 'dependents' exist. I will post a PR if I manage to cook a patch that sees ok.