eccodes icon indicating copy to clipboard operation
eccodes copied to clipboard

BUG assigning file variable to field_tree in index

Open jsanchez-tiempo opened this issue 5 years ago • 5 comments

When _codes_index_add_file is invoked from codes_index_add_file, a new file structure variable (newfile) is created and this variable is added to the list of index files. However, later when the field_tree struct is generated, the file variable obtained from file pool (grib_file_open) is assigned instead. The ID of file variable and newfile variable may not match. Later, this can produce errors when index file is read.

To reproduce an error:

Download and uncompress tar file examples.tar.gz: tar xvfz examples.tar.gz

Compile both programs:

gcc -leccodes readindex.c -o readindex
gcc -leccodes writeindex.c -o writeindex

First execute: ./writeindex

Finally execute: ./readindex

jsanchez-tiempo avatar May 07 '20 14:05 jsanchez-tiempo

CLA assistant check
All committers have signed the CLA.

FussyDuck avatar May 07 '20 14:05 FussyDuck

Thank you for your contribution. I tried your change locally and several tests failed:

        130 - eccodes_f_grib_index (Failed)
        160 - eccodes_p_grib_index_test (Failed)
        173 - eccodes_p_high_level_api_test (Failed)

For example the Fortran test examples/F90/grib_index.sh fails with

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0  0x7f393fbce94f in ???
#1  0x7f393fc159b4 in ???
#2  0x7f393fc0f3a9 in ???
#3  0x7f393fc04d5c in ???
#4  0x7f393fc0a924 in ???
#5  0x7f39420da584 in codes_index_get_handle
        at /tmp/eccodes_develop/src/grib_index.c:1666
#6  0x7f39420dbcc4 in codes_new_from_index
        at /tmp/eccodes_develop/src/grib_index.c:1926
#7  0x7f39420db3fd in grib_handle_new_from_index
        at /tmp/eccodes_develop/src/grib_index.c:1857
#8  0x7f394295e0be in grib_f_new_from_index_
        at /tmp/eccodes_develop/fortran/grib_fortran.c:1930
#9  0x7f39429727d2 in __grib_api_MOD_grib_new_from_index
        at /tmp/cmake_build/build-eccodes/fortran/grib_f90.f90:787
#10  0x7f394297cb49 in __eccodes_MOD_codes_new_from_index
        at /tmp/cmake_build/build-eccodes/fortran/eccodes_f90.f90:695
#11  0x401ca3 in index
        at /tmp/eccodes_develop/examples/F90/grib_index.f90:84
#12  0x402156 in main
        at /tmp/eccodes_develop/examples/F90/grib_index.f90:16
/tmp/eccodes_develop/examples/F90/grib_index.sh: line 14:
67663 Segmentation fault
(core dumped) ${examples_dir}/eccodes_f_grib_index > index_f90.out

And also I ran valgrind on the C example:

valgrind --log-file=vlog tests/grib_indexing data/index.grib

and many errors were issued. e.g.

==80468== Invalid read of size 4
==80468==    at 0x69F18AC: fseek (in /lib64/libc-2.22.so)
==80468==    by 0x53EF584: codes_index_get_handle (grib_index.c:1666)
==80468==    by 0x53F0CC4: codes_new_from_index (grib_index.c:1926)
==80468==    by 0x53F03FD: grib_handle_new_from_index (grib_index.c:1857)
==80468==    by 0x401A63: main (grib_indexing.c:120)
==80468==  Address 0x8964f90 is 0 bytes inside a block of size 552 free'd
==80468==    at 0x4C2A55B: free (vg_replace_malloc.c:540)
==80468==    by 0x69E99B8: fclose@@GLIBC_2.2.5 (in /lib64/libc-2.22.so)
==80468==    by 0x5482981: grib_file_close (grib_filepool.c:330)
==80468==    by 0x53EDCBF: _codes_index_add_file (grib_index.c:1253)
==80468==    by 0x53EC250: grib_index_add_file (grib_index.c:1064)
==80468==    by 0x4010F9: main (grib_indexing.c:45)
==80468==  Block was alloc'd at
==80468==    at 0x4C2932F: malloc (vg_replace_malloc.c:309)
==80468==    by 0x69EA1DC: __fopen_internal (in /lib64/libc-2.22.so)
==80468==    by 0x5482269: grib_file_open (grib_filepool.c:249)
==80468==    by 0x53EC3C1: _codes_index_add_file (grib_index.c:1099)
==80468==    by 0x53EC250: grib_index_add_file (grib_index.c:1064)
==80468==    by 0x4010F9: main (grib_indexing.c:45)

shahramn avatar Jul 18 '20 16:07 shahramn

It appears to me that this PR is fine but triggers a latent bug in the underlying code set. A file pointer gets duplicated by the assignment 'newfile->hanlde = file->handle' but I don't see any code that tries to avoid closing them individually. Once file->handle is closed, then newfile->handle is not valid any more. The file pointer should not be closed until all copies of the pointer are gone.

shinji-s avatar Aug 06 '20 09:08 shinji-s

It appears to me that this PR is fine but triggers a latent bug in the underlying code set. A file pointer gets duplicated by the assignment 'newfile->hanlde = file->handle' but I don't see any code that tries to avoid closing them individually. Once file->handle is closed, then newfile->handle is not valid any more. The file pointer should not be closed until all copies of the pointer are gone.

Dear Shinji, Do you have a fix in mind? we would welcome your valued contribution

shahramn avatar Aug 06 '20 10:08 shahramn

I'm thiking about adding 'grib_file * dependents' member to the grib_file structure and queue 'newfile' in the chain so that grib_file_close_file (and grib_file_delete file too?) can avoid closing the file pointer as long as some 'dependents' exist. I will post a PR if I manage to cook a patch that sees ok.

shinji-s avatar Aug 06 '20 10:08 shinji-s