GC process reliability
i set up a simple test: take a filesystem image with no deleted files, delete a bunch of files, call SPIFFS_gc, and simulate power loss at each successive write. at some point, when the filesystem is later remounted, one of the files that was not touched (or, rather, should not have been) becomes unreadable.
this is a very critical problem, since devices tend to be reset at random times and GC occurrence is unpredictable. a sketch of the harness follows, then the full spls/sptest output.
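the harness does roughly this (a sketch, not the actual sptest code - the helpers near the bottom are made up, only the SPIFFS calls, types and error codes are real):

    /* in-memory flash image with a write hook that fails at write #N,
     * simulating power loss at that exact point */
    #include "spiffs.h"

    static spiffs fs;
    static u8_t image[128 * 1024]; /* loaded from 0xdb000.bin */
    static int write_count = 0;
    static int fail_at = -1;       /* write # at which to simulate power loss */

    static s32_t flash_write(u32_t addr, u32_t size, u8_t *src) {
      if (fail_at >= 0 && write_count++ == fail_at) {
        return SPIFFS_ERR_INTERNAL; /* the write never reaches "flash" */
      }
      /* NOR flash semantics: bits only go from 1 to 0 */
      for (u32_t i = 0; i < size; i++) image[addr + i] &= src[i];
      return SPIFFS_OK;
    }

    /* try every fail point in turn until GC completes with no write failing */
    static void run_test(void) {
      for (int n = 0; ; n++) {
        restore_image("0xdb000.bin"); /* hypothetical helper: reload the image */
        write_count = 0;
        fail_at = n;
        mount_fs();                   /* hypothetical helper */
        delete_some_files();          /* hypothetical helper */
        if (SPIFFS_gc(&fs, 8 * 4096) == SPIFFS_OK) break; /* n > total writes */
        remount_fs();                 /* hypothetical helper */
        verify_all_files();           /* hypothetical helper: flags unreadable files */
      }
    }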
$ ./spls 0xdb000.bin
image size: 131072
0 iddddidddddiddd era_cnt: 0
1 diddddddddddddd era_cnt: 0
2 ddddddddddddddd era_cnt: 0
3 dddddidddddidid era_cnt: 0
4 ddddddddddddddd era_cnt: 0
5 dddddiddidddddd era_cnt: 0
6 dddddidddddddid era_cnt: 0
7 ddddddddddd____ era_cnt: 0
8 _______________ era_cnt: 0
9 _______________ era_cnt: 0
10 _______________ era_cnt: 0
11 _______________ era_cnt: 0
12 _______________ era_cnt: 0
13 _______________ era_cnt: 0
14 _______________ era_cnt: 0
15 _______________ era_cnt: 0
16 _______________ era_cnt: 0
17 _______________ era_cnt: 0
18 _______________ era_cnt: 0
19 _______________ era_cnt: 0
20 _______________ era_cnt: 0
21 _______________ era_cnt: 0
22 _______________ era_cnt: 0
23 _______________ era_cnt: 0
24 _______________ era_cnt: 0
25 _______________ era_cnt: 0
26 _______________ era_cnt: 0
27 _______________ era_cnt: 0
28 _______________ era_cnt: 0
29 _______________ era_cnt: 0
30 _______________ era_cnt: 0
31 _______________ era_cnt: 0
era_cnt_max: 1
last_errno: 0
blocks: 32
free_blocks: 24
page_alloc: 116
page_delet: 0
used: 29116 of 113201
-- files:
server.key.pem: 916 0xfcbbff51
conf.json: 1115 0xd0c2b474
conf1.json: 803 0x1b18434d
sys_config_schema.json: 8166 0x61be1ff1
aws-iot-faa6f05c81.crt.pem: 1141 0x9e858b4d
aws-iot-faa6f05c81.key.pem: 227 0xad3bfcc8
ca-verisign-g5.crt.pem: 5142 0x2b19102d
sys_ro_vars_schema.json: 332 0xdf6558a7
server.crt.pem: 2752 0x29941f1d
ca-verisign-ecc-g2.crt.pem: 1590 0x8635500e
conf0.json: 2939 0x0128cb8c
-- overall crc32: 0x3f94ac09
$ ./sptest 0xdb000.bin 38
=== Will fail at write #38
image size: 131072
remove conf.json
W #0000 1 @ 1540
W #0001 2 @ 20
W #0002 1 @ 2820
W #0003 2 @ 18
W #0004 1 @ 2564
W #0005 2 @ 16
W #0006 1 @ 2308
W #0007 2 @ 14
W #0008 1 @ 2052
W #0009 2 @ 12
W #0010 1 @ 1796
W #0011 2 @ 10
W #0012 1 @ 1540
remove aws-iot-faa6f05c81.crt.pem
W #0013 1 @ 13828
W #0014 2 @ 12308
W #0015 1 @ 15108
W #0016 2 @ 12306
W #0017 1 @ 14852
W #0018 2 @ 12304
W #0019 1 @ 14596
W #0020 2 @ 12302
W #0021 1 @ 14340
W #0022 2 @ 12300
W #0023 1 @ 14084
W #0024 2 @ 12298
W #0025 1 @ 13828
remove aws-iot-faa6f05c81.key.pem
W #0026 1 @ 15364
W #0027 2 @ 12312
W #0028 1 @ 15620
W #0029 2 @ 12310
W #0030 1 @ 15364
== dumped fs to before_gc.bin
R #0131 256 @ 0
R #0132 2 @ 254
R #0133 256 @ 4096
R #0134 256 @ 8192
R #0135 256 @ 12288
R #0136 2 @ 12542
R #0137 256 @ 16384
R #0138 256 @ 20480
R #0139 256 @ 24576
R #0140 256 @ 28672
R #0141 256 @ 32768
R #0142 256 @ 36864
R #0143 256 @ 40960
R #0144 256 @ 45056
R #0145 256 @ 49152
R #0146 256 @ 53248
R #0147 256 @ 57344
R #0148 256 @ 61440
R #0149 256 @ 65536
R #0150 256 @ 69632
R #0151 256 @ 73728
R #0152 256 @ 77824
R #0153 256 @ 81920
R #0154 256 @ 86016
R #0155 256 @ 90112
R #0156 256 @ 94208
R #0157 256 @ 98304
R #0158 256 @ 102400
R #0159 256 @ 106496
R #0160 256 @ 110592
R #0161 256 @ 114688
R #0162 256 @ 118784
R #0163 256 @ 122880
R #0164 256 @ 126976
R #0165 256 @ 12288
R #0166 5 @ 12544
R #0167 256 @ 12288
R #0168 256 @ 16384
R #0169 256 @ 20480
R #0170 256 @ 24576
R #0171 256 @ 28672
R #0172 256 @ 32768
R #0173 256 @ 36864
R #0174 256 @ 40960
R #0175 256 @ 45056
R #0176 256 @ 49152
R #0177 256 @ 53248
R #0178 256 @ 57344
R #0179 256 @ 61440
R #0180 256 @ 65536
R #0181 256 @ 69632
R #0182 256 @ 73728
R #0183 256 @ 77824
R #0184 256 @ 81920
R #0185 256 @ 86016
R #0186 256 @ 90112
R #0187 256 @ 94208
R #0188 256 @ 98304
R #0189 256 @ 102400
R #0190 256 @ 106496
R #0191 256 @ 110592
R #0192 256 @ 114688
R #0193 256 @ 118784
R #0194 256 @ 122880
R #0195 256 @ 126976
R #0196 256 @ 0
R #0197 256 @ 4096
R #0198 5 @ 4608
R #0199 256 @ 4608
R #0200 256 @ 12288
R #0201 5 @ 12544
R #0202 256 @ 0
R #0203 256 @ 4096
R #0204 256 @ 8192
R #0205 256 @ 12288
R #0206 256 @ 16384
R #0207 256 @ 20480
R #0208 256 @ 24576
R #0209 256 @ 28672
R #0210 64 @ 12544
W #0031 64 @ 31744
R #0211 64 @ 12608
W #0032 64 @ 31808
R #0212 64 @ 12672
W #0033 64 @ 31872
R #0213 64 @ 12736
W #0034 64 @ 31936
W #0035 2 @ 28694
W #0036 2 @ 12288
W #0037 1 @ 12548
R #0214 256 @ 12288
R #0215 5 @ 12800
R #0216 256 @ 28672
R #0217 64 @ 12800
W #0038 64 @ 32000
=== BOOM!
== dumped fs to boom.bin
[rojer@nba ~/go/src/cesanta.com/common/spiffs/tools mos1]$ ./spls before_gc.bin
image size: 131072
0 idddd//////iddd era_cnt: 0
1 diddddddddddddd era_cnt: 0
2 ddddddddddddddd era_cnt: 0
3 ddddd////////id era_cnt: 0
4 ddddddddddddddd era_cnt: 0
5 dddddiddidddddd era_cnt: 0
6 dddddidddddddid era_cnt: 0
7 ddddddddddd____ era_cnt: 0
8 _______________ era_cnt: 0
9 _______________ era_cnt: 0
10 _______________ era_cnt: 0
11 _______________ era_cnt: 0
12 _______________ era_cnt: 0
13 _______________ era_cnt: 0
14 _______________ era_cnt: 0
15 _______________ era_cnt: 0
16 _______________ era_cnt: 0
17 _______________ era_cnt: 0
18 _______________ era_cnt: 0
19 _______________ era_cnt: 0
20 _______________ era_cnt: 0
21 _______________ era_cnt: 0
22 _______________ era_cnt: 0
23 _______________ era_cnt: 0
24 _______________ era_cnt: 0
25 _______________ era_cnt: 0
26 _______________ era_cnt: 0
27 _______________ era_cnt: 0
28 _______________ era_cnt: 0
29 _______________ era_cnt: 0
30 _______________ era_cnt: 0
31 _______________ era_cnt: 0
era_cnt_max: 1
last_errno: 0
blocks: 32
free_blocks: 24
page_alloc: 102
page_delet: 14
used: 25602 of 113201
-- files:
server.key.pem: 916 0xfcbbff51
conf1.json: 803 0x1b18434d
sys_config_schema.json: 8166 0x61be1ff1
ca-verisign-g5.crt.pem: 5142 0x2b19102d
sys_ro_vars_schema.json: 332 0xdf6558a7
server.crt.pem: 2752 0x29941f1d
ca-verisign-ecc-g2.crt.pem: 1590 0x8635500e
conf0.json: 2939 0x0128cb8c
-- overall crc32: 0xdce86ff8
[rojer@nba ~/go/src/cesanta.com/common/spiffs/tools mos1]$ ./spls boom.bin
image size: 131072
0 idddd//////iddd era_cnt: 0
1 diddddddddddddd era_cnt: 0
2 ddddddddddddddd era_cnt: 0
3 /dddd////////id era_cnt: 0
4 ddddddddddddddd era_cnt: 0
5 dddddiddidddddd era_cnt: 0
6 dddddidddddddid era_cnt: 0
7 dddddddddddd___ era_cnt: 0
8 _______________ era_cnt: 0
9 _______________ era_cnt: 0
10 _______________ era_cnt: 0
11 _______________ era_cnt: 0
12 _______________ era_cnt: 0
13 _______________ era_cnt: 0
14 _______________ era_cnt: 0
15 _______________ era_cnt: 0
16 _______________ era_cnt: 0
17 _______________ era_cnt: 0
18 _______________ era_cnt: 0
19 _______________ era_cnt: 0
20 _______________ era_cnt: 0
21 _______________ era_cnt: 0
22 _______________ era_cnt: 0
23 _______________ era_cnt: 0
24 _______________ era_cnt: 0
25 _______________ era_cnt: 0
26 _______________ era_cnt: 0
27 _______________ era_cnt: 0
28 _______________ era_cnt: 0
29 _______________ era_cnt: 0
30 _______________ era_cnt: 0
31 _______________ era_cnt: 0
era_cnt_max: 1
last_errno: 0
blocks: 32
free_blocks: 24
page_alloc: 102
page_delet: 15
used: 25602 of 113201
-- files:
server.key.pem: 916 0xfcbbff51
conf1.json: 803 0x1b18434d
cannot read sys_config_schema.json, err: -10004
sorry, that's a lot of output. the important bit is at the very end: sys_config_schema.json could not be read.
Hmm... if you run a SPIFFS_check, is it recovered? I guess not, as you get ERR_DELETED, but still... Also, do you have a code example of how to reproduce this? I will probably need to add interrupted-write capability to the test bench, if I haven't done so already.
see attached
[rojer@nba ~/cesanta/qwe master]$ tar xjf sptest.tar.bz2
[rojer@nba ~/cesanta/qwe master]$ cd spiffs/tools/
[rojer@nba ~/cesanta/qwe/spiffs/tools master]$ make ./spls ./sptest SPIFFS_CONFIG_PATH=.
gcc -I. -I. -I/home/rojer/cesanta/qwe/spiffs/tools/.. -o spls spls.c mem_spiffs.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_cache.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_gc.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_nucleus.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_check.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_hydrogen.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_vfs.c -Wall -Werror -DNO_ESP -DSPIFFS_TEST_VISUALISATION -g
gcc -I. -I. -I/home/rojer/cesanta/qwe/spiffs/tools/.. -o sptest sptest.c mem_spiffs.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_cache.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_gc.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_nucleus.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_check.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_hydrogen.c /home/rojer/cesanta/qwe/spiffs/tools/../spiffs_vfs.c -Wall -Werror -DNO_ESP -DSPIFFS_TEST_VISUALISATION -g
[rojer@nba ~/cesanta/qwe/spiffs/tools master]$ ./sptest 0xdb000.bin 38 && ./spls boom.bin 0xc9c55669
=== Will fail at write #38
...
-- files:
server.key.pem: 916 0x055a86e7
conf1.json: 803 0x1b18434d
cannot read sys_config_schema.json, err: -10004
Hi @rojer, looked into it a little. The fact that an untouched file becomes corrupt during an aborted GC is really no mystery, as all files in the block being cleaned are actually pretty brutally touched during the GC. As for the error, a SPIFFS_check seems to fix it, IIANM.
so do you then recommend that SPIFFS_check be run on every mount?
the practical upshot is that files just go missing and the device gets bricked... not ideal :)
No, that's not very ideal I agree. There is some info on it here: https://github.com/pellepl/spiffs/wiki/FAQ#powerlosses-contd-when-should-i-run-spiffs_check
I don't know what system or application you're running it on, but if there is a way to detect a power loss in the middle of a file operation, then the check should be invoked. Otherwise, yes, the check should run on every cold start.
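Something like this on boot, for example (a sketch - cfg and the buffers are assumed to be set up elsewhere, and booted_cleanly() is a made-up stand-in for whatever power-loss detection the system has; have it return 0 if unsure and the check always runs):

    static s32_t mount_and_maybe_check(spiffs *fs) {
      s32_t res = SPIFFS_mount(fs, &cfg, work_buf, fd_buf, sizeof(fd_buf),
                               cache_buf, sizeof(cache_buf), 0);
      if (res != SPIFFS_OK) return res;
      if (!booted_cleanly()) {
        /* SPIFFS_check() requires a mounted fs; it scans the whole fs and
         * mends pages left half-moved by an interrupted GC or write */
        res = SPIFFS_check(fs);
      }
      return res;
    }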
I'm coming at this from the nodemcu project -- I think that we should add a spiffs-check on each restart (as we have no idea when the power went away). I might try adding some 'forced poweroff' logic to my stress tester (in PR https://github.com/pellepl/spiffs/pull/100 ) to make sure that it works.
yeah, same here. there's no way to tell, in general. so: any chance the GC can be modified to not require a check? also, how expensive is a check? doing it on every boot may be costly, considering most boots are normal.
The GC cannot be modified to not require a check, as it is now. Unfortunately, a check can be very expensive - it depends on how big the fs is and how many files it contains, but mostly on whether it needs to mend anything. I don't think a check on every reboot is necessary, though. Normally, embedded systems distinguish controlled reboots from uncontrolled ones by writing some magic to RAM that is not cleared on startup (i.e. not in the .bss region). One could reserve some RAM for another magic which is written at completion of an fs operation and destroyed at the start of one (sketched below).
I also have some ideas regarding spiffs that would make the whole check procedure a lot smaller, basically by introducing a journal page. That way there would be no need for a grand full-system healing, but a more directed one when needed. There are also plans to rewrite the block handling and introduce indirect blocks, so that metadata not residing in the cleaned block does not have to be updated on a GC. This would make the GC more effective and would, ultimately, lead to a GC that does not need a check if interrupted. Unfortunately, I'm pretty tied up with work and life right now, and probably will be for a while.
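For illustration, the magic scheme could look roughly like this (all names made up, and the GCC-style .noinit section is an assumption - how you keep a piece of RAM from being cleared on startup is toolchain specific):

    #include <stdint.h>

    #define FS_OP_CLEAN_MAGIC 0xC0FFEE42u

    /* lives outside .bss/.data, so a warm reset leaves it intact */
    static uint32_t fs_op_magic __attribute__((section(".noinit")));

    static void fs_op_begin(void) { fs_op_magic = 0; }                 /* destroyed at start */
    static void fs_op_end(void)   { fs_op_magic = FS_OP_CLEAN_MAGIC; } /* written at completion */

    /* at boot: a cold start leaves random contents, an interrupted fs
     * operation leaves 0 - either way the magic won't match and check runs */
    static int fs_check_needed(void) { return fs_op_magic != FS_OP_CLEAN_MAGIC; }

Every fs call would be wrapped in fs_op_begin()/fs_op_end(), and SPIFFS_check() run after mount only when fs_check_needed() says so.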
as providers of middleware, we unfortunately cannot assume the existence of RAM that survives a reset. i think i'll have to go with SPIFFS_check by default, for now.
this didn't go down very well. it looks like SPIFFS_check has some sort of bug, because with the change above we started getting pretty frequent corruptions - even on planned and scheduled reboots, where we SPIFFS_unmount the fs first, i.e. where the check should've been a no-op. the corruptions are different in nature from the ones caused by GC: preexisting files do not disappear, but the file that was written just before reboot gets corrupted - its content gets scrambled (with no read errors, though). neither the read nor the write cache is used. i'll look into it more, and probably file a separate issue.
Sounds bad - a check that corrupts things is a tad counterintuitive, to say the least. Yes, please file it as another issue. All info you can collect is valuable. Cheers / P
ok, let me close the loop on this last one: this was a false alarm, something else was wrong with the setup.
i no longer suspect SPIFFS_check.
Good to hear. Did you find out what the problem was?