StackExchange.Redis icon indicating copy to clipboard operation
StackExchange.Redis copied to clipboard

Corrupted Data on ListLeftPush v2.0.601

Open ctrondse opened this issue 6 years ago • 20 comments

We're finding that occasionally the data we are pushing to Redis (once every million pushes) contains additional contents such as commands recently executed against the redis server. I believe it may be related to: https://github.com/StackExchange/StackExchange.Redis/issues/875

We are pushing a binary serialized and zipped object into Redis with varying size. I believe the issue is during the write as the commands in the data are only executed in the same application that is also pushing this data to redis.

Push Code:

protected static RedisValue Convert<T>(T value, ILog log) {
	RedisValue redisValue;

	try {
		redisValue = new ZipCompressedObject<T>(value).CompressedValue;
	} catch (CompressedObjectException ex) {
		log?.Error(ex, "Failed converting compressed object.");
		redisValue = default;
	}

	return redisValue;
}

public void PushStart(T value) {
	RedisValue rv = Convert(value, Compressed, Log);
	Db.ListRightPush(Name, rv);
}

I can post additional code snippets if necessary.

Here is a sample of the corrupted data we are seeing (I stripped off some of the beginning data due to overall size, the interesting part is towards the end):

Y"?~??A?FN5??(}?	???1K??Y?0??wS?o?? ??jzO?(??v%2g[??[???*?????b??X ?\????4t=DC?w?5?IY?X??V,VG'?Z?@???-??`?^]o????3?{????{#??l?a?!?!?q????q
?.?w??5tg???f?Y??O??~?s&??	?A:?????P?P?0]z?3?[d`V??Y="??Y"??D|?;??2g?|?ku??6?R?5?d?@???iKB?%!?=$????]??@????S?%??_m??$*C??Y6???9???:?iOV?&?eu?P??=y?????'[6TP?|`~%???@,_?Rj?j???h?;K?,pp*,A?U?@K???@@??Z???]?Q?0??c?R???eU)??Q??S{"???}_??=????B??HQm:(?u??????[???? ?M???Dth
=A
(c???
??i?y???@ZR?W??#D?=Rd`?:Jd`?:Z??c"q?%?|?O??d|??@??d|??@?'h?=g
?9Q  ??29Y  ?S???'??j??<Y?????<9?:#9????Y<9????9?N??0?s?8O  ???@[|B{"?r??G2?{??r?]???G?????&???0?K\??o?_?? ??U?J?X??'??k?h I?+$?RB ?$?Z???7d;?,0?k?c|?:??[@??5??c??-?$?????S?????t??cq?q^U??h?e,@?~??????%x??(Z_8????i??/??k?~?1??|??.z~?f??L?;?Fu?:y?v?oN55????P?1??f}o
?k}~>f?p?K?;?F?>???_??_o??Wh??i???A=?????Y?R?ç?M?To?3?J??Z(????{?????~ ???'M????"???I?O9??<>?yz??D
}*b`>'y??>?`
?O[??<]?	I4??x?8^??Z?_?k?6	?????=S?F????i?????1?H??/??b?/?YM/?s?4?eC??:R?É??/???)5??6e??U}???,??k???3?U?u?)>]???????kE????'lk????,<???n??'p??~WGv?f?3?t9[??c??s?L+d?U???;Y?>?[?i???T.??|???\??v-?^?~????%?M?'?/1?\?j5?J??Z???DW?B|?'2?|???R??J?{?:X?gj?W?a?~??4?R??@??&<;K??]???????-??%\?KIV???UX?i5?!\?m???p]`
?O[V??UX?)?/?T??L_6?]??????R?????????q?l?2??2H4????m????bHy.R??l
?,~?Z???(
????r?Zu?c?(??j?
?;6?>??%k?Y??_?????9?a?L ~Q??Zun???sC??;??6e??
:bj?	????cj?????lD??no???h???.l???
Q?fD??B?^???
~?Z??|T?Z|???~*?]?&?/t4??]	?1fv?hc?^D??q???`_??
W?Z?????A?V?<??),l??u?j?`
=Q?nl`NV?LtR2??O?????YQ???????+??q??7?U??Q"??ZBb??????????ji?6i5?T??L@B??$?a?$n??$??u?s?Z?.???'??IP??Q??I??7A*?N?:?\tB?N];nTYw????q??eE'?Z??>	?HG?'??=?Xt?$`??'?>?Z??i?~?w	A??P??In??#TZ??@?uU?"??o??B?HQiM
b5???p&u??"????.?? cTf??M?d?
??? ?uY?C??vZ??,-??Q??JC;?????????.u?@?????????;???????q?G?F?<?g??#]???,???9L%[?g?lGi???C_?????A???=G??????????J?i?+?????_?I?M???j???%??s]?g??3ls?W?;2N?$?rg2???8xz?l-=?0???????????????j???????u???Z6????m?5cv!?N??bL?XSK1u?????hj9?n???????Vc??:??@?{5qH?????????w
??=?nL?#??@?n????2Y?R?[?)????????&?Qa????Z??????*?|0*?=P? ??{?6???????0?`V??d??????eT?}?B???<Y??VG???Y(????G3*?I??RL?h=9?Q!??
?OP???????b@??w????????C???b7?H?q*???MB ??<B??	?(g??h?^?? ??"E??? ?"E?? ?????(?%?.?](cT??]D???Q??0?v?q??????%??!??.???G???%?\aaB???J?;??v???X?x???-????S??y??h???y??h7?z??h7?+Z?l_??Ed ?"????YV?.!?Jw?u?N	?(w?u?n	?(????Y??uP?>????wP???tP?!c"?pk??.?="cT?.?=&cT??.?=^?x???'??M?/F?2f????T??k7?qa??e?????h??u?y?^0??_??|??&?z?^2???n?l =^*??	?L_???kQ^???Q?4?B?W?*?-??o???@@?wY??yO  ???r??b>???!?P9ID??G??O$?|??d????B}>???<??<| ?@7?_?$?#?G_?O$??%?_?$D??SK&?+?_??6ll%??Y??@?JC??3??2???ri?PK?R??n??????f?V0???v?????H%?????8???Z??????Z?,???-?5)%XY??T?CIN?E??r??NU?r?
0s_?N?.??Z???,Y??DS??h??/???}??TGSY"?:Z???s?.?2ZI?sh^??&tG1m?lY]?lz???cQG?r?W???\???????FH?m???u??5?Qc??~%Q??u???Q???b?7:K???{ #??XR???j??\tó?h f??ut?g?(?:20p????1G?c?????V????l?h-??\G7?&c?J??@[5"	?j?~3???}#?5?3??????K??????N????"?w?????????)??1??m?}c???S????)?m?(?mC???
?6?)??l?6Y??@U??ute?@/1E???G?om???6???
??s3u? ?~s?????0?e???????????	?7???`[?zlc??#zt?6??l??=??c?mg?²A???????^]qYy??6?-
[?B?Go@??SyZx?0c"?)<[,l3???L??t????n????i9jUGEsQ?^5?3=???;?*<????u???vu?$?BO?i?1??V??????p??c}??C?K?j6????n+;??r???+8 ]j??U?t??,?dg????'8?=ud?q?????1???Q?S??%?r<g?????vQ?(,???f?0o
u??a"2O??N	?????$?`
?O[Z?
??DS?^5???!z???&/?d?
?l??mVW2q??H?????-??Uo|^?-????5??\??I?cN?k??????S??X?iK_KZ_L?y?z????;?k+????k???M?|I???k}?Z?3?:???h??Zy??U?b??]ê]"M?Z?I????&????
?x3?æ????aaKO?
;??
~???$suj9xz???G?@-w??uO[?r?v??C???tOw?a&=T?T=?Wp>U?k?>?:????9?>W9?????Q<?V??=?mu??qQ?r??h???u?Cw??Q?'?0?v?z???NrP?d??2OqP?})*?4?8??(??^?2?.8	??!??I?w??.r?H?.???sa?1???8?:????9?!??:?D?'?A?N???K???m%R??U?r	A7?PiWJ??x???ZB?WG?auD?b?
:?X?'N ???T?
?.??14:?znrP?q)???>7???nsP?????W%?!C?N	??!?t?Qy??D????N?X6j?OB ??<Bu= !?A?????H???????G???<?Xb??????<????O??r????T??
O'@-z&?t?%@Mz6?t?;??u??:?|2wI`???`??R?:?r2"??n?8X?0?&?!?k?hj??	i??F"?Z?fBb?e??I?????D4???4?~/M=y?!
??P??,?O?????d8???8D?$N??4)???/?c???|?? ?2F??A??d?*??A?o?s???V??r?|????l:?a?z?MC???T??????$0??d0D???>D??H?'Ge???(?_????I??E????????7?????g)?,lyJ??D?8	0?^t??vi?Sm?(,&!p@?:FKHDY?G?.%!?O?
?????K??3KP9??X?%???b?????)!e?P9?%???*h??@?q?I?S??2~"}???-&?<?=?\3??`W??M?3???Y?D:)Vm
??3x?:?%?B/??n?};??y?pU}??Dz?
{?;???8?ex?VRWL???jBG
??:??????????X-??l?%V?H??t?R?d09??f?_b?????;?
??????w?s???_?-?&?u&?V??v???aa ??;P???L??I??n????
S?J?S?????]?r????UJ???:1p????:??$h??+j??@d?E
? ?e,0??l,2g
????i?tS9????*l?H?J???A??%???B? ?????z??-b??%???bl?T?61??|?D_n??a;?c??'?????B96Z3
VO???%? ???g@?f??[x"?r??=@?`YF?/SP?}UG??@?v?Z?KD??#TO??*3
$5
LPUSH
$29
executionStatus--Running--web
$10
1502032112
*2
$4
INFO
$11
replication
*1
$4
PING
*1
$4
PING
*2
$4
INFO
$11
replication
*1
$4
PING
*1
$4
PING
*2
$4
INFO
$11
replication
*1
$4
PING
*1
$4
PING
*2
$4
INFO
$11
replication
*3
$5
LPUSH
$20
executionResult--web
$46243
?|???n?lI6?W??$W0E:U?l??e????_c?N?Mz-?N?B??B?B??^?????<??N~?|?o?7
????}fvfwn\?q?'?	?7???????c^???;7z~?-vk?6gZ?;7&*?r??v+x???????WI??????R??[???M?@?t&?#)?????e?Y/?????VMWM???i{????~/ ?)?\????6???IesDi2;\H??7?J?
?l??}??P[?Y?-wf????????*{?????&???B?+?Up??c?r"?
?l&??$8??5?xN%?U?J?/UQI??????T!??r??k??q?k?'I?T`??????&?v?X(????J>??*y???^??=????Lla?*?0?*???d9?c????b1G?Yt
???v?L?W?#??t?P$?
VU?????0?z?????UUUMiZYWC"Y?+N9?e????S?i=???]"?f:?<({????;??*??i?b.?9??`Z?W???3mA6(?t?+s?,?w???????????R??XHdS??]?)?[?wn?N??e?¡??Z$?s?f9?f7?k???;???_?\???]?????????_x?z?@x?&K??Ia??&??Z??Ac?Xn?[??)???r??F??7?q???m?X?dCrm
????k?Z^^?3?\???o?h??_{???+2??`?|??{?z???C??u}?zq ?L?c????7![?_R??k???H/,UU??Ik??N$??!B?????9s?6???<sV?pt??H???<g^O??u?J]?8Msg?h?Y[??C?7\?@?????U?K?;q4??xCC?h#?????Y???z????W???V???Y???4y????????/???B}??DG2Y??
???[Ö?]?\X?vuM-S]g?'U??????j?a?j?%??^???r:UvF?/????s?g?fNo"???4g?0??R?z???L0=p???`i?C?1o(U?r?CNv:? 

ctrondse avatar Dec 04 '19 18:12 ctrondse

Well this is ... bad. Very bad. That's definitely output stream data in the payload, so something has gone horribly wrong here - either a bad length encode, or a buffer leaking between threading would be my hunches. Oof.

Is this happening often? Reliably? Is there any particular usage pattern you are using that is demonstrating this? My first challenge is: how to repro this.

On Wed, 4 Dec 2019, 18:59 ctrondse, [email protected] wrote:

We're finding that occasionally the data we are pushing to Redis (once every million pushes) contains additional contents such as commands recently executed against the redis server. I believe it may be related to: #875 https://github.com/StackExchange/StackExchange.Redis/issues/875

We are pushing a binary serialized and zipped object into Redis with varying size. I believe the issue is during the write as the commands in the data are only executed in the same application that is also pushing this data to redis.

Push Code:

protected static RedisValue Convert<T>(T value, ILog log) {

RedisValue redisValue;

try {

  redisValue = new ZipCompressedObject<T>(value).CompressedValue;

} catch (CompressedObjectException ex) {

  log?.Error(ex, "Failed converting compressed object.");

  redisValue = default;

}

return redisValue;

}

public void PushStart(T value) {

RedisValue rv = Convert(value, Compressed, Log);

Db.ListRightPush(Name, rv);

}

I can post additional code snippets if necessary.

Here is a sample of the corrupted data we are seeing (I stripped off some of the beginning data due to overall size, the interesting part is towards the end):

Y"???A?FN5??(}? �???1K??Y?0??w�S?o?? ??jzO?(??v%2�g[??[???�?�????b??X ?????4t=DC�?�w?5?I�Y�?X??V,VG'?Z?@?�??-???^]o???�?3?{?�???{#??l�?a?!?!?q????q ?.?w??5tg???�f?Y??O?�?~?�s&?? ?A:?????P?P?0]z?��3?[dV?��?Y="�??Y"�??D�|?;? ?2g? |?k�u?�?6?R�?5?d?@???iKB?%!?=$�???�?]??@??��??S?%??m??$*C??Y��6???9?�??:?iOV?&?eu?P??=y?�????'[6�TP?|�~%??�?@,�?Rj?�j???h?;K?,�pp,A?�U?@��K?�??@@??Z�???]?Q?0??c?R?���??eU)?��?Q?�?S{"???}�??=?�???B??HQm�:(?u?????�?[???�? ?M?�?? D�th =A (c?�?? ??i?y???�@�Z�R?W??#D�?=Rd?:Jd?:Z??�c"�q?�%?�|?O??d|??@?? d|??@?'h�? =g ?9Q ??��29Y ?S???'??j??<Y??�???<9?��:#9??�??Y<9??�??9?N??0?s��?8O ??��?@[|B{"�?r? ?G2?{�?�?r?]???G?????&???0?K\�??o��??? ??U ?J?X??'??k?h I?+$��?RB ?$��?Z???7d;?,0�?k?c�|?:??[@�??5??c??-?$?????S?????t??c�q?q^U?�?h?e,@?~?????� ?%x�??(Z_8�????i??/??k?~?1??�|?�?.z~�?�f??L ?;?Fu?:y?v?oN55�??�??P?1??f}o ?k}�~>f?p?K?;?��F?>???_??o??Wh�??�i???A=?????Y?R?ç�?M?To�?3?J??Z(????{???�?�?~ ???'M??�??"?�??I?O9??<�>?yz??D }*b�>'y??>�? ?�O[??<]?� I4��??�x?8^??Z??�k?6 ????�?=S?F????i?�????1?H??/??b?�/?YM/�?s?4?eC�??:R�?É??/???)5??6e?�?U}??�?,??k???3?U?u?)>�]??�?????kE????'lk????,??n??'p??~WGv�? f?�3?t9[?�?c??s?L+ d�?U???;Y?>?[?i???T.?�?|???\�??v-?^?~????�%?M?'?/1??j5?�J??Z???DW?B|?'2?�|???R??J?{?:X?g�j?W?a?~??4�?R??�@??&<;K??]????�???-�??%?KIV???UX?i5?!?m�???p]� ?�O[V??UX?)?/?T??L_6?]?????��?R????�?????q?l?2??�2H4�????m????b�Hy.R??l ?,?Z???( ??�??r?Zu?c?(?�?j? ?;�6?>?�?%k?Y?�?

???�??9?a?L Q??Zun�???sC??;??6e ?? :bj? ????cj???�??lD??no???�h???.l??? ��Q?fD??B?�^??? ?Z�??|T?Z�|????�]?&?/t4??] ?1fv?���hc?^D??q???��_?? �W�?Z?????A?V?�?),l??u?j?� =Q?n�lNV?LtR2�??O�?�?�???YQ�????? ??+??q??7?U�??Q"?�?ZB�b??????????ji?6i5?T??L@B??$?a?�$n??$??u�?�s?Z?.???'??IP??Q??I??7A*?N��?:�?\tB?N��];nTYw????q??e�E'?Z??> ?HG?'�??=?Xt?$??'�?>?Z??i??w� A?�?P??I�n??#TZ??@?�uU?"??o??B?HQiM� b5?��??p�&u??"????. ?? cTf?�?M?d? ??? ? uY?C?�?v�Z??,�-??Q??JC;?????????.u?@????????�?;�?????�?�?q?G?��F?????wP�?�??��tP�?!c"?pk??. ?="cT?. ?=&cT??. ?=�^?x?�??'??�M?/F?2f?�???T??�k7?qa??�e????�?h??�u?y��?^0???�?|??&?z?^2??�?n?l =^?�?� ?L_?�??k��Q^?�??���Q?4?B?W?*?-??�o?�??@@?wY??yO ???r??b>?�??!?P9�I�D??G??O$�?|�??d????B}�>?�??�U?k�?>?:?�???9?>W9?�??�??Q<�?V??=?mu??qQ?r?�?h??�?u?�Cw??Q?'?0?v?z��??�?NrP�?d??2OqP?}�)

?4�?�8?? (?�?^?�2?.8 ?�?!??I�?w? ?�.r?�H?.???sa?1??��?�8�?:????��9?!?�?��:?D?'?A?�N???K��???m%R??U?r A7�?PiWJ�??x???ZB?�WG?auD?b? �:?X?'N ???�T? �?.??14�:?znrP?q�)??��?>7???nsP�?????�W%?!C�?N ?�?!?t?�Qy??�D????N?X6j?OB ??<Bu= !�?A�??�?�??H???�?�???G�?�??<�?Xb�??????D??H�?'�Ge???(?_????I??E???�?????7???�??g)?�,lyJ??D?8 0?^t??vi?Sm?(,&!p@�?�:FKH�DY?G?�.%!�?O�? ?????K��??3KP9?��?X?%???��b???�??�)!�e�?P9?%�???� h??@?q�?I?S?�?2~"}�???-�&?�<?=?�\3??�W??M?3???Y?D:)�Vm ??3x?:?%?B/??n?};??y?pU}�??Dz? {?�;?�??8?ex?VR�W�L???jBG� ??:???????�???X-??l?%V?H??t?R? d�09??�f?b�????�?;? ??????w?s????-?�&?u&��?V??�v???�aa ??;P???L??I??n???? S?J?S?????�]?r????UJ??�?:1p? ???:??$h??+�j??@d?E� ? �? e,0??l,2�g ?�???i?tS9????*l��?H?J�??? A??%???B? ???�??z??-��b??%???��bl?�T?6��1??�|�?D_n??a;?�c??�'?????B96Z��3 V�O?�??%? ???g��@?f�??[x"?r??=�@?�YF?/SP?}UG??@?v�?Z?K�D??#TO??*3 $5 LPUSH $29 executionStatus--Running--web $10 1502032112 *2 $4 INFO $11 replication *1 $4 PING *1 $4 PING *2 $4 INFO $11 replication *1 $4 PING *1 $4 PING *2 $4 INFO $11 replication *1 $4 PING *1 $4 PING

2 $4 INFO $11 replication 3 $5 LPUSH $20 executionResult--web $46243 ?�|�???n?lI6?W??$W0E:U?l??e????�_c?N?M�z�-?N?�B�??�B?�B�??^?????<??N~?|?o?7 ????}fvfwn?q?'? ?7???????c^???��;7�z~?-�vk?6gZ?�;7&?r??v+x???????WI??????R??[???M?@?t&?#)?????e?Y/�?????VMWM???i{�????~/ ?)???�??6???IesDi2;\H??7?�J? ?l??}??P[?Y?- wf???????? {�?????&��???B?+?Up??�c?r"? ?l&??$8??5?xN%?�U?J?/U�QI??????T!??r??k??q?k�?'I�?T?????�?&?v?X(????J>??y???^�??=????��Ll�a?��?0????d9?c�????b1G?Yt ???v?L?W?#?�?t?P$? VU?????0?z?�????UUUMiZYWC�"Y?+N9?e???��?S?i=???]"?�f:?<({????;????i?b.?9?? Z?W???3mA6(�?t?�+s�?,?w???????????�R??XHdS?�?�]?)?[?wn?�N??e?¡??Z$?s?f9?f7?k???;???_????]??????�?�??_x?z? @x https://github.com/x ?&K??Ia�??&�??Z�?�?Ac?Xn?[??)???�r??F??7?q??�?m?X?d�Crm �????k?Z^^?�3??? �?o?h??_�{???+2�???|??{�?z???C??u}?zq ?���L?c???�?7![?_R??�k??�?H/,UU??Ik??N$�?�?!B?????9s?6???<sV?pt??H???<g^O??u?J]?8Msg?h?Y[�??�C?7?@???�??U?K?�;q4??xCC?h#????�?Y???z????W???V�??�?Y???4y??????�?�?/???B}??DG2Y�?? ???[Ö?]?\X?vuM-S]g?�'U??�????j?�a?j?%�??^??�?r:UvF?/????s?g?fNo"???4g?0??R?z???L0=p??? i?�C?1o(U�?r?CNv:?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/StackExchange/StackExchange.Redis/issues/1294?email_source=notifications&email_token=AAAEHMDNCCLOHHUY432WNNDQW74XLA5CNFSM4JVNGL32YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4H6C76FQ, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAEHMFX6REKHSLOU7IOOHLQW74XLANCNFSM4JVNGL3Q .

mgravell avatar Dec 04 '19 22:12 mgravell

Our application is executing on Linux in a distributed compute environment using Mono v6.4.0.198. We often have 5000+ instances of our application running at once on different machines, but all with open connections to Redis. Each instance will upload multiple of these objects through its lifespan, but it will do so synchronously 1 object at a time. However, asynchronously the application could be running other Redis commands on the same ConnectionMultiplexer

We have not seen this issue occur with any other objects are pushing into Redis.

We're currently seeing this occur between 1 and 3 times a day per ~5 million objects, so it's incredibly hard to reproduce.

ctrondse avatar Dec 04 '19 23:12 ctrondse

I know you're quite busy, but is there any additional logging, or options I can enable in the Configuration to extract more information about what may be happening here? We'd like to help debug the issue if possible, but we're not sure where to start looking.

ctrondse avatar Dec 12 '19 23:12 ctrondse

Sadly, no, I can't think of a pre-existing switch that we could toggle here. I guess the first thing is to try and repro it somehow, presumably by bulk loading a pipe until it fails.

Some brief questions:

  • remind me (on mobile/email), don't have full history: setup summary here? What OS? Is TLS involved, etc?
  • do the payloads look the right-ish size but wrong content, wrong size, or impossible to know?

If TLS is in play, it could well be part of the story, so I can't just take the output pipe in isolation. Trying to think how I can setup a repro cycle here, ideally in-proc

On Thu, 12 Dec 2019, 23:31 ctrondse, [email protected] wrote:

I know you're quite busy, but is there any additional logging, or options I can enable in the Configuration to extract more information about what may be happening here? We'd like to help debug the issue if possible, but we're not sure where to start looking.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/StackExchange/StackExchange.Redis/issues/1294?email_source=notifications&email_token=AAAEHMC37JLFQZ25DATOUKTQYLCTZA5CNFSM4JVNGL32YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGYMSJA#issuecomment-565233956, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAEHMANLXL73KEHIG2GPL3QYLCTZANCNFSM4JVNGL3Q .

mgravell avatar Dec 13 '19 08:12 mgravell

Setup Summary: Our application is executing on Linux in a distributed compute environment using Mono v6.4.0.198. We often have 5000+ instances of our application running at once on different machines, but all with open connections to Redis. Each instance will upload multiple of these objects through its lifespan, but it will do so synchronously 1 object at a time. However, asynchronously the application could be running other Redis commands on the same ConnectionMultiplexer. Additionally, there is typically a 30 second+ break between each of these uploads that are being corrupted.

OS: SLES11SP4 The payloads appear to be close in size with additional data inserted into it. We're in the process of analyzing the payloads to see how much extra data in put in, however it's binary data so it's a little hard to tell.

We are not using TLS that I know of. We did not enable it for Redis.

I'll talk with my team about how we might be able to try to reproduce it.

ctrondse avatar Dec 13 '19 19:12 ctrondse

Not having TLS to contend with is a huge relief - it hugely simplifies the work here. Question: the input here - what is it? Regular byte arrays? Leased arrays? Memory-Byte? What? I want to rule out all sources of conflict, and one possibility is a leased array input that has been leaked to the pool.

I think my plan is to change the dummy redis server we have here to take a payload that is {seed}{corresponding raw data of the remaining length}, and verify it with random verifiable payloads. Then we can see if we can send a ton of data through code as much as possible as yours, and see a: if we can make it break, and b: how frequently / what conditions

On Fri, 13 Dec 2019, 19:26 ctrondse, [email protected] wrote:

Setup Summary: Our application is executing on Linux in a distributed compute environment using Mono v6.4.0.198. We often have 5000+ instances of our application running at once on different machines, but all with open connections to Redis. Each instance will upload multiple of these objects through its lifespan, but it will do so synchronously 1 object at a time. However, asynchronously the application could be running other Redis commands on the same ConnectionMultiplexer

OS: SLES11SP4 The payloads appear to be close in size with additional data inserted into it. We're in the process of analyzing the payloads to see how much extra data in put in, however it's binary data so it's a little hard to tell.

We are not using TLS that I know of. We did not enable it for Redis.

I'll talk with my team about how we might be able to try to reproduce it.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/StackExchange/StackExchange.Redis/issues/1294?email_source=notifications&email_token=AAAEHMCLIXU4ENTIZYXY5MLQYPOVBA5CNFSM4JVNGL32YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEG3ABXY#issuecomment-565575903, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAEHMHXYILYMTJWY6MEGL3QYPOVBANCNFSM4JVNGL3Q .

mgravell avatar Dec 14 '19 08:12 mgravell

The input is an internal object that has been binary serialized and then compressed using the deflate method. At the point of upload it is a regular byte array.

I should be able to share code snippets of that would be helpful.

ctrondse avatar Dec 14 '19 09:12 ctrondse

@ctrondse Please share if you can, that may give us some hints here.

NickCraver avatar Mar 15 '20 18:03 NickCraver

@ctrondse Please share if you can, that may give us some hints here.

I will work on some sample code and upload it tomorrow.

ctrondse avatar Mar 16 '20 22:03 ctrondse

Attached a sample solution that uses the same code to upload and process the objects.

ListLeftPush.zip

I have not been able to recreate the issue manually so far, however we are still seeing it occur in our Production environment.

ctrondse avatar Mar 18 '20 19:03 ctrondse

Any solution?

grandsilence avatar May 03 '20 23:05 grandsilence

@NickCraver Have you had a chance to look at the sample code I uploaded? Thanks!

ctrondse avatar May 04 '20 22:05 ctrondse

@ctrondse Unfortunately I was never able to recreate this with your sample code either, my guess would be a race on something in the Mono side. Is this still happening in the latest library or version of Mono? I know we hit some fun socket issues along the way with Mono and I'm wondering if you're potentially on a version before those fixes went in.

NickCraver avatar Feb 27 '22 15:02 NickCraver

If we can get any more info here, happy to reopen (and sorry this got dropped), but this is the only report we have so my strong suspicion is this is Mono, and possibly fixed in a later release there like our other socket issues. If that's not the case and this happens in latest Mono I'm super happy to take a look and connect with that team, but without the ability to repro the issue on either side here we're kinda stuck on progressing.

NickCraver avatar Mar 08 '22 12:03 NickCraver

If we can get any more info here, happy to reopen

I have more info! We had two separate occurrences of this last month, and I believe I found the likely mechanism behind it. It happened in close proximity to an OutOfMemoryException logged elsewhere by the application, while writing a ~300 kB value to redis. The length of the stored value was correct, but the tail end was replaced with subsequent redis commands.

Stress testing the library, I was able to reproduce similar behavior.

  • While PhysicalConnection.WriteUnifiedSpan() was writing the value to the PipeWriter, writer.Write() throws an OutOfMemoryException trying to GetSpan(). Any data not yet written is lost.

https://github.com/StackExchange/StackExchange.Redis/blob/ef1178e2ebf1433bbd1c7bafb59f1cc5c1e7704d/src/StackExchange.Redis/PhysicalConnection.cs#L1001

  • The exception is caught and consumed by Message.WriteTo(). This does not invalidate the stream.

https://github.com/StackExchange/StackExchange.Redis/blob/ef1178e2ebf1433bbd1c7bafb59f1cc5c1e7704d/src/StackExchange.Redis/Message.cs#L770-L774

  • The write lock is released, so the next command(s) start writing in the middle of the partially-written command.

Here is a gist with code to reproduce the issue. The diff is against tag 2.2.79, and simulates an OutOfMemoryException when a raw value of length 555 is written. (It writes the first 400 bytes, then throws the exception.)

https://gist.github.com/Voltara/3034ebc9efb79c78463a27a247996e7d

Voltara avatar Mar 11 '22 21:03 Voltara

Ooooooh, excellent find - @mgravell is going to enjoy this one :)

Reopening this!

NickCraver avatar Mar 11 '22 21:03 NickCraver

@ctrondse Unfortunately I was never able to recreate this with your sample code either, my guess would be a race on something in the Mono side. Is this still happening in the latest library or version of Mono? I know we hit some fun socket issues along the way with Mono and I'm wondering if you're potentially on a version before those fixes went in.

Sadly I can't help with this anymore as I left the company I was at that was experiencing the issue. However I sent this thread to one of my prior colleagues that is knowledgeable on the issue we were seeing. I'm unsure if they were still experiencing it or not, so I'll let them chime in on it.

Glad to see there might be some traction on this elsewhere though!

ctrondse avatar Mar 11 '22 21:03 ctrondse

That's brilliant context, thanks. Will look at that scenario in the week.

On Fri, 11 Mar 2022, 21:19 ctrondse, @.***> wrote:

@ctrondse https://github.com/ctrondse Unfortunately I was never able to recreate this with your sample code either, my guess would be a race on something in the Mono side. Is this still happening in the latest library or version of Mono? I know we hit some fun socket issues along the way with Mono and I'm wondering if you're potentially on a version before those fixes went in.

Sadly I can't help with this anymore as I left the company I was at that was experiencing the issue. However I sent this thread to one of my prior colleagues that is knowledgeable on the issue we were seeing. I'm unsure if they were still experiencing it or not, so I'll let them chime in on it.

Glad to see there might be some traction on this elsewhere though!

— Reply to this email directly, view it on GitHub https://github.com/StackExchange/StackExchange.Redis/issues/1294#issuecomment-1065537932, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAEHMD54XW7UL6CJL5PLGDU7O2FXANCNFSM4JVNGL3Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

mgravell avatar Mar 12 '22 12:03 mgravell

Just checking in to see if there's anything I can do to assist with this, and to make sure it doesn't slip through the cracks. (In theory, depending on the application, this could allow a malicious actor to read or write arbitrary data stored in Redis.)

Voltara avatar Mar 30 '22 21:03 Voltara

Thanks for the gentle nudge. My bad; I will try to look tomorrow (famous last words)

Sorry for the delay, but sadly this is not my official day job, do I need to scrape the time.

On Wed, 30 Mar 2022, 22:16 Andrew Skalski, @.***> wrote:

Just checking in to see if there's anything I can do to assist with this, and to make sure it doesn't slip through the cracks. (In theory, depending on the application, this could allow a malicious actor to read or write arbitrary data stored in Redis.)

— Reply to this email directly, view it on GitHub https://github.com/StackExchange/StackExchange.Redis/issues/1294#issuecomment-1083632573, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAEHMHKOGACFBMXTK2LCCDVCTABJANCNFSM4JVNGL3Q . You are receiving this because you were mentioned.Message ID: @.***>

mgravell avatar Mar 30 '22 22:03 mgravell