rr icon indicating copy to clipboard operation
rr copied to clipboard

Divergence recording FF on x64

Open joneschrisg opened this issue 11 years ago • 5 comments

Found while looking at #1355, but apparently unrelated. Haven't tried to reproduce yet, but the steps I took were

  • record FF with rr -K. Open XUL menu, load new non-e10s window, load google.com
  • replay with rr -K

We hit the following divergence

[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> rbx 0x2aaaab100218 != 0x2aaad386f420 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> rdx 0 != 0x2aaaab040108 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> rsi 0 != 0x7c1b (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> rdi 0x2aaaab100218 != 0x2aaac0597000 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> rbp 0 != 0x1 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> r8 0 != 0x2aaad38da730 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> r9 0 != 0x50 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> r10 0 != 0x7fffffff8060 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> r11 0x246 != 0x202 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> r13 0x7fffffff7bd0 != 0x2aaad8313e00 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> r14 0x2aaab7d703e8 != 0 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> r15 0x7fffffff7b88 != 0x7fffffff8740 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> rip 0x70000004 != 0x3bbbeef657 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> eflags 0x244 != 0x200 (replaying vs. recorded)
[ERROR /home/cjones/rr/rr/src/Registers.cc:292:maybe_print_reg_mismatch() errno: 0 'Success'] 
 -> orig_rax 0x60 != 0xb (replaying vs. recorded)
[FATAL /home/cjones/rr/rr/src/Registers.cc:412:compare_register_files() errno: 0 'Success'] 
 (task 24043 (rec:23904) at time 147230)
 ->  Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:384335579/384325013)

from this context

(gdb) bt
#0  0x0000000070000004 in ?? ()
#1  0x00002aaaaacb2a0b in _untraced_raw_syscall () at /home/cjones/rr/rr/src/preload/traced_syscall_shared.S:67
#2  0x00002aaaaacb2020 in untraced_syscall (a5=0, a4=0, a3=0, a2=0, a1=0, a0=46912502759960, syscallno=96)
    at /home/cjones/rr/rr/src/preload/preload.c:409
#3  sys_gettimeofday (call=0x7fffffff7b88) at /home/cjones/rr/rr/src/preload/preload.c:1275
#4  syscall_hook (call=0x7fffffff7b88) at /home/cjones/rr/rr/src/preload/preload.c:1655
#5  0x00002aaaaacb2a22 in _syscall_hook_trampoline () at /home/cjones/rr/rr/src/preload/syscall_hook.S:135
#6  0x00002aaaaaaade5a in gettimeofday ()
#7  0x00002aaaad126ef8 in js::gc::GCRuntime::maybePeriodicFullGC() () from /home/cjones/Desktop/firefox/libxul.so
#8  0x00002aaaacf7e08f in mozilla::dom::AutoEntryScript::~AutoEntryScript() () from /home/cjones/Desktop/firefox/libxul.so
#9  0x00002aaaacf3f697 in nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) () from /home/cjones/Desktop/firefox/libxul.so
#10 0x00002aaaad1f6261 in PrepareAndDispatch () from /home/cjones/Desktop/firefox/libxul.so
#11 0x00002aaaad1f5753 in SharedStub () from /home/cjones/Desktop/firefox/libxul.so
#12 0x00002aaaad8ff482 in nsXMLHttpRequest::GetInterface(nsID const&, void**) () from /home/cjones/Desktop/firefox/libxul.so
#13 0x00002aaaad234558 in void NS_QueryNotificationCallbacks<nsIChannel>(nsIChannel*, nsID const&, void**) ()
   from /home/cjones/Desktop/firefox/libxul.so
#14 0x00002aaaad2347e6 in NS_UsePrivateBrowsing(nsIChannel*) () from /home/cjones/Desktop/firefox/libxul.so
#15 0x00002aaaad2f164c in mozilla::net::nsHttpChannelAuthProvider::Init(nsIHttpAuthenticableChannel*) ()
   from /home/cjones/Desktop/firefox/libxul.so
#16 0x00002aaaad342d30 in mozilla::net::nsHttpChannel::BeginConnect() () from /home/cjones/Desktop/firefox/libxul.so
#17 0x00002aaaad3435da in mozilla::net::nsHttpChannel::OnProxyAvailable(nsICancelable*, nsIURI*, nsIProxyInfo*, tag_nsresult) ()
   from /home/cjones/Desktop/firefox/libxul.so
#18 0x00002aaaad25c7a4 in nsAsyncResolveRequest::DoCallback() () from /home/cjones/Desktop/firefox/libxul.so
#19 0x00002aaaad25ca90 in nsAsyncResolveRequest::Run() () from /home/cjones/Desktop/firefox/libxul.so
#20 0x00002aaaad25cc10 in nsProtocolProxyService::AsyncResolveInternal(nsIURI*, unsigned int, nsIProtocolProxyCallback*, nsICancelable**, bool) () from /home/cjones/Desktop/firefox/libxul.so
#21 0x00002aaaad32230d in mozilla::net::nsHttpChannel::ResolveProxy() () from /home/cjones/Desktop/firefox/libxul.so
#22 0x00002aaaad3437dd in mozilla::net::nsHttpChannel::AsyncOpen(nsIStreamListener*, nsISupports*) ()
   from /home/cjones/Desktop/firefox/libxul.so
#23 0x00002aaaad9005f7 in nsXMLHttpRequest::Send(nsIVariant*, mozilla::dom::Nullable<nsXMLHttpRequest::RequestBody> const&) ()
   from /home/cjones/Desktop/firefox/libxul.so
#24 0x00002aaaadd0ae0d in nsXMLHttpRequest::Send(JSContext*, nsAString_internal const&, mozilla::ErrorResult&) [clone .isra.531]
    () from /home/cjones/Desktop/firefox/libxul.so
#25 0x00002aaaadd35876 in mozilla::dom::XMLHttpRequestBinding::send(JSContext*, JS::Handle<JSObject*>, nsXMLHttpRequest*, JSJitMethodCallArgs const&) () from /home/cjones/Desktop/firefox/libxul.so
#26 0x00002aaaacf91c2f in mozilla::dom::GenericBindingMethod(JSContext*, unsigned int, JS::Value*) ()
   from /home/cjones/Desktop/firefox/libxul.so
#27 0x00002aaaad15dbd2 in js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) ()
   from /home/cjones/Desktop/firefox/libxul.so
#28 0x00002aaaad151a56 in Interpret(JSContext*, js::RunState&) () from /home/cjones/Desktop/firefox/libxul.so
#29 0x00002aaaad15d839 in js::RunScript(JSContext*, js::RunState&) () from /home/cjones/Desktop/firefox/libxul.so
#30 0x00002aaaad15dcb8 in js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) ()
   from /home/cjones/Desktop/firefox/libxul.so
#31 0x00002aaaad15e5a6 in js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) () from /home/cjones/Desktop/firefox/libxul.so
#32 0x00002aaaad1064d2 in JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) () from /home/cjones/Desktop/firefox/libxul.so
#33 0x00002aaaacf3f4bd in nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) () from /home/cjones/Desktop/firefox/libxul.so
#34 0x00002aaaad1f6261 in PrepareAndDispatch () from /home/cjones/Desktop/firefox/libxul.so
#35 0x00002aaaad1f5753 in SharedStub () from /home/cjones/Desktop/firefox/libxul.so
#36 0x00002aaaad1f55d0 in NS_InvokeByIndex () from /home/cjones/Desktop/firefox/libxul.so
#37 0x00002aaaacf3c40e in XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) ()
   from /home/cjones/Desktop/firefox/libxul.so
#38 0x00002aaaacf3ead5 in XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) () from /home/cjones/Desktop/firefox/libxul.so
#39 0x00002aaaad15dbd2 in js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) ()
   from /home/cjones/Desktop/firefox/libxul.so
#40 0x00002aaaad151a56 in Interpret(JSContext*, js::RunState&) () from /home/cjones/Desktop/firefox/libxul.so
#41 0x00002aaaad15d839 in js::RunScript(JSContext*, js::RunState&) () from /home/cjones/Desktop/firefox/libxul.so
#42 0x00002aaaad15dcb8 in js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) ()
   from /home/cjones/Desktop/firefox/libxul.so
#43 0x00002aaaad15e5a6 in js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) () from /home/cjones/Desktop/firefox/libxul.so
#44 0x00002aaaad1064d2 in JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) () from /home/cjones/Desktop/firefox/libxul.so
#45 0x00002aaaacf3f4bd in nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) () from /home/cjones/Desktop/firefox/libxul.so
#46 0x00002aaaad1f6261 in PrepareAndDispatch () from /home/cjones/Desktop/firefox/libxul.so
#47 0x00002aaaad1f5753 in SharedStub () from /home/cjones/Desktop/firefox/libxul.so
#48 0x00002aaaad1f1095 in nsTimerImpl::Fire() () from /home/cjones/Desktop/firefox/libxul.so
#49 0x00002aaaad1f1afe in nsTimerEvent::Run() () from /home/cjones/Desktop/firefox/libxul.so
#50 0x00002aaaacf0ecd1 in nsThread::ProcessNextEvent(bool, bool*) () from /home/cjones/Desktop/firefox/libxul.so
#51 0x00002aaaacf16f4e in NS_ProcessNextEvent(nsIThread*, bool) () from /home/cjones/Desktop/firefox/libxul.so
#52 0x00002aaaad1f2f99 in nsThread::Shutdown() () from /home/cjones/Desktop/firefox/libxul.so
#53 0x00002aaaad1f3043 in nsRunnableMethodImpl<tag_nsresult (nsIThread::*)(), void, true>::Run() ()
   from /home/cjones/Desktop/firefox/libxul.so
#54 0x00002aaaacf0efa6 in nsThread::ProcessNextEvent(bool, bool*) () from /home/cjones/Desktop/firefox/libxul.so
#55 0x00002aaaacf16f4e in NS_ProcessNextEvent(nsIThread*, bool) () from /home/cjones/Desktop/firefox/libxul.so
#56 0x00002aaaacf2a673 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ()
   from /home/cjones/Desktop/firefox/libxul.so
#57 0x00002aaaad3e7ac2 in MessageLoop::Run() () from /home/cjones/Desktop/firefox/libxul.so
#58 0x00002aaaae16c828 in nsBaseAppShell::Run() () from /home/cjones/Desktop/firefox/libxul.so
#59 0x00002aaaae6a3a08 in nsAppStartup::Run() () from /home/cjones/Desktop/firefox/libxul.so
#60 0x00002aaaae6d9354 in XREMain::XRE_mainRun() () from /home/cjones/Desktop/firefox/libxul.so
#61 0x00002aaaae6db865 in XREMain::XRE_main(int, char**, nsXREAppData const*) () from /home/cjones/Desktop/firefox/libxul.so
#62 0x00002aaaae6dbafa in XRE_main () from /home/cjones/Desktop/firefox/libxul.so
#63 0x0000000000407a66 in do_main(int, char**, nsIFile*) ()
#64 0x0000000000404ceb in main ()

It looks like we've trampolined off into the woods, but I'm not sure why.

joneschrisg avatar Dec 17 '14 19:12 joneschrisg

A simple attempt to reproduce this locally didn't work for me.

rocallahan avatar Dec 18 '14 20:12 rocallahan

Just rebuilt my machine after a catastrophic hard drive failure (very suspicious given your experience ...), will see if I can still repro this.

joneschrisg avatar Dec 18 '14 23:12 joneschrisg

It took me a few tries, but I was able to reproduce this again. Just like in the original bug report, with

 -> orig_rax 0x60 != 0xb (replaying vs. recorded)

I see

 -> orig_rax 0xca != 0xb (replaying vs. recorded)

The syscall 0xb (11) is munmap() on x64. The only gecko modules that heavily use munmap, to my knowledge, are jemalloc and jseng. Both are pretty heavily multithreaded (or at least jseng alloc/dealloc used to be), so mis-recorded synchronization seems to be a possibility. I don't have a tremendous amount of time to spend on this but I'll see if something jumps out.

joneschrisg avatar Dec 19 '14 19:12 joneschrisg

There's a syscallbuf flush event just before the munmap. The buffered calls are

 -> Replaying buffered `poll' (ret:0) which does use desched event
 -> Replaying buffered `clock_gettime' (ret:0) which does not use desched event
 -> Replaying buffered `clock_gettime' (ret:0) which does not use desched event
 -> Replaying buffered `clock_gettime' (ret:0) which does not use desched event
 -> Replaying buffered `gettimeofday' (ret:0) which does not use desched event
 -> Replaying buffered `clock_gettime' (ret:0) which does not use desched event
 -> Replaying buffered `gettimeofday' (ret:0) which does not use desched event
 -> Replaying buffered `clock_gettime' (ret:0) which does not use desched event

so I'm afraid that's not too helpful. Nothing else jumps out. Smells like a userspace futex operation that's going haywire somehow.

joneschrisg avatar Dec 19 '14 20:12 joneschrisg

Actually this one seems to have been some days ago by 8a15f2acdd39d799001c98107e72c2bed4ec63e9 and is quite old, so I suggest to close this.

GitMensch avatar Jul 26 '21 08:07 GitMensch