Security.xml throws exception with randomwalk
When running ./bin/rwalk Security.xml, or any other module that uses the security module, and exception will be thrown whenever the AuthenticateSystemWrong node is run.
2022-09-08T16:48:59,788 [testing.randomwalk.Framework] ERROR: Error during random walk
java.lang.Exception: Error running node AuthenticateSystemWrong
at org.apache.accumulo.testing.randomwalk.Module.visit(Module.java:369) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.testing.randomwalk.Framework.run(Framework.java:54) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.testing.randomwalk.Framework.main(Framework.java:98) ~[accumulo-testing-shaded.jar:?]
Caused by: org.apache.accumulo.core.client.AccumuloException: Authentication succeeded when it should have failed while the user exists? true
at org.apache.accumulo.testing.randomwalk.security.Authenticate.authenticate(Authenticate.java:101) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.testing.randomwalk.security.Authenticate.visit(Authenticate.java:40) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.testing.randomwalk.Module.lambda$visit$0(Module.java:301) ~[accumulo-testing-shaded.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-testing-shaded.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-testing-shaded.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-testing-shaded.jar:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
2022-09-08T16:48:59,801 [testing.randomwalk.Framework] INFO : Test finished
There is a log "Test finished" afterwards but it just hangs at this point and blocks.
This exception is thrown when the following case is statisfied:
https://github.com/apache/accumulo-testing/blob/ae9c46552a661c2dec132d26835cec5dd2ecbf66/src/main/java/org/apache/accumulo/testing/randomwalk/security/Authenticate.java#L83-L86
I'm not sure if there is an error in the logic that throws this exception or if there is a valid error occurring here.
I have not looked at this in a long time, so not completely sure the following explanation is completely correct.
I think the security RW test are broken because the assume that the security APIs are immediately consistent when they are actually eventually consistent. The security RW test may work better if run w/ a single tablet server but not sure.
I think when an Accumulo client does a security operation it picks a random tablet server to actually carry out that work. The tablet servers store the security information in Zookeeper and use ZooCache to keep a copy of it. So when one tserver does a security operation it will eventually cause the zoocaches of other tablet servers to update. That could lead to the following
- Client A randomly contacts tserver 1 to update some security information
- Client A randomly contacts tserver 2 to read the security information it just updated. Because ZK has not propagated the update yet to tserver 2, Client A does not see the update.
Again not sure if my recollection matches the current code.
The security RW test may work better if run w/ a single tablet server but not sure.
When running with a single tserver it seems like this exception gets thrown just as often.
I see some inconsistency when running the security module. At times my error is just what you described above. Other times I get a different error followed by the one above. And rarely, I get a completely different error. I will post the logs below.
This one I got followed then by the one posted above. This usually happened on a second run through and it seemed the module got further along
2022-09-10T14:44:36,497 [randomwalk.security.TableOp] INFO : Caught MutationsRejectedException(PERMISSION_DENIED) in TableOp.WRITE as expected.
2022-09-10T14:44:36,497 [core.clientImpl.TabletServerBatchWriter] ERROR: Failed to send tablet server groot:9997 its batch : Error PERMISSION_DENIED for user table_groot - User does not have permission to perform this action
org.apache.accumulo.core.client.AccumuloSecurityException: Error PERMISSION_DENIED for user table_groot - User does not have permission to perform this action
at org.apache.accumulo.core.clientImpl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:1015) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.core.clientImpl.TabletServerBatchWriter$MutationWriter$SendTask.send(TabletServerBatchWriter.java:858) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.core.clientImpl.TabletServerBatchWriter$MutationWriter$SendTask.run(TabletServerBatchWriter.java:816) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-testing-shaded.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-testing-shaded.jar:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: org.apache.accumulo.core.clientImpl.thrift.ThriftSecurityException
at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$update_result$update_resultStandardScheme.read(TabletClientService.java:9080) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$update_result$update_resultStandardScheme.read(TabletClientService.java:9066) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$update_result.read(TabletClientService.java:9000) ~[accumulo-testing-shaded.jar:?]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_update(TabletClientService.java:280) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.update(TabletClientService.java:263) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.core.clientImpl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:931) ~[accumulo-testing-shaded.jar:?]
... 7 more
This one I got rarely and by itself.
22-09-10T14:46:32,567 [testing.randomwalk.Framework] ERROR: Error during random walk
java.lang.Exception: Error running node security.TableOp
at org.apache.accumulo.testing.randomwalk.Module.visit(Module.java:364) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.testing.randomwalk.Framework.run(Framework.java:54) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.testing.randomwalk.Framework.main(Framework.java:98) ~[accumulo-testing-shaded.jar:?]
Caused by: java.lang.UnsupportedOperationException: GET_SUMMARIES not implemented
at org.apache.accumulo.testing.randomwalk.security.TableOp.visit(TableOp.java:296) ~[accumulo-testing-shaded.jar:?]
at org.apache.accumulo.testing.randomwalk.Module.lambda$visit$0(Module.java:299) ~[accumulo-testing-shaded.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-testing-shaded.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-testing-shaded.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) ~[accumulo-testing-shaded.jar:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
2022-09-10T14:46:32,577 [testing.randomwalk.Framework] INFO : Test finished
More testing needs to be done, but my current guess is the initial error is caused by the user not being dropped as it is supposed to be before the authentication (as seen in the xml). This happens in the NoUserNoTable node (the first large node). Though the error isn't very clear there.
I think the security RW test are broken
I agree. I dropped the test from the other modules in https://github.com/apache/accumulo-testing/pull/221
I narrowed down the line of code that causes the module to fail. It fails on the expected authenticate fail case which all it does is increment the password passed into authenticateUser which should cause the check to fail. Though at times once it gets to https://github.com/apache/accumulo/blob/cf57e343de77f416de10c22e0ed05ef37f433b36/server/base/src/main/java/org/apache/accumulo/server/security/handler/ZKAuthenticator.java#L231, which verifies the cryptPass, it passes.
That function compares the crypt hash with the new password token that is passed in from the module. Based on the password incrementation, it should return false however, it seems a decent amount of times it returns true instead. Here: https://github.com/apache/accumulo/blob/cf57e343de77f416de10c22e0ed05ef37f433b36/server/base/src/main/java/org/apache/accumulo/server/security/handler/ZKSecurityTool.java#L134-L136. If this returns true, then the entire module returns true when it is expected to be false.
Not quite sure why this is occurring since it is seemingly random on when the module fails or when it passes but is consistent on which point it fails at.
This can happen just as often with a single tserver because the manager can also be selected to perform the security operation, so the tserver is still only eventually consistent with the ZooKeeper quorum that was updated by the manager.