
An exception should be thrown when vacuum fails

Open scanry opened this issue 5 years ago • 3 comments

e.g.
DeltaTable deltaTable = DeltaTable.forPath(sparkSession, tablePath);
deltaTable.vacuum();

If the user does not have the required HDFS permissions, it is hard to see that the operation failed.

scanry avatar Jun 24 '20 00:06 scanry

Vacuum is currently a best-effort operation that tries to delete files but does not fail if a file fails to be deleted for some reason. Maybe it is too aggressive about ignoring errors.

Can you find out from the log4j logs what is the actual exception that is being ignored by vacuum?

tdas avatar Jun 24 '20 16:06 tdas

I faced the same situation on Delta 0.6.1 and could not find the actual exception in the driver logs. The exception appears to be swallowed in this method, which catches the IOException thrown by fs.delete(). https://github.com/delta-io/delta/blob/f587c1d117e9ef907a377604eb78b1e2c705ef6c/src/main/scala/org/apache/spark/sql/delta/util/DeltaFileOperations.scala#L228-L236
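The swallowing pattern described above can be illustrated with a small self-contained Java sketch. This is not the actual Delta code (which is Scala and calls Hadoop's `fs.delete`); the `delete` helper here is a hypothetical stand-in, but the catch block mirrors the problematic structure: since `AccessControlException` is a subclass of `IOException`, it is caught and dropped, and the caller only ever sees a smaller deletion count.

```java
import java.io.IOException;
import java.util.List;

public class SwallowSketch {
    // Hypothetical stand-in for fs.delete(path, false): fails for
    // paths the caller "lacks permission" on.
    static boolean delete(String path) throws IOException {
        if (path.startsWith("/protected")) {
            throw new IOException("Permission denied: " + path);
        }
        return true;
    }

    // Mirrors the best-effort loop: every IOException (including
    // AccessControlException, which extends it) is silently ignored.
    static long bestEffortDelete(List<String> paths) {
        long deleted = 0;
        for (String p : paths) {
            try {
                if (delete(p)) deleted++;
            } catch (IOException e) {
                // Exception dropped here: no log, no rethrow, so the
                // permission failure is invisible to the caller.
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        long n = bestEffortDelete(List.of("/data/a", "/protected/b", "/data/c"));
        System.out.println(n); // prints 2; the failed delete leaves no trace
    }
}
```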

Executing fs.delete() without catching IOException throws AccessControlException, with RemoteException as the root cause, as shown below (actual paths and user IDs blurred):

org.apache.hadoop.security.AccessControlException: Permission denied: user=<NotPermittedUser>, access=WRITE, inode="<path/i/wanted/to/vacuum>":<OwnerUser>:<OwnerUser>:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:216)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1706)
	at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:88)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3712)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1015)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:611)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1659)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
  at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2046)
  at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
  at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714)
  ... 48 elided
Caused by: org.apache.hadoop.ipc.RemoteException: Permission denied: user=<NotPermittedUser>, access=WRITE, inode="<path/i/wanted/to/vacuum>":<OwnerUser>:<OwnerUser>:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:216)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1706)
	at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:88)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3712)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1015)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:611)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1659)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

Both exceptions extend IOException.

magarage avatar Oct 09 '20 13:10 magarage

Although vacuum is a best-effort operation, silently dropping exceptions such as permission errors is a problem, and logging all of them is also not ideal. One improvement we can make is to have vacuum log the first IOException it ignores.
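The suggestion above can be sketched as follows. This is a hedged illustration, not the actual Delta implementation: the `delete` helper is hypothetical, and the real fix would log via log4j inside the Scala deletion loop. The idea is to keep the operation best-effort while remembering only the first ignored IOException, so the driver log shows a root cause without being flooded.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class FirstErrorSketch {
    // Hypothetical stand-in for fs.delete(path, false).
    static boolean delete(String path) throws IOException {
        if (path.startsWith("/protected")) {
            throw new IOException("Permission denied: " + path);
        }
        return true;
    }

    // Best-effort deletion that records the first ignored IOException
    // and surfaces it once after the loop completes.
    static long deleteLoggingFirstError(List<String> paths) {
        AtomicReference<IOException> firstError = new AtomicReference<>();
        long deleted = 0;
        for (String p : paths) {
            try {
                if (delete(p)) deleted++;
            } catch (IOException e) {
                firstError.compareAndSet(null, e); // keep only the first
            }
        }
        if (firstError.get() != null) {
            // In real code this would be a log4j WARN on the driver.
            System.err.println("vacuum ignored at least one IOException; first was: "
                    + firstError.get().getMessage());
        }
        return deleted;
    }

    public static void main(String[] args) {
        long n = deleteLoggingFirstError(
                List.of("/data/a", "/protected/b", "/protected/c"));
        System.out.println(n); // prints 1, plus one warning on stderr
    }
}
```

Recording only the first error keeps the log readable when thousands of files fail for the same underlying reason, which is typical for permission problems.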

zsxwing avatar May 24 '22 16:05 zsxwing

Not sure if this is what you meant, but here is a PR that will hopefully solve this: https://github.com/delta-io/delta/pull/1405

amirmor1 avatar Oct 02 '22 12:10 amirmor1