File Tail: log file is moved to a history directory; how do I configure the file generation and archive strategies?
I collect an nginx log file at "/data/logs/nginx/xyz.log". Every day the log is moved and compressed to /data/logs/nginx/2017/05/xyz.log.tar.gz, and a new /data/logs/nginx/xyz.log is created. How should I configure the file generation and archive strategies? I tried the default "Active File with Reverse Counter" naming type, but I found that StreamSets tries to collect /data/logs/nginx/2017, which is a directory.
I'm not clear on what you want SDC to do. Do you want it to read the xyz.log.tar.gz file each day?
No, I only need to collect the current xyz.log. But if I move the log into a subdirectory of the same directory, then when the multi-file reader refreshes the offset and recomputes the header hash, a java.io.FileNotFoundException occurs, because that subdirectory is not excluded from the scan.
I think refresh() in LiveFile.java should be changed to this:
    if (changed) {
      try (DirectoryStream<Path> directoryStream = Files.newDirectoryStream(path.getParent())) {
        for (Path path : directoryStream) {
          // skip subdirectories such as /data/logs/nginx/2017
          if (!path.toFile().isDirectory()) {
            BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
            String iNode = attrs.fileKey().toString();
            int headLen = (int) Math.min(this.headLen, attrs.size());
            String headHash = computeHash(path, headLen);
            if (iNode.equals(this.iNode) && headHash.equals(this.headHash)) {
              if (headLen == 0) {
                headLen = (int) Math.min(HEAD_LEN, attrs.size());
                // read the file header content and compute its MD5 as the hash value
                headHash = computeHash(path, headLen);
              }
              refresh = new LiveFile(path, iNode, headHash, headLen);
              break;
            }
          }
        }
      }
    }
If File Tail is used to collect logs, no subdirectory may exist in the log directory; otherwise the LiveFile refresh throws an exception. In real production environments, though, logs are routinely compressed and moved into subdirectories. So I think that if the file has been renamed within the same directory, refresh should pick up the renamed file, but if the file has been deleted or moved away, refresh should return null.
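To illustrate the behaviour I am proposing outside of SDC, here is a minimal standalone sketch. It is not the actual LiveFile code: the class name RenamedFileLookup and the method findByFileKey are made up for this example, and it matches on the file key only (the header-hash check is left out). It also assumes a platform where BasicFileAttributes.fileKey() is non-null, such as Linux.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not part of StreamSets Data Collector.
final class RenamedFileLookup {

  /**
   * Scans the direct children of dir for a regular file whose file key
   * (the inode on Unix-like systems) equals expectedKey.
   * Subdirectories are skipped. Returns null when nothing matches, which is
   * the case when the file was deleted or moved out of dir.
   */
  static Path findByFileKey(Path dir, String expectedKey) throws IOException {
    try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
      for (Path child : children) {
        BasicFileAttributes attrs;
        try {
          attrs = Files.readAttributes(child, BasicFileAttributes.class);
        } catch (NoSuchFileException e) {
          // the entry disappeared between listing and reading; ignore it
          continue;
        }
        if (!attrs.isRegularFile()) {
          // skip directories such as an archive subdirectory
          continue;
        }
        Object key = attrs.fileKey(); // may be null on some platforms
        if (key != null && key.toString().equals(expectedKey)) {
          return child;
        }
      }
    }
    return null;
  }
}
```

With this shape, a file renamed inside the log directory is still found under its new name, while a file archived into a subdirectory yields null instead of an exception.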
@sumpan Have you tried the above fix? Is it working for you?
Yes, I changed some code and it works fine. Here's the pull request: https://github.com/streamsets/datacollector/pull/27