Why is Lucene.Net.Store.Azure throwing an IndexNotFoundException ("no segments* file found in AzureDirectory") after modifying the index?
When I delete a document from a Lucene index, I get a "no segments* file found" exception, but only when the index has to be loaded from the data stored remotely in Azure blob storage. The local index updates fine and keeps working until the local copy is cleared; once the index has to be reloaded from blob storage, the error appears. I can reproduce this by modifying the index, deleting the local index immediately afterward, and then rereading the index so that it has to be downloaded from blob storage again. There are files in both the local and remote paths after the method below runs.
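The read side looks roughly like this (a simplified sketch rather than my exact code; the method name is made up and the connection string and paths are the same placeholders as in the delete method below):

    using Lucene.Net.Index;
    using Lucene.Net.Store;
    using Lucene.Net.Store.Azure;

    // Simplified sketch of the read side; connection string and paths are placeholders.
    private DirectoryReader OpenIndex()
    {
        string connectionString = "My connection string info.";
        string remotePath = "my/remote/path";
        string localPath = @"C:\my\local\path\";

        AzureDirectory folder = new AzureDirectory(connectionString, remotePath, FSDirectory.Open(localPath));

        // Throws IndexNotFoundException ("no segments* file found in AzureDirectory")
        // once the local cache is gone and the index has to come down from blob storage.
        // The caller disposes the returned reader (and the directory).
        return DirectoryReader.Open(folder);
    }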
Secondarily, I am also having an issue where the write locks on the local temp files do not seem to be released unless I make an explicit call to the garbage collector.
If anyone has any insight as to what I am doing wrong I would greatly appreciate it.
.NET Core 5.0, Lucene.Net 4.8.0-beta0016, Lucene.Net.QueryParser 4.8.0-beta16, Lucene.Net.Store.Azure 4.8.0-beta15
I tried reverting Lucene.Net and Lucene.Net.QueryParser to 4.8.0-beta15 to match Lucene.Net.Store.Azure but the results are still the same.
    private void IndexDocument(List<MyModel> toDelete)
    {
        string connectionString = "My connection string info.";
        string remotePath = "my/remote/path";
        List<string> excludes = toDelete.Select(o => o.Id).ToList();
        string localPath = @"C:\my\local\path\";

        AzureDirectory folder = new AzureDirectory(connectionString, remotePath, FSDirectory.Open(localPath));
        StandardAnalyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        IndexWriterConfig config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
        IndexWriter writer = new IndexWriter(folder, config); // Create writer

        foreach (var exclude in excludes)
        {
            Term term = new Term("MyId", exclude);
            writer.DeleteDocuments(term);
        }

        writer.Commit();

        folder.ClearLock("write.lock");
        writer.Dispose();
        folder.Dispose();

        // The locks on the local files do not get released until the garbage collector is called.
        // https://stackoverflow.com/questions/13262548/delete-a-file-being-used-by-another-process
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
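For what it's worth, here is the same delete-and-commit flow written with using blocks, which is how I would expect deterministic disposal to look without the explicit GC calls (same placeholder values and MyModel type as above; the method name is just for the example, and whether this alone releases the file locks is exactly what I'm unsure about):

    using System.Collections.Generic;
    using System.Linq;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Index;
    using Lucene.Net.Store;
    using Lucene.Net.Store.Azure;
    using Lucene.Net.Util;

    // Same flow, but with deterministic disposal instead of explicit GC calls.
    private void DeleteDocuments(List<MyModel> toDelete)
    {
        string connectionString = "My connection string info.";
        string remotePath = "my/remote/path";
        string localPath = @"C:\my\local\path\";

        using AzureDirectory folder = new AzureDirectory(connectionString, remotePath, FSDirectory.Open(localPath));
        using StandardAnalyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);

        IndexWriterConfig config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);

        using (IndexWriter writer = new IndexWriter(folder, config))
        {
            foreach (string id in toDelete.Select(o => o.Id))
            {
                writer.DeleteDocuments(new Term("MyId", id));
            }

            writer.Commit();
        } // writer.Dispose() runs here and releases the write lock

        // analyzer and folder are disposed at the end of the method by the using declarations.
    }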
We are having a similar-looking issue with missing segment exceptions when reloading from blob storage. After a morning and a lot of coffee, it looks like the result of public override string[] ListAll() (AzureDirectory.cs#L88) can in some cases be stale, which causes the AzureIndexInput constructor to throw an exception while it is trying to determine whether a blob has to be reloaded.
We have separate reader and writer roles (Azure Web Apps) that read from and write to the index once a minute. I don't think the one-minute timing itself is the cause, since running them on a different cadence could also produce a situation where ListAll gives a dirty read.
@tomlm As a fix, I would propose changing the else case to an else if (blob.Exists()) in AzureIndexInput#L42. This can leave the reader slightly out of date, but that seems like an acceptable trade-off.
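To make the race concrete outside of Lucene, here is a minimal sketch against the Azure.Storage.Blobs SDK (an assumption on my part about which blob client the library uses internally; the container name, connection string, and local path are placeholders). A listing taken at one moment can name a blob that the writer has already deleted by the time the reader fetches it, and guarding the fetch with blob.Exists() is the same trade-off as the proposed change:

    using System.IO;
    using Azure.Storage.Blobs;

    // Minimal sketch of the race, outside of Lucene. Assumes the Azure.Storage.Blobs SDK;
    // the container name, connection string, and local path are placeholders.
    static void SyncFromBlob(string connectionString, string localPath)
    {
        var container = new BlobContainerClient(connectionString, "my-container");

        foreach (var item in container.GetBlobs())      // analogous to AzureDirectory.ListAll()
        {
            BlobClient blob = container.GetBlobClient(item.Name);

            if (blob.Exists().Value)                    // the guard proposed above
            {
                blob.DownloadTo(Path.Combine(localPath, item.Name));
            }
            // else: the blob disappeared between the listing and the fetch;
            // fall back to the (possibly stale) local copy instead of throwing.
        }
    }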
Release 4.8.3-beta016 has updated logic around this.
I now handle blob-not-found exceptions correctly: the missing file is removed from the local cache and the error is propagated as a FileNotFoundException.
I think this will improve the situation.
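Roughly, the idea looks like this (a simplified illustration rather than the actual AzureDirectory source; it assumes the Azure.Storage.Blobs SDK, and the method and variable names are made up for the example):

    using System.IO;
    using Azure;
    using Azure.Storage.Blobs;

    // Simplified illustration of the idea, not the actual AzureDirectory source.
    // A missing blob is evicted from the local cache and surfaced to Lucene as the
    // FileNotFoundException it already knows how to handle.
    static void DownloadOrEvict(BlobClient blob, string localCachePath, string fileName)
    {
        try
        {
            blob.DownloadTo(localCachePath);
        }
        catch (RequestFailedException ex) when (ex.Status == 404)
        {
            if (File.Exists(localCachePath))
            {
                File.Delete(localCachePath); // drop the stale cached copy
            }

            throw new FileNotFoundException($"Blob '{fileName}' not found.", fileName, ex);
        }
    }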