Server crashes with StringIndexOutOfBoundsException when processing Macedonian (mk) templates using 'Шаблон:' namespace
Describe the bug
Server crashes during startup with StringIndexOutOfBoundsException when processing Macedonian (mk) language template redirects. Many Macedonian Wikipedia templates use 'Шаблон:' (Cyrillic for "Template") instead of the expected 'Предлошка:' namespace prefix, causing the substring operation to fail with index -1.
The crash occurs at:
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1931)
at org.dbpedia.extraction.server.stats.MappingStatsHolder$$anonfun$1.apply(MappingStatsHolder.scala:54)
More than 100 mk templates are affected, including:
- Шаблон:Инфокутија Верски објект
- Шаблон:2TeamBracket
- Шаблон:Инфокутија Православна црква
- Шаблон:Оклопно возило
- And many more...
Expected behaviour
The server should:
- Handle alternative namespace prefixes for mk language (recognizing both 'Предлошка:' and 'Шаблон:')
- Log warnings and skip invalid templates without crashing (PR #795 provides a temporary fix for this)
- Successfully start and process mk language templates without throwing exceptions
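The first two behaviours above can be sketched as follows. This is a minimal illustration, not the code in MappingStatsHolder.scala: the object name TemplatePrefixSketch, the helpers stripPrefix and stripAll, and the hardcoded mkPrefixes list are all hypothetical.

```scala
import java.util.logging.Logger

object TemplatePrefixSketch {
  private val logger = Logger.getLogger(getClass.getName)

  // Assumption: the two prefixes named in this report, hardcoded for illustration.
  val mkPrefixes: Seq[String] = Seq("Предлошка:", "Шаблон:")

  /** Returns the title without its namespace prefix, or None if no known prefix matches. */
  def stripPrefix(title: String, prefixes: Seq[String]): Option[String] =
    prefixes.find(p => title.startsWith(p)).map(p => title.substring(p.length))

  /** Keeps the titles that carry a known prefix; logs a warning for the rest and skips them. */
  def stripAll(titles: Seq[String], prefixes: Seq[String]): Seq[String] =
    titles.flatMap { title =>
      val stripped = stripPrefix(title, prefixes)
      if (stripped.isEmpty)
        logger.warning(s"Skipping template '$title': no known namespace prefix")
      stripped.toList
    }
}
```

Because substring is only called after startsWith has succeeded, the index is always within bounds, so an unexpected title produces a warning rather than an exception.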
Environment
- Extraction (commit hash): 5eb208b932a63a6f0cd5cbede3e446315686e6a7 (enable-wikidata-server branch)
- OS: Linux 6.14.0-33-generic (Ubuntu)
- Java SDK Version (java --version): 1.8.0_462 (OpenJDK)
- Maven version (mvn --version): Apache Maven 3.8.7
To reproduce
- Enable Macedonian (mk) language in server.default.properties with @mappings or any configuration
- Start the DBpedia extraction server: cd server && ../run server
- Server attempts to load mk template statistics
- Server crashes with StringIndexOutOfBoundsException during MappingStatsHolder initialization
Additional context & logs
The root cause is in MappingStatsHolder.scala:54 where the code expects all templates to start with 'Предлошка:' but many mk templates use 'Шаблон:'.
Related:
- PR #795 fixes the crash by adding validation but doesn't address the namespace mismatch
- This may require updating the mk language configuration to recognize 'Шаблон:' as a valid template namespace
Full stack trace:
org.dbpedia.extraction.server.stats.MappingStatsHolder$$anonfun$apply$2 apply
WARNING: mk template 'Шаблон:Инфокутија Верски објект' does not start with 'Предлошка:'
[... 100+ similar warnings ...]
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1931)
at org.dbpedia.extraction.server.stats.MappingStatsHolder$$anonfun$1.apply(MappingStatsHolder.scala:54)
at org.dbpedia.extraction.server.stats.MappingStatsHolder$$anonfun$1.apply(MappingStatsHolder.scala:54)
at scala.collection.MapLike$FilteredKeys$$anonfun$foreach$1.apply(MapLike.scala:231)
[...]
Thanks for the report — this reproduces locally. The root cause is that MappingStatsHolder assumed templates start with 'Предлошка:'; many mk templates use 'Шаблон:' so substring(...) fails.
I’m preparing a small fix that:
- accepts both 'Предлошка:' and 'Шаблон:' for mk,
- logs and skips unexpected templates instead of crashing,
- adds tests.
If maintainers prefer a config-driven approach, I can read valid template prefixes from the language config instead of hardcoding. Please assign this to me if you want me to proceed with a PR.
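A config-driven variant might look roughly like the sketch below. The map templatePrefixes stands in for whatever the language configuration would provide; it, the object name PrefixConfigSketch, and both helper functions are hypothetical and not part of the existing codebase.

```scala
object PrefixConfigSketch {
  // Hypothetical per-language configuration. A real implementation would read
  // these values from the project's own language or namespace settings.
  val templatePrefixes: Map[String, Seq[String]] = Map(
    "mk" -> Seq("Предлошка:", "Шаблон:"),
    "en" -> Seq("Template:")
  )

  /** Prefixes configured for a language code, or an empty list if none are configured. */
  def prefixesFor(languageCode: String): Seq[String] =
    templatePrefixes.getOrElse(languageCode, Seq.empty)

  /** True if the title starts with any prefix configured for the language. */
  def isTemplateTitle(languageCode: String, title: String): Boolean =
    prefixesFor(languageCode).exists(p => title.startsWith(p))
}
```

With this shape, supporting another alias for a language is a data change rather than a code change.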
Greetings @haniyakonain, I am interested in working on this issue. Could you please assign it to me? Thank you.
Thanks for the interest! @arnavsharma990 and @DhanashreePetare feel free to coordinate on this. @arnavsharma990's approach of accepting both 'Предлошка:' and 'Шаблон:' for mk with proper logging sounds good. Looking forward to seeing a PR from whoever gets to it first!
Hello @haniyakonain. I took the following approach to solve the issue:
- Query ALL valid template prefixes from the Namespaces configuration
- Match templates against ANY valid prefix instead of just one
- Apply the same logic to redirect processing
This prevents the Macedonian crashes (‘Предлошка:’/‘Шаблон:’) and stays compatible with languages that have a single prefix. I would appreciate a review. Please let me know if any further changes are needed. Thank you.
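Applied to redirect processing, the same idea could be sketched as below. This is not the code from the PR: it assumes redirects are held as a map from redirect title to target title, and the names RedirectPrefixSketch, strip, and cleanRedirects are hypothetical.

```scala
object RedirectPrefixSketch {
  /** Strips the first matching prefix from the title, or returns None if none matches. */
  def strip(title: String, prefixes: Seq[String]): Option[String] =
    prefixes.collectFirst { case p if title.startsWith(p) => title.substring(p.length) }

  /** Strips prefixes from both sides of each redirect; drops pairs where either side has no known prefix. */
  def cleanRedirects(redirects: Map[String, String], prefixes: Seq[String]): Map[String, String] =
    redirects.toSeq.flatMap { case (from, to) =>
      val pair = for {
        f <- strip(from, prefixes)
        t <- strip(to, prefixes)
      } yield f -> t
      pair.toList
    }.toMap
}
```

A language with a single prefix simply passes a one-element list, so the behaviour there is unchanged.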
Hi @DhanashreePetare, Thanks for working on this! I noticed the PR includes many unintended file deletions (1,900+ files). Could you please update it to include only the changes to MappingStatsHolder.scala? Thanks!
Thank you for the feedback! I tried to fix this. The deleted files had been pulled into my work, so my first PR included them. In the second PR I committed only the one changed file on a clean branch, but when I try to resolve conflicts I still run into file path conflicts caused by Windows limitations with some test files (colons in filenames like Lexeme:L11/wiki.xml.bz2).
My actual fix is very clean: just the changes to MappingStatsHolder.scala. Is there any way around this? Would you be able to cherry-pick commit cd3c26b03 from the fix/issue-804-clean branch? That contains only the fix without any other changes.
Or if you can suggest a fix for this, that would be great.
@DhanashreePetare, thanks for working on this.
The Windows path limitation with : filenames is causing the conflicts, and you’ll need to resolve that on your side (e.g. via WSL/Linux or a clean branch with only the intended change).
For now, I’m merging a temporary fix to prevent the mk crash so the server stays operational. Proper mk namespace handling (Предлошка: / Шаблон:) is still needed, and I’ll be happy to review a clean PR for that later.
@haniyakonain and @jimkont, I will resolve the deleted files issue by using WSL and submit a PR with the changes shortly.