Support for Windows-1252/CP1252 database encoding - Java and Node.js encoding mismatch
🚨 Problem
When connecting to databases that use Windows-1252 encoding (common in legacy systems), special characters (accents, cedillas, etc.) are not handled correctly. The library currently forces UTF-8 encoding on the Java bridge, but doesn't provide proper configuration for databases using different character encodings.
🔍 Root Cause
The issue occurs because there's an encoding mismatch between:
- Java side: Currently hardcoded or defaults to UTF-8
- Node.js side: Uses the encoding parameter for both reading stdout and writing to stdin
When the database uses Windows-1252 encoding, the Java bridge should use Cp1252 to properly communicate with the database, while Node.js should use latin1 (the closest compatible encoding) for the IPC communication.
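The mismatch can be reproduced without a database. The snippet below is a minimal sketch using only Node's built-in Buffer; the byte values are the standard Windows-1252 codes for "é", "ç" and "€", and the variable names are illustrative.

```javascript
// Bytes for "é" and "ç" as a Windows-1252 source would emit them.
const cp1252Bytes = Buffer.from([0xe9, 0xe7]);

// Decoded as UTF-8, these bytes are invalid sequences, so Node substitutes
// U+FFFD (the replacement character) and the accents are lost.
const asUtf8 = cp1252Bytes.toString('utf8');

// Decoded as latin1, each byte maps to the code point with the same value,
// so the accented characters survive.
const asLatin1 = cp1252Bytes.toString('latin1');

// Caveat: latin1 and Windows-1252 differ in the 0x80-0x9F range.
// 0x80 is "€" in Windows-1252 but decodes to the control character U+0080.
const euroAsLatin1 = Buffer.from([0x80]).toString('latin1');

console.log(JSON.stringify(asUtf8));
console.log(JSON.stringify(asLatin1));
console.log(euroAsLatin1 === '\u20ac'); // false
```

For typical accented text latin1 is sufficient, but characters such as "€" or curly quotes may still need extra handling.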
✅ Current Workaround
We successfully resolved this by patching the library to:
- Set Java encoding: "-Dfile.encoding=Cp1252"
- Configure Node.js encoding: latin1
The data flow becomes: Database (Windows-1252) ↔ Java Bridge (Cp1252) ↔ Node.js (latin1)
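Roughly, the patch amounts to something like the sketch below. The helper names (buildJavaArgs, startBridge) are hypothetical, and the sketch assumes the bridge is launched with java -jar followed by the connection arguments; the actual code in src/SybaseDB.js may differ.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: builds the JVM argument list for the bridge process.
// JVM options such as -Dfile.encoding must come before -jar.
function buildJavaArgs(pathToJavaBridge, javaEncoding, connectionArgs) {
  return [`-Dfile.encoding=${javaEncoding}`, '-jar', pathToJavaBridge, ...connectionArgs];
}

// Hypothetical helper: starts the bridge and applies the Node.js-side encoding
// to both directions of the IPC channel.
function startBridge(pathToJavaBridge, connectionArgs, nodeEncoding, javaEncoding) {
  const child = spawn('java', buildJavaArgs(pathToJavaBridge, javaEncoding, connectionArgs));
  child.stdout.setEncoding(nodeEncoding);        // decode bridge output
  child.stdin.setDefaultEncoding(nodeEncoding);  // encode queries written to the bridge
  return child;
}
```

With this shape, the workaround corresponds to calling startBridge(path, args, 'latin1', 'Cp1252').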
💡 Proposed Solution
The library could be enhanced to support separate encoding configurations:
new Sybase(host, port, dbname, username, password, logTiming, pathToJavaBridge, {
  encoding: 'latin1',     // Node.js encoding for IPC
  javaEncoding: 'Cp1252', // Java encoding for database communication
  extraLogs: false
});
This would allow proper handling of different database encodings while maintaining backward compatibility.
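One way to keep existing callers working is to fall back to UTF-8 on both sides when the new option is omitted. The sketch below is only an illustration: resolveOptions and DEFAULT_OPTIONS are hypothetical names, and the default values are assumed from the UTF-8 behaviour described above.

```javascript
// Hypothetical defaults, assumed to mirror the current UTF-8 behaviour.
const DEFAULT_OPTIONS = { encoding: 'utf8', javaEncoding: 'UTF-8', extraLogs: false };

// Merges caller options over the defaults and rejects encodings that
// Node.js cannot use for stream I/O.
function resolveOptions(options = {}) {
  const resolved = { ...DEFAULT_OPTIONS, ...options };
  if (!Buffer.isEncoding(resolved.encoding)) {
    throw new Error(`Unsupported Node.js encoding: ${resolved.encoding}`);
  }
  return resolved;
}

console.log(resolveOptions({ encoding: 'latin1', javaEncoding: 'Cp1252' }));
```

Validating only the Node.js side is a deliberate simplification here: Buffer.isEncoding covers the encoding option, while the javaEncoding value would be checked by the JVM at startup.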
🏢 Example Use Case
Legacy systems using Windows-1252 encoding (common in enterprise environments) where proper character encoding is critical for data integrity.
📁 Files Affected
- src/SybaseDB.js: Lines around spawn() call and stdin.write()