ruby-spark
Encoding::UndefinedConversionError: "\x8B" from ASCII-8BIT to UTF-8
2.1.3 :036 > rdd2 = sc.parallelize(["jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj"])
Encoding::UndefinedConversionError: "\x8B" from ASCII-8BIT to UTF-8
from bundler/gems/ruby-spark-2287d5a71670/lib/spark/ext/io.rb:37:in `write'
from bundler/gems/ruby-spark-2287d5a71670/lib/spark/ext/io.rb:37:in `write_int'
from bundler/gems/ruby-spark-2287d5a71670/lib/spark/ext/io.rb:48:in `write_string'
from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:53:in `block in dump_to_io'
from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:51:in `each_slice'
from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:51:in `each'
from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:51:in `dump_to_io'
from bundler/gems/ruby-spark-2287d5a71670/lib/spark/context.rb:215:in `parallelize'
from (irb):36
from bundler/gems/railties-3.2.13/lib/rails/commands/console.rb:47:in `start'
from bundler/gems/railties-3.2.13/lib/rails/commands/console.rb:8:in `start'
from bundler/gems/railties-3.2.13/lib/rails/commands.rb:41:in `<top (required)>'
from script/rails:6:in `require'
from script/rails:6:in `<main>'
I also tried Oj as a serializer and got the same error. It seems to be coming from IO or StringIO.
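For anyone trying to narrow this down, here is a minimal sketch that reproduces the same exception class with plain Ruby, outside ruby-spark. It assumes that `write_int` in the backtrace writes an integer packed into raw bytes (the `"l>"` pack format below is a guess, not taken from the gem's source) and that the target IO has a UTF-8 external encoding, which may or may not match what happens inside a Rails console. `try_write` is a hypothetical helper used only for this illustration.

```ruby
require "tempfile"

# Pack 139 as a 4-byte big-endian integer. The last byte is 0x8B, and the
# resulting string is tagged ASCII-8BIT (binary), as in the error message.
packed = [139].pack("l>")

# Hypothetical helper: writes `bytes` to a temp file and returns :ok,
# or the exception class if the write fails with a conversion error.
def try_write(bytes, binary:)
  Tempfile.create("encoding-demo") do |io|
    if binary
      io.binmode                # binary mode: bytes are written untouched
    else
      io.set_encoding("UTF-8")  # text IO: Ruby tries to transcode to UTF-8
    end
    io.write(bytes)
    io.flush
    :ok
  end
rescue Encoding::UndefinedConversionError => e
  e.class
end

text_result = try_write(packed, binary: false)
binary_result = try_write(packed, binary: true)

puts "UTF-8 IO:  #{text_result}"
puts "binary IO: #{binary_result}"
```

If the UTF-8 case fails and the binary case succeeds on your machine, that would point at the encoding of the IO object being written to rather than at the serializer, which would fit with Marshal and Oj both hitting the same error.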
What serializer are you using?
@ondra-m I've tried marshal and oj
Getting the same error but with floats instead of strings. Tried both Marshal and oj. My data is kinda big, so it's hard to debug. Any ideas?
To reproduce:
sc.parallelize [LabeledPoint.new(1, [1,2]), LabeledPoint.new(3, [1,6])]
# => Encoding::UndefinedConversionError: "\xDF" from ASCII-8BIT to UTF-8
from /home/chang/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/packable-1.3.8/lib/packable/extensions/io.rb:62:in `write'
It works for me.
- What is your ruby-spark version (Spark::VERSION)? Mine is 1.2.1.
- Please post the full backtrace.