
Encoding::UndefinedConversionError: "\x8B" from ASCII-8BIT to UTF-8

Open alex-silentale opened this issue 10 years ago • 4 comments

2.1.3 :036 > rdd2 = sc.parallelize(["jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj"])
Encoding::UndefinedConversionError: "\x8B" from ASCII-8BIT to UTF-8
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/ext/io.rb:37:in `write'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/ext/io.rb:37:in `write_int'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/ext/io.rb:48:in `write_string'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:53:in `block in dump_to_io'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:51:in `each_slice'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:51:in `each'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/serializer/batched.rb:51:in `dump_to_io'
    from bundler/gems/ruby-spark-2287d5a71670/lib/spark/context.rb:215:in `parallelize'
    from (irb):36
    from bundler/gems/railties-3.2.13/lib/rails/commands/console.rb:47:in `start'
    from bundler/gems/railties-3.2.13/lib/rails/commands/console.rb:8:in `start'
    from  bundler/gems/railties-3.2.13/lib/rails/commands.rb:41:in `<top (required)>'
    from script/rails:6:in `require'
    from script/rails:6:in `<main>'

I also tried Oj as a serializer and got the same error. It seems to be coming from IO or StringIO.

alex-silentale avatar Dec 17 '15 15:12 alex-silentale

What serializer are you using?

ondra-m avatar Dec 17 '15 15:12 ondra-m

@ondra-m I've tried marshal and oj

alex-silentale avatar Jan 05 '16 16:01 alex-silentale

Getting the same error but with floats instead of strings. Tried both Marshal and oj. My data is kinda big, so it's hard to debug. Any ideas?

To reproduce:

sc.parallelize [LabeledPoint.new(1, [1,2]), LabeledPoint.new(3, [1,6])]
# => Encoding::UndefinedConversionError: "\xDF" from ASCII-8BIT to UTF-8
from /home/chang/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/packable-1.3.8/lib/packable/extensions/io.rb:62:in `write'

Coolnesss avatar Jan 12 '17 00:01 Coolnesss

It works for me.

  1. What is your ruby-spark version (Spark::VERSION)? Mine is 1.2.1.

  2. Please post the full backtrace.

ondra-m avatar Jan 14 '17 18:01 ondra-m
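
For anyone landing here with the same message: the sketch below shows one common way this class of error arises in plain Ruby, independent of ruby-spark. It assumes the failing `write` receives a packed binary (ASCII-8BIT) string while the target IO has UTF-8 as its external encoding. The thread does not confirm that this is what happens inside `io.rb`, so treat it as a possible explanation only. The payload value 139 (byte `0x8B`), the file name, and the helper `try_write` are hypothetical and chosen for illustration.

```ruby
require "tmpdir"

# Hypothetical payload: 139 packed as a 4-byte big-endian integer,
# which yields the bytes 00 00 00 8B in an ASCII-8BIT (binary) string.
payload = [139].pack("l>")

# Write data to path using the given open mode.
# Returns :ok on success, or the error class if transcoding fails.
def try_write(path, mode, data)
  File.open(path, mode) { |f| f.write(data) }
  :ok
rescue Encoding::UndefinedConversionError => e
  e.class
end

text_result = nil
binary_result = nil
written = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "payload.bin")
  # External encoding UTF-8: Ruby tries to transcode the binary string.
  text_result = try_write(path, "w:UTF-8", payload)
  # Binary mode: bytes are written unchanged, no transcoding.
  binary_result = try_write(path, "wb", payload)
  written = File.binread(path)
end

puts "payload encoding:  #{payload.encoding}"
puts "text-mode write:   #{text_result}"
puts "binary-mode write: #{binary_result}"
```

If this is the cause, putting the stream into binary mode (`io.binmode` or opening with `"wb"`) before writing packed data avoids the conversion. A Rails console typically sets `Encoding.default_internal` to UTF-8, which may explain why the error shows up there and not in a plain script.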