php-rdkafka

Reading messages from a C# producer

Open dobryakov opened this issue 5 years ago • 10 comments

Hello, we have a problem with decoding messages containing Decimal values (in terms of C# data types).

For example, when the producer (a C# application) sends us a value like "1234.56", we receive the message and get something like "\000\000\000\000�Ĥ���]Hf47b6ffc" in the PHP data.

Any idea how to solve this?

dobryakov avatar Dec 14 '20 13:12 dobryakov

Does the rest of the data look fine?

nick-zh avatar Dec 14 '20 14:12 nick-zh

> Does the rest of the data look fine?

In most cases, no :( The rest of the data looks like �Ĥ��� until the end of the body.

dobryakov avatar Dec 14 '20 14:12 dobryakov

Are you maybe using compression in C# and did not configure it on the PHP side?

nick-zh avatar Dec 14 '20 14:12 nick-zh

> Are you maybe using compression in C# and did not configure it on the PHP side?

Compression? Hmm.. Could you please point me to the manuals describing this?

dobryakov avatar Dec 14 '20 14:12 dobryakov

php-rdkafka is basically just a wrapper around librdkafka; the configuration settings can be found here: https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md. For compression, you can set compression.codec. It looks like your C# producer is using it.

nick-zh avatar Dec 14 '20 14:12 nick-zh

Unfortunately, no. As far as we have investigated, in the best case the library gives us the decimal value as a set of bytes. It looks like { value: "�Ĥ���" } in JSON. The only working solution at the moment is calling hexdec(bin2hex(message['value'])), and you need to know the decimal point position (AKA the "exponent"). Typically it is described in the Avro schema, and the deserializer should ask the Schema Registry for it. But in this case we need to do it manually, for example converting $1234567 to $12345.67. Very strange. I will continue to investigate tomorrow and come back to you with details.

dobryakov avatar Dec 15 '20 15:12 dobryakov
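
For reference, the manual conversion described in the comment above can be sketched without hexdec(bin2hex(...)), which ignores the sign and may return a float for large values. The sketch assumes the field uses Avro's decimal logical type, where the bytes hold the unscaled integer as big-endian two's complement and the scale (the "exponent" above) comes from the schema. decodeAvroDecimal is a hypothetical helper, not part of php-rdkafka or php-kafka-lib, and it only handles values that fit in a native PHP integer:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn the raw bytes of an Avro "decimal" value into a
 * decimal string. Assumes big-endian two's complement bytes and a scale that
 * the caller has read from the schema.
 */
function decodeAvroDecimal(string $bytes, int $scale): string
{
    $length = strlen($bytes);
    if ($length === 0 || $length > PHP_INT_SIZE) {
        throw new InvalidArgumentException('Expected 1 to ' . PHP_INT_SIZE . ' bytes');
    }
    if ($scale < 0) {
        throw new InvalidArgumentException('Scale must not be negative');
    }

    // Sign-extend: start with all bits set when the top bit of the first byte is set.
    $unscaled = (ord($bytes[0]) & 0x80) !== 0 ? -1 : 0;
    for ($i = 0; $i < $length; $i++) {
        $unscaled = ($unscaled << 8) | ord($bytes[$i]);
    }

    // Work on the digit string so no float arithmetic is involved.
    $sign = $unscaled < 0 ? '-' : '';
    $digits = ltrim((string) $unscaled, '-');
    if ($scale === 0) {
        return $sign . $digits;
    }

    // Left-pad so that at least one digit remains before the decimal point.
    $digits = str_pad($digits, $scale + 1, '0', STR_PAD_LEFT);

    return $sign . substr($digits, 0, -$scale) . '.' . substr($digits, -$scale);
}

// 0x01E240 is 123456; with scale 2 that is 1234.56.
echo decodeAvroDecimal("\x01\xE2\x40", 2), PHP_EOL; // 1234.56
// 0xFE1DC0 is -123456 in three-byte two's complement.
echo decodeAvroDecimal("\xFE\x1D\xC0", 2), PHP_EOL; // -1234.56
```

Returning a string rather than a float keeps the value exact. Longer byte sequences would need an arbitrary-precision extension such as GMP or BCMath, which this sketch deliberately avoids.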

I see the official C# client is also just a wrapper for librdkafka, so I am sure we can get it to work. Do you use Avro schemas for producing messages?

nick-zh avatar Dec 15 '20 21:12 nick-zh
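
The payload quoted in the first comment starts with \000 bytes, which would be consistent with the Confluent Schema Registry wire format: one magic byte (0), a 4-byte big-endian schema ID, then the Avro-encoded body. Whether the C# producer in this thread actually uses that framing is an assumption. A minimal sketch for checking it, where splitConfluentFrame is a hypothetical helper name:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split a message payload into schema ID and Avro body,
 * assuming the Confluent wire format (magic byte 0 + 4-byte big-endian ID).
 */
function splitConfluentFrame(string $payload): array
{
    if (strlen($payload) < 5 || $payload[0] !== "\x00") {
        throw new InvalidArgumentException('Payload does not look like a Confluent-framed message');
    }

    // C = unsigned byte, N = unsigned 32-bit big-endian integer.
    $header = unpack('Cmagic/NschemaId', substr($payload, 0, 5));

    return [
        'schemaId' => $header['schemaId'],
        'body'     => substr($payload, 5),
    ];
}

// Made-up example payload: schema ID 42 followed by three body bytes.
$frame = splitConfluentFrame("\x00\x00\x00\x00\x2A" . "abc");
echo $frame['schemaId'], PHP_EOL; // 42
echo $frame['body'], PHP_EOL;     // abc
```

If the header parses, the schema ID can be used to fetch the writer schema from the Schema Registry, and the remaining bytes are what an Avro decoder should be given.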

I think it's just the fact that C# in this case stores the number in a binary format, which is not something PHP can recover on its own (@dobryakov even mentioned that the exponent has to be known).

I think your library @nick-zh (https://github.com/jobcloud/php-kafka-lib) supports Avro schemas, so it might work in this case?

Steveb-p avatar Dec 16 '20 06:12 Steveb-p

@nick-zh @Steveb-p We are using the jobcloud library too. It doesn't work out of the box at the moment. We use the hack with hexdec(bin2hex()) and convert the values manually ($1234567 to $12345.67, using the exponent that we read off the Avro schema by eye). I would be happy if someone improved it to do this conversion out of the box, including the exponent :)

dobryakov avatar Dec 16 '20 07:12 dobryakov

Thx guys for the feedback. I will have some time over the holidays, but there are a lot of topics I want to tackle, so I can't promise anything. But I will try to find out where stuff is being done differently and whether we can recover from it. If it is something that the C# client does differently from the other official clients, I think we won't fix it. We are using a Java producer as well in one of our projects; I will check if we have a similar issue there (not sure if we use doubles there). @dobryakov if you can provide more input (like a sample schema and a sample producer to reproduce it), that would be great and would help me progress faster on this :v: Are you using logical types in Avro?

nick-zh avatar Dec 16 '20 08:12 nick-zh