Unexpected behaviour of print(char, HEX) and print(int, HEX)
char c = 255;
Serial.print(c, HEX);
produces
FFFFFFFF
This happens because print(char c, int base) converts char c (1 byte) into an unsigned long (4 bytes). Since char is signed, 255 is in fact -1 (two's complement); once written in 4 bytes it becomes FFFFFFFF. As a user, I would rather have FF than FFFFFFFF.
The problem exists with int (2 bytes) too.
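The widening described above can be reproduced off-target. The following is only a sketch: the helper names are hypothetical, and fixed-width types stand in for AVR's 1-byte char, 2-byte int and 4-byte long. It is not the code in Print.cpp.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of the Arduino core).
// Each one widens a signed value to 32 bits first (sign extension) and
// only then reinterprets the result as unsigned, which is the order of
// operations the report above describes.
inline uint32_t widenSigned8(int8_t v) {
    return static_cast<uint32_t>(static_cast<int32_t>(v));
}

inline uint32_t widenSigned16(int16_t v) {
    return static_cast<uint32_t>(static_cast<int32_t>(v));
}
```

With these helpers, -1 stored in either width comes out as 0xFFFFFFFF, while 127 stays 0x7F because its sign bit is clear.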
Here is a more complete example:
void setup() {
  Serial.begin(9600);
  int i = 1;
  Serial.print("i: ");
  Serial.print(i, 10);
  Serial.print(" | ");
  Serial.println(i, HEX);
  i = -1;
  Serial.print("i: ");
  Serial.print(i, 10);
  Serial.print(" | ");
  Serial.println(i, HEX);
  char c = 127;
  Serial.print("c: ");
  Serial.print(c, 10);
  Serial.print(" | ");
  Serial.println(c, HEX);
  c = 255;
  Serial.print("c: ");
  Serial.print(c, 10);
  Serial.print(" | ");
  Serial.println(c, HEX);
}
void loop() {}
output
i: 1 | 1
i: -1 | FFFFFFFF
c: 127 | 7F
c: -1 | FFFFFFFF
where I would rather have
i: 1 | 1
i: -1 | FFFF
c: 127 | 7F
c: -1 | FF
One could argue that a hexadecimal print of a negative number isn't well defined at all; IIRC only decimal (base 10) printing currently supports negative numbers. Still, I guess this is something that can be fixed: if the current behaviour is not well defined, we might as well improve it. I'll have a look at your PR next.
Your code is broken from the start, is it not?
char c = 255; ??
char a = 255;
unsigned char b = 255;
void setup() {
  Serial.begin( 9600 );
  Serial.println( a, HEX );
  Serial.println( b, HEX );
}
void loop() {}
Output:
FFFFFFFF
FF
@matthijskooijman This is well defined behavior also.
Since a holds -1 as a signed char, the unsigned long receives the bit pattern that represents the same number as if it were signed (all 1s).
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer.
To maintain the -1 value, the sign bit needs to be extended across the whole variable. The bit pattern below the sign bit does not change. The value of the bit pattern only differs when it is read as unsigned (after conversion it obviously represents a positive value, not a negative one).
When using b (already unsigned), there is no change as there is no sign bit.
@Chris--A, I believe all current behaviour is well-defined, just not what the original poster expected (and, IMHO, his expectations do not seem unreasonable).
The main reason for this is that printing things in HEX (or any other non-decimal base) forces the value to be interpreted as an unsigned value. However, currently a value is first sign-extended and then interpreted / printed as an unsigned value, while the original reporter expected his original value to be directly interpreted as unsigned instead. In effect, this means that when printing signed values in base 10, they should be sign-extended, when using any other base, they should be zero-extended. This is also the approach the pull request ended up at.
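The rule in the paragraph above (sign-extend for base 10, zero-extend for every other base) could be sketched roughly as follows. This is not the pull request's code: formatUnsigned and formatSigned are hypothetical names, std::string is used instead of the Print class so the sketch runs off-target, and fixed-width types model AVR's 1-byte char and 2-byte int.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical helper: render an unsigned value in the given base (2..36).
// Any other base falls back to decimal.
inline std::string formatUnsigned(uint32_t n, int base) {
    const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if (base < 2 || base > 36) base = 10;
    const uint32_t b = static_cast<uint32_t>(base);
    std::string out;
    do {
        out.insert(out.begin(), digits[n % b]);  // prepend the next digit
        n /= b;
    } while (n != 0);
    return out;
}

// Hypothetical helper for signed types of up to 32 bits.
template <typename T>
std::string formatSigned(T v, int base) {
    using U = typename std::make_unsigned<T>::type;
    if (base == 10) {
        // Decimal keeps the sign: widen (sign-extend), then print magnitude.
        int64_t wide = v;
        if (wide < 0) {
            return "-" + formatUnsigned(static_cast<uint32_t>(-wide), 10);
        }
        return formatUnsigned(static_cast<uint32_t>(wide), 10);
    }
    // Other bases: reinterpret at the original width, so widening to
    // 32 bits zero-extends instead of sign-extending.
    return formatUnsigned(static_cast<U>(v), base);
}
```

Under these assumptions, formatSigned<int16_t>(-1, 16) gives FFFF and formatSigned<int8_t>(-1, 16) gives FF, matching the output the original poster asked for, while base 10 still yields -1.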
Considering this again, I'm not sure if it would be a good idea to actually make this change. It adds some overhead (both runtime and code size) for a corner case that is not used often. If someone really needs this behaviour, it would be just as easy to simply cast the signed value to an unsigned value of the same size before printing, making the intended behaviour explicit. I even believe that this will result in more efficient code than handling this inside the print functions as well.
@Chris--A To be clear, char c = 255; was just a way to assign char c = -1;. My problem was only that additional bytes appear. This can be seen as inconvenient.
@matthijskooijman I added a last commit in order to bring the code closer to what you first suggested (without a bit mask). The cost is still the same, though. I would argue that the overhead exists only when calls to print are made with the base parameter. (Plus, the overhead at runtime is only one if, because the casting is done anyway.)
But the same behaviour can indeed be obtained by a simple cast to unsigned without modifying the current code.
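A minimal sketch of that cast-based workaround, assuming a small set of overloads named asUnsigned (a hypothetical name, not an Arduino API). Each overload returns the unsigned type of the same width, so the later widening inside print zero-extends.

```cpp
// Hypothetical helpers: map each signed type to the unsigned type of the
// same size. Plain overloads are used so no standard library is required.
inline unsigned char  asUnsigned(char v)        { return static_cast<unsigned char>(v); }
inline unsigned char  asUnsigned(signed char v) { return static_cast<unsigned char>(v); }
inline unsigned short asUnsigned(short v)       { return static_cast<unsigned short>(v); }
inline unsigned int   asUnsigned(int v)         { return static_cast<unsigned int>(v); }
inline unsigned long  asUnsigned(long v)        { return static_cast<unsigned long>(v); }
```

In a sketch this would be used as Serial.print(asUnsigned(c), HEX). Writing the cast directly at the call site has the same effect; the overloads only save the caller from spelling out the matching unsigned type.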
I think the main problem here is that print(char) is for printing chars as characters, not as numbers. There is no print method defined that takes a char as its first argument plus an extra second argument. Therefore, print(char, HEX) will convert the char to another type (I don't know what the conversion rules for this are in C++), and that type happens to be represented like that.
My advice: don't print a char. Printing a char directly is meant for printing it as a character. Instead, consider converting it to a byte:
Serial.print(byte(c), HEX)
(I think this should work)
You can see the types supported by print in https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/cores/arduino/Print.h
About int printing too many Fs, well in this case maybe something should be done; as I could see from Print.cpp it just casts int to long and prints it. Plus the function to print signed longs could be changed to always print a sign, for example print(-255) could print -FF, which in my opinion would be the "correct" representation of -255 in base 16.
EDIT: I created pull request arduino/Arduino#4535 proposing this change, what do you think?
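The sign-and-magnitude idea (print(-255) giving -FF) might look roughly like the sketch below. It is not taken from that pull request: formatSignMagnitude is a hypothetical name, and std::string stands in for the Print class so the example runs off-target.

```cpp
#include <string>

// Hypothetical helper: print a leading '-' for negative values, followed by
// the magnitude in the requested base (2..36, anything else means decimal).
inline std::string formatSignMagnitude(long n, int base) {
    const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    if (base < 2 || base > 36) base = 10;
    const unsigned long b = static_cast<unsigned long>(base);
    // Negate in unsigned arithmetic so the most negative long is handled
    // without signed overflow.
    unsigned long mag = (n < 0) ? 0UL - static_cast<unsigned long>(n)
                                : static_cast<unsigned long>(n);
    std::string out;
    do {
        out.insert(out.begin(), digits[mag % b]);  // prepend the next digit
        mag /= b;
    } while (mag != 0);
    if (n < 0) out.insert(out.begin(), '-');
    return out;
}
```

With this approach the number of hex digits no longer depends on the width of the type the value was widened to, at the cost of output that does not show the two's complement bit pattern.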
EDIT 2: Alternatively, you can opt for the same workaround I suggested for bytes and convert ints to unsigned ints (which range from 0 to FFFF), like Serial.print(unsigned(i), HEX)