Tuesday, 24 December 2013

C++ data types

(This is a summary of what I read in Bjarne Stroustrup's book The C++ Programming Language, 4th edition (2013).)

Not all the rules of C++ are fixed precisely. There is a standard (ISO/IEC 14882:2011) that defines the rules of C++, but it deliberately leaves several gaps for the compiler implementation and the platform to fill in. For example, the sizes of the built-in data types are not fully specified. The standard requires a char to have at least 8 bits. So in any implementation of C++, irrespective of the platform, char is guaranteed to be at least 8 bits, but whether it is exactly 8 bits depends on the implementation and the machine. If char had been standardized as exactly 8 bits everywhere, our lives would have been easier. That couldn't be done, though, because some character sets need more than 8 bits per character. ASCII is 7-bit (128 characters), but Unicode and UCS (Universal Character Set) need more than 8 bits. Thus the size of char can't be limited to 8 bits, and the standard simply provides a lower limit.

The data types in C++ can be divided into two main categories: built-in data types and user-defined data types. User-defined data types are classes, structs, enums and enum classes. Built-in data types can be further divided into integral types (bool, char and int) and floating-point types (float, double and long double).

bool: a bool stores one of two values, true or false. Non-zero integer values convert to true, and zero converts to false. A null pointer (nullptr) converts to false, and a non-null pointer converts to true.

char: there are six distinct character types: "normal" (plain) char, signed char, unsigned char, wchar_t, char16_t and char32_t.
Is "normal" char (i.e. char written without signed or unsigned) signed or unsigned? This is implementation dependent: it could be either. The problem shows up when you do something like:
                                  char c; std::cin >> c;
                                  int ctoi_c = static_cast<int>(c);  // range depends on whether char is signed
Now ctoi_c will be in the range 0 to 255 if char is an 8-bit unsigned type, and -127 to 127 if it is an 8-bit signed type. (Note: on a two's complement machine an 8-bit signed char actually goes down to -128. However, the C++11 standard also allows ones' complement representation, which has no -128 in 8 bits, so a portable program can only count on -127 to 127.) The <limits> header has a numeric_limits template whose static member constant is_signed tells you whether a character type is signed, e.g. std::numeric_limits<char>::is_signed.
What should be done, then, to ensure portability?
Solution: one way is to avoid plain char entirely, but some standard library functions, such as strcmp(), take only plain char (const char*) arguments. So the best workaround is to use plain char and avoid negative character values.

int type:
As with char, there are three spellings: "normal" int, signed int and unsigned int. Here we also get a choice of size: short int, int, long int and long long int. Unlike plain char, plain int is always signed. Using unsigned to ensure that a value is always positive is not a good idea. The implicit conversion rules silently turn a negative value into a large positive one instead of reporting an error. So when do we actually use unsigned? It is a rather subjective question, and I found varied opinions on it on the internet. For now, I have decided to use unsigned only when bitwise operations are involved, and otherwise to use signed even when I am sure the value is going to be positive. Mixing signed and unsigned arithmetic can also cause bugs. For example, when an int meets an unsigned int, the int is converted to unsigned, and in general the outcome depends on the sizes of the types involved, which are implementation dependent.

Size matters: the size of each of the integer types above is implementation dependent. Later we will see what the standard actually guarantees about sizes, but if you really need to be specific, <cstdint> provides several aliases. Examples are int64_t (exactly 64 bits), uint_fast16_t (an unsigned type of at least 16 bits that the implementation considers fastest) and int_least32_t (the smallest type with at least 32 bits). It is important to know these size subtleties for the sake of portability.
Note about the types in <cstdint>: types like int32_t (the _t suffix conventionally marks a typedef) are defined in <cstdint>, the C++ version of C's stdint.h. They are just typedefs (aliases; the two terms mean the same thing here), not distinct types. Do not confuse them with char16_t and char32_t, which also end in _t but are distinct built-in types in C++, not aliases. So int32_t is just an alias for some fundamental type that is 32 bits wide. This helps portability: if one system has a 32-bit int and another has a 16-bit int, using int32_t gives us a consistent type on both (perhaps int on the former and long on the latter). But remember that the exact-width aliases are optional and exist only if there is a fundamental type to alias. For example, if every integer type on a system is wider than 8 bits, then uint8_t is simply not defined, and code that uses it will not compile. (source: http://www.cplusplus.com/reference/cstdint/)

What sizes does the C++ standard guarantee?
sizeof(char) is 1 by definition, and all other sizes are expressed as multiples of it. (How many bits that 1 actually stands for is implementation dependent; the only guarantee is that it is at least 8.)
• 1 ≡ sizeof(char) ≤ sizeof(short) ≤ sizeof(int) ≤ sizeof(long) ≤ sizeof(long long), where char has at least 8 bits, short and int at least 16, long at least 32 and long long at least 64
• sizeof(N) ≡ sizeof(signed N) ≡ sizeof(unsigned N) for each integer type N (i.e. the plain, signed and unsigned versions have the same size)
In practice, char is typically 8 bits and int 32 bits, but this can't be assumed.

The size of a pointer is not necessarily the same as the size of the type it points to.

 Using the sizeof operator: sizeof is a unary operator that gives the size of a type (or of an expression's type) in units of sizeof(char). Here are the results on my system (gcc 4.7.2 on a 64-bit machine):
  • sizeof(char): 1
  • sizeof(int): 4
  • sizeof(short): 2
  • sizeof(long): 8
  • sizeof(long long): 8
But how many bits does that 1 actually correspond to in the output above?

The <limits> header provides a way to see system-dependent information, such as the minimum and maximum values of a type and whether it is signed. Some of the results on my system are:
  • Minimum value of unsigned int: 0
  • Maximum value of unsigned int: 4294967295 (2^32-1)
  • Minimum value of char: -128 (-2^7)
  • Maximum value of char: 127 (2^7-1)
  • Minimum value of int: -2147483648 (-2^31)
  • Maximum value of int: 2147483647 (2^31-1)
  • Is char signed? 1
  • Is int signed? 1
From these results I can infer that on my system char is 8 bits (its range is -128 to 127), and the other types are the multiples of 8 bits given by sizeof: short is 16 bits, int is 32 bits, and long and long long are 64 bits.