Notes to myself: Best practices in CS: 2013

Wednesday, 25 December 2013

C++: Some simple things that I learnt after reading Steven Gribble's code from a Coursera assignment

1) Use command line parameters to pass obvious inputs from the user.
If the input is wrong, print an error message that shows the input format your code expects.
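
A minimal sketch of this idea, assuming a hypothetical program that expects exactly one positive integer argument. The helper name ParseCount and the <count> argument are made up for illustration; they are not from the assignment code.

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical helper: parses argv[1] as a positive count.
// On success stores the value in *count and returns true; otherwise prints
// the expected input format to cerr and returns false.
bool ParseCount(int argc, char** argv, long* count) {
  if (argc != 2) {
    std::cerr << "Usage: " << (argc > 0 ? argv[0] : "prog") << " <count>"
              << std::endl;
    return false;
  }
  char* end = nullptr;
  const long value = std::strtol(argv[1], &end, 10);
  // Reject empty input, trailing garbage and non-positive numbers.
  if (end == argv[1] || *end != '\0' || value <= 0) {
    std::cerr << "Error: <count> must be a positive integer, got '" << argv[1]
              << "'" << std::endl;
    return false;
  }
  *count = value;
  return true;
}
```

In main(int argc, char** argv) one would then write something like: long count; if (!ParseCount(argc, argv, &count)) return EXIT_FAILURE;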

2) What are uintN_t types? When should we use them? How are they different from traditional integer types like short and int? When should we use unsigned values?
I found these questions, in one form or another, in several discussion forums like Stack Overflow. What I can make of it is that the choice between unsigned and signed int is subjective, but the rule of thumb is to never mix unsigned and signed types. Now coming to uintN_t types: these are fixed-width types, that is, their size doesn't depend on the system. For example, uint16_t is exactly 16 bits on every system that provides it. They are therefore helpful for portability. (Check this: http://notestomyselfcs.blogspot.in/2013/12/c-data-types.html)

3) Use cerr in place of cout to display error messages.
cerr, cout and clog are objects of class ostream and are used to send characters to some destination. By default both cout and cerr write to the console, but they are separate streams (standard output and standard error), so the output (cout) can be diverted to some other place while the errors stay separate. Ex: If you run a program as ./a.out > myresult, the output of cout goes to the file myresult and the output of cerr still goes to the screen. Another difference is that cout and clog are buffered while cerr is not. Buffering means the output is temporarily stored and then written out in one go, which makes output faster. cerr, on the other hand, is unbuffered, that is, every write is sent out immediately. This is useful because if the program crashes the contents of a buffer may be lost, whereas whatever was written to cerr has already been printed. This also helps in debugging: use cerr to print messages like cerr<<"Entering dangerous function"<<endl; instead of cout.

4) Use EXIT_FAILURE in place of return 1 on error.
Using return EXIT_FAILURE on failure and return EXIT_SUCCESS on success (both are defined in <cstdlib>) is a portable way of exiting a program.

5) Define every class in a .h file and include the .h file wherever the class is needed. Develop every class as a standalone library which can be reused.

6) Q: What is the strange
#ifndef _FILE_H_
#define _FILE_H_
...
#endif
that I see in header files? Why the strange nomenclature of using _FILE_H_?
#ifndef, #define and #endif are preprocessor directives. Together they form an include guard, whose aim is to prevent the same classes, variables etc. from being defined more than once when the header file is included several times. In simple terms, it works in the following way. Before the real compilation, the preprocessor replaces every #include with the contents of the header file, so the header is more or less copied and pasted into the code. If some file is included more than once (directly or through another header), this leads to redefinition errors. So _FILE_H_ is defined the first time the file is included. If the file is included again, the preprocessor sees that _FILE_H_ is already defined and skips everything up to #endif. Generally the .h file has the declarations (and class definitions) and the .cpp file has the implementation. Each .cpp file is compiled to a .o (object) file, and the object files are then linked by the linker to produce the executable.
Coming to the naming part, I am confused about whether I can create macro names (like _FILE_H_) that begin with an underscore. I read that identifiers starting with _ followed by a capital letter or another _ are reserved for the implementation (compiler or library), but I still see code that uses macro names starting with _. Going by that rule, _FILE_H_ is a reserved name, so a guard like FILE_H_ (no leading underscore) is the safer choice.
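
A sketch of a guarded header, assuming a hypothetical file point.h. The guard name POINT_H_ and the Point type are invented for illustration, and the guard deliberately avoids a leading underscore.

```cpp
// point.h (hypothetical file name)
#ifndef POINT_H_
#define POINT_H_

// Everything between #ifndef and #endif is seen at most once per .cpp file,
// no matter how many times this header ends up being included.
struct Point {
  int x;
  int y;
};

// inline, so defining it in a header does not cause duplicate-symbol errors
// when several .cpp files include this header.
inline int ManhattanLength(const Point& p) {
  const int ax = p.x < 0 ? -p.x : p.x;
  const int ay = p.y < 0 ? -p.y : p.y;
  return ax + ay;
}

#endif  // POINT_H_
```

Without the guard, including point.h twice in the same .cpp file would define struct Point twice and the compiler would reject it.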

7) When to create a new namespace?
We can't create two variables of the same name within a block. For ex: int x; double x; will give an error. If we are using several libraries, these libraries may have their own variables and functions whose names collide with the ones we have created. To resolve this we can use a namespace. Every name defined in namespace mynamespace has to be prefixed by mynamespace:: when used from outside that namespace. You can also split a namespace definition into parts, like: namespace a{some variables}, some code, namespace a{more variables}.
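
A short sketch of both points: two functions with the same name living in different namespaces, and a namespace that is reopened later. The namespace and function names (geometry, audio, Scale, kUnit) are made up.

```cpp
namespace geometry {
int Scale(int x) { return 2 * x; }
}  // namespace geometry

namespace audio {
// Same name as geometry::Scale, but no collision: it lives in another namespace.
int Scale(int x) { return x / 2; }
}  // namespace audio

// A namespace can be reopened to add more names to it.
namespace geometry {
const int kUnit = 10;
}  // namespace geometry
```

From outside, the two functions are called as geometry::Scale(4) and audio::Scale(4).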

8) Use enum for clarity.
a) An enum is a type that holds one of a set of named integer values. There are 2 kinds of enums: plain enum and enum class. Prefer enum class, which is type safe.
b) What is typedef enum?
In C, plain enums are declared like this:
enum myenum {RED, BLUE, GREEN};
Now, to declare a variable color of type myenum, you have to write:
enum myenum color = RED;
To reduce the typing effort at the time of declaration, use typedef:
typedef enum myenum
{RED, BLUE, GREEN}
myenum;
Now just declare the variable as:
myenum color = RED;

However, in C++, there is no need to do typedef.

Use typedef whenever possible for clarity. For example, to represent nodes in a graph one may use:
typedef uint32_t Node; //so now Node is the data type for node numbers
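
A sketch of how enum class and a typedef read in C++11 code. The Color values and the ColorName helper are invented for illustration.

```cpp
#include <cstdint>
#include <string>

// enum class: the enumerators are scoped (Color::kRed) and do not convert
// to int implicitly, which is what makes it type safe.
enum class Color { kRed, kBlue, kGreen };

typedef std::uint32_t Node;  // Node is now the data type for node numbers

// Hypothetical helper that maps a Color to a printable name.
std::string ColorName(Color c) {
  switch (c) {
    case Color::kRed:   return "red";
    case Color::kBlue:  return "blue";
    case Color::kGreen: return "green";
  }
  return "unknown";
}
```

With a plain enum, int n = RED; compiles silently. With enum class, int n = Color::kRed; is an error and an explicit static_cast<int>(Color::kRed) is needed.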

9) Declare the base class destructor as virtual.
Virtual member functions are dispatched polymorphically, based on the actual type of the object. Therefore, suppose the base class destructor is not virtual and a base class pointer is made to point to a derived class object (base* b = new derived()). When the object is deleted through that pointer (delete b), only the base class destructor is called, so the resources owned by the derived part are not cleaned up (formally, this is undefined behaviour). (Better to look at this: http://stackoverflow.com/questions/461203/when-to-use-virtual-destructors)
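
A minimal sketch of the correct case, with a virtual base destructor. The global g_log exists only to record which destructors ran; all names here are made up for the demonstration.

```cpp
#include <string>

// Records which destructors ran, so the order can be inspected afterwards.
std::string g_log;

class Base {
 public:
  virtual ~Base() { g_log += "~Base "; }
};

class Derived : public Base {
 public:
  ~Derived() override { g_log += "~Derived "; }
};

void DestroyThroughBasePointer() {
  Base* b = new Derived();
  delete b;  // virtual destructor: ~Derived runs first, then ~Base
}
```

After DestroyThroughBasePointer() the log holds "~Derived ~Base ". If virtual were removed from ~Base, the delete would no longer be guaranteed to run ~Derived.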

10) Mark a member function const whenever it is not supposed to change the object's members.
11) Use a trailing underscore in member variable names. This is what the Google style guide says.
12) Always sanity-check function parameters. Report the error via cerr<< and exit using exit(EXIT_FAILURE);
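
Points 10 to 12 can be sketched together in one small class. Account, balance_ and Withdraw are hypothetical names chosen for the example.

```cpp
#include <cstdlib>
#include <iostream>

class Account {
 public:
  explicit Account(long balance) : balance_(balance) {}

  // const member function: promises not to modify the object.
  long balance() const { return balance_; }

  void Withdraw(long amount) {
    // Sanity check on the parameter before touching any state.
    if (amount < 0 || amount > balance_) {
      std::cerr << "Withdraw: invalid amount " << amount << std::endl;
      std::exit(EXIT_FAILURE);
    }
    balance_ -= amount;
  }

 private:
  long balance_;  // trailing underscore marks a member variable
};
```

Because balance() is const, it can be called through a const Account& while Withdraw() cannot.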

13) For creating a list use std::list

Tuesday, 24 December 2013

C++ data types

(This is a summary of what I read from Bjarne Stroustrup's book, 4th edition (2013))

Not all the rules of C++ are specific. What I mean is that there is a standard (ISO/IEC 14882:2011) which defines the rules of C++, but the standard leaves several gaps to be filled in by the compiler implementation and the platform. For ex: the size of the built-in data types is not exactly defined. The standard specifies char to be at least 8 bits. This means that in any implementation of C++, irrespective of the platform, char is guaranteed to be at least 8 bits, but whether char is exactly 8 bits is implementation and machine dependent. If char had been standardized to be exactly 8 bits throughout, that would have made our life easier, but that couldn't be done because there are some character sets which need more than 8 bits. ASCII is 7-bit (128 characters) but Unicode and UCS (universal character set) need more than 8 bits. Thus, the size of char can't be limited to 8 bits, so the standard simply provides a lower limit.

One can divide the data types in C++ under two major headings, viz. built-in data types and user-defined data types. User-defined data types are classes, structs, enums and enum classes. Built-in data types can be further divided into integral (bool, char and int types) and floating point (float, double and long double) data types.

bool: bool can store two values, true and false. Non-zero integer values convert to true, zero converts to false. A null pointer (nullptr) converts to false and a valid pointer converts to true.

char: char can be "normal" char, unsigned char, signed char, wchar_t, char16_t or char32_t. These are 6 distinct char types.
Is "normal" char (i.e. simply using char without the mention of sign) signed or unsigned? This is implementation dependent. It could be signed or unsigned. The problem occurs when you do something like:
char c; cin>>c;
int ctoi_c = (int)c;
Now ctoi_c can be between 0 and 255 if char is an 8-bit unsigned type, and between -127 and 127 for an 8-bit signed char. (Note: on the usual 2's complement machines the range is actually -128 to 127, but C++ can also be used on machines using 1's complement representation, which has no -128 in 8-bit signed numbers. Thus, for portability, never assume that a char can hold -128.) The <limits> header has a numeric_limits template which has an is_signed static member constant for checking whether the char type is signed.
What should be done then to ensure portability?
Solution: One way is to totally avoid "normal" char, but some standard library functions like strcmp() take plain char only. So the best way around is to use "normal" char and avoid negative character values.
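
A small sketch related to the above. It checks the signedness of plain char with numeric_limits, and shows one common alternative to a direct (int)c cast: going through unsigned char first, so the result is never negative. The function names are made up, and the unsigned char detour is my addition, not something from the book summary above.

```cpp
#include <limits>

// Converts a char to an int in the range 0..UCHAR_MAX, regardless of
// whether plain char is signed on this implementation.
int CharToCode(char c) {
  return static_cast<int>(static_cast<unsigned char>(c));
}

// True if plain char is a signed type on this implementation.
bool PlainCharIsSigned() {
  return std::numeric_limits<char>::is_signed;
}
```

With this, int ctoi_c = CharToCode(c); gives the same non-negative value on a signed-char system and on an unsigned-char system.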

int type:
Like char, there are three possibilities: "normal" int, unsigned int and signed int. Here we have a choice of size as well: short int, int, long int and long long int. Unlike plain char, normal int is always signed. Using unsigned just to ensure that some value is always positive is not a good idea, because the implicit conversion rules silently turn a negative value into a large positive one instead of reporting an error. So, when do we actually use unsigned? Well, it is a very subjective question and I find varied opinions on this on the internet. For now, I have decided to stick to using unsigned only if some bitwise operations are involved, and otherwise to use signed even if I am sure that the value is going to be positive. Mixing unsigned and signed arithmetic can again cause bugs, because the signed operand is usually converted to unsigned before the operation, so the result may be surprising and may depend on the sizes of the types on the system.
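
A sketch of the kind of bug that mixing signed and unsigned produces. NaiveLess spells out the conversion that the compiler applies implicitly in a comparison like a < b; SafeLess is one possible way around it. Both function names are invented for the example.

```cpp
// Compares the way `a < b` behaves when a is int and b is unsigned int.
bool NaiveLess(int a, unsigned int b) {
  // Same conversion the compiler applies implicitly in `a < b`:
  // a negative a wraps around to a very large unsigned value.
  const unsigned int a_as_unsigned = a;
  return a_as_unsigned < b;
}

// Handles the negative case explicitly before converting.
bool SafeLess(int a, unsigned int b) {
  return a < 0 || static_cast<unsigned int>(a) < b;
}
```

So NaiveLess(-1, 1u) is false even though -1 is mathematically less than 1, while SafeLess(-1, 1u) is true.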

Size matters: The size of the integer types defined above is implementation dependent. Later we will see what the standard actually guarantees about size, but if you really need to be specific, <cstdint> provides several aliases, e.g. int64_t (exactly 64 bits), uint_fast16_t (the fastest unsigned type with at least 16 bits) or int_least32_t (the smallest type with at least 32 bits). It is important to know these size subtleties from the point of view of portability.
Note about types in <cstdint>: Types like int32_t etc. (the _t suffix indicates a typedef) are defined in stdint.h (<cstdint> in C++). These are just typedefs (or aliases, both are the same) and not actual types (unlike char16_t and char32_t, which are distinct types and not aliases). Therefore, int32_t is just an alias of some fundamental type which is 32 bits wide. The way it helps in portability of code is that if one system has a 32-bit int and another system has a 16-bit int, using int32_t ensures that we get a consistent size (it may be int on the former and long on the latter). But remember that the exact-width types are not guaranteed to exist: if there is no fundamental type of that width, the alias is not provided. For example: suppose you define a variable of type uint8_t but all the integer types are wider than 8 bits; then uint8_t is not defined and the code will not compile. (source:http://www.cplusplus.com/reference/cstdint/)
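
A sketch that checks what the <cstdint> aliases promise, using static_assert so a violation would show up at compile time. BitWidth is a hypothetical helper; it simply multiplies sizeof by CHAR_BIT (the number of bits in a char, from <climits>).

```cpp
#include <climits>
#include <cstdint>

// Width of a type in bits, as sizeof (in chars) times bits per char.
template <typename T>
constexpr int BitWidth() {
  return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Exact-width alias: exactly 32 bits wherever it exists.
static_assert(BitWidth<std::int32_t>() == 32, "int32_t is exactly 32 bits");
// "least" and "fast" aliases: only a lower bound is promised.
static_assert(BitWidth<std::int_least32_t>() >= 32, "at least 32 bits");
static_assert(BitWidth<std::uint_fast16_t>() >= 16, "at least 16 bits");
```

On many 64-bit systems uint_fast16_t turns out wider than 16 bits, which is allowed since only "at least 16" is promised.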

What size does C++ standard guarantee?
Taking sizeof(char) as 1 (how many bits that 1 actually corresponds to is implementation dependent; the only guarantee is that it is at least 8 bits):
• 1 ≡ sizeof(char>=8bit) ≤ sizeof(short>=16bit) ≤ sizeof(int>=16bit) ≤ sizeof(long>=32bit) ≤ sizeof(long long>=64bit)
• sizeof(N) ≡ sizeof(signed N) ≡ sizeof(unsigned N) (i.e. the signed, unsigned and normal versions of a type have the same size)
Now, what typically happens is that char is 8 bits and int is 32 bits, but this can't be assumed.

The size of a pointer is not necessarily the same as the size of the type it is pointing to.

Using the sizeof operator: sizeof is a unary operator which gives the size of a type or object as a multiple of sizeof(char). Here is the result for the built-in types on my system (gcc 4.7.2 on a 64-bit machine):

Sizeof char: 1
Sizeof int: 4
Sizeof short: 2
Sizeof long: 8
Sizeof long long: 8

But how many bits does a 1 actually correspond to in the above output? (The CHAR_BIT macro in <climits> gives this number directly.)

The <limits> header provides a way to see system-dependent information like the maximum and minimum value of a type, whether a type is signed, etc. Some of the results on my system are:

Minimum value of unsigned int: 0
Maximum value of unsigned int: 4294967295
Minimum value of char: -128
Maximum value of char: 127 (2^7-1)
Minimum value of int: -2147483648 (-2^31)
Maximum value of int: 2147483647 (2^31-1)
Is char signed? 1
Is int signed? 1

From this result, I can infer that on my system, char is 8 bit and rest of the types are scaled according to the multiples of char found by sizeof() operator.