Tuesday 6 February 2018

UTF-8 decoding of 7-bit encoded FAST messages, my way

Unicode encoding and decoding has become very common nowadays. As a developer who uses object-oriented languages, it is easy to find built-in methods for Unicode encoding and decoding; most current high-level programming languages include them by default. But when I was working on a project in C that required parsing binary data and showing the ASCII/Unicode value, I found it a little challenging. We were parsing FAST-encoded messages: we converted them to a binary string like "10001010010......" and partitioned it using the stop bit. All was going fine until we encountered Unicode characters, which have a variable-length encoding, unlike ASCII.
So, to solve this issue, I studied an overview of Unicode and code points. Here is a YouTube video that also helped me understand the concept:
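The stop-bit partitioning mentioned above can be sketched as follows. This is a minimal sketch, assuming the usual FAST convention that each byte carries 7 data bits and that the high bit (0x80) is set only on the last byte of a field; read_stop_bit_field is a hypothetical name for illustration, not part of any FAST library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads one stop-bit encoded field.
   Copies the 7 data bits of each byte of the field into out[] and
   returns the number of bytes the field occupied, or 0 if no byte
   with the stop bit was found within len bytes (truncated field). */
static size_t read_stop_bit_field(const uint8_t *in, size_t len, uint8_t *out)
{
    for (size_t i = 0; i < len; i++) {
        out[i] = in[i] & 0x7F;   /* low 7 bits carry the data */
        if (in[i] & 0x80)        /* stop bit set: last byte of this field */
            return i + 1;
    }
    return 0;                    /* ran out of input before the stop bit */
}
```

For example, with the stream {0x12, 0xB4, 0xD6} the first call returns 2 and fills out with 0x12 0x34; calling it again at offset 2 returns 1 with 0x56. Each 7-bit chunk can then be turned into the "0"/"1" string described above.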
Characters in a computer - Unicode Tutorial UTF-8
I also asked a question on Stack Overflow:
Binary to UTF-8 in C    
I found that I had to interpret the binary char array by taking the leading and continuation bytes into account, and then convert the binary code point to a decimal code point.
"UTF-8 is a specific scheme for mapping a sequence of 1-4 bytes to a number from 0x000000 to 0x10FFFF:
00000000 -- 0000007F:  0xxxxxxx
00000080 -- 000007FF:  110xxxxx 10xxxxxx
00000800 -- 0000FFFF:  1110xxxx 10xxxxxx 10xxxxxx
00010000 -- 001FFFFF:  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx"
(Source: https://www.cprogramming.com/tutorial/unicode.html)
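The table can also be applied directly to raw bytes with bit masks, instead of building a "0"/"1" string first. The sketch below is an alternative illustration, not the function from my project (utf8_decode is a hypothetical name): it reads the sequence length from the leading byte, keeps the x bits of each byte, and checks that every continuation byte starts with 10. It does not reject overlong forms, surrogates or values above 0x10FFFF, so treat it as an illustration rather than a fully validating decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one UTF-8 sequence starting at s
   (at most len bytes available). On success stores the code point in
   *cp and returns the number of bytes consumed (1-4); returns 0 for an
   invalid leading byte, a bad continuation byte or a truncated sequence. */
static int utf8_decode(const uint8_t *s, size_t len, uint32_t *cp)
{
    int n;
    uint32_t value;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                  /* 0xxxxxxx */
        n = 1; value = s[0];
    } else if ((s[0] & 0xE0) == 0xC0) { /* 110xxxxx */
        n = 2; value = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) { /* 1110xxxx */
        n = 3; value = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) { /* 11110xxx */
        n = 4; value = s[0] & 0x07;
    } else {
        return 0;                       /* stray continuation byte or invalid lead */
    }
    if ((size_t)n > len)
        return 0;                       /* sequence is cut off */

    for (int i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)      /* continuation bytes must be 10xxxxxx */
            return 0;
        value = (value << 6) | (s[i] & 0x3F);  /* append 6 payload bits */
    }
    *cp = value;
    return n;
}
```

For example, the bytes C3 A9 decode to 0xE9 (é) and E2 82 AC decode to 0x20AC (€).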

So I wrote a function like this, which takes the binary character array, the number of bytes n in the UTF-8 sequence, and a Unicode char array (which will be filled with the resulting Unicode character), calculates the decimal code point and gets the UTF-8 value:

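#include <stdint.h>
#include <string.h>

/* Helpers defined elsewhere in the project (not shown in this post);
   these signatures are assumptions:
   bin2dec converts a string of '0'/'1' characters to its integer value,
   GetUnicodeChar writes the character for a code point into a buffer. */
long bin2dec(const char *bits);
void GetUnicodeChar(long codePoint, char *unicodeString);

/* pBinaryData:   the UTF-8 sequence as a string of '0'/'1' characters
   n:             number of bytes in the sequence (1 to 4)
   unicodeString: output buffer for the decoded character */
void processBinaryData(const char *pBinaryData, int32_t n, char *unicodeString)
{
    if (n <= 1) {
        /* Single byte 0xxxxxxx: the whole byte is the code point. */
        GetUnicodeChar(bin2dec(pBinaryData), unicodeString);
        return;
    }
    /* Guard against short input instead of letting a size_t counter
       wrap around when the length is not a multiple of 8. */
    if (n > 4 || strlen(pBinaryData) < (size_t)n * 8)
        return;

    char arr[n * 8 + 1];  /* collected payload bits plus terminator */
    int m = 0;            /* number of payload bits collected so far */
    memset(arr, 0, sizeof arr);

    /* Leading byte: skip the n one-bits and the following zero bit,
       keep the remaining 8-(n+1) payload bits. */
    for (int k = n + 1; k < 8; k++)
        arr[m++] = pBinaryData[k];

    /* Continuation bytes: skip the "10" marker, keep 6 payload bits each. */
    for (int i = 1; i < n; i++) {
        int offset = i * 8 + 2;
        for (int j = 0; j < 6; j++)
            arr[m++] = pBinaryData[offset + j];
    }

    GetUnicodeChar(bin2dec(arr), unicodeString);
}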
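The post does not show bin2dec and GetUnicodeChar. Below is a minimal sketch of what such helpers might look like; bits_to_uint and utf8_encode are hypothetical stand-ins written for illustration, and their signatures are assumptions, not the originals. The encoder follows the same byte patterns as the table quoted earlier, but stops at U+10FFFF and rejects UTF-16 surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for bin2dec: converts a string of '0'/'1'
   characters to its unsigned value, stopping at the first non-binary digit. */
static uint32_t bits_to_uint(const char *bits)
{
    uint32_t value = 0;
    for (; *bits == '0' || *bits == '1'; bits++)
        value = (value << 1) | (uint32_t)(*bits - '0');
    return value;
}

/* Hypothetical stand-in for GetUnicodeChar: writes the UTF-8 bytes of
   code point cp into out (at least 5 bytes), NUL-terminated, and returns
   the number of bytes written, or 0 for a surrogate or a value above U+10FFFF. */
static int utf8_encode(uint32_t cp, unsigned char *out)
{
    int n;
    if (cp < 0x80) {                                /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800) {                        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {                      /* 1110xxxx + 2 continuation bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                               /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {                    /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        return 0;                                   /* beyond the Unicode range */
    }
    out[n] = '\0';
    return n;
}
```

For example, for the binary string "1100001110101001" (é), processBinaryData collects the payload bits "00011101001"; bits_to_uint returns 233 (U+00E9) for them, and utf8_encode(233, buf) writes the bytes C3 A9 back.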