Hi guys, I have to learn about the code below because I need to deal with UTF-8 data encoding/decoding. I've used this before, but in the past I just copied and pasted other people's code. To write my own version I have to dig in and learn it properly.

var message = data.slice(start, end);
var FIN = (message[0] & 0x80);
var RSV1 = (message[0] & 0x40);
var RSV2 = (message[0] & 0x20);
var RSV3 = (message[0] & 0x10);
var Opcode = message[0] & 0x0F;
var mask = (message[1] & 0x80);
var length = (message[1] & 0x7F);

So what do FIN and its associated variables mean? What do they do? I know what message[0] means, but what do 0x80 or 0x0F do? Why do they use & to combine them? Where can I learn more about this? I don't even know what it's called. Thanks in advance for helping me save my hair.
A single ampersand means a bitwise "and".

AND truth table:
0 & 0 == 0
1 & 0 == 0
0 & 1 == 0
1 & 1 == 1

As opposed to a bitwise OR, which we write with the vertical bar character | (aka the "pipe", though the character existed long before *nix used it for that).

OR truth table:
0 | 0 == 0
1 | 0 == 1
0 | 1 == 1
1 | 1 == 1

...and you should also be aware of XOR, usually written in C-syntax languages with the circumflex (aka "caret") character.

XOR truth table:
0 ^ 0 == 0
1 ^ 0 == 1
0 ^ 1 == 1
1 ^ 1 == 0

Of course if we're going to cover bitwise operations we should also talk about NOT and shifts.

NOT truth table (per bit; written ~ in C syntax, as a prefix):
~0 == 1
~1 == 0

A shift on the other hand moves the bits however many places left or right. Typically a shift is written with two greater-than or less-than signs, thus:

8 << 2 == 32
0x08 << 0x02 == 0x20
0b00001000 << 0b00000010 == 0b00100000

Mind you, with shifts any bits that go 'off the end' of the data size are thrown away! There's also what's called a rotate, which for some reason C never saw fit to add to its repertoire, where bits that would be shifted off are moved to the other end instead. In most assemblers it's easy:

rol ax, 2 ; rotate left ax by 2

Let's say our ax register (registers are a bit like variables -- well, that's a gross oversimplification, but it'll have to do for now) is 0b11000011, aka 0xC3, aka 195 decimal.

0b11000011 rol 2 == 0b00001111

Literally it 'rotates' the bits. Sadly C doesn't HAVE a rotate operator, so doing that in C is ugly as hell. Assuming an 8-bit unsigned integer (let's not even get into signed values and two's complement for now), a 'rotate' in C is this train wreck of fugliness:

uint8_t a = 0b11000011;
a = (a >> 6) | (a << 2);

Another thing C lacks is the ability to track carry or overflow -- so you have to mask first, which can result in painfully slow and bloated code. Remember that most of the time numbers on computers are stored in binary... Binary can be
painful to use and track, due to a number 0..65535 being 16 digits long in binary -- hence why hexadecimal caught on, as it's WAY easier to convert binary to hex and back in your head. That's what those 0x numbers are: they're HEX.

There are several shorthands for saying a number is hex, and languages vary on which they use. N below is a hex digit 0..F:

POSIX legacy languages: 0xNN
Some assemblers: 0NNh
Wirth family languages: $NN

So if you see 0F7h, 0xF7 or $F7 in some code, it's usually hexadecimal. Just as any number ending in a lowercase letter b or starting with 0b is typically binary. If it starts or ends with a letter o (sometimes 0o) it's octal. Some examples:

HEX   OCT   DEC   BIN
0x01  o001    1   0b00000001
0x02  o002    2   0b00000010
0x04  o004    4   0b00000100
0x08  o010    8   0b00001000
0x10  o020   16   0b00010000
0x20  o040   32   0b00100000
0x40  o100   64   0b01000000
0x80  o200  128   0b10000000

Notice how 1, 2, 4, 8 in each digit of hex corresponds to a bit in the binary? That's what they're testing for. If message[0] were, for example, 0x1E == 0b00011110, then 0x1E & 0x80 is the same as 0b00011110 & 0b10000000, which is clearer if you put them one over the other:

0b00011110
0b10000000

Which is false. Bit 7 (the top bit) is set in the second one but not the first, so the result is zero.

Let's do another example. Say we had 0xC2 and we wanted to test if bits 7 and 1 were set (remember, the 8 bits of a byte are numbered 0..7). That would be & 0x82:

0b11000010
0b10000010

AND those together and the result is 'true' for being non-zero, as well as being 0x82. If we were testing for 0x02 the result would be true, and 0x02. If we tested 0x03 (0b00000011) the result would still be true (a bit is set) but the value would be 0x02, since:

0b11000010 & 0b00000011 == 0b00000010

As those are the only binary bits the two values have in common. Binary is how computers really work under the hood. Basically that code is testing and masking off certain bits.
(though I want to backhand someone for all those "var for nothing" -- and note the original's mix of commas and semicolons would make everything after FIN an implicit global; a comma-separated declaration should use commas throughout)

var message = data.slice(start, end),
    FIN    = message[0] & 0x80, // is bit 7 set?
    RSV1   = message[0] & 0x40, // is bit 6 set?
    RSV2   = message[0] & 0x20, // is bit 5 set?
    RSV3   = message[0] & 0x10, // is bit 4 set?
    Opcode = message[0] & 0x0F, // isolate bits 0..3, throwing away the top 4 bits
    mask   = message[1] & 0x80, // again, is bit 7 set?
    length = message[1] & 0x7F; // isolate bits 0..6, throwing away bit 7

Which is a hell of a lot easier to say than:

var message = data.slice(start, end),
    FIN    = message[0] & 0b10000000, // is bit 7 set?
    RSV1   = message[0] & 0b01000000, // is bit 6 set?
    RSV2   = message[0] & 0b00100000, // is bit 5 set?
    RSV3   = message[0] & 0b00010000, // is bit 4 set?
    Opcode = message[0] & 0b00001111, // isolate bits 0..3, throwing away the top 4 bits
    mask   = message[1] & 0b10000000, // again, is bit 7 set?
    length = message[1] & 0b01111111; // isolate bits 0..6, throwing away bit 7

One of the big helps of hexadecimal is that each digit is a 'nybble' -- 4 bits -- so every two digits is one 8-bit byte. As such you only need to remember each digit's values:

0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = 8
1001 = 9
1010 = A (10 decimal)
1011 = B (11 decimal)
1100 = C (12 decimal)
1101 = D (13 decimal)
1110 = E (14 decimal)
1111 = F (15 decimal)

That help any?
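To see those extractions actually run, here's a self-contained sketch using a made-up two-byte header (the 0x81/0xFD values are my own example, chosen so every field comes out non-trivial -- these happen to match a WebSocket text frame, which is where this kind of code usually appears):

```javascript
// A made-up two-byte frame header to run the field extraction on.
// 0x81 = 0b10000001: top bit (FIN) set, low nybble (opcode) is 1.
// 0xFD = 0b11111101: top bit (mask) set, low 7 bits (length) are 125.
var message = [0x81, 0xFD],
    FIN    = message[0] & 0x80, // 128, truthy: final fragment
    Opcode = message[0] & 0x0F, // 1
    mask   = message[1] & 0x80, // 128, truthy: payload is masked
    length = message[1] & 0x7F; // 125

console.log(!!FIN, Opcode, !!mask, length); // true 1 true 125
```

Note that FIN comes out as 128, not 1 -- the & leaves the tested bit in place, so these values are used as truthy/falsy flags rather than booleans.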
Oh, and a side note: be thankful you never had to deal with ancient big iron -- where a 'byte' was typically 6 bits, not 8, so the entire character set was only 64 values, not 256 or more. As I think I mentioned earlier, Base64 isn't 64-bit, it's 6-bit, giving 64 possible values -- which is why octal was used on those systems, as each digit in octal is 3 bits.

...and why people trying to use Base64 for stuff on modern computers are pretty much full of more manure than Biff Tannen's 1946 Ford Super De Luxe. It's grossly inefficient and has a ridiculous amount of overhead. Calling it a 'native format' hasn't been true since DEC and Wang went the way of the dodo.

Fun stuff, the DEC SIXBIT character set (octal offsets):

     0      1  2  3  4  5  6  7
0  space    !  "  #  $  %  &  '
1    (      )  *  +  ,  -  .  /
2    0      1  2  3  4  5  6  7
3    8      9  :  ;  <  =  >  ?
4    @      A  B  C  D  E  F  G
5    H      I  J  K  L  M  N  O
6    P      Q  R  S  T  U  V  W
7    X      Y  Z  [  \  ]  ^  _

So for example the letter A is o41 / 0x21 / 33 decimal / 0b100001. Be even more thankful we're not working with even more complex systems like base 60, which is popular in navigation and geometric math... or that much of this is handled for us by high-level languages, as it can get REALLY fun when you start dealing with stuff like BCD (binary coded decimal) and arbitrary-length numbers.
Wow! Thanks for the lengthy reply. It's helpful. I really appreciate you spending your time explaining this. I read every single character you wrote but still don't fully understand. But it gives me a clue as to where I should head for further study. It's definitely like a light in the dark. Man, I never imagined JavaScript would take me this far. A thousand thanks!
Ha, I finally know what this means. Bringing an old thread back to the top just to use it in a current project.