Is JavaScript converted to C++ in the V8 engine?

ketting00 Well-Known Member

Messages: 772

Likes Received: 27

Best Answers: 3

Trophy Points: 128

#1

Hi guys,

In my search to improve JavaScript app performance, I've stumbled upon this post on Quora: http://www.quora.com/How-does-a-JavaScript-engine-work

I don't understand the part below:
Once the Abstract Syntax Tree is formed, based on the type of JS engine,
it is either converted into a intermediate language (interpreted) or native code.
This is done for each block of code in the tree.
Google V8 takes the latter approach, which is the reason why it is faster by a huge margin.
Code (markup):
Does the V8 engine convert my app into a native C++ app?
What about on Android machines? Is it converted to Java?

I never learned C++, so I don't know how it works.
Is it a compiler, then?

ketting00, Nov 30, 2014 IP

deathshadow Acclaimed Member

Messages: 9,732

Likes Received: 1,998

Best Answers: 253

Trophy Points: 515

#2

C++ is NOT native code. Neither is C, or Pascal... those may be compiled languages, but they are compiled INTO native code and are NOT native code unto themselves. You need a compiler to turn them INTO native code. (slow, inefficient native code at that, depending on your point of view).

Native code being a uselessly vague euphemism for what we used to call "Machine Language". You MIGHT call it "assembly" -- but assembler mnemonics are simply tokens so us mere mortals can write machine language, and are not machine language themselves. You need an "assembler" to turn "assembly language" into "machine language".

Take what I'm working on now... my assembler code (8086/8088 using NASM):
; procedure cmsWrite;
global cmsWrite
cmsWrite:
; INPUT
;   dx = port base + offset
;   ah = register
;   al = data
; OUTPUT
;   nothing
; CORRUPTS
;   nothing
	inc  dx
	xchg al, ah
	out  dx, al
	dec  dx
	xchg al, ah
	out  dx, al
	retf
Code (markup):
"assembles" into this, THIS is native code (stated in hexadecimal)

42 86 C4 EE 4A 86 C4 EE CB

Basically a handful of gibberish bytes for us mere mortals.

Of course, since machine language is actually a different language depending on the processor family (arm, x86, x64, MIPS, PPC), the V8 compiler will need a different assembler and assembly code for every different type of processor it is targeting just like cross-platform compilers for C, C++, Pascal and any other compiled language would. Remember, machine language for ARM won't run on x86, nor will x64 run on x86 or ARM, or MIPS on ARM, x86 or x64, or PPC on anything else. That's why we have high level languages like C and compilers, or interpreted bytecode like Java -- portability of source since there is no such thing as binary compatibility.

Compiling to machine language right before runtime is typically called "Just in time" compilation -- JIT. V8 does it from source code, Java can do it on some platforms from its normally interpreted bytecode -- and a number of emulators actually do this from one type of machine language to another. For example the old Virtual PC for Macintosh in the pre-Intel Mac age translated x86 to PowerPC code on the fly using a mix of interpreting and JIT compilation. Likewise old "System Mac" emulators like Basilisk II can optionally use JIT to turn Motorola 680x0 machine language into x86 machine language by recompiling/re-assembling it with equivalents. It's not as fast as native code written for the target platform, but it's usually far faster than using an interpreter.
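To make the interpreter-versus-compiler distinction a bit more concrete, here is a toy sketch in JavaScript. It is NOT how V8 works internally: a real JIT emits machine code, while this only builds closures. The tiny AST format and the names `interpret` and `compile` are made up for illustration. The interpreter re-inspects the tree on every evaluation; the "compiler" inspects it once and hands back a function that can be called over and over.

```javascript
// Hypothetical mini-AST for the expression (x * 3) + 2.
const ast = {
  type: "add",
  left: {
    type: "mul",
    left: { type: "var", name: "x" },
    right: { type: "num", value: 3 },
  },
  right: { type: "num", value: 2 },
};

// Interpreter: checks every node's type each time the expression is evaluated.
function interpret(node, env) {
  switch (node.type) {
    case "num": return node.value;
    case "var": return env[node.name];
    case "add": return interpret(node.left, env) + interpret(node.right, env);
    case "mul": return interpret(node.left, env) * interpret(node.right, env);
    default: throw new Error("unknown node type: " + node.type);
  }
}

// "Compiler": checks every node's type once and returns a closure for later calls.
function compile(node) {
  switch (node.type) {
    case "num": {
      const value = node.value;
      return () => value;
    }
    case "var": {
      const name = node.name;
      return (env) => env[name];
    }
    case "add": {
      const left = compile(node.left);
      const right = compile(node.right);
      return (env) => left(env) + right(env);
    }
    case "mul": {
      const left = compile(node.left);
      const right = compile(node.right);
      return (env) => left(env) * right(env);
    }
    default: throw new Error("unknown node type: " + node.type);
  }
}

const compiled = compile(ast);          // the tree is walked here, once
console.log(interpret(ast, { x: 4 }));  // 14
console.log(compiled({ x: 4 }));        // 14
```

The translation cost is paid once in `compile`, which is the same trade a JIT makes at a much lower level.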

Last edited: Dec 1, 2014

deathshadow, Dec 1, 2014 IP

ketting00 Well-Known Member

#3

Thank you so much. I understand it better now. I've wondered for a long time why they compared high performance JavaScript with C++. This should give me some idea.

ketting00, Dec 1, 2014 IP

deathshadow Acclaimed Member Best Answer

#4

ketting00 said: ↑

I've wondered for a long time why they compared high performance JavaScript with C++.

Thing is, there are a number of things about JavaScript / ECMAScript that "cripple" its ability to EVER reach C's performance. You can come close to or match it on some things, but there are some issues in the way JS works that will ALWAYS leave it trailing, quite often miserably so.

One of the easiest to point out is that a script has to be compiled every time a new copy is downloaded for V8 -- that startup penalty can have a pretty big impact the more scripting you use. Even today compiling a megabyte of code is nothing to sneeze at, and it's why quite often (for me at least) sites with a LOT of scripting go off to never-never-land (aka "Not Responding") in Chrome on first-load for like 10 seconds.

But more of a worry is the lack of strict typing. Basically EVERY variable in JavaScript has to be a pointer to an organized STRUCT -- now, if you've not used languages with actual record/struct organization, real pointers, or object classes, this is going to be a bit hard to grasp... but JavaScript's utter and complete lack of strict typing means you have no idea what type of variable something is (since it can hold any type at any moment), how big it is (making looking it up harder), and so forth.

There are only two real ways of dealing with typeless variables behind the scenes at the "native code" level -- you either store it as a string in memory and do typecasting based on the "operation" being performed when the value is read for each and every non-string operation, or you write the variable as every possible "type" into memory when it's being "written" to. The former has a heavy-ish performance impact the more often you do non-stored type operations, the latter has a pretty hefty impact on memory use. EITHER way is prone to memory fragmentation, which some people will mistake for a memory leak (ever leave Facebook running long-term in FF?) when it's just a side effect of fragmentation -- the memory IS being freed, but new blocks are bigger than the old blocks so they don't "fit" into the hole left by the previously freed memory.
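A quick illustration of why the engine can't know the type ahead of time. This is plain standard JavaScript; the names `add`, `value`, `kinds` and `results` are just made up for the example. The same variable holds three different types in a row, and the same `+` means addition or string concatenation depending on what shows up at runtime.

```javascript
// One variable, three different types over its lifetime.
let value = 42;
const kinds = [typeof value];   // "number"
value = "forty-two";
kinds.push(typeof value);       // "string"
value = { n: 42 };
kinds.push(typeof value);       // "object"

// One function body, different operations depending on the runtime types.
function add(a, b) {
  return a + b;
}

const results = [
  add(1, 2),     // 3     (numeric addition)
  add("1", 2),   // "12"  (string concatenation)
  add(1.5, 2),   // 3.5   (floating point addition)
];

console.log(kinds);   // [ 'number', 'string', 'object' ]
console.log(results); // [ 3, '12', 3.5 ]
```

Every call to `add` has to look at what it was actually handed before it can decide what `+` means, which is the per-operation type check described above.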

A major side effect of this is that one of the optimal structures for doing things like graphics operations -- arrays -- are grossly inefficient because they are arrays of pointers. If you use "delete", the pointer at that index is set to NULL and the memory of that pointer is still allocated (the memory that pointer was pointing at may or may not be released depending on if some other variable is still pointing at it). It's also why "splice" is far slower, as it basically has to move all pointers above the ones removed down in memory.
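The visible half of that difference is easy to check from script. A small sketch using only standard array behaviour (the variable names are made up; what the engine does with the memory underneath is not observable from here):

```javascript
// delete: the slot stays, the array keeps its length, the entry becomes a hole.
const holed = ["r", "g", "b", "alpha"];
delete holed[1];
console.log(holed.length);  // 4
console.log(1 in holed);    // false
console.log(holed[1]);      // undefined

// splice: the entry is removed and everything above it moves down one index.
const shifted = ["r", "g", "b", "alpha"];
const removed = shifted.splice(1, 1);
console.log(shifted.length); // 3
console.log(shifted[1]);     // b
console.log(removed);        // [ 'g' ]
```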

It gets worse with objects -- because objects in JS are basically just associative arrays, they are pointers to structures that point at other structures; the "Hey is that me over there or is that you" scenario you often hear C++ detractors kvetch about, but without having any control over it. Objects in JS end up basically a "pointered list" -- which makes iterating through them a bit more of a challenge.
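Here is what that looks like from the script side, as a small sketch with made-up names (`sprite`, `alias`, `seen`). Properties are looked up by string key, can be added or removed whenever, and two names can end up pointing at the same nested structure.

```javascript
// An object behaves like a string-keyed associative array.
const sprite = { x: 10, y: 20 };

// Properties can be added and removed at runtime, so the layout is never fixed.
sprite["velocity"] = { dx: 1, dy: -1 };
delete sprite.y;

// Two names can refer to the same nested structure.
const alias = sprite.velocity;
alias.dx = 5;

// Iteration has to discover the keys at runtime.
const seen = [];
for (const key in sprite) {
  if (Object.prototype.hasOwnProperty.call(sprite, key)) {
    seen.push(key);
  }
}

console.log(seen);               // [ 'x', 'velocity' ]
console.log(sprite.velocity.dx); // 5
```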

In a language like C++ or Object Pascal, variables and properties are a fixed size declared at compile time. For example (I'm going to use Pascal for this as it's more "plain English" than C) if my first global variables in a program were:
var
	t:word;
	b:byte;
Code (markup):
in x86 assembly they would be in the "data segment" of the program -- the DS register... memory operations default to using the DS register so if for example I wanted to pull those variables into the accumulator
	mov  ax, [0] ; points to T, which is two bytes long
	mov  al, [2] ; points to B, which is one byte long
Code (markup):
Semicolons start comments in most Intel assemblers

If those were in a record / struct
type
	tDummy = record
		t:word;
		b:byte;
	end;
Code (markup):
and I assigned them to two different variables:
var
	first, second:tDummy;
Code (markup):
to access them would go something like this -- I'll do a quick register only copy from one to the other. AX, BX, etc are processor "registers" - basically the processor's internal workspace for actually doing things... saying just the register name means assign that register its value, putting it in brackets like [bx] means access the value at the memory offset pointed to by the value in the register.
; put first.t into AX, first.b into DL
	mov  bx, OFFSET first    ; point DS:BX at var "first"
	mov  ax, [bx]            ; var "t" is 0 offset from the start of FIRST
	mov  dl, [bx + 2]        ; var "b" is 2 offset from the start as T is a 2 byte WORD
	
; put AX into second.t and DL into second.b
	mov  bx, OFFSET second
	mov  [bx], ax
	mov  [bx + 2], dl
Code (markup):
It's very simple and very little code because we know how big they are, and exactly where they are in memory.
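For anyone who doesn't read assembly, the same fixed-layout idea can be mimicked in JavaScript with an `ArrayBuffer` and a `DataView`. This is only a sketch under assumptions: the constants `T_OFFSET`, `B_OFFSET`, `RECORD_SIZE`, `FIRST` and `SECOND` are hypothetical names standing in for the offsets the compiler would work out for the `tDummy` record.

```javascript
// Layout of the tDummy record: t is a 2-byte word at offset 0, b is a byte at offset 2.
const T_OFFSET = 0;
const B_OFFSET = 2;
const RECORD_SIZE = 3;

// One block of "memory" holding two records back to back.
const memory = new DataView(new ArrayBuffer(2 * RECORD_SIZE));
const FIRST = 0;             // start of "first"
const SECOND = RECORD_SIZE;  // start of "second"

// Fill in "first" (true = little-endian, as on x86).
memory.setUint16(FIRST + T_OFFSET, 0x1234, true);
memory.setUint8(FIRST + B_OFFSET, 0x56);

// Copy first to second: two reads and two writes at offsets known up front.
memory.setUint16(SECOND + T_OFFSET, memory.getUint16(FIRST + T_OFFSET, true), true);
memory.setUint8(SECOND + B_OFFSET, memory.getUint8(FIRST + B_OFFSET));

console.log(memory.getUint16(SECOND + T_OFFSET, true).toString(16)); // 1234
console.log(memory.getUint8(SECOND + B_OFFSET).toString(16));        // 56
```

No size lookup and no loop are needed, because every offset is a constant.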

To implement type-less variables, you'd need a structure something more like this:
type
	tTypeless = record
		dataSize:word;
		data:pointer;
	end;
Code (markup):
If we had just two variables:
	var
		first, second:tTypeless;
Code (markup):
To copy from one to the other you'd end up having to do this in ASM.
; we'll assume that we can corrupt BX, CX, ES, DI and SI
	mov   bx, OFFSET first     ; points DS:BX at var "first"
	mov   di, OFFSET second    ; points DS:DI at var "second"
	mov   cx, [bx]             ; set counter to dataSize
	mov   [di], cx             ; copy dataSize to "second"
	les   di, [di + 2]         ; points ES:DI at second.data
	push  ds                   ; preserve the data segment
	lds   si, [bx + 2]         ; points DS:SI at first.data
	rep   movsb                ; repeat move one byte from DS:[SI] to ES:[DI]
	                           ;   increments SI and DI, decrements CX
	                           ;   until CX = 0
	pop   ds                   ; restore the data segment
Code (markup):
Which is kind-of what we'd have to do with a string, but we have to do it with EVERY type of variable in JavaScript... The above code, particularly with the REP (repeat) involved, can take anywhere from 3 to 5 times longer to run than the fixed-size copy, even for something as simple as what would otherwise be a strictly typed integer.

(the copy could run faster using more code by leveraging shifts of CX and some jumps to get down to a movsw on a 286+, movsd on 386+ or movsq on x64, but this is just an example).
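The same size-prefixed copy, sketched in JavaScript for comparison. `makeTypeless` and `copyTypeless` are hypothetical helpers that stand in for the `tTypeless` record, not anything a real engine exposes. The reallocation step is an addition that the assembly above does not do (it assumes the target already has room).

```javascript
// Hypothetical stand-in for tTypeless: a size plus a reference to separately stored bytes.
function makeTypeless(bytes) {
  return { dataSize: bytes.length, data: Uint8Array.from(bytes) };
}

// Copy in the spirit of the rep movsb loop: read the size, then move byte by byte.
function copyTypeless(source, target) {
  target.dataSize = source.dataSize;
  if (target.data.length < source.dataSize) {
    // Assumption: grow the target when the new value is bigger than the old block.
    target.data = new Uint8Array(source.dataSize);
  }
  for (let i = 0; i < source.dataSize; i++) {
    target.data[i] = source.data[i];
  }
}

const first = makeTypeless([0x34, 0x12]); // two bytes of payload
const second = makeTypeless([0x00]);      // currently holds a one-byte value

copyTypeless(first, second);
console.log(second.dataSize);        // 2
console.log(Array.from(second.data)); // [ 52, 18 ]
```

Compared with a fixed layout, every copy now needs a size read, a capacity check, and a loop, and a bigger value forces a fresh allocation.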

They ARE trying to address this, it's why in ECMAScript you can now try to force type conversions like ToInt32 and they're now saying that "type" should be "held" until an operation not compatible with it (like a float or string operation) is done -- but you still have to check the operation and its type -- and it does nothing to address the relative lack of "packed arrays" and other efficient storage methods in JS that, well... sticks a hobble skirt and ballet heels on JavaScript and shoves a gimp ball in the browser's mouth...
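For reference, the usual way to trigger the ToInt32 conversion from script is a bitwise OR with zero, and the closest thing to packed storage is a typed array such as `Int32Array`. Both are standard JavaScript; whether a given engine then keeps the value as a raw integer internally is up to the engine and is not something this sketch can show. The variable names are made up.

```javascript
// `value | 0` runs the ToInt32 conversion on the left operand.
const coerced = [
  3.7 | 0,            // 3   (fraction dropped)
  -3.7 | 0,           // -3  (truncates toward zero)
  "42" | 0,           // 42  (string converted to a number first)
  (2 ** 32 + 5) | 0,  // 5   (wraps modulo 2^32)
  "abc" | 0,          // 0   (NaN becomes 0)
];
console.log(coerced); // [ 3, -3, 42, 5, 0 ]

// A typed array stores fixed-size elements side by side instead of pointers.
const packed = new Int32Array(4);
packed[0] = 3.7;                        // converted to an integer on store
console.log(packed[0]);                 // 3
console.log(packed.BYTES_PER_ELEMENT);  // 4
console.log(packed.byteLength);         // 16
```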

Which is going to leave the performance a decade and a half behind native code. You mix in the slow audio API and high latency, lack of ability to properly intercept the keyboard, mouse or CHRISTMAS FORBID game controllers, and it's just not going to be as viable as say... a browser plugin like Unity.

It's actually why I think a new language should have been introduced for doing CANVAS and AUDIO operations, be it 2D or 3D. It should have:

1) STRICT typing

2) Proper class declarations and assignments

3) Better audio with double-buffering for things like doing your own mixing. Said access should be extra low latency. (100ms or less!)

4) Ability for full keyboard interception, as well as access to ALL HID devices connected to the system.

5) User approval on startup BEFORE anything is allowed to intercept user input. (to prevent hijack abuse)

6) Something I know some developers have been clamoring for, the option for bytecode distribution instead of source distribution. An intermediate bytecode would be far, FAR more effective IMHO.

But of course that's why things like Unity impress me a HELL of a lot more than anything I've seen done with WebGL... even if it is one of those "evil" browser plugins.

Of course browser makers come up with all their bullshit reasons on why "plugins are evil" because it means they can't promote their own vendor lock in; Joe forbid developers be allowed to choose what technologies they want to deploy.

The laugh is Java would be better suited to this task, if its access to hardware from the browser wasn't completely FUBAR. (admittedly, that's a Security "feature")

Though maybe it's just me -- JS is so close to C in syntax (since, well, it IS a C syntax language) I don't get why just using proper OpenGL, OpenAL, or even something like SDL (I've been playing with OpenGL atop SDL for some time now) from C++ or Object Pascal (Delphi, FPC, even Xcode has the option) is so "hard" and just what the **** is wrong with writing native applications that you need this browser based shit in the first place.

-- edit --

BTW, the above description is much akin to the layman's explanation of inflation.

"inflation one could fairly state,
is the time when the universe expanded at a rate
that was faster than the speed of light;
though that oversimplifies and it ain't quite right.
... but for purposes here it will have to do,
'cause I ain't got the time to explain it to you." -- MC Hawking

Last edited: Dec 1, 2014

deathshadow, Dec 1, 2014 IP

ketting00 Well-Known Member

#5

Great advice. I found your short and sweet post more useful than Nicholas Zakas's whole High Performance JavaScript book, and I enjoyed reading it.
If you don't know the guy, people laud him as maybe second only to Douglas Crockford.
That book has nothing better than what I can find anywhere on the internet. It's the best book out there though.

Last edited: Dec 1, 2014

ketting00, Dec 1, 2014 IP

deathshadow Acclaimed Member

#6

Want a laugh? Neither of those names means a thing to me...

But to be fair, I go into book stores and libraries these days and go "user crap, user crap, user crap, user crap" and then walk away dejected. The laugh is we had more informative books on how things really worked in decades past than we do today. It's all abstraction atop abstraction atop abstraction now to the point that most people's code chokes hexa-core multi-ghz Xeon setups to death pushing around the exact same amount of DATA we were two decades ago using 486/50's.

See forum software -- like this one. Nothing screams developer ineptitude like 90k of markup to deliver 15k of plaintext and a half dozen content images. (basically 30k's job at the MAXIMUM)

Worse, the "advice" I find from most books and today's "experts" seems to run 100% contrary to everything I've learned in actually DOING THINGS the past 35 years of writing software. Quite often they'll say things like "don't use with" or "don't use anonymous functions EVER" -- and I'm like "ach laddie, where'dja ever get that idea?!?"

Though right now there are a LOT of cases where something is bad in one narrow corner-case, that people magically turn into "NEVER USE THEM!!!" -- see tables; tables for layout bad, tables for tabular data good -- it's NOT don't ever use tables; it sure as shine-ola isn't "abuse nested lists on tabular data"! Anonymous functions being assigned more than once bad, anonymous functions assigned once completely harmless -- it's NOT don't ever use anonymous functions.

But what do I know -- I'm the nutjob who thinks the onevent attributes in HTML should be removed from the specification, as should the STYLE tag... and the STYLE attribute being deprecated for all but the handful of situations in which presentation also equals data. (like width or height on a progress bar or graph).

... and of course from my efforts a decade ago to get involved I discovered my opinions are about as welcome as a snickers bar in a swimming pool, which is the root of why I gave up being polite about what mouth-breathing halfwits seem to be in charge of things right now; see the ENTIRE WhatWG and the fools at the W3C who yum up their **** like it was breaded and deep fried. I've got nothing but disgust for the entire lot of them at this point.

Last edited: Dec 1, 2014

deathshadow, Dec 1, 2014 IP
