Javascript - Pass-by-value versus call-by-value

Discussion in 'JavaScript' started by serialentre, Aug 2, 2015.

  1. #1
    Hi all,

    I would like to clarify if this author explained the concept correctly.

    My understanding at this point is based on this article -
    http://nsono.net/javascript-pass-by-value-or-pass-by-reference/

    Would someone be able to verify the explanation and, if it is incorrect, point out where?

    Thank you!
     
    Last edited: Aug 2, 2015
    serialentre, Aug 2, 2015 IP
  2. #2
    For the most part it's an accurate description, but you have the thread title wrong. "Call by value" is just another name for "pass by value" -- the distinction you're after is "pass by value" versus "pass by reference".

    Depending on how many other languages you know and the underlying mechanisms right down to the machine language level you understand, it can be quite hard to grasp in its entirety. It is confusing in what is typically considered an untyped or loosely typed language, to have the distinction of how variables are passed be determined entirely by the type of variable... just as it's extremely hard to understand if you have no concept of what a "pointer" is... and when "theoretical" people talk about a "reference" a pointer is what they are ACTUALLY talking about.

    When most high level programming languages pass values, they do so on what is called "the stack". That's a section of memory set aside for the program (sometimes shared with other programs and the OS, other times not depending entirely on OS, environment, processor, etc). The stack is also typically used to store "local variables" while another memory section typically just called "Data" is set aside for the storage of "global variables", "constants" and other information and memory allocation that exists for the entire time the program will run. It is also common for the memory address to "return to" after calling a function to be passed onto the stack as part of the "call" mechanism.

    On most Intel implementations in compiled languages like C or Pascal the stack actually grows DOWNWARDS from the top of the memory available to the program, while global variable space grows upwards from the bottom.

    You know how JavaScript Arrays have .push and .pop methods? These directly correspond to the stack behavior. You add values to the stack by "pushing" them onto the end of it, and you take values out of the stack by "popping" them off. You can also directly access the stack using an 'index' just like indexing an array.
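
    As a minimal sketch of that behavior, here is a plain array used as a stack. The variable names are only for illustration.

```javascript
// A plain array used as a stack: values go on and come off at the same end.
const stack = [];

stack.push(128);   // stack is now [128]
stack.push(256);   // stack is now [128, 256]

// Index access works too, much like addressing relative to the stack pointer.
const top = stack[stack.length - 1];   // reads 256 without removing it

const popped = stack.pop();            // removes and returns 256

console.log(top, popped, stack.length);
```

    After the pop, only the first value (128) is left on the stack.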

    The stack is a limited resource, and it takes time to copy things, so passing really large values like strings, arrays and objects on the stack is simply impractical. It risks creating a "stack overflow" where the data being passed tries to write outside the area allocated to it. On unprotected memory models this can cause arbitrary values to be written into memory space, overwriting not just the data area but possibly even the area where the code to be run lives. MANY early cracking exploits were built on this flaw... (and laughably it still happens in some software!!!). On systems that have protected memory space, an attempt to access memory outside the allocated area will trigger a "protection fault", making the OS kill the program before allowing it to do something... bad.
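
    You can see that the stack is limited from JavaScript itself. The sketch below recurses until the engine's call stack runs out. `measureDepth` is a hypothetical helper, and the error type is an assumption about V8-based engines such as Node.js and Chrome, which report a RangeError; other engines may throw a different error type. The depth reached depends on the engine and its settings.

```javascript
// Count how deep recursion can go before the engine's call stack is exhausted.
// `measureDepth` is a hypothetical helper, not something from the article.
function measureDepth() {
	let depth = 0;
	function recurse() {
		depth++;
		recurse();   // every call adds another frame to the call stack
	}
	try {
		recurse();
	} catch (err) {
		// The engine stops the script with an error instead of letting
		// the stack run into other memory.
		return { depth: depth, error: err };
	}
}

const result = measureDepth();
console.log(result.error instanceof RangeError, result.depth > 0);
```

    This is the safe, managed version of running out of stack: the script gets an exception rather than overwriting anything.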

    To get around this, large or arbitrary length values are "passed by reference", a fancy way of saying a pointer is pushed onto the stack: a fixed length memory address at which the data is stored. On 16 bit x86 systems a far pointer (16 bit segment plus 16 bit offset) is 4 bytes, aka 32 bits, the same size as a flat 32 bit address. A 32 bit selector-based far pointer (16 bit selector plus 32 bit offset) is 6 bytes, often padded to 8, and a flat 64 bit pointer is 8 bytes. That growth is one reason 64 bit code can consume more memory and sometimes run slower, offsetting part of the bonus of being able to move more data at once.

    Since the pointer to the actual long term storage memory address is passed to the function, that's where the value is operated on, which is why you can change its values remotely.

    At the same time, smaller fixed values like integers can be stored as 8, 16, 32 or 64 bits, and floating point numbers are often 32, 64 or 80 bits wide. Being anywhere from 1 to 10 bytes in length, these are within spitting distance of pointers in size, so... What's more efficient? Passing the actual value so the function can just read that from the stack, or making the function read the pointer from the stack to then read the value from somewhere else in memory?

    But since all you are passing on the stack is a COPY of the value, you can change that copy until blue in the face and the original will remain unchanged.
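
    In JavaScript terms, a rough sketch of the two cases looks like this. The function names are made up for the example.

```javascript
// A number is copied into the function; an object arrives as a reference.
function changeNumber(n) {
	n = n * 2;   // only the local copy changes
	return n;
}

function changeObject(obj) {
	obj.value = obj.value * 2;   // follows the reference to the caller's object
}

const original = 128;
const doubled = changeNumber(original);

const boxed = { value: 256 };
changeObject(boxed);

console.log(original, doubled, boxed.value);   // 128 256 512
```

    The number outside the function is untouched, while the object's property has changed because both sides were looking at the same object.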

    MOST "real" programming languages give you a mechanism for FORCING variables to be passed by reference instead of by value.

    In Pascal the var keyword can do this in the procedure or function declaration. (Pascal makes a distinction between functions, which return values, and procedures, which do not). The compiler will not let you attempt to pass an object or array by reference.

    Procedure test(byValue:integer; var byReference:integer);
    Code (markup):
    PHP has the & symbol to do much the same thing. Since PHP 5, objects are passed as handles, so a function can modify the caller's object without it, but arrays are still copied unless you use &.

    function test($byValue, &$byReference) { }
    Code (markup):
    JavaScript has no real mechanism for forcing pass by reference, which can be quite limiting at times -- it often leads programmers to wrap values in objects for no reason other than to trigger said behavior.
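
    Here is a small sketch of both the limitation and the usual workaround. `replace` and `copyInto` are hypothetical names chosen for the example.

```javascript
// Reassigning the parameter only rebinds the local name; the caller's
// variable still refers to the original object.
function replace(obj) {
	obj = { value: 0 };
}

// The usual workaround: wrap the value in an object and change a property.
function copyInto(byValue, byReference) {
	byReference.value = byValue;
}

const holder = { value: 256 };

replace(holder);
const afterReplace = holder.value;   // still 256

copyInto(128, holder);
const afterCopy = holder.value;      // now 128

console.log(afterReplace, afterCopy);
```

    So what gets handed to the function is a copy of the reference: you can reach through it to change the object, but you cannot make the caller's variable point somewhere else.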

    It really comes down to the machine language level implementation on which all other computer languages are pretty much based. I'm going to use Turbo Pascal's x86 compilation as an example here as Pascal was created as a teaching language, and it shows. The older architecture of a DOS environment also keeps it fairly simple.

    We take that procedure above, let's say we were to call it thus:

    TEST.PAS -- Turbo Pascal 7
    Stuff in curly brackets is a Pascal comment unless it starts with a dollar sign, in which case it is a compiler directive
    program test;
    
    { external says that this procedure is linked in after compilation }
    Procedure testProc(byValue:integer; var byReference:integer); external;
    
    {$L TEST.OBJ} { links in our ASM obj file }
    
    var
    	value1, value2:integer;
    
    begin
    	value1 := 128;
    	value2 := 256;
    	writeln('Before we call the function');
    	writeln('Value 1: ', value1);
    	writeln('Value 2: ', value2);
    	testProc(value1, value2);
    	writeln('After calling the function');
    	writeln('Value 1: ', value1);
    	writeln('Value 2: ', value2);
    end.
    Code (markup):
    ... and then had the procedure written in assembly. When a procedure starts, the 'stack pointer' "SP" points to the end of the stack where new values are stored. The first things pushed onto the stack are the function arguments, followed by a 32 bit pointer to the return address (where the function was called from). Because the stack grows DOWNWARDS in memory, this means that:

    SS:SP == return address (32 bit pointer)
    SS:SP + 4 == byReference (32 bit pointer)
    SS:SP + 8 == byValue (16 bit integer)

    SS means "stack segment", the colon says use that segment to point into.

    The JavaScript equivalent of this would be as if to call the function we created an array, and did:

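    var
    	stack = [],
    	returnAddress = 0, // placeholder for where the caller resumes
    	byValue = 128,
    	byReference = { value : 256 };
    	
    stack.push(byValue);       // arguments go on first, left to right
    stack.push(byReference);
    stack.push(returnAddress); // the call itself pushes this last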
    Code (markup):
    Unfortunately 16 bit x86 machine language does NOT allow you to use SP directly for indexing, so you have to copy it to another register called BP, the "base pointer". This gets even screwier to deal with as you typically need to preserve BP for whatever code is calling your function... so (and this is the laugh) you end up pushing BP onto the stack as well. Since we're talking 16 bit code that means BP is also two bytes, so that table of pointers becomes:

    SS:BP == BP's starting value
    SS:BP + 2 == return address
    SS:BP + 6 == byReference
    SS:BP + 10 == byValue
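
    The offsets in that table are just the sizes added up from the lowest address. A quick sketch of the arithmetic, with `frameOffsets` as a hypothetical helper:

```javascript
// Sizes in bytes for 16 bit code, in the order the items sit in memory
// starting at SS:BP (lowest address first).
const frame = [
	{ name: "saved BP",       size: 2 },
	{ name: "return address", size: 4 },   // far return: segment + offset
	{ name: "byReference",    size: 4 },   // far pointer: segment + offset
	{ name: "byValue",        size: 2 }    // 16 bit integer
];

// Walk the list and record where each item starts relative to BP.
function frameOffsets(items) {
	let offset = 0;
	const offsets = {};
	for (const item of items) {
		offsets[item.name] = offset;
		offset += item.size;
	}
	return offsets;
}

const offsets = frameOffsets(frame);
console.log(offsets["return address"], offsets["byReference"], offsets["byValue"]);   // 2 6 10
```

    Note that byValue ends up at the highest offset because it was pushed first and the stack grows downwards.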

    An assembly language function trying to talk to Turbo Pascal would then look something like this... Let's try to copy the value from byValue to byReference. Anything after a semicolon on a line is a comment. Under the Pascal calling convention external functions must also remove their own arguments from the stack before exit (C's default convention leaves that to the caller), but the x86 processors incorporate this functionality into the "RET" statement, which pops the return address off the stack into the program counter (a register that stores where code is to be executed) and can then discard a given number of argument bytes.

    TEST.ASM -- Netwide Assembler
    ; TEST.ASM - simple test of Pascal to Assembly interaction
    BITS 16
    CPU 8086
    
    SEGMENT CODE
    
    ; Procedure testProc(byValue:integer; var byReference:integer);
    global testProc                ; say that code outside this file can reference this item.
    testProc:                      ; a label to say this is the start of the function
    	push bp                      ; save BP on the stack
    	mov  bp, sp                  ; then copy the stack pointer to it.
    	les  di, [bp + 6]            ; point our "extra segment" and "destination index" at the data for "byReference"
    	mov  ax, [bp + 10]           ; load the 16 bits of "byValue" (x86 has no memory to memory MOV)
    	mov  [es:di], ax             ; store them at the memory address of "byReference"
    	pop  bp                      ; restore BP by taking it off the stack
    	retf 6                       ; pop the return address off the stack, then discard the six bytes of arguments
    
    Code (markup):
    If we assembled TEST.ASM with NASM to create an OBJ, compiled TEST.PAS with TPC.EXE (a compiler and linker in one), and ran it, the output would be:

    Before we call the function
    Value 1: 128
    Value 2: 256
    After calling the function
    Value 1: 128
    Value 2: 128
    Code (markup):
    If you're able to follow that, you'll realize that under the hood the difference is quite profound. It would be somewhat similar to:

    var
    	byValue = 128,
    	byReference = { value : 256 },
    	stack = [];
    	
    function testProc() {
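    	// byReference is on top of the stack, byValue sits just below it
    	stack[stack.length - 1].value = stack[stack.length - 2];
    	stack.splice(stack.length - 2, 2); // remove both arguments, like "retf 6"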
    }	
    
    console.log("Before we call the function");
    console.log("Value 1: " + byValue);
    console.log("Value 2: " + byReference.value);
    stack.push(byValue);
    stack.push(byReference);
    testProc();
    console.log("After calling the function");
    console.log("Value 1: " + byValue);
    console.log("Value 2: " + byReference.value);
    Code (markup):
    NOT that you'd ever write JS that way and it removes the return address from the equation, but that's what the underlying mechanism is in fact doing and hiding from you!
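
    For a slightly closer model than the array version above, here is a sketch that treats one byte buffer as memory and pushes real byte values onto a downward-growing stack. It is an assumption-laden toy: a plain number stands in for an address (no segments), the saved BP and return address are dummies, and `push16`, `push32`, `VALUE1` and `VALUE2` are names invented for the example.

```javascript
// Hypothetical model: one flat byte buffer stands in for memory.
const memory = new DataView(new ArrayBuffer(64));
let sp = 64;   // stack pointer starts at the top and moves downwards

function push16(value) { sp -= 2; memory.setUint16(sp, value, true); }
function push32(value) { sp -= 4; memory.setUint32(sp, value, true); }

// "Global variables" live at the bottom of memory.
const VALUE1 = 0, VALUE2 = 2;
memory.setUint16(VALUE1, 128, true);
memory.setUint16(VALUE2, 256, true);

function testProc() {
	push16(0);                                        // save BP (dummy value here)
	const bp = sp;                                    // copy SP to BP
	const target = memory.getUint32(bp + 6, true);    // read the byReference pointer
	const copy = memory.getUint16(bp + 10, true);     // read the byValue copy
	memory.setUint16(target, copy, true);             // write through the pointer
	sp += 2;                                          // restore BP
	sp += 4 + 6;                                      // drop return address, then 6 bytes of arguments
}

push16(memory.getUint16(VALUE1, true));   // byValue: a copy of the number itself
push32(VALUE2);                           // byReference: the address of value2
push32(0);                                // return address (dummy)
testProc();

const value1 = memory.getUint16(VALUE1, true);
const value2 = memory.getUint16(VALUE2, true);
console.log(value1, value2, sp);   // 128 128 64
```

    The copy on the stack is thrown away when the procedure returns, while the write through the pointer lands in the "global" memory and stays there.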

    I know that probably seems pretty complex, but that's what's really going on and, well... It's a VERY complex topic. It's why I'm something of an advocate of people taking the time to go back and learn assembly language even on simple outdated platforms like 16 bit DOS. It gives you a firmer grasp of what's REALLY going on, instead of leaving you woefully unaware of it.

    People who only learn high level languages often miss these nuances, and as such miss a great many opportunities to write more efficient code.

    -- edit -- Oh, and even this seemingly complex post is a gross oversimplification and not quite right. There are a myriad of variations and different methods based on different languages, different processor families, and just compiler or interpreter developers' mindsets and habits.
     
    Last edited: Aug 6, 2015
    deathshadow, Aug 6, 2015 IP
  3. #3
    Thank you Deathshadow for the wonderful breakdown in your explanations. I've learnt so much more from reading your post. Admittedly, I don't fully understand everything you mentioned because of my lack of knowledge of the other programming languages.

    But I want to acquire more knowledge and know what you know about data structures. Where can I start reading up on them?
     
    serialentre, Sep 7, 2015 IP
  4. #4
    Hi Deathshadow,

    Just a few clarifications about your explanations.

    As you pointed out, it's either (1) Pass by reference, or (2) Pass by value.

    Question 1: So most high-level programming languages pass values which are relatively small in size?

    Question 2: Because the stack is limited, what happens is that large or arbitrary length values are passed by reference, correct?

    Question 3: You asked if "passing the actual value is more efficient compared to making the function read the pointer from the stack to then read the value from somewhere else in memory"?

    Making the function read the pointer is more efficient because you don't have to retrieve the entire value. Instead, you just reference the address of the data, correct?

    How do we know WHEN we are passing a COPY of the value on the stack? Is it when we create a pointer to reference the data somewhere in memory?

    Question 4: So what actually happens is that when a pointer is created to reference a value, what is actually stored in the pointer is the address where the data is stored within the memory of the computer? And a pointer can only be created in some lower level languages, correct?

    Thank you!
     
    Last edited: Sep 8, 2015
    serialentre, Sep 8, 2015 IP