What just happened to this text? and how do I fix it?

Discussion in 'PHP' started by sarahk, Jul 17, 2020.

  1. #1
    So, a client is working with MoodleCloud and it seems like every time we download a file the format is subtly different from the previous time.

    The latest is a plain-text list, but I'm guessing the UTF encoding is different.
    In a text editor the text looks just fine.
    When I debug inside the import it looks just fine.
    But when I view the source of the debug output I see that

    mowog
    becomes

    m o w o g
    and put into a query it becomes

    \0\0m\0o\0w\0o\0g\0
    How should I be handling this?
     
    sarahk, Jul 17, 2020 IP
  2. deathshadow

    deathshadow Acclaimed Member

    Messages:
    9,327
    Likes Received:
    1,821
    Best Answers:
    244
    Trophy Points:
    515
    #2
    It's almost certainly an encoding mismatch, but it's hard to say which one. Is the encoding being set properly in the HTTP headers? Was it uploaded properly? If it was uploaded via HTTP, did the accept-encoding mismatch the server settings? If it was uploaded via FTP, was it set to binary transfer and not the default auto-detection of text that has ALWAYS mangled things and should be removed from FTP entirely?
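
    The alternating NUL bytes in post #1 are what UTF-16 text looks like when it is read one byte at a time, so one mismatch worth ruling out is a UTF-16 download being treated as UTF-8. A minimal sketch of that check, assuming the mbstring extension is loaded and the file starts with a byte order mark; `to_utf8` is a hypothetical helper name, not something from the thread or from Moodle:

```php
<?php
// Hypothetical helper (not from the thread): normalise raw file bytes to UTF-8
// by looking at the byte order mark (BOM) at the start of the data.
function to_utf8(string $raw): string
{
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        // UTF-8 BOM: already the right encoding, just drop the marker.
        return substr($raw, 3);
    }
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        // UTF-16 little-endian BOM (this sketch does not handle UTF-32).
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    }
    if (strncmp($raw, "\xFE\xFF", 2) === 0) {
        // UTF-16 big-endian BOM.
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    }
    // Assumption: no BOM means the data is already UTF-8.
    return $raw;
}

// Sample bytes: a UTF-16LE BOM followed by "mowog" with a NUL after each letter.
echo to_utf8("\xFF\xFE" . "m\0o\0w\0o\0g\0"), "\n";
```

    If the file has no BOM the sketch passes it through untouched, so it would not catch BOM-less UTF-16; in that case the source encoding has to be known or guessed some other way.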

    That said I recently had a client on Moodle where it was just a broken unmanageable shit-show. It's a cute learning tool, but trying to use it for live websites is foolhardy at best, a pathetic joke at worst. First thing I'd try is migrating off of that to a more conventional development stack and hosting, even just for testing.

    Particularly given the s***-show they vomit up and have the unmitigated gall to call HTML. All the hallmarks of "eye cans haz teh intarweb duvalupmunt" ignorance of the most basic aspects of site-building, where "For people who know nothing about websites, BY people who know NOTHING about websites" is no recipe for success.

    Just view-source any site built with it for proof enough of that. Just another scam preying on hopes and wishful thinking, CREATED by people unqualified to work in this industry, much less teach others how to do things.
     
    deathshadow, Jul 18, 2020 IP
    sarahk likes this.
  3. sarahk

    sarahk iTamer Staff

    Messages:
    26,223
    Likes Received:
    3,856
    Best Answers:
    108
    Trophy Points:
    665
    #3
    I ended up putting it into a spreadsheet and then exporting as CSV to get a clean copy. It's a once a month process for one file so I can live with it. Uploaded via a regular form.

    And yep, quickly coming to that realisation about Moodle but not my decision, I just handle the upload and download of students.
     
    sarahk, Jul 18, 2020 IP
    JEET likes this.
  4. JEET

    JEET Notable Member

    Messages:
    3,356
    Likes Received:
    371
    Best Answers:
    16
    Trophy Points:
    235
    #4
    Isn't \0 the end character added at the end of char and varchar fields in databases?
    If the character is there in the raw downloaded file, then the following could remove it:

    $c = "m\0o\0w\0o\0g\0"; // your data; this sample mirrors the bytes from post #1
    $c = str_replace("\0", "", $c); // strip every NUL byte

    I am thinking the server in question is preparing the file 1 byte at a time, which might be causing the problem. Not sure though.
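
    One caveat on stripping NUL bytes: it only gives clean text when every character is plain ASCII. A small sketch of the difference, assuming the data is UTF-16LE (the thread does not confirm this) and that the mbstring extension is loaded:

```php
<?php
// "é" (U+00E9) encoded as UTF-16LE is the two bytes E9 00.
$utf16 = "\xE9\x00";

// Stripping NUL bytes leaves a lone 0xE9, which is not valid UTF-8.
$stripped = str_replace("\0", "", $utf16);
var_dump(mb_check_encoding($stripped, 'UTF-8')); // bool(false)

// Converting the encoding instead yields the two-byte UTF-8 sequence C3 A9.
$converted = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');
var_dump(bin2hex($converted)); // string(4) "c3a9"
```

    For a list of unaccented names the two approaches give the same result, which is probably why the quick fix looks fine at first.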
     
    JEET, Jul 20, 2020 IP
  5. deathshadow

    #5
    It's called "null termination" and it's a really shitty way of handling strings. Hence it being the default string type as well as many other data streams in C, just part of why C and most things based on it are dumbass.

    Typically most SQL engines -- like MySQL -- do NOT use null termination, preferring "length-prefixed" strings, meaning an unsigned integer stored as a byte, word, dword, or even qword at the very start of the data says how long the string is, so you know where it ends BEFORE you get to the end. This is sometimes called "pascal style" strings. They allow for more efficient and faster string operations, at least on Intel based architectures.

    That's why in MySQL the tinytext/tinyblob types have a limit of 255 bytes but take up to 256: the length is stored in 8 bits, so one extra byte. text/blob use a 16 bit length, so the maximum is 65535 bytes (0xFFFF) and two extra bytes are consumed to store the length. Same for mediumtext/mediumblob, which are 24 bit and add 3 bytes, or longtext/longblob, which are 32 bit and add 4 bytes.
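
    The arithmetic can be sketched as follows. MySQL's storage requirements list these types as L + n bytes with L < 2^(8n), so an n-byte length prefix allows at most 2^(8n) - 1 bytes of data. `max_length_for_prefix` is a hypothetical helper name, and the sketch assumes a 64-bit PHP build so the 4-byte case fits in an integer:

```php
<?php
// Largest payload length that an unsigned length prefix of $prefixBytes bytes
// can describe: 2^(8n) - 1. Assumes 64-bit integers.
function max_length_for_prefix(int $prefixBytes): int
{
    return (1 << (8 * $prefixBytes)) - 1;
}

$types = ['TINYTEXT' => 1, 'TEXT' => 2, 'MEDIUMTEXT' => 3, 'LONGTEXT' => 4];
foreach ($types as $type => $n) {
    // Total storage is the payload plus the prefix itself.
    printf("%-10s prefix %d byte(s), max data %d bytes\n", $type, $n, max_length_for_prefix($n));
}
```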

    Though I think I might know what it is, thanks to @JEET bringing that up. They're probably using CHAR instead of VARCHAR in their database, which does indeed slop escaped null into the string as CHAR is the only null terminated string in MySQL (I think, don't quote me on that). Part of why CHAR is for chumps and is well known to wreak all sorts of havoc including buffer overflows and execution exploits... why? Because null termination is dumbass shit.

    It probably is null termination interference and NOT an encoding issue, so I take back my previous post and the "almost certainly" part! I had almost completely forgotten that CHAR is the dipshit "I canz teh prugram?" null termination approach to string handling; a relic of 1970's mainframes that should have stayed there. But instead is how C programmers think all strings should work.
     
    deathshadow, Jul 20, 2020 IP
    JEET likes this.
  6. JEET

    #6
    JEET, Jul 21, 2020 IP
  7. deathshadow

    #7
    Just to better illustrate null termination and how MySQL CHAR and VARCHAR differ:

    "test" VARCHAR(8) == 0x04 0x74 0x65 0x73 0x74
    "test" CHAR(8) == 0x74 0x65 0x73 0x74 0x00 0x00 0x00 0x00 0x00

    In practice the varchar is likely reserving 9 bytes as well, but because the length is specified first it doesn't matter what's in there; no code should ever try to access past the length. With null termination it's common practice to pad the entire field with null. This means writing more data than you need to, and it means that if at the C or ASM level you screw up checking for null you will start overwriting whatever variables come after. That's literally BEGGING for "buffer overflow" type errors.
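
    The two byte layouts drawn above can be reproduced in a few lines of PHP. This is only a sketch of the general length-prefixed versus null-padded idea: `length_prefixed` and `null_padded` are hypothetical helper names, and MySQL's own documentation describes CHAR values as right-padded with spaces, so the second layout should be read as the post's illustration rather than MySQL's on-disk format.

```php
<?php
// Length-prefixed ("pascal style"): one length byte, then the data.
function length_prefixed(string $s): string
{
    if (strlen($s) > 255) {
        throw new InvalidArgumentException('a one-byte prefix holds at most 255');
    }
    return chr(strlen($s)) . $s;
}

// Null-padded fixed-width field as drawn in the post: the data, then NUL bytes
// up to the field width plus one terminator.
function null_padded(string $s, int $width): string
{
    return str_pad($s, $width + 1, "\0");
}

// A reader of the prefixed form knows the length before touching the data.
$field  = length_prefixed('test');
$length = ord($field[0]);
echo substr($field, 1, $length), "\n";       // test

echo bin2hex($field), "\n";                  // 0474657374
echo bin2hex(null_padded('test', 8)), "\n";  // 746573740000000000
```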

    Like all the other dipshit errors C and C syntax languages seem to LOVE to encourage.

    Hence why I say Kernighan can go suck an egg. C is not my favorite programming language.
     
    deathshadow, Jul 21, 2020 IP
    JEET likes this.