substr() does not cut strings precisely

Discussion in 'PHP' started by Nuzhser, Jun 16, 2010.

  1. #1
    Good day!
    I have Cyrillic text stored in MySQL. I need to cut it to 130 characters each time, inside a loop over the SQL query results. But the function returns strings of unequal length, and worse, they are shorter than 130: around 70-80 characters. What is going wrong?

    Result is here
    marketsite.byethost12.com
     
    Last edited: Jun 16, 2010
    Nuzhser, Jun 16, 2010 IP
  2. MyVodaFone

    #2
    Are you allowing for spaces?

    Example :

    "1 2 3 4 5" = 9 characters

    If you wanted to show 1 2 3

    $str = substr($str, 0, 5); // you would have to count 5 characters
     
    MyVodaFone, Jun 16, 2010 IP
  3. Nuzhser

    #3
    Yes, I counted every character in the text, including spaces, and the result is 72-90, not 130 as specified in the script: substr($text, 0, 130);
    You can see it on the site.
     
    Nuzhser, Jun 16, 2010 IP
  4. lukeg32

    #4
    It's working correctly...... kind of.........

    Simple explanation: the characters you are using are stored in a multi-byte encoding (UTF-8), meaning that each one is represented by more than 1 byte. Cyrillic letters take 2 bytes each in UTF-8, while spaces and ASCII punctuation take 1. As the core function processes the string per byte, substr($text, 0, 130) returns 130 bytes, which is why you are getting fewer "characters" back.

    You might want to look at mb_substr instead.
    http://uk.php.net/manual/en/function.mb-substr.php
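
The byte-versus-character difference described above can be seen in a minimal sketch. It assumes the mbstring extension is enabled and that the script file itself is saved as UTF-8; the sample string and variable names are made up for illustration and are not from the original poster's code.

```php
<?php
// Hypothetical sample: 11 characters (9 Cyrillic letters, a comma, a space).
$text = 'Привет, мир';

$bytes = strlen($text);              // counts bytes: 9 * 2 + 2 = 20
$chars = mb_strlen($text, 'UTF-8');  // counts characters: 11

$byteCut = substr($text, 0, 6);              // first 6 bytes  -> 3 Cyrillic letters
$charCut = mb_substr($text, 0, 6, 'UTF-8');  // first 6 characters

echo $bytes, ' bytes, ', $chars, " characters\n";
echo $byteCut, ' | ', $charCut, "\n";
```

With mostly-Cyrillic text, a 130-byte cut lands somewhere between 65 and 130 characters depending on how many single-byte spaces and punctuation marks the string contains, which matches the 70-90 range reported in the thread.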
     
    lukeg32, Jun 16, 2010 IP
  5. Nuzhser

    #5
    I have changed the code from substr($text, 0, 130); to mb_substr($text, 0, 130, 'utf8'); The lengths are now a little more similar, but still far from equal.
     
    Nuzhser, Jun 16, 2010 IP
  6. lukeg32

    #6
    What happens if you do this before the mb_substr?

    mb_internal_encoding('UTF-8');

    Failing that, you might also want to try mb_strcut; it's similar, but there are slight differences in how multi-byte characters are handled.

    http://uk.php.net/manual/en/function.mb-strcut.php
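
As a rough illustration of those differences: mb_substr takes its length in characters, whereas mb_strcut takes its length in bytes and backs off to a character boundary, so it does not give a fixed character count either. The sketch below assumes mbstring is enabled and a UTF-8 source file; the sample string is hypothetical.

```php
<?php
$text = 'Привет'; // hypothetical sample: 6 Cyrillic letters, 12 bytes in UTF-8

// Length in characters: always 5 characters here.
$bySubstr = mb_substr($text, 0, 5, 'UTF-8');

// Length in bytes: 5 bytes would split the third letter, so mb_strcut
// stops at the previous character boundary (4 bytes, 2 letters).
$byStrcut = mb_strcut($text, 0, 5, 'UTF-8');

// Plain substr cuts mid-character and leaves an invalid UTF-8 sequence.
$raw = substr($text, 0, 5);

echo $bySubstr, ' | ', $byStrcut, "\n";
var_dump(mb_check_encoding($raw, 'UTF-8'));
```

So mb_strcut is useful when the limit is a byte budget (for example a fixed-width database column), while mb_substr is the one to use when the goal is the same number of visible characters every time.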
     
    lukeg32, Jun 16, 2010 IP
  7. Nuzhser

    #7
    It outputs the digit 1 everywhere instead of the text after I insert mb_internal_encoding('UTF-8');
    marketsite.byethost12.com
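
One possible cause, offered only as a guess since the script itself is not shown in the thread: when called with an argument, mb_internal_encoding() returns true on success, and true prints as "1" if that return value is echoed or assigned to the text variable. A minimal sketch of calling it as a standalone statement, with hypothetical variable names and assuming mbstring and a UTF-8 source file:

```php
<?php
// Call it once on its own line; the return value is only a success flag.
$ok = mb_internal_encoding('UTF-8');

// This is what the flag looks like when printed or stored as text.
$flagAsText = (string) $ok; // "1"

$text = 'Привет, мир'; // hypothetical sample string

// With the internal encoding set, the fourth argument can be omitted.
$short = mb_substr($text, 0, 6);

echo $short, "\n";
```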
     
    Nuzhser, Jun 16, 2010 IP
  8. Nuzhser

    #8
    I changed it to mb_strcut($text, 0, 130, 'utf8'); but it's still unequal.
     
    Nuzhser, Jun 16, 2010 IP
  9. lukeg32

    #9
    It works fine for me on a test server. What are the default mb settings on the server you are using? Are you checking that the strings you are trying to shorten are actually longer than 130 characters?

    Also, you might want to take a look at the pointers and problems described here, which is pretty thorough:

    http://www.phpwact.org/php/i18n/charsets
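
The two checks asked about above could be done with something like the following sketch. It assumes mbstring is enabled and a UTF-8 source file; $text here is a made-up stand-in for a row fetched from the database, and the $report array is purely illustrative.

```php
<?php
// Hypothetical stand-in for one row of text from the database.
$text = str_repeat('Привет, мир. ', 12);

$report = [
    'internal_encoding' => mb_internal_encoding(),          // current mbstring default
    'valid_utf8'        => mb_check_encoding($text, 'UTF-8'), // false suggests a different charset
    'bytes'             => strlen($text),
    'chars'             => mb_strlen($text, 'UTF-8'),
];
print_r($report);

// Only strings longer than 130 characters can come back as exactly 130.
$short = mb_substr($text, 0, 130, 'UTF-8');
echo mb_strlen($short, 'UTF-8'), "\n";
```

If valid_utf8 comes back false for real rows, the data is probably arriving from MySQL in another character set, and the character counts will be off no matter which function does the cutting.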
     
    lukeg32, Jun 16, 2010 IP
  10. Nuzhser

    #10
    Of course they are longer than 130 characters.
    Thank you for the link.
     
    Nuzhser, Jun 16, 2010 IP