Help with Regular expression

Discussion in 'Programming' started by promo, Jun 14, 2016.

  1. #1
    Hi all

    Can somebody point me in the right direction here..

    I want to create a regex that can filter domain names by their letter composition. I want to create two versions.

    One for western premium letters and one for Chinese premium letters.

    The western one is supposed to remove any domain that contains any of these letters:
    J, K, Q, U, V, W, X, Y or Z

    The Chinese one is supposed to remove any domain that contains any of these letters:
    A, E, I, O, U, V

    I am a bit lost.. I tried this for the western one:

    (\^JKQUVWXYZ)(\^JKQUVWXYZ)(\^JKQUVWXYZ)(\^JKQUVWXYZ)\.com

    Supposed to filter LLLL.com domains. Can anyone give me a pointer?

    Send me a pm if you know how to do this. I might be able to award you something for your help.
     
    Last edited: Jun 15, 2016
    promo, Jun 14, 2016 IP
  2. PoPSiCLe

    PoPSiCLe Illustrious Member

    Messages:
    4,623
    Likes Received:
    725
    Best Answers:
    152
    Trophy Points:
    470
    #2
    Could you give some examples of domains you would allow, and some you wouldn't? Do you want to filter out any domain that contains ANY of the letters, or only those that contain the letters in a specific pattern (like 4 "forbidden" letters in a row)?
     
    PoPSiCLe, Jun 15, 2016 IP
  3. promo

    promo Well-Known Member

    Messages:
    1,077
    Likes Received:
    66
    Best Answers:
    0
    Trophy Points:
    160
    #3
    Sure. Thanks for stepping in..

    Let's take the western premium 4-letter .com domains first.
    I want no: "J, K, Q, U, V, W, X, Y or Z" in any of the 4 positions.

    This means these domains are good and should be in the list:

    abcd.com
    glol.com
    rrtt.com

    These are not good and should be filtered out by the regex:

    jqua.com
    abcz.com
    uola.com
     
    promo, Jun 15, 2016 IP
  4. PoPSiCLe

    #4
    This works for the western list. If you want to match the Chinese one, just swap out the letters in the character class:

    <?php
    $domains = ['abcd.com', 'glol.com', 'rrtt.com', 'jqua.com', 'abcz.com', 'uola.com'];
    foreach ($domains as $domain) {
        // Drop the TLD so only the name itself is tested.
        $name = explode('.', $domain)[0];
        // preg_match() returns 0 when no forbidden letter is found.
        if (preg_match('/[jkquvwxyz]/i', $name) === 0) {
            echo $domain . '<br>';
        }
    }
    ?>

    This uses an array and a foreach, but you can of course put the check in any function and strip out whatever you don't need. The important bits are the explode() on the domain (to get rid of the TLD) and the preg_match() itself.
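
For comparison, here is a minimal Python sketch of the same idea with the forbidden letters passed in as a parameter, so the western and Chinese variants share one function. The function name `keep_domains` and the two constants are illustrative only; the letter sets are copied from the first post.

```python
import re

WESTERN_FORBIDDEN = "jkquvwxyz"  # letters excluded from "western premium" names
CHINESE_FORBIDDEN = "aeiouv"     # letters excluded from "Chinese premium" names


def keep_domains(domains, forbidden):
    """Return the domains whose name (the part before the first dot)
    contains none of the forbidden letters."""
    pattern = re.compile("[" + re.escape(forbidden) + "]", re.IGNORECASE)
    kept = []
    for domain in domains:
        name = domain.split(".", 1)[0]  # strip the TLD
        if pattern.search(name) is None:
            kept.append(domain)
    return kept


domains = ["abcd.com", "glol.com", "rrtt.com", "jqua.com", "abcz.com", "uola.com"]
print(keep_domains(domains, WESTERN_FORBIDDEN))  # ['abcd.com', 'glol.com', 'rrtt.com']
print(keep_domains(domains, CHINESE_FORBIDDEN))  # ['rrtt.com']
```

Only the character class changes between the two variants, which is the "swap out the letters" step described above.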
     
    PoPSiCLe, Jun 15, 2016 IP
  5. promo

    #5
    OK, I couldn't get this to work exactly.

    I am using DRT. I can only enter a single-line expression, and DRT then runs it against a list of domains I paste in.

    Examples of working code in DRT:

    Sample: ends with cheap or free
    (cheap|free)\.

    Sample: is LNLN.tld
    ^(\D\d\D\d)\.

    Sample: is NNNN.tld
    ^(\d){4,4}\.

    Sample: is .Mobi
    ()\.mobi

    So DRT takes care of the sample part. What I need is the expression that filters on certain criteria, such as not having specific letters present.
     
    promo, Jun 15, 2016 IP
  6. PoPSiCLe

    #6
    Aha... Try just having the list of characters in the (), so something like: (jkquvwxyz)\.com or (\jkquvwxyz)\.com. The ^ at the beginning of an expression tells it to match from the start of the string, so you probably want to leave that out.
     
    PoPSiCLe, Jun 15, 2016 IP
  7. promo

    #7
    The first one was accepted by the error checker. The second was not (it told me the expression has to conform to .NET regular expressions).

    Unfortunately, the first one currently filters out all names for some reason.
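
A likely explanation, offered as a sketch rather than a confirmed diagnosis: inside plain parentheses the letters form a literal sequence, so `(jkquvwxyz)\.com` only matches a name ending in the nine-letter string "jkquvwxyz". Square brackets are what make a character class. Assuming DRT keeps a domain only when the expression matches it (as the working samples above suggest), one option is to list the allowed letters instead. The Python below only illustrates the patterns. The range syntax is also valid in .NET, but none of this has been tried in DRT itself, and the names `WESTERN`, `CHINESE` and `LITERAL` are labels invented for this example.

```python
import re

# Allowed letters only: a-z minus j, k, q, u, v, w, x, y, z.
WESTERN = r"^[a-il-pr-t]{4}\.com$"
# Allowed letters only: a-z minus a, e, i, o, u, v.
CHINESE = r"^[b-df-hj-np-tw-z]{4}\.com$"
# The pattern suggested earlier: a group holding a literal letter sequence.
LITERAL = r"(jkquvwxyz)\.com"

domains = ["abcd.com", "glol.com", "rrtt.com", "jqua.com", "abcz.com", "uola.com"]


def matching(pattern, names):
    """Return the names the pattern matches, i.e. the ones a match-to-keep filter would keep."""
    return [name for name in names if re.search(pattern, name)]


print(matching(WESTERN, domains))  # ['abcd.com', 'glol.com', 'rrtt.com']
print(matching(CHINESE, domains))  # ['rrtt.com']
print(matching(LITERAL, domains))  # []
```

The patterns assume lowercase input. If the list may contain uppercase names, prefixing the expression with the inline option `(?i)` should make it case-insensitive in .NET as well.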
     
    promo, Jun 15, 2016 IP
  8. PoPSiCLe

    #8
    What is this DRT thing you're using? If I could get some information about the limitations and syntax of the regex-engine, it would be easier to pinpoint what needs to be done.
     
    PoPSiCLe, Jun 15, 2016 IP
  9. ThePHPMaster

    ThePHPMaster Well-Known Member

    Messages:
    737
    Likes Received:
    52
    Best Answers:
    33
    Trophy Points:
    150
    #9
    Show us your code.
     
    ThePHPMaster, Jun 15, 2016 IP
  10. promo

    #10
    PoPSiCLe is currently trying his hand at working out the expression in the software itself. I am using:


    Domain Research Tool - Lezon Inc.

    Filtering Domains using Regular Expressions


    The newest addition to DRT is a feature that allows you to use .NET formatted regular expressions (RegEx) to filter domain lists. Regular expressions allow for virtually unlimited customization when filtering keywords. For example, you can filter domains that contain LLL.tld, NNN.tld, NLN.tld, that start with i or e, that contain "test" as the third character and much more.

    This feature is accessible from the Options->Filter Settings->Advanced Keyword Settings tab.

    A regular expression is a special text string that describes a search pattern.

    While regular expression syntax and usage are beyond the scope of this help file (regular expressions can be very complicated or very simple, depending on your needs), we will explain some very simple regex syntax and strongly recommend that you check out Google or RegEx Buddy for more help.

    Sample matching commands:

    ^ = the match must exist at the beginning of the string, ex: ^(e) would match any domain that starts with the letter e
    $ = the match must exist at the end of the string, ex: (e)$ would match any domain that ends with the letter e, but remember, domains always end with an extension
    | = or conditional, ex: (cheap|free|affordable) would match any domain that contains cheap or free or affordable (notice, no spaces)
    . = any character, ex: (gr.y) will match gray.com, grey.com, grzy.com, but not gry.com

    (cheap|free)\. = domain must end with cheap or free (notice we escaped the dot with \.)
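
The sample commands above can be checked outside DRT. The following is a small Python sketch, assuming Python's `re.search` behaves like the .NET engine for these basic constructs (it does for anchors, alternation, the dot and the escaped dot). The helper name `matches` and the test domains are made up for illustration.

```python
import re


def matches(pattern, domain):
    """True when the pattern is found anywhere in the domain string."""
    return re.search(pattern, domain) is not None


# ^ anchors the match at the start of the string.
print(matches(r"^(e)", "example.com"))            # True
print(matches(r"^(e)", "test.com"))               # False

# $ anchors at the end, which is the extension, not the name.
print(matches(r"(e)$", "apple.com"))              # False
print(matches(r"(e)$", "apple.me"))               # True

# | is "or".
print(matches(r"(cheap|free|affordable)", "getfreestuff.com"))  # True

# . stands for exactly one character.
print([d for d in ["gray.com", "grey.com", "grzy.com", "gry.com"]
       if matches(r"(gr.y)", d)])                 # ['gray.com', 'grey.com', 'grzy.com']

# \. is a literal dot, so the keyword must sit right before the extension.
print(matches(r"(cheap|free)\.", "getfree.com"))  # True
print(matches(r"(cheap|free)\.", "freebie.com"))  # False
```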

    More great regex information is available at the link below:

    http://www.regular-expressions.info/quickstart.html

    How does D.R.T. match regular expressions?

    When regular expression matching is turned on, Domain Research Tool loops through all of the "Enabled" regular expressions and attempts to match the current domain (whether pasted in or loaded from a file) against each one. If even a single enabled rule matches, the domain passes the test; otherwise it fails (and won't be added to the scan list).
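
The matching behaviour described above amounts to an "any rule matches" test. Here is a rough Python sketch of that logic. It is not DRT's actual code; the function name `passes_filter` and the candidate domains are hypothetical, and the two rules are taken from the samples earlier in the thread.

```python
import re


def passes_filter(domain, enabled_rules):
    """A domain passes when at least one enabled rule matches it."""
    return any(re.search(rule, domain) for rule in enabled_rules)


enabled_rules = [r"^(\d){4,4}\.", r"(cheap|free)\."]
candidates = ["1234.com", "getfree.net", "example.org"]

scan_list = [d for d in candidates if passes_filter(d, enabled_rules)]
print(scan_list)  # ['1234.com', 'getfree.net']
```

Because the rules are combined with "or", conditions that must all hold at once (for example "four letters" and "none of these letters") would have to be written as a single expression rather than as separate rules.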

    There is a small performance cost when using regular expressions. On average, a file containing 100,000 domains will load 2 seconds slower with two regular expression rules.

    Why use regular expressions?

    Regular expressions allow for extremely flexible pattern matching. You can customize the regular expression to match from just a single rule to a complex set of dozens of rules (ex: domain that is no longer than 10 characters, contains only letters, has "z" as the third letter, contains a,e,i,o,u all in one rule).
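
As an illustration of the "all in one rule" example, the conditions can be stacked as lookaheads, which both .NET and Python support. This is one possible reading of the example (name of at most 10 letters, "z" in third position, every vowel present somewhere in the name), written as a Python sketch; the pattern and the test domains are made up and have not been run in DRT.

```python
import re

COMBINED = re.compile(
    r"^(?=[a-z]{1,10}\.)"  # name is 1 to 10 letters, followed by the dot
    r"(?=..z)"             # third character is z
    r"(?=[^.]*a)(?=[^.]*e)(?=[^.]*i)(?=[^.]*o)(?=[^.]*u)"  # each vowel occurs before the dot
)

for domain in ["auzeio.com", "abzeio.com", "auxeio.com", "auzeioxxxxx.com"]:
    print(domain, bool(COMBINED.search(domain)))
# auzeio.com True
# abzeio.com False       (no "u")
# auxeio.com False       (third letter is not "z")
# auzeioxxxxx.com False  (11 letters)

# The single-line form that could be pasted into a one-line input field:
print(COMBINED.pattern)
```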

    Is there a regular expression generator?

    We use a tool called RegEx Buddy (no affiliation). It's not free but does an excellent job at generating and testing regular expressions.
     
    promo, Jun 16, 2016 IP