Extracting anchor text

ferret77 Heretic

#1

Is there a simple way to extract the anchor text of links using regular expressions in PHP?

Basically, I am looking for this pattern:

"target='_new'>Anti-War Protest In Texas</a>" in a bunch of text chunks

I want to extract the "Anti-War Protest In Texas" part.

I could split the string a few times to get it, but is there a quicker way with regex?

I was looking at php.net, and there is a bunch of regex stuff that returns true or false, but that's not what I want.
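For reference, preg_match() does more than return true or false: it returns 1 on a match, 0 on no match (false on error), and its optional third argument is filled with the captured groups. A minimal sketch, assuming a chunk shaped like the one above (the href URL is a made-up placeholder):

```php
<?php
// Sample chunk shaped like the one in the question; the href is a placeholder.
$chunk = "<a href='http://example.com/story' target='_new'>Anti-War Protest In Texas</a>";

// preg_match() returns 1 on a match, 0 on no match, false on error.
// The third argument receives the matches: [0] is the whole match,
// [1] is whatever the first (...) group captured.
$found = preg_match("/target='_new'>(.*?)<\/a>/", $chunk, $matches);

echo $found, "\n";      // 1
echo $matches[1], "\n"; // Anti-War Protest In Texas
```

The `(.*?)` is a lazy group, so it stops at the first `</a>` after `target='_new'>`.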

ferret77, Aug 14, 2005

palespyder Psycho Ninja

#2

Found this; not sure if it is exactly what you are looking for:
preg_match("/<a.>(.)<\/a>/", $matchText, $temp);

palespyder, Aug 14, 2005

ferret77 Heretic

#3

Does (.) represent any number of characters? I was trying (*).

ferret77, Aug 14, 2005

palespyder Psycho Ninja

#4

Yeah, you could use (.*) to represent any number of characters.
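To see the difference: `.` matches exactly one character, and `*` is a quantifier meaning "zero or more of the preceding item", so it needs something in front of it (`(*)` on its own is not a valid pattern). `.*` is also greedy and takes as much as it can, while `.*?` takes as little as it can. A small sketch with made-up links:

```php
<?php
$html = "<a href='x.html'>Hello</a>";

// "(.)" captures exactly one character, so it cannot match "Hello".
$one = preg_match("/>(.)<\/a>/", $html);

// "(.*)" captures zero or more characters.
$many = preg_match("/>(.*)<\/a>/", $html, $m);

// With two links on one line, greedy ".*" runs to the LAST "</a>",
// while lazy ".*?" stops at the first one.
$two = "<a href='1'>One</a> <a href='2'>Two</a>";
preg_match("/>(.*)<\/a>/", $two, $greedy);
preg_match("/>(.*?)<\/a>/", $two, $lazy);

echo $one, "\n";       // 0
echo $m[1], "\n";      // Hello
echo $greedy[1], "\n"; // One</a> <a href='2'>Two
echo $lazy[1], "\n";   // One
```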

palespyder, Aug 14, 2005

ferret77 Heretic

#5

Got it:
">(.*)<\/a>"
Actually, I spoke too soon; it gives me the URL too.
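A likely reason, assuming the chunk has another tag before the link: the regex engine returns the leftmost match, so `>(.*)<\/a>` starts at the first `>` in the chunk and the capture swallows the whole `<a href=...>` tag. A negated class such as `[^<]*` cannot cross a `<`, which keeps the capture to the text alone. A sketch with a hypothetical `<li>` wrapper and placeholder URL:

```php
<?php
// Hypothetical chunk: <li> closes with ">" before the link starts.
$chunk = "<li><a href='http://example.com/' target='_new'>Anti-War Protest In Texas</a></li>";

// Leftmost match: starts at the ">" closing <li>, so the capture
// includes the entire opening <a ...> tag, URL and all.
preg_match("/>(.*)<\/a>/", $chunk, $bad);

// [^<]* cannot contain "<", so the only possible capture is the text
// between the last ">" before "</a>" and "</a>" itself.
preg_match("/>([^<]*)<\/a>/", $chunk, $good);

echo $bad[1], "\n";  // <a href='http://example.com/' target='_new'>Anti-War Protest In Texas
echo $good[1], "\n"; // Anti-War Protest In Texas
```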

ferret77, Aug 14, 2005

Gmorkster Peon

#6

try |(<a[\s]+[^>]+>)([^</a>])(</a>)|i

The \s is important; otherwise it will match <abbr> and any other XHTML tag starting with a. I didn't test it, but it should work.
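The `<abbr>` point can be checked by counting opening tags. A sketch with an invented snippet; note that requiring whitespace after `<a` also means a bare `<a>` with no attributes would be skipped:

```php
<?php
$html = '<abbr title="Texas">TX</abbr> <a href="/news">News</a>';

// "<a" followed directly by [^>]* also matches the start of "<abbr ...>".
$loose = preg_match_all('/<a[^>]*>/', $html, $looseTags);

// Requiring whitespace after "<a" skips <abbr>.
$strict = preg_match_all('/<a\s[^>]*>/', $html, $strictTags);

echo $loose, "\n";  // 2
echo $strict, "\n"; // 1
print_r($strictTags[0]);
```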

Gmorkster, Aug 14, 2005

ferret77 Heretic

#7

Is there a way to get just the link text?

Or do I have to do some sort of replace?

ferret77, Aug 14, 2005

Gmorkster Peon

#8

preg_match("|(<a[\s]+[^>]+>)([^</a>])(</a>)|i", $link, $matches);

then $matches[1] will contain your anchor

Gmorkster, Aug 14, 2005

J.D. Peon

#9

Gmorkster said:

try |(<a[\s]+[^>]+>)([^</a>])(</a>)|i

The \s is important; otherwise it will match <abbr> and any other XHTML tag starting with a. I didn't test it, but it should work.

It's not even going to match

<a href="test">abc</a>

This expression will work on all one-line anchors:

<a(?:[ \t]+[^>]*)?>([^<]+)<\/a>
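J.D.'s expression already escapes the slash in `<\/a>`, so in PHP it can be wrapped in `/` delimiters as-is. A sketch with an invented chunk, using preg_match_all() so every anchor on the line is collected (with the default PREG_PATTERN_ORDER, `$m[1]` lists every group-1 capture):

```php
<?php
$chunk = '<a href="test">abc</a> and <a>bare</a>, plus '
       . "<a href='http://example.com/' target='_new'>Anti-War Protest In Texas</a>";

// J.D.'s pattern with "/" delimiters; the optional (?:...) group
// lets it match both <a href="..."> and a bare <a>.
$pattern = '/<a(?:[ \t]+[^>]*)?>([^<]+)<\/a>/';

$count = preg_match_all($pattern, $chunk, $m);

echo $count, "\n"; // 3
print_r($m[1]);    // abc, bare, Anti-War Protest In Texas
```

Because `[^<]+` cannot contain `<`, anchor text with nested tags (for example `<a href="x"><b>bold</b></a>`) is not matched, and since `[ \t]+` excludes line breaks, an `<a` followed by a newline is skipped too.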

J.D.

J.D., Aug 14, 2005

Gmorkster Peon

#10

One char missing, sorry...

preg_match("|(<a[\s]+[^>]+>)([^</a>]+)(</a>)|i", "<a href=\"test\">foo</a>", $m);
print_r($m);

And the anchor is $m[2], not $m[1]
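Why it is `$m[2]`: `$m[0]` holds the entire match, and each `(...)` group is numbered from 1, left to right by its opening parenthesis. A sketch running the pattern above on the same `foo` link:

```php
<?php
preg_match('|(<a[\s]+[^>]+>)([^</a>]+)(</a>)|i', '<a href="test">foo</a>', $m);

// $m[0] = whole match, then one entry per capturing group.
echo $m[0], "\n"; // <a href="test">foo</a>
echo $m[1], "\n"; // <a href="test">
echo $m[2], "\n"; // foo
echo $m[3], "\n"; // </a>
```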

Gmorkster, Aug 14, 2005

J.D. Peon

#11

Gmorkster said:

One char missing, sorry...

preg_match("|(<a[\s]+[^>]+>)([^</a>]+)(</a>)|i", "<a href=\"test\">foo</a>", $m);
print_r($m);

And the anchor is $m[2], not $m[1]

There's more than a char missing in this. Go ahead and give the anchor I quoted a try (the one with abc). You clearly don't understand what square brackets or parentheses are for.

J.D.

J.D., Aug 14, 2005

Gmorkster Peon

#12

bah-- I did, just replaced "abc" with "foo"!#!@#$

|(<a[\s]+[^>]+>)([^</a>]+)(</a>)|i is: delimiter (match1) (match2) (match3) delimiter case_insensitive

- first parenthesis: match <a followed by any number of blanks (\s matches blanks and tabs), followed by any character but >
- second parenthesis: match anything but </a> (the anchor)
- third parenthesis: match </a>

Second parenthesis matches the anchor, which is $matches[2].

I believe I do understand how regex works...

Gmorkster, Aug 14, 2005

J.D. Peon

#13

Gmorkster said:

- second parenthesis: match anything but </a> (the anchor)

No. Square brackets mean "any of the listed characters" or "none of the listed characters" if used with ^. So, this [^</a>]+ says "one or more of any character except <, /, a or >".
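This is easy to confirm: because the class excludes the letter a (and, with the i flag, A as well), the pattern only works when the link text happens to avoid those characters. A sketch using the foo/abc links from the thread plus the text from the original question (the href 'x' is a placeholder):

```php
<?php
$pattern = '|(<a[\s]+[^>]+>)([^</a>]+)(</a>)|i';

// [^</a>]+ means "one or more characters that are not <, /, a or >".
$foo   = preg_match($pattern, '<a href="test">foo</a>');
$abc   = preg_match($pattern, '<a href="test">abc</a>');
$texas = preg_match($pattern, "<a href='x'>Anti-War Protest In Texas</a>");

var_dump($foo);   // int(1)
var_dump($abc);   // int(0)
var_dump($texas); // int(0)
```

The usual replacement is `[^<]+`, as in J.D.'s expression, which stops only at the next `<`.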

On top of that, why would you put parentheses around everything? What's the point of capturing </a>?
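On the parentheses: every `(...)` adds a numbered entry to the matches array whether or not you need it, while `(?:...)` groups without capturing. A sketch comparing a fully parenthesized pattern with J.D.'s, which captures only the text:

```php
<?php
$link = '<a href="test">abc</a>';

// Three capturing groups: opening tag, text, closing tag.
preg_match('/(<a[^>]*>)([^<]+)(<\/a>)/', $link, $all);

// J.D.'s pattern: (?:...) groups the attributes without capturing them.
preg_match('/<a(?:[ \t]+[^>]*)?>([^<]+)<\/a>/', $link, $textOnly);

echo count($all), " ", $all[2], "\n";           // 4 abc
echo count($textOnly), " ", $textOnly[1], "\n"; // 2 abc
```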

J.D.

J.D., Aug 14, 2005

Gmorkster Peon

#14

sheesh, got it now *blush*. Working for 15 straight hours must've gotten to me. Sorry!

Gmorkster, Aug 14, 2005