ASP.NET/C# Web Spider (in the making)

Ferbal Peon

#1

Heylo all,

Okay, it's not a web spider or anything like that *YET*. Right now it takes in a URL, creates a new
instance of WebClient, and downloads the data (the web page) into a string.
        string url = Text1.Value;
        WebClient browser = new WebClient();
        UTF8Encoding enc = new UTF8Encoding();
        string fContents = enc.GetString(browser.DownloadData(url));
        int len = fContents.Length;
        char c;
        string linkList = "";

        for (int i = 0; i < len; i++)
        {
            c = Convert.ToChar(fContents.Substring(i, 1));
            if (c == 'a')
            {
                i++;
                c = Convert.ToChar(fContents.Substring(i, 1));
                if (c == ' ')
                {
                    i++;
                    c = Convert.ToChar(fContents.Substring(i, 1));
                    if (c == 'h')
                    {
                        i = i + 6; // move our string counter to after the quotes ref="h
                        c = Convert.ToChar(fContents.Substring(i, 1));
                        while (c != '"')
                        {
                            c = Convert.ToChar(fContents.Substring(i, 1));
                            if (c == '"')
                            {
                                break;
                            }
                            linkList = linkList + c;
                            i++;
                        }
                        linkList = linkList + "\n";
                        TextArea1.Value = linkList;
                        
                    }
                }
            }
        }
        
       
    }
Code (markup):
As you can see, it starts with the entire string and goes through it character by character. On some sites it works and displays each link, but about half the time it fails with this error:

Index and length must refer to a location within the string.

The error occurs at THIS line:
c = Convert.ToChar(fContents.Substring(i, 1)); <-----------errors here
if (c == '"')
Code (markup):
Now I don't understand why this would throw an error. i is the current position (the character it's on) in the string (the web page), and 1 is the number of characters to put into my variable c.

Thanks in advance for all help, much appreciated!!

Ferbal, Jul 21, 2006 IP

Ferbal Peon

#2

Anyone have any ideas at all?

Ferbal, Jul 22, 2006 IP

benjymouse Peon

#3

You need to learn something about parsing. What you are doing is not top-down nor bottom-up parsing.

You need to divide the task. Usually compilers etc. will use a scanner to divide the input stream into symbols, so that the parser rules are free to concern themselves with grammar without having to deal with individual characters.

A scanner will also recognize whitespace correctly. Your attempt will fail on this text:

a  h

an "a", two spaces and a "h".
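
To make the whitespace point concrete, here is a minimal sketch of one way a scanner-style helper could skip any run of whitespace before looking at the next character. The names WhitespaceDemo, SkipWhitespace and IsAThenH are made up for this illustration and are not from the original code; pos is assumed to be non-negative.

```csharp
using System;

public static class WhitespaceDemo
{
    // Returns the index of the first non-whitespace character at or after start,
    // or text.Length if only whitespace remains.
    public static int SkipWhitespace(string text, int start)
    {
        int i = start;
        while (i < text.Length && char.IsWhiteSpace(text[i]))
        {
            i++;
        }
        return i;
    }

    // True when text[pos] is 'a', followed by one or more whitespace
    // characters, followed by 'h'. Every index is checked before it is used.
    public static bool IsAThenH(string text, int pos)
    {
        if (pos >= text.Length || text[pos] != 'a')
        {
            return false;
        }
        int next = SkipWhitespace(text, pos + 1);
        return next > pos + 1 && next < text.Length && text[next] == 'h';
    }
}
```

With this, "a h", "a  h" and an "a" followed by a line break and a tab before the "h" are all treated the same way, while "ah" and a lone trailing "a" are rejected without going out of range.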

benjymouse, Jul 24, 2006 IP

Ferbal Peon

#4

Parsing requires a lot more work that doesn't need to be done with what I am trying to accomplish, thanks though!

And if it finds an 'a', it then looks at the next character and if it is a space, it checks to see if the next character is an 'h'. If it is a space, it will just continue on with the loop looking for the next 'a'.

Ferbal, Jul 24, 2006 IP

Free Born John Guest

#5

Why not put in a couple of displays to show the value of i and the substring length?

Free Born John, Jul 24, 2006 IP

benjymouse Peon

#6

Ferbal said: ↑

And if it finds an 'a', it then looks at the next character and if it is a space, it checks to see if the next character is an 'h'. If it is a space, it will just continue on with the loop looking for the next 'a'.

A couple of points then.

You do not need to use the Substring method. It returns a string. If all you are interested in is going character by character, just use the string's [index] indexer. It returns the character at the position given by index (counted from 0, I believe).

Your code will *not* just continue. You advance the index beyond what you know is safe. If the string ends right after an "a" you'll have an indexing error.

At the very least you should guard the condition with a short-circuit boolean AND, like

if (i < fContents.Length && fContents[i] == 'a')
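
As a rough sketch of how the same guard could be applied to the inner loop (the place where the reported error occurs), something like the following might work. It assumes links are always written exactly as a href="..." with double quotes, which real pages do not guarantee; GuardedLinkScanner and ExtractLinks are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class GuardedLinkScanner
{
    // Collects the text between the quotes of every literal: a href="..."
    // Assumption: double-quoted attributes only, exactly one space after the 'a'.
    public static List<string> ExtractLinks(string html)
    {
        const string prefix = "a href=\"";
        var links = new List<string>();

        for (int i = 0; i < html.Length; i++)
        {
            // CompareOrdinal compares at most prefix.Length characters and
            // reports a mismatch if html ends early, so nothing is read out of range.
            if (string.CompareOrdinal(html, i, prefix, 0, prefix.Length) != 0)
            {
                continue;
            }

            int j = i + prefix.Length; // first character after the opening quote
            var link = new StringBuilder();

            // The length check comes first, so html[j] is only read when j is valid.
            while (j < html.Length && html[j] != '"')
            {
                link.Append(html[j]);
                j++;
            }

            // Keep the link only if a closing quote was actually found.
            if (j < html.Length)
            {
                links.Add(link.ToString());
            }
            i = j; // resume scanning after this attribute
        }
        return links;
    }
}
```

If the page ends in the middle of a link, or a quote is never closed, the loop simply stops at the end of the string instead of throwing.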

benjymouse, Jul 24, 2006 IP

Ferbal Peon

#7

Thanks, I did change the code to not use Substring. However, now I cannot connect to any remote website, which is kind of odd since it was not doing this before. If I somehow get it working I will see if the original error still happens. Thanks again!

-Ferbal

Ferbal, Jul 24, 2006 IP

Darrin Peon

#8

Have you looked at using regular expressions? The Regex class in C# is really good and really fast at finding patterns. The syntax is a little tricky at first, but once you get it, you can pass in a large string and it will return a collection of all the matches for your pattern.

It's very flexible and might work really well for what you are doing...
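
For what it's worth, a minimal sketch of the regex approach could look like this. The pattern is only an assumption about what typical anchor tags look like (quoted href values, single or double quotes) and will miss unquoted attributes; RegexLinkFinder and FindLinks are names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexLinkFinder
{
    // Matches "<a", whitespace, any other attributes, then href = "value" or 'value'.
    // Group 1 captures the value between the quotes.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s+[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
        RegexOptions.IgnoreCase);

    public static List<string> FindLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }
}
```

Regex.Matches returns a MatchCollection, so there is no manual index arithmetic and nothing to run past the end of the string.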

Darrin, Jul 25, 2006 IP

Ferbal Peon

#9

Yeah, I recently stumbled upon some stuff on Regex. However, I still need to be able to connect to websites first, lol. Thanks guys!

Ferbal, Jul 25, 2006 IP
