Hello, I have an excel file of over 2,000,000 links that are all from the same website that needs to be extracted for data. For example the link is; http://www.xyz.com/person_name?pub?=________ The websites are all uniform and I just need 4-5 pieces of information from each one. Than it needs to goto an excel file for organization. Al
Well I know of software called AutomationAnywhere and OutWit and some excel commands but cannot put them to use. I am sure someone on this forum knows how without having to do the tedious work of copy-paste. Nevertheless, I appreciate your comment Al
If you want to initially separate the text that is between the slashes for a start into columns then do the following. 1. Paste the URLs into the excel file 2. They should then be one column of URLs 3. Select Data ': Text to Columns 4. Select Delimited 5. Select Other as your delimiter and put in / 6. Click Next and finish This should separate out the individual text in the URLs. Not sure if this is what you wanted but from the limited information it may be. Best of luck.
[i can't post links yet, so I stripped the http from the answer but you will need it in the search and replace] if they are all made in a specific way it is supereasy to get the data in excel. things to remember: - never ever ruin your original data. copy the data you want to work on to a second column or better sheet. - makes saves often with version numbers. the bigger the files the easier they crash. - get yourself a mentor who has knowledge in slicing and dicing of the data. it is really not complicated once you get the drift If this is your link (and I assume person name and ___ is what you are looking for) - copy all of your links xyz.com/person_name?pub?=________ into a second colum - copy xyz.com/ - mark this new colum - use "search and replace" to search for xyz.com/ and replace it with nothing - select data ': text to columns, select delimited and use ? as your delimiter - search in the column with "pub?=________ " and s&r for pub?= again with nothing person name and ____ should now be in separate columns assuming that it is not as easy as you describe, have a look into the text functions to retrieve the position of a character (in your case the ?) and the 'extract part left, part right, part' functions. hth