Scraping URLs from a webpage

The_Grove · November 15, 2018, 10:17pm

I am relatively new to this, so I am probably making some basic mistake. My task is to scrape a sequence of URLs from a webpage. Once I get it working, I intend to step through all the URLs on the page and save them to a CSV file.

My initial attempt at coding this times out. What have I done wrong?

{
  "CreationDate": "2018-11-15",
  "Commands": [
    {
      "Command": "open",
      "Target": "Page Not Found",
      "Value": ""
    },
    {
      "Command": "storeAttribute",
      "Target": ".//*[@id='content']/div[2]/div/ul[2]/li[1]/a@href",
      "Value": "!csvLine"
    },
    {
      "Command": "csvSave",
      "Target": "mydata.csv",
      "Value": ""
    }
  ]
}

Timo · November 15, 2018, 11:15pm

Is this the locator for a link?

As a test, if you use the same locator (but without @href added) with Click, does it work then?
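
As a rough sketch of that test, assuming the locator from the first post is otherwise kept unchanged, the storeAttribute step would be swapped for something like the following. Only the command name changes, the trailing @href is dropped, and the Value is left empty:

```json
{
  "Command": "click",
  "Target": ".//*[@id='content']/div[2]/div/ul[2]/li[1]/a",
  "Value": ""
}
```

If this step times out as well, that would suggest the locator is not matching any element, rather than the @href part being the problem.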

The_Grove · November 16, 2018, 7:55am

Yes. As an example, on the demo page https://a9t9.com/kantu/demo/storeeval there is a list of links labelled “This link” at the end of the page. The locator is intended to pick up the URL of the first one.

The_Grove · November 16, 2018, 10:50am

Changing to “click” and removing @href also does not work. I found the path using Firepath in Firefox.