How can I get all ids from my HTML using sourceSearch or sourceExtract?

web-scraping

#1

Hello everyone.

I’m trying to get the list of ids from the HTML with this command:

{
"Command": "sourceExtract",
"Target": "regex=<span id=sp[a-zA-Z0-9]{1,20}",
"Value": "match"
}

But this command doesn’t work.

I’ve checked <span id=sp[a-zA-Z0-9]{1,20} on regex101, and it returns 5 matches like these:

<span id=sp280250338
<span id=sp280250339
<span id=sp280250340
<span id=sp280250341

I need to get only the number from each match, like 280250338, and convert it to r280250338, so that I can loop over these new ids and work with them.

Could someone help me achieve this? I don’t know whether Kantu supports splitting text or something similar for this process.

Thank you in advance.


#2

I am not a regex expert, but I know that Kantu supports “split text” :slight_smile:
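
For what it’s worth, here is a minimal sketch in plain JavaScript of the extract-and-convert step from the question: a capture group pulls out only the part after "sp", and an "r" is put in front of it. The function name toRowIds and the sample HTML string are made up for illustration (the sample is built from the matches quoted in the question), and whether or how this can be run from inside a Kantu macro is an assumption I have not verified.

```javascript
// Hypothetical helper (not part of Kantu): collect every <span id=sp...>
// id found in an HTML string and turn each one into an r-prefixed id.
function toRowIds(html) {
  // Same pattern as in the question, with parentheses added so that
  // capture group 1 holds only the part after "sp".
  const pattern = /<span id=sp([a-zA-Z0-9]{1,20})/g;
  const ids = [];
  for (const match of html.matchAll(pattern)) {
    ids.push("r" + match[1]);
  }
  return ids;
}

// Sample input assembled from the matches quoted in the question.
const sample =
  "<span id=sp280250338></span><span id=sp280250339></span>" +
  "<span id=sp280250340></span><span id=sp280250341></span>";

console.log(toRowIds(sample));
// → [ 'r280250338', 'r280250339', 'r280250340', 'r280250341' ]
```

The capture group is what avoids a separate split step: the regex still matches the whole "<span id=sp..." text, but only the group is kept.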