Problem with REGEX scraping


#1

Hey folks,

Unfortunately I have a problem with scraping.
This is the relevant part of the page source:

 `<div class="cta_text">USERNAME jetzt kennenlernen!</div>` 

I want to scrape the username, but my Target

 `REGEX=[(?<=<div class="cta_text">)(.+?)(?=jetzt kennenlernen!</div>)]` 

does not return USERNAME. Instead, it returns just a `<`.

Can you please give me a hint what I’m doing wrong?

Thanks,
drittaccount


#2

When I test this on https://regex101.com/ I get the same result.

PS: My point is that this is not an issue with Kantu’s SourceExtract command, but a general regex question. I hope someone better at regex than me can answer it :wink:


#3

Hey Ulrich,

I tried it again with regex101.com, but it just flagged the / inside the regex as an “unescaped delimiter (which) must be escaped with a backslash (\)”.
It’s not important, so I deleted it. The regex works on regex101.com, but not in Kantu.

Regards,
drittaccount


#4

Hmm… I guess the square brackets [… ] are the problem. Without them, it works fine:

{
"Command": "sourceExtract",
"Target": "regex=(?<=<div class=\"cta_text\">)(.+?)(?=jetzt kennenlernen!<\/div>)",
"Value": "ww"
},
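
For anyone wondering why the brackets matter: inside [ ] everything is read as one character class, so the engine just matches the first character of the page that appears anywhere in that set, which here is the opening `<`. Below is a minimal sketch that reproduces this. It assumes Python's `re` module as a stand-in (Kantu itself runs JavaScript regex, but this particular pattern should behave the same way in both), and the variable names are only for illustration.

```python
import re

# The snippet from post #1.
html = '<div class="cta_text">USERNAME jetzt kennenlernen!</div>'

# Wrapped in [...]: the whole expression is a single character class,
# so the first character of the source that is in the set matches.
bracketed = r'[(?<=<div class="cta_text">)(.+?)(?=jetzt kennenlernen!</div>)]'

# Without the brackets: lookbehind, lazy capture group, lookahead.
plain = r'(?<=<div class="cta_text">)(.+?)(?=jetzt kennenlernen!</div>)'

# Same pattern with the slash escaped, as regex101 asks for.
escaped = r'(?<=<div class="cta_text">)(.+?)(?=jetzt kennenlernen!<\/div>)'

bracketed_result = re.search(bracketed, html).group(0)
plain_result = re.search(plain, html).group(1)
escaped_result = re.search(escaped, html).group(1)

print(repr(bracketed_result))  # '<'
print(repr(plain_result))      # 'USERNAME '
print(repr(escaped_result))    # 'USERNAME '
```

Two small notes: escaping the slash does not change what is matched, it only keeps regex101 happy because / is its delimiter. And the capture includes the trailing space before "jetzt", so either put a space at the start of the lookahead or trim the result afterwards.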
