QUERY: Can a RPA macro capture website info (table scraping)?

Hi beautiful Kantu community,

Wonder if possible to use Kantu to capture the IBAN info from
https://www.xe.com/ibancalculator/countrylist/

Currently,
There cookies hyperlinks, and each dataset are in different page in a table.
each country IBAN data is formulated in a table per page somewhat like below:

|IBAN|AL47 2121 1009 0000 0002 3569 8741|
|ISO Country Code|AL (Albania)|
|IBAN Check Digits|47|
|BBAN|2121 1009 0000 0002 3569 8741|
|Bank Identifier|212|
|Branch Identifier|1100|
|Account Number|0000000235698741|
|BBAN Check Digit(s)|9|
|SEPA Member|No|

Possible to do the following with Kantu?
(Q1) extract the information from each page using Kantu Macro

specifically in a STRING format of:
{“Switzerland IBAN Format Example”,“CH93 0076 2011 6238 5295 7”,“Bank Identifier”,“00762”,“Account Number”,“011623852957”,“SEPA Member”,“Yes”}

(Q2) Possible to use Kantu macro to save such information into an text file such as .csv?

(Q3) How to do so? Any reference to any such tutorial most appreciated.

P.S: Sorry still figuring out how to work with Kantu. A lot of trial and error seem not to work on my end. Cant wrap my head around the syntax of Kantu yet. Wonder if possible to do coding/scripting in Kantu like javascript? I tried to look for similar javascript looping logic/syntax/coding but seem not available in Kantu.

Short answer: Yes, Kantu can do that. For web scraping the storeText, storeValue and sourceExtract commands are very useful.

How to do it:

  1. Record a macro, and click on the data that you want to save. This creates the right locator to find the data. Then change click to storeText.

  1. If you want to save to a CSV file, use the special internal variable !csvLine with storeText. Everything that gets assigned to !csvLine will get saved as new column to a CSV file, once csvSave is called.

  1. That is all! Now, if you want to save the csv file to disk, too, then call localStorageExport ibandata.csv

What is missing in this flow is the logic that navigates to each iban data page. To create that part of the macro you can click on few of the country flags and will see that their locator is as follows:

//*[@id="country-list"]/li[1]/a/img (first flag)
//*[@id="country-list"]/li[2]/a/img (2nd flag)
//*[@id="country-list"]/li[3]/a/img (3rd flag)

=> so we replace the 1,2,3… with the !loop variable:

//*[@id=“country-list”]/li[${!loop}]/a/img

Done :slight_smile: . Here is a screencast of the macro running. Note that is is started via the Loop button:

Web scraping with storeText

And here is the macro:

{
  "CreationDate": "2018-7-4",
  "Commands": [
    {
      "Command": "open",
      "Target": "https://www.xe.com/ibancalculator/countrylist/",
      "Value": ""
    },
    {
      "Command": "clickAndWait",
      "Target": "//*[@id=\"country-list\"]/li[${!loop}]/a/img",
      "Value": ""
    },
    {
      "Command": "storeText",
      "Target": "//*[@id=\"contentL\"]/div[1]/table/tbody/tr[1]/th[2]",
      "Value": "!csvLine"
    },
    {
      "Command": "storeText",
      "Target": "//*[@id=\"contentL\"]/div[1]/table/tbody/tr[2]/td[2]",
      "Value": "!csvLine"
    },
    {
      "Command": "storeText",
      "Target": "//*[@id=\"contentL\"]/div[1]/table/tbody/tr[3]/td[2]",
      "Value": "!csvLine"
    },
    {
      "Command": "csvSave",
      "Target": "ibandata",
      "Value": ""
    }
  ]
}
3 Likes

Pardon the late reply.

Am impressed that the LOC is way shorter than I expect.

Was also figuring out how to work on the breadcrumb
~(I know there is a “List All” but am learning so also figuring out how to work stepping thru the breadcrumb.)

{
  "Command": "click",
  "Target": "link=**_A_**", <== Is there any way to stepthru a list of breadcrumb? Like array?
  "Value": ""
}

P.S: Very Very Thankful for the wonderful detailed and easy to understand tutorial :slight_smile: The support is outstanding I am very impressed.

This is a good question. So I created a short demo macro. Here I used while/endWhile, but if you use the !loop feature (like in the solution above) this is not needed.

The key part to read from an array is:

Create array and assign to a variable:
{
“Command”: “storeEval”,
“Target”: “new Array (‘A’,‘B’,‘C’,‘D’,‘E’)”,
“Value”: “names”
},

Access a value of the array with storedVars:

{
  "Command": "storeEval",
  "Target": "storedVars['names'][${num}]",
  "Value": "letter"
},

Complete demo macro (screencast of it here):

The macro takes a screenshot of each page (just to do something), but of course you can replace or enhance this part this with the web scraping code from above.

{
  "CreationDate": "2018-7-8",
  "Commands": [
    {
      "Command": "open",
      "Target": "https://www.xe.com/ibancalculator/countrylist/",
      "Value": ""
    },
    {
      "Command": "store",
      "Target": "fast",
      "Value": "!replayspeed"
    },
    {
      "Command": "comment",
      "Target": "Fill array and calculate its length",
      "Value": "!replayspeed"
    },
    {
      "Command": "storeEval",
      "Target": "new Array ('A','B','C','D','E')",
      "Value": "names"
    },
    {
      "Command": "storeEval",
      "Target": "storedVars['names'].length",
      "Value": "length"
    },
    {
      "Command": "echo",
      "Target": "array length = ${length}",
      "Value": "pink"
    },
    {
      "Command": "comment",
      "Target": "Arrays start with 0 (not 1)",
      "Value": "num"
    },
    {
      "Command": "store",
      "Target": "0",
      "Value": "num"
    },
    {
      "Command": "while",
      "Target": "${num} < ${length}",
      "Value": "num"
    },
    {
      "Command": "storeEval",
      "Target": "storedVars['names'][${num}]",
      "Value": "letter"
    },
    {
      "Command": "echo",
      "Target": "letter=${letter}",
      "Value": "blue"
    },
    {
      "Command": "click",
      "Target": "link=${letter}",
      "Value": ""
    },
    {
      "Command": "captureScreenshot",
      "Target": "page_${letter}",
      "Value": ""
    },
    {
      "Command": "storeEval",
      "Target": "${num}+1",
      "Value": "num"
    },
    {
      "Command": "endWhile",
      "Target": "",
      "Value": ""
    },
    {
      "Command": "echo",
      "Target": "done with ${num} while loops",
      "Value": "green"
    }
  ]
}

Hi again,

Thankful for the wonderful 2nd guide. Aid me a lot in understanding Kantu and is working on digesting it.

I ran the (original) macro code ( provided in the 1st wonderful guide) a few times and would like to share some observations,

Out of 12 runs,
9 runs stopped without looping (logs details below)
2 run looped 5 times (5 lines @ CSV) on the 1st entry (Albania) only
1 run looped 9 times but with mixed results (not in order and some CSV lines are repeated)

Wonder if it is my browser config? or some settings I had missed?
I initially assumed my connection had issues but is able to run Youtube videos smoothly.
I had closed all other software and apps before running the macro script.
I am using Chrome Version 67.0.3396.99 (Official Build) (64-bit)

Any advice most appreciated. Had tried tweaking it but seems to only made it worse >_<

======================================================
P.S: The logs for the most common single top-down macro run as below :

[status]
Playing macro TEST_AUTOMATION1
[info]
Executing: | open | IBAN Country List - Free IBAN Checker by Country - Xe | |
[info]
Executing: | clickAndWait | //[@id=“country-list”]/li[${!loop}]/a/img | |
[info]
Executing: | storeText | //
[@id=“contentL”]/div[1]/table/tbody/tr[1]/th[2] | !csvLine |
[info]
Executing: | storeText | //[@id=“contentL”]/div[1]/table/tbody/tr[2]/td[2] | !csvLine |
[info]
Executing: | storeText | //
[@id=“contentL”]/div[1]/table/tbody/tr[3]/td[2] | !csvLine |
[info]
Executing: | csvSave | ibandata | |
[info]
Executing: | open | IBAN Country List - IBAN Format Example for Country - Xe | |
[info]
Macro completed (Runtime 13.02s)

As my tried, the !csvLine will make the content in one line in my case, are there any way to sort it to be each line with storeText?

My example code

{
“CreationDate”: “2018-8-5”,
“Commands”: [
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[3]/div[2]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[3]/div[4]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[3]/div[6]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[4]/div[2]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[4]/div[4]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[4]/div[6]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[5]/div[2]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[5]/div[4]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[5]/div[6]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[6]/div[2]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[6]/div[4]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[6]/div[6]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[7]/div[2]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[7]/div[4]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[7]/div[6]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[8]/div[2]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[8]/div[4]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[8]/div[6]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[9]/div[2]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[9]/div[4]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[9]/div[6]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[10]/div[2]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[10]/div[4]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[10]/div[6]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[11]/div[2]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[11]/div[4]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[11]/div[6]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[12]/div[2]”,
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: “//[@id="twofactor_printable"]/div[12]/div[4]",
“Value”: “!csvLine”
},
{
“Command”: “storeText”,
“Target”: "//
[@id="twofactor_printable"]/div[12]/div[6]”,
“Value”: “!csvLine”
},
{
“Command”: “csvSave”,
“Target”: “steam62”,
“Value”: “”
}
]
}

Screenshot:

The content

The result

Monosnap

My wish

Monosnap

Thank you so much for your detail, it’s help me a lot to do something I need.

P/S: Sorry, I can’t upload more than one image, so I need store image in third party service.

Hi, so what you currently get is

AAA,BBB,CCC,…

but you want is this:

AAA
BBB
CCC

The solution is to add a csvSave after each storeText command, and not only at the end of the macro, each csvSave saves the content of !csvLine to the file, resets !csvLine to “”, and then starts a new line in the CSV file:

{
“Command”: “storeText”,
“Target”: "//[@id=“twofactor_printable”]/div[12]/div[6]”,
“Value”: “!csvLine”
},
{
“Command”: “csvSave”,
“Target”: “steam62”,
“Value”: “”
},
{
“Command”: “storeText”,
“Target”: "//[@id=“twofactor_printable”]/div[12]/div[8]”,
“Value”: “!csvLine”
},
{
“Command”: “csvSave”,
“Target”: “steam62”,
“Value”: “”
}
2 Likes

Thank you so much for your help :). It’s work well for me. Also, if it not to much, could you help me this question too?

I will get this format when run macro one time:

AAA
BBB
CCC

And it will like this when run macro second time:

AAA
BBB
CCC
AAA’
BBB’
CCC’

if it’s possible, are there any way to reset the csv file before run the macros again? So I will get:

AAA’
BBB’
CCC’

In my second time?

We will have a option to delete CSV files from within the macro soon. Currently you can only delete it manually.

Workaround: You can give the file a new name for each run - for example append the current time or the !loop number to it:

csvSave | myfile_${!loop}

1 Like

Thank you so much for your patience and great support :slight_smile:.

it’s computer science and science is reproducible

I have reproduced the instructions here:

question for devs:
what is the functional difference between
-Record click
-Record clickAt

Note to developers:

  • please include some kind of code check for missing brackets!
  • please put sort option for headers e.g Name, Size, Last Modified, Action
  • please record videos of ACTUALLY creating the macros not just the results!
  • please give users way to delete more than one CSV at a time
  • please give users way to delete more than one VISUAL at a time
  • do not overwrite existing CSV! append date or number to separate new csv from old

note to other users:
pay attention to brackets and braces!

Looking forward to open office hours with support/devs a la
https://data-miner.io/free-office-hours

thank you,
Dennis

1 Like

The solution with does not work stable with version 5.0.5.
It returns in many cases < * [error] e is undefined> where means it is an undefined error.
Using returns the same.
Using <executeScript_Sandbox> returns the no value.

Example:

  • [info]
    Executing: | storeEval | new Array (‘1’,‘2’,‘3’,‘4’,‘5’,‘6’,‘7’,‘8’,‘9’,‘10’) | var_column_array_01 |

  • [info]
    Executing: | echo | ${var_column_array_01} | |

  • [echo]
    1,2,3,4,5,6,7,8,9,10

Update:
It seems that the command does not work with the sequence which contains .

  • [info]
    Executing: | executeScript | storedVars[‘var_column_array_01’].length | length |

  • [error]
    Error in executeScript code: storedVars is not defined

Solution:
Replace the <<storedVars witn <<return using the command <executeScript_Sandbox> like this

  • Command": "executeScript_Sandbox
  • Target": "return new Array (‘1’,‘2’,‘3’,‘4’,‘5’,‘6’,‘7’,‘8’,‘9’,‘10’)
  • Value": "var_column_array_01
    and
  • Command": "executeScript_Sandbox
  • Target": "return ${var_column_array_01}.length
  • Value": "length