Scraper extractor types explained

Created by Kai Sasaki, Modified on Mon, 21 Nov, 2022 at 2:08 AM by Kai Sasaki

We support the following extractor types:


Text

Select this extractor type if you want to extract the text content of the element matched by your selector.


For example, if your target is:

<h1>This is the page title</h1>

We will extract: 

{
  "page_title": "This is the page title"
}
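
To make the behaviour concrete, here is a minimal Python sketch of what a text extraction roughly amounts to. It is an illustration only, not how the scraper is implemented: the helper name extract_text, the wrapping <body> element, and the use of the standard library's ElementTree (which assumes well-formed markup) are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the page snippet is well-formed enough to parse as XML.
page = ET.fromstring("<body><h1>This is the page title</h1></body>")


def extract_text(root, path):
    """Hypothetical helper: text of the first element matching `path`."""
    element = root.find(path)
    if element is None:
        return None
    # itertext() walks the element and its children, so nested markup
    # such as <h1>This is <em>the</em> title</h1> is flattened to text.
    return "".join(element.itertext()).strip()


result = {"page_title": extract_text(page, ".//h1")}
print(result)
```

Only the text survives; the surrounding <h1> tags are dropped.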


HTML

Select this extractor type if you want to extract the HTML markup of the matched element along with its text content.


For example, if your target is:

<h1>This is the page title</h1>

We will extract:

{
  "page_title_html": "<h1>This is the page title</h1>"
}
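
The difference from the Text type can be sketched in Python as follows. This is a rough illustration under assumptions, not the scraper's implementation: extract_html is a hypothetical helper, and ElementTree from the standard library only handles well-formed markup.

```python
import xml.etree.ElementTree as ET

# Assumption: the page snippet is well-formed enough to parse as XML.
page = ET.fromstring("<body><h1>This is the page title</h1></body>")


def extract_html(root, path):
    """Hypothetical helper: serialized markup of the first match."""
    element = root.find(path)
    if element is None:
        return None
    # encoding="unicode" makes tostring() return a str instead of bytes.
    return ET.tostring(element, encoding="unicode")


result = {"page_title_html": extract_html(page, ".//h1")}
print(result)
```

Here the opening and closing tags are kept in the extracted value.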


Attribute

Select this extractor type if you want to get the content from an HTML attribute.


For example, if your target is the attribute href:

<a href="https://google.com">click here</a>

We will extract "https://google.com" and assign it to the variable name you have defined.

{
  "link": "https://google.com"
}

You can use any HTML attribute. Some common examples are: href, id, class, src, alt, style, and type.
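
As a rough illustration, reading an attribute could look like this in Python. The helper name extract_attribute and the wrapping <body> element are assumptions for the example, and the standard library's ElementTree is used only because the snippet is well-formed; the scraper itself may work differently.

```python
import xml.etree.ElementTree as ET

# Assumption: the page snippet is well-formed enough to parse as XML.
page = ET.fromstring('<body><a href="https://google.com">click here</a></body>')


def extract_attribute(root, path, attribute):
    """Hypothetical helper: value of `attribute` on the first match."""
    element = root.find(path)
    if element is None:
        return None
    # get() returns None when the element has no such attribute.
    return element.get(attribute)


result = {"link": extract_attribute(page, ".//a", "href")}
print(result)
```

The link text ("click here") is ignored; only the attribute value is returned.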


Collection

This option is useful for extracting groups of related data together. A collection can have sub-selectors, which group the extracted values under a nested key. It works well for repeated elements such as cards and table rows.

You could extract something like this:

<div>
  <div class="coin">
    <h3 class="name">Botcoin</h3>
    <p class="price">$19</p>
  </div>
  <div class="coin">
    <h3 class="name">Etirum</h3>
    <p class="price">$17</p>
  </div>
</div>

Into this:

{
  "coins": [
    {
      "name": "Botcoin",
      "price": "$19"
    },
    {
      "name": "Etirum",
      "price": "$17"
    }
  ]
}
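
The mapping above can be sketched in Python like this. It is a minimal illustration under assumptions, not the scraper's implementation: extract_collection and its parameters are hypothetical names, and the ElementTree paths used here match the class attribute exactly, which is stricter than a CSS class selector.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<div>
  <div class="coin">
    <h3 class="name">Botcoin</h3>
    <p class="price">$19</p>
  </div>
  <div class="coin">
    <h3 class="name">Etirum</h3>
    <p class="price">$17</p>
  </div>
</div>
"""

# Assumption: the markup is well-formed enough to parse as XML.
root = ET.fromstring(SOURCE.strip())


def extract_collection(root, item_path, fields):
    """Hypothetical helper: one dict per item, one key per sub-selector."""
    items = []
    for item in root.findall(item_path):
        record = {}
        for key, sub_path in fields.items():
            match = item.find(sub_path)  # searched inside this item only
            record[key] = None if match is None else "".join(match.itertext()).strip()
        items.append(record)
    return items


result = {
    "coins": extract_collection(
        root,
        ".//div[@class='coin']",
        {"name": ".//*[@class='name']", "price": ".//*[@class='price']"},
    )
}
print(result)
```

Each element matched by the collection selector becomes one object in the list, and each sub-selector is evaluated inside that element, so values from different cards never mix.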



If you feel like we are missing something, please let us know and we will add it as soon as possible.
