Scraper extractor types explained

Created by Kai Sasaki, Modified on Mon, 21 Nov, 2022 at 2:08 AM by Kai Sasaki

We support the following extractor types:

Text

Select this extractor type if you want to extract the text content of the element matched by a selector.

For example, if your target is:

<h1>This is the page title</h1>

We will extract the following, using the variable name you have defined (here, page_title) as the key:

{
  "page_title": "This is the page title"
}
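This article does not describe how the scraper works internally. As a rough sketch only, the Text extractor's behavior can be approximated with Python's built-in html.parser module, assuming the selector is just a tag name such as "h1" and only the first match is used. The class TextExtractor and the function extract_text are hypothetical names for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text inside the first element with a given tag name.

    Assumption: the "selector" is a plain tag name; the real scraper
    accepts richer selectors.
    """

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0       # > 0 while we are inside the target element
        self.found = False   # True once the first match has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.found:
            self.depth += 1  # also counts nested elements with the same tag

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.found = True  # ignore any later matches

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def extract_text(html, tag):
    parser = TextExtractor(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


print({"page_title": extract_text("<h1>This is the page title</h1>", "h1")})
```

Text inside child elements (for example a <b> inside the <h1>) is included, while the tags themselves are dropped.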

HTML

Select this extractor type if you want to extract the text content together with its HTML markup, including the element's own tags.

For example, if your target is:

<h1>This is the page title</h1>

We will extract:

{
  "page_title_html": "<h1>This is the page title</h1>"
}
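As a hedged illustration, not the scraper's actual implementation, the HTML extractor can be approximated by locating the first matching element with Python's built-in html.parser and slicing the original source from its opening tag to the end of its closing tag. The sketch assumes the selector is a plain tag name; OuterHTMLExtractor and extract_html are hypothetical names.

```python
from html.parser import HTMLParser


class OuterHTMLExtractor(HTMLParser):
    """Records where the first element with a given tag name starts and ends."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0
        self.start = None  # (line, column) of the opening tag
        self.end = None    # (line, column) of the matching closing tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and self.end is None:
            if self.depth == 0:
                self.start = self.getpos()
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.end = self.getpos()


def _absolute(html, pos):
    """Convert getpos()'s (1-based line, 0-based column) into a string index."""
    line, col = pos
    index = 0
    for _ in range(line - 1):
        index = html.index("\n", index) + 1
    return index + col


def extract_html(html, tag):
    parser = OuterHTMLExtractor(tag)
    parser.feed(html)
    parser.close()
    if parser.start is None or parser.end is None:
        return None
    start = _absolute(html, parser.start)
    # The end position points at "</tag"; include everything up to its ">".
    end = html.index(">", _absolute(html, parser.end)) + 1
    return html[start:end]


print({"page_title_html": extract_html("<h1>This is the page title</h1>", "h1")})
```

Slicing the original string, rather than rebuilding the markup from parser events, keeps attributes, entities, and spacing exactly as they appear in the page.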

Attribute

Select this extractor type if you want to get the value of an HTML attribute.

For example, if your target is the href attribute of this link:

<a href="https://google.com">click here</a>

We will extract "https://google.com" and assign it to the variable name you have defined.

{
  "link": "https://google.com"
}

You can use any HTML attribute. Some common examples are: href, id, class, src, alt, style, and type.
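As a minimal sketch, and not the scraper's own code, attribute extraction can be approximated with Python's built-in html.parser, assuming the selector is a plain tag name. The hypothetical extract_attribute function returns the value from the first matching element that has the requested attribute.

```python
from html.parser import HTMLParser


class AttributeExtractor(HTMLParser):
    def __init__(self, tag, attribute):
        super().__init__()
        self.tag = tag
        self.attribute = attribute
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and self.value is None:
            # attrs is a list of (name, value) pairs with lowercased names
            for name, value in attrs:
                if name == self.attribute:
                    self.value = value
                    break


def extract_attribute(html, tag, attribute):
    parser = AttributeExtractor(tag, attribute)
    parser.feed(html)
    parser.close()
    return parser.value


print({"link": extract_attribute('<a href="https://google.com">click here</a>', "a", "href")})
```

The same function works for any attribute name, such as class or src. Note that the parser lowercases attribute names, so the name you pass in should be lowercase.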

Collection

Select this extractor type if you want to extract groups of related data together. A collection can have sub-selectors that group information under a nested key, which makes it useful for repeated elements such as cards and table rows.

You could extract something like this:

<div>
  <div class="coin">
    <h3 class="name">Botcoin</h3>
    <p class="price">$19</p>
  </div>
  <div class="coin">
    <h3 class="name">Etirum</h3>
    <p class="price">$17</p>
  </div>
</div>

Into this:

{
  "coins": [
    {
      "name": "Botcoin",
      "price": "$19"
    },
    {
      "name": "Etirum",
      "price": "$17"
    }
  ]
}
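The following is a rough sketch of the Collection behavior using Python's built-in html.parser; it is not the scraper's implementation. It assumes selectors are plain class names (the container class "coin" and sub-selector classes "name" and "price") and that containers are not nested inside each other. CollectionExtractor and extract_collection are hypothetical names.

```python
from html.parser import HTMLParser


class CollectionExtractor(HTMLParser):
    """Builds one dict per container element, keyed by sub-selector."""

    def __init__(self, container_class, fields):
        super().__init__()
        self.container_class = container_class
        self.fields = fields          # output key -> class name of sub-element
        self.items = []
        self.container_tag = None     # tag name of the open container, if any
        self.container_depth = 0
        self.field_key = None         # key of the sub-element being read
        self.field_tag = None
        self.field_depth = 0

    @staticmethod
    def _classes(attrs):
        for name, value in attrs:
            if name == "class" and value:
                return value.split()
        return []

    def handle_starttag(self, tag, attrs):
        classes = self._classes(attrs)
        if self.container_tag is None:
            if self.container_class in classes:
                self.container_tag, self.container_depth = tag, 1
                self.items.append({})  # start a new group
            return
        if tag == self.container_tag:
            self.container_depth += 1
        if self.field_key is None:
            for key, cls in self.fields.items():
                if cls in classes:
                    self.field_key, self.field_tag, self.field_depth = key, tag, 1
                    self.items[-1][key] = ""
                    break
        elif tag == self.field_tag:
            self.field_depth += 1

    def handle_endtag(self, tag):
        if self.field_key is not None and tag == self.field_tag:
            self.field_depth -= 1
            if self.field_depth == 0:
                self.items[-1][self.field_key] = self.items[-1][self.field_key].strip()
                self.field_key = None
        if self.container_tag is not None and tag == self.container_tag:
            self.container_depth -= 1
            if self.container_depth == 0:
                self.container_tag = None  # group finished

    def handle_data(self, data):
        if self.field_key is not None:
            self.items[-1][self.field_key] += data


def extract_collection(html, container_class, fields):
    parser = CollectionExtractor(container_class, fields)
    parser.feed(html)
    parser.close()
    return parser.items


page = """<div>
  <div class="coin">
    <h3 class="name">Botcoin</h3>
    <p class="price">$19</p>
  </div>
  <div class="coin">
    <h3 class="name">Etirum</h3>
    <p class="price">$17</p>
  </div>
</div>"""

print({"coins": extract_collection(page, "coin", {"name": "name", "price": "price"})})
```

Each container starts a new dictionary, and each sub-selector fills one key inside it, which is what produces the nested list under "coins".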

If you think we are missing an extractor type, please let us know and we will add it as soon as possible.