Web Scraping

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites.

While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler.

Scraping a web page involves two steps: fetching the page and then extracting data from it.

Fetching is downloading a page, which is what a browser does whenever you view one. Web crawling is therefore a main component of web scraping: it fetches the pages for later processing.
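As a rough sketch of the crawling side, the helper below decides which links found on a fetched page should be downloaded next. The function name `nextUrlsToFetch`, the `seen` set, and the example.com URLs are assumptions made for illustration, not part of any library. It uses only Node's built-in `URL` class and assumes the `href` values have already been extracted from the page.

```javascript
// Hypothetical helper: pick the links on a page that should be fetched next.
// `pageUrl` is the page the hrefs came from.
// `seen` is a Set of absolute URLs that were already fetched or queued.
function nextUrlsToFetch(pageUrl, hrefs, seen) {
  const origin = new URL(pageUrl).origin;
  const queue = [];

  for (const href of hrefs) {
    let absolute;
    try {
      // Resolve relative links such as "/about" against the current page.
      absolute = new URL(href, pageUrl);
    } catch (err) {
      continue; // skip hrefs that cannot be parsed as URLs
    }

    // "#section" fragments point at the same document, so drop them.
    absolute.hash = '';
    const key = absolute.href;

    // Stay on the same site and never queue a page twice.
    if (absolute.origin === origin && !seen.has(key)) {
      seen.add(key);
      queue.push(key);
    }
  }

  return queue;
}

// Example usage with made-up links.
const seen = new Set(['https://example.com/']);
const next = nextUrlsToFetch(
  'https://example.com/',
  ['/about', 'about#team', 'https://other.example/x', '/'],
  seen
);
console.log(next);
```

Keeping this bookkeeping separate from the download step means the same function works no matter which HTTP client does the fetching.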

In my view, web scraping is a great tool for automating tasks that you would otherwise perform by hand through a web UI.

Instead of manually clicking buttons and submitting forms, you write a web scraping script to do the job.

Web scraping with Node.js

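You can use tools like Cheerio and Request to do web scraping with Node.js. Note that the request package has been deprecated since February 2020, so for new projects you may prefer another HTTP client.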

Here is a very simple script for web scraping:

const request = require('request');
const cheerio = require('cheerio');

const URL = 'https://api-university.com/';

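request(URL, (error, response, html) => {
  // Bail out on network errors or non-200 responses instead of failing silently
  if (error || response.statusCode !== 200) {
    console.error('Request failed:', error || `HTTP ${response.statusCode}`);
    return;
  }

  // Load the HTML into cheerio so it can be queried with jQuery-like selectors
  const $ = cheerio.load(html);

  // Print the text content of every element with the "sub-menu" class
  const submenu = $('.sub-menu');
  console.log(submenu.text());
});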

Essentially, I am passing all of the HTML returned by GET https://api-university.com/ to cheerio's load function.

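Now I can use jQuery-like syntax to find elements in the DOM, read their text and attributes, and more. Keep in mind that Cheerio only parses static markup. It does not run a browser, so it cannot click buttons or execute the page's JavaScript; for that you need a browser automation tool.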

You can use other tools, but keep in mind that web scraping is a good way to automate tasks and to get data from websites programmatically.

If you like this post, please check out the Web Scraping Repo for more updates to the scripts, and follow me at jbelmont on GitHub.
