Ecommerce Web Page Scraping Using the BeautifulSoup and Selenium Libraries


HTML is almost intuitive. CSS is a great advancement that cleanly separates the structure of a page from its look and feel. JavaScript adds some pizazz. That's the theory. The real world is a little different.

In this article, you'll see how the content you see in the browser actually gets rendered and how to go about scraping it when necessary. In particular, you'll see how to count Disqus comments. Our tools will be Python and awesome packages like requests, BeautifulSoup, and Selenium.

When Should You Use Web Scraping?


Web scraping is the practice of automatically fetching the content of web pages designed for interaction with human users, parsing them, and extracting some information (possibly navigating links to other pages). It is sometimes necessary when there is no other way to extract the information you need. Ideally, the application provides a dedicated API for accessing its data programmatically. There are several reasons web scraping should be your last resort:

  • It is fragile (the web pages you're scraping might change frequently).
  • It might be forbidden (some web apps have policies against scraping).
  • It might be slow and expensive (if you need to fetch and wade through a lot of noise).

Static Scraping vs. Dynamic Scraping


Static scraping ignores JavaScript. It fetches web pages from the server without the help of a browser. You get exactly what you see in "view page source", and then you slice and dice it. If the content you're looking for is available, you need to go no further. However, if the content is something like the Disqus comments iframe, you need dynamic scraping.
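Here is a minimal sketch of static scraping with BeautifulSoup. To keep it self-contained, it parses an inline HTML snippet with invented product markup; in practice you would fetch the page first, e.g. with `requests.get(url).text`:

```python
from bs4 import BeautifulSoup

# In a real scraper you would fetch the page, e.g.:
#   import requests
#   html = requests.get("https://example.com/products").text
# The snippet below stands in for that response (illustrative markup).
html = """
<html><body>
  <h1>Products</h1>
  <div class="product"><span class="name">Widget</span><span class="price">$9.99</span></div>
  <div class="product"><span class="name">Gadget</span><span class="price">$19.99</span></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Slice and dice: select each product block, then pull out its fields.
products = [
    (p.select_one(".name").text, p.select_one(".price").text)
    for p in soup.select("div.product")
]
print(products)
```

This works for any content present in "view page source"; the moment the data is injected by JavaScript after load, the parsed tree simply won't contain it.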

Dynamic scraping uses an actual browser (or a headless browser) and lets JavaScript do its thing. Then, it queries the DOM to extract the content it's looking for. Sometimes you need to automate the browser by simulating a user to get the content you need.




Please reach out to me on LinkedIn with any queries.

Thanks for reading!