

Scraping Infinite Scroll Pages with a "Load More" Button: A Step-by-Step Guide

Jan 13, 2025, 06:09 PM

Are your scrapers stuck when trying to load data from dynamic web pages? Are you frustrated with infinite scrolls or those pesky "Load more" buttons?

You're not alone. Many websites today implement these designs to improve user experience—but they can be challenging for web scrapers.

This tutorial will guide you through a beginner-friendly walkthrough for scraping a demo page with a Load More button. Here’s what the target web page looks like:

Demo web page for scraping

By the end, you'll learn how to:

  • Set up Selenium for web scraping.
  • Automate the "Load more" button interaction.
  • Extract product data such as names, prices, and links.

Let's dive in!

Step 1: Prerequisites

Before diving in, ensure the following prerequisites:

  • Python Installed: Download and install the latest Python version from python.org, including pip during setup.
  • Basic Knowledge: Familiarity with web scraping concepts, Python programming, and working with libraries such as requests, BeautifulSoup, and Selenium.

Libraries Required:

  • Requests: For sending HTTP requests.
  • BeautifulSoup: For parsing the HTML content.
  • Selenium: For simulating user interactions like button clicks in a browser.

You can install these libraries using the following command in your terminal:

pip install requests beautifulsoup4 selenium

Before using Selenium, you must install a web driver matching your browser. For this tutorial, we'll use Google Chrome and ChromeDriver. However, you can follow similar steps for other browsers like Firefox or Edge.

Install the Web Driver

  1. Check your browser version:
    Open Google Chrome and navigate to Help > About Google Chrome from the three-dot menu to find your Chrome version.

  2. Download ChromeDriver:
    Visit the ChromeDriver download page and download the driver version that matches your Chrome version.

  3. Add ChromeDriver to your system PATH:
    Extract the downloaded file and place it in a directory like /usr/local/bin (Mac/Linux) or C:\Windows\System32 (Windows).
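Before involving Selenium at all, you can confirm the driver is actually visible on your PATH with a quick standard-library check (a small sketch; `chromedriver` is the usual executable name on Mac/Linux — adjust it if yours differs, e.g. `chromedriver.exe` on Windows):

```python
import shutil

def driver_on_path(name="chromedriver"):
    # shutil.which resolves the name against PATH, the same way the shell would
    return shutil.which(name)

path = driver_on_path()
print(f"chromedriver found at: {path}" if path else "chromedriver not found on PATH")
```

If this prints "not found", Selenium will raise an error when it tries to launch the browser, so fix the PATH entry first.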

Verify Installation

Initialize a Python file scraper.py in your project directory and test that everything is set up correctly by running the following code snippet:

from selenium import webdriver
driver = webdriver.Chrome() # Ensure ChromeDriver is installed and in PATH
driver.get("https://www.scrapingcourse.com/button-click")
print(driver.title)
driver.quit()

You can run the file with the following command in your terminal:

python scraper.py

If the above code runs without errors, it will spin up a browser interface and open the demo page URL as shown below:

Demo Page in Selenium Browser Instance

Selenium will then extract the HTML and print the page title. You will see output like this:

Load More Button Challenge to Learn Web Scraping - ScrapingCourse.com

This verifies that Selenium is ready to use. With all requirements installed and ready to use, you can start accessing the demo page's content.

Step 2: Get Access to the Content

The first step is to fetch the page's initial content, which gives you a baseline snapshot of the page's HTML. This will help you verify connectivity and ensure a valid starting point for the scraping process.

You will retrieve the HTML content of the page URL by sending a GET request using the Requests library in Python. Here's the code:

import requests
# URL of the demo page with products
url = "https://www.scrapingcourse.com/button-click"
# Send a GET request to the URL
response = requests.get(url)
# Check if the request was successful
if response.status_code == 200:
    html_content = response.text
    print(html_content) # Optional: Preview the HTML
else:
    print(f"Failed to retrieve content: {response.status_code}")

The above code will output the raw HTML containing the data for the first 12 products.

This quick preview of the HTML ensures that the request was successful and that you're working with valid data.
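If you want to sanity-check that baseline snapshot without pulling in BeautifulSoup yet, the standard library's html.parser can count the product containers for you. A minimal sketch (fed a tiny inline sample here instead of the real response, so the class name `product-item` is taken from the page markup but the HTML is made up):

```python
from html.parser import HTMLParser

class ProductCounter(HTMLParser):
    """Count elements whose class list contains 'product-item'."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; look up the class attribute
        classes = dict(attrs).get("class", "")
        if "product-item" in classes.split():
            self.count += 1

sample = '<div class="product-item">a</div><div class="product-item">b</div>'
counter = ProductCounter()
counter.feed(sample)
print(counter.count)  # → 2
```

Feeding `html_content` instead of `sample` should report 12 for the initial page, matching the product count mentioned above.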

Step 3: Load More Products

To access the remaining products, you'll need to programmatically click the "Load more" button on the page until no more products are available. Since this interaction involves JavaScript, you will use Selenium to simulate the button click.

Before writing code, let’s inspect the page to locate:

  • The "Load more" button selector (load-more-btn).
  • The div holding the product details (product-item).

Run the following code to click the button repeatedly until all products are loaded, giving you the complete dataset:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
# Set up the WebDriver (make sure you have the appropriate driver installed, e.g., ChromeDriver)
driver = webdriver.Chrome()
# Open the page
driver.get("https://www.scrapingcourse.com/button-click")
# Loop to click the "Load More" button until there are no more products
while True:
    try:
        # Find the "Load more" button by its ID and click it
        load_more_button = driver.find_element(By.ID, "load-more-btn")
        load_more_button.click()
        # Wait for the content to load (adjust time as necessary)
        time.sleep(2)
    except Exception as e:
        # If no "Load More" button is found (end of products), break out of the loop
        print("No more products to load.")
        break
# Get the updated page content after all products are loaded
html_content = driver.page_source
# Close the browser window
driver.quit()

This code opens the browser, navigates to the page, and interacts with the "Load more" button. The updated HTML, now containing more product data, is then extracted.

If you don’t want Selenium to open the browser every time you run this code, it also provides headless browser capabilities. A headless browser has all the functionalities of an actual web browser but no Graphical User Interface (GUI).

You can enable the headless mode for Chrome in Selenium by defining a ChromeOptions object and passing it to the WebDriver Chrome constructor like this:

from selenium import webdriver
from selenium.webdriver.common.by import By

import time

# instantiate a Chrome options object
options = webdriver.ChromeOptions()

# set the options to use Chrome in headless mode
options.add_argument("--headless=new")

# initialize an instance of the Chrome driver (browser) in headless mode
driver = webdriver.Chrome(options=options)

...

When you run the above code, Selenium will launch a headless Chrome instance, so you’ll no longer see a Chrome window. This is ideal for production environments where you don’t want to waste resources on the GUI when running the scraping script on a server.

Now that the complete HTML content has been retrieved, it's time to extract specific details about each product.

Step 4: Parse Product Information

In this step, you'll use BeautifulSoup to parse the HTML and identify product elements. Then, you'll extract key details for each product, such as the name, price, and links.

from bs4 import BeautifulSoup
# Parse the page content with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')
# Extract product details
products = []
# Find all product items in the grid
product_items = soup.find_all('div', class_='product-item')
for product in product_items:
    # Extract the product name
    name = product.find('span', class_='product-name').get_text(strip=True)

    # Extract the product price
    price = product.find('span', class_='product-price').get_text(strip=True)

    # Extract the product link
    link = product.find('a')['href']

    # Extract the image URL
    image_url = product.find('img')['src']

    # Create a dictionary with the product details
    products.append({
        'name': name,
        'price': price,
        'link': link,
        'image_url': image_url
    })
# Print the extracted product details
for product in products[:2]:
    print(f"Name: {product['name']}")
    print(f"Price: {product['price']}")
    print(f"Link: {product['link']}")
    print(f"Image URL: {product['image_url']}")
    print('-' * 30)

In the output, you should see a structured list of product details, including the name, image URL, price, and product page link, like this:

Name: Chaz Kangeroo Hoodie
Price: 
Link: https://scrapingcourse.com/ecommerce/product/chaz-kangeroo-hoodie
Image URL: https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh01-gray_main.jpg
------------------------------
Name: Teton Pullover Hoodie
Price: 
Link: https://scrapingcourse.com/ecommerce/product/teton-pullover-hoodie
Image URL: https://scrapingcourse.com/ecommerce/wp-content/uploads/2024/03/mh02-black_main.jpg
------------------------------

The above code will organize the raw HTML data into a structured format, making it easier to work with and preparing the output data for further processing.

Step 5: Export Product Information to CSV

You can now organize the extracted data into a CSV file, which makes it easier to analyze or share. Python's CSV module helps with this.

import csv

# Write the extracted product details to a CSV file
with open('products.csv', 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=['name', 'price', 'link', 'image_url'])
    writer.writeheader()        # Write the column headers
    writer.writerows(products)  # Write one row per product

The above code will create a new CSV file with all the required product details.

Here's the complete code for an overview:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import csv
import time

# Set up the WebDriver
driver = webdriver.Chrome()

# Open the page
driver.get("https://www.scrapingcourse.com/button-click")

# Loop to click the "Load More" button until there are no more products
while True:
    try:
        load_more_button = driver.find_element(By.ID, "load-more-btn")
        load_more_button.click()
        time.sleep(2)  # Wait for the content to load (adjust as necessary)
    except Exception:
        print("No more products to load.")
        break

# Get the updated page content after all products are loaded
html_content = driver.page_source
driver.quit()

# Parse the page content with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# Extract product details
products = []
for product in soup.find_all('div', class_='product-item'):
    products.append({
        'name': product.find('span', class_='product-name').get_text(strip=True),
        'price': product.find('span', class_='product-price').get_text(strip=True),
        'link': product.find('a')['href'],
        'image_url': product.find('img')['src'],
    })

# Export the product details to a CSV file
with open('products.csv', 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=['name', 'price', 'link', 'image_url'])
    writer.writeheader()
    writer.writerows(products)

The above code will create a products.csv file with one row per product, containing its name, price, link, and image URL.

Step 6: Get Extra Data for Top Products

Now, let's say you want to identify the top 5 highest-priced products and extract additional data (such as the product description and SKU code) from their individual pages. You can do that using the code as follows:

import requests
import csv
from bs4 import BeautifulSoup

# Convert a price string like "$52.00" to a number for sorting
def parse_price(price_text):
    cleaned = price_text.replace('$', '').replace(',', '').strip()
    return float(cleaned) if cleaned else 0.0

# Sort the products from Step 4 by price in descending order and keep the top 5
top_products = sorted(products, key=lambda p: parse_price(p['price']), reverse=True)[:5]

for product in top_products:
    # Fetch the individual product page
    response = requests.get(product['link'])
    page_soup = BeautifulSoup(response.text, 'html.parser')

    # NOTE: these selectors are assumptions; inspect a product page to confirm them
    description_tag = page_soup.find('div', class_='product-description')
    sku_tag = page_soup.find('span', class_='product-sku')

    product['description'] = description_tag.get_text(strip=True) if description_tag else ''
    product['sku'] = sku_tag.get_text(strip=True) if sku_tag else ''

    print(f"{product['name']} (SKU: {product['sku']})")

# Re-export the enriched top products to CSV
with open('products.csv', 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=['name', 'price', 'link', 'image_url', 'description', 'sku'])
    writer.writeheader()
    writer.writerows(top_products)


This code sorts the products by price in descending order. Then, for the top 5 highest-priced products, the script opens their product pages and extracts the product description and SKU using BeautifulSoup.
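The scraped price strings include currency symbols, so the sort needs a numeric key first. Here is a small self-contained sketch of just that step, using made-up sample data (the product names and prices below are placeholders, not values from the demo page):

```python
def parse_price(price_text):
    # Strip the currency symbol and thousands separators so the string converts to float
    cleaned = price_text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else 0.0

sample = [
    {"name": "Hoodie A", "price": "$52.00"},
    {"name": "Hoodie B", "price": "$1,024.00"},
    {"name": "Hoodie C", "price": "$28.00"},
]

# Sort descending by numeric price and keep the top 2 of the sample
top = sorted(sample, key=lambda p: parse_price(p["price"]), reverse=True)[:2]
print([p["name"] for p in top])  # → ['Hoodie B', 'Hoodie A']
```

The empty-string fallback to 0.0 keeps the sort from crashing when a price cell fails to scrape.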

The output will list the name and SKU of each of the top 5 highest-priced products, along with the description scraped from its product page.

The above code will update products.csv so that each of the top 5 products also includes its description and SKU.

Conclusion

Scraping pages with infinite scrolling or "Load more" buttons can seem challenging, but using tools like Requests, Selenium, and BeautifulSoup simplifies the process.

This tutorial showed how to retrieve and process product data from a demo page, saving it in a structured format for quick and easy access.

See all the code snippets here.

The above is the detailed content of Scraping Infinite Scroll Pages with a "Load More" Button: A Step-by-Step Guide. For more information, please follow other related articles on the PHP Chinese website!

