Collecting data from websites—commonly known as web scraping—is a practical technique for many projects. While libraries like BeautifulSoup are great for working with basic HTML, they often struggle when pages rely heavily on JavaScript to display content. That’s where Selenium comes in.
In this guide, you’ll learn how to use Selenium with Python to scrape dynamic websites effectively—step by step.
What is Selenium?
Selenium is a browser automation framework designed for testing web applications. It simulates real user behavior by controlling an actual browser like Chrome or Firefox. Because of this, it can handle JavaScript-rendered content that simpler tools can’t.
This makes Selenium a great choice for scraping content from interactive websites, forms, infinite scrolls, and more.
Installing Selenium
To get started, install Selenium with pip:
pip install selenium
Setting Up a WebDriver
Selenium requires a WebDriver to communicate with the browser. Here’s an example using Chrome:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
service = Service("/path/to/chromedriver")
driver = webdriver.Chrome(service=service)
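With Selenium 4.6 and later, Selenium Manager can usually download a matching driver automatically, so webdriver.Chrome() with no explicit Service path also works. Once the driver is running, you can load a page and clean up when you're done (example.com stands in for your target site):
driver.get("https://example.com")
print(driver.title)  # confirm the page loaded
driver.quit()  # close the browser and free resources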
If you want to run the browser without opening a window (useful on servers), enable headless mode:
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
Finding Elements on the Page
You can use different strategies to locate HTML elements:
from selenium.webdriver.common.by import By
element = driver.find_element(By.CLASS_NAME, "product-title")
Other locator options include:
By.ID
By.TAG_NAME
By.CSS_SELECTOR
By.XPATH
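When several elements match, find_elements returns them all as a list. A quick sketch, with a hypothetical class name:
# find_elements returns a (possibly empty) list instead of raising an error
titles = driver.find_elements(By.CSS_SELECTOR, ".product-title")
for title in titles:
    print(title.text)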
Waiting for JavaScript to Load
Instead of using time.sleep(), Selenium supports smart waiting with WebDriverWait:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "content"))
)
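The call also returns the element it waited for, so a common pattern is to wait until something is clickable and then act on it. A minimal sketch, assuming a hypothetical "load-more" button:
# Wait up to 10 seconds for the (hypothetical) button to become clickable
button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, "load-more"))
)
button.click()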
Executing JavaScript
Need to scroll the page or trigger lazy-loaded elements? You can run JavaScript:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
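For infinite-scroll pages, a common pattern is to keep scrolling until the page height stops growing. A rough sketch:
import time

last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # crude pause; an explicit wait on new content is more robust
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # nothing new loaded, so we've reached the bottom
    last_height = new_height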
Taking Screenshots
Capture a screenshot of the current view with:
driver.save_screenshot("screenshot.png")
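In Selenium 4, individual elements can be captured too, which is handy for isolating one component (the selector here is hypothetical):
# Screenshot a single element rather than the whole viewport
chart = driver.find_element(By.CSS_SELECTOR, ".price-chart")  # hypothetical selector
chart.screenshot("chart.png")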
Handling Pagination
To scrape multiple pages, you can loop through links or interact with a “Next” button:
next_button = driver.find_element(By.LINK_TEXT, "Next")
next_button.click()
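Putting it together, a minimal pagination loop clicks "Next" until the link disappears:
from selenium.common.exceptions import NoSuchElementException

while True:
    # ... extract data from the current page here ...
    try:
        next_button = driver.find_element(By.LINK_TEXT, "Next")
    except NoSuchElementException:
        break  # no "Next" link left, so this was the last page
    next_button.click()
On JavaScript-heavy sites you may also need a WebDriverWait after each click before extracting.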
Exporting Data
You can use the Pandas library to save your scraped data to a CSV file:
import pandas as pd
df = pd.DataFrame(data)
df.to_csv("output.csv", index=False)
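The data variable above is assumed to already hold your scraped records. As a sketch with hypothetical class names, it might be built as a list of dictionaries:
titles = driver.find_elements(By.CLASS_NAME, "product-title")
prices = driver.find_elements(By.CLASS_NAME, "product-price")
# One dictionary per row; the keys become the CSV column headers
data = [{"title": t.text, "price": p.text} for t, p in zip(titles, prices)]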
Scrolling with Keys
To simulate pressing keys like PAGE_DOWN or END:
from selenium.webdriver.common.keys import Keys
body = driver.find_element(By.TAG_NAME, "body")
body.send_keys(Keys.END)
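Pressing a key repeatedly with short pauses can help trigger lazy-loaded content, for example:
import time

for _ in range(5):
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.5)  # give lazy-loaded content a moment to appear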
Blocking Images and Other Resources
To speed up scraping and reduce resource usage, you can block requests for images and other heavy assets. This relies on the Chrome DevTools Protocol, so it only works with Chromium-based browsers, and the Network domain must be enabled for the block list to take effect:
driver.execute_cdp_cmd("Network.setBlockedURLs", {"urls": ["*.jpg", "*.png"]})
driver.execute_cdp_cmd("Network.enable", {})
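An alternative for Chrome is disabling image loading through browser preferences before the driver starts; the preference key below is a commonly used Chrome setting:
options = Options()
# 2 = block images; must be set before creating the driver
options.add_experimental_option(
    "prefs", {"profile.managed_default_content_settings.images": 2}
)
driver = webdriver.Chrome(options=options)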
How Does Selenium Compare to Other Tools?
| Tool | JavaScript Support | Speed | Ideal Use Case |
| --- | --- | --- | --- |
| Selenium | Full | Moderate | Interactive/dynamic pages |
| BeautifulSoup | None | Fast | Static HTML scraping |
| Scrapy | Optional (via Selenium) | Very fast | Large-scale scraping projects |
| Puppeteer | Full (Node.js only) | Moderate | Headless Chromium-based scraping |
When to Use Selenium
Choose Selenium when:
- The website relies heavily on JavaScript
- You need to simulate user interactions (clicks, scrolls, inputs)
- You’re working on a small or medium-scale scraping task
For larger or faster scraping jobs, consider tools like Scrapy or specialized scraping APIs that handle proxies, CAPTCHAs, and JavaScript rendering for you.
Conclusion
Selenium is an excellent option for scraping dynamic websites using Python. With a bit of setup, it allows you to extract content from even the most complex pages. While it’s not the fastest tool, its ability to automate a real browser makes it incredibly versatile.