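Python Playwright: A Detailed Tutorial

Playwright is a modern browser automation tool developed by Microsoft that supports Chromium, Firefox, and WebKit. In practice it is often faster and more reliable than Selenium, and it provides a richer API, including built-in auto-waiting.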

1. Installation and Setup

1.1 Installing Playwright

# Install the Playwright Python package
pip install playwright
# Download the browser binaries (Chromium, Firefox, and WebKit)
playwright install

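1.2 Installing a Specific Browser

# Install only Chromium
playwright install chromium
# Install only Firefox
playwright install firefox
# Browser builds are pinned to the installed Playwright release, so the
# command takes no version suffix. To get a different browser build,
# install a different Playwright version and run playwright install again.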

2. Basic Usage

2.1 Synchronous API

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Pick a browser engine (chromium, firefox, or webkit)
    browser = p.chromium.launch(headless=False)  # headless=False shows the browser window

    # Open a new page
    page = browser.new_page()

    # Navigate to a URL
    page.goto("https://example.com")

    # Print the page title
    print(page.title())

    # Take a screenshot
    page.screenshot(path="example.png")

    # Close the browser
    browser.close()

2.2 Asynchronous API

import asyncio
from playwright.async_api import async_playwright
async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com")
        print(await page.title())
        await browser.close()
asyncio.run(main())
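
The main benefit of the asynchronous API is that several pages can load at the same time. The following is a minimal sketch of that idea, not part of the original tutorial: fetch_title and fetch_titles are hypothetical helper names, and browser is assumed to be the object returned by await p.chromium.launch().

```python
import asyncio


async def fetch_title(browser, url):
    """Open url in its own page and return the page title."""
    page = await browser.new_page()
    try:
        await page.goto(url)
        return await page.title()
    finally:
        # Close the page even if navigation fails
        await page.close()


async def fetch_titles(browser, urls):
    """Load every URL concurrently; results keep the order of urls."""
    return await asyncio.gather(*(fetch_title(browser, url) for url in urls))
```

Inside main(), after the browser is launched, this could be called as titles = await fetch_titles(browser, ["https://example.com"]).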

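3. Locating and Interacting with Elements

3.1 Basic Selectors

# Locate by text
page.click("text=Login")
# Locate by CSS selector
page.fill("#username", "myuser")
# Locate by XPath
page.click("//button[@id='submit']")
# Chain selectors: a button inside an article that contains "Playwright"
page.click("article:has-text('Playwright') >> button")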
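
The same lookups can also be written with Playwright's Locator API, which resolves the element lazily at the moment of each action. The sketch below assumes a reasonably recent Playwright release (get_by_role and filter are not available in very old versions); log_in and open_first_article_button are hypothetical helper names, and the "Login" button label is an assumption carried over from the text selector above.

```python
def log_in(page, username):
    """Fill the username field and submit, using locators instead of selector strings.

    page is expected to be a playwright.sync_api.Page.
    """
    # CSS selector wrapped in a locator
    page.locator("#username").fill(username)
    # Role-based lookup: a button whose accessible name is "Login" (assumed label)
    page.get_by_role("button", name="Login").click()


def open_first_article_button(page, keyword):
    """Click the button inside the first article whose text contains keyword."""
    article = page.locator("article").filter(has_text=keyword).first
    article.locator("button").click()
```

Because a locator is only resolved when an action runs, it can be created once and reused after the page re-renders.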

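3.2 Common Actions

# Click an element
page.click("button#submit")
# Type text into an input
page.fill("input#username", "myuser")
# Read an element's text
text = page.text_content("h1")
# Read an attribute value (for the live value of an input, use page.input_value)
value = page.get_attribute("input", "value")
# Select a dropdown option
page.select_option("select#colors", "blue")
# Tick a checkbox
page.check("input#agree")
# Upload a file
page.set_input_files("input#file", "myfile.pdf")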

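3.3 Waiting for Elements

# Wait for an element to become visible (up to 10 seconds)
page.wait_for_selector("#loading", state="visible", timeout=10000)
# Wait for an element to become hidden
page.wait_for_selector("#loading", state="hidden")
# Wait until the page URL matches a pattern
page.wait_for_url("**/dashboard")
# Wait until a JavaScript function returns a truthy value
page.wait_for_function("() => document.readyState === 'complete'")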
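
A common pattern is to wait for a loading indicator to disappear before reading the content it was covering. The helper below is only a sketch of that pattern using locators; read_after_loading is a hypothetical name, and the selectors passed to it are whatever the page under test actually uses.

```python
def read_after_loading(page, spinner_selector, target_selector, timeout_ms=10_000):
    """Wait for the spinner to disappear, then return the target element's text.

    page is expected to be a playwright.sync_api.Page.
    """
    # "hidden" is satisfied when the element is not visible or not in the DOM
    page.locator(spinner_selector).wait_for(state="hidden", timeout=timeout_ms)
    return page.locator(target_selector).inner_text()
```

With the selectors from the snippet above, a call might look like read_after_loading(page, "#loading", "h1").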

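4. Advanced Features

4.1 Handling Pop-ups and Dialogs

# Handle alert/confirm/prompt dialogs (register before the action that triggers them)
page.on("dialog", lambda dialog: dialog.accept())
# Handle a new window or tab
with page.expect_popup() as popup_info:
    page.click("a[target=_blank]")
popup = popup_info.value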
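
Accepting every dialog blindly hides what the page actually asked. As a sketch of a more selective handler (make_dialog_handler and its parameters are hypothetical names, not Playwright API), the factory below records each dialog and answers according to its type.

```python
def make_dialog_handler(seen, prompt_text="", accept_confirm=True):
    """Build a handler for page.on("dialog", ...) that logs each dialog.

    seen is a list that receives (type, message) tuples.
    """
    def handle(dialog):
        seen.append((dialog.type, dialog.message))
        if dialog.type == "prompt":
            # Answer prompts with the supplied text
            dialog.accept(prompt_text)
        elif dialog.type == "confirm" and not accept_confirm:
            # Equivalent to pressing Cancel
            dialog.dismiss()
        else:
            # Alerts, beforeunload dialogs, and confirms we agree to
            dialog.accept()
    return handle
```

It would be registered as page.on("dialog", make_dialog_handler(seen)), where seen is a list created by the caller and inspected after the test step.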

4.2 Working with iframes

# Look up the iframe by its name attribute (returns None if no frame matches)
frame = page.frame(name="iframe-name")
# Interact inside the iframe
frame.click("button")
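
Since page.frame() returns None when no frame matches, calling click on the result can fail with an unhelpful AttributeError. The sketch below shows two ways around that; both function names are hypothetical, and the second relies on page.frame_locator, which takes a selector for the iframe element rather than its name.

```python
def click_in_frame(page, frame_name, selector):
    """Click selector inside the iframe named frame_name.

    Raises LookupError if the page has no such frame, instead of failing
    later with an AttributeError on None.
    """
    frame = page.frame(name=frame_name)
    if frame is None:
        raise LookupError(f"no iframe named {frame_name!r}")
    frame.click(selector)


def click_in_frame_lazy(page, frame_selector, selector):
    """Same idea with frame_locator, which is resolved when the click runs."""
    page.frame_locator(frame_selector).locator(selector).click()
```

For the example above, the lazy variant could be called as click_in_frame_lazy(page, "iframe[name='iframe-name']", "button").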

4.3 Device Emulation

# Emulate an iPhone 11 (p is the object returned by sync_playwright())
iphone_11 = p.devices["iPhone 11"]
context = browser.new_context(**iphone_11)
page = context.new_page()
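
A device descriptor is a plain dictionary of new_context() options, so it can be combined with other settings before the context is created. The helper below is a small sketch of that; context_options is a hypothetical name, and the locale used in the usage note is only an example value.

```python
def context_options(device, **overrides):
    """Return new_context() keyword arguments: the device descriptor plus overrides."""
    options = dict(device)      # copy so the shared descriptor is not mutated
    options.update(overrides)   # explicit settings win over device defaults
    return options
```

A call might look like browser.new_context(**context_options(p.devices["iPhone 11"], locale="de-DE")).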

4.4 Network Interception

# Block requests: abort all image downloads (register routes before navigating)
page.route("**/*.{png,jpg,jpeg}", lambda route: route.abort())
page.goto("https://example.com")
# Modify a response: fetch the real one, patch its JSON, and return it
def handle_route(route):
    response = route.fetch()
    body = response.json()
    body["message"]["big_red_button"] = False
    route.fulfill(response=response, json=body)
page.route("https://api.example.com/data", handle_route)
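
Matching on file extensions misses images served from URLs without an extension. One alternative, sketched here under the assumption that blocking images, media, and fonts is acceptable for the task, is to decide by the request's resource type. BLOCKED_RESOURCE_TYPES and block_heavy_resources are hypothetical names.

```python
BLOCKED_RESOURCE_TYPES = frozenset({"image", "media", "font"})


def block_heavy_resources(route):
    """Route handler: abort requests for heavy assets, let everything else through."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        route.abort()
    else:
        route.continue_()
```

It would be registered with page.route("**/*", block_heavy_resources) before calling page.goto.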

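5. Testing Features

5.1 Assertions and Expectations

import re
from playwright.sync_api import expect

# Element visibility
expect(page.locator("h1")).to_be_visible()
# Text content
expect(page.locator(".status")).to_have_text("Success")
# Element count
expect(page.locator("li")).to_have_count(3)
# Page URL (use a regular expression for a partial match)
expect(page).to_have_url(re.compile(r".*/dashboard$"))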

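5.2 Recording Tests

Playwright ships with a code generator that records browser interactions as a script:

# Launch the recorder against a URL
playwright codegen https://example.com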

5.3 Integrating with pytest

# Install the pytest plugin
pip install pytest-playwright
# Example test case (the plugin provides the page fixture)
def test_example(page):
    page.goto("https://example.com")
    assert "Example" in page.title()

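6. Performance Optimization and Debugging

6.1 Performance Analysis

# Start tracing
context.tracing.start(screenshots=True, snapshots=True)
# Stop tracing and save the result (view it with: playwright show-trace trace.zip)
context.tracing.stop(path="trace.zip")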
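
If the code between start and stop raises, the trace is never written, which is exactly when it would be most useful. A small context manager, sketched below with the hypothetical name traced, makes sure the trace is saved either way.

```python
from contextlib import contextmanager


@contextmanager
def traced(context, path="trace.zip"):
    """Record a trace for the duration of the with block and always save it."""
    context.tracing.start(screenshots=True, snapshots=True)
    try:
        yield context
    finally:
        # Runs on success and on failure, so a failing run still leaves a trace
        context.tracing.stop(path=path)
```

Usage would look like with traced(context, "login.zip"): followed by the page actions to record.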

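6.2 Debugging Tips

# Slow-motion mode (makes actions easier to follow)
browser = p.chromium.launch(headless=False, slow_mo=100)  # 100 ms delay per operation
# Open the developer tools
browser = p.chromium.launch(devtools=True)
# Print browser console messages
page.on("console", lambda msg: print(msg.text))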
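
Printing every console message gets noisy on real pages. As a sketch of a more targeted approach (collect_console_errors is a hypothetical helper name), the function below keeps only console errors and uncaught page exceptions so a test can assert on them afterwards.

```python
def collect_console_errors(page):
    """Attach listeners to page and return a list that fills with error texts."""
    errors = []

    def on_console(msg):
        # msg.type is a string such as "log", "warning", or "error"
        if msg.type == "error":
            errors.append(msg.text)

    def on_page_error(exc):
        # Uncaught exceptions raised inside the page
        errors.append(str(exc))

    page.on("console", on_console)
    page.on("pageerror", on_page_error)
    return errors
```

Call it before navigating, for example errors = collect_console_errors(page), and check that the list is empty at the end of the test.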

7. Practical Examples

7.1 Logging In to a Site and Taking a Screenshot

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()

    # Navigate to the login page
    page.goto("https://example.com/login")

    # Fill in the login form
    page.fill("#username", "testuser")
    page.fill("#password", "password123")

    # Click the login button
    page.click("#login-button")

    # Wait for the redirect to the dashboard
    page.wait_for_url("**/dashboard")

    # Save a screenshot
    page.screenshot(path="dashboard.png")

    browser.close()

7.2 Scraping Dynamically Loaded Data

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    page.goto("https://example.com/products")

    # Wait for the dynamically loaded content
    page.wait_for_selector(".product-item")

    # Collect every product
    products = page.query_selector_all(".product-item")
    for product in products:
        name = product.query_selector(".name").text_content()
        price = product.query_selector(".price").text_content()
        print(f"{name}: {price}")

    browser.close()
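
In the loop above, query_selector returns None when a product has no .name or .price child, so the chained text_content() call would raise AttributeError. A defensive helper along the following lines avoids that; child_text is a hypothetical name, and returning a default string is just one possible policy.

```python
def child_text(element, selector, default=""):
    """Return the stripped text of the first child matching selector, or default."""
    child = element.query_selector(selector)
    if child is None:
        return default
    text = child.text_content()  # may be None
    return text.strip() if text else default
```

Inside the loop this would read name = child_text(product, ".name") and price = child_text(product, ".price", "n/a").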

7.3 Handling Infinite-Scroll Pages

from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/infinite-scroll")

    # Count the items present after the initial load
    last_count = len(page.query_selector_all(".item"))

    while True:
        # Scroll to the bottom of the page
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")

        # Wait for more items to load; a timeout means the list has stopped growing
        try:
            page.wait_for_function(
                "(count) => document.querySelectorAll('.item').length > count",
                arg=last_count,
                timeout=5000,
            )
        except PlaywrightTimeoutError:
            break

        # Update the item count
        last_count = len(page.query_selector_all(".item"))

    print(f"Total items loaded: {last_count}")
    browser.close()
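
The loop above has no upper bound on the number of scrolls, which can be a problem on feeds that never end. The sketch below caps the number of rounds; scroll_until_stable, max_rounds, and pause_ms are hypothetical names, and the fixed pause is a deliberate simplification that trades precision for brevity.

```python
def scroll_until_stable(page, item_selector, max_rounds=20, pause_ms=1000):
    """Scroll down until the item count stops growing or max_rounds is reached.

    Returns the final number of matching items.
    """
    items = page.locator(item_selector)
    count = items.count()
    for _ in range(max_rounds):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        # Fixed pause as a simple stand-in for a real "content loaded" signal
        page.wait_for_timeout(pause_ms)
        new_count = items.count()
        if new_count == count:
            break
        count = new_count
    return count
```

On slow pages a single pause may be too short, so pause_ms should be tuned to the site, or replaced with a wait on a concrete condition.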

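8. Best Practices

Selector strategy:
- Prefer text= and CSS selectors
- Avoid XPath unless it is necessary
- Locate elements through dedicated test attributes such as data-testid
Waiting strategy:
- Use wait_for_selector rather than time.sleep
- Rely on auto-waiting where possible (by default, Playwright waits for an element to be actionable)
Resource management:
- Use with statements so the browser is always closed properly
- Reuse browser contexts to improve performance
Test stability:
- Add appropriate waits and assertions
- Use the expect API for assertions
- Consider adding retry logic
Performance:
- For scrapers, consider disabling image loading
- Reuse pages and contexts
- Run multiple browser instances in parallel

Playwright is a powerful tool for automated testing, web scraping, and bot development. Its cross-browser support, rich API, and strong performance make it a leading choice for modern web automation.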
