type of parameter page should be a class, not an interface. #21

Open
value94 opened this issue Nov 27, 2020 · 7 comments

Comments

@value94

value94 commented Nov 27, 2020

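How do I use the Page class in the framework? When I use the Page class from ppspider, the connection keeps failing, but when I use the Page class from puppeteer I keep getting "type of parameter page should be a class, not an interface." I really cannot find any relevant material online and would be very grateful for your reply.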

@value94
Author

value94 commented Nov 27, 2020

let result = await AppleReg(job.key, job.datas, page);

export async function AppleReg(key, datas, page): Promise<TaskObject> {
    page = await initPage(page, datas._.proxy ? datas._.proxy.href : null, options);
}

export async function initPage(page: Page, proxy: string, options?: object) {
    page.setDefaultTimeout(unlockConfig.waitTimeout);
    page.setDefaultNavigationTimeout(unlockConfig.navigateTimeout);
    await page.setUserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36");
    await page.setExtraHTTPHeaders({"Accept-Language": "zh-CN,zh;q=0.9"});
    await page.authenticate(ProxyPool.dynamicProxyOnAuthority(proxy));
    await setupProxy(page, Object.assign(options || {}, {
        localProxy: null,
        remoteProxy: proxy,
        waitTimeout: unlockConfig.networkTimeout,
        loadImage: unlockConfig.loadImage,
        enableCache: unlockConfig.enableCache
    }));
    return page;
}

@xiyuan-fengyu
Owner

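The framework inspects the parameter types of a task method and, via reflection, instantiates the worker from the matching factory class. The code you posted does not touch that part. Please provide the exception stack trace and more of your code so that I can help.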
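The lookup described above can be sketched roughly as follows. This is not ppspider's actual code: `FakePage`, `SketchWorkerFactory`, `indexFactories` and `resolveArgs` are hypothetical names, `get()` is synchronous for brevity, and the `paramTypes` array is written by hand to stand in for what reflection would report for the method's parameters.

```typescript
// Hypothetical stand-in for a worker class; not part of ppspider.
class FakePage {
    id: number;
    constructor(id: number) {
        this.id = id;
    }
}

// Simplified, hypothetical factory shape used only in this sketch.
interface SketchWorkerFactory<T> {
    workerType(): Function; // the class this factory produces
    get(): T;               // synchronous here to keep the sketch short
}

class FakePageFactory implements SketchWorkerFactory<FakePage> {
    private nextId = 1;
    workerType(): Function {
        return FakePage;
    }
    get(): FakePage {
        return new FakePage(this.nextId++);
    }
}

// Index the factories by the class each one produces.
function indexFactories(
    list: SketchWorkerFactory<unknown>[]
): Map<Function, SketchWorkerFactory<unknown>> {
    const byType = new Map<Function, SketchWorkerFactory<unknown>>();
    for (const factory of list) {
        byType.set(factory.workerType(), factory);
    }
    return byType;
}

// For each recorded parameter type, ask the matching factory for a worker.
// Parameters with no matching factory are left undefined.
function resolveArgs(
    paramTypes: Function[],
    byType: Map<Function, SketchWorkerFactory<unknown>>
): unknown[] {
    return paramTypes.map(type => {
        const factory = byType.get(type);
        return factory ? factory.get() : undefined;
    });
}

const factories = indexFactories([new FakePageFactory()]);
// Hand-written parameter types: the first has no factory, the second is FakePage.
const args = resolveArgs([Object, FakePage], factories);
console.log(args[1] instanceof FakePage); // true
```

The point of the sketch is that the match is made on a class constructor that exists at runtime, which is why the declared type of the `page` parameter matters.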

@value94
Author

value94 commented Nov 27, 2020

image
I am using a factory class that I refactored myself:

`import {Browser, LaunchOptions} from "puppeteer";
import {logger, Page, PuppeteerWorkerFactory} from "ppspider";
import {WorkerFactory} from "ppspider/lib/spider/worker/WorkerFactory";
import {ProxyChain, ProxyPool} from "./proxy-chain";

const puppeteer = require('puppeteer-extra');
const puppeteer_stealth = require('puppeteer-extra-plugin-stealth');
puppeteer.use(puppeteer_stealth());

export class UserWorkerFactory implements WorkerFactory<Page> {
    private readonly browser: Promise<Browser>;
    private readonly proxyChain: ProxyChain;

    // launchOptions defaults to {} so that the args handling below cannot hit undefined
    constructor(launchOptions: LaunchOptions = {}, proxyPool?: string|any) {
        logger.info("init puppeteer worker factory...");
        const chain = ProxyPool.initProxyPool(proxyPool, true);
        const proxy = chain ? "http://127.0.0.1:" + chain.listenPort : (proxyPool || {}).url;
        if (proxy) launchOptions.args = (launchOptions.args || []).filter(s => !s.startsWith("--proxy-server=")).concat("--proxy-server=" + proxy);
        // Work around a bug where page.frames may miss some iframes when iframes are cross-origin
        if (!launchOptions.args) launchOptions.args = [];
        if (launchOptions.args.indexOf("--disable-features=site-per-process") === -1) launchOptions.args.push("--disable-features=site-per-process");
        // Launch the browser through puppeteer-extra
        this.browser = puppeteer.launch(launchOptions);
        this.proxyChain = chain;
    }

    // The class that the framework matches against the task method's parameter type
    workerType(): any {
        return Page;
    }

    // Each worker is a new page in its own incognito browser context
    get(): Promise<Page> {
        return new Promise(resolve => {
            this.browser.then(async browser => {
                const context = await browser.createIncognitoBrowserContext();
                const page = await context.newPage();
                await PuppeteerWorkerFactory.exPage(page);
                resolve(page);
            });
        });
    }

    // Closing the context also closes the page that was created in it
    release(worker: Page): Promise<void> {
        return worker.browserContext().close();
    }

    shutdown(): any {
        if (this.proxyChain) this.proxyChain.close();
        if (this.browser) this.browser.then(browser => browser.close());
    }
}

`

@value94
Author

value94 commented Nov 27, 2020

image
This is the exception stack for the failed request.

@value94
Author

value94 commented Nov 27, 2020

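The main problem now is that I do not really understand how the ppspider framework obtains and then uses this page object. Headless mode worked fine before. Now I want to use puppeteer to run automated, simulated tests against pages. Do you have an example of automated test operations? The existing examples all seem to be about crawling.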

@xiyuan-fengyu
Owner

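The factory class looks fine. The problem is most likely where the task is declared: the parameter was not given the correct type, so page was never instantiated.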
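To illustrate why the declared type matters: with TypeScript's `emitDecoratorMetadata` option, a parameter typed with a class is recorded as that class, while a parameter typed with an interface, or left untyped, is recorded as plain `Object`. The sketch below writes those recorded types out by hand instead of using decorators. `Job`, `PageClass` and `checkWorkerParam` are hypothetical stand-ins, and the check is only a guess at the kind of validation that produces the message in this issue's title.

```typescript
// Hypothetical stand-ins for the framework's Job and Page classes.
class Job {}
class PageClass {}

// Hand-written copies of what would be recorded under "design:paramtypes"
// for three ways of declaring the same task method:
const typedWithClass: Function[] = [Job, PageClass];  // index(job: Job, page: PageClass)
const typedWithInterface: Function[] = [Job, Object]; // index(job: Job, page: SomeInterface)
const untyped: Function[] = [Object, Object];         // index(job, page)

// Hypothetical check: a recorded type of Object carries no class to match
// a worker factory against, so it is reported as an error.
function checkWorkerParam(name: string, recorded: Function): string | null {
    return recorded === Object
        ? `type of parameter ${name} should be a class, not an interface.`
        : null;
}

console.log(checkWorkerParam("page", typedWithClass[1]));     // null
console.log(checkWorkerParam("page", typedWithInterface[1])); // the error message
console.log(checkWorkerParam("page", untyped[1]));            // the error message
```

Under these assumptions, both an interface-typed and an untyped `page` parameter fail in the same way, so the task method needs to declare `page` with a class that some registered factory returns from `workerType()`.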

@xiyuan-fengyu
Owner

import {AddToQueue, Job, Launcher, OnStart, Page, PuppeteerWorkerFactory} from "../..";

class TestTask {

    @OnStart({
        urls: "https://www.baidu.com/"
    })
    @AddToQueue({
        name: "test"
    })
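    // In `page: Page`, the type Page tells the framework which kind of worker is needed.
    // The framework looks for the factory whose workerType() equals Page and uses it
    // to instantiate the worker. It learns the type of page through the reflection
    // mechanism provided by TypeScript reflect-metadata.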
    async index(job: Job, page: Page) {
        console.log(job.url);
        await page.goto(job.url);
        console.log(await page.evaluate(() => document.title));
    }

}

@Launcher({
    workplace: "workplace",
    tasks: [
        TestTask
    ],
    workerFactorys: [
        new PuppeteerWorkerFactory({
            headless: true,
            devtools: false
        })
    ]
})
class App {}
