generate csv file #1

Open
wants to merge 2 commits into
base: main
Conversation

Jessicawyt
Collaborator

Parsed all the data from the 3 tables on the website. Overall the format is clean, but I'm struggling to get rid of the extra spaces in the Owner Mailing / Contact Address column. I tried replacing the non-breaking spaces with regular spaces first and then trimming, but it didn't work for me. I'd love to learn how to fix that!
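
One possible approach to the extra-space problem, as a minimal sketch: `normalizeSpaces` is a hypothetical helper, not code from this PR, and it assumes the cell text may contain literal `&nbsp;` entities, non-breaking spaces (U+00A0), or zero-width characters. Note that `String.prototype.replace` with a string pattern only replaces the first occurrence, which may be why an earlier attempt appeared to do nothing; a regular expression with the `g` flag replaces all of them.

```javascript
// Hypothetical helper (not part of the PR): collapse every run of
// whitespace-like characters into a single regular space.
function normalizeSpaces(text) {
  return text
    // Literal entity text, in case the value was read from innerHTML.
    .replace(/&nbsp;/gi, ' ')
    // In JavaScript, \s already matches U+00A0 (non-breaking space);
    // the zero-width space U+200B is not covered by \s, so it is listed too.
    .replace(/[\s\u00a0\u200b\ufeff]+/g, ' ')
    .trim();
}

console.log(normalizeSpaces('  123\u00a0\u00a0MAIN ST \n NEW ORLEANS&nbsp;LA '));
```

If the spaces still survive, logging `[...text].map((c) => c.codePointAt(0).toString(16))` for one cell should show which character is actually present.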

Owner

@wijohnst left a comment

This looks really great! I was especially impressed with the scraping method you wrote. I still wasn't able to figure that part out, so it was great to see your solution. I left a question and one comment about cleaning up the iteration in the scraping method. I think reaching for a higher-order array method for iterative tasks will be viewed more favorably than a loop. You'd likely see a similar ask from engineers on a larger project, and it could be great practice with those higher-order methods. As always, let me know if you have questions!! Great job!

async function scrapeData(page) {
// Find all the elements with a className of 'DataletSideHeading'...
// ... loop through to add the key/value pair into the resultsArray
const resultObj = await page.$$eval('.DataletSideHeading', (titles) => {

Owner

Just curious about the double $ for the $$eval handler. I've seen $eval but not $$eval. What's the difference here?

Collaborator Author

In my understanding, $eval selects a single HTML element (the first one matching the selector), while $$eval selects all matching HTML elements.
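
To illustrate the difference, here is a small sketch. It assumes `page` is a Puppeteer-style page object (Playwright exposes the same two methods); `firstHeading` and `allHeadings` are hypothetical names used only for this example. `$eval` hands the callback one element, while `$$eval` hands it an array of every match.

```javascript
// Hypothetical helpers; `page` is assumed to be a Puppeteer-style Page.

// $eval: the callback receives only the FIRST element matching the selector.
async function firstHeading(page) {
  return page.$eval('.DataletSideHeading', (el) => el.textContent.trim());
}

// $$eval: the callback receives an ARRAY of all matching elements.
async function allHeadings(page) {
  return page.$$eval('.DataletSideHeading', (els) =>
    els.map((el) => el.textContent.trim())
  );
}
```

Both callbacks run inside the browser context, so they can only return serializable values such as strings, arrays, and plain objects.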

index.js Outdated
// ... loop through to add the key/value pair into the resultsArray
const resultObj = await page.$$eval('.DataletSideHeading', (titles) => {
let result = {};
for (let i = 0; i < titles.length; i++) {

Owner

Is it possible to refactor this to a forEach? Or perhaps reduce, since we are returning a result object?

const result = titles.reduce(...)

Owner

Or perhaps:

const resultsArray = titles.reduce(...)

Collaborator Author

I chose a plain for loop for two reasons. First, the array methods I'm familiar with return an array, but I wanted an object. Second, methods like map or forEach don't let me flag the next iteration (this is specifically for combining the Owner Mailing / column with the Contact Address column). Besides, I don't have much experience with array methods, so I'll definitely look into reduce()! Thanks for that!

Owner

I think reduce is the right approach here. Let's work together on an implementation?
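
As a starting point for that implementation, here is a rough sketch. It does not use the PR's actual DOM logic: it assumes the headings and values have already been extracted into `[heading, value]` pairs, and that a heading ending in `/` (such as `Owner Mailing /`) should be merged with the entry that follows it. `rowsToObject`, `pendingKey`, and `pendingValue` are hypothetical names. The point is that reduce can return an object, and that the accumulator can carry the "flag the next iteration" state.

```javascript
// Hypothetical sketch: rows is an array of [heading, value] pairs.
function rowsToObject(rows) {
  const { result } = rows.reduce(
    (acc, [heading, value]) => {
      if (acc.pendingKey !== null) {
        // The previous heading ended with "/": merge it with this row.
        const key = `${acc.pendingKey} ${heading}`;
        acc.result[key] = [acc.pendingValue, value].filter(Boolean).join(' ');
        acc.pendingKey = null;
        acc.pendingValue = '';
      } else if (heading.endsWith('/')) {
        // Flag this row so the next iteration completes it.
        acc.pendingKey = heading;
        acc.pendingValue = value;
      } else {
        acc.result[heading] = value;
      }
      return acc;
    },
    { result: {}, pendingKey: null, pendingValue: '' }
  );
  return result;
}

console.log(
  rowsToObject([
    ['Parcel', '123'],
    ['Owner Mailing /', 'PO BOX 1'],
    ['Contact Address', 'NEW ORLEANS LA'],
  ])
);
```

Inside `$$eval`, the same reducer could run over the `titles` array once each element has been mapped to its heading text and value.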

@@ -4,12 +4,14 @@
 "description": "",
 "main": "index.js",
 "scripts": {
- "test": "echo \"Error: no test specified\" && exit 1"
+ "test": "echo \"Error: no test specified\" && exit 1",
+ "start": "nodemon index.js"

Owner

smort. 👍🏻

Collaborator Author

@Jessicawyt, Jun 15, 2022

Best code ever. Worth every character space!


2 participants