Niels ten Oever edited this page Jun 18, 2021 · 2 revisions

BigBang bi-weekly call June 18 - 9:00 ET, 13:00 UTC, 15:00 CEST

Present: Maxigas, Niels, Christoph, Sebastian, Nick Doty

Preliminary agenda:

    - Updates BigBang

- Packaging BigBang for web (maxigas)

- Google Colab

- Allowed for Python notebooks + extensions to syntax to specify installing packages from Pip

- Runs on machines that have storage that can be used

- Problem: BigBang could not be installed because it was not in Pip

- Solution: Produced BigBang package for Pip

- Still some issues, but working on it over the summer

- Updates BigBang

- My only update would be that I made a list of IGF mailman archives (had to leave out many closed ones and some DCs that use Google groups). However, when I scrape them I get an error message that says it cannot find a mailing list name... (riccardo)

- Make issue? DONE

https://github.com/datactive/bigbang/issues/472

- As for 3GPP, I'm going on with the scraping and shared some archives with Christoph, who's doing the same at a much quicker pace than me. (riccardo)

- Provenance / data storage

- Christoph is close to scraping all the lists that are currently in examples/, which allows discovering new edge cases and errors. These can go into the same issue as above:

https://github.com/datactive/bigbang/issues/472

- Niels will look at recent updates to lists mm.icann.txt, mm.ietf.txt, etc, maybe even add RIPE lists [TODO]

- Knowledge Graphs (Niels)

Xue (Effy) Li has presented her work on affiliation and coreference extraction. She does this in a Jupyter notebook. Paul and Effy committed to adding it to BigBang.

- Prototype Fund (Christoph)

- Germany works with paper :)

- Last Friday there was a meeting with the funding body, mostly on admin

- Need to set up a collaboration agreement among us, because formally this will be an issue

- Funding Period will be from September 1 to December 31

- 3 ppl for 2 - 3 hours per week (420 hours, 50 euros per hour)

- Works as a fellowship

- Additional equipment can be funded

- The restriction is that the work would have to be done by people who live in Germany, so the money would have to go through Christoph. We need to arrange this carefully. We could do this by filling in time sheets; the key is to keep track of hours and work done. The biweekly meetings are useful but probably not sufficient for this.

- AOB: Next IETF hackathon is July 19-23: should we show up in numbers?

- People should register for the Hackathon

      - Mallory and maybe others are invited to next meeting.

Things to work on during funding period: https://docs.google.com/document/d/193FBZrV2xRGm2v-AkXzOMDWYaCI6tP9AcavWCQVWxKQ/edit
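
For the "Provenance / data storage" item above, a per-crawl provenance record could be sketched roughly as follows. This is only an illustration under assumptions: the field names, the `provenance_record` and `has_changed` helpers, and the example URL are hypothetical and are not part of BigBang.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(source_url: str, payload: bytes, tool: str = "bigbang") -> dict:
    """Describe one crawl: where the data came from, when, and a content hash.

    The field names are assumptions made for this sketch.
    """
    return {
        "source_url": source_url,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size_bytes": len(payload),
        "tool": tool,
    }


def has_changed(previous: dict, current: dict) -> bool:
    """Compare two records for the same source; True if the content differs."""
    return previous["sha256"] != current["sha256"]


if __name__ == "__main__":
    # Hypothetical archive URL and payload, used only to show the record shape.
    record = provenance_record(
        "https://example.org/list/2021-06.mbox", b"From a@example.org\n"
    )
    print(json.dumps(record, indent=2))
```

Storing one such record next to each downloaded archive would make it possible to tell, on a later refresh, whether a list actually changed since the previous crawl.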

BigBang bi-weekly call June 4 - 9:00 ET, 13:00 UTC, 15:00 CEST

Present: Sebastian, Niels, Christoph, NP Doty, Riccardo

Preliminary agenda:

    - Updates BigBang

- Listserv 16.5 ingress [DONE]

- bin/collect_mail can collect mail from Listserv 16.5

- asks for login

- 3GPP 

- Mail archive / database - provenance [TODO]

- Niels provided server space (200 GB)

- DaaS

- Scrape everything once

- Repeat and refresh weekly / biweekly? (cron)

- git-lfs 

- there is code with provenance per crawl

- look at provenance standards

- https://commoncrawl.org/the-data/get-started/

- https://en.wikipedia.org/wiki/Web_ARChive

- CodeCov

- Repo has a token connecting it to CodeCov

- Every time someone wants to merge to main, it runs through tests

- CodeCov checks before merge

- Upload badge to the repo site indicating lines of code that are tested

- Indicates that we should go back to the mailman file. 

- Main repo page looks better

- Readme could use a clean up [TODO]

- Wiki

- Add meeting notes to Wiki [TODO]

- Web version of BigBang 

- Maxigas has been working on this

- Will report to the mailinglist on where he has gotten

- Interesting Case study

- DNS over HTTPS (DoH)

- Prototype Fund

- We won! (300 submissions)

- Administrivia being handled by Christoph, also project leader for this

- One member of each group will be attending, this will be Christoph

- 6 months grant

- 420 (?) hours of development time (50 euros per hour)

- Limit is 50k. 

- 5% of the final sum needs to be self-financed by the team (Niels' time) (both Paul Groth and Stefania Milan can sign off on this from the University of Amsterdam side)

- IAB workshop

- November 29 - December 3 IAB workshop

- In-person participation of BigBang team

- https://docs.google.com/document/d/1N9j4-jeWnW1BqhNxI_ZycRsZsf0o8UtBvtAOk3j-t6Y/edit?usp=sharing

- Giganet workshop

- https://enquetes.univ-rennes2.fr/limesurvey/index.php/927992?lang=en

- AOB

- Riccardo has got all data scraped

- Should we add IGF data [TODO]

- Should we add RIPE mailinglist data [TODO]

- Should we add a mailinglist list scraper to include new mailinglists from the respective organizations - or do it manually?

- How do we make it easier to select data sources for analysis in notebooks, especially now that we'll have a central data repository?

- Network analysis does not work for 3GPP because there is no clear reply-to

- Riccardo creates list of 3GPP mailing lists that are most important for his research
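
On the AOB note that network analysis does not work for 3GPP because there is no clear reply-to: the sketch below shows one common way reply edges are derived from `Message-ID` and `In-Reply-To` headers, and how messages without a usable parent simply drop out of the graph. It uses only the Python standard library. The `reply_edges` helper is hypothetical and is not BigBang's implementation.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def reply_edges(raw_messages):
    """Return (edges, unlinked).

    edges counts (replier, original_author) address pairs; unlinked is the
    number of messages with no resolvable parent (thread starters included).
    """
    parsed = [message_from_string(raw) for raw in raw_messages]

    # First pass: remember who wrote each message, keyed by Message-ID.
    author_by_id = {}
    for msg in parsed:
        msg_id = msg.get("Message-ID")
        if msg_id:
            author_by_id[msg_id.strip()] = parseaddr(msg.get("From", ""))[1].lower()

    # Second pass: link each message to its parent via In-Reply-To.
    edges = Counter()
    unlinked = 0
    for msg in parsed:
        parent_id = (msg.get("In-Reply-To") or "").strip()
        sender = parseaddr(msg.get("From", ""))[1].lower()
        if parent_id in author_by_id:
            edges[(sender, author_by_id[parent_id])] += 1
        else:
            unlinked += 1
    return edges, unlinked
```

If an archive strips or never sets `In-Reply-To`, every message ends up in `unlinked` and the resulting graph is empty, which matches the problem described for the 3GPP lists.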

BigBang bi-weekly call May 7 - 9:00 ET, 13:00 UTC, 15:00 CEST

Present: Sebastian, Niels, Christoph, NP Doty, Colin Perkins, Corinne Cath-Speth

Preliminary Agenda:

    - Updates BigBang

- Release 0.3 of BigBang

- Milestones for next version 0.4 are set in the BigBang datatracker

- https://github.com/datactive/bigbang/milestone/7

- Niels and Maxigas are working on web environment

    - Updates Sodestream

- Writing things up for a submission to IMC (Internet Measurement Conference) (LaTeX, graphs, etc)

- Responding to a previous paper with metric about RFC deployment

- Predicting probability of deployment based on mailinglist engagement

    - Inform people about Giganet Workshop https://enquetes.univ-rennes2.fr/limesurvey/index.php/927992?lang=en

    - Update workshop IAB

https://docs.google.com/document/d/1N9j4-jeWnW1BqhNxI_ZycRsZsf0o8UtBvtAOk3j-t6Y/edit

 On the agenda for discussion with IAB on May 19


    - BigBang core developers list (npd)

- npd proposed to add Niels and Christoph


    - Next IETF hackathon will be held July 19-23, 2021 - let's participate and set agenda


    - Upcoming work:

- Web version of BigBang + Jupyter notebook on law enforcement engagement in 3GPP (Maxigas and Niels)

- tenure / retention calculations

- organizational affiliation

- autodocs/documentation

- wiki on how to collect data from each datasource

BigBang bi-weekly call April 23rd 15:00 CET, 10:00 AM ET, 14:00 GMT.

Present: Sebastian, Niels, Christoph, NP Doty

Preliminary Agenda:

- Prototype Fund: Proposal sent - will hear back in June

- 3GPP ingress status: PR was merged, with this normal ingress comes closer, should be done this weekend.

- URL list status: Fixed with new PR.

- Update IAB workshop: https://docs.google.com/document/d/1N9j4-jeWnW1BqhNxI_ZycRsZsf0o8UtBvtAOk3j-t6Y/edit

- Triaging version 0.2.1 && 0.3 on issue tracker

BigBang bi-weekly call April 9th 15:00 CEST, 09:00 AM EST, 13:00 GMT.

Present: Stephen, Sebastian, Niels

Preliminary agenda:

    - Discuss webconference with Sodestream project by St. Mary's and Glasgow U

https://csperkins.org/research/protocol-standards/2020-12-10-ignacio-iesg-talk/2020-12-10_IESG-50-years-IETF-send.pdf

https://sodestream.github.io/impact-of-early-engagement-on-longevity-of-ietf-participation.html

- Agree on joint meetings (once a month)

Start with the one in four weeks (May 7)

- Discuss IAB workshop (Niels will draft text)

- Participation

- Stickiness

- What are patterns of success and patterns of embedding?

- Gender

- Colleagues?

- Affiliation?

- Diversity per affiliation

- Replies?

- Are people more involved if people respond?

- Or if they get published

- Do individuals or organizations have more staying power?

- Is language more coherent and consistent per affiliation?

- Discuss participation in IETF Hackathon

- Joint project / table

- Triaging version 0.2.1 && 0.3 on issue tracker

- AOB

BigBang bi-weekly call 15:00 CET, 10:00 AM ET, 14:00 GMT.

Present: Stephen McQuistin, Sebastian Benthall, Niels, Christoph, npdoty

https://uva-live.zoom.us/j/6365963924

Preliminary Agenda:

    - Open PRs

        - No open PRs, all PRs are commented and are awaiting review

    - BigBang Web Version + use in seminar Maxigas

        - What is a good dataset + question for BA students?

        - Notebook on Cohort Visualization

- IETF: Influx during TLS1.3 ?

- IETF: Influx during/after IPv6 / OSI debates ?

- ICANN: Influx during IANA transition ?

- Diversity notebooks from Nick + Nick's PhD ?

- Mapping debates around privacy

    - Update Prototype Fund

        Christoph is working on it - making good progress.

    - Workshop Giganet

        - Organizing is underway for workshop on methods for studying standards setting

    - Workshop IAB ?

        - Discussions are ongoing, also part of the call that Stephen is organizing

- Particular question of interest seemed to be stickiness of participation

- could look at stickiness as a function of gender (for Cath's research)

    - Development Roadmap

        - To send to ARTICLE19, maybe funding for future hackathons.

 - Maybe build on roadmap in Prototype fund application

- Maybe submit BigBang to the Journal of Open Source Software (JOSS)

- think about what we would want in a baked 1.0 package

    - AOB





BigBang Hackathon March 1 9:00 AM ET




BigBang Bi-Weekly Call Feb 26 15:00 CET

Present: S. Benthall, NP Doty, Christoph Becker, Riccardo Nanni, Paul Groth, Effy Xue Li, Juliana, Sandra Braman, Stephen McQuistin

Agenda

  1. Update Listserv 16.5 ingress (Christoph)

    Test ingress is working

    Will work to integrate in BIN

  2. Update connection Conversation KG (Christoph and Paul)

    Might be an idea to merge ConversationKG with BigBang

  3. Update Hackathon March 1 - 3 (Sebastian) https://data-activism.net/2021/02/bigbang-sprint-at-ietf110-hackathon/

    Tools / Times / Objectives

    Where are the loci of power in the different standards bodies, as a function of affiliation

    Leadership - affiliation

    Standards - authors affiliation

    What are the trends + topics

    Email contributions - author affiliation

    What are the trends + topics

    over time!

    Do actors have concentrated efforts?

    Which actors are working in conjunction?

    Who are competing / in contestation?

    Which companies are dominating in which standards body?

    yes: IEEE, IETF, 3GPP,

    we want ITU, oneM2M

    no: ICANN, RIPE

    To what extent is their standards participation effective?

    How do we measure deployments / standards uptake

    Procurement documents?

    IETF mailing list dashboards of activity

    diversity an active discussion inside IETF right now

    new IETF chair Lars Eggert may be interested

    comparing analyses between IETF and other SDOs like 3gpp (Niels)

    kickoff hackathon meeting 9am US Eastern on Monday

  4. Update Prototype Fund (Christoph and Niels)

    https://prototypefund.de/en/

    (another funding opportunity is: https://www.standict.eu/standicteu-2023-2nd-open-call)

  5. New work:

    Decidim

    ITU standards

    oneM2M standards + conversations

  6. AOB

Chat history:

15:01:19 From Niels ten Oever To Everyone : https://pad.riseup.net/p/BIGBANGnotes
15:02:38 From Riccardo Nanni To Everyone : Hi everyone!
15:02:47 From Niels ten Oever To Everyone : https://pad.riseup.net/p/BIGBANGnotes
15:06:43 From Riccardo Nanni To Everyone : Same here, Juliana!
15:07:08 From Juliana To Everyone : great :)
15:08:21 From ChristophB To Everyone : https://github.com/datactive/bigbang/blob/master/tests/test_listserv.py
15:09:14 From Braman, Sandra To Everyone : How does BigBang differ from software suites like that offered by Provalis, that supports the use of multiple quantitative and qualitative methods on the same corpus (which can be VERY large), other than that it eases bringing texts from particular mailing lists into the software? What features does BigBang offer that isn't out there in commercially available software?
15:10:14 From Sebastian Benthall To Everyone : Sandra: Short answer is that BigBang does not currently support of content analysis like Provalis.
15:10:47 From Braman, Sandra To Everyone : So what does BigBang offer that goes BEYOND things like Provalis ProSuites?
15:11:31 From Sebastian Benthall To Everyone : I'm afraid I'm not conversant with the full feature offerings of Provalis ProSuites, which is required to answer your question fully.
15:13:13 From Nick Doty To Everyone : I think most of our bigbang work has been on obtaining mailing list (and some github) data, so that it can be analyzed using social network analysis tools, or some early metadata/content analysis. but it could be that the next step is piping all that crawled content into text analysis software
15:14:15 From Braman, Sandra To Everyone : Things being described as what BigBang does, such as modeling topics, etc., are all doable within several different existing commercial packages. The metadata analysis Paul Groth is now talking about is not.
15:14:24 From Nick Doty To Everyone : what was the JSON format being discussed?
15:14:37 From maxigas To Everyone : I am taking notes on the pad, currently line 53
15:15:14 From Paul Groth To Everyone : https://github.com/INDElab/conversationkg - end of the readme
15:15:15 From maxigas To Everyone : Nick: look at the bottom of https://github.com/INDElab/conversationkg
15:15:22 From maxigas To Everyone : exactly :)
15:15:45 From Niels ten Oever To Everyone : @npdoty at the bottom of this page: https://github.com/INDElab/conversationkg
15:16:34 From Nick Doty To Everyone : thanks for the links! is the json helpful for storage beyond the .mbox format?
15:18:16 From Paul Groth To Everyone : Yeah - because we can important from different sources other than emails
15:19:06 From maxigas To Everyone : In terms of use cases I would also like to make BigBang useful and available for students in the Media Studies department, for courses and summer/winter schools.
15:19:51 From Juliana To Everyone : what time?
15:20:07 From maxigas To Everyone : Maybe I would drop in but would not commit at the moment.
15:20:15 From Nick Doty To Everyone : yeah, that makes sense, @Paul (comparable from different sources), thanks
15:20:46 From Paul Groth To Everyone : @Nick - yep. I’d like to do for example IIRC chats
15:20:48 From Braman, Sandra To Everyone : Will check my schedule to see if can join for at least a while. Am indeed very interested to learn more about this. Responses to my questions here today have been convincing.
15:21:06 From Nick Doty To Everyone : https://trac.ietf.org/trac/ietf/meeting/wiki/110hackathon
15:21:18 From Nick Doty To Everyone : has times listed in UTC, but I think it’s a 24-hour thing
15:22:31 From Nick Doty To Everyone : is there a synchronous chat that different hackathon groups are using?
15:22:44 From maxigas To Everyone : I am writing this down on line 117 of the pad.
15:23:05 From Niels ten Oever To Everyone : Thanks - can be at 18 :)
15:23:36 From maxigas To Everyone : Yes, an agenda for subtopics of the hackathon would be useful.
15:26:14 From Braman, Sandra To Everyone : Thanks, all. Need to go.
15:26:56 From Paul Groth To Everyone : @ChristophB - I think the best bet is for us (me, Effy) to use BigBang and see where we can best integrate or use a package. We could also talk to you maybe separately and see where the integration is.
15:27:55 From Paul Groth To Everyone : @ChristophB - that was a question :-)
15:28:31 From ChristophB To Everyone : @PaulG - yes ;-) Will write you a sep. Email with times that are best for me
15:28:51 From Paul Groth To Everyone : thanks
15:30:45 From maxigas To Everyone : How students use tools in the Media Department is through a web interface (like Flask), so the software can run on our servers and users only need a browser. Of course, this is only for the most typical use cases.
15:32:35 From Nick Doty To Everyone : @maxigas, do you have capability for jupyter notebooks to run on servers, so that students can write python research code but it’s all running on your servers?
15:36:17 From maxigas To Everyone : No, because few students can actually code beyond basics, but I would like to work in the department to change that. This is the Humanities Faculty! I am running tutorials where we look at this stuff.
15:38:45 From Niels ten Oever To Everyone : Blue: https://ikiwiki.laglab.org/5g/blue5g-small.png
15:39:52 From Paul Groth To Everyone : @maxigas - you can contact me - I think we have services within the UvA for this already or at least a bunch of folks who want to support you for hosted notebooks
15:40:36 From maxigas To Everyone : Thanks Paul, I will do that early next week!
15:40:55 From Nick Doty To Everyone : IETF in the past has expressed interest in dashboard/analytics of mailing lists, to visualize what conversation is going on where
15:42:20 From maxigas To Everyone : Yeah, so ultimately there should be a web interface, even if that is so uncool/unflexible. That would enable a whole new class of users to benefit from what BigBang has to offer..
15:42:36 From Juliana To Everyone : @maxigas the tutorials collected you mention are publicly available? (I come from humanities, so would be useful)
15:43:46 From maxigas To Everyone : You can drop me a mail on [email protected] and i am happy to share them. It is intense, since we look at the culture of code while also learning to code. So it has the Turing papers, history of computing, and Python coding.
15:44:52 From Juliana To Everyone : Sounds very good, thanks.
15:45:10 From ChristophB To Everyone : https://prototypefund.de/en/
15:45:51 From Stephen McQuistin To Everyone : https://www.standict.eu/standicteu-2023-2nd-open-call
15:45:58 From Stephen McQuistin To Everyone : ^ also possibly relevant funding call
15:46:42 From Nick Doty To Everyone : (we don’t live in Germany, so I’m not sure whether the rest of us can apply)
15:46:52 From Niels ten Oever To Everyone : only Christoph can
15:49:16 From Paul Groth To Everyone : https://mybinder.org
15:50:11 From Paul Groth To Everyone : https://notebooks.gesis.org
15:51:30 From maxigas To Everyone : I added 3GPP standards to the Data ingest targets at the bottom of the pad. I am interested to get that working at some point.
15:51:38 From Paul Groth To Everyone : For the conversationKG I don’t think we are ready for a Hackathon next week
15:54:58 From Niels ten Oever To Everyone : Did Paul just call it BugBang? ahaHHAha
15:55:20 From Juliana To Everyone : I have to leave now, but thanks. I'll try to join next week and better figure out my questions and interests
15:55:34 From Juliana To Everyone : Have a nice day everyone!

BigBang Bi-Weekly Call Feb 12 15:00 CET

Present: Niels ten Oever (UvA), Nick Doty (UC Berkeley), Sebastian Benthall (NYU School of Law), Christoph Becker (Durham University), Stephen McQuistin (Glasgow U)


* Niels worked on IETF, RIPE, ICANN with BigBang, now aiming to do 3GPP

* Nick worked on W3C

* Seb started BigBang, used BigBang for open source communities

* Stephen works on social decision making at IETF


1. 3GPP - IEEE


PR by Christoph to create classes for analysis, also a PR to import Listserv 16.5 archives.


Seb is testing + reviewing it - some errors on new dependencies? Issues with BeautifulSoup versioning?


- access code for the 3GPP ListServ: this is available to everybody who registers.


- NEXT STEP: integrate into the normal ingress framework.
  • Use the hackathon to make the codebase and the documentation more user friendly.

2. Clean-ups

Clean ups with variables, FLAKE, and some other things. 

3. ConversationKG

https://github.com/INDElab/conversationkg

Aim is to create output in their data format.

   - Feb 26: Next meeting with INDELab (?)

      - can we demonstrate integration with their tool

      - Christoph?

Starting with the metadata from mailing list headers but focusing on the body text. The current work is on making BigBang compatible with the JSON format used by ConversationKG (see bottom of the link above). Maybe a smaller meeting on the interface would be useful.

Alternative idea: should we drop ConversationKG and integrate the analysis code into BigBang?

Maybe a thing to do would be to see what analytics we can do in BigBang and then in ConversationKG, in a Jupyter notebook.

[+1 from npdoty]

4. University Glasgow update

- Working on ingress

- Trying to add on information to mailinglist dataset - not sure where to bring all together yet


goal is bringing together datasets (email, datatracker - primarily)


working on caching access to the datatracker, local operations are cached, which makes it faster and puts less stress on the ietf servers


larger goal is investigating decision-making, for the publication as rfc

trying to understand the social decisions made along the path before publication

identifying individuals who should be aware of a draft earlier in its lifecycle, etc.

looking at all the datatracker data prior to rfc publication, and then heuristics for whether it's successfully published, and maybe about actual adoption of the protocol itself


https://ieeexplore.ieee.org/document/7949061

(a project of manually tagged standards and whether they were adopted)


many different ways to define success, but the easiest is "did it get published", since many drafts don't make it to publication


goal: connect revisions to a draft to the mailing list discussion that proposed text, to see who proposed text that ended up in the drafts (to see influence)


email address is a useful key for matching


datatracker includes multiple email addresses over time for a single person

and ietf authorship data includes affiliation, but not very complete for all users

and ietf meeting attendance data might have better affiliation


seb: bigbang has ietf attendance crawling/data ingest, but this isn't well-documented yet.
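
The matching idea above (email address as the key, with several addresses per person over time) can be sketched as a simple lookup table. This is a minimal illustration under assumptions: the person ids, the input shapes, and the helper names are hypothetical, and nothing here reflects the actual datatracker or ietfdata interfaces.

```python
def build_person_index(people):
    """Map every known address (lower-cased) to a stable person id.

    `people` maps a person id to the list of addresses that person has used.
    """
    index = {}
    for person_id, addresses in people.items():
        for address in addresses:
            index[address.strip().lower()] = person_id
    return index


def attribute_messages(senders, index):
    """Count messages per person; unknown addresses are kept under their own key."""
    counts = {}
    for sender in senders:
        normalized = sender.strip().lower()
        key = index.get(normalized, normalized)
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical data: one person who changed address, one unknown sender.
    people = {"person-1": ["jane@old.example", "jane@new.example"]}
    senders = ["jane@old.example", "Jane@New.example", "other@example.org"]
    print(attribute_messages(senders, build_person_index(people)))
```

Addresses that never appear in the person table stay separate, so incomplete affiliation or identity data degrades the counts gracefully instead of silently merging people.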


4b. IPR

stephen: ietf datatracker also has IPR disclosures


niels: other SDOs also have standards-essential patents and disclosures


niels: lit review on standards and patents


seb: interest in future of work funding project, which might have connection to patents
5. IETF BigBang hackathon project (1st week of March)

https://www.ietf.org/how/runningcode/hackathons/110-hackathon/

Now: Setting priorities.

  • Uses:
    • Niels - 3GPP
    • Stephen/Glasgow :
      • ingress bigbang / IETF IMAP

      • merging data

      • ietfdata + email data

TODO: set sync times in the context of time zones of participants. US Eastern Time? An agenda with subtopics and stand-up meetings where everyone participates would be useful. Some of these are things like functionality/development; use cases/research questions, etc. There are many ideas for the hackathon, a lot to chew on. These range from operationalising research questions to improving the documentation.

TODO: Seb to send structured proposal for work items, priorities, scheduling, for the Hackathon.

6. AOB

7. TODO

  • listserv 16.5 scraper (not just file archive import) (with AUTH)
  • create export in ConversationKG format

Design Discussions:

- Data ingest (current or imagined)

     - mbox 

- from local file

- scraped and downloaded from pipermail

- w3c list (scraped from site)

- git repositories

- IETF attendance records (scraped)

     - IETF drafts (from ietfdata)

- listserv [new] 

- local file

- #409 - scraped from website - need to authenticate the data to get the email addresses

- [Future?] GitHub issues

- [Future?] Discourse

- [Future?] Patents

- [Future?] 3GPP standards (found on their FTP server in Word98 format...)


 - Output

- mbox

- csv (for ingest into Excel, etc.)

- [Future?] ConversationKG format?


- Onboarding issues
  - BigBang as a Service? (web interface)
  - Hosted Jupyter notebooks (for instance in University of Amsterdam)
  - Windows workflow (Riccardo was trying to do this)
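
The mbox-in, CSV-out path listed under Data ingest and Output above could look roughly like the following. This is a sketch using only the Python standard library; the `mbox_to_csv` helper and the chosen header columns are assumptions for illustration, not BigBang's own export code.

```python
import csv
import mailbox

# Header fields written as CSV columns; this selection is an assumption.
FIELDS = ["Message-ID", "Date", "From", "Subject", "In-Reply-To"]


def mbox_to_csv(mbox_path: str, csv_path: str) -> int:
    """Write one CSV row of header fields per message; return the row count."""
    # create=False makes a missing file an error instead of creating an empty mbox.
    box = mailbox.mbox(mbox_path, create=False)
    rows = 0
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(FIELDS)
            for message in box:
                # Missing headers become empty cells.
                writer.writerow([str(message.get(field, "")) for field in FIELDS])
                rows += 1
    finally:
        box.close()
    return rows
```

Only headers are exported here; message bodies would need extra handling for multipart content and character sets before they are usable in a spreadsheet.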

What is actionable?