Steven's Comments #23

Open
matthewfeickert opened this issue Jul 8, 2018 · 1 comment
Labels: enhancement, help wanted

Comments

@matthewfeickert
Member

matthewfeickert commented Jul 8, 2018

General

  • I'm not sure how picky you want to be, but some bulleted lists end each line with a comma or semicolon, while others have no end punctuation. I didn't change anything, as I wasn't sure whether you had a preferred style.

Section 2.1, first paragraph:

  • I'm not sure it is fair to say that it is even possible (ignoring whether it's tractable or not) to analytically calculate the detector response. Particle interactions with matter are an inherently probabilistic process, so there is really no deterministic way to do this. I would suggest changing "it is intractable to compute the detector response analytically" --> "it is inherently probabilistic and computing the detector response analytically would be intractable."

Section 2.2, second paragraph:

  • I find the following text a bit strong, and would change it as follows: "...and may be the only hope of performing the real-time reconstruction that enables real-time analysis in the first place" --> "...and look to be the best hope for performing suitably powerful real-time reconstruction to enable robust real-time analysis results."
  • In the same paragraph, I think we may want to avoid "Level 1 trigger" and instead say "first stage of the trigger".

Sections 2.5 and 2.6:

  • In my opinion, the equations here (especially the first) should be removed and the text greatly simplified. I believe that the primary intended audience will not benefit from the equations and complex text, and would instead benefit from a more accessible, higher-level description. These sections seem quite different from all of the preceding sub-sections, which are much easier for non-experts to read. Given my view here, I did not bother reading through them in detail for textual/grammatical fixes, as I think they need a larger-scale revision.

Section 2.9:

  • I agree with this section in general, but I think it's important to point out that we do have ways to do this in specific cases. For example, if we use a regression to calibrate a given object, we can still evaluate the uncertainty using standard procedures. For jets, for example, we can still balance a jet against a photon to derive the uncertainty on the jet energy scale. This of course doesn't work for all ML situations, but I think it's important to qualify the statement so the reader doesn't think we have been doing rubbish so far. I would suggest adding the following sentence immediately after the first sentence of the first paragraph: "While in some cases it is possible to exploit other methods to derive uncertainties on the outputs of a machine learning algorithm, this is not always possible, and the machine learning algorithms themselves do not currently have a robust means of assessing their own uncertainty." I would then remove "However" from the start of the current second sentence.

Section 2.11, paragraph 3:

  • I think "thousands of datasets" is really underestimating it. There are already billions of files on the grid in ATLAS alone. Yes, that's files, not datasets, but I am sure that there are more than thousands of datasets between all of the collaborations. I would instead suggest "need to access thousands of datasets" --> "need to access an ever-growing number of datasets".

Section 3.2:

  • For some reason there is a "\clearpage" between the second and third paragraphs. Why is this? I didn't remove it just to be safe, but I imagine that we want to remove it. See line 30 of collaboration.tex. (resolved in commit e02f272)

Section 3.3, paragraph 1:

  • We should probably add a reference to the ongoing tracking challenge.

Section 3.4, paragraph 2:

  • It's great that CMS has done this, but do we have any references that we can cite showing that people are using it? If we want to point out that CMS has already done this, then it's important to also show that this is beneficial to the ML community, given that we are stating it is "very valuable" in the same paragraph.

Section 3.5, paragraph 2:

  • I've read the final sentence several times, and I'm not sure what it is trying to say. Is it saying that these are challenging to both industry and physics and so we can work together, or something else? I think it needs to be clarified, but as I'm not 100% sure of the intent, I didn't want to touch it.

Section 4.5.1, table 1:

  • It might be good to forward-refer to table 2 here to point out that ROOT is not as isolated as it seems, thanks to the efforts of the HEP community. I wouldn't go into the details here, but I think something brief would be useful.

Section 4.5.2, paragraph 1:

  • Is write speed really important for ML? Obviously it is for recording and transferring our data in general, but for ML I think it's just the read speed we care about. The model will be frequently updated, but that's separate from the data format (and that's memory-resident). Am I forgetting something?

Section 5.1, paragraph 3:

  • I think "typical HEP application" and "up to 1 GPU-week" are contradictory. There are absolutely HEP applications that take up to 1 GPU-week, but I wouldn't call them the "typical HEP application" at the moment. I think it is important to stress the GPU-week side though, so I would suggest leaving that untouched and instead changing: "A typical HEP application" --> "HEP applications"

Section 5.4:

  • Paul should confirm, but I'm pretty sure that LHCb has been using bonsai BDTs in FPGAs for a while now. If so, I think it's important to also mention that here.

Section 5.6, paragraph 2:

  • Abbreviating to PByte and EByte doesn't save much space. I would suggest directly writing petabyte and exabyte.

@srschramm
Collaborator

Note: I don't think the very first one (consistency in ending bulleted lists with or without commas) is that critical. I listed it mostly for completeness. The other points here are much more important, and I hope that most or all of them can be resolved before submitting to arXiv.
