Convert PyROOT TH1/TH2 to numpy arrays #392
Replies: 3 comments 6 replies
-
If you have histograms in a ROOT file, load one in Uproot and call its method for converting to NumPy arrays. If your objects are in-memory PyROOT objects, then to use this method you'd have to save them to a file, because Uproot only recognizes serialized ROOT objects. We had talked about making a PyROOT-to-Uproot bridge by temporarily serializing them in in-memory files, but that seems circuitous, and we haven't tried it yet.

Another option is to write a C++ function with gInterpreter.Declare that calls GetBinContent in a compiled loop, filling an array. That would be a fast ROOT-only option.

RDataFrame is only for non-binned data, like TTrees, so it doesn't fit your case at all.
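The compiled-loop option mentioned above could look roughly like the sketch below. It is not a tested recipe: the names `fill_contents_and_errors`, `strip_flow_2d` and `th2_to_numpy` are hypothetical, and it assumes that the installed PyROOT accepts float64 NumPy arrays where the C++ signature expects `double*`. The loop runs over ROOT's global bin numbers, which include the underflow and overflow cells (`global = binx + (nx + 2) * biny` for a TH2), so the flat result is reshaped and trimmed afterwards. Bin errors are filled in the same pass.

```python
import numpy as np

# C++ helper compiled by ROOT's interpreter (hypothetical name).
# It takes a TH1&, so it also accepts TH2 objects, which derive from TH1.
CPP_SOURCE = r'''
#include "TH1.h"
void fill_contents_and_errors(const TH1& h, double* contents, double* errors) {
    const int ncells = h.GetNcells();  // all cells, flow bins included
    for (int i = 0; i < ncells; ++i) {
        contents[i] = h.GetBinContent(i);
        errors[i] = h.GetBinError(i);
    }
}
'''


def strip_flow_2d(flat, nx, ny):
    """Reshape a flat array over global bins to (nx, ny), dropping flow bins."""
    # Global bin = binx + (nx + 2) * biny, so biny is the slow (row) index.
    return np.asarray(flat).reshape(ny + 2, nx + 2)[1:-1, 1:-1].T


def th2_to_numpy(h):
    """Return (contents, errors) of a PyROOT TH2 as arrays of shape (nx, ny)."""
    import ROOT  # imported lazily so strip_flow_2d is usable without ROOT

    # Declare the helper only once per session (assumption: hasattr is
    # enough to detect an earlier declaration).
    if not hasattr(ROOT, "fill_contents_and_errors"):
        ROOT.gInterpreter.Declare(CPP_SOURCE)

    ncells = h.GetNcells()
    contents = np.empty(ncells, np.float64)
    errors = np.empty(ncells, np.float64)
    ROOT.fill_contents_and_errors(h, contents, errors)

    nx, ny = h.GetNbinsX(), h.GetNbinsY()
    return strip_flow_2d(contents, nx, ny), strip_flow_2d(errors, nx, ny)
```

With a PyROOT histogram `h` in hand, `contents, errors = th2_to_numpy(h)` would give two arrays indexed as `[binx - 1, biny - 1]`. Keeping the reshaping in a separate pure-NumPy function makes the indexing convention easy to check without ROOT.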
-
I've been nerd-sniped: I had to find out if the in-memory file method would work. This should probably be turned into an easier-to-use, high-level function in Uproot, but there could be sharp edges we haven't identified yet, and it would be easier to find them by doing this manually than if it's wrapped up and hidden in a function.

Suppose that we have a PyROOT object like this:

    import ROOT
    import numpy as np
    import uproot

    h = ROOT.TH1F("h", "", 10, -5, 5)

We can serialize the object into an in-memory file (no disk involved), like this:

    # C++ helper that copies the TMessage's internal buffer into memory owned by Python
    ROOT.gInterpreter.Declare('''
    void copy_buffer_for_uproot(char* destination, TMessage& message) {
        memcpy(destination, message.Buffer(), message.Length());
    }
    ''')

    message = ROOT.TMessage(ROOT.kMESS_OBJECT)
    message.WriteObject(h)

    # allocate exactly as many bytes as the message holds, then copy them out
    buffer = np.empty(message.Length(), np.uint8)
    ROOT.copy_buffer_for_uproot(memoryview(buffer), message)

A ROOT TMessage is an in-memory buffer that acts like a file, but it can hold a single object, is not on disk, and is not compressed, which makes it easier to read back in Uproot.

The above implementation lets the TMessage manage its own buffer and then copies it into a NumPy array. The alternative is to hand the TMessage a NumPy buffer to write into:

    message = ROOT.TMessage(ROOT.kMESS_OBJECT)
    buffer = np.empty(1000, np.uint8)
    message.SetBuffer(buffer, len(buffer), False)
    message.WriteObject(h)

and then look at [...]

You might also be wondering why we had to define a [...]

Now that we have the raw bytes of a TH1F object in a NumPy array, Uproot can deserialize them:

    # stand-in for a file: it only provides the class-model lookup
    class FakeFile(object):
        def class_named(self, classname, version=None):
            return uproot.class_named(classname, version=version)

    fakefile = FakeFile()
    chunk = uproot.source.chunk.Chunk.wrap(None, buffer)
    cursor = uproot.source.cursor.Cursor(8)  # start reading at byte 8 of the buffer
    h2 = uproot.deserialization.read_object_any(chunk, cursor, {}, fakefile, fakefile, None)

Why do we need a FakeFile? The deserialization function has to look up TStreamerInfo to know how to read a serialized class, for a specified version of that class. It first queries the file the object came from because ROOT files should ship with TStreamerInfos for all the classes they contain (that "should" is a complicated story). But in this case, the object does not come from a file, and the TMessage doesn't contain any TStreamerInfos, as messages are supposed to be small and lightweight. So we have to defer to the globally defined class models in Uproot, which is what uproot.class_named looks up.

If the class model isn't there, the above will fail with a DeserializationError. That's another rough edge: a properly wrapped-up function should check that we have a class model for the exact version that this ROOT will write into the TMessage. It matters which ROOT is imported. In this case, ROOT reports version 3 for TH1F:

    >>> ROOT.TClass.GetClass("TH1F").GetClassVersion()
    3

and Uproot recognizes version 3:

    >>> uproot.class_named("TH1F").known_versions
    {3: <class 'uproot.models.TH.Model_TH1F_v3'>}

That sort of thing could be automatically checked, though the error message would have a lot of explaining to do if ROOT writes version 2 and Uproot reads version 3!
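The automatic version check suggested above might be sketched as follows. This is only an illustration: `check_class_version` is a hypothetical helper, not an Uproot function, and it assumes the caller supplies the version number ROOT reports and a dict of known versions like the one shown for TH1F.

```python
def check_class_version(classname, written_version, known_versions):
    """Return the class model for the version ROOT writes, or raise a clear error.

    classname        name of the ROOT class, e.g. "TH1F"
    written_version  version number reported by the imported ROOT
    known_versions   dict mapping version numbers to class models
    """
    if written_version not in known_versions:
        known = ", ".join(str(v) for v in sorted(known_versions)) or "none"
        raise ValueError(
            f"ROOT would write {classname} version {written_version}, "
            f"but the available class models cover version(s): {known}"
        )
    return known_versions[written_version]
```

With both libraries imported, the two inputs could come from the expressions used above: `ROOT.TClass.GetClass("TH1F").GetClassVersion()` for `written_version` and `uproot.class_named("TH1F").known_versions` for `known_versions`. Running the check before writing the TMessage would turn a DeserializationError into a message that names both versions.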
-
FYI: this is starting to be implemented in PR #420.
-
Hi all,
As uproot is completely decoupled from ROOT (and therefore PyROOT), I acknowledge I am probably slightly off-topic, but I could use some guidance.
I am working with PyROOT TH1s and TH2s and at some point need to convert back and forth to numpy arrays. Until now I was using a plain Python loop with the usual GetBinContent, relatively slow but it was fine so far. Unfortunately I have now reached the point where I have to read a few hundred TH2s that can have many bins, and my computation time skyrocketed.
I was using root_numpy.hist2array in the past and was happy with it, but I could not use it in the current project because I also needed the bin errors, which were not supported until recently. If that is my only option then I would go for it, but as root_numpy is currently deprecated I am looking for something more stable.
Uproot looks very interesting and I would gladly use it, but from what I understand there is no way to use a PyROOT object. I tried to dig a bit into uproot-method but I could not find a workaround.
The other alternatives that are proposed do not seem implementable either: converting from TTrees does not fit my purpose, and the proposition to use RDataFrame as here completely puzzles me.
Could anyone provide some guidance as to which option seems most suitable, or whether I should stick with the deprecated root_numpy? It would be very much appreciated.
Best,
Florian