for record in ArchiveIterator(writer.get_stream()):
AttributeError: 'WARCWriter' object has no attribute 'get_stream'. Did you mean: '_iter_stream'?
import os.path
import hashlib
from warcio.capture_http import capture_http
from warcio.archiveiterator import ArchiveIterator
import requests  # https://github.com/webrecorder/warcio#writing-warc-records
from bs4 import BeautifulSoup  # https://gist.github.com/edsu/62bc39890806ffd19b597186a3619419

OUTPUT_PATH = 'output/'


def cache_and_return_bs(url):
    if url_already_retrieved(url):
        raise Exception(url + ' already there')
    with capture_http(get_output_filename(url), warc_version='1.1') as writer:
        # TODO: do we want to try to append to a single file?
        requests.get(url)
    for record in ArchiveIterator(writer.get_stream()):
        if record.rec_type == 'response':
            return BeautifulSoup(record.raw_stream)


def get_output_filename(url):
    return OUTPUT_PATH + hashlib.sha256(url.encode()).hexdigest()


def url_already_retrieved(url):
    return os.path.isfile(get_output_filename(url))


if __name__ == '__main__':
    print(cache_and_return_bs('https://example.org'))
I narrowed this down to passing a filename to capture_http: if I don't pass one, the returned writer has a get_stream method.
Yes, please leave it open. This is not the only place where we have a lack of clarity about streaming vs files.
voltagex changed the title from "capture_http writer with filename has no get_stream methood" to "Documentation: Clarify that capture_http writer with filename has no get_stream methood" on Apr 26, 2022
I'm using Python 3.10.4 and warcio 1.7.4
Using a piece of code based on https://github.com/webrecorder/warcio#writing-warc-records, I'm getting the AttributeError shown above: 'WARCWriter' object has no attribute 'get_stream'.