I.T. Security and Linux Administration

Aug 31 2012   5:14PM GMT

Python: Requests and fetching parts of data



Posted by: Eric Hansen
Tags: security

Another Python tip for you. The “Requests” module lets you iterate over a response’s content n bytes at a time. This is quite useful when you’re dealing with large amounts of data per request (such as downloading 2GB files), because simply accessing response.content stores the entire body in memory. However, in the newest release of Requests (0.13.9 at the time of this writing), the functions iter_content() and iter_lines() (both of which stream the body, in fixed-size chunks or line by line respectively) do not work as expected out of the box.
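To see why chunked iteration keeps memory bounded, here is a stdlib-only sketch of the same idea applied to a file-like object. It is not Requests’ own implementation: io.BytesIO merely stands in for a large HTTP body, and the helper name iter_chunks is hypothetical.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunk_size-byte pieces of a file-like body.

    Only one chunk is held in memory at a time, unlike reading the
    whole body at once (the equivalent of accessing response.content).
    """
    # iter(callable, sentinel) keeps calling stream.read(chunk_size)
    # and stops as soon as it returns b"" (end of stream).
    return iter(lambda: stream.read(chunk_size), b"")


# Hypothetical stand-in for a large response body.
body = io.BytesIO(b"x" * 10)
print([len(chunk) for chunk in iter_chunks(body, 4)])  # [4, 4, 2]
```

With a real 2GB download the loop body would write each chunk to disk instead of collecting sizes, so peak memory stays around chunk_size rather than the size of the file.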

When you call get, put or post, the module’s internals end up setting flags that tell it the content has already been read (which is the behavior you’d want if you accessed response.content twice, for example). It is not what you’d expect when calling iter_content() for the first time.

It didn’t help that the bug reports on the Requests GitHub page stated that the issue was fixed, when in fact it wasn’t. It took some digging to find the cause.

When you make a request (e.g. requests.get(…)), the module accepts an option called ‘prefetch’. When it is set to True, Requests reads the entire body into memory as soon as the response arrives (in effect caching response.content); when it is False, the body is left unread until you ask for it.

If you only use Requests for small files (or at least ones that fit comfortably in memory), you have nothing to worry about. But prefetch is set to True by default, which means that any call to the iter_*() functions raises an error stating that the content has already been consumed.
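The behavior described above can be modeled with a small toy class. This is a hypothetical sketch, not Requests’ source: the class ToyResponse, its attributes and the exact error text are illustrative assumptions. It only shows why reading the body up front (prefetch) leaves nothing for a later iter_content() call to stream.

```python
import io


class ToyResponse:
    """Hypothetical stand-in for a response object; not Requests' code.

    The body is a one-shot stream, like a socket: once read, it is gone.
    """

    def __init__(self, body: bytes, prefetch: bool = True):
        self._raw = io.BytesIO(body)
        self._content = None
        self._content_consumed = False
        if prefetch:
            # prefetch=True: pull the whole body into memory immediately.
            _ = self.content

    @property
    def content(self) -> bytes:
        # Read the stream once and cache the result for later accesses.
        if self._content is None:
            self._content = self._raw.read()
            self._content_consumed = True
        return self._content

    def iter_content(self, chunk_size: int):
        # Generator: this check runs when iteration starts, not at call time.
        if self._content_consumed:
            raise RuntimeError("The content for this response was already consumed")
        self._content_consumed = True
        while True:
            chunk = self._raw.read(chunk_size)
            if not chunk:
                break
            yield chunk


streamed = ToyResponse(b"abcdefghij", prefetch=False)
print(list(streamed.iter_content(4)))  # [b'abcd', b'efgh', b'ij']

prefetched = ToyResponse(b"abcdefghij")  # prefetch defaults to True
try:
    list(prefetched.iter_content(4))
except RuntimeError as exc:
    print(exc)  # The content for this response was already consumed
```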

There are two possible fixes: one is to pass prefetch=False when calling get/put/post, and the other is to edit Requests’ own “models.py” file. The first option is the safest and keeps working as intended when you update the module. The second saves you from specifying ‘prefetch’ on every request, but is likely to be overwritten on updates (reintroducing the same issue).
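The first fix might look like the following minimal sketch. The helper name download_to_file is hypothetical, and it assumes get behaves like requests.get and returns an object with iter_content(chunk_size). Extra keyword arguments are passed straight through, so the caller chooses how to disable prefetching: prefetch=False on Requests 0.13.x, or stream=True on later Requests releases, which use that argument for the same purpose.

```python
from typing import Any, Callable


def download_to_file(url: str, path: str, get: Callable[..., Any],
                     chunk_size: int = 64 * 1024, **request_kwargs: Any) -> int:
    """Stream url to path chunk by chunk; return the number of bytes written.

    Hypothetical helper: `get` is assumed to behave like requests.get, and
    request_kwargs (e.g. prefetch=False) are forwarded to it unchanged.
    """
    response = get(url, **request_kwargs)
    written = 0
    with open(path, "wb") as fh:
        for chunk in response.iter_content(chunk_size):
            if chunk:  # skip any empty chunks
                fh.write(chunk)
                written += len(chunk)
    return written
```

On Requests 0.13.x a call might look like download_to_file(url, "big.iso", requests.get, prefetch=False). If you want to avoid repeating the keyword without patching models.py, binding it once with functools.partial(requests.get, prefetch=False) is one option: it survives module updates, though it only covers calls made through that wrapper.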

However, I have submitted a pull request to change the default value of prefetch to ‘False’: https://github.com/kennethreitz/requests/pull/828.
