Description
The trigger_batch and trigger functions in the library are using sys.getsizeof(event['data']) to measure the size of the event data. However, sys.getsizeof() returns the size of the object in memory, which includes overhead and doesn't accurately represent the actual byte size of the data when encoded for transmission over HTTP. This can lead to inconsistencies and false positives when checking against the 10KB limit, resulting in ValueError: Too much data exceptions even when the data is within acceptable limits.
self, POST, "/apps/%s/batch_events"%self.app_id, params)
Steps to Reproduce:
Prepare a batch of events containing a good number of non-ASCII characters, where the data size is below 10KB when encoded in UTF-8 (a reproduction sketch follows after these steps).
Use the trigger_batch function to send the batch.
Observe that a ValueError is raised despite the data being within the size limit.
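For concreteness, here is a minimal reproduction sketch. The credentials, channel, and event names are placeholders, and the data is passed as a pre-serialized JSON string so that the non-ASCII character reaches event['data'] unescaped:

import json
from pusher import Pusher

# Placeholder credentials -- substitute a real app's values.
client = Pusher(app_id="123456", key="key", secret="secret", cluster="eu")

# The JSON body is well under 10KB once UTF-8 encoded, but it contains a
# non-ASCII character, which inflates sys.getsizeof() beyond 10240 bytes.
body = json.dumps({"message": "x" * 5600 + " ≥ 42"}, ensure_ascii=False)
print(len(body.encode("utf-8")))   # comfortably below 10240

client.trigger_batch([
    {"channel": "my-channel", "name": "my-event", "data": body},
])
# Raises ValueError: Too much data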
Upon modifying the trigger_batch function to add some logging as follows:
@request_method
def trigger_batch(self, batch=[], already_encoded=False):
    """Trigger multiple events with a single HTTP call.

    http://pusher.com/docs/rest_api#method-post-batch-events
    """
    if not already_encoded:
        for event in batch:
            validate_channel(event['channel'])

            event_name = ensure_text(event['name'], "event_name")
            if len(event['name']) > 200:
                raise ValueError("event_name too long")

            event['data'] = data_to_string(event['data'], self._json_encoder)

            # Debug logging added for this report:
            print("---- EVENT SIZE DETAILS ----")
            print("len(event['data'])")
            print(len(event['data']))
            print()
            print("sys.getsizeof(event['data'])")
            print(sys.getsizeof(event['data']))
            print()
            print("len(event['data'].encode('utf-8'))")
            print(len(event['data'].encode('utf-8')))
            print()

            if sys.getsizeof(event['data']) > 10240:
                raise ValueError("Too much data")

            if is_encrypted_channel(event['channel']):
                event['data'] = json.dumps(
                    encrypt(event['channel'], event['data'], self._encryption_master_key),
                    ensure_ascii=False)

    params = {
        'batch': batch}

    return Request(
        self, POST, "/apps/%s/batch_events" % self.app_id, params)
I get the following output:
---- EVENT SIZE DETAILS ----
len(event['data'])
5778 <-- Length of the string
sys.getsizeof(event['data'])
5827 <-- In-memory object size
len(event['data'].encode('utf-8'))
5778 <-- Actual byte-size of the data
---- EVENT SIZE DETAILS ----
len(event['data'])
5671 <-- Length of the string
sys.getsizeof(event['data'])
11416 <-- In-memory object size
len(event['data'].encode('utf-8'))
5673 <-- Actual byte-size of the data
ERROR:root:Too much data
Notice how drastically the result of sys.getsizeof and the UTF-8 encoded byte size differ for the last event, simply because it contains one non-ASCII character (≥).
Expected Behavior:
The function should allow sending event data that is under the 10KB limit when encoded, without raising an exception.
Actual Behavior:
A ValueError is raised stating "Too much data" even when the actual encoded data size is under 10KB.
Analysis:
Using sys.getsizeof() is not reliable for measuring the size of the data to be sent over the network. It measures the memory footprint of the object in Python, which can include additional overhead and doesn't correspond to the actual size of the encoded data.
Here is some more evidence of how wildly inaccurate sys.getsizeof can be for calculating the byte size of data.
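The following standalone sketch (my own, not from the library; any CPython 3 interpreter will do) compares the three measurements for two strings that differ by a single character:

import sys

ascii_text = "a" * 5000
mixed_text = "a" * 4999 + "≥"   # same character count, one non-ASCII code point

for label, text in (("pure ASCII", ascii_text), ("one non-ASCII char", mixed_text)):
    print(label)
    print("  len(text):                ", len(text))
    print("  sys.getsizeof(text):      ", sys.getsizeof(text))        # in-memory footprint
    print("  len(text.encode('utf-8')):", len(text.encode("utf-8")))  # bytes on the wire

On CPython, a single code point above U+007F switches the whole string to a wider internal representation (PEP 393), so sys.getsizeof roughly doubles, while the UTF-8 encoding grows by only the two extra bytes needed for that one character.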
Proposed Solution:
Replace the size check using sys.getsizeof(event['data']) with len(event['data'].encode('utf-8')) to accurately measure the byte size of the data when encoded.
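A sketch of the amended check (not a submitted patch; it only swaps the measurement and keeps the existing 10240-byte threshold):

# Measure the payload as it will actually travel over HTTP: the length of
# the UTF-8 encoded string, not its in-memory footprint.
if len(event['data'].encode('utf-8')) > 10240:
    raise ValueError("Too much data")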