Error with Jython hmac package

asked 2018-05-03 09:51:46 -0500

Invigor

updated 2018-05-03 16:01:09 -0500

Hi all,

I'm trying to use StreamSets to connect to Azure Cosmos DB, which requires an authorization token in the header of its REST API calls.

https://docs.microsoft.com/en-us/rest...

I've adapted the Python library to work in a Jython Executor but I'm getting an error when I try to create a new HMAC object.

As per the instructions, the key for the HMAC function is the base64-decoded version of the master key. This is a binary object, but when I pass it into the HMAC function I get the following error:

com.streamsets.pipeline.api.base.OnRecordErrorException: SCRIPTING_04 - Script sent record to error: 'ascii' codec can't encode characters in position 1-3: ordinal not in range(128)
    at com.streamsets.pipeline.stage.processor.scripting.AbstractScriptingProcessor$Err.write(AbstractScriptingProcessor.java:71)

The error is in digest = hmac.new(key, body, sha256).digest().

Specifically, the key is causing the problem as it is not a normal ASCII string.
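The byte positions in the error message can be checked against the decoded key. The sketch below is plain Python (it should also run under Jython 2.7) and rests on one assumption that the post does not confirm: that somewhere in the scripting environment the key ends up as a unicode string and is then implicitly encoded to ASCII. The `latin-1` decode is only a stand-in for that conversion.

```python
import base64

master_key = 'MM6C8BFQqxiDD28ZkoElDKPu2krIty328wwF0xrxRS3wNO0sKxGQFY7DuhHwKR8r0MBmn9zHt4K1CCeXiMk0Gg=='

# Decode the master key into raw bytes, as the Cosmos DB instructions require.
key = base64.b64decode(master_key)
key_bytes = bytearray(key)

# Indexes of every byte that falls outside the 7-bit ASCII range.
high_positions = [i for i, b in enumerate(key_bytes) if b > 127]
print(len(key_bytes), high_positions[:3])

# Assumption: the key is handled as unicode text at some point.
# latin-1 maps each byte to the code point of the same value.
as_text = bytes(key_bytes).decode('latin-1')

error = None
try:
    as_text.encode('ascii')  # the implicit conversion that would fail
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
```

With this key the decoded value is 64 bytes long, and bytes 1 to 3 are the first ones above 127, which lines up with "position 1-3" in the reported error.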

If I use ordinary ASCII text for the key, it works, although the resulting authentication token is incorrect. It seems to be refusing to accept the key as a binary object, but if I run the same code in an online Jython compiler (2.7.1), it works correctly, as in the example below.

from hashlib import sha256
import hmac
# import six
import base64

master_key = 'MM6C8BFQqxiDD28ZkoElDKPu2krIty328wwF0xrxRS3wNO0sKxGQFY7DuhHwKR8r0MBmn9zHt4K1CCeXiMk0Gg=='
verb = 'POST'
resource_type = 'docs'
database = 'visitor'
collection = 'manly'
x_ms_date = 'Thu, 3 May 2018 12:09:07 GMT'

resource_id_or_fullname = 'dbs/' + database + '/colls/' + collection
key = base64.b64decode(master_key)

# Skipping lower casing of resource_id_or_fullname since it may now contain "ID" of the resource as part of the fullname
text = '{verb}\n{resource_type}\n{resource_id_or_fullname}\n{x_ms_date}\n\n'.format(
    verb=(verb.lower() or ''),
    resource_type=(resource_type.lower() or ''),
    resource_id_or_fullname=(resource_id_or_fullname or ''),
    x_ms_date=x_ms_date.lower())

# python 2 support  
body = text.decode('utf-8')
digest = hmac.new(key, body, sha256).digest()
signature = digest.encode('base64')

master_token = 'master'
token_version = '2017-02-22'
authorization = 'type={token_type}&ver={ver}&sig={sig}'.format(token_type=master_token,
                                                ver=token_version,
                                                sig=signature[:-1])
print('authorization=',authorization)
print('key=',key)

I'm wondering whether there is an issue with the StreamSets Jython package (which is 2.7.0) and whether it is possible to upgrade it.

Thanks,

Michael


Comments

Do you know which line of the above code the error is thrown from?

metadaddy (2018-05-03 15:00:43 -0500)

Hi @metadaddy, The error is in digest = hmac.new(key, body, sha256).digest(). Specifically, the key is causing the problem as it is not a normal ASCII string. Thanks, Michael

Invigor (2018-05-03 16:00:42 -0500)

Looks like you might need to base64 encode the key then. If this works, let me know and I'll write it up as the official answer.

metadaddy (2018-05-04 13:30:31 -0500)

According to the Microsoft documentation, the master_key is already base64 encoded, so I need to decode it before passing it to the HMAC function [key = base64.b64decode(master_key)]. Decoding the key results in non-ASCII characters, and the HMAC function throws the error because it isn't able to handle them.

Invigor (2018-05-04 19:50:52 -0500)

1 Answer


answered 2018-05-07 15:56:08 -0500

metadaddy

updated 2018-05-07 15:57:20 -0500

The core problem here seems to be that hmac wants a string as a key, but we want to pass it binary data. I'm not sure why it works in the Jython REPL but not in Data Collector - I spent an hour going round the houses on this until I realized we're in Jython, so we can use the Java crypto classes.
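For comparison, the string-versus-binary distinction can be made explicit with the standard `hmac` module by passing both the key and the message as byte strings. This is a minimal sketch for plain Python, not a confirmed fix for Data Collector; the helper name `sign` is hypothetical, and the string to sign is simply the lowercased payload built from the example values in the question.

```python
import base64
import hashlib
import hmac


def sign(master_key_b64, text):
    """Return the base64-encoded HMAC-SHA256 of text, keyed with the decoded master key."""
    key = base64.b64decode(master_key_b64)  # raw key bytes
    message = text.encode('utf-8')          # message as bytes, not unicode text
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode('ascii')


master_key = 'MM6C8BFQqxiDD28ZkoElDKPu2krIty328wwF0xrxRS3wNO0sKxGQFY7DuhHwKR8r0MBmn9zHt4K1CCeXiMk0Gg=='
text = 'post\ndocs\ndbs/visitor/colls/manly\nthu, 3 may 2018 12:09:07 gmt\n\n'

signature = sign(master_key, text)
print(signature)
```

A SHA-256 digest is 32 bytes, so `base64.b64encode` returns a 44-character value ending in a single `=`, with no trailing newline to slice off.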

Here is your example using javax.crypto.spec.SecretKeySpec and javax.crypto.Mac - it works just fine in both the REPL and Data Collector:

from javax.crypto.spec import SecretKeySpec
from javax.crypto import Mac
import base64

master_key = 'MM6C8BFQqxiDD28ZkoElDKPu2krIty328wwF0xrxRS3wNO0sKxGQFY7DuhHwKR8r0MBmn9zHt4K1CCeXiMk0Gg=='
verb = 'POST'
resource_type = 'docs'
database = 'visitor'
collection = 'manly'
x_ms_date = 'Thu, 3 May 2018 12:09:07 GMT'

resource_id_or_fullname = 'dbs/' + database + '/colls/' + collection
key = base64.b64decode(master_key)

# Skipping lower casing of resource_id_or_fullname since it may now contain "ID" of the resource as part of the fullname
text = '{verb}\n{resource_type}\n{resource_id_or_fullname}\n{x_ms_date}\n\n'.format(
    verb=(verb.lower() or ''),
    resource_type=(resource_type.lower() or ''),
    resource_id_or_fullname=(resource_id_or_fullname or ''),
    x_ms_date=x_ms_date.lower())

# python 2 support  
body = text.decode('utf-8')

mac = Mac.getInstance('HmacSHA256')
mac.init(SecretKeySpec(key, 'HmacSHA256'))
digest = mac.doFinal(body)

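# base64.b64encode adds no trailing newline (unlike str.encode('base64')),
# so the whole string is the signature; slicing off the last character
# would drop the '=' padding.
signature = base64.b64encode(digest)

master_token = 'master'
token_version = '2017-02-22'
authorization = 'type={token_type}&ver={ver}&sig={sig}'.format(token_type=master_token,
                                                ver=token_version,
                                                sig=signature)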
print('authorization=',authorization)
print('key=',key)

In Data Collector:

[Screenshot of the script running in Data Collector]
