
Special characters (accents, apostrophes, trema) work in custom origin tests, but not when deployed in a dockerized StreamSets Data Collector

asked 2019-01-22 08:03:47 -0600

nnuytten

I've written a custom StreamSets origin. Some of the records contain characters like é or ë. When I run my automated tests, I can verify that the data is emitted as a list of SDC Records, as intended.

However, when I use my custom origin in a pipeline on a dockerized StreamSets Data Collector, all of those special characters are displayed as '?' in the UI (preview) and are pushed to my target as '?'.

Is StreamSets interpreting the output of my origin and applying some character encoding?



It's most likely some difference between the two environments. Have you tried Data Collector on the same machine as your automated tests?

metadaddy (2019-01-22 11:39:00 -0600)
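One way to compare the two environments is to print the encoding settings the JVM actually picked up. The sketch below is a hypothetical helper (the class name `EncodingReport` is illustrative, not part of StreamSets); it assumes you can run a plain Java class both on the test machine and inside the Data Collector container, then compare the outputs. On JVMs older than JDK 18, `Charset.defaultCharset()` is derived from the locale environment (`LANG`, `LC_ALL`, `LC_CTYPE`), so a difference there is a strong hint. `sun.jnu.encoding` is an OpenJDK-specific property and may print `null` on other JVMs.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical diagnostic: reports the JVM's default charset and the
 * locale-related inputs it is typically derived from.
 */
public class EncodingReport {

    /** Collects charset properties and locale environment variables. */
    static Map<String, String> report() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("defaultCharset", Charset.defaultCharset().name());
        // String.valueOf turns a missing (null) property into the text "null".
        info.put("file.encoding", String.valueOf(System.getProperty("file.encoding")));
        info.put("sun.jnu.encoding", String.valueOf(System.getProperty("sun.jnu.encoding")));
        // Locale environment variables; "null" means the variable is unset.
        for (String var : new String[] {"LANG", "LC_ALL", "LC_CTYPE"}) {
            info.put(var, String.valueOf(System.getenv(var)));
        }
        return info;
    }

    public static void main(String[] args) {
        report().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

If the test machine reports `UTF-8` while the container reports something like `ANSI_X3.4-1968` (the JVM's name for US-ASCII), the container's locale is the likely culprit.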

Are you able to share your code (or at least some code that reproduces the problem) via GitHub or similar? I might be able to take a look in the debugger and see what's happening.

metadaddy (2019-01-23 15:42:32 -0600)

Hey Pat, I was able to fix it today. As you suggested, the problem was in the environment. Alpine and locale settings are terrible. I'll document it in the answer. Thanks for the pointer!

nnuytten (2019-01-23 16:10:04 -0600)

1 Answer


answered 2019-01-23 16:12:48 -0600

nnuytten

The problem was not in the custom origin or in StreamSets at all; it was an issue with the Docker container itself. The official StreamSets image that I inherit from is based on Alpine Linux, which does not install locale support by default, so you have to add it yourself.

This post helped me install locale support in my container and configure the container to use it. Afterwards, everything worked as expected.
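Why a missing locale shows up as '?' rather than as garbage: a plausible mechanism (assumed here, not confirmed in the thread) is that an older JVM with no usable locale falls back to an ASCII default charset, and Java replaces characters that a charset cannot represent with that charset's replacement byte, which for US-ASCII is '?'. The hypothetical `QuestionMarkDemo` class below reproduces that substitution explicitly with `String.getBytes(Charset)`. The sample strings use Unicode escapes so the demo compiles correctly even under an ASCII default encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical demo: accented characters degrade to '?' when text is
 * encoded with a charset that cannot represent them.
 */
public class QuestionMarkDemo {

    /** Encodes text to bytes with the given charset, then decodes it back. */
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) replaces unmappable characters with the charset's
        // replacement bytes; for US-ASCII that replacement is '?'.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "café, naïve, Noël" written with escapes: é = \u00e9, ï = \u00ef, ë = \u00eb
        String sample = "caf\u00e9, na\u00efve, No\u00ebl";
        System.out.println("UTF-8:    " + roundTrip(sample, StandardCharsets.UTF_8));
        System.out.println("US-ASCII: " + roundTrip(sample, StandardCharsets.US_ASCII));
    }
}
```

The US-ASCII round trip always yields `caf?, na?ve, No?l`. Note that `System.out` itself also encodes with the platform charset on older JVMs, so in a container without a locale even the UTF-8 line may print with '?' in the console, which is the same symptom seen in the thread.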


Seen: 118 times

Last updated: Jan 23 '19