Trouble transforming US Census API data into Avro

Hello,

I have a pipeline that ingests JSON-formatted data from the US Census API. Here is the API call I am making, for reference: https://api.census.gov/data/2018/pep/population?get=GEONAME,POP,DENSITY&for=place:*&in=state:39&in=county:*&key=4dc4a831c4ab825b649451a82890fc68bc7fe976

I am using an HTTP Client origin configured to ingest a JSON array of objects: [screenshot of the origin configuration]

My goal is to use the Schema Generator processor to produce an Avro-formatted dataset that is then sent to a Kafka Producer, as seen below:

[screenshot of the pipeline]

I have ingested JSON data before, but those results came back in LIST-MAP format, so there was no issue with my pipeline configuration. This data is coming back in LIST format, and I am unsure what steps are needed to proceed.

[screenshot of the previewed data]

I am not sure if it is helpful, but some language on the Census website states that their JSON format is non-traditional: The Census uses a nonstandard version of JSON that is streamlined:

  • Data are represented in a two-dimensional array
  • Square brackets [ ] hold arrays
  • Values are separated by a , (comma).
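
For reference, the response body comes back shaped roughly like this (a shortened sketch with made-up values; the exact columns depend on the query, but the first inner array holds the column names and each following array is one row of data):

[["GEONAME", "POP", "DENSITY", "state", "county", "place"],
 ["Example city, Ohio", "12345", "678.9", "39", "001", "00123"],
 ["Another village, Ohio", "2345", "123.4", "39", "003", "00456"]]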

Any help is very much appreciated. Thank you.

UPDATE:

Work-in-progress script:

for record in records:
  try:
    keys = record.value[0]
    index = 1
    for values in record.value:
      mapped = dict(zip(keys, record.value[index]))
      index = index + 1
      output.write(mapped)
  except Exception as e:
    error.write(record, str(e))

UPDATE:

Update from my end: so far I have an almost-working Jython script for this issue. It works fine when I explicitly specify which index I am mapping the field names to. However, when I try to drive the process through a nested for loop that goes through every index, I keep getting this error:

SCRIPTING_04 - Script sent record to error: write(): 1st arg can't be coerced to com.streamsets.pipeline.stage.processor.scripting.ScriptRecord

Work-in-progress script:

[screenshot of the work-in-progress script]
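
Reading the error text again, it looks like output.write() expects the ScriptRecord itself rather than a plain Python dict, so I suspect the fix is to put the mapped rows back onto record.value and write the record. A rough, untested sketch of what I mean, assuming (as in the script above) that each record's value is the whole two-dimensional array:

for record in records:
  try:
    keys = record.value[0]                     # first row holds the field names
    rows = []
    for index in range(1, len(record.value)):  # remaining rows hold the data
      rows.append(dict(zip(keys, record.value[index])))
    record.value = rows                        # put the mapped rows back on the record
    output.write(record)                       # write the ScriptRecord, not a plain dict
  except Exception as e:
    error.write(record, str(e))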

09/10/19 UPDATE:

I have finally managed to map the field values as field names. Here is the code I have so far:

for record in records:
  outlist = {}
  i=0
  for index in record.value:
    index = i
    keys = record.value[index]
    values = record.value[index]
    outrecord = dict({keys:values})
    outlist.update(outrecord)
    i = i + 1
  record.value = outlist
  output.write(record)

I am now working on applying the field names from Record 1 to the remaining records. I cannot find any documentation on referencing a specific record number out of an array of objects. Any guidance is much appreciated. Thank you.
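
For anyone picking this up: one idea I am experimenting with is to hold on to the header row (Record 1) in a script-level variable and zip it with every later row, assuming each record's value is a single LIST row and the header record arrives in the same batch as (and before) the data records. A rough, untested sketch:

header = None                                # field names taken from Record 1
for record in records:
  try:
    row = record.value                       # each record's value is one LIST row
    if header is None:
      header = row                           # first record seen: remember the field names
      continue                               # and do not emit the header row itself
    record.value = dict(zip(header, row))    # LIST row -> MAP keyed by the field names
    output.write(record)
  except Exception as e:
    error.write(record, str(e))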