There is a thread in the database forum that arguably belongs here instead: http://forums.teradata.com/forum/database/teradataimporttool-charset-problem
---
Hi.
I am importing data from Teradata to Hadoop with the "Teradata Connector for Hadoop (Command Line Edition): Cloudera" v1.2:
http://downloads.teradata.com/download/connectivity/teradata-connector-for-hadoop-command-line-edition
I have a table like this:
create table testtable (
id int not null,
value varchar(50),
text varchar(200),
PRIMARY KEY (id)
);
And I have inserted this data:
insert into testtable values (1, '#1€', 'aá');
insert into testtable values (2, '#2€', 'eé');
The import job works normally:
export USERLIBTDCH=/usr/lib/tdch/teradata-connector-1.2.jar
hadoop jar $USERLIBTDCH com.teradata.hadoop.tool.TeradataImportTool -classname com.teradata.jdbc.TeraDriver -url jdbc:teradata://teradataServer/ DATABASE=test,CHARSET=UTF8 -username dbc -password dbc -jobtype hdfs -fileformat textfile -targetpaths /temp/hdfstable -sourcetable testtable -splitbycolumn id
But the resulting file in HDFS contains:
1 #1? a?
2 #2? e?
How can I import "special" characters from Teradata to Hadoop (UTF-8)? If I use the JDBC driver directly (e.g. from a Java program), it works fine. The problem seems to be in the connector...
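For what it's worth, the `?` characters in the HDFS file look like classic charset-replacement artifacts: when a Java string is encoded with a charset that cannot represent `€`, `á`, or `é` (e.g. US-ASCII or the platform default on some systems), each unmappable character is silently replaced with `?`. A minimal, self-contained Java sketch reproducing the same substitution (this only illustrates the symptom, it does not use the connector itself):

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        String value = "#1€";

        // Encoding with US-ASCII cannot represent '€'; String.getBytes
        // replaces each unmappable character with '?' (0x3F).
        byte[] ascii = value.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // #1?

        // UTF-8 round-trips the euro sign correctly.
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8)); // #1€
    }
}
```

If the connector (or the Hadoop task JVM) writes its output using the platform default encoding rather than UTF-8, this is exactly the corruption you would see, even though the JDBC `CHARSET=UTF8` parameter delivers the data to the JVM intact.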