System environment
- CentOS
- Python 2.6.9
- https://github.com/alibaba/DataX
Installing Python
yum groupinstall -y "development tools"
Building DataX
Downloading DataX
yum install git -y  # install git if it is not already installed
Compiling DataX
# Package with Maven
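The download and packaging steps above can be sketched as two small shell functions. This is a minimal sketch, not the post's exact commands: `DATAX_SRC` is a hypothetical checkout directory, the clone URL is the repository listed at the top of this post, and the Maven invocation is the one given in the DataX README.

```shell
# Assumption: DATAX_SRC is a hypothetical checkout directory, not a path from this post.
DATAX_SRC="${DATAX_SRC:-$HOME/DataX}"

# Clone the DataX sources from the GitHub repository.
clone_datax() {
    git clone https://github.com/alibaba/DataX.git "$DATAX_SRC"
}

# Package with Maven (command taken from the DataX README).
# -U makes Maven re-check remote repositories for updated snapshots;
# -Dmaven.test.skip=true skips compiling and running tests.
build_datax() {
    (cd "$DATAX_SRC" && mvn -U clean package assembly:assembly -Dmaven.test.skip=true)
}
```

Running `clone_datax && build_datax` performs both steps; neither function runs until it is called.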
When packaging succeeds, the log shows the following:
[INFO] BUILD SUCCESS
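If the build output is saved to a file, the success marker can be checked from a script. The helper below is an illustrative sketch; the function name `build_succeeded` and the idea of logging to a file are assumptions, not part of the original steps.

```shell
# Return success (exit status 0) if the given Maven log contains the BUILD SUCCESS marker.
# Assumption: the build output was saved to a file, e.g. with `mvn ... 2>&1 | tee build.log`.
build_succeeded() {
    grep -q 'BUILD SUCCESS' "$1"
}
```

For example, `build_succeeded build.log && echo "package is ready"` prints the message only when the marker is present in `build.log`.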
After a successful build, the DataX package is located at {DataX_source_code_home}/target/datax/datax/, with the following structure:
cd {DataX_source_code_home}
Configuring environment variables
This step is optional, but it makes DataX more convenient to use.
vim /etc/profile
DATAX_HOME={DATAX_HOME}
source /etc/profile
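The lines added to /etc/profile might look like the sketch below. The original snippet is cut off after its first line, so the `PATH` entry and the `export` are assumptions, and `/opt/datax` is a hypothetical install location to be replaced with the real package directory.

```shell
# Assumption: /opt/datax is a hypothetical location; substitute the real directory,
# for example {DataX_source_code_home}/target/datax/datax.
DATAX_HOME="${DATAX_HOME:-/opt/datax}"

# Put the DataX launcher scripts on the PATH (an assumed convenience step).
PATH="$PATH:$DATAX_HOME/bin"
export DATAX_HOME PATH
```

With the variable set, the sample job shipped with DataX can be run as `python $DATAX_HOME/bin/datax.py $DATAX_HOME/job/job.json`, which is the self-check described in the DataX README.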
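Common build errors
Configuring Maven to use the Alibaba Cloud Maven repository
<mirror>
It is best to configure this mirror; otherwise the build produces a long series of errors that are tedious to resolve.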
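Since the mirror configuration above is cut off, here is one possible way to set it up without touching an existing ~/.m2/settings.xml. This is a sketch under assumptions: the function name is hypothetical, the mirror id and URL are those of Alibaba Cloud's public Maven mirror, and `mirrorOf` is limited to `central`, which may differ from what the original configuration used.

```shell
# Write a minimal Maven settings file that routes requests for Maven Central
# through Alibaba Cloud's public mirror.
# Assumption: the mirror id, mirrorOf value, and URL are illustrative choices,
# not recovered from the truncated original.
write_aliyun_settings() {
    cat > "$1" <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>aliyunmaven</id>
      <mirrorOf>central</mirrorOf>
      <name>Alibaba Cloud public repository</name>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

After `write_aliyun_settings ./settings-aliyun.xml`, passing `-s ./settings-aliyun.xml` to `mvn` makes that single build use the generated file instead of the default user settings.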
Local build fails because the artifact com.aliyun.openservices:tablestore-streamclient:jar:1.0.0-SNAPSHOT cannot be found
Solution:
vim ${DataX_source_code_home}/otsstreamreader/pom.xml
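The actual edit made in pom.xml is not preserved above. A commonly reported workaround is to replace the SNAPSHOT version of tablestore-streamclient with the release version 1.0.0; the sketch below automates that edit. Both the target version and the helper name `pin_streamclient_version` are assumptions, not confirmed by this post.

```shell
# Replace the 1.0.0-SNAPSHOT version of tablestore-streamclient with 1.0.0 in a pom.xml.
# Assumption: 1.0.0 is the commonly reported replacement; the original edit is truncated.
# The substitution only applies from the tablestore-streamclient <artifactId> line to the
# next </dependency>, so it expects <version> to come after <artifactId>.
# A backup of the original file is kept as <file>.bak.
pin_streamclient_version() {
    sed -i.bak '/<artifactId>tablestore-streamclient<\/artifactId>/,/<\/dependency>/ s/1\.0\.0-SNAPSHOT/1.0.0/' "$1"
}
```

It would be called as `pin_streamclient_version "${DataX_source_code_home}/otsstreamreader/pom.xml"`, after which the build can be retried.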
Error when building the DataX odps plugin modules
[ERROR] Failed to execute goal on project odpsreader: Could not resolve dependencies for project com.alibaba.datax:odpsreader:jar:0.0.1-SNAPSHOT: The following artifacts could not be resolved: com.alibaba.datax:datax-common:jar:0.0.1-SNAPSHOT, com.alibaba.external:bouncycastle.provider:jar:1.38-jdk15: Could not find artifact com.alibaba.datax:datax-common:jar:0.0.1-SNAPSHOT in dtwave (http://repo2.dtwave-inc.com/repository/public/) -> [Help 1]
Solution:
Edit pom.xml
com.aliyun.odps