Compiling and Installing DataX on CentOS

System Environment

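Installing Python

DataX is started through a Python script (bin/datax.py), so a Python interpreter is required. The steps below build Python 2.6.9 from source.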

yum groupinstall -y "development tools"
yum install -y zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel expat-devel
yum install -y wget

wget https://www.python.org/ftp/python/2.6.9/Python-2.6.9.tgz
mkdir -p /opt/python
tar xzvf Python-2.6.9.tgz -C /opt/python/
cd /opt/python/Python-2.6.9
./configure --prefix=/usr/local --enable-shared LDFLAGS="-Wl,-rpath,/usr/local/lib"
make && make altinstall
python2.6 -V # show the Python version (make altinstall installs the binary as python2.6, not python)
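Python's build does not stop when one of the -devel packages above is missing; it skips the matching extension module and only lists it at the end of the make output. A minimal check, assuming the interpreter is named python2.6 as make altinstall does (check_python is a hypothetical helper, not part of Python or DataX):

```shell
# Hypothetical helper: verify that optional extension modules were compiled in.
# Usage: check_python [interpreter]   (defaults to python2.6)
check_python() {
    py="${1:-python2.6}"
    if ! command -v "$py" >/dev/null 2>&1; then
        echo "interpreter not found: $py" >&2
        return 1
    fi
    # Each import fails if the matching -devel package was absent at build time:
    # zlib <- zlib-devel, bz2 <- bzip2-devel, ssl <- openssl-devel,
    # sqlite3 <- sqlite-devel, readline <- readline-devel
    if ! "$py" -c 'import zlib, bz2, ssl, sqlite3, readline'; then
        echo "some optional modules are missing from $py" >&2
        return 1
    fi
    echo "$py: optional modules OK"
}
```

Run check_python right after make altinstall; if it fails, install the missing -devel package and rebuild Python.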

Building DataX

Besides Python, building DataX requires a JDK (1.8 or later) and Apache Maven 3.x.

Download DataX
yum install git -y # install git first if it is not already present
git clone https://github.com/alibaba/DataX
Compile DataX
# package with Maven
$ cd {DataX_source_code_home}
$ mvn -U clean package assembly:assembly -Dmaven.test.skip=true

When packaging succeeds, the log ends like this:

[INFO] BUILD SUCCESS
[INFO] -----------------------------------------------------------------
[INFO] Total time: 08:12 min
[INFO] Finished at: 2015-12-13T16:26:48+08:00
[INFO] Final Memory: 133M/960M
[INFO] -----------------------------------------------------------------
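Because the full build takes several minutes, it can help to keep the output in a file and check it afterwards. A sketch under the assumption that it is run from {DataX_source_code_home}; build_datax and build_succeeded are hypothetical helper names:

```shell
# Hypothetical helper: run the packaging command and keep a copy of the log.
build_datax() {
    mvn -U clean package assembly:assembly -Dmaven.test.skip=true 2>&1 | tee "${1:-build.log}"
}

# Hypothetical helper: succeed only if the log contains Maven's success banner.
build_succeeded() {
    grep -q '^\[INFO\] BUILD SUCCESS' "$1"
}
```

Usage: build_datax build.log; build_succeeded build.log && echo "build OK". The log is grepped instead of relying on the exit status because, in a pipeline, the shell reports the status of tee, not of mvn.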

After a successful build, the DataX package is located in {DataX_source_code_home}/target/datax/datax/ and has the following structure:

$ cd {DataX_source_code_home}
$ ls ./target/datax/datax/
bin conf job lib log log_perf plugin script tmp
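To confirm the package was assembled, the directory listing above can be checked programmatically. A minimal sketch; check_datax_layout is a hypothetical helper and only checks a subset of the directories listed, plus the bin/datax.py launcher:

```shell
# Hypothetical helper: confirm the packaged DataX layout exists.
# Usage: check_datax_layout [dir]   (defaults to ./target/datax/datax)
check_datax_layout() {
    dir="${1:-./target/datax/datax}"
    missing=0
    for sub in bin conf job lib plugin; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing: $dir/$sub" >&2
            missing=1
        fi
    done
    # The launcher script used to run jobs.
    if [ ! -f "$dir/bin/datax.py" ]; then
        echo "missing: $dir/bin/datax.py" >&2
        missing=1
    fi
    return "$missing"
}
```

Every missing entry is reported before the function returns, so one run shows everything that went wrong.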

Configuring Environment Variables

This step is optional; it only makes DataX more convenient to invoke.

vim /etc/profile

# {DATAX_HOME} is the packaged directory, i.e. {DataX_source_code_home}/target/datax/datax
export DATAX_HOME={DATAX_HOME}
export PATH=$PATH:$JAVA_HOME/bin:$DATAX_HOME/bin

source /etc/profile
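After sourcing /etc/profile, a quick check confirms the variables took effect in the current shell. A sketch; check_datax_env is a hypothetical helper:

```shell
# Hypothetical helper: check that DATAX_HOME is set and its bin/ is on PATH.
check_datax_env() {
    if [ -z "${DATAX_HOME:-}" ]; then
        echo "DATAX_HOME is not set" >&2
        return 1
    fi
    # Wrap PATH in colons so the first and last entries also match.
    case ":$PATH:" in
        *":$DATAX_HOME/bin:"*) ;;
        *) echo "$DATAX_HOME/bin is not on PATH" >&2; return 1 ;;
    esac
    if [ ! -f "$DATAX_HOME/bin/datax.py" ]; then
        echo "datax.py not found under $DATAX_HOME/bin" >&2
        return 1
    fi
}
```

If the check passes, the DataX quick-start self-check job can be run, assuming the interpreter built above: check_datax_env && python2.6 "$DATAX_HOME/bin/datax.py" "$DATAX_HOME/job/job.json".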

Common Build Errors

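Configure Maven to use the Aliyun mirror

Add the following entry inside the <mirrors> element of Maven's settings.xml (~/.m2/settings.xml for the current user, or conf/settings.xml in the Maven installation directory):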

<mirror>
<id>nexus-aliyun</id>
<mirrorOf>central</mirrorOf>
<name>Nexus aliyun</name>
<url>https://maven.aliyun.com/repository/central</url>
</mirror>

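It is strongly recommended to configure this; otherwise the build tends to fail with a long series of dependency-resolution errors that are tedious to fix one by one.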
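If there is no settings.xml yet, one containing only this mirror can be written as below. A sketch; write_aliyun_settings is a hypothetical helper, and it deliberately refuses to overwrite an existing file, in which case the <mirror> entry should be merged by hand:

```shell
# Hypothetical helper: write a minimal settings.xml with the Aliyun mirror.
# Usage: write_aliyun_settings [path]   (defaults to ~/.m2/settings.xml)
write_aliyun_settings() {
    target="${1:-$HOME/.m2/settings.xml}"
    if [ -e "$target" ]; then
        echo "$target already exists; add the <mirror> entry manually" >&2
        return 1
    fi
    mkdir -p "$(dirname "$target")"
    cat > "$target" <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>nexus-aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>Nexus aliyun</name>
      <url>https://maven.aliyun.com/repository/central</url>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

Running mvn help:effective-settings afterwards shows whether Maven picked up the mirror.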

Local build fails because the artifact com.aliyun.openservices:tablestore-streamclient:jar:1.0.0-SNAPSHOT cannot be found

Solution:

vim {DataX_source_code_home}/otsstreamreader/pom.xml

In the following dependency, change <version>1.0.0-SNAPSHOT</version> to <version>1.0.0</version>:

<dependency>
<groupId>com.aliyun.openservices</groupId>
<artifactId>tablestore-streamclient</artifactId>
<version>1.0.0</version> <!-- was 1.0.0-SNAPSHOT -->
</dependency>
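The same version change can be applied non-interactively with GNU sed. A sketch assuming the <version> element sits on the line directly after the tablestore-streamclient <artifactId>, as in the pom snippet; fix_tablestore_version is a hypothetical helper, and a .bak copy of the pom is kept:

```shell
# Hypothetical helper: change tablestore-streamclient 1.0.0-SNAPSHOT to 1.0.0.
# Only the <version> line right after that <artifactId> is touched, so other
# SNAPSHOT dependencies in the same pom are left alone. Requires GNU sed.
fix_tablestore_version() {
    pom="$1"
    sed -i.bak '/<artifactId>tablestore-streamclient<\/artifactId>/{n;s#<version>1\.0\.0-SNAPSHOT</version>#<version>1.0.0</version>#;}' "$pom"
}
```

Usage: fix_tablestore_version {DataX_source_code_home}/otsstreamreader/pom.xml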

Build error in the DataX ODPS plugin modules

[ERROR] Failed to execute goal on project odpsreader: Could not resolve dependencies for project com.alibaba.datax:odpsreader:jar:0.0.1-SNAPSHOT: The following artifacts could not be resolved: com.alibaba.datax:datax-common:jar:0.0.1-SNAPSHOT, com.alibaba.external:bouncycastle.provider:jar:1.38-jdk15: Could not find artifact com.alibaba.datax:datax-common:jar:0.0.1-SNAPSHOT in dtwave (http://repo2.dtwave-inc.com/repository/public/) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

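Comparing the POMs: odps-sdk-core-0.19.3-public.pom used to depend on

org.bouncycastle:bcprov-jdk15on:1.52

but now depends on

com.alibaba.external:bouncycastle.provider:1.38-jdk15

Cause: the newer dependency appears to be an Alibaba-internal jar, so it cannot be downloaded from public repositories. It is one of the artifacts the error above fails to resolve.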
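To confirm which cached POM pulls in the unresolvable artifact, the local Maven repository can be searched directly; mvn dependency:tree is not much help here because it needs the same unresolvable dependency. A sketch assuming the default local repository ~/.m2/repository; find_poms_mentioning is a hypothetical helper:

```shell
# Hypothetical helper: list POM files in a local Maven repository that mention
# a given artifact (fixed-string match).
# Usage: find_poms_mentioning <artifact> [repo_dir]
find_poms_mentioning() {
    artifact="$1"
    repo="${2:-$HOME/.m2/repository}"
    find "$repo" -name '*.pom' -exec grep -lF -- "$artifact" {} +
}
```

Usage: find_poms_mentioning bouncycastle.provider ~/.m2/repository/com/aliyun/odps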

Solution:

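Edit the pom.xml of the failing module (odpsreader in the log above) and change the version of com.aliyun.odps:odps-sdk-core to 0.20.7-public:

<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>odps-sdk-core</artifactId>
<version>0.20.7-public</version>
</dependency>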
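The odps-sdk-core version change can also be scripted with GNU sed. A sketch assuming the <version> element is on the line directly after the odps-sdk-core <artifactId>; whatever version is there (for example 0.19.3-public) is replaced. set_odps_sdk_version is a hypothetical helper, and a .bak copy of the pom is kept:

```shell
# Hypothetical helper: set the odps-sdk-core version in one module's pom.xml.
# Usage: set_odps_sdk_version <pom.xml> [version]   (defaults to 0.20.7-public)
# Requires GNU sed.
set_odps_sdk_version() {
    pom="$1"
    version="${2:-0.20.7-public}"
    sed -i.bak "/<artifactId>odps-sdk-core<\/artifactId>/{n;s#<version>[^<]*</version>#<version>${version}</version>#;}" "$pom"
}
```

Usage: set_odps_sdk_version {DataX_source_code_home}/odpsreader/pom.xml, then repeat for any other ODPS plugin module that fails the same way.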
