Configuring Flume to load data into Kafka and ElasticSearch

Environment: CentOS 6.3, Kafka 0.8.1, Flume 1.6, elasticsearch-1.4.4

The configuration file is as follows:

[adadmin@s9 apache-flume-1.6.0-bin]$ vi conf/flume.conf

#define source, sink, channel
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /home/adadmin/.bash_history

# Describe the sink
# for testing only
#a1.sinks.k1.type = logger

#load to Kafka
#a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
#a1.sinks.k1.batchSize = 5
#a1.sinks.k1.brokerList = xxx.xxx.xxx.xxx:9092,xxx.xxx.xxx.xxx:9092,xxx.xxx.xxx.xxx:9092
#a1.sinks.k1.topic = flume_topic1

#load to ElasticSearch
a1.sinks.k1.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
a1.sinks.k1.hostNames = xxx.xxx.xxx.xxx:9300
a1.sinks.k1.clusterName = elasticsearch
a1.sinks.k1.batchSize = 100
a1.sinks.k1.indexName = logstash
a1.sinks.k1.ttl = 5
a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
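Before starting the agent, it can help to check that the bindings and sizes in a config like this are consistent. The following is a minimal Python sketch, not part of Flume: parse_properties and check_agent are hypothetical helpers, and the parser assumes plain key = value lines with # comments. It flags a source or sink that points at an undeclared channel, a sink batchSize larger than its channel's transactionCapacity (a sink takes a whole batch in one channel transaction), and a transactionCapacity larger than capacity. The fallback of 100 mirrors the memory channel's default for both settings.

```python
# Hypothetical helpers: sanity-check a Flume agent config before starting it.
# Assumes simple "key = value" lines and "#" comment lines (no continuations).

def parse_properties(text):
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent="a1"):
    problems = []
    channels = props.get(f"{agent}.channels", "").split()

    # Every source must feed only declared channels.
    for src in props.get(f"{agent}.sources", "").split():
        for ch in props.get(f"{agent}.sources.{src}.channels", "").split():
            if ch not in channels:
                problems.append(f"source {src} uses undeclared channel {ch}")

    # Every sink needs a declared channel, and its batch must fit in one
    # channel transaction. A missing batchSize is treated as 0 (not checked).
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch not in channels:
            problems.append(f"sink {sink} is not bound to a declared channel")
            continue
        batch = int(props.get(f"{agent}.sinks.{sink}.batchSize", "0"))
        tx = int(props.get(f"{agent}.channels.{ch}.transactionCapacity", "100"))
        if batch > tx:
            problems.append(f"sink {sink} batchSize {batch} > transactionCapacity {tx}")

    # A transaction can never be larger than the channel itself.
    for ch in channels:
        cap = int(props.get(f"{agent}.channels.{ch}.capacity", "100"))
        tx = int(props.get(f"{agent}.channels.{ch}.transactionCapacity", "100"))
        if tx > cap:
            problems.append(f"channel {ch} transactionCapacity {tx} > capacity {cap}")
    return problems
```

Run against the file above (for example check_agent(parse_properties(open("conf/flume.conf").read()))), it should return an empty list, since the ElasticSearch sink's batchSize of 100 equals the channel's transactionCapacity of 100.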

Start the Flume agent:

[adadmin@s9 apache-flume-1.6.0-bin]$ bin/flume-ng agent -c /home/adadmin/apache-flume-1.6.0-bin/conf -f /home/adadmin/apache-flume-1.6.0-bin/conf/flume.conf -n a1 -Dflume.root.logger=INFO,console

(Note: to load into ElasticSearch, first copy the jars from ElasticSearch's lib directory into a Flume plugin library directory, as follows:

[adadmin@s9 apache-flume-1.6.0-bin]$ mkdir -p plugins.d/elasticsearch/libext
[adadmin@s9 apache-flume-1.6.0-bin]$ cp /home/adadmin/elasticsearch-1.4.4/lib/*.jar plugins.d/elasticsearch/libext

)

Year-over-year and period-over-period comparisons in Pentaho BI Server

Environment: Pentaho 5.3, PostgreSQL 9.3

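I recently checked whether Pentaho Report Designer or CDE offers anything like year-over-year or period-over-period comparisons, and unfortunately found nothing, so the remaining option is to solve the problem in the database.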

Suppose there are two tables, test and dim_date:

test (date_id;volume):

20140818;4
20150817;40
20150818;10
20150819;55
20160817;30
20160818;50

dim_date:

20150104;"2015年01月04日";"2015年";"第01月";"2015-01-04";"第1周"
20150103;"2015年01月03日";"2015年";"第01月";"2015-01-03";"第1周"
20150102;"2015年01月02日";"2015年";"第01月";"2015-01-02";"第1周"

Using the OLAP window function lag, the query is:

select * from (
  -- rows are ordered by date: lag 1 = the day before, lag 2 = the same date one year earlier
  select date_id, volume, lag(volume, 1) over (order by date_fmt) pre_volume, lag(volume, 2) over (order by date_fmt) pre_365_volume from (
    select b.date_id, b.date_fmt, a.volume from test a, dim_date b
    where a.date_id = b.date_id
      and b.date_fmt in (to_date('2016-08-18', 'yyyy-mm-dd') - interval '1 day', '2016-08-18', to_date('2016-08-18', 'yyyy-mm-dd') - interval '1 year')
  ) m
) n where n.date_id = 20160818

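This returns, for the query date, the volume of the previous day and of the same date one year earlier, which is what period-over-period and year-over-year comparisons need.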
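The same comparison can be cross-checked outside the database. Below is a small Python sketch over the sample test rows; compare and same_day_last_year are hypothetical helpers, not a Pentaho or PostgreSQL API. Unlike lag, which counts rows, it looks up the previous day and the same date last year by date, so a missing day yields None instead of silently shifting to another row. For 29 February it falls back to 28 February, which matches PostgreSQL's interval '1 year' arithmetic.

```python
from datetime import date, timedelta


def to_id(d):
    # date -> integer key in the same yyyymmdd form as test.date_id
    return int(d.strftime("%Y%m%d"))


def same_day_last_year(d):
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # 29 Feb -> 28 Feb, like PostgreSQL interval arithmetic
        return d.replace(year=d.year - 1, day=28)


def compare(volumes, d):
    """volumes: {date_id: volume}. Returns the raw values and growth rates."""
    cur = volumes[to_id(d)]
    pre_volume = volumes.get(to_id(d - timedelta(days=1)))
    pre_365_volume = volumes.get(to_id(same_day_last_year(d)))

    def growth(old):
        # None when the comparison day is missing or zero
        return None if not old else (cur - old) / old

    return {
        "volume": cur,
        "pre_volume": pre_volume,
        "pre_365_volume": pre_365_volume,
        "day_over_day": growth(pre_volume),
        "year_over_year": growth(pre_365_volume),
    }


# Sample rows from the test table above.
volumes = {20140818: 4, 20150817: 40, 20150818: 10,
           20150819: 55, 20160817: 30, 20160818: 50}
```

With the sample rows, compare(volumes, date(2016, 8, 18)) gives a previous-day volume of 30 and a last-year volume of 10, i.e. about +66.7% day over day and +400% year over year.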

Installing the ZooKeeper monitoring tool TaoKeeper

(The installation files are available at http://jm-blog.aliapp.com/?p=1450)

Installation steps:

0. Download Apache Tomcat: wget http://archive.apache.org/dist/tomcat/tomcat-6/v6.0.43/bin/apache-tomcat-6.0.43.zip, unzip apache-tomcat-6.0.43.zip, cd apache-tomcat-6.0.43, chmod +x bin/*.sh

1. Download taokeeper.sql and use it to initialize the database (MySQL).

2. Download taokeeper-monitor.war and unpack it under Tomcat's webapps directory, so that the final layout is: %TOMCAT_HOME%/webapps/taokeeper-monitor.war

3. Download taokeeper-monitor-config.properties and save it to a directory of your choice.

Its contents are as follows:

systemInfo.envName=Zookeeper_Monitor
#DBCP
dbcp.driverClassName=com.mysql.jdbc.Driver
dbcp.dbJDBCUrl=jdbc:mysql://xxx.xxx.xxx.xxx:3306/taokeeper
dbcp.characterEncoding=GBK
dbcp.username=xx
dbcp.password=xxxxxx
dbcp.maxActive=30
dbcp.maxIdle=10
dbcp.maxWait=10000
#SystemConstant
SystemConstent.dataStoreBasePath=/home/adadmin/taokeeper-monitor/datastore/
#SSH account of zk server
SystemConstant.userNameOfSSH=
SystemConstant.passwordOfSSH=

4. Edit %TOMCAT_HOME%/bin/catalina.sh and add: JAVA_OPTS=-DconfigFilePath="/home/adadmin/taokeeper-monitor/taokeeper-monitor-config.properties"

5. Start Tomcat: bin/startup.sh
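Because the monitor reads taokeeper-monitor-config.properties at startup, a quick check for blank settings can save a restart. This is a minimal Python sketch under stated assumptions: the REQUIRED key list is hypothetical, drawn from the sample above rather than from any list published by TaoKeeper, and the parser only handles key=value lines with # comments.

```python
# Hypothetical check: which settings in taokeeper-monitor-config.properties
# are missing or left blank? REQUIRED is illustrative, not TaoKeeper's own list.

REQUIRED = [
    "dbcp.driverClassName",
    "dbcp.dbJDBCUrl",
    "dbcp.username",
    "dbcp.password",
    "SystemConstant.userNameOfSSH",
    "SystemConstant.passwordOfSSH",
]


def parse_properties(text):
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # split on the first "=" only, so values such as JDBC URLs stay intact
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def blank_settings(text, required=REQUIRED):
    props = parse_properties(text)
    return [key for key in required if not props.get(key)]
```

Run against the sample contents in step 3, it reports the two empty SSH settings, which still need an account that can log in to the ZooKeeper servers.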

The Azkaban scheduling system

Azkaban consists of three key components:

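Relational database: MySQL, used mainly to store flows, permissions, job status, schedules, and similar information.
AzkabanWebServer: lets users manage flows, schedules, permissions, and so on.
AzkabanExecutorServer: runs jobs and saves their output logs to MySQL; several AzkabanExecutorServers can run at the same time, coordinating through the flow state they read from MySQL.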

Installation steps

1. Create the azkaban database and load the metadata tables (the scripts are in azkaban-sql-script-2.5.0.tar.gz):

CREATE DATABASE azkaban;
GRANT ALL PRIVILEGES ON azkaban.* TO 'hq'@'%';

mysql> source create-all-sql-2.5.0.sql

2. Download and install azkaban-web-server-2.5.0.tar.gz:

tar xvf azkaban-web-server-2.5.0.tar.gz

Create the SSL configuration:
keytool -keystore keystore -alias jetty -genkey -keyalg RSA
cp keystore azkaban-web-2.5.0/

cd azkaban-web-2.5.0

Edit the configuration parameters:
vi conf/azkaban.properties

default.timezone.id=Asia/Shanghai

database.type=mysql
mysql.port=3306
mysql.host=xxx.xxx.xxx.xxx
mysql.database=azkaban
mysql.user=hq
mysql.password=xxxxxx

jetty.keystore=keystore
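# SSL keystore password chosen when running keytool above
# (a trailing "#..." on a value line would become part of the value)
jetty.password=azkaban
jetty.keypassword=azkaban
jetty.truststore=keystore
jetty.trustpassword=azkaban

3. Download and install azkaban-executor-server-2.5.0.tar.gz: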
tar xvf azkaban-executor-server-2.5.0.tar.gz
cd azkaban-executor-2.5.0

Edit the executor's runtime parameters:
vi conf/azkaban.properties

mysql.host=xxx.xxx.xxx.xxx
mysql.database=azkaban
mysql.user=hq
mysql.password=xxxxxx

4. Start the web and executor services:

cd azkaban-web-2.5.0
bin/azkaban-web-start.sh

cd ../azkaban-executor-2.5.0
bin/azkaban-executor-start.sh
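The web server and the executor must point at the same MySQL database, so the two conf/azkaban.properties files should agree on the connection settings. A minimal Python sketch to compare them follows; mysql_mismatches and the KEYS tuple are hypothetical, and mysql.port is left out because the executor config above does not set it. Like java.util.Properties, the parser treats only lines that start with # or ! as comments, so a trailing # on a value line stays part of the value.

```python
# Hypothetical check that azkaban-web and azkaban-executor use the same MySQL database.

KEYS = ("mysql.host", "mysql.database", "mysql.user", "mysql.password")


def parse_properties(text):
    # Only lines that *start* with # or ! are comments (as in java.util.Properties),
    # so "key=value #note" keeps "#note" in the value.
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def mysql_mismatches(web_text, exec_text):
    """Return {key: (web_value, executor_value)} for every differing MySQL setting."""
    web, exe = parse_properties(web_text), parse_properties(exec_text)
    return {k: (web.get(k), exe.get(k)) for k in KEYS if web.get(k) != exe.get(k)}
```

For example, passing the contents of azkaban-web-2.5.0/conf/azkaban.properties and azkaban-executor-2.5.0/conf/azkaban.properties should return an empty dict when both sides use the same host, database, user, and password.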

Generating PDF output of Pentaho dashboards with PhantomJS

Environment: CentOS 5.4, Pentaho 5.3

Download a prebuilt PhantomJS binary:

wget http://phantomjs.googlecode.com/files/phantomjs-1.9.2-linux-x86_64.tar.bz2

tar xvf phantomjs-1.9.2-linux-x86_64.tar.bz2

cd phantomjs-1.9.2-linux-x86_64

wget https://raw.githubusercontent.com/ariya/phantomjs/master/examples/rasterize.js

Generate the PDF files:

bin/phantomjs rasterize.js http://www.baidu.com baidu.pdf

bin/phantomjs rasterize.js 'http://xxx.xxx.xxx.xxx:8080/pentaho/api/repos/%3Apublic%3ASteel%20Wheels%3ADashboards%3AHome%20Dashboard.xcdf/generatedContent?ts=1439186533366&userid=admin&password=password' Steel_Wheels.pdf
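The long repository URLs used here are the colon-separated repository path, percent-encoded as a single path segment, followed by an action and query parameters. A Python sketch that builds them: pentaho_content_url is a hypothetical helper, and the /pentaho/api/repos/<path>/<action> layout is taken from the URLs in this post, not from Pentaho documentation.

```python
from urllib.parse import quote, urlencode


def pentaho_content_url(server, repo_path, action="generatedContent", **params):
    """Build a Pentaho repository URL such as .../api/repos/<path>/generatedContent.

    repo_path uses ':' as the folder separator, e.g. ':public:Steel Wheels:...'.
    The whole path is one URL segment, so ':', ' ', '(' and ')' are all encoded.
    """
    encoded = quote(repo_path, safe="")
    url = f"http://{server}/pentaho/api/repos/{encoded}/{action}"
    if params:
        # keyword order is preserved, so ts/userid/password appear as given
        url += "?" + urlencode(params)
    return url
```

For the Steel Wheels dashboard it reproduces the URL passed to rasterize.js; keep the result in single quotes on the shell command line because it contains &. The userid/password parameters only work once request-parameter authentication is enabled in security.properties.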

Chinese fonts missing when Pentaho BI Server generates PDF output

Environment: CentOS 5.2, Pentaho BI Server 5.3

The cause turned out to be that the Linux server had no SimSun (宋体) font installed.

On Windows XP, find the SimSun font file simsun.ttc in C:\Windows\Fonts and copy it to /tmp on the Linux server.

Then, on the Linux server:
sudo mkdir /usr/share/fonts/songti
sudo cp /tmp/simsun.ttc /usr/share/fonts/songti/

fc-cache /usr/share/fonts/songti/
fc-list :lang=zh
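To confirm from a script that fontconfig now sees the font, one option is to ask fc-list for family names only and search them. A minimal Python sketch; zh_font_families, parse_families and has_family are hypothetical, and it assumes that `fc-list :lang=zh family` prints one font per line with comma-separated family names (for simsun.ttc, something like SimSun,宋体).

```python
import subprocess


def parse_families(output):
    # Assumed format: one font per line, family names separated by commas.
    families = set()
    for line in output.splitlines():
        families.update(name.strip() for name in line.split(",") if name.strip())
    return families


def has_family(output, family="SimSun"):
    return family in parse_families(output)


def zh_font_families():
    # Ask fontconfig for Chinese-capable fonts, printing only the family names.
    out = subprocess.run(["fc-list", ":lang=zh", "family"],
                         capture_output=True, text=True, check=True).stdout
    return parse_families(out)
```

On the server, "SimSun" in zh_font_families() should be True after fc-cache has run; if it is False, the PDF export will still fall back to a font without Chinese glyphs.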

Passing the password as a request parameter when calling the Pentaho API

Environment: CentOS 5.4, Pentaho BI Server 5.3

Go to the biserver-ce/pentaho-solutions/system directory and edit security.properties.

Change the following setting:

requestParameterAuthenticationEnabled=true

Restart the BI Server.

Then access the API from a browser or with curl:

http://xxx.xxx.xxx.xxx:8080/pentaho/api/repos/%3Apublic%3ASteel%20Wheels%3AReports%3ATop%20Customers%20%28report%29.prpt/viewer?ts=1438767939338&userid=admin&password=password

curl -o 2.pdf --user admin:password http://xxx.xxx.xxx.xxx:8080/pentaho/api/repos/%3Ahome%3Aantifraud%3A%E8%A2%AB%E6%94%BB%E5%87%BB%E5%85%B3%E9%94%AE%E8%AF%8DTop20%E6%97%A5%E6%8A%A5.prpt/generatedContent?output-type=pdf
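curl's --user option sends HTTP Basic authentication, so the same download can be done from code without putting the password in the URL. A Python sketch using only the standard library; pentaho_request and download are hypothetical names, and the URL is whichever generatedContent URL you need, such as the one in the curl command.

```python
import base64
import urllib.request


def pentaho_request(url, user, password):
    # Same credentials curl sends with --user: an HTTP Basic Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Basic " + token)
    return req


def download(req, path):
    # Equivalent of curl -o: save the response body (e.g. the PDF) to a file.
    with urllib.request.urlopen(req) as resp, open(path, "wb") as f:
        f.write(resp.read())
```

Usage: download(pentaho_request(url, "admin", "password"), "2.pdf"). Building the request does not contact the server; only download does.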

Upgrading gcc to 4.9

Environment: Ubuntu 14.04, gcc 4.8

Install gcc 4.9:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install g++-4.9

Change the default gcc version:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 150
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 100
sudo update-alternatives --config gcc
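In auto mode, update-alternatives uses the alternative with the highest priority, which is why gcc-4.9 (150) wins over gcc-4.8 (100) here until one is chosen manually with --config. The Python sketch below reads that decision from `update-alternatives --query gcc`; the Alternative:/Priority: line format it parses is an assumption based on the --query output of Debian's update-alternatives, and parse_query, auto_choice and query are hypothetical helpers.

```python
import subprocess


def parse_query(text):
    """Map alternative path -> priority from `update-alternatives --query` output."""
    alternatives, current = {}, None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "Alternative":
            current = value
        elif key == "Priority" and current is not None:
            alternatives[current] = int(value)
    return alternatives


def auto_choice(alternatives):
    # In auto mode update-alternatives picks the highest-priority alternative.
    return max(alternatives, key=alternatives.get)


def query(name="gcc"):
    out = subprocess.run(["update-alternatives", "--query", name],
                         capture_output=True, text=True, check=True).stdout
    return parse_query(out)
```

On the machine above, auto_choice(query("gcc")) should name /usr/bin/gcc-4.9 while the link is still in auto mode.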