环境: CentOS 5.7, CDH 4.2.0
Cascalog是一款基于cascading和hadoop上用clojure定义的DSL。由于clojure的元数据和函数编程范式,它很好地定义函数和查询。
下面讲解下使用场景:
1. 使用lein创建一个工程
lein cascalog_incanter
2. 切入到cascalog_incanter,编辑project.clj 如下所示:
(defproject cascalog_incanter “0.1.0-SNAPSHOT”
:description “FIXME: write description”
:url “http://example.com/FIXME”
:license {:name “Eclipse Public License”
:url “http://www.eclipse.org/legal/epl-v10.html”}
:dependencies [[org.clojure/clojure “1.6.0”]
[cascalog/cascalog-core “2.1.1”]
[incanter “1.5.5”]]
:repositories [[“conjars.org” “http://conjars.org/repo”]
[“cloudera” “https://repository.cloudera.com/artifactory/cloudera-repos/”]]
:profiles {
:provided {
:dependencies [
;[org.apache.hadoop/hadoop-core “1.2.1”] ; Apache Hadoop MapReduce v1
;[org.apache.hadoop/hadoop-core “2.0.0-mr1-cdh4.2.0”] ; CDH 4.2.0 MapReduce v1
[org.apache.hadoop/hadoop-common “2.0.0-cdh4.2.0” ] ; Cloudera Hadoop 4.2.0 YARN
[org.apache.hadoop/hadoop-mapreduce-client-core “2.0.0-cdh4.2.0” ] ; Cloudera Hadoop 4.2.0 MapReduce v2
]
}
:dev {
:dependencies [
[org.apache.hadoop/hadoop-minicluster “2.0.0-cdh4.2.0”] ; Cloudera Hadoop 4.2.0
]}
}
)
lein repl
4. 参考示例http://cascalog.org/articles/getting_started.html