I have felt unwell all week, probably from catching a chill while sleeping at night: listless, dizzy and heavy-headed!
Ubuntu: add-apt-repository command not found
Environment: Ubuntu 14.04
Adding a repository with add-apt-repository failed with the error: command not found
The fix:
sudo apt-get remove software-properties-common python-software-properties
sudo apt-get install software-properties-common python-software-properties
apt-get update problems
Environment: Ubuntu 14.04
Running apt-get update produced the following warnings:
Fetched 11.8 MB in 26s (448 kB/s)
W: GPG error: http://security.debian.org wheezy/updates InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 9D6D8F6BC857C906 NO_PUBKEY 8B48AD6246925553
W: GPG error: http://http.debian.net wheezy-updates Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 8B48AD6246925553 NO_PUBKEY 7638D0442B90D010
W: Failed to fetch http://packages.couchbase.com/ubuntu/dists/trusty/InRelease Unable to find expected entry 'precise/main/binary-amd64/Packages' in Release file (Wrong sources.list entry or malformed file)
W: Failed to fetch http://http.debian.net/debian/dists/wheezy/Release.gpg Connection failed
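The fix for the GPG errors is to import the missing public keys (the two "Failed to fetch" warnings point to a sources.list entry and a connection failure, which importing keys does not change):
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 9D6D8F6BC857C906 8B48AD6246925553 7638D0442B90D010 6FB2A1C265FFB764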
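Copying key IDs out of the warnings by hand is error-prone, so here is a minimal sketch that pulls them out of the apt-get update output. The function name missing_keys is made up for this note; it only assumes that each missing key appears as NO_PUBKEY followed by a hexadecimal ID, as in the warnings above.

```shell
# Read apt-get output on stdin and print each distinct key ID that
# follows a NO_PUBKEY marker, one per line.
missing_keys() {
  grep -o 'NO_PUBKEY [0-9A-F]*' | awk '{print $2}' | sort -u
}

# Possible usage (not run here; xargs -r is the GNU option that skips
# the command when there is no input):
#   sudo apt-get update 2>&1 | missing_keys \
#     | xargs -r sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys
```

The filter only reads text, so it can be tried on a saved log before anything is imported.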
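Two things to watch in data warehouse ETL
1. Avoid UPDATE operations, which put a heavy load on the database; replace them with DELETE followed by INSERT.
2. For source character columns whose type cannot be confirmed, always use varchar.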
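Using the Taobao NPM mirror
Environment: Ubuntu 14.04
Some packages kept failing to download when I worked with JS libraries (blocked by the firewall again). I recently found that Taobao runs an NPM mirror.
Its address: http://npm.taobao.org/
Installation:
npm config set ca ""
sudo npm install -g cnpm --registry=https://registry.npm.taobao.org
Install the leaflet-storage package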
sudo cnpm install napa
sudo cnpm install leaflet-storage
/usr/bin/env: node: No such file or directory
The fix: sudo ln -s /usr/bin/nodejs /usr/bin/node
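The error appears because the Ubuntu package installs the interpreter as nodejs, while the scripts ask env for node. Before creating the link, it can help to check which name is actually on the PATH. This is only a sketch, and find_node is a name invented for this note.

```shell
# Print the first Node.js executable name found on PATH, trying
# "node" first and then "nodejs"; return non-zero if neither exists.
find_node() {
  for name in node nodejs; do
    if command -v "$name" >/dev/null 2>&1; then
      echo "$name"
      return 0
    fi
  done
  return 1
}
```

If it prints nodejs, the link above is still needed; if it prints node, the env lookup should already succeed.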
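Installing Ceph on a single machine
Environment: Ubuntu 14.04
I have long wanted storage for images. I used Riak CS before, but deploying, installing and managing it was a hassle. Then I heard about Ceph, and the way to get to know it is to use it. The installation steps first:
wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -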
echo deb http://ceph.com/debian-giant/ trusty main | sudo tee /etc/apt/sources.list.d/ceph.list
Update the package index
sudo apt-get -q update
Install the ceph deployment tool
sudo apt-get install ceph-deploy
Look up the hostname
hostname
The hostname is ubuntu
Initialize the node information
sudo ceph-deploy new ubuntu
Install the ceph software
sudo ceph-deploy install ubuntu
Create the mon cluster
sudo ceph-deploy mon create
Start the mon process
sudo ceph-deploy mon create-initial
Install the OSD
sudo mkdir -p /data/osd
sudo ceph-deploy osd prepare ubuntu:/data/osd
sudo ceph-deploy osd activate ubuntu:/data/osd
Check the ceph health
sudo ceph health
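The health check may need to be repeated while the cluster settles, so here is a rough sketch of polling it instead of re-running it by hand. The names wait_for_health_ok and POLL_INTERVAL are invented for this note; the only thing assumed about ceph is that a healthy cluster prints a line beginning with HEALTH_OK.

```shell
# Run the given command up to <tries> times and succeed as soon as
# its output begins with HEALTH_OK; sleep between attempts.
wait_for_health_ok() {
  tries=$1
  shift
  while [ "$tries" -gt 0 ]; do
    case $("$@" 2>/dev/null) in
      HEALTH_OK*) return 0 ;;
    esac
    tries=$((tries - 1))
    sleep "${POLL_INTERVAL:-5}"
  done
  return 1
}

# Possible usage (not run here): wait_for_health_ok 12 sudo ceph health
```

The command to poll is passed in as arguments, so the loop can be tried with a stand-in command before pointing it at a real cluster.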
Add a metadata server
sudo ceph-deploy mds create ubuntu
List the processes
jerry@ubuntu:~$ ps ax | grep ceph
8863 ? Ssl 0:00 /usr/bin/ceph-mon --cluster=ceph -i ubuntu -f
9357 ? Ssl 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
9496 ? Ssl 0:00 /usr/bin/ceph-mds --cluster=ceph -i ubuntu -f
9517 pts/0 S+ 0:00 grep --color=auto ceph
Three ceph services are running
Check the status
sudo ceph -s
Using ceph
Start ceph-rest-api
sudo ceph-rest-api -n client.admin &
View it in a browser at http://192.168.56.101:5000/
Building Myriad
Environment: CentOS 6.4, Myriad
Myriad is a Mesos framework that supports YARN; it integrates YARN and Mesos resource management.
Build it as follows:
git clone https://github.com/mesos/myriad.git
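Because Gradle is blocked (I really don't know why this software has to be blocked, damn the GFW), I had to download it from http://get.jenv.mvnsearch.org/download/gradle/gradle-2.4.zip and put it in the myriad/gradle/wrapper directory. Then edit the gradle-wrapper.properties configuration file: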
vi myriad/gradle/wrapper/gradle-wrapper.properties
#Wed Jun 10 10:58:12 CDT 2015
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
#distributionUrl=https\://services.gradle.org/distributions/gradle-2.4-bin.zip
distributionUrl=gradle-2.4.zip
Comment out the original distributionUrl and add the new one pointing at the downloaded file.
Finally, build:
cd myriad
./gradlew build
Configuring and enabling Myriad
Copy the libraries into place
sudo cp myriad/myriad-executor/build/libs/myriad-executor-runnable-0.0.1.jar /usr/local/libexec/mesos
sudo cp myriad/myriad-scheduler/build/libs/*.jar /usr/lib/hadoop-yarn/
Edit the environment variables
sudo vi /etc/hadoop/conf/hadoop-env.sh
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
sudo vi /etc/hadoop/conf/yarn-site.xml
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>${nodemanager.resource.cpu-vcores}</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>${nodemanager.resource.memory-mb}</value>
</property>
<!-- These options enable dynamic port assignment by mesos -->
<property>
<name>yarn.nodemanager.address</name>
<value>${myriad.yarn.nodemanager.address}</value>
</property>
<property>
<name>yarn.nodemanager.webapp.address</name>
<value>${myriad.yarn.nodemanager.webapp.address}</value>
</property>
<property>
<name>yarn.nodemanager.webapp.https.address</name>
<value>${myriad.yarn.nodemanager.webapp.address}</value>
</property>
<property>
<name>yarn.nodemanager.localizer.address</name>
<value>${myriad.yarn.nodemanager.localizer.address}</value>
</property>
<!-- Configure Myriad Scheduler here -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>com.ebay.myriad.scheduler.yarn.MyriadFairScheduler</value>
<description>One can configure other schedulers as well from following list: com.ebay.myriad.scheduler.yarn.MyriadCapacityScheduler, com.ebay.myriad.scheduler.yarn.MyriadFifoScheduler</description>
</property>
<property>
<description>A comma separated list of services where service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,myriad_executor</value>
<!-- If using MapR distribution
<value>mapreduce_shuffle,mapr_direct_shuffle,myriad_executor</value> -->
</property>
<property>
<name>yarn.nodemanager.aux-services.myriad_executor.class</name>
<value>com.ebay.myriad.executor.MyriadExecutorAuxService</value>
</property>
sudo vi /etc/hadoop/conf/mapred-site.xml
<!-- This option enables dynamic port assignment by mesos -->
<property>
<name>mapreduce.shuffle.port</name>
<value>${myriad.mapreduce.shuffle.port}</value>
</property>
Restart the resource manager service
sudo /etc/init.d/hadoop-yarn-resourcemanager restart
sudo /etc/init.d/hadoop-yarn-resourcemanager status
Getting the latest offset in Kafka
There are two relevant APIs:
kafka.api.OffsetRequest.LatestTime
OffsetFetchRequest
Before 0.8.2, offsets were always read from ZooKeeper; from 0.8.2 onward they can be read from Kafka itself.
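Controlling the CPU and memory an application uses in Spark standalone mode
Environment: Spark 1.3.0
Because my Spark cluster runs in standalone mode, the memory and CPU cores an application uses should be set either through the environment variables in spark-env.sh or through the application's own settings spark.executor.memory and spark.cores.max. Otherwise the application takes every core and leaves other applications unable to get any. In my experience the spark-submit options (total-executor-cores, executor-memory) had no effect.
Environment variables:
SPARK_WORKER_MEMORY="50g"
SPARK_WORKER_CORES=22
Application:
new SparkConf().set("spark.executor.memory", "5g").set("spark.cores.max", "5")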
Finding selected processes on Linux and killing them
There are two ways:
Method 1:
ps ax | grep "postgres" | cut -f2 -d" " | xargs kill
Method 2:
ps ax | grep "postgres" | awk '{print $1}' | xargs kill
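Both pipelines have rough edges: cut with a single-space delimiter depends on how ps pads the PID column, and grep also matches its own entry in the process list. A hedged sketch of an alternative is to ask ps for just the PID and command name and match the name exactly. The function name pids_by_name is made up for this note.

```shell
# Read "PID COMMAND" lines on stdin (as printed by `ps -eo pid=,comm=`)
# and print the PID of every line whose command name equals $1.
pids_by_name() {
  awk -v name="$1" '$2 == name {print $1}'
}

# Possible usage (not run here; xargs -r is the GNU option that skips
# kill when nothing matched):
#   ps -eo pid=,comm= | pids_by_name postgres | xargs -r kill
```

awk splits on runs of whitespace, so leading padding in the PID column does not matter, and an exact match on the command name skips unrelated processes that merely mention postgres in their arguments.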