Technology - Page 15 - Qiang's Blog

Microsoft's open-source LightLDA

Environment: Ubuntu 14.04

git clone https://github.com/Microsoft/lightlda.git
cd lightlda/

vi build.sh
Change the clone line as follows:
#git clone https://github.com:Microsoft/multiverso.git
git clone https://github.com/Microsoft/multiverso.git

sh build.sh

cd example

export LD_LIBRARY_PATH=~/lightlda/multiverso/third_party/lib:$LD_LIBRARY_PATH
sh nytimes.sh

Installing and using DMLC XGBoost

Environment: Ubuntu 14.04

git clone https://github.com/dmlc/xgboost.git
cd xgboost

Build

make cxx11=1

Run the example

cd demo/binary_classification
../../xgboost mushroom.conf

Install the Python package

cd ~/xgboost/python-package
sudo python setup.py install

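Where Chrome keeps its SQLite database on Windows 7

Environment: Windows 7

Chrome stores browsing history and similar data in SQLite databases. They live under C:\Users\{username}\AppData\Local\Google\Chrome\User Data\Default, and the file named History in that directory holds the browsing history.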
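
As a minimal sketch, the History file can be read with Python's built-in sqlite3 module. The table name `urls` and the columns `url`, `title` and `visit_count` are assumptions based on the schema Chrome commonly uses; the post does not state them and they may differ between versions. The helper name `top_visited` is made up for illustration. Chrome may keep the file locked while it is running, so the sketch reads a temporary copy.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def top_visited(history_path, limit=10):
    """Return (url, title, visit_count) rows from a copy of a Chrome History file.

    Assumes a table named `urls` with columns url, title and visit_count.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Work on a copy because Chrome may hold a lock on the original file.
        copy_path = Path(tmp) / "History"
        shutil.copy2(history_path, copy_path)
        conn = sqlite3.connect(str(copy_path))
        try:
            rows = conn.execute(
                "SELECT url, title, visit_count FROM urls "
                "ORDER BY visit_count DESC LIMIT ?",
                (limit,),
            ).fetchall()
        finally:
            conn.close()
    return rows
```

Call it with the full path to the History file, replacing {username} with the actual account name.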

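Goodbye, ICP filing

A while ago I tried to get the site's ICP filing done. It was a huge hassle, and it still was not approved. So I rented a server in Hong Kong and deployed WordPress there instead, and the wretched filing can get lost.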

ubuntu add-apt-repository command not found

Environment: Ubuntu 14.04

Running add-apt-repository to add a repository fails with: command not found

The fix is as follows:

sudo apt-get remove software-properties-common python-software-properties

sudo apt-get install python-software-properties

Fetched 11.8 MB in 26s (448 kB/s)
W: GPG error: http://security.debian.org wheezy/updates InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 9D6D8F6BC857C906 NO_PUBKEY 8B48AD6246925553
W: GPG error: http://http.debian.net wheezy-updates Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 8B48AD6246925553 NO_PUBKEY 7638D0442B90D010
W: Failed to fetch http://packages.couchbase.com/ubuntu/dists/trusty/InRelease Unable to find expected entry 'precise/main/binary-amd64/Packages' in Release file (Wrong sources.list entry or malformed file)

W: Failed to fetch http://http.debian.net/debian/dists/wheezy/Release.gpg Connection failed

Fix: add the missing public keys:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 9D6D8F6BC857C906 8B48AD6246925553 7638D0442B90D010 6FB2A1C265FFB764
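
When many repositories complain, the key IDs can be collected from the warnings rather than copied by hand. The following is only a sketch: the helper names `missing_key_ids` and `apt_key_command` are made up for illustration, and the sample text is abbreviated from the GPG warnings shown earlier. It builds the command line but does not run it.

```python
import re


def missing_key_ids(apt_output):
    """Collect the distinct key IDs that follow NO_PUBKEY, in order of first appearance."""
    seen = []
    for key_id in re.findall(r"NO_PUBKEY\s+([0-9A-Fa-f]{16})", apt_output):
        if key_id not in seen:
            seen.append(key_id)
    return seen


def apt_key_command(key_ids, keyserver="keyserver.ubuntu.com"):
    """Build the apt-key command as an argument list (nothing is executed)."""
    return ["sudo", "apt-key", "adv", "--keyserver", keyserver, "--recv-keys", *key_ids]


# Abbreviated from the warnings printed by apt-get.
warnings = (
    "W: GPG error: http://security.debian.org wheezy/updates InRelease: "
    "NO_PUBKEY 9D6D8F6BC857C906 NO_PUBKEY 8B48AD6246925553\n"
    "W: GPG error: http://http.debian.net wheezy-updates Release: "
    "NO_PUBKEY 8B48AD6246925553 NO_PUBKEY 7638D0442B90D010\n"
)
print(" ".join(apt_key_command(missing_key_ids(warnings))))
```

Only keys that actually appear in the pasted output are picked up, so a key reported by a different run has to be added separately.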

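Two things to watch in data warehouse ETL

1. Do not use UPDATE operations; they put a heavy load on the database. Use DELETE followed by INSERT instead.

2. For character-type fields in the source data, use varchar whenever the exact type cannot be confirmed.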
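
The two rules can be sketched with Python's built-in sqlite3 module. The table `customer`, its columns and the helper `reload_rows` are hypothetical names chosen for illustration, and SQLite only stands in for whatever warehouse database is actually used. Changed rows are deleted by key and re-inserted inside one transaction, and the column whose source type is uncertain is declared as VARCHAR.

```python
import sqlite3


def reload_rows(conn, rows):
    """Replace rows by key with DELETE + INSERT inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("DELETE FROM customer WHERE id = ?", [(r[0],) for r in rows])
        conn.executemany(
            "INSERT INTO customer (id, name, signup_date) VALUES (?, ?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
# signup_date arrives in inconsistent formats, so it is kept as VARCHAR.
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name VARCHAR(100), signup_date VARCHAR(32))"
)
reload_rows(conn, [(1, "alice", "2015-06-01")])
reload_rows(conn, [(1, "alice", "2015/06/10"), (2, "bob", "unknown")])
print(conn.execute("SELECT * FROM customer ORDER BY id").fetchall())
```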

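Installing Ceph on a single machine

Environment: Ubuntu 14.04

I have long wanted a storage system for images. I used Riak CS before, but deploying, installing and managing it was quite a hassle. Later I came across Ceph, and the best way to get to know it is to use it. The installation steps come first: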

wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -

echo deb http://ceph.com/debian-giant/ trusty main | sudo tee /etc/apt/sources.list.d/ceph.list

Update the package index
sudo apt-get -q update

Install the Ceph deployment tool
sudo apt-get install ceph-deploy

Look up the hostname
hostname
The hostname is ubuntu

Initialize the node information
sudo ceph-deploy new ubuntu

Install the Ceph packages
sudo ceph-deploy install ubuntu

Create the mon (monitor) cluster
sudo ceph-deploy mon create

Start the mon process
sudo ceph-deploy mon create-initial

Set up the OSD
sudo mkdir -p /data/osd
sudo ceph-deploy osd prepare ubuntu:/data/osd
sudo ceph-deploy osd activate ubuntu:/data/osd

Check Ceph health
sudo ceph health

Add a metadata server
sudo ceph-deploy mds create ubuntu

Check the processes
jerry@ubuntu:~$ ps ax | grep ceph
8863 ? Ssl 0:00 /usr/bin/ceph-mon --cluster=ceph -i ubuntu -f
9357 ? Ssl 0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
9496 ? Ssl 0:00 /usr/bin/ceph-mds --cluster=ceph -i ubuntu -f
9517 pts/0 S+ 0:00 grep --color=auto ceph

Three Ceph services are running (mon, osd and mds).
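
The same check can be scripted. This is only a sketch: the function name `ceph_daemons` is made up, and the sample text is the ps output above with the double dashes written out. It picks out the mon, osd and mds daemons and skips the grep line itself.

```python
import re

# Sample taken from the ps output above, with "--" written out in the options.
PS_OUTPUT = """\
 8863 ?        Ssl    0:00 /usr/bin/ceph-mon --cluster=ceph -i ubuntu -f
 9357 ?        Ssl    0:01 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
 9496 ?        Ssl    0:00 /usr/bin/ceph-mds --cluster=ceph -i ubuntu -f
 9517 pts/0    S+     0:00 grep --color=auto ceph
"""


def ceph_daemons(ps_output):
    """Return (pid, daemon) pairs for lines that run ceph-mon, ceph-osd or ceph-mds."""
    pattern = re.compile(r"^[ \t]*(\d+)[ \t].*?/ceph-(mon|osd|mds)\b", re.MULTILINE)
    return [(int(pid), "ceph-" + kind) for pid, kind in pattern.findall(ps_output)]


print(ceph_daemons(PS_OUTPUT))
```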

Check the cluster status
sudo ceph -s

Using Ceph

Start ceph-rest-api

sudo ceph-rest-api -n client.admin &

View it in a browser at http://192.168.56.101:5000/

Building Myriad

Environment: CentOS 6.4, Myriad

Myriad is a Mesos framework that supports YARN; it integrates YARN and Mesos resource management.

Build it as follows:

git clone https://github.com/mesos/myriad.git

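Because Gradle is blocked (I really do not know why this software has to be blocked; the wretched GFW), I had to download it from http://get.jenv.mvnsearch.org/download/gradle/gradle-2.4.zip and put it in the myriad/gradle/wrapper directory. Then edit the gradle-wrapper.properties configuration file: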

vi myriad/gradle/wrapper/gradle-wrapper.properties

#Wed Jun 10 10:58:12 CDT 2015
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
#distributionUrl=https\://services.gradle.org/distributions/gradle-2.4-bin.zip
distributionUrl=gradle-2.4.zip

Comment out the original distributionUrl and add a new one that points to the local zip file.
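
The same edit can be scripted instead of done in vi. The following is a sketch only: the function name `use_local_distribution` is made up for illustration, and it works on the file's text so that reading and writing gradle-wrapper.properties is left to the caller.

```python
def use_local_distribution(properties_text, local_zip="gradle-2.4.zip"):
    """Comment out active distributionUrl lines and append one pointing at local_zip."""
    out = []
    for line in properties_text.splitlines():
        if line.strip().startswith("distributionUrl="):
            out.append("#" + line)  # keep the old value as a comment
        else:
            out.append(line)
    out.append("distributionUrl=" + local_zip)
    return "\n".join(out) + "\n"


original = (
    "distributionBase=GRADLE_USER_HOME\n"
    "distributionPath=wrapper/dists\n"
    "zipStoreBase=GRADLE_USER_HOME\n"
    "zipStorePath=wrapper/dists\n"
    "distributionUrl=https\\://services.gradle.org/distributions/gradle-2.4-bin.zip\n"
)
print(use_local_distribution(original), end="")
```

Lines that are already commented out are left alone, so running it twice only comments out the previously added local entry and appends it again.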

Finally, run the build

cd myriad

./gradlew build

Configuring and enabling Myriad

Copy the libraries into place

sudo cp myriad/myriad-executor/build/libs/myriad-executor-runnable-0.0.1.jar /usr/local/libexec/mesos
sudo cp myriad/myriad-scheduler/build/libs/*.jar /usr/lib/hadoop-yarn/

Edit the environment variables and configuration files
sudo vi /etc/hadoop/conf/hadoop-env.sh
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
sudo vi /etc/hadoop/conf/yarn-site.xml
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>${nodemanager.resource.cpu-vcores}</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>${nodemanager.resource.memory-mb}</value>
</property>
<!-- These options enable dynamic port assignment by mesos -->
<property>
<name>yarn.nodemanager.address</name>
<value>${myriad.yarn.nodemanager.address}</value>
</property>
<property>
<name>yarn.nodemanager.webapp.address</name>
<value>${myriad.yarn.nodemanager.webapp.address}</value>
</property>
<property>
<name>yarn.nodemanager.webapp.https.address</name>
<value>${myriad.yarn.nodemanager.webapp.address}</value>
</property>
<property>
<name>yarn.nodemanager.localizer.address</name>
<value>${myriad.yarn.nodemanager.localizer.address}</value>
</property>

<!-- Configure Myriad Scheduler here -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>com.ebay.myriad.scheduler.yarn.MyriadFairScheduler</value>
<description>One can configure other schedulers as well from following list: com.ebay.myriad.scheduler.yarn.MyriadCapacityScheduler, com.ebay.myriad.scheduler.yarn.MyriadFifoScheduler</description>
</property>
<property>
<description>A comma separated list of services where service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,myriad_executor</value>
<!-- If using MapR distribution
<value>mapreduce_shuffle,mapr_direct_shuffle,myriad_executor</value> -->
</property>
<property>
<name>yarn.nodemanager.aux-services.myriad_executor.class</name>
<value>com.ebay.myriad.executor.MyriadExecutorAuxService</value>
</property>
sudo vi /etc/hadoop/conf/mapred-site.xml
<!-- This option enables dynamic port assignment by mesos -->
<property>
<name>mapreduce.shuffle.port</name>
<value>${myriad.mapreduce.shuffle.port}</value>
</property>
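
Before restarting, it can help to confirm that the edited files still parse and contain the Myriad entries. This sketch uses Python's standard xml.etree.ElementTree; the helper name `read_properties` is made up for illustration, and the fragment is a shortened copy of the properties listed above. It expects a bare list of property elements without an XML declaration, and comments must use the normal `<!--` and `-->` delimiters.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse <property> elements into a {name: value} dict.

    The text is wrapped in <configuration> so that a bare list of
    <property> elements can be parsed as one document.
    """
    root = ET.fromstring("<configuration>" + xml_text + "</configuration>")
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


fragment = """
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>com.ebay.myriad.scheduler.yarn.MyriadFairScheduler</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,myriad_executor</value>
</property>
"""
props = read_properties(fragment)
print(props["yarn.resourcemanager.scheduler.class"])
print("myriad_executor" in props["yarn.nodemanager.aux-services"].split(","))
```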

Restart the ResourceManager service
sudo /etc/init.d/hadoop-yarn-resourcemanager restart
sudo /etc/init.d/hadoop-yarn-resourcemanager status
