Learning Mahout

My brain exploded. That’s pretty much my limit.

So, yes, I’ve been interested in SOLR, Apache Tika and, of course, Mahout. The promise of classifying and clustering data is enough to persuade me to dig up examples about Mahout. So far, what has really helped is the Seinfeld demo example. It gives me a proper example to try, and we can replace the data with our own to get the gist of how Mahout works.

However, I haven’t got the gist yet. So far, I’ve tried to cluster two data sources. One of them is the blog posts from navinot.com. Here’s an excerpt from the cluster dump:

C-18 [Ponsel, Mobile, Internet, Mobile internet, Iphone]
- /6 Hal Tentang Mobile Internet.txt
- /Mobile Application_ Masa Depan Yang Ditunggu?.txt
- /Netbook_ Bakal Lenyap Seperti PDA?.txt
- /Premium Mobile Internet?.txt
- /The Gaps in Indonesian Internet.txt
- /iPhone & Telkomsel_ Deal or No Deal?.txt

I imagined Mahout would cluster them into similarity groups. My guess is that they were clustered by keyword. I was using k-means.
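To make the k-means idea concrete, here is a toy sketch in plain Python. It is not Mahout’s API (Mahout runs k-means on Hadoop over its own vector files); it only illustrates the same loop: turn each document into a bag of term counts, assign every document to its most similar centroid, recompute the centroids, repeat. Cosine similarity, the naive initialization (first k documents) and the fixed number of iterations are my assumptions, and the sample titles are made up.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into a sparse bag-of-words vector (term -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Average a list of sparse vectors term by term."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: weight / len(vectors) for term, weight in total.items()}


def kmeans(docs, k, iterations=10):
    """Cluster documents; returns one cluster id per document."""
    vectors = [vectorize(doc) for doc in docs]
    # Naive initialization: the first k documents become the starting centroids.
    centroids = [dict(vec) for vec in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(vec, centroids[c]))
            for vec in vectors
        ]
        # Update step: recompute each centroid from its current members.
        for c in range(k):
            members = [vec for vec, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment


if __name__ == "__main__":
    titles = [
        "mobile internet iphone ponsel",
        "iphone mobile application ponsel",
        "bacula backup volume pool",
        "bacula restore backup job",
    ]
    print(kmeans(titles, k=2))
```

With those four made-up titles, the two phone-related titles end up in one cluster and the two Bacula titles in the other; the cluster ids themselves are arbitrary. It also hints at why the real cluster above looks keyword-driven: similarity here is nothing more than shared terms.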

Anyway, obviously we need to filter out stopwords. Mahout can read directly from a SOLR/Lucene index, but I didn’t have much luck with it. Something to do with empty terms or whatever. Feeding my raw data to SOLR and then querying it back out as text files would probably make a decent workaround.
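Stopword filtering itself is simple once the text is tokenized. The sketch below assumes a tiny hand-picked list of English and Indonesian stopwords (STOPWORDS, tokenize and remove_stopwords are illustrative names, not part of Mahout or SOLR); in practice a Lucene/SOLR analyzer with a full stopword file would do this before the vectors are built.

```python
import re

# A tiny, hand-picked sample of English and Indonesian stopwords (assumption;
# a real list would be much longer).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "for",
    "yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan",
}


def tokenize(text):
    """Lowercase the text and split it into ASCII alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(text, stopwords=STOPWORDS):
    """Return the tokens of `text` that are not stopwords."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


if __name__ == "__main__":
    print(remove_stopwords("Mobile Application: Masa Depan Yang Ditunggu?"))
    # → ['mobile', 'application', 'masa', 'depan', 'ditunggu']
```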

That’s a wrap for today. Time for Pocket Legends!

 

 
 
 
 
 
 

How to use Lucene 3.4 with Mahout 0.5

As you may have noticed (and been frustrated by), Mahout 0.5 was built with Lucene 3.1 dependencies. How on earth can we use Lucene 3.4, then? My SOLR is 3.4, and I want to use its index to play with Mahout.

Fear not. Just download Mahout 0.5, both source and binaries. Extract them; they will end up in the same folder, i.e. mahout-distribution-0.5. Now open up that pom.xml, find lucene and replace 3.1.0 with 3.4.0. I reckon there are only 4 of them. Then do mvn install. You may want to skip tests with: mvn -DskipTests=true install.
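If you’d rather not edit pom.xml by hand, the replacement can be scripted. This is a minimal Python sketch, assuming the Lucene versions appear literally as <version>3.1.0</version> inside <dependency> blocks whose groupId is org.apache.lucene; if the pom defines the version through a property instead, this will not find it. bump_lucene is a hypothetical helper name.

```python
import re
import sys

# One <dependency>...</dependency> block, matched non-greedily across lines.
DEPENDENCY = re.compile(r"<dependency>.*?</dependency>", re.DOTALL)


def bump_lucene(pom_text, old="3.1.0", new="3.4.0"):
    """Swap `old` for `new` in <version> tags of org.apache.lucene dependencies.

    Returns the rewritten text and how many versions were changed.
    """
    changed = 0
    old_tag = f"<version>{old}</version>"
    new_tag = f"<version>{new}</version>"

    def fix(match):
        nonlocal changed
        block = match.group(0)
        if "<groupId>org.apache.lucene</groupId>" not in block:
            return block  # not a Lucene dependency: leave it untouched
        changed += block.count(old_tag)
        return block.replace(old_tag, new_tag)

    new_text = DEPENDENCY.sub(fix, pom_text)
    return new_text, changed


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    with open(path, encoding="utf-8") as fh:
        text, count = bump_lucene(fh.read())
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    print(f"{path}: updated {count} Lucene version(s)")
```

Run it with the path to pom.xml as its only argument, compare the reported count with the 4 occurrences you expect, then build with mvn -DskipTests=true install.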

Once done, do: export MAHOUT_CORE=1

Run mahout from mahout-distribution-0.5/bin folder.

I don’t get index incompatibility errors anymore. But I keep getting a “not enough term vectors” error on documents, even though I’ve set up schema.xml and reindexed my docs.

Will write more once I get past it.

 
 
 
 
 
 

11 Things I want in Japan

When Hiro asked me what I particularly wanted to see in Japan, I spaced out. It turned out that I didn’t really have a list. But when I thought about it all over again, I did have a short list[1].

  1. Hatsune Miku, either having her figure or ultimately watching her live concert.
  2. Dollfie: see one, touch one. Owning one can wait until I have moved into my own apartment (and have permission from my wife).
  3. Akihabara. It’s on everyone’s list, I believe. Visiting a maid cafe would probably be nice.
  4. Comiket. Seeing a field of figures would be awesome. The crowd looks scary, though.
  5. Tokyo Game Show. Same as above.
  6. My own figure collection. I believe my taste is different from @dannychoo’s. Not on the Dollfie part, though :D. I have been wanting a Macross figure and other figures I saw in some of dannychoo’s pics.
  7. Seeing cosplayers at Yoyogi Park.
  8. Tanabata festival. A fireworks show. I want to wear a yukata someday.
  9. Taking photos of lots of sailorfuku schoolgirls.
  10. Speaking of sailorfuku, I love the girl band Scandal. I want to have their merchandise.
  11. A Dir En Grey concert? I want to scream along to Dozing Green.
More will come.
Yeah, I am in Japan right now. It still feels unreal, even after almost a week. I keep telling my wife I am living in a dream. My wife says it’s the jetlag.
The tagline of this blog finally came true.
Yeayyyyyyyyyyyyyyyyyyyyyyyyyyy! Aaaaaaaaaaaa! Saiko desuuuuuuuuuuuuuuuuuuuu! Yeaaaayyyyyyyyyyy! *rolling on the floor, bursting happiness tears*
Footnote:
[1] This is pretty common for me. My brain spins slowly, and it tends to work best when I’m writing and can take more time to think. As a result, I’m not very spontaneous.

 
 
 
 
 
 

bacula-fd authentication failed

So, I’ve been trying to set up a two-tier Bacula installation. I got stuck on “cannot connect to client”.

To grab more clues, run this line on bacula-fd machine:

sudo /usr/sbin/bacula-fd -f -d100 -c /etc/bacula/bacula-fd.conf

Then do the bconsole dance on the bacula-dir machine. Use the “status” command to test the connection to the client. If you see “cram-md5 authentication failed” in the bacula-fd output, then you have the same problem I did. Otherwise, check the connection between bacula-dir and bacula-fd.

Here’s the solution:

in bacula-fd.conf:

Director {
  Name = bacula-director
  Password = "remote-fd-passwd"
}

“Name” should be the Name of your bacula-dir Director. You can find it in bacula-dir.conf. See below:

Director {                            # define myself
  Name = bacula-director
  DIRport = 9101                # where we listen for UA connections
  QueryFile = "/etc/bacula/scripts/query.sql"
  WorkingDirectory = "/var/lib/bacula"
  PidDirectory = "/var/run/bacula"
  Maximum Concurrent Jobs = 1
  Password = "blahblahblah"         # Console password
  Messages = Daemon
  DirAddress = 127.0.0.1
}

Then the Password in bacula-fd.conf should be the same as the one in your Client definition in bacula-dir.conf, e.g.:

Client {
  Name = remote-fd
  Address = remote.fd.ip
  FDPort = 9102
  Catalog = MyCatalog
  Password = "remote-fd-passwd"          # password for FileDaemon
  File Retention = 30 days            # 30 days
  Job Retention = 6 months            # six months
  AutoPrune = yes                     # Prune expired Jobs/Files
}

Don’t forget to restart bacula-dir and bacula-fd after modifying conf files. Good luck!
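Since the two rules above (the Director Name must match, and the Password must match the Client resource) are easy to get wrong, here is a rough Python sanity check. It is my own sketch, not a Bacula tool: parse_resources only understands simple top-level “Key = value” directives and treats every # as the start of a comment, so a password containing # would confuse it. check_fd_auth and the sample snippets in the test are hypothetical.

```python
def parse_resources(text):
    """Parse top-level Bacula resources into a list of (type, {directive: value}).

    Simplifications: '#' always starts a comment (even inside quotes), nested
    blocks such as Include { } are skipped, and one-line blocks are not handled.
    Directive names are lowercased with spaces removed, since Bacula ignores
    case and spaces in directive keywords.
    """
    resources = []
    depth = 0
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            depth += 1
            if depth == 1:
                current = (line[:-1].strip(), {})
            continue
        if line == "}":
            depth -= 1
            if depth == 0 and current is not None:
                resources.append(current)
                current = None
            continue
        if depth == 1 and current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[1][key.replace(" ", "").lower()] = value.strip().strip('"')
    return resources


def check_fd_auth(dir_conf, fd_conf, client_name):
    """List mismatches that would make the Director-to-FD CRAM-MD5 login fail."""
    dir_resources = parse_resources(dir_conf)
    fd_resources = parse_resources(fd_conf)
    director = next((r for t, r in dir_resources if t.lower() == "director"), None)
    client = next(
        (r for t, r in dir_resources
         if t.lower() == "client" and r.get("name") == client_name),
        None,
    )
    if director is None or client is None:
        return ["Director or Client resource not found in bacula-dir.conf"]
    fd_directors = [r for t, r in fd_resources if t.lower() == "director"]
    same_name = [r for r in fd_directors if r.get("name") == director.get("name")]
    if not same_name:
        return [f"bacula-fd.conf has no Director named {director.get('name')!r}"]
    if same_name[0].get("password") != client.get("password"):
        return ["bacula-fd.conf Director password differs from the Client password"]
    return []
```

Feed it the contents of bacula-dir.conf, the remote bacula-fd.conf and the Client name, e.g. check_fd_auth(dir_text, fd_text, "remote-fd"); an empty list means the names and passwords line up.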

 
 
 
 
 
 

Bacula Backup Management

So, I’ve been evaluating backup management solutions. A simple shell script won’t do, since I want auto-rotation, better scheduling and incremental backup support (storage friendly). An open source solution is a no-brainer priority. So I’m taking Bacula from bacula.org for a spin for a few days to understand how it works. So far so good. It has a good scheduler with better-than-cron syntax, e.g. “1st mon at 23:05” to schedule a backup on the first Monday of a month at 23:05. Neat, eh? Installing Bacula on Ubuntu is a pretty straightforward process. There’s a fatal misconfiguration, though. It’s known and simple to fix.

The definition of the MyCatalog catalog contains a line starting with ‘dbname = "bacula;"’. The semicolon inside the quotes should follow the closing quote, so the line should start with ‘dbname = "bacula";’.
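Back to the scheduler: to double-check what a directive like “1st mon at 23:05” means for a given month, here is a small Python sketch. It only computes the date by hand (first_weekday_run is a hypothetical helper); it is not how Bacula’s scheduler is implemented.

```python
from datetime import date, datetime, time, timedelta


def first_weekday_run(year, month, weekday=0, at=time(23, 5)):
    """Datetime of the first `weekday` (Monday=0 ... Sunday=6) of a month at `at`."""
    first = date(year, month, 1)
    # Days to move forward from the 1st to reach the requested weekday.
    offset = (weekday - first.weekday()) % 7
    return datetime.combine(first + timedelta(days=offset), at)


if __name__ == "__main__":
    print(first_weekday_run(2011, 12))  # → 2011-12-05 23:05:00
```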

Another tip: Pool resources do not enable automatic volume naming by default. This is pretty annoying for a newbie; it would be way better to have it enabled by default so it works out of the box. To fix this, add a LabelFormat option to your Pool resource definition. Something like this:

Pool {
  Name = File
  Pool Type = Backup
  Volume Use Duration = 23h
  LabelFormat = "VolFile-${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}"
}

It will automagically create a proper pool volume when a job runs, e.g. VolFile-2011-12-02.
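To preview the volume names that LabelFormat will produce, the expansion can be imitated in Python. I am reading ${Month:p/2/0/r} as “pad to width 2 with 0, right-aligned”, which matches the example name above; treat that reading, and the volume_label helper, as assumptions rather than Bacula’s own code.

```python
from datetime import date


def volume_label(day, prefix="VolFile"):
    """Imitate LabelFormat = "VolFile-${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}".

    Assumption: p/2/0/r pads the value to width 2 with '0', right-aligned.
    """
    month = str(day.month).rjust(2, "0")
    dom = str(day.day).rjust(2, "0")
    return f"{prefix}-{day.year}-{month}-{dom}"


if __name__ == "__main__":
    print(volume_label(date(2011, 12, 2)))  # → VolFile-2011-12-02
```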

You can use the bat GUI to list your jobs and volumes. To restore files, see my tips here.

PS:

When you change bacula-sd.conf, restart the bacula-director service as well, not just the bacula-sd service.

 
 
 
 
 
 

On the obvious

Why would Google put a YouTube sidebar (pop-up) on Google+?

Because it is logical. YouTube has lots of great content. I love watching CN Blue videos on it all the time.

Why does it feel awkward?

Awkward? How? You mean the pop-up window? Because putting the YouTube sidebar inside the G+ page would take up space. It’s logical.

Will buttons for other products in Google’s catalog follow suit?

Obviously. Perhaps. Aside from YouTube? Blogger?

Are you sure?

What? Why are you looking at me like that? Do I look like Vic?

 
 
 
 
 
 

JavaScript is the new cool

Still on NLP. You read about UIMA and the Stanford parser (typed dependencies) the other day. I’ve been wondering if there is an online service provider for the Stanford parser. Lo and behold, there is. Although it would be better to spend some cash and run my own Stanford parser, this service should suffice to test my idea. You can find it here, along with a JSONP API to access it.

For more resources on JavaScript and NLP, there are some on GitHub, including some entity extractors. And the conclusion is that some extractors simply cannot get away from using a dictionary. Maybe I will end up with one.

 
 
 
 
 
 

At the end of the tide

I’m holding “Naked Conversations”, reading through one of its chapters. Apparently, blog projects at big companies had already started in 2003. I was still in college, getting myself familiar with Delphi. Eight years later, blogging has gone mainstream. Some may even say: outdated.

That made me think: I and many of us have been stuck at the end of the tide for too long. We never see the small ripples, but we always end up being washed away by the resulting tide. That is probably one of the “perks” of living in a developing country. To make it even worse, the internet, which has been helping us close the gap, has also kept us away from the ripples: small changes, controversial innovations that may become the next mainstream.

Needless to say, even with those ripples put right before our eyes, determining the next mainstream would take an eagle’s hunch. Still, the years of gap feel unfair. Those years could have been spent learning and honing what matters most.

All we can do is close more of the gap and catch up faster. And hopefully we will land on the same plateau. Hear what everyone hears, see what everyone sees.

Sounds like the American dream, eh?

 
 
 
 
 
 

on Cloning Siri, understanding the query

Well, UIMA is interesting. I haven’t been able to make the feature extraction work. However, I got the gist that it works similarly to NLTK, with an additional benefit: we can construct a pipeline of several analyses by configuring an XML file. This is almost as sweet as SOLR config.

Today, I found another approach to understanding the user’s query. I think it will help a lot if we can determine the subject, predicate and object of a query. We don’t need to understand the whole sentence, but we do need to extract the essence of the query. What should our clone do if the user says: how is the weather? where is bandung? do I have any meetings today?

Fortunately, there are free implementations of typed dependency parsing. If you want to know more about typed dependencies, just google it; I will only give you an example. Given the query “how is the weather in jakarta”, typed dependency analysis gives us:

advmod(is-2, how-1)
det(weather-4, the-3)
nsubj(is-2, weather-4)

From this output, we can use the presence of a subject or object to determine the essence of a query. The example above shows that weather is probably the essence of the query. You can test more typed dependencies here. Below are some more examples:

"Do I have meeting today"
aux(have-3, do-1)
nsubj(have-3, I-2)
dobj(have-3, meeting-4)
tmod(have-3, today-5)

"call John"
amod(John-2, call-1)

"make appointment with John on 3"
dobj(make-1, appointment-2)
prep_with(make-1, John-4)
prep_on(John-4, 3-6)

"texts John, send me detail"
prep_text(send-4, John-2)
nsubj(detail-6, me-5)
ccomp(send-4, detail-6)
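Picking the essence out of such output is mostly string handling. The Python sketch below parses lines like nsubj(is-2, weather-4) and applies one simple rule I made up for illustration: prefer the direct object (dobj), fall back to the subject (nsubj). Dependency, parse_dependencies and essence are hypothetical names.

```python
import re
from typing import NamedTuple

# Matches lines such as "nsubj(is-2, weather-4)" or "prep_with(make-1, John-4)".
DEP_LINE = re.compile(r"^(\w+)\((.+)-(\d+), (.+)-(\d+)\)$")


class Dependency(NamedTuple):
    relation: str
    governor: str
    governor_index: int
    dependent: str
    dependent_index: int


def parse_dependencies(text):
    """Parse Stanford-style typed dependency lines into Dependency tuples."""
    deps = []
    for line in text.strip().splitlines():
        match = DEP_LINE.match(line.strip())
        if match:
            rel, gov, gov_i, dep, dep_i = match.groups()
            deps.append(Dependency(rel, gov, int(gov_i), dep, int(dep_i)))
    return deps


def essence(deps):
    """Guess a query's essence: its direct object if any, else its subject.

    The preference order (dobj before nsubj) is a heuristic assumption.
    """
    for relation in ("dobj", "nsubj"):
        for dep in deps:
            if dep.relation == relation:
                return dep.dependent
    return None
```

On the examples above it returns weather and meeting; for “call John” it returns None, because the parser labelled that query amod, which is exactly the kind of case where a single pattern is not adequate.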

From the examples above, it is possible to choose a pattern as a trigger for a datasource query. However, that will not always be adequate. Some questions may still be hard to understand because they are too vague, such as: how do I get home? To understand this, we need to be aware that “home” is a destination/location. This should trigger some sort of map datasource.

I have been imagining the clone as a pluggable framework. The main function of the host program is to provide as many analyses as it can, via plugins, and then decide which datasource plugin to trigger. Typed dependency analysis would be one plugin; feature extraction would be another.
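A bare-bones version of that host might look like the Python sketch below. Everything here is hypothetical: Host, keyword_analysis and the two datasources are illustrative names, and the keyword plugin merely stands in for a real typed dependency or feature extraction plugin.

```python
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, object]
Analysis = Callable[[str], Features]   # query -> features
Accepts = Callable[[Features], bool]   # features -> "can I answer this?"


class Host:
    """Hypothetical host program: run every analysis plugin, then pick a datasource."""

    def __init__(self) -> None:
        self.analyses: List[Analysis] = []
        self.datasources: List[Tuple[str, Accepts]] = []

    def add_analysis(self, plugin: Analysis) -> None:
        self.analyses.append(plugin)

    def add_datasource(self, name: str, accepts: Accepts) -> None:
        self.datasources.append((name, accepts))

    def route(self, query: str) -> Optional[str]:
        features: Features = {}
        for plugin in self.analyses:
            features.update(plugin(query))
        # The first datasource (in registration order) that accepts the features wins.
        for name, accepts in self.datasources:
            if accepts(features):
                return name
        return None


def keyword_analysis(query: str) -> Features:
    """Toy stand-in for a typed dependency or feature extraction plugin."""
    words = set(query.lower().replace("?", "").split())
    return {"words": words, "is_location": bool(words & {"home", "bandung", "jakarta"})}


host = Host()
host.add_analysis(keyword_analysis)
host.add_datasource("calendar", lambda f: "meeting" in f["words"])
host.add_datasource("map", lambda f: f["is_location"])
```

Asked “do I have any meeting today?”, this host routes to calendar; “how do I get home” goes to map because the toy plugin knows home is a location; “call John” matches nothing.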

Hmm, interesting.

PS:

There are more dependency parsers I still need to check.

 
 
 
 
 
 

on Cloning Siri

Yeah, yeah, yeah, it’s a novel goal. Nevertheless, it’s an interesting journey to take.

To make our clone clever, it must be smart enough to understand any general query. “Who is Obama?” “Where is the Taj Mahal?” To answer these, we can simply forward the query to Wolfram Alpha. With a simple trick, we can also answer an open-ended question such as “How’s the weather tomorrow?”. How? Simply add the current geolocation to the query, then pass it along to Wolfram Alpha, e.g. “How’s the weather tomorrow jakarta”. Don’t worry, Wolfram Alpha will understand what you mean.
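The geolocation trick is just string work before the request. Here is a Python sketch that appends the user’s city (assumed to come from the device) and builds a Wolfram|Alpha v2 query URL without sending it. The endpoint and the appid/input parameters follow the public Wolfram|Alpha API as I understand it; YOUR-APP-ID is a placeholder and the helper names are hypothetical. The check is naive: a real clone would only add the city to location-dependent questions, not to “Who is Obama?”.

```python
from urllib.parse import urlencode

# Wolfram|Alpha v2 query endpoint; a real AppID is required to call it.
WOLFRAM_ENDPOINT = "http://api.wolframalpha.com/v2/query"


def add_location(query, city):
    """Append the user's city unless the query already mentions it (naive check)."""
    if city.lower() in query.lower():
        return query
    return f"{query.rstrip('?').strip()} {city}"


def wolfram_url(query, city, app_id="YOUR-APP-ID"):
    """Build, but do not send, a Wolfram|Alpha query URL."""
    params = {"appid": app_id, "input": add_location(query, city)}
    return f"{WOLFRAM_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(add_location("How's the weather tomorrow?", "jakarta"))
    # → How's the weather tomorrow jakarta
    print(wolfram_url("How's the weather tomorrow?", "jakarta"))
```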

Now, the hard part. We need to teach our Siri clone about ourselves. I wish Wolfram Alpha were open enough that we could add new information to its database. Unfortunately it isn’t, except for enterprise users. So a solution based on information/data mining is inevitable.

NLTK for Python is a good candidate to solve the problem. I picture a sentence getting tagged. We then extract the subject, verb and object and pass them along to the appropriate datasource provider. A question such as “do I have a meeting tomorrow?” should trigger the Calendar datasource. A datasource would be an add-on that registers its trigger on the verb and checks for other tag types within the sentence.
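Here is a Python sketch of that registration idea. The tagged input uses the (word, tag) pairs that nltk.pos_tag(nltk.word_tokenize(sentence)) returns, but I tagged the example by hand so it runs without NLTK data; the Datasource class, the registry and the trigger words are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # (word, Penn Treebank tag), as nltk.pos_tag returns


class Datasource:
    """Hypothetical add-on: fires when a trigger verb and a required noun co-occur."""

    def __init__(self, name: str, verbs: Iterable[str], nouns: Iterable[str]) -> None:
        self.name = name
        self.verbs = set(verbs)
        self.nouns = set(nouns)

    def matches(self, tagged: Tagged) -> bool:
        verbs = {word.lower() for word, tag in tagged if tag.startswith("VB")}
        nouns = {word.lower() for word, tag in tagged if tag.startswith("NN")}
        return bool(verbs & self.verbs) and bool(nouns & self.nouns)


REGISTRY = [
    Datasource("calendar", verbs={"have", "schedule"}, nouns={"meeting", "appointment"}),
    Datasource("weather", verbs={"is", "be"}, nouns={"weather"}),
]


def pick_datasource(tagged: Tagged, registry: List[Datasource] = REGISTRY) -> Optional[str]:
    """Return the name of the first datasource whose triggers match, else None."""
    for source in registry:
        if source.matches(tagged):
            return source.name
    return None


# Hand-tagged so the sketch runs without NLTK; the tags are illustrative.
question = [("do", "VBP"), ("I", "PRP"), ("have", "VB"), ("a", "DT"),
            ("meeting", "NN"), ("tomorrow", "NN"), ("?", ".")]
print(pick_datasource(question))  # → calendar
```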

Another solution may come from the Apache UIMA project. I am looking at its Configurable Feature Extractor add-on. It is capable of tagging and identifying entities. Compared to the simple pattern matching in our first solution, this alternative has more metadata to match against. Further, we can combine it with SOLR to harness its search engine power: boosting, synonyms, stopwords and whatnot.

Do you have something else in mind? I am being a bit practical here because I can’t comprehend much math :(

 
 
 
 
 
 

hello again

It’s been ages since I last blogged... there’s a new blog that tries to collect more serious writing at http://ayokesini.com. I still have a lot to learn about growing it, and I hope some friends will want to become contributors there so the community grows. On top of that I’ve been busy with other things, like collecting diamonds :D. While I wasn’t blogging here, a few special things happened that never got written down :p. First, my 27th birthday on August 31; then going home for Lebaran; meeting ikei, my cute little niece who made me fall in love at first sight; and of course my other nieces and nephews, who are adorably starting to turn into teenagers... ah.....

Not many fun stories, because this year I feel I haven’t been at my best, though I still have a lot to be grateful for. I wanted to upload some photos, but hold on... the cable has vanished, doh....

 
 
 
 
 
 

little nikei


Aha, as promised, I’m showing off my niece... here’s her photo....
her name is nikeisha; who knows, the trend nowadays is names like keisha, keysha, sheza, hihihi... that’s fine, it keeps you hip, right, miss nikei :p
but I prefer calling my chubby, adorable niece nikei... like the name of the Japanese Nikkei stock index, you know, hekekeke
ugh, I can’t wait to go home, I want to cuddle this cute baby.... love u nikei, wait for auntie to come home, ok!!, mwah!

 
 
 
 
 
 

tokyo banana


Remember Yakitate!! Japan? Yes... yes, the anime about baking; one of the breads in it is melon bread, and I was so curious about how it tastes. Then last month during the holiday I got a gift from Tokyo: banana bread, but very different from the banana bread here.... this Tokyo Banana has a custard filling and a cute shape, with a captivating look.... oh my, listen to me.... here’s what it looks like.
Hats off to this Japanese snack, because they really make it like a cute present: the box, the wrapping paper, plus how the food itself looks. The cake is sweet but not cloying. It feels like a shame to eat it, but it’s so good that as of now only two are left, saved for... for... for me, of course :p... still... yesterday I managed to taste some other ones too, but I didn’t take photos, forgot :p. Good companions for this Tokyo Banana bread are tea, coffee or a refreshing drink in the afternoon.... while playing Zuma :p

I’m posting this in August; I can’t believe how many years I’ve been blogging, huh? With all sorts of stories and mood swings, hehehehe. Oh, today is someone’s birthday: my younger sister... happy birthday, mbak nuning, and she got a fantastic present too: a baby, heheheh..... yes, I have a niece!! Her name is nikeisha anindya fadilla... the photo of my niece, whose name reminds me of stock trading and the Nikkei market, hasn’t been sent yet... later... later..... I will definitely show her off :p

 
 
 
 
 
 

the recent holiday


As I wrote before, July is the holiday month, so since my hubby happened to be on assignment abroad, I decided that after a week-long conference I also needed a holiday plus a honeymoon... hahaha... here’s one of the shots taken while waiting for wowok (a friend of mine) at Takashimaya, after joining the crush at the toy festival on the last day of the Great Sale: stretching our legs and taking photos with toni’s office friends... the other photos can’t be uploaded yet because I’m waiting for the cable, which toni still has. Actually I could read the MMC card on the laptop, but the laptop is being fussy too, so for now I’m borrowing a photo from my hubby... after the holiday? what to do... ehm.... at the toy festival toni got a figurine, I got one too but haven’t photographed it; there were bags and handicrafts, all pretty cheap, only SGD 1-SGD 10, good enough to cram into my rented room :p. Toni is still there, while I came back first because I still have obligations, ugh, even though I still miss him...

 
 
 
 
 
 

back!!

This is a first post to mark that maybe I have the spirit to “show up” again, hahaha... ok, I was busy preparing this and that, followed by a holiday whose photos can’t be uploaded yet, because hubby took the camera cable and the laptops can’t read the card... why is it so hard...

ok, that’s it for now; poor blog, ignored for over a month.... I’ll follow up with other posts later, hopefully my enthusiasm stays up :p

 
 
 
 
 
 

pepino's

It’s categorized as a fruit, next to the dragon fruit. I saw it at Superindo, Tebet.
does anyone know what pepino tastes like? is it a fruit or a vegetable?
it looks like an eggplant, after all.....


Powered by ScribeFire.

 
 
 
 
 
 

seaworld

it was actually a bribe for my younger sister: if she worked seriously on her thesis and graduated on schedule... I’d treat her to Dufan
and, as it turned out, she did it, hehehe. So finally at the start of May (I forget the exact date), me, my hubby
plus ina (this sister of mine) went to Ancol. turns out ina didn’t want to go to Dufan; she wanted Seaworld, to see the fish...
alright then, off to Seaworld we went, from Tebet to Kampung Melayu, then on the Transjakarta (thank goodness for Transjakarta! :p),
which goes straight to Ancol. with a bit of guessing about how to get into Ancol from the busway stop, we finally reached
Seaworld, taking the free shuttle from the entrance, which is quite far from Seaworld (far as it is, we walked it on the way home, you know).

as far as I remember, the last time I went to Seaworld was in junior high :)), which is why I was so happy to go there at my ever-advancing age... playing, seeing the fish and turtles, screaming during the fish-feeding show, hahaha, satisfied... satisfied!! after wandering around with all our tacky excitement, we left, bought a stingray plush and a shark keychain, and ate at Sate Senayan, delicious!!!

we went home at midday, panting from the heat... and on top of that we got on the wrong Transjakarta bus, the one to Cililitan..
hey Transjakarta, next time please stick a sign on the front of every bus that runs on the same corridor but to a different destination; can’t you even afford some paper?!

ok, enough... enough.. that’s the story for now..



 
 
 
 
 
 

the T-shirt that drove me nuts

ugh, this package finally showed up. It had to be reported first before I could get it, so there it was: I wanted it so badly but it just wouldn’t show up, so this is the T-shirt that drove me nuts, hihihihi
cute package, isn’t it? put it on display right away....


*happy Labour Day and happy Education Day*



 
 
 
 
 
 

SMS from a CELEB

Hihihihihihi, this is my tacky post....
allow me, ok? on Saturday, while I was reading and lazing around, an SMS came in
since my phone’s contacts aren’t exactly up to date, I got an SMS from a number 0811-something-something...
the content? secret, hiaiaiaiiaiaia
the sender: a blog celeb, of course

yes, let me be tacky, ok... I said I was flattered to get an SMS from him
maybe it’s ordinary for others, but for me it’s a kind of “tacky thrill”
to be in contact with a celebrity, since I haven’t even joined the community yet
and I’m still an outside “observer” (so to speak), hihihihihi....

so happy... thank you, Mr. Celeb
:D

*tacky to be thrilled by an SMS from a celeb? who cares... it’s only once in a while, I’m just an ordinary person after all*



 
 
 
 
 
 

Happy Birthday, hon!

Exia under construction
Happy Birthday, honeyyyy :*
May you always be blessed with good health, and may the One Above bless your goals and dreams...
Well, in that photo you’re playing with Gundam; now you have a different toy, because your present has become your new toy
revvi
But you said you want another present??? Please, hon.... it’s your turn to give me a present, hihihihi...
all the best to you, hon, and remember you’re 27 now, so you have to be mentally ready to have a b*by, huauauauau
love u hon, enjoy your present, ok, hehehe



 
 
 
 
 
 

new hair cut

hiaiaiaiaiaiaia, new haircut, anyone? ;)
hey, this is me with an asymmetrical bob à la Rudy Hadisuwarno (a new salon around Tebet, across from the Cheesecake Factory)
hahaha, can’t see the asymmetry? do you think I look like Dora the Explorer? :D


*keep fighting... sob*

