Machine Generated Books - Changing Face of Scientific Publications

Researchers in any subject share a dream: given a stack of books, get a summary of the research. It saves time, even if it does not add much knowledge. Still, it reduces the volume of the haystack in which to find the needle. A survey of the literature will never be quite as simple as that. Is it a daydream? Not really! Recent advances in Machine Learning (ML) and Artificial Intelligence (AI) are proving that this is a possibility. Recent news from Springer sheds light on a new direction in machine-assisted book writing.

Read More

Notes on the Data2Vis - Deep Learning Paper

Data visualization plays an essential role in Machine Learning/Data Science. From Exploratory Data Analysis (EDA) to visualizing model metrics and production results, visualization is required. Have we ever thought about using Machine Learning to generate visualizations from the data? Maybe sometimes! If you have ever used Microsoft Excel 2016, Microsoft Power BI or Tableau, you might have sensed some sort of reactive intelligence in recommending or enabling the visualization type. However, the paper “Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks” by Victor Dibia and Çağatay Demiralp from IBM Research tells a better story [1]. The paper reports results from a Deep Learning experiment that tries to build Vega-Lite visualization recommendations from given data.

Read More

Reading notes: Mondrian Forests

Breiman’s Random Forest (RF) [1] algorithm remains one of the most popular algorithms in Machine Learning. Its competitive performance, accuracy, scalability and robustness in real-world classification and regression problems made it very popular. Many variant implementations of RF are now available, such as Decision Forest and Decision Jungle [2]. Some current Machine Learning use cases demand effective online algorithms that bring the power of ensemble learning algorithms such as RF. Many researchers have proposed online learning systems built on RF. One of the most recent advances in this area is Mondrian Forests [3][4]. The seminal paper was authored by Lakshminarayanan et al. They combine the power of RF with properties of the Mondrian process [5]. The key attraction of this paper is its clean reproducibility and the availability of a Python implementation [6] by the authors.

Read More

Finding the ‘k’ in K-Means Clustering

Determining the ideal number of clusters in a given data set is one of the key tasks in data clustering. Typically, people go by a hunch when choosing the number of clusters. Given a value for ‘k’, K-Means will produce ‘k’ clusters such that the total intra-cluster variation (total within-cluster variation, or total within-cluster sum of squares) is minimized.
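
To make the quantity being minimized concrete, here is a minimal pure-Python sketch that runs K-Means for several values of ‘k’ and records the total within-cluster sum of squares (WCSS) for each. Looking for the ‘k’ where the WCSS stops dropping sharply is one common heuristic (often called the elbow method); it is an assumption here that this is the kind of approach meant. The toy data, the function names and the farthest-point seeding are all hypothetical choices made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding (an illustrative choice): start from the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans_wcss(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the total within-cluster sum of squares."""
    centroids = farthest_point_init(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Hypothetical toy data: three well-separated groups of 2-D points.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
    (9.0, 1.0), (9.2, 1.1), (8.8, 0.9),
]
wcss = {k: kmeans_wcss(points, k) for k in range(1, 7)}
for k, value in wcss.items():
    print(k, round(value, 3))
```

On data like this the WCSS falls steeply up to k = 3 and changes little afterwards, which is the kind of bend one would look for when picking ‘k’.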

Read More

Downloading the Citi Bike Data with Python

Recently I was thinking about creating a visualization project with R and [Pyxley](https://github.com/stitchfix/pyxley). I was looking for some cool data and selected the [Citi Bike Data](https://www.citibikenyc.com/system-data). The data is available as 24 zip files. I thought of writing a script to fetch the data. Here is the script I wrote.

Read More

What is new in Apache Spark

The recent version of Apache Spark (1.4) comes with lots of cool features, specifically in its Machine Learning capability. It has parameter tuning, pipelines, evaluation, R, and Predictive Model Markup Language (PMML) support. It seems that Spark will gain a lot of traction due to its extensive support for Data Science.

Read More

Life update

It has been quite a long time since I updated this blog. The reason is that Freeflux ended its service and I was too lazy to manage a server on my own. Today I realized that Freeflux is still operational. Let me restart with one big update in life:

Read More

Quick MySQL to CouchDB migration with Python

I used to play a lot with text databases. Today I was thinking of migrating some of my data collections to CouchDB. I used the following script to convert one of my DB tables (almost all fields are TEXT) to a CouchDB collection.

Read More

HBase Administration Cookbook by Yifeng Jiang - Review

Packt Publishers has announced a new book, HBase Administration Cookbook by Yifeng Jiang. I think this is the first Big Data book from Packt. As the name suggests, the book is for people who are working with HBase and would like to dive deep into HBase administration essentials. It discusses various essential topics in HBase administration, from installation to performance tuning, and primarily targets big-data administration professionals. The author discusses the art and science of HBase administration in nine systematically arranged chapters. The first chapter deals with installing HBase on an Amazon EC2 instance and discusses various settings; it ends with High Availability master settings. The second chapter deals with migrating data to HBase, including a detailed discussion of how to migrate MySQL data to HBase. This may be interesting for people who plan to migrate existing data to HBase. The third chapter mainly deals with HBase administration tools and gives an overview of them. Data backup and restoration is one of the key concepts in data management; the fourth chapter deals with data backup, restoration and replication in HBase. The fifth chapter deals with HBase cluster monitoring and diagnosis, and comes with beautiful scripts for reporting cluster status. Security aspects of HBase are discussed in chapter six, where security essentials for HBase and Hadoop with Kerberos are covered with detailed examples. Troubleshooting for HBase administration is discussed in chapter seven. Performance tuning and advanced configuration are discussed in chapters eight and nine.

Read More

Hadoop Database access experiment

For a couple of weeks I was reading and practicing with the book “Hadoop in Action”. After getting some insight into Hadoop and MapReduce, I worked out a couple of examples from the book and some example problems of my own. Then I was discussing the features of Hadoop with some of my colleagues over a cup of tea. One of them asked about accessing a database from Hadoop and processing the data. I had seen some discussions related to Hadoop and database access somewhere on the internet. Finally I dug out the article “Database Access with Hadoop” from the Cloudera blog. After reading it I decided to work on a sample problem.

Read More

Experiments with NoSQL databases - CouchDB

I started reading about NoSQL databases a long time ago. Occasionally I have used NoSQL databases such as Apache CouchDB and Apache Cassandra with Python for analytics in some minor projects. This time I thought, why not try something with Java + NoSQL? I created a small project to play with. The idea of this project is to store Twitter search results in CouchDB. I used the following operating system, programming languages and libraries in this project.

Read More

Lucene Index Writer API changes from 2.x to 3.x

The 3.x version of Lucene introduces lots of changes in its API. In 2.x we used the IndexWriter API like this:

        Directory dir = FSDirectory.open(new File(indexDir));
        writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);
Read More

Taming Text - Review

We are living in the era of the Information Revolution. Every day a vast amount of information is created and disseminated over the World Wide Web (WWW). Even though each piece of information published on the web is useful in some way, we often need to identify and extract the relevant or useful information. Such information extraction includes identifying person names and organization names, finding the category of a text, identifying the sentiment of a tweet, and so on. Processing large amounts of text data from the web is a challenging task because of information overflow. As more information appears, there is a demand for smart and intelligent processing of text data. The field of text analytics has attracted the attention of developers around the globe. Many practical as well as theoretical books have been published on the topic.

Read More

Seven years of 'humanity to others' through a Free Operating System

On 20th Oct. 2004 Canonical Ltd announced the first release of the world’s most popular and sexy operating system, Ubuntu. The first release was code named “Warty Warthog”. From there onwards Ubuntu has been a grand success, with more than 20 million users across the globe. A totally free and open source operating system attained more popularity than any similar proprietary operating system within seven years. This is a remarkable achievement by the humanity behind the Ubuntu operating system. People who dedicated their free time to writing code, testing and using the operating system, together with Canonical Ltd., made it possible. And they continue the journey to serve humanity with a better operating system that satisfies the computing needs of the “Common Man”. Kudos to the entire team behind Ubuntu!

Read More

Mahout in Action - Review

Apache Mahout is an Open Source scalable Machine Learning library in Java. It is designed to handle large data sets. More than a dozen Machine Learning and Data Mining algorithms are available in Mahout. All those algorithms are implemented on top of Apache Hadoop. The framework is distributed under the commercially friendly Apache License. It helps researchers and corporations build scalable and practical products based on Machine Learning and Data Mining principles. A wide range of big companies as well as startups are using Apache Mahout in their products.

Read More

Using Yahoo! Term Extractor web service with Python

Yesterday I was listening to Van Lindberg’s talk at PyCon US 2011 about patent mining. In his talk he mentioned the Yahoo! Term Extractor Web Service. Some time ago I had heard that the service was no longer available. I checked the web site again and found that it is working now. I played with the web service using a simple Python script. Soon after seeing that it was working fine, I created a quick and dirty Python API for the web service. I am sharing the code and documentation here. Code: https://bitbucket.org/jaganadhg/yahootermextract/overview Sample: https://bitbucket.org/jaganadhg/yahootermextract/wiki/Home

Read More

FOSS Workshop at PSR Engineering College Sivakasi

A one-day workshop on Free and Open Source Software was conducted at PSR Engineering College, Sevalpatti, Sivakasi, Tamil Nadu. I was invited to give an introduction to Python at the workshop. Mr. Chidambaresan, an alumnus of PSR Engineering College, picked me for the workshop. I reached Sattur at 4.30 in the morning and Chidambaresan took me to a lodge to freshen up. By 7.30 AM Chidambaresan and his friend arrived at the lodge and we set off for the college. On the way we picked up Suthan, HOD, MCA, Sivanthi Engineering College, from Kovilpatti. After having breakfast in Kovilpatti we headed towards the college. During the journey we discussed FOSS and the engineering syllabus, ILUGC and ILUGCBE, and a bit of politics too ;-). We reached the college by 9.30 AM and met the HOD, faculty members and the Principal of the college. We had a nice discussion about students and their learning mentality, and the necessity of motivating students to learn FOSS and contribute to it. The college is located in a very nice and pleasant village called Sevalpatti. Most of the students are from nearby villages or towns.

Read More

CSV to CouchDB data importing, a Python hack

Last month I was playing with Apache CouchDB: just some introductory stuff, map reduce, etc. Soon I received some linguistic data in .csv format as part of a project I was managing. There was a need to analyze it. Usually we used MySQL or spreadsheets to store and analyze the data. Suddenly I thought, why can’t I do it with CouchDB? There was no direct option to import CSV data into CouchDB. I searched the web and ended up with a hint. Manilal, a friend of mine, also pointed to the same hint: http://www.apacheserver.net/Load-CSV-file-into-couchdb-at1056996.htm
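
The linked hint is not reproduced in this excerpt. As a rough sketch of one possible approach, the snippet below turns CSV text into the JSON body that CouchDB's `_bulk_docs` endpoint accepts (an object with a `docs` array). The column names, the sample rows and the function name are hypothetical, and the actual HTTP POST is left as a comment because the server URL and database name are not given here.

```python
import csv
import io
import json


def csv_to_bulk_docs(csv_text):
    """Turn CSV text (first row = header) into a CouchDB _bulk_docs payload."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = [dict(row) for row in reader]  # one document per CSV row
    return json.dumps({"docs": docs}, ensure_ascii=False)


# Hypothetical linguistic data with two columns.
sample = "word,pos\nhouse,noun\nrun,verb\n"
payload = csv_to_bulk_docs(sample)
print(payload)

# The payload would then be sent with Content-Type: application/json
# as a POST to <couchdb-url>/<database>/_bulk_docs (both placeholders).
```

Every value stays a string in this sketch; numeric columns would need an explicit conversion before the documents are stored.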

Read More

Book Review - Python 2.6 Text Processing Beginner's Guide by Jeff McNeil

Python 2.6 Text Processing Beginner’s Guide by Jeff McNeil is one of the latest books from Packt Publishers. I received the review copy about a month and a half ago. Due to a busy schedule I was not able to finish the review until now. Finally I got enough time to review it. The book gives good insight into different technical aspects and the use of the Python standard library and third-party libraries for text processing. It is filled with lots of examples and practical projects. I think it took me almost a year to gather the knowledge covered in this book when I started my career in the Natural Language Processing domain. I am giving a somewhat detailed review of the book here.

Read More

My Village comes to OpenStreetMap

My village Kamukumchury (in Kollam District, Kerala State) has also come to OpenStreetMap. Last week I paid a visit to my village. I brought a GPS device with me and mapped some parts of the village. I also traced some roads and marked them. Some roads that pass through my village were mere straight lines on the map; based on the GPS traces I corrected them too.

Read More

Book Review- Python Text Processing with NLTK 2.0 Cookbook by Jacob Perkins

Python Text Processing with NLTK 2.0 Cookbook by Jacob Perkins is one of the latest books published by Packt in the Open Source series. The book is meant for people who have started learning and practicing with the Natural Language Toolkit (NLTK). NLTK is an Open Source Python library for learning, practicing and implementing Natural Language Processing techniques. The software is licensed under the Apache Software License. It is one of the most widely recommended toolkits for beginners in NLP to get their hands dirty. The toolkit is part of the syllabus at many institutions around the globe where Natural Language Processing/Computational Linguistics courses are offered. Perkins’s book is the second book published on NLTK. The first was written by the core developers of NLTK, Steven Bird, Ewan Klein, and Edward Loper, and published by O’Reilly. Bird et al.’s book is a comprehensive introduction to the toolkit with basic Python lessons. People who have gone through that book will definitely like the new book by Perkins. The book is a must-have desktop reference for students, professionals, and faculty members interested in NLP, Computational Linguistics and NLTK. Perkins handles the topic in an elegant way. Most people who have searched for NLTK tips will have come across the author’s blog. He maintains the same simplicity, explanation style and hands-on approach throughout the book, which makes the topic easy for the reader to digest. The book is a collection of practical, working recipes related to NLTK.

Read More

Book Review- MySQL for Python (Packt) by Albert Lukaszewski

MySQL for Python by Albert Lukaszewski is a must-have for all Python programmers who work with the MySQL database. It provides a comprehensive overview of MySQL programming in Python. There was a lack of such a good book on MySQL + Python. Developers and newbies interested in MySQL and Python used to rely on blog posts as reference resources for Python MySQL programming. Thankfully we now have a comprehensive book on the subject. If you are a Python programmer and a novice in MySQL, this book will definitely help you gain good knowledge of MySQL too. I am giving this book 4.5 out of 5 stars.

Read More

Second Meeting of ILUGCBE - A report

The second meeting of ILUGCBE was held at Amrita University, Ettimadai on 14th Nov 2010. The meeting started at 4.30 PM with a welcome talk by Prof. Adhinarayanan of Amrita University. Kenneth Gonsalves gave a brief introduction to ILUGCBE. After the introduction Jaganadh G gave a talk on “Will FOSS help me to get a job?”. The talk was followed by a short discussion. Prof. Adhinarayanan explained how they are training the students to work on FOSS. Some suggestions came up from the audience regarding the same. The next talk was given by Kenneth Gonsalves on the Python programming language. We were surprised to see that a couple of Amrita students are good at Python. The next talk was by Sundaram Ramachandran. He started with an intro to FOSS philosophy and then switched to the FOSS-based development activities in his organization. Both talks were followed by some heated discussions. The meeting ended by 6.15 PM.

Read More

BarCamp Kerala 9 - a report

BarCamp Kerala 9 was held at the Amritapuri Campus, Kollam on 14 November 2010. This was the first time I attended a BarCamp. I reached the venue around 12.15 PM with my friend Rajeev Raj. When I arrived, Hiran was explaining Open Movie and a screening of the Sintel movie was in progress. There was a discussion on open movies and the ‘Chamba’ project. When Blender was mentioned, lots of tweets appeared comparing Maya with Blender. Praseed Pai, Hiran and another person (I forgot his name) explained Blender and answered some questions on Maya vs Blender. Then everybody went for lunch. The post-lunch session started with a talk on OpenStreetMap by Pavithran. Lots of questions came from the audience about the data and its licensing. People even tried to compare OpenStreetMap with Google Maps. During the discussion I too tried to answer questions on the comparison between Google Maps and OpenStreetMap. The next talk was on PhoneGap, a tool for building web applications for mobile platforms. A group of hackers took a session on Wi-Fi and gave some demos as well. It was an informative one, though some people tweeted that it was 12th-class material. The next session, on “Practical Machine Learning”, was handled by me. I left the campus while the following talk was in progress, because I was in a hurry to reach home and return to Coimbatore.

Read More

BarCamp Kerala9

The 9th edition of BarCamp Kerala (BCK9) will be held at Amrita Vishwa Vidyapeetham, Amritapuri Campus, Karunagapally on 14th November 2010.

Read More

Speech Recognition with Python

Recently I saw a talk listed on the CMU Sphinx site about speech recognition with Python and PocketSphinx. I downloaded the video and replicated the experiments successfully. To play with Python and speech recognition we have to install the following packages: python-pocketsphinx, pocketsphinx-lm-wsj and pocketsphinx-hmm-wsj.

Read More

Laughlin comes soon Fedora14

Fedora 14, code named “Laughlin”, is coming soon! Fedora 14 Laughlin will be released in 17 days.

Read More

PyLucene in Action - Part I

PyLucene is a Python wrapper around Java Lucene. The goal of this tool is to make Lucene’s text indexing and searching capabilities available from Python. It is compatible with the latest version of Java Lucene. PyLucene embeds a Java VM with Lucene into a Python process. More details on PyLucene can be found at http://lucene.apache.org/pylucene/.

Read More

Swathanthra Malayalam Computing Localization Camp, Palakkad, 10-11 July 2010

Palakkad, July 8, 2010: Swathanthra Malayalam Computing, in cooperation with Sixware Technologies, the Palakkad Libre Software Users Society and the Swathanthra Janadhipathya Sakhyam (Free Democratic Alliance), is organizing a two-day localization camp at Big Bazaar School (Valiyangadi School) on July 10 and 11 (Saturday and Sunday). The camp aims to involve ordinary people in the work of making free software available in Malayalam and to provide the necessary training. It also aims to add the school where the camp takes place, and the roads near it, to OpenStreetMap, the free map initiative. Swathanthra Malayalam Computing is a collective of volunteers working in the field of Malayalam language computing based on free software. There are no conditions for taking part in the camp; anyone interested in using the computer in Malayalam, learning about the possibilities of Malayalam computing and joining the activities can participate. Admission is free. Those attending should either register on the website given below or call the volunteers listed below. Camps have already been completed successfully in six places: Kozhikode, Pune, Thiruvananthapuram, Angamaly, Kochi and Kuttippuram. The first day's programme includes training on how to use Malayalam on the computer, an explanation of the technical aspects involved, and a discussion on the importance of Malayalam computing. Along with introducing the systems and procedures for making software available in Malayalam, the camp also aims to translate some free software into Malayalam collectively. There will be a discussion on the need for cultural localization of artistic elements of computer use, such as wallpapers and screensavers, and on the technical aspects of doing so.
Students of Irumpanam VHSS School have already set an example in this area by adding the flowers of Kerala to Tux Paint, a drawing program. The second day's programme will begin with the translation of articles on free software philosophy and with quality assurance. The Malayalam translation of the games in the KDE software collection will also continue on the second day. For more information about Swathanthra Malayalam Computing and the camp, and to register for the camp, visit the website http://www.smc.org.in or contact the numbers given below.

Read More

What are we searching?

Recently I was discussing some topics related to corpora and search. After the discussion I reviewed some of the points and found some interesting facts about people’s preferences and search. I am sharing some of those thoughts with you here.

Read More

Installing PyLucene 3.x in GNU/Linux

Lucene

“Lucene is a high performance, scalable Information Retrieval (IR) library. It lets you add indexing and searching capabilities to your applications. Lucene is a mature, free, open-source project implemented in Java; it’s a member of the popular Apache Jakarta family of projects, licensed under the liberal Apache Software License.”

Read More

Redmine installation is easy now !!!

Installing Redmine, the Open Source project management tool, is easy now. A Bitnami stack is now available for Redmine. You can download the stack from the Bitnami site and install it within 10 minutes. It is great. It took me around two days to install and configure it before. Today I did it in 10 minutes.

Read More

Google launches multilingual dictionary service

Google has launched a new language service, Google Dictionary. It is a multilingual, bilingual dictionary. It provides translations between English and 30 other world languages. Major Indian languages are covered in the dictionary. Searching from English to any language (L) and from language (L) to English is supported. If somebody searches for a word, it shows the meaning in the target language, related phrases and web definitions. A sample search result for English <> Tamil is given below. I searched for ‘book’.

Read More

Historic paper on the origin and development of Indian Language Technologies

Recently I got an interesting research paper on Indian language technology. The title of the paper is “A Journey from Indian Scripts Processing to Indian Language Processing” by Dr. RMK Sinha, IIT Kanpur. The article appeared in ‘IEEE Annals of the History of Computing’. The author is a pioneer and leading researcher in Indian language processing. Without his contribution Indian language technologies might not have matured this much. I am not going to comment on his writing, because I am a humble disciple of Dr. RMK.

Read More

On social networking

I saw an interesting article on ‘social networking’ in the 6-9 June edition of ‘Engineering and Technology’ magazine. The article appeared in the ‘IT Internet’ column, titled “social networking: the business case”. The article covers different aspects of social networking. It is a useful article for marketing executives and corporate IT managers. Business managers in customer relations and IT fields can gain a lot of gyan from this article.

Read More

BioPuppy2.0 released

BioPuppy- “BioPuppy is an minimal Linux OS and electronic workbench for bio-informatics and computational biology. It has been designed to meet the needs of beginners, learners, students, staffs and Research scholars.”

Read More

Some thoughts on Tweeting

I am not a Twitter user :-) :-( . But I found that Twitter has an important role in my professional area (Natural Language Processing). Lots of R&D activities related to blogging and microblogging, such as ‘sentiment analysis’, are happening. I think it is going to be the next generation marketing platform.

Read More

NLTK and Indian Language corpus processing Part - I

During my presentation at the Indian Python conference somebody asked about Indian language corpus processing in NLTK. Somehow I skipped the answer: I knew that an Indian language corpus is there in NLTK, but I had never tried to play with it. After the conference I did something on that too. I am posting my experiments with results here. If it can be done in a better way, please tell me so that I can improve.

Read More

PyCon India 2009 - a report

Our team landed in Bangalore on the night of the 25th at 9.15 for the first Indian Python Conference. Our colleague Mr. Sudharshan arranged accommodation for us in a men’s hostel near MEI @ Bangalore (thanks to Sudharshan and Subhash). The weather was so cool. It felt as if we had travelled from the Sahara to Antarctica.

Read More

Wing Commercial IDE for Python.

As you know, I am using the free version of the freeflux.net CMS. So they put some ads with each blog post. Most of my posts were on Python, so the freeflux.net people put an ad for a Python IDE called “WingIDE”. It is not a free one; it is purely commercial. But you can download a trial version from the site http://wingware.com/products?gclid=CMrp_OLQh50CFQEupAodPR1Lbw. They call it “The Intelligent Development Environment for Python Developers”.

Read More

Thoughts on text mining

I was reading the book ‘Practical Text Mining with Perl’ by Roger Bilisoly, published by Wiley. It is a nice book for beginners to learn text mining through the Perl programming language. Many practical examples are given in the text. Most of the examples are familiar to me, because I have been using Perl and Python for many years. Suddenly I thought, why can’t I work out the examples in Python too? Practical text mining with Python! I am not going to write a textbook :-) . Just working out the examples in Perl and Python.

Read More

Meaning vs Interpretation the "cattle class" and other MWE.

Today (Sep. 18, 2009) “cattle class” is the most famous compound or multi-word expression (MWE) in India. The phrase has opened up a lot of controversy in Indian politics. I am not interested in the political discussion of the topic. Being a computational linguist by profession, I am interested in the unit “cattle class”.

Read More

Plotting wave form and spectrogram the pure Python way

In one of my earlier posts we discussed how to plot a spectrogram with ‘scikits audiolab’ and Python. One of my friends asked me whether it is possible to do it without ‘audiolab’. So I started exploring Python’s wave reading module and wrote another piece of code to plot the spectrogram and waveform. In this program I reduced some dependencies too. While using ‘audiolab’, ‘numpy’ and ‘struct’ were imported in the program. In this program only the ‘pylab’ and ‘wave’ modules are imported. This program plots both the waveform and the spectrogram in the same window. Here is the code.
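
The original listing is not part of this excerpt. As a rough sketch of the wave-reading step only, and not the author's program, the snippet below reads the samples of a WAV file with the standard `wave` module. It assumes 16-bit mono PCM input, and it uses the standard `array` module as a stand-in for the conversion that pylab would otherwise do; the function name is hypothetical.

```python
import array
import sys
import wave


def read_wave_samples(path):
    """Return (samples, frame_rate) for a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        frame_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples, frame_rate
```

With matplotlib installed, the returned samples could then be handed to `pylab.plot(samples)` for the waveform and `pylab.specgram(samples, Fs=frame_rate)` for the spectrogram, for example in two subplots of the same figure.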

Read More

Hackett and Bankwell Issue 2 is available now

“Hackett and Bankwell is an educational comic/cartoon manual designed to teach the finer points of the GNU Linux platform using Ubuntu.” Issue 2 of Hackett and Bankwell, the Linux comic, is now available from http://www.intarwebz.com/wp-content/uploads/2009/03/hackett_and_bankwell_issue_2.pdf This issue introduces the CLI (Command Line Interface). Those who missed Issue 1 can get it from http://www.intarwebz.com/hackett-and-bankwell-1-free-pdf-ebook-version-11/

Read More

On English to Indian Language MT - II

Yesterday I contacted Golam, the developer of the “Anubadok” English to Bengali MT system. My previous post was about starting to work on the system; I was talking about generating the dictionary for the MT system. Golam suggested that I look into some more modules: “Apart from bdict.db, you should also have a look at “lib/Anubadok/BnTable.pm” and “lib/Anubadok/BnSonshi.pm” modules. These are Bengali (Bn) specific files where you need to change stuffs.”

Read More

Generating pronunciation of English words with Perl.

Today we discuss a Perl module called Lingua::EN::Phoneme. This module generates the pronunciation of words in a given English text. The lexicon used by the module is the CMU pronunciation lexicon. CMU has an Open Source speech recognition tool called ‘Sphinx’. Sphinx-based speech recognition work for Indian languages is in progress; some individuals, groups and organisations are engaged in it. Here we are not going to discuss speech recognition.

Read More

Converting word sequence to title case in Perl and Python.

People working in Natural Language Processing often extract multiword units. Some multiword units may be names of organisations or departments. Sometimes they need to be converted to exact title case (“department of physics” to “Department of Physics”). There is a Perl module for performing this operation called Lingua::EN::Titlecase. If you are a Pythonista (me too) you may say that it is easy in Python, because there is a method called title() in Python. But this Perl module is more intelligent than title() in Python. The output of the title() method in Python looks like this.
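
As a quick illustration of the built-in behaviour (a minimal sketch that reuses the sample phrase from the paragraph above), str.title() capitalises every word, including small words such as “of”:

```python
phrase = "department of physics"
titled = phrase.title()
print(titled)  # Department Of Physics
```

That capital “Of” is the gap a smarter title-casing routine has to close.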

Read More

Fun with your name

I was just exploring the Lingua::EN:: modules on CPAN. I saw a funny module called Lingua::EN::Namegame by Tim Maher. If you give your name as input it will generate some verses. I will tell you how to install and use it.

Read More

Sad demise of Prof. V.I. Subrahmaniom

Prof. V. I. Subrahmaniom, eminent scholar in Linguistics, especially Dravidian Linguistics, has passed away. He was the Hon. Director of the Dravidian Linguistics Association and the International School of Dravidian Linguistics (ISDL), Thumba, Kerala. After retiring from service he lived at ISDL. His life was dedicated to research and teaching in Linguistics. He took the initiative in conducting a dialect survey, publishing the Journal of Dravidian Linguistics, and so on. I talked to him only once in my life. It was a great experience. After hearing my paper on Malayalam morphology he called me over. We discussed different aspects of Malayalam morphology for around one hour. It was a wonderful experience. Now he has become part of the history of Dravidian Linguistics.

Read More

Old Books @ Thiruvananthapuram City

There are so many book stalls in Trivandrum. You can purchase everything from technical books to service books at those stalls. There are routine book fairs too. But most book lovers visit a regular book fair of another kind: the old book shops of Thiruvananthapuram. When I came to Thiruvananthapuram in 2000 they were scattered over many parts of the city: East Fort, near the Secretariat, the University Office, etc. The town planners compelled the sellers to move their market place here and there. Now those shops are located near the Public Library and the University men’s hostel. From PSC guides to medical and engineering textbooks and out-of-print books, everything is available there. I am going to miss all these sights, because I am moving to Chennai.

Read More

Resume Building made easy

Recently I came across a resume builder tool. It is an online one. Once you have decided on the contents of your resume, log in to Emurse.com. Fill in the details, select a design and download. You can download the resume in formats like M$ Word, Adobe PDF, ODT, HTML and TXT. You also get a resume website like yourname.emurse.com.

Read More

Gambas the new free development environment

I was reading Shibu Varkala's blog. I came across a post about a new Open Source development environment called “Gambas”. It can replace the Visual Basic of M$, but it is not a clone of M$ VB. The Gambas site describes its features as follows: “With Gambas, you can quickly design your program GUI with QT or GTK+, access MySQL, PostgreSQL, Firebird, ODBC and SQLite databases, pilot KDE applications with DCOP, translate your program into any language, create network applications easily, make 3D OpenGL applications, make CGI web applications, and so on…”
I downloaded a beautiful open book on Gambas programming. After reading the book, my friend, who is familiar with Visual Studio, said that it is easy to migrate from VB to Gambas. Syllabus designers who introduce VB, please incorporate this tool in your syllabus. I am moving from my current organization next week, so I am busy finishing deadlines. I will try the tool and post more details soon. My friend has also agreed to try it soon. If he succeeds I will post the result. My friend is a novice in GNU/Linux :-).

Read More

Redmine the Open Source way for project management

Recently I came across a new Free and Open Source project management tool called Redmine. Very nice tool. It is a web-based framework developed on the Ruby on Rails architecture. It is useful for everyone from Open Source communities to the corporate world. The entire software production life cycle can be made traceable and automated. It is highly customisable. The installation instructions say that it will take 15 minutes to install, if you are familiar with the Ruby language. But it may take more time. I tried to install it on a Fedora 8 machine and integrate it with Apache for intranet access. We were both novices in Ruby, so we took three days to configure it. The main advantage of this tool is its customisability. Depending upon the needs of the community or organization you can customize it or add more features. A handful of plug-ins are also available for the tool. The most interesting one is the library plug-in. It can be used to manage the library in an organisation. I am sure it will become the favorite tool of the corporate world and the open community.

Read More

Open Workbench open source project scheduling tool.

Open Workbench is a project scheduler and project management tool. Again a Ruby-based desktop application. Most project leaders and managers in the corporate world use M$ Project for scheduling and managing their projects. Dear leaders and managers, have a change!! Here is an alternative for you: “Open Workbench”. Everything you do in the M$ tool is available here, sometimes more. The project is supported by CA. If your company would like to reduce the licensing cost of M$ Project, recommend it and save $$. I tried it. Great!! Before getting the tool I was doing everything with the M$ tool. (Sorry, I was compelled to use it, so I searched for an open tool.) There was no difficulty in using it. Easy to install and use. No dependency, no license, ultimate freedom. I used the tool for creating the schedules of two projects which I coordinated in my previous organisation. If you are an open source guy you will ask, what about Planner? Yes, Planner is there. But Open Workbench is a much more advanced one. Have a feel of it.

Read More

Paradigm

Primarily I divided all the nouns into the following paradigm types (on the basis of endings and similarity in inflexion).

Read More

VERB

Dear Colleagues, the noun part of the Malayalam morphological analyser will be completed within a short period. Now I have to start the verb part. I request scholars to provide the necessary advice regarding generating verb paradigms.

Read More

PARADIGM SAMPLE (Noun)

noun_n maraM maraM maraffalYZ marawweV/marawwineV maraffalYeV marawwot/marawwinot maraffalYot marawwin maraffalYZkk marawwAlZ/marawwinAlZ maraffalYAlZ marawwinZrYeV/marawwinuteV maraffalYuteV marawwilZ/marawwifkalZ maraffalYilZ marame maraffalYe maramAyi marafalYAyi maramAyikoVNt marafalYAyikoVNt marawwilZninn/marawwifkalZninn marafalYilZninn marawwekkAkkAlYZ marafalYeVkkAlYZ marawwilZveVcc marafalYilZveVcc marawwekkAlYZ marafalYeVkkAlYZ marawwinatuww marafalYZkkatuww marawwinupinnilZ marafalYZkkupinnilZ marawwinumukalYilZ/marawwinzmelZ marafalYZkkumukalYilZ/marawwinzmelZ marawwinukIlYYeV/marawwinukIlYYilZ marafalYZkkukIlYYeV/marawwinukIlYYilZ marawwinatiyilZ marafalYZkkatiyilZ marawwileV marafalYileV maramoVlYYikeV marafalYoVlYYikeV maramulYZppeVteV marafalulYZppeVteV marawwilekk/marawwilekku marafalYilekk/marafalYilekku maraMwanneV marafalYZwanneV maramAN marafalYAN maramAkuM marafalAkuM marawwinoVppaM/marawwoVtoVppaM marafalYZkoVppaM/maraffalotoppaM marawwekkurYicc marafalYeVkuricc marawweVkkuricculYlYa marafaleVkkurYicculYlYa marawweVppoleV/maraMmAwiri/maraMpoleV marafaleVppoleV/marafalYZmAwiri/marafalYZpoleV marawwinallAweV marafalYZkkallAweV marawwinillAweV maraffalYkkillAwe maramillAweV maraffalYillAwe marawwinatuwwuninn marafalYZkkatuwwuninn marawwolYaM maraffalYolYaM maramAyAluM marafalAyAluM marawwinzrYeVyulYlYilZ maraffaluteVyulYlYilZ maraMkoNt/marawwekoNt maraffaleVkkoNt marawwilekk maraffalYilekk marameVnnAlZ maraffaleVnnAlZ maramulYlYa maraffalYulYlYa marawwepparYrYi maraffalYeVpparYrYi marawwilZwwanneV maraffalYilZwwaneV marawwilulYlYa maraffalYilulYlYa marawwinzrYeVyuM maraffalYuteVyuM maramO maraffalYO maramANe maraffalYANe marawwilOtt maraffalYilOtt marawwinAyikkoVNt maraffalYZkkAyikkoVNt marawwilUteV maraffalYilUteV maraMvuM maraffalYuM marawweVuM/marawwineVuM maraffalYeVuM marawwotuM/marawwinotuM maraffalYotuM marawwinuM maraffalYZkkuM marawwAluM/marawwinAluM maraffalYAluM marawwinZrYeVuM/marawwinuteVuM maraffalYuteVuM marawwilZuM/marawwifkaluM 
maraffalYiluM marameuM maraffalYeuM maramAyiuM marafalYAyiuM maramAyikoVNtuM marafalYAyikoVNtuM marawwilZninnuM/marawwifkalZninnuM marafalYilZninnuM marawwekkAkkAlYuM marafalYeVkkAlYuM marawwilZveVccuM marafalYilZveVccuM marawwekkAlYuM marafalYeVkkAlYuM marawwinatuwwuM marafalYZkkatuwwuM marawwinupinniluM marafalYZkkupinniluM marawwinumukalYiluM/marawwinzmeluM marafalYZkkumukalYiluM/marawwinzmeluM marawwinukIlYYeVuM/marawwinukIlYYiluM marafalYZkkukIlYYeVuM/marawwinukIlYYiluM marawwinatiyiluM marafalYZkkatiyiluM marawwileVuM marafalYileVuM maramoVlYYikeVuM marafalYoVlYYikeVuM maramulYZppeVteVuM marafalulYZppeVteVuM marawwilekkuM/marawwilekkuM marafalYilekkuM/marafalYilekkuM maraMwanneVuM marafalYZwanneVuM maramANuM marafalYANuM maramAkuMuM marafalAkuMuM marawwinoVppavuM/marawwoVtoVppavuM marafalYZkoVppavuM/maraffalotoppavuM marawwekkurYiccuM marafalYeVkuriccuM marawweVkkuricculYlYauM marafaleVkkurYicculYlYauM marawweVppoleVuM/maraMmAwiriuM/maraMpoleVuM marafaleVppoleVuM/marafalYZmAwiriuM/marafalYZpoleVuM marawwinallAweVuM marafalYZkkallAweVuM marawwinillAweVuM maraffalYkkillAweuM maramillAweVuM maraffalYillAweuM marawwinatuwwuninnuM marafalYZkkatuwwuninnuM marawwolYaMuM maraffalYolYaMuM maramAyAluMuM marafalAyAluMuM marawwinzrYeVyulYlYiluM maraffaluteVyulYlYiluM maraMkoNtuM/marawwekoNtuM maraffaleVkkoNtuM marawwilekkuM maraffalYilekkuM marameVnnAluM maraffaleVnnAluM maramulYlYauM maraffalYulYlYauM marawwepparYrYiuM maraffalYeVpparYrYiuM marawwilZwwanneVuM maraffalYilZwwaneVuM marawwilulYlYauM maraffalYilulYlYauM marawwinzrYeVyuMuM maraffalYuteVyuMuM maramOuM maraffalYOuM maramANeuM maraffalYANeuM marawwilOttuM maraffalYilOttuM marawwinAyikkoVNtuM maraffalYZkkAyikkoVNtuM marawwilUteVuM maraffalYilUteVuM

Read More

INTRODUCTION

Dear Scholars, as part of my course work in M.A. NLP I am trying to develop a Malayalam Morphological Analyser. Experts in the Malayalam language, computational linguists, linguists and programmers can post suggestions regarding creating paradigm tables, data, rules of Malayalam morphology and related topics. I have prepared some paradigms for noun morphology. I have uploaded one sample paradigm in the archive of this blog. JAGANADH.G Homepage-http://in.geocities.com/Kalapaka2003/navadipanyaya.html

Read More