Monday, July 16, 2012

Most of the rules can be successfully converted

All this time I have been working with the English rules of LanguageTool. You can find my script here and get the XML files here. Right now the script gives the following output:
67.471 % of rules covered (587/870)
23 rules left to cover
260 unsupported rules
When you run the script, you get a text file named unsupported.txt that lists all the unsupported rules. Most of them are "postag" rules. They are not easy to convert to Lightproof, so supporting them requires extending the Lightproof API. I plan to hack on that after the midterm evaluation. You can find more about the LanguageTool XML API here. A small number of fixes are also still needed for things like the scope, skip, and match attributes and tags. Still, it's almost done.
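To give a rough idea of how such a split can be made, here is a minimal sketch. It is not the actual script. It assumes that a rule counts as unsupported as soon as one of its token elements carries a postag attribute, and the sample XML, the rule ids, and the function names split_rules and coverage_line are made up for illustration. Real LanguageTool files have more cases (rule groups, exceptions, and so on) that this ignores.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the style of a LanguageTool grammar file.
# The rule ids and messages are invented for this example.
SAMPLE = """<rules lang="en">
  <category name="Demo">
    <rule id="A_PLAIN_RULE" name="plain tokens">
      <pattern>
        <token>most</token>
        <token>of</token>
        <token>rules</token>
      </pattern>
      <message>Did you mean "most of the rules"?</message>
    </rule>
    <rule id="A_POSTAG_RULE" name="needs part-of-speech tags">
      <pattern>
        <token>was</token>
        <token postag="VBN"/>
      </pattern>
      <message>Check this verb form.</message>
    </rule>
  </category>
</rules>"""


def split_rules(xml_text):
    """Return (supported_ids, unsupported_ids) for the rules in xml_text.

    Assumption: a rule is unsupported if any <token> inside it has a
    postag attribute. Rules without an id attribute yield None here.
    """
    root = ET.fromstring(xml_text)
    supported, unsupported = [], []
    for rule in root.iter("rule"):
        uses_postag = any(tok.get("postag") is not None
                          for tok in rule.iter("token"))
        (unsupported if uses_postag else supported).append(rule.get("id"))
    return supported, unsupported


def coverage_line(covered, total):
    """Format a summary line like the first line of the output above."""
    return "%.3f %% of rules covered (%d/%d)" % (
        100.0 * covered / total, covered, total)


if __name__ == "__main__":
    ok, bad = split_rules(SAMPLE)
    print(coverage_line(len(ok), len(ok) + len(bad)))
    # Write the unsupported ids to a file, one per line.
    with open("unsupported.txt", "w") as out:
        out.write("\n".join(bad) + "\n")
```

With the numbers from the output above, coverage_line(587, 870) gives the same first line: 587 of 870 rules is 67.471 %.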

To be continued.

Tuesday, March 27, 2012

First entry

Hi, dear user! This is the first entry of my blog. A first post rarely says more than echo hello world, but I want to go a little further. So I should introduce myself and describe what I'm going to do this summer as a hacker.

Who Am I?
Firstly, my name is Daniel. I'm an open-source volunteer and a Ukrainian localizer. I live in a beautiful city called Lviv. I currently coordinate both the GNOME and LibreOffice Ukrainian localization. You can also find some statistics on my commits at ohloh.net.

What is the topic?
As a localizer, I understand that people make mistakes every day while typing. A spellchecker is useful for that. But when it comes to mistakes other than misspellings, there is no easy way to find them, especially when you don't know the rules. I hope I don't need to explain why it's useful to find and correct mistakes automatically. Just imagine: you have a huge document (e.g. 1000 pages) and you have to find as many mistakes as possible, as fast as you can. Is that possible without hurting your eyes? I hope so. At least I'll try to save your eyes. (:

Most of you use office suites to deal with data, texts, pictures, etc. As an open-source user, I use LibreOffice for all of that. I usually try to type fast to save time, and obviously I leave mistakes behind. I use tools like a spellchecker to detect them. However, I have realized that it doesn't help in all cases, for example with commas, spaces, etc.

What am I going to do?
You have probably guessed what I'm going to do. I'll work on Lightproof this summer. It's a very flexible and pluggable system for writing rules in Python. I mailed my mentor and he has already given me my easy hack task. I've also informed the mailing list about my plans. My mentor is Németh László, who created this beautiful tool.

Why Lightproof?
I have little to add beyond giving you the link. The first reason is official support from The Document Foundation. The second one is the language. I'm a bit new to programming, so I want to improve my skills. I like Python, while Java is too complicated for newbies and not as interactive as Python. So that's why.

What about LanguageTool?
It's a somewhat older and more powerful tool. The project uses XML rules wrapped in Java. It's nice and suitable, but I find it too slow, which I put down to the Java bytecode. So there is no easy way to port one tool to the other. Nevertheless, I hope to use some shared tools in the future.