Weblog for Costin Manolache: 02/01/2003

Tuesday, February 25, 2003

JMX console

One of the main benefits of JMX is the ability to "see" what happens at runtime and to tune the process. The admin interface is good, as it provides a nice interface - but for advanced use you need a low-level console. Each JMX tool has its own console - and that may be a bit confusing if you switch between them often.

An interesting fact is that most of the consoles can be used with other JMX implementations - with almost no effort. That's a pretty good proof of the benefits of low coupling.

The JBoss console is a webapp - you'll need to copy jboss-client.jar and log4j.jar into WEB-INF/lib, since it's not self-contained ( there are a few utils for logging ), but besides that you should be able to use it with any servlet container that has JMX support. It may be worth precompiling the JSPs.

MX4J doesn't depend on a servlet container - it uses an interesting ( but slower ) XSL and its own HTTP listener. All you need to do is load the mbeans. Commons-modeler provides a one-line mechanism - I'll comment on it later ( I'll do a few more enhancements and simplifications ).

JMX-RI has the fastest console - it also contains its own HTTP listener and is packaged as a few mbeans. I have no idea what's inside - so I usually prefer one of the other two.

Another note - in tomcat ( any version - if it uses jk2 and coyote ) you can just add "mx.port=PORT" to jk2.properties and the MX4J or JMX-RI console will be started. You need to copy one of the 2 jars into server/lib ( mx4j-tools.jar or jmxri-tools.jar ). If mx4j is used, it'll also try to enable the RMI connector. Most of the code will fail gracefully, without affecting the rest. The startup time overhead is quite small.

Saturday, February 22, 2003

Sending mail to the blog

Armed with Sam's short intro and the MT manuals - I started the second piece of my mail toy - weblog posting via email. Posting via xmlrpc is much easier than I expected - it took me a while to figure out what the blogid is, but after that everything was smooth.

I'm using the metaWeblog xml-rpc style - and I'll use the mt extensions to set the category.
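To make the xmlrpc call a bit more concrete, here is a minimal sketch in Python that only builds the request for metaWeblog.newPost and does not send anything. The blog id, user name, password and post text are made-up placeholders, and the helper name build_new_post_call is hypothetical.

```python
import xmlrpc.client


def build_new_post_call(blog_id, username, password, title, html_body, publish=True):
    """Serialize a metaWeblog.newPost call as XML-RPC request XML (nothing is sent)."""
    # metaWeblog posts are structs; "title" and "description" carry the entry.
    post = {"title": title, "description": html_body}
    params = (blog_id, username, password, post, publish)
    return xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")


# Placeholder values, for illustration only.
request_xml = build_new_post_call(
    "1", "user", "secret", "Hello", "<p>First post by mail</p>"
)

# Decode it again to check what would go over the wire.
decoded_params, decoded_method = xmlrpc.client.loads(request_xml)
```

To actually post, something like xmlrpc.client.ServerProxy(endpoint_url).metaWeblog.newPost(...) with the same arguments should work, where endpoint_url is whatever xmlrpc URL the weblog software exposes.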

One comment on Sam's "REST" alternative - I think an even better approach would be WebDAV, with the item body as plain HTML, with META tags used for the category and the extra (meta) info. This way all HTML editors would be able to edit and publish weblog entries without any modifications. WebDAV also allows editing existing entries. The "core" of a weblog entry is a piece of HTML - with extra metadata and with extra processing done to generate the weblog-style pages ( templates to generate rss, html in a specific layout, etc ).

The next item to implement is the wiki filter, so I can use pine to send wiki-style plain email and have it converted to html that I can upload via xmlrpc. The real hard part will be implementing the special "headers" for meta-info, and implementing the "special" posting forms ( comments to other postings and about web sites ). I want to just use the "mail this page" button, add the comments on top and send it to "myBlog@myDOMAIN.com".

Back from traffic school

Boring - but I got 8 hours to think about other things. Like where to put the extra information that is required to weblog by email. The logical place would be in headers - but most mailers don't make this easy, and it would be confusing to some people. In addition - if you do "send page" in a web browser, and you want to comment on the page ( a very nice way to enter this kind of weblog entry ) - you have an even less functional mailer.

Another option would be the recipient address - it can be myBlogPostinEmail+extra-infor@domain.com. Unfortunately - the "+" syntax is not supported by most domains, and it assumes you are in control of the mail system - which would work great for a "mail to weblog" provider, but not that well for individual use.

So - the remaining place is the top and bottom of the post. Each post will start with ( and be parsed for ) a set of "magic" headers - headers at the top of the body and at the end, after the "--" that marks the signature. Some mailers allow you to associate a signature with a particular address - that would work pretty well.

What meta-information is required ? First, a "key" - the preferred mechanism would be to sign the message, but again one of the targets for such a system is people who don't want all the technical complexity of a new application.

The other info is the "category", and also some address to send talkback to ( this can be extracted from Reply-To: if a mail-based "aggregator" is used to read the weblogs ).
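A rough sketch of how the "magic" headers could be pulled out of a plain-text body, in Python. The header names ( Key, Category, Talkback ) and the function name parse_post are just assumptions for illustration - nothing here is a fixed format.

```python
import re

# Hypothetical header names; only these are treated as "magic".
KNOWN_HEADERS = {"key", "category", "talkback"}
HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")


def _match_header(line):
    """Return (name, value) if the line is a known magic header, else None."""
    m = HEADER_RE.match(line)
    if m and m.group(1).lower() in KNOWN_HEADERS:
        return m.group(1).lower(), m.group(2).strip()
    return None


def parse_post(body):
    """Split a mail body into (meta dict, entry text)."""
    lines = body.splitlines()

    # Everything after the last "--" line is treated as the signature.
    sig_start = None
    for i, line in enumerate(lines):
        if line.strip() == "--":
            sig_start = i
    if sig_start is None:
        content, signature = lines, []
    else:
        content, signature = lines[:sig_start], lines[sig_start + 1:]

    meta = {}
    # Magic headers at the very top of the body.
    idx = 0
    while idx < len(content):
        header = _match_header(content[idx])
        if header is None:
            break
        meta[header[0]] = header[1]
        idx += 1

    # Magic headers inside the signature; headers at the top win.
    for line in signature:
        header = _match_header(line)
        if header is not None:
            meta.setdefault(header[0], header[1])

    return meta, "\n".join(content[idx:]).strip()


sample = """Category: java
Key: s3cret

JMX consoles are nice.
--
Costin
Talkback: me@example.com
"""
meta, text = parse_post(sample)
```

Restricting the parser to a known set of names avoids treating an ordinary first line that happens to contain a colon as a header.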

Is this too complicated ? I don't think so, people are used to starting their mails with a "Hi" and to many kinds of "forms". I think there are also "mail templates" that could help.

HTML mail would be more difficult to parse for that - but it's not impossible ( just need to look for the magic keywords delimited by < and > ).

Thursday, February 20, 2003

Wiki, Weblog and Pine

It seems I'm not the only one - James Strachan wants to use a WYSIWYG editor for wiki.

The main point of wiki is the ease of authoring. Well, learning a new set of rules ( slightly different for every wiki ) and editing it in an HTML form is not "easy" by my definition.

Some web browsers have a nice menu option - "Edit this page", and you are presented with a decent HTML editor. Of course, it generates crappy HTML, but it's easy to filter it out to the same level as wiki ( i.e. only simple tags with no style attributes ). And then - you can either "publish" ( again, widely available ) or just "mail" - so you can do your stuff offline.
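Filtering editor-generated HTML down to "wiki level" could look roughly like this Python sketch. The list of allowed tags is my own guess at what "simple tags" means, and the class and function names are hypothetical; the only attribute kept is href on links.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist of "simple" tags; adjust to taste.
ALLOWED = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li",
           "a", "pre", "h1", "h2", "h3"}
# Tags whose text content is dropped entirely.
DROP_CONTENT = {"script", "style"}


class SimpleHtmlFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip += 1
            return
        if tag not in ALLOWED:
            return  # unknown tag: drop the tag, keep its text
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.out.append('<a href="%s">' % escape(href, quote=True))
                return
        # All other attributes (style, class, ...) are thrown away.
        self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip = max(0, self.skip - 1)
        elif tag in ALLOWED and tag != "br":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))


def simplify(html_text):
    """Return html_text reduced to the allowed tags, without style attributes."""
    f = SimpleHtmlFilter()
    f.feed(html_text)
    f.close()
    return "".join(f.out)


cleaned = simplify('<p style="color:red"><font face="x">Hi <b class="k">there</b></font></p>')
```

This does not try to repair badly nested markup - it only strips what is not on the list.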

I see a lot of value in the wiki style - but _one_ style, chosen by the author and not by each site. And used when the author prefers to use a text editor ( a decent one - not a form ) - like pine or vi or emacs. Again - "mail this" is so trivial and available in so many editors.

As you have noticed - I am not happy with the current model for "aggregators" and "authoring" for weblogs. For exactly the same reason. Even if they create an intuitive and easy to use aggregator and authoring tool ( and I heard they're not ) - why should people have to learn and install another tool ?

In the last few days I thought about this - as my "mail aggregator" is growing and I am moving to the publishing side, I'm prioritising the list of features. Composing a weblog using wiki style from pine is very high - since I use pine a lot. I also use Evolution or Mozilla from time to time - so authoring weblogs in one of those and publishing it via a simple "send email" is the second priority.

Back to wiki - the fundamental idea is to be VERY simple. Using a familiar tool is simpler. It is harder to implement the wiki ( you need to enable webdav or some mail filters, better locking, convert ugly html to wiki or simple html, support multiple editors, etc ).

Tuesday, February 18, 2003

First problem with mail blogs

I was parsing the Cafe Con Leche rss feed. No explicit date - that is becoming usual. Only the day can be extracted from the link reference. The real problem - only "description" is available for items, with a short excerpt - and the link points to the day view, not the individual item.

Initially I was thinking of using a workaround for "description only" or "incomplete content" - i.e. grab the linked item. Unfortunately the link is not to the article, but to the page containing a list of articles and a lot of other markup and stuff.

Sunday, February 16, 2003

Mail aggregator

My mail aggregator is working fine - thanks Sam for the 2 pointers. The big problem is of course the "standard" XML/RSS that is used. Just like in almost all other places where XML is used - the benefit of a standard syntax is countered by the completely random and obfuscated ( and countless ) schemas and variations.

First problem: many feeds don't include the date - I added a few regexps to extract common patterns ( Updated: ..., Posted: ... ) from the content. If I can't get it - I'll assume the time of the collection - which would be wrong for older news, but it'll be close enough as I update.
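Something along these lines, as a Python sketch. The regexp and the list of date formats are only examples of what "common patterns" might be ( real feeds will need more of them ), and guess_item_date is a made-up name.

```python
import re
from datetime import datetime

# Text following "Updated:" or "Posted:", up to the next tag or end of line.
DATE_LABEL_RE = re.compile(r"\b(?:Updated|Posted):\s*([^<\n]+)", re.IGNORECASE)

# A few assumed date formats; extend as new feeds are added.
FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d", "%B %d, %Y", "%d %b %Y")


def guess_item_date(content, collected_at):
    """Return the date found in the content, or the collection time as a fallback."""
    m = DATE_LABEL_RE.search(content)
    if m:
        raw = m.group(1).strip()
        for fmt in FORMATS:
            try:
                return datetime.strptime(raw, fmt)
            except ValueError:
                continue  # try the next format
    return collected_at


collected = datetime(2003, 2, 16, 12, 0)
found = guess_item_date("<p>Posted: February 16, 2003</p>", collected)
missing = guess_item_date("<p>No date in here</p>", collected)
```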

Second problem: Since most of the time you can't tell if a "permalink" has been updated - I have to cache the MD5, so I get new mail when the link content changes.
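The MD5 cache is about as simple as it sounds. A minimal in-memory sketch in Python ( the class name is hypothetical, and a real aggregator would have to save the digests between runs ):

```python
import hashlib


class ChangeTracker:
    """Remember the MD5 of each link's content and report when it changes."""

    def __init__(self):
        self._digests = {}  # link -> hex digest of the last content seen

    def has_changed(self, link, content):
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        if self._digests.get(link) == digest:
            return False  # same content as last time: no new mail
        self._digests[link] = digest
        return True  # new link or updated content


tracker = ChangeTracker()
first = tracker.has_changed("http://example.com/item1", "<p>v1</p>")
again = tracker.has_changed("http://example.com/item1", "<p>v1</p>")
updated = tracker.has_changed("http://example.com/item1", "<p>v2</p>")
```

MD5 is used here only as a cheap fingerprint for change detection, not for anything security related.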

What's next ? One thing I always liked is multipart MIME - with HTML and images in the same message. I don't remember the details ( the links have special syntax ), but it shouldn't be difficult to fetch the images and save them with the entry, for full off-line reading. And the other side - using mail to update my own weblog.

I expect more tweaks as I read more weblogs - each weblog I add has its own (standard :-) XML style.

BTW, I (re)discovered the "trick" to include the images in the mail - the program will need to grab all the images, add their content as MIME parts, and use the "cid:" magic protocol in the links together with a Content-ID header in the parts. It shouldn't be very hard to code - but for now it's not a big priority: the mailer can get the images from the web when online, and only a few weblogs have images.
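A sketch of the "cid:" trick using Python's standard email package. It assumes the images were already fetched into a dict keyed by URL; the function name embed_images, the simple img regexp and the "blog.invalid" domain used for the ids are all made up for the example.

```python
import re
from email.message import EmailMessage
from email.utils import make_msgid

# Naive matcher for <img ... src="..."> with double-quoted URLs only.
IMG_SRC_RE = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def embed_images(subject, html, fetched):
    """Build a mail where fetched images travel as MIME parts.

    fetched maps image URL -> (raw bytes, image subtype such as "png").
    """
    cids = {}  # URL -> Content-ID value, including the angle brackets

    def to_cid(match):
        url = match.group(2)
        if url not in fetched:
            return match.group(0)  # not fetched: leave the web link alone
        if url not in cids:
            cids[url] = make_msgid(domain="blog.invalid")
        # The link uses the id without the surrounding < >.
        return match.group(1) + "cid:" + cids[url][1:-1] + match.group(3)

    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(IMG_SRC_RE.sub(to_cid, html), subtype="html")
    for url, cid in cids.items():
        data, subtype = fetched[url]
        # The first add_related() call turns the message into multipart/related.
        msg.add_related(data, maintype="image", subtype=subtype, cid=cid)
    return msg


mail = embed_images(
    "Entry with picture",
    '<p><img src="http://example.com/a.png"> text</p>',
    {"http://example.com/a.png": (b"\x89PNG fake bytes", "png")},
)
```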

Wednesday, February 12, 2003

mail and weblog (3)

Finally, reading weblogs starts to become easy. I modified blagg - actually rewrote it in python, I'm clearly out of touch with perl. I generate the files in maildir format, then I use KMail to read them. I usually use pine, but pine requires some patches to support maildir and it's not that nice with HTML.

There are many details to sort out - I can parse only 2-3 .rss formats so far, but the benefit is huge. I need to get the comments and fix the headers so I can see them as threads, and I'd like to do a simple scan on the content and get the images - and generate the multipart MIME message.

The big one is getting the category and sorting in folders - but that can be done in procmail. I'm trying to preserve all the RSS data as mail headers - so procmail can do its job.
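This is not my actual script, just a minimal Python sketch of the idea: one RSS item becomes one HTML mail in a maildir, with the RSS fields copied into headers that procmail could match on. The X-RSS-* header names, the item dict layout and the sender address are assumptions.

```python
import mailbox
from email.message import EmailMessage
from email.utils import formataddr


def item_to_message(feed_name, item):
    """Turn one RSS item (a dict) into an HTML mail with the RSS data in headers."""
    msg = EmailMessage()
    # Placeholder address; the feed name is what shows up in the mailer.
    msg["From"] = formataddr((feed_name, "rss@example.invalid"))
    msg["Subject"] = item.get("title", "(no title)")
    # Hypothetical header names, kept so procmail can sort into folders.
    if item.get("link"):
        msg["X-RSS-Link"] = item["link"]
    if item.get("category"):
        msg["X-RSS-Category"] = item["category"]
    msg.set_content(item.get("description", ""), subtype="html")
    return msg


def store_item(maildir_path, feed_name, item):
    """Append the item to a maildir (created if missing) and return its key."""
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        return box.add(item_to_message(feed_name, item))
    finally:
        box.close()
```

A procmail rule could then file on the category header, for example by matching lines that start with X-RSS-Category.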

In any case - even with the ugly hacky script I'm using, KNode ( or pine or Evolution or mozilla - after I get procmail I can just push the items in regular imap ) is so much better than any of the aggregators I've seen so far. Sorting, offline read, filtering, organizing data and moving interesting items in the same folders I use for interesting mails - and above all, the "familiar" feeling and having all the keys in my fingers...

The other direction is also interesting - editing HTML mail in any mailer ( offline or not ) and then having it published as a weblog entry.

Soap over SSL without signed certs

After wasting an hour on a supposedly trivial configuration - I got axis1.0 to connect with the SSL server without the stupid certificate signature that is required by the Java SSL client.

First step is to set "axis.socketSecureFactory" system property to "org.apache.axis.components.net.FakeTrustSocketFactory".

Second step - it's a workaround. The code that creates the SSL socket has an if() that will use the fake TrustFactory only if an attribute is set. So you need to define a handler with a dummy attribute, and set the transport to that handler. Something like:



 <handler name="httpHandler" type="java:org.apache.axis.transport.http.HTTPSender">
   <option name="dummy" value="foo"/>
 </handler>
 <transport name="http" pivot="httpHandler" />

I couldn't find this on any search engine - very strange, you would expect this to be used more...

Tuesday, February 11, 2003

mail blog (2)

Much better... I made some modifications to blagg to send the content as text/html, use the feed name as Sender and the feed title as subject. My perl is way too rusty, I may write something similar in python and then get more info into headers and play with procmail.

Mail blog

Sam Ruby added mail-based comments to his weblog. This allows people to send comments using email. The other missing half is to allow people to read the weblog using email - a mini mail list, and a small form allowing people to subscribe and get each entry and the comments by email.

So far the news-based aggregator seems the most intuitive and least painful for me. Since it's very unlikely people will start adding the "mail" publishing to their site - the only option is to modify one of the scripts ( blagg ? ) to get the RDF and send mails. Then procmail or mail filters could sort them - and finally I can read them without pain...

Apparently - there is already an email plugin. Let's see how this works...

Sunday, February 09, 2003

OS problems

At least I'm not the only one... Ovidiu seems to have OS X problems too. Last week I did think about getting a mac - where everything just works and you only have to click ( and pay some extra $ ). Then I realized that I learned a lot by just trying to fix my linux box - even if I didn't succeed in the end. And besides frustration, I did get some fun...

Got my linux back

I just did a fresh install ( redhat 8 ), and kept the gentoo partition. The "java zombies" problem proved too difficult for me - and showed me once again how limited my knowledge is ( I had this a lot in the last weeks... ).

I did a backup on a separate disk and I'll try to chroot and investigate more - I don't give up easily. I'm sure I'll see this again...

To recap: under load I got java "defunct processes" ( zombies ). I suspect this is related to the garbage collection and memory management, and of course a certain combination of libraries. I did try different kernels and glibc - including the exact same combination that works without problem on my other computer. Right now the only thing I can do is get the sources ( I remember they were available under scsl ) and see what's happening. A "clean" system works just fine, it is something I did or installed that created the problem - and no amount of reinstalling or checking libraries solved that.

I'm usually very aware of my limits and how much I don't know - but the last few weeks make me wonder....

Blog reader distribution

Interesting link about blog power distribution. Found it on Cafe au Lait/Leche, one of my daily reads.

I'm interested in the relation and shifts between weblogs and mail lists/news groups. This confirms that for most people like me ( without much writing talent or more private ) the mail lists remain the best way to communicate their thoughts ( if they want to be heard ). The weblog is great for organizing ideas and for communication in small circles. I'll probably post more on this topic, I have a list ( that gets longer every day ) comparing the 2 mediums. My conclusion so far is that weblogs need more topic-related aggregation and subject-based mirroring into mail/news.

As I find more and more interesting people I find it harder and harder to read their postings - switching between so many subjects and mazes of links is becoming painful.

Friday, February 07, 2003

linux and java

Frustrating... I can't get java to work reliably on my linux. Almost every time it does a GC, it creates a zombie thread - and after a day or 2 it crashes ( after filling the process table ). I usually restart the java process often - so I didn't notice it - until I started using idea and jboss MX ( which allows you to replace libs without restarting the VM - a feature I'm trying to make work with tomcat ).

I tried changing the libc and kernel - no effect. My current assumption is that it's related to glibc2.2.93 ( that ships with RedHat8 ). Gentoo glibc has the same problem ( 2.3.1 ). I'll try to downgrade my system to glibc2.2.5 or earlier - I run a lot of long-running tests on redhat7.2 and never had this problem.

Sunday, February 02, 2003

JMX servlets

JSR77 and tomcat support JMX for servlets ( and many other components ). This works by exposing the wrapper - and providing all kinds of information. Extending this to the real servlet ( the code written by the user ) would move the servlet into the current century, and is quite easy to do.

Servlets are configured using the ServletConfig passed at init time. We could write a simple module ( mbean ) that listens for JSR77 mbean registrations. When j2eeType=Servlet is registered, it can look for the "real" servlet and check if it is an mbean ( implements the *MBean interface, etc ). If so - it will just configure the servlet using JMX patterns ( including the init()/start() lifecycle pattern ).

The JMX console would show what happens inside the servlet - and allow runtime tuning and reconfiguration of the servlet. Whenever we support the JMX persistence - we can extend this to the servlet and context init params.

Another extension would be to support ant-like patterns - a servlet ( that implements the single thread model ) can automatically get the request parameters transformed into setter calls, like an ant task, and then an execute() method to perform the request. This can be supported with a base class - and would allow a number of optimizations and, more important, will greatly increase the readability of the servlet code.
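The pattern itself is independent of the servlet API, so here is a tiny sketch of it in Python rather than Java, only to keep it short. ParamTask, Greeter and the set_ prefix are invented for the example; a real version would be a servlet base class using reflection on setXxx() methods.

```python
class ParamTask:
    """Base class: request parameters become setter calls, then execute() runs."""

    def handle(self, params):
        # One instance serves one request at a time (single thread model),
        # so keeping the parameters as instance state is safe.
        for name, value in params.items():
            setter = getattr(self, "set_" + name, None)
            if callable(setter):
                setter(value)  # unknown parameters are silently ignored
        return self.execute()

    def execute(self):
        raise NotImplementedError


class Greeter(ParamTask):
    """Example 'servlet': the code only deals with typed, named properties."""

    def __init__(self):
        self.name = "world"
        self.times = 1

    def set_name(self, value):
        self.name = value

    def set_times(self, value):
        self.times = int(value)  # the setter does the string conversion

    def execute(self):
        return " ".join(["hello " + self.name] * self.times)


response = Greeter().handle({"name": "ant", "times": "2", "bogus": "x"})
```

The readability gain is that execute() never touches the raw request - parsing and conversion live in the setters.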

Saturday, February 01, 2003

Gentoo (and gump)

Before leaving this morning I started the "install" of gentoo linux. It is now compiling gcc - I expect it'll take a few more days until it's done. I find it amazing - it is a sort of gigantic gump, building the entire linux distribution from sources.

Looking at .ebuild files - they are remarkably clean. I think the biggest problem in gump is mixing the HTML generation in the build files. I wish I had more time... The other obvious difference is that it builds from "stable" - it seems it can also build from HEAD, but the stable option makes it more than a build tool.

An interesting idea would be generating .ebuild files from gump - or changing gump to use an .ebuild style. By default they have ant and many other apache packages - but tomcat doesn't seem to be included ( yet ). Update: it is included, but only 4.0.x. And the ebuild gets the binary - it doesn't compile.

Being able to build a program gives you the sentiment that it is easy to contribute and make changes and fix things. That's the main reason I like gump - I can make quick changes to various projects - and then get things compiled without the usual headaches. It just works.

Many years ago - I used to build everything. Then I started to get lazy and use binaries - and only now I realize how much I missed it. You can compile SRPMs - but they rely on other binaries or packages, and in many cases it is beyond complex ( just try to compile kde3.1 SRPMs from rawhide on redhat8 ). It seems maven and centipede are closer to the srpm model ( which is not completely bad - simple builds will be easier ).
