Friday, March 07, 2003

RSS for comments and ids

I got Sam's comment feed - and it immediately broke my aggregator. The problem is quite simple - I use the link of the item as a key, and in the comment feed all comments use the same key as the article. And the same title.

This changes the entire data model - for each link I'll have to store a list of MD5s, and treat it as a multi-value. The first time ( if no MD5 is found ) it'll be the "source" message, and all other occurrences - including edits of the original - can be treated as "Re: " - to implement the threading in the mail reader.

Probably I'll just go for the simplest solution - and keep a .db keyed by MD5 (that should be unique enough ) - the whole idea is to avoid sending the same item multiple times.
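A minimal sketch of the "keep a .db keyed by MD5" idea, in Python. The function names, the choice of fields that go into the digest, and the example URL are assumptions for illustration, not the actual aggregator code.

```python
import dbm
import hashlib
import os
import tempfile


def item_digest(title, link, body):
    """Return the MD5 hex digest that identifies one feed item."""
    joined = "\n".join([title, link, body])
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def is_new_item(db, title, link, body):
    """Record the item and return True only the first time it is seen."""
    key = item_digest(title, link, body).encode("ascii")
    if key in db:
        return False
    db[key] = link.encode("utf-8")
    return True


# Two comments share the link and title of the article but differ in body.
db_path = os.path.join(tempfile.mkdtemp(), "seen")
with dbm.open(db_path, "c") as seen:
    first = is_new_item(seen, "Post", "http://example.org/1247.html", "comment A")
    second = is_new_item(seen, "Post", "http://example.org/1247.html", "comment B")
    repeat = is_new_item(seen, "Post", "http://example.org/1247.html", "comment A")
```

Because the body is part of the digest, comments that reuse the article's link and title are still treated as separate items, and only an exact repeat is skipped.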

Once again - the RSS proves to be almost completely useless.

Update: Sam changed the links. There are a few problems. Some of his links are quite strange: .../blog/('1247.html#c1047152121',) - I suppose a small bug sneaked in.

Worse - I have now lost the way to relate comments to postings. I'll have to parse the generated links and remove the ending to guess the original posting. Sam - please change back, I fixed my code to deal with the old problem and I think the workaround for the new problems is harder :-)
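A sketch of the "remove the ending" workaround, assuming the comment id is carried as the URL fragment ( the part after # ), as in the links above. The function name and the example.org URL are made up for the example.

```python
from urllib.parse import urldefrag


def split_comment_link(link):
    """Split a comment link into (posting_url, comment_id).

    Assumes the comment id is the URL fragment, e.g.
    ".../1247.html#c1047152121"; comment_id is "" for a plain posting.
    """
    posting_url, fragment = urldefrag(link)
    return posting_url, fragment


posting, comment_id = split_comment_link(
    "http://example.org/blog/1247.html#c1047152121"
)
```

The posting URL can then serve as the thread key, and the fragment as the id of the individual comment.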

A better solution would be to just add a separate tag with the comment id. Of course, that would be in the RSS-2003-03-08 "standard", and nobody else will use the same tag name. ( we should use the date of the "standard du jour" )

Outlook mail aggregator

It seems there are other people who prefer the mail reader - NewsGator is specific to Outlook, so I won't pay them $29.

mail aggregator code

I cleaned up a bit and uploaded my mail aggregator. Getting rid of the pickle was a great move - db seems good enough.

For now I use simple name-value mappings and several db files - it's easier to debug and get other tools to use the same data. I was thinking of a real database - but that would make it more complex and I doubt it'll scale or work better.

I doubt too many people will find it useful - when they have all the fine 3-panel graphical aggregators. I just use it to read weblogs on the train, using my old "pine" and sometimes mozilla/evolution/kmail ( when I feel a need for a real nice GUI - and go back to pine when I want to read mail ).

HelloWorld - finally !

Last night I got the first "hello world" page from tomcat5/JMX. No server.xml, no fat, only a small set of mbeans driven by ant. Things were very close for a while but never quite worked completely. There is still a lot of work, mostly downhill IMO.

I can't wait to finish - I'm in overload mode for some time, doing too many things at once.

Saturday, March 01, 2003

JMX clustering

I don't like session clustering for a lot of reasons. Clustering in general is great - but the servlet session API just doesn't have enough power to allow a correct and efficient use of clustering. The use case for caching some values is compromised, there is no transaction support, no feedback or control - users are much better served by assuming the session is transient, and using database or user-controlled clustering.



By clustering I understand broadcasting and mirroring some data on multiple instances - there are many other meanings. It is a very powerful tool if used right.



There is one use of clustering that would be extremely interesting - broadcasting and clustering the JMX attributes. The idea is to have a global view - each tomcat instance could broadcast its mbeans and attributes, and may be able to take more intelligent decisions knowing the state of the other instances. And the admin console would get info from all instances.

Tuesday, February 25, 2003

JMX console

One of the main benefits of JMX is the ability to "see" what happens at runtime and to tune the process. The admin interface is good, as it provides a nice interface - but for advanced use you need a low level console. Each JMX tool has its own console - and that may be a bit confusing if you switch them often.



An interesting fact is that most of the consoles can be used with other JMX implementations - with almost no effort. That's a pretty good proof of the benefits of low coupling.



The JBoss console is a webapp - you'll need to copy jboss-client.jar and log4j.jar into WEB-INF/lib, since it's not self-contained ( there are a few utils for logging ), but besides that you should be able to use it with any servlet container that has JMX support. It may be worth precompiling the jsps.



MX4J doesn't depend on a servlet container - it uses an interesting ( but slower ) XSL and its own HTTP listener. All you need to do is load the mbeans. Commons-modeler provides a one-line mechanism - I'll comment on it later ( I'll do a few more enhancements and simplifications ).



JMX-RI has the fastest console - it also contains its own HTTP listener and is packed as a few mbeans. I have no idea what's inside - so I usually prefer one of the other two.



Another note - in tomcat ( any version - if it uses jk2 and coyote ) you can just add a "mx.port=PORT" in jk2.properties and the MX4J or JMX-RI console will be started. You need to copy one of the 2 into server/lib ( mx4j-tools.jar or jmxri-tools.jar ). If mx4j is used, it'll also try to enable the RMI connector. Most of the code will fail gracefully, without affecting the rest. The startup time overhead is quite small.

Saturday, February 22, 2003

Sending mail to the blog

Armed with Sam's short intro and the MT manuals - I started the second piece of my mail toy - weblog posting via email. Posting via xmlrpc is much easier than I expected - it took me a while to figure out what the blogid is, but after that everything was smooth.

I'm using the metaWeblog xml-rpc style - and I'll use the mt extensions to set the category.
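For reference, a sketch of what a metaWeblog.newPost call looks like from Python. It only builds the request body, so it runs without a server; the blogid, user, password and endpoint URL are placeholders, and the category struct in the comment is my reading of the mt extension, so treat it as an assumption.

```python
import xmlrpc.client


def build_new_post_call(blogid, username, password, title, html, publish=True):
    """Return the XML-RPC request body for metaWeblog.newPost."""
    entry = {"title": title, "description": html}
    params = (blogid, username, password, entry, publish)
    return xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")


request_xml = build_new_post_call("1", "me", "secret", "Hello", "<p>Hi</p>")

# Sending it for real would look roughly like this (endpoint is hypothetical):
#   server = xmlrpc.client.ServerProxy("http://example.org/mt/mt-xmlrpc.cgi")
#   post_id = server.metaWeblog.newPost("1", "me", "secret", entry, True)
#   server.mt.setPostCategories(post_id, "me", "secret", [{"categoryId": "3"}])
```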



One comment on Sam's "REST" alternative - I think an even better approach would be WebDAV, with the item body as plain HTML, with META tags used for the category and the extra (meta) info. This way all HTML editors would be able to edit and publish weblog entries without any modifications. WebDAV also allows editing existing entries. The "core" of a weblog entry is a piece of HTML - with extra metadata and with extra processing done to generate the weblog-style pages ( templates to generate rss, html in a specific layout, etc ).



The next item to implement is the wiki filter, so I can use pine to send wiki-style plain email and have it converted to html that I can upload via xmlrpc. The real hard part will be implementing the special "headers" for meta-info, and implementing the "special" posting forms ( comments to other postings and about web sites ). I want to just use the "mail this page" button, add the comments on top and send it to "myBlog@myDOMAIN.com".

Back from traffic school

Boring - but I got 8 hours to think about other things. Like where to put the extra information that is required to weblog by email. The logical place would be in headers - but most mailers don't make this easy, and it would be confusing to some people. In addition - if you do "send page" in a web browser, and you want to comment on the page ( a very nice way to enter this kind of weblogs ) - you have an even less functional mailer.



Another option would be the recipient address - it can be myBlogPostinEmail+extra-infor@domain.com. Unfortunately - the "+" syntax is not supported by most domains, and it assumes you are in control of the mail system - which would work great for a "mail to weblog" provider, but not that well for individual use.



So - the remaining place is the top and bottom of the post. Each post will start with ( and be parsed for ) a set of "magic" headers - headers inside the body and at the end, after the "--" that indicates the signature. Some mailers allow associating a signature with a particular address - that would work pretty well.



What meta-information is required ? First, a "key" - the preferred mechanism would be to sign the message, but again one of the targets for such a system is people who don't want all the technical complexity of a new application.

The other info is the "category", and also some address to send talkback to ( this can be extracted from Reply To: if a mail-based "aggregator" is used to read the weblogs ).
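A rough sketch of parsing such in-body "magic" headers. The header names ( Category, Key, Talkback ) and the rule that headers at the top win over the ones in the signature are assumptions for the example, nothing is decided yet.

```python
import re

HEADER_LINE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.*)$")


def _read_headers(lines):
    """Collect leading 'Name: value' lines; return (headers, remaining lines)."""
    headers = {}
    index = 0
    while index < len(lines):
        match = HEADER_LINE.match(lines[index])
        if not match:
            break
        headers[match.group(1).lower()] = match.group(2).strip()
        index += 1
    return headers, lines[index:]


def parse_post(body):
    """Return (meta, text) from a mail body with in-body headers."""
    lines = body.splitlines()
    signature = []
    # rstrip() so that "--" and "-- " both count as the signature marker
    marks = [i for i, line in enumerate(lines) if line.rstrip() == "--"]
    if marks:
        signature = lines[marks[-1] + 1:]
        lines = lines[:marks[-1]]
    meta, rest = _read_headers(lines)
    sig_meta, _ = _read_headers(signature)
    for name, value in sig_meta.items():
        meta.setdefault(name, value)  # headers at the top take precedence
    return meta, "\n".join(rest).strip()


meta, text = parse_post(
    "Category: java\nKey: s3cret\n\nHello world.\n-- \nTalkback: me@example.org\n"
)
```

One known weakness: a post whose first line happens to look like "Word: something" would be swallowed as a header, so a real version probably needs a fixed list of accepted names.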



Is this too complicated ? I don't think so, people are used to starting mails with a "Hi" and with many kinds of "forms". I think there are also "mail templates" that could help.



HTML mail would be more difficult to parse for that - but it's not impossible ( just need to look for the magic keywords delimited by < and > ).

Thursday, February 20, 2003

Wiki, Weblog and Pine

It seems I'm not the only one - James Strachan wants to use a WYSIWYG editor for wiki.



The main point of wiki is the ease of authoring. Well, learning a new set of rules ( slightly different for every wiki ) and editing it in an HTML form is not "easy" by my definition.



Some web browsers have a nice menu option - "Edit this page", and you are presented with a decent HTML editor. Of course, it generates crappy HTML, but it's easy to filter it out to the same level as wiki ( i.e. only simple tags with no style attributes ). And then - you can either "publish" ( again, widely available ) or just "mail" - so you can do your stuff offline.



I see a lot of value in the wiki style - but _one_ style, chosen by the author and not by each site. And used when the author prefers to use a text editor ( a decent one - not a form ) - like pine or vi or emacs. Again - "mail this" is so trivial and available in so many editors.



As you have noticed - I am not happy with the current model for "aggregators" and "authoring" for weblogs. For exactly the same reason. Even if they will create an intuitive and easy to use aggregator and authoring tool ( and I heard they're not ) - why should people have to learn and install another tool ?



The last few days I thought about this - as my "mail aggregator" is growing and I am moving to the publishing side, I'm prioritising the list of features. Composing a weblog using wiki style from pine is very high - since I use pine a lot. I also use Evolution or Mozilla from time to time - so authoring weblogs in one of those and publishing it via a simple "send email" is the second priority.



Back to wiki - the fundamental idea is to be VERY simple. Using a familiar tool is simpler. It is harder to implement the wiki ( you need to enable webdav or some mail filters, better locking, convert ugly html to wiki or simple html, support multiple editors, etc ).

Tuesday, February 18, 2003

First problem with mail blogs

I was parsing the Cafe Con Leche rss feed. No explicit date - that is becoming usual. Only the day can be extracted from the link reference. The real problem - only "description" is available for items, with a short excerpt - and the link points to the day view, not the individual item.



Initially I was thinking to use a workaround for "description only" or "incomplete content" - i.e. grab the linked item. Unfortunately the link is not to the article, but to the page containing a list of articles and a lot of other markup and stuff.

Sunday, February 16, 2003

Mail aggregator

My mail aggregator is working fine - thanks Sam for the 2 pointers. The big problem is of course the "standard" XML/RSS that is used. Just like in almost all other places where XML is used - the benefit of a standard syntax is countered by the completely random and obfuscated ( and countless ) schemas and variations.



First problem: many feeds don't include the date - I added a few regexps to extract common patterns ( Updated: ..., Posted: ... ) from the content. If I can't get it - I'll assume the time of the collection - which would be wrong for older news, but it'll be close enough as I update.
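A sketch of the date fallback. The single regexp and the assumption that the date after "Updated:" or "Posted:" is in the usual mail ( RFC 2822 ) format are simplifications for the example; real feeds need more patterns.

```python
import re
import time
from email.utils import parsedate_to_datetime

DATE_HINT = re.compile(r"(?:Updated|Posted):\s*([^<\n]+)", re.IGNORECASE)


def item_timestamp(content, collected_at=None):
    """Return a Unix timestamp for the item, falling back to collection time."""
    if collected_at is None:
        collected_at = time.time()
    match = DATE_HINT.search(content)
    if match:
        try:
            return parsedate_to_datetime(match.group(1).strip()).timestamp()
        except (TypeError, ValueError):
            pass  # unrecognised date format: fall through to collection time
    return collected_at


stamp = item_timestamp("<p>Posted: Fri, 07 Mar 2003 10:00:00 +0000</p>")
```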



Second problem: Since most of the time you can't tell if a "permalink" has been updated - I have to cache the MD5, so I get new mail when the link content changes.
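A sketch of that check, with a plain dict standing in for the cache ( the function name and the return values are invented for the example ):

```python
import hashlib


def classify(cache, link, content):
    """Return 'new', 'changed' or 'same' for a link and update the cache."""
    digest = hashlib.md5(content.encode("utf-8")).hexdigest()
    previous = cache.get(link)
    cache[link] = digest
    if previous is None:
        return "new"
    return "same" if previous == digest else "changed"


cache = {}
states = [
    classify(cache, "http://example.org/1", "first version"),
    classify(cache, "http://example.org/1", "first version"),
    classify(cache, "http://example.org/1", "edited version"),
]
```

Only "new" and "changed" would generate a mail; "same" is silently dropped.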



What's next ? One thing I always liked is the multipart mime - with HTML and images in the same message. I don't remember the details ( the links have special syntax ), but it shouldn't be difficult to fetch the images and save them with the entry, for full off-line reading. And the other side - using mail to update my own weblog.



I expect more tweaks as I read more weblogs - each weblog I add has its own (standard :-) XML style.



BTW, I (re)discovered the "trick" to include the images in the mail - the program will need to grab all the images, add the content as mime parts and use the "cid:" magic protocol, with a Content-ID header in the parts. It shouldn't be very hard to code - but for now it's not a big priority, the mailer can get the images from the web when online and only a few weblogs have images.
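A minimal sketch of the cid: trick using Python's standard email package. The function name, the "{cid}" placeholder convention and the example.org domain are assumptions for the example; fetching the images from the web is left out.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_entry_mail(subject, html_template, image_bytes, subtype="png"):
    """Build an HTML mail whose <img> points at an attached image via cid:.

    html_template must contain one "{cid}" placeholder for the image source
    and no other curly braces.
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content("This entry needs an HTML-capable mail reader.")
    cid = make_msgid(domain="example.org")  # looks like "<...@example.org>"
    # The cid: URL uses the Content-ID without the angle brackets.
    msg.add_alternative(html_template.format(cid=cid[1:-1]), subtype="html")
    html_part = msg.get_payload()[1]
    html_part.add_related(image_bytes, maintype="image", subtype=subtype, cid=cid)
    return msg, cid


mail, cid = build_entry_mail(
    "Test", '<p><img src="cid:{cid}"></p>', b"\x89PNG fake bytes"
)
```

The result is a multipart/alternative message whose HTML half is a multipart/related container holding the page and the image.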

Wednesday, February 12, 2003

mail and weblog (3)

Finally, reading weblogs starts to become easy. I modified blagg - actually rewrote it in python, I'm clearly out of touch with perl. I generate the files in maildir format, then I use KMail to read it. I usually use pine, but pine requires some patches to support maildir and it's not that nice with HTML.
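Writing a maildir by hand is not hard, but Python's standard mailbox module already does it. A sketch, with made-up sender, subject and folder names:

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def deliver(maildir_path, sender, subject, html):
    """Store one feed item as an HTML message in a maildir; return its key."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(html, subtype="html")
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        return box.add(msg)  # lands in the "new" subdirectory
    finally:
        box.close()


inbox = os.path.join(tempfile.mkdtemp(), "weblogs")
key = deliver(inbox, "Sam Ruby <feed@example.org>", "RSS for comments", "<p>Hello</p>")
```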



There are many details to sort out - I can parse only 2-3 .rss formats so far, but the benefit is huge. I need to get the comments and fix the headers so I can see them as threads, and I'd like to do a simple scan on the content and get the images - and generate the multipart MIME message.



The big one is getting the category and sorting in folders - but that can be done in procmail. I'm trying to preserve all the RSS data as mail headers - so procmail can do its job.
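A sketch of carrying the RSS fields as mail headers. The X-RSS-* prefix is something I made up for the example, not a standard, and the item is assumed to be already parsed into a plain dict.

```python
from email.message import EmailMessage


def item_to_mail(feed_address, item):
    """Turn a parsed RSS item (a plain dict here) into a mail message."""
    msg = EmailMessage()
    msg["From"] = feed_address
    msg["Subject"] = item.get("title", "(no title)")
    for field in ("link", "category", "guid"):
        if item.get(field):
            # X-RSS-* is a made-up header prefix, not a standard.
            msg["X-RSS-" + field.capitalize()] = item[field]
    msg.set_content(item.get("description", ""), subtype="html")
    return msg


mail_item = item_to_mail(
    "Sam Ruby <feed@example.org>",
    {
        "title": "RSS for comments",
        "link": "http://example.org/blog/1247.html",
        "category": "java",
        "description": "<p>Hello</p>",
    },
)
```

A procmail rule ( or any mail filter ) can then match on the X-RSS-Category header and file the message into the right folder.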



In any case - even with the ugly hacky script I'm using, KNode ( or pine or Evolution or mozilla - after I get procmail I can just push the items in regular imap ) is so much better than any of the aggregators I've seen so far. Sorting, offline read, filtering, organizing data and moving interesting items in the same folders I use for interesting mails - and above all, the "familiar" feeling and having all the keys in my fingers...



The other direction is also interesting - editing HTML mail in any mailer ( offline or not ) and then having it published as a weblog entry.

Soap over SSL without signed certs

After wasting an hour on a supposedly trivial configuration - I got axis1.0 to connect with the SSL server without the stupid certificate signature that is required by the Java SSL client.



First step is to set "axis.socketSecureFactory" system property to "org.apache.axis.components.net.FakeTrustSocketFactory".



Second step - it's a workaround. The code that creates the SSL socket has an if() that will use the fake TrustFactory only if an attribute is set. So you need to define a handler with a dummy attribute, and set the transport to that handler. Something like:





<handler name="httpHandler" type="java:org.apache.axis.transport.http.HTTPSender">
  <option name="dummy" value="foo"/>
</handler>
<transport name="http" pivot="httpHandler" />





I couldn't find this on any search engine - very strange, you would expect this to be used more...

Tuesday, February 11, 2003

mail blog (2)

Much better... I made some modifications to blagg to send the content as text/html, use the feed name as Sender and the feed title as subject. My perl is way too rusty, I may write something similar in python and then get more info into headers and play with procmail.

Mail blog

Sam Ruby added mail based comments to his weblog. This allows people to send comments using email. The other missing half is to allow people to read the weblog using email - a mini mailing list, and a small form allowing people to subscribe and get each entry and the comments by email.



So far the news-based aggregator seems the most intuitive and least painful for me. Since it's very unlikely people will start adding the "mail" publishing to their site - the only option is to modify one of the scripts ( blagg ? ) to get the RDF and send mails. Then procmail or mail filters could sort them - and finally I can read them without pain...



Apparently - there is already an email plugin. Let's see how this works...

Sunday, February 09, 2003

OS problems

At least I'm not the only one... Ovidiu seems to have OS X problems too. Last week I did think about getting a mac - where everything just works and you only have to click ( and pay some extra $ ). Then I realized that I learned a lot by just trying to fix my linux box - even if I didn't succeed in the end. And besides frustration, I did get some fun...

Got my linux back

I just did a fresh install ( redhat 8 ), and kept the gentoo partition. The "java zombies" problem proved too difficult for me - and showed me once again how limited my knowledge is ( I had this a lot in the last weeks... ).

I did a backup on a separate disk and I'll try to chroot and investigate more - I don't give up easily. I'm sure I'll see this again...



To recap: under load I got java "defunct processes" ( zombies ). I suspect this is related to the garbage collection and memory management, and of course a certain combination of libraries. I did try different kernels and glibc - including the exact same combination that works without problem on my other computer. Right now the only thing I can do is get the sources ( I remember they were available under scsl ) and see what's happening. A "clean" system works just fine, it is something I did or installed that created the problem - and no amount of reinstalling or checking libraries solved that.



I'm usually very aware of my limits and how much I don't know - but the last few weeks make me wonder....

Blog reader distribution

Interesting link about blog power distribution. Found it on Cafe au Lait/Leche, one of my daily reads.



I'm interested in the relation and shifts between weblogs and mail lists/news groups. This confirms that for most people like me ( without much writing talent or more private ) the mail lists remain the best way to communicate their thoughts ( if they want to be heard ). The weblog is great for organizing ideas and for communication in small circles. I'll probably post more on this topic, I have a list ( that gets longer every day ) comparing the 2 mediums. My conclusion so far is that weblog needs more topic-related aggregation and subject-based mirroring into mail/news.



As I find more and more interesting people I find it harder and harder to read their postings - switching between so many subjects and mazes of links is becoming painful.

Friday, February 07, 2003

linux and java

Frustrating... I can't get java to work reliably on my linux. Almost every time it does a GC, it creates a zombie thread - and after a day or 2 it crashes ( after filling the process table ). I usually restart the java process often - so I didn't notice it - until I started using idea and jboss MX ( which allows you to replace libs without restarting the VM - a feature I'm trying to make work with tomcat ).



I tried changing the libc and kernel - no effect. My current assumption is that it's related to glibc2.2.93 ( that ships with RedHat8 ). Gentoo glibc has the same problem ( 2.3.1 ). I'll try to downgrade my system to glibc2.2.5 or earlier - I run a lot of long-running tests on redhat7.2 and never had this problem.

Sunday, February 02, 2003

JMX servlets

JSR77 and tomcat support JMX for servlets ( and many other components ). This works by exposing the wrapper - and providing all kinds of information. Extending this to the real servlet ( the code written by the user ) would move the servlet into the current century, and is quite easy to do.



Servlets are configured using the ServletConfig passed at init time. We could write a simple module ( mbean ) that listens for JSR77 mbean registrations. When j2eeType=Servlet is registered, it can look for the "real" servlet and check if it is an mbean ( implements the *MBean interface, etc ). If so - it will just configure the servlet using JMX patterns ( including init()/start() lifecycle pattern ).



The JMX console would show what happens inside the servlet - and allow runtime tuning and reconfiguration of the servlet. Whenever we support the JMX persistence - we can extend this to the servlet and context init params.



Another extension would be to support ant-like patterns - a servlet ( that implements single thread model ) can automatically get the request parameters transformed into setter calls, like an ant task, and then an execute() method to perform the request. This can be supported with a base class - and would allow a number of optimizations and, more important, will greatly increase the readability of the servlet code.

Saturday, February 01, 2003

Gentoo (and gump)

Before leaving this morning I started the "install" of gentoo linux. It is now compiling gcc - I expect it'll take a few more days until it's done. I find it amazing - it is a sort of gigantic gump, building the entire linux distribution from sources.



Looking at .ebuild files - they are remarkably clean. I think the biggest problem in gump is mixing the HTML generation in the build files. I wish I had more time... The other obvious difference is that it builds from "stable" - it seems it can also build from HEAD, but the stable option makes it more than a build tool.



An interesting idea would be generating .ebuild files from gump - or changing gump to use an .ebuild style. By default they have ant and many other apache packages - but tomcat doesn't seem to be included ( yet ). Update: it is included, but only 4.0.x. And the ebuild gets the binary - it doesn't compile.



Being able to build a program gives you the sentiment that it is easy to contribute and make changes and fix things. That's the main reason I like gump - I can make quick changes to various projects - and then get things compiled without the usual headaches. It just works.



Many years ago - I used to build everything. Then I started to get lazy and use binaries - and only now I realize how much I missed it. You can compile SRPMs - but they rely on other binaries or packages, and in many cases it is beyond complex ( just try to compile kde3.1 SRPMs from rawhide on redhat8 ). It seems maven and centipede are closer to the srpm model ( which is not completely bad - simple builds will be easier ).

Wednesday, January 29, 2003

Authorization and security

I was surprised to find that people assume java.security requires the use of a SecurityManager ( == sandbox ).



The authorization security in java is based on Permissions. You can associate Permissions with any object - even if most of the time it is associated with code bases ( jars ). There is a very simple interface ( a single method: implies ) that verifies if a collection of Permissions that an object has allows a particular operation.



The rules are a bit tricky when computing the effective permission in a call - that's the most complex part, and I suspect this requires a security manager to be enabled. But you don't need this part in order to use the base objects.



For example ( what's being discussed on tomcat-dev ): all security constraints in a webapp, as well as additional rules ( context /foo can access context /bar, etc ) can be expressed as Permissions. A request will just check if a desired operation is allowed against the set of permissions - the stack trace or the association of permissions with codebases is not used. All you need is the collection of permissions of the webapp, and the desired permissions.



Since this is a very critical component - it makes all the sense to use a set of simple and well tested interfaces and semantics. There is a problem in mapping what you want into the Permissions ( which carries limited information ) - but IMO it is well worth it.

Tuesday, January 28, 2003

Kde3.1 on RedHat 8

I got it to work - and decided it's time to move to GenToo. Compiling it from sources on my system was almost impossible - too many dependencies. Rawhide was painful and in the end would have replaced essential libraries.



What I did - just used Slackware binary packages. Since they use tar.gz, it was trivial to install - just a few changes in one of their postinst scripts. They use /opt/kde - so they don't interfere with any other system. After RPM hell, that was an amazing experience.



KDE3.1 works great, konqueror is fast - tab browsing works. The only thing it lacks is remembering passwords. What I've "lost" - the startup menu. Kappfinder was able to create one - with _all_ the apps it found ( amazing how many apps were hidden ).

Monday, January 27, 2003

Java security

Interesting article on security, thanks Jon for posting it.



It shows ( again ) that java without sandbox is not safe - private fields, classloaders or any other tricks are just dangerous illusions of security. The security is inside the (sand)box.

Saturday, January 25, 2003

Tahoe

I'll be in Tahoe this weekend. No programming or internet.

Friday, January 24, 2003

JMX-enable a bean in 3 easy steps

Another java trivia:



1. Add jmx.jar and commons-modeler.jar to your CLASSPATH

2. Insert at the top of the file:

    import org.apache.commons.modeler.Registry;

3. Add in the constructor ( or somewhere ):

    Registry reg=Registry.getRegistry();
    reg.registerComponent( this, DOMAIN, TYPE, OBJECTNAME );



The type doesn't matter too much unless you have the optional mbeans-descriptors.xml in the same package. Start the JMX console ( that's a bit more complicated - another 3 steps ) and enjoy viewing/setting attributes at runtime.



Why would you want to do that ? For fun and to monitor what's happening inside your app, or control it "live".



A HEAD build of modeler is required ( 1.0 won't work ).

TLD listeners and context initialization

Updated and reformatted:



The JSP spec allows TLDs in any .jar file and in WEB-INF. A TLD may contain application listeners. That means tomcat must scan all the .jars, as well as all .TLDs. Even if the app has no JSP or taglib - we still must scan all .jars.



This is a pretty expensive operation ( some small tests show 2/3 of the startup time on some apps with a lot of jars going in .tld processing ). If you have a lot of .jars or tags - it will be visible.



The simple solution that I checked in will just cache the result in a .ser file in work, and avoid re-scanning if no .jar or .tld has changed. It's "ant-like" ( setters and exec ) so it could be used in the deploy tool ( if we had one :-).
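The caching idea itself is language-neutral; here is a small Python illustration of it ( not the tomcat code, which is Java and uses a serialized .ser file ). The JSON cache file, the function names and the use of modification times as the "has it changed" test are all assumptions of the sketch.

```python
import json
import os
import tempfile


def snapshot(paths):
    """Map each path to its modification time."""
    return {path: os.path.getmtime(path) for path in paths}


def cached_scan(paths, cache_file, scan):
    """Return scan(paths), reusing the cached result if nothing changed."""
    current = snapshot(paths)
    if os.path.exists(cache_file):
        with open(cache_file) as handle:
            cached = json.load(handle)
        if cached["snapshot"] == current:
            return cached["result"]
    result = scan(paths)
    with open(cache_file, "w") as handle:
        json.dump({"snapshot": current, "result": result}, handle)
    return result


calls = []


def fake_scan(paths):
    """Stand-in for the expensive .tld scan; records that it ran."""
    calls.append(1)
    return sorted(os.path.basename(p) for p in paths)


workdir = tempfile.mkdtemp()
jar = os.path.join(workdir, "taglib.jar")
open(jar, "w").close()
cache = os.path.join(workdir, "tldcache.json")
first = cached_scan([jar], cache, fake_scan)
second = cached_scan([jar], cache, fake_scan)
```

The second call finds an unchanged snapshot and returns the stored result without running the scan again.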



Someone could also add a flag to turn off tld scanning - for example if no JSPs are used ( cocoon, velocity, etc ).



In future this should be extended to include a cache of the extension list ( scanned by the classloader ) and packages ( used to optimize loading ).



Rant:

This is one of the big errors in the JSP spec (IMO) - other java APIs use META-INF/services or the manifest so at least you read a single entry in a JAR. In JSP - you have to scan the entire META-INF dir, and look for all files with .tld extension. It is a nice idea from the developer perspective to be able to just drop jars - how you specify the descriptors is the problem.

int[] versus Integer or IntHolder

It's a simple ( and old ) trick. Instead of storing Integers or IntHolder objects in tables or doing weird things to return multiple primitive values - just use an int[1].



I don't know if such trivia is appropriate for a weblog, everyone probably knows it - but I've seen - and even written - code doing increment on Integers in a map ( which involves a lot of garbage and pretty ugly code ).

Thursday, January 23, 2003

Weblog and wiki

I wonder if anyone has combined Weblog/wiki. It would be pretty nice:

* WikiNames for links - you need the name of the weblog and the title of the entry, no long URLs or numbers

* you can easily type lists, bold, etc

* wiki already has a lot of nice features related to changes/updates to entries.



I find it very useful to update previous weblog entries - and pretty hard to navigate through the changes ( if I read someone's log, and he changed a few things - I'll have to search for it ).



Of course, wiki-style names (including category ) for the weblog entries would make the weblog easier to navigate.



That won't solve the biggest problem I have with weblogs - the complete mess of information. I can read a few high-traffic lists very fast - but I can hardly manage to read more than 5-6 weblogs. I'm just lost when I have to switch too many contexts - in a mail reader I just ignore threads I don't care about, I read related issues one after another ( thread view ). With weblogs - while I find it fascinating and very useful to hear all the personal and generic information, it costs me a lot of energy to click all the links and understand what's happening.



Update: Many thanks for the pointers. While navigating the links, I found a nice weblog search engine. That's by far my worst problem with reading weblogs - reading by subject instead of by author.







* Test wiki

** Test1 '''test'''









* Test2

** Test1 ''test''



Load balancing in jk

Mladen Turk has proposed to use os load in jk2 load balancing. Finally. ( it has been proposed in the past - never implemented )

There are a few details to clarify ( use backchannel or normal ajp or native RPC, etc ), I'll update later.

Wednesday, January 22, 2003

Configurable TagPool in 50

The initial commit is in - seems to work. For configuration - I didn't have too many choices, the JspServlet config was out of question, so this is the first use of context init params.

is using precompiled JSPs - or knows how to set init params with jsp-page

The idea is to plug a different TagPool implementation and customize its parameters. The TagPool has a significant impact on JSP performance - and the user must be able to fine tune it and choose the right balance between memory use and speed.



I did a few small optimizations in the default pool, and checked in the ThreadLocalPool - which has no sync, but keeps one pool per thread. If you have a JSP that has many tags and is very frequently accessed - it's worth trying it.
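The per-thread idea can be illustrated in a few lines of Python ( again just a sketch of the technique, not the jasper code - the class layout and the get/reuse method names are chosen for the example ):

```python
import threading


class ThreadLocalPool:
    """Keep one free list per thread, so get/reuse need no locking."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def _free_list(self):
        # Each thread sees its own "free" attribute on the local object.
        if not hasattr(self._local, "free"):
            self._local.free = []
        return self._local.free

    def get(self):
        """Return a pooled object for this thread, or create a new one."""
        free = self._free_list()
        return free.pop() if free else self._factory()

    def reuse(self, obj):
        """Hand an object back to the current thread's pool."""
        self._free_list().append(obj)


pool = ThreadLocalPool(list)
handler = pool.get()
pool.reuse(handler)
```

The price is memory: every thread keeps its own set of pooled objects, which is the memory versus speed balance mentioned above.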



Configuration is done with init params for the context and servlet. Need to document it better - if it proves useful. I would try some JNDI - but it's too complex for my free time.

It's already done - nntp/rss gateway

I was thinking about the effects of blogs on existing mailing lists and news. And how to bridge the two worlds - I'm used to getting information by specific subject, not by author.



Of course, someone else already did that on sourceforge - http://www.methodize.org/nntprss



Update: it works pretty well - not perfect, but far better than any other aggregator I've tried so far. Using a familiar interface ( the news reader ) is very nice, but the real potential is in bridging the blog and news/mail worlds. More on that later - I feel this is extremely important to explore - the relation between weblog and mail/news and the effect on communities.

Using context params or JNDI for configuration

Jasper is configured using servlet init params, specified in web.xml. You can customize per context - by defining a jsp servlet using a special name.

The problem: that would make your app unportable, it won't work with other jsp engines.



An alternative would be to use context params and init params of the compiled servlet - with some special names that would be used by jasper.



Jasper would read global options from the context init params, then override with local init params defined in the servlet ( with the jsp-page tag ). Option names can be generic "jsp.keepgenerated" or "jasper.keepgenerated".
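The lookup order is simple enough to write down. A Python sketch of the rule ( the real thing would live in jasper, in Java ); accepting both the "jsp." and "jasper." prefixes is an assumption, since the naming is still open:

```python
def effective_option(name, context_params, servlet_params, default=None):
    """Look up a JSP option: servlet init params override context params."""
    prefixes = ("jsp.", "jasper.")
    for params in (servlet_params, context_params):
        for prefix in prefixes:
            key = prefix + name
            if key in params:
                return params[key]
    return default


context = {"jsp.keepgenerated": "true"}
global_value = effective_option("keepgenerated", context, {})
local_value = effective_option(
    "keepgenerated", context, {"jasper.keepgenerated": "false"}
)
```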



As API, we should _stop_ the use of the Options interface ( and adding a method for every option we add ) and start using a simple getInitParam( String ). Or even better - use JMX to push config options into the JSP engine ( which can be jasper or something else ).



There are many benefits:

* simpler configuration

* relatively portable - other engines can just ignore jasper settings. Right now, if you make per-context settings, the app won't work with other engines

* other engines may support the same options

* it works very well for runtime options ( like what kind of thread pool to use )

* it works very well for precompiled JSPs ( which bypass JspServlet )

* general config tools can be used ( anything that supports web.xml config ), you don't need a jasper-specific GUI

* very easy and flexible



Another ( probable ) alternative is to use JNDI env entries. This is more powerful - it allows the most flexibility in configuration, and is also supported in the standard web.xml.



Possible naming conventions:

* java:env/PREFIX/SERVLET_NAME/JSP_OPTION - per page option, affects only a specific page

* java:env/PREFIX/global/JSP_OPTION - per context option, affects all pages ( unless overridden )

* /PREFIX2/jsp/JSP_OPTION - per server ( defined in server.xml ).
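To make the three levels concrete, here is a small sketch that builds the candidate names from most to least specific and returns the first one that resolves. Everything here is an assumption: the class JndiOptionNames, the example prefixes, and the Function that stands in for a real JNDI lookup ( a real version would call Context.lookup and treat NameNotFoundException as "not set" ).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical helper for the page / context / server naming convention. */
class JndiOptionNames {
    /** Candidate names, most specific first. */
    static List<String> candidates(String prefix, String prefix2,
                                   String servletName, String option) {
        return Arrays.asList(
                "java:env/" + prefix + "/" + servletName + "/" + option, // per page
                "java:env/" + prefix + "/global/" + option,              // per context
                "/" + prefix2 + "/jsp/" + option);                       // per server
    }

    /** Returns the first candidate the lookup function resolves to a non-null value. */
    static Optional<Object> resolve(List<String> names, Function<String, Object> lookup) {
        for (String name : names) {
            Object value = lookup.apply(name);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A plain map stands in for the naming context in this sketch.
        Map<String, Object> env = new HashMap<>();
        env.put("java:env/options/global/keepgenerated", "true");
        List<String> names = candidates("options", "server", "index_jsp", "keepgenerated");
        System.out.println(names);
        System.out.println(resolve(names, env::get));
    }
}
```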



Another option - more powerful if you prefer a single config store ( LDAP, JDK1.4 prefs, etc ):

* /PREFIX1/INSTANCEID/VHOST/WEBAPP/PREFIX2/SERVLET_NAME/OPTIONS -> that will configure a specific instance

* global settings by using "Default" for different components.



PREFIX will avoid conflicts with other jndi entries. ( "options" ? )



The same mechanism can be used for the DefaultServlet, and also to set per-context attributes that are typically set in server.xml.



Future versions of the servlet spec - or de-facto standards - may define some cross-container options ( jsp_keep_generated seems like a good one ).



The container should have control over which options individual apps are allowed to specify - probably the policy would do it ( it's already available if JNDI is used ).

Tuesday, January 21, 2003

SingleThreadModel may be useful

While doing the per-thread hacks on the JSP tag pool - and the same for collecting statistics on a per-thread basis, to avoid sync - I just realized that the much-bashed STM servlet may actually be one of the best things...



Of course, STM doesn't avoid the complexity of programming in a multithreaded environment - and it is probably even more vulnerable to errors. It is also much more complex to implement and harder to use. If you use STM, you must find ways to aggregate the data from the multiple instances, and it may use more memory.



The big benefit is that with STM you basically get a per-thread worker - and can do a lot of stuff with no sync calls. A good implementation would do a sync to get the STM - but that can be improved by using ThreadLocal or the new ThreadWithAttributes.



Not doing sync calls at runtime - or at least during the critical path - will have a visible impact on loaded sites ( if done right ). You have to pay for that in terms of complexity when you want to take the fields from all those servlet instances and use them - that's where you may need sync and a lot of complex code.
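The trade-off can be sketched with a per-thread counter. This is not STM itself and not tomcat code - PerThreadStats and its Cell are made-up names, and the code targets a current JDK. Each thread updates its own cell with no lock, and the lock shows up only when a cell is first created and when somebody asks for the total.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-thread request counter: lock-free updates, locked aggregation. */
class PerThreadStats {
    /** One cell per thread; only the owning thread ever writes to it. */
    static final class Cell {
        volatile long requests;
    }

    private final List<Cell> allCells = new ArrayList<>();
    private final ThreadLocal<Cell> local = ThreadLocal.withInitial(this::newCell);

    /** Sync happens once per thread, when its cell is created. */
    private Cell newCell() {
        Cell cell = new Cell();
        synchronized (allCells) {
            allCells.add(cell);
        }
        return cell;
    }

    /** Critical path: no lock, the cell has a single writer. */
    void recordRequest() {
        Cell cell = local.get();
        cell.requests = cell.requests + 1;
    }

    /** Off the critical path: lock while summing the cells of all threads. */
    long totalRequests() {
        long sum = 0;
        synchronized (allCells) {
            for (Cell cell : allCells) {
                sum += cell.requests;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadStats stats = new PerThreadStats();
        Thread worker = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                stats.recordRequest();
            }
        });
        worker.start();
        worker.join();
        System.out.println(stats.totalRequests());
    }
}
```

The aggregation code is where the complexity goes: the total is only a snapshot, and cells of dead threads stay in the list unless something cleans them up.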

Precompile the JSPs

All JSPs should be precompiled. Using regular JSPs in development mode is normal - but I couldn't find any good reason why an application would not precompile all its JSPs.



Besides speed ( which is important ! ), you would also know about syntax errors or missing dependencies. Most of the "jsp is slow" stuff is based on the first experience with a JSP page - where you have to wait for the HelloWorld.jsp to compile. A lot of "jsp is hard" is based on the same experience - when the compiler is not found.



It would be nice if a future version of the JSP spec would require ( or strongly recommend ) that, as well as include a way to use the jsp-file tag together with servlet-class. I would like to be able to keep the jsp-file declaration next to the precompiled class name ( now only one can be specified - AFAIK ).



Update: I've got the admin/ precompiled - now you get a decent experience when using it. In the process we found a few bugs in jspc - I fixed them, but Hans had a better fix.



As a bonus for using JSPC, the jsps will be visible in the JMX console - you'll know the number of requests, average and max times and all the fun stuff. And you bypass the JspServlet - another ( small, according to Remy ) improvement. I made a small attempt to JMX-enable regular servlets - but it is too complicated and it's really not worth it - I'm more convinced than ever that JspServlet is the wrong way to support JSPs.



JspInterceptor has its own problems - but now there is a better/cleaner way to abstract the JSP engine - just use JMX. The JSP engine should be just a JMX mbean, with a simple operation "compile( contextPath, jsp )". This can be crafted on top of the existing JspC - and will ensure consistent behavior. Regular JMX attributes will remove the confusion in jasper configuration, and they may even provide nice statistics ( compile time, number of errors, etc ).
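A sketch of what such an mbean could look like, using only javax.management from the JDK. None of this is jasper code: the names JspEngineSketch, JspEngineMBean, the "sketch" domain and the attributes are assumptions, and the compile body is a stub where a real engine would call into JspC.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Sketch only: a JSP engine exposed as a standard MBean. */
class JspEngineSketch {
    /** Management interface; JMX wants it public and named <Impl>MBean. */
    public interface JspEngineMBean {
        void compile(String contextPath, String jsp);
        int getCompileCount();
        int getErrorCount();
        boolean isKeepGenerated();
        void setKeepGenerated(boolean keep);
    }

    public static class JspEngine implements JspEngineMBean {
        private int compileCount;
        private int errorCount;
        private boolean keepGenerated;

        public synchronized void compile(String contextPath, String jsp) {
            // Stub: a real engine would run the compiler here; this one only counts.
            if (jsp == null || !jsp.endsWith(".jsp")) {
                errorCount++;
                throw new IllegalArgumentException("not a JSP: " + jsp);
            }
            compileCount++;
        }

        public synchronized int getCompileCount() { return compileCount; }
        public synchronized int getErrorCount() { return errorCount; }
        public synchronized boolean isKeepGenerated() { return keepGenerated; }
        public synchronized void setKeepGenerated(boolean keep) { keepGenerated = keep; }
    }

    /** Registers the engine under a made-up name and returns that name. */
    static ObjectName register(MBeanServer server, JspEngine engine) throws Exception {
        ObjectName name = new ObjectName("sketch:type=JspEngine");
        server.registerMBean(engine, name);
        return name;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = register(server, new JspEngine());
        // Any JMX client can call the operation by name, without jasper classes.
        server.invoke(name, "compile",
                new Object[] {"/admin", "index.jsp"},
                new String[] {"java.lang.String", "java.lang.String"});
        System.out.println(server.getAttribute(name, "CompileCount"));
    }
}
```

The caller only needs the object name and the operation signature, which is what makes the engine replaceable.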



Probably I should split this entry in two ( or 3 ).

Friday, January 17, 2003

Classloader fun with JDK1.4

The issue is that with JDK1.4.1, if endorsed.dirs is set you may run into many kinds of weird classloading problems. Some classes in the CLASSPATH or in a loader will not be found, even if you place them in jre/lib/ext.



The problem goes away as soon as you remove endorsed.dirs. I still need to find out why this is happening - but for now I fixed jasper to work with crimson ( the parser in JDK1.4 ) and I use tomcat without endorsed.



Update: Remy has the same problem, this time JMX can't be loaded. There is only one solution - stop using endorsed.dirs until Sun fixes it.



Update: It seems one ClassPath attribute in a jar included in endorsed was referencing other jars, and removing that fixes some of the problems. Still - endorsed is supposed to override a number of packages, it shouldn't affect the others.

Thursday, January 16, 2003

More JMX in tomcat5

Finally added thread information ( including what each thread is doing ), request info, servlet execution times, etc. The context and servlet now use JSR77 names.



I'm not sure if the naming for the other components is very good - but at least all jk and coyote components are exposed and may be controlled/viewed. I started to switch to NamingListener for callbacks inside jk - it should be easy to do, but I also want to get rid of JkMain and use JMX to put the components together.

Tuesday, January 14, 2003

Extension points in tomcat5

My search for the ideal extension/hook/callback mechanism is over. After many months ( or years - if I count the many variations on Interceptors and 3.2 refactorings ) of search, I give up on the combined Valve/Interceptor and any other variation.



All tomcat developers seem to agree that JMX is the "right" solution to configure and manage the components inside tomcat. I believe JMX notification is the "right" solution for extension points and callbacks.



The benefits seem clear:

* a clean, simple, well documented and unlikely to change model.

* it'll be easy to know what notification a component sends and what notifications a component handles ( using JMX metadata )

* consistent configuration

* one simple and neutral interface - completely independent of tomcat internals



The only problem is that the notification model doesn't support the Valve - but instead of trying to support both Valve and Interceptor, we should just keep Valves around for all modules that need this kind of functionality.
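For illustration, a tiny notification-based hook using the stock javax.management classes. The Connector component and the "sketch.request.*" notification types are made up; NotificationBroadcasterSupport and NotificationListener are the standard JMX pieces. By default the listener is called in the thread that sends the notification.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;

/** Sketch: JMX notifications used as extension points. */
class NotificationHookSketch {
    /** Hypothetical component that announces request start and end. */
    static class Connector extends NotificationBroadcasterSupport {
        private long sequence;

        void handle(String uri) {
            sendNotification(new Notification("sketch.request.start", this, ++sequence, uri));
            // ... the real work of the component would happen here ...
            sendNotification(new Notification("sketch.request.end", this, ++sequence, uri));
        }
    }

    public static void main(String[] args) {
        Connector connector = new Connector();
        List<String> log = new ArrayList<>();
        // The listener is the extension: it depends on JMX only, not on Connector.
        NotificationListener hook =
                (notification, handback) ->
                        log.add(notification.getType() + " " + notification.getMessage());
        connector.addNotificationListener(hook, null, null);
        connector.handle("/index.jsp");
        System.out.println(log);
    }
}
```

A listener can observe but cannot wrap the call the way a Valve does, which is the limitation mentioned above.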

Monday, January 13, 2003

UserDatabase needs changes

Every time I look at the JMX console I remind myself that UserDatabase needs to be fixed before Tomcat5 is shipped. It is just very wrong - you can't scale if all users are in memory, and JMX will become useless if you have 1 mbean for each user.



If UserDatabase returns a simple Hashtable with the user info - it would be quite easy to use JSP-EL to display and manipulate the info. The ideal solution is to turn UserDatabase into a JNDI DirContext - but I'm not even 1/2 way with the JNDI proposal.

Ser2xml

A nice thing to have is an ant task to convert from .ser to .xml and back.



* good for debugging

* it could be used for axis - too many times existing classes have problems with the serialization scheme based on getters.

* this would allow the same code to work with both RMI/IIOP and SOAP

* it may solve some of the .ser problems - like versioning and unreadability.



The serial file does not include a lot of metadata - that is part of the class file. Storing this metadata ( only the field descriptors, to allow interpreting the .ser file ) would also be nice.



The speed of reading/writing .ser files is almost 2x that of SAX - and involves almost no garbage ( compared with hundreds of objects per request ). The size is also much smaller - assuming transient is used correctly.



There are many bad things in RMI - and in java.io.Serialization, but some things are also very nice.

Starting to "blog"

A few days ago I decided it's time to do what everyone else is doing. It seems most people I know have blogs - and it's getting pretty hard to know what they think without reading the weblogs. Mailing lists are very quiet lately, and the substance seems to move to dozens of weblogs.



I can see the benefits ( at least for the weblog authors ) - you get to organize the information, you don't have to google mail archives to find your own postings. There is a lot of noise in the mailing lists - and even if I have just seen the first inter-blog flame war, it seems less likely to involve more than a few (2?) people.



Ovidiu is the one who convinced me - and provided the space. As a new blogger, I have a lot of open questions about this.



First, how do people put so many links in their blogs ? Is it just cut&paste, or some magic ? What about the "talkback" feature ? I understand how it works, but not how to do it.



Another question is how to get content back into mailing lists ( and if ). A lot of people are using the lists to exchange info - the blog shouldn't take this away.



What is the correct way to update entries ? Just change the date ?



With mail lists, you can filter out or skip things. With blogs I need to find the relevant entries and people - and then read them. RSS seems to be the answer - I found blagg and I am able to get a feed, but that includes all the postings by a particular person. I need to sort them by category - or I won't be able to track more than a few people.



Using blogs instead of mailing lists to exchange ideas seems possible - I can see how a proposal could be posted in a blog, feedback and votes accumulated - and development status updated. I made many proposals that get almost no feedback ( and never got implemented ) - and many that I implemented, sometimes in a very different way than I originally expected.



Update: For updating, it seems people just change the date - so the entry moves up in RSS.

For links - it seems cut&paste is the solution. I'm pretty worried about the relation between weblog and mail lists - and the information chaos, but that will be a separate entry.



Many thanks to all who read and commented on this.

Saturday, January 11, 2003

Ant: delayed task creation

One of the big changes for 1.6 will be the ComponentHelper, and the creation of tasks and types just before their use. In ant1.5 some of the tasks were created before use, but an attempt to create the task was made when the task was read. If a task definition was available, the task was created at parse time. If not - it was created before use.



In addition, all core and optional tasks were instantiated ( Class.forName ) at startup. That resulted in 2-3 seconds wasted in creation of tasks we'll never use. Most of the time was actually spent processing stack traces for the optional tasks with missing dependencies.



There are a few benefits of doing Class.forName() just before executing:

* we only create the tasks that we need. 2-3 seconds out of each build may look like very little.

* with help from the classloader task, we can push jars into the loader ( like junit.jar) and then optional tasks would work even if their deps are not in ant/lib
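The delayed creation can be sketched as a registry that stores class names only and calls Class.forName on first use. This is not the ComponentHelper code - LazyTaskRegistry, its methods and the class name org.example.MissingOptionalTask are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry: task names map to class names, loaded only on first use. */
class LazyTaskRegistry {
    private final Map<String, String> classNames = new HashMap<>();
    private final Map<String, Class<?>> loaded = new HashMap<>();
    private final ClassLoader loader;

    LazyTaskRegistry(ClassLoader loader) {
        this.loader = loader;
    }

    /** Defining a task is cheap: the class is not touched yet. */
    void define(String taskName, String className) {
        classNames.put(taskName, className);
    }

    /** Class.forName runs here, just before the task is needed. */
    Object create(String taskName) throws ReflectiveOperationException {
        Class<?> taskClass = loaded.get(taskName);
        if (taskClass == null) {
            String className = classNames.get(taskName);
            if (className == null) {
                throw new IllegalArgumentException("unknown task: " + taskName);
            }
            taskClass = Class.forName(className, true, loader);
            loaded.put(taskName, taskClass);
        }
        return taskClass.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        LazyTaskRegistry registry =
                new LazyTaskRegistry(LazyTaskRegistry.class.getClassLoader());
        registry.define("buffer", "java.lang.StringBuilder");
        // Missing dependency: harmless as long as the task is never used.
        registry.define("optional", "org.example.MissingOptionalTask");
        System.out.println(registry.create("buffer").getClass().getName());
    }
}
```

Because the loader is consulted late, jars pushed into it after startup are still found, and a missing optional class fails only when that task is actually used.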



The ComponentHelper provides 2 hook mechanisms: you can replace the default helper by setting the ant.componentHelper reference ( preferably before calling ant ), and you can chain a custom helper. This can be done from within a normal ant task.



The hooks allow arbitrary "antlibs" to be hooked into ant.



What's missing:

* pass the namespace to the task, so antlibs can use it

* implement a default antlib using the namespace as a java package.



So far I've seen no major backward compatibility problems - gump seems happy and all projects I use build without problems ( well, many times they don't - but not because of ant ).

Friday, January 10, 2003

import in ant

Now that the basic <import> has been added, there are a few "small" details to settle - before people start using it.



My opinion is that import should work well for existing build files - that would mean basedir must be relative to the build file ( by default ). It is obviously possible to customize the basedir for the imported file.



In any case - if we want different imported files to use different basedirs, we need to keep track of the imported file location in each element. So regardless of the final decision, that's something to work on.



Jose Alberto seems to believe that only files specifically written for import would work - or that the default should support this use case.



In parallel, Alexey Solofnenko posted info about a preprocessor that supports a similar behavior. It's very interesting, but I don't think it'll help us too much - import is going to be a core task.



Properties are as usual a special case. It is possible to rename properties and support a prefix to avoid naming conflicts. This could be done with some special code in PropertyHelper or in the Property task itself - by looking at the current import environment.



The common use case of multiple build files working together doesn't seem to benefit from that - you want properties to be consistent, without big surprises. The immutability of properties is already a surprise for many people.

JMX support in ThreadPool

A few notes on the JMX-enabled ThreadPool effort.



The goal is to see what the pool and each thread are doing - and maybe stop hung threads.



First, the thread pool itself should provide basic info - how many threads are active, how many threads died. Changing parameters ( increasing maxthreads ) at runtime is tricky - we avoid sync for performance, so we'll probably need to create a new array and switch.
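The "create a new array and switch" idea, sketched under assumptions: WorkerTable is a made-up class, not the j-t-c ThreadPool, and it relies on a volatile reference, which behaves this way on current JDKs. Readers take one snapshot of the reference and never lock; the rare writers build a copy and swap it in.

```java
import java.util.Arrays;

/** Hypothetical worker table: readers never lock, changes swap in a new array. */
class WorkerTable {
    // volatile, so readers always see a fully built array after a swap
    private volatile Thread[] workers;

    WorkerTable(int maxThreads) {
        workers = new Thread[maxThreads];
    }

    /** Hot path: one volatile read, no lock. */
    Thread get(int slot) {
        Thread[] snapshot = workers;
        return slot >= 0 && slot < snapshot.length ? snapshot[slot] : null;
    }

    int maxThreads() {
        return workers.length;
    }

    /** Rare path: writers synchronize among themselves and publish a copy. */
    synchronized void set(int slot, Thread thread) {
        Thread[] copy = workers.clone();
        copy[slot] = thread;
        workers = copy;
    }

    /** Grows ( or shrinks ) the table by copying into a new array and switching. */
    synchronized void setMaxThreads(int newMax) {
        workers = Arrays.copyOf(workers, newMax);
    }

    public static void main(String[] args) {
        WorkerTable table = new WorkerTable(2);
        table.set(0, Thread.currentThread());
        table.setMaxThreads(4);
        System.out.println(table.maxThreads() + " " + (table.get(0) == Thread.currentThread()));
    }
}
```

A reader that grabbed the old array just before the switch keeps working on a consistent, slightly stale view, which is usually acceptable for status information.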



The second issue is information about what each thread is doing. Currently I'm using ThreadWithAttributes - so we can just cast and have O(1) access to attributes. I already changed the Http and Jk connectors to provide info about the stage and current request. I implemented the changes - they seem ok, but I need to clean up before committing.



One issue is security. Right now a guard is used ( the ThreadPool for now ), and only components that have access to it can read/write. JMX policy protection will be vital - and we need to find ways to protect the data if the SecurityManager is not enabled. With the current code, if user code gets access to JMX it can see what all other threads are doing - a problem for web hosting environments.





Update:



Done. You can use it with any tomcat version ( if j-t-c is built from HEAD ). The name is "DOMAIN:type=ThreadPool,worker=http,name=http%8080" and "DOMAIN:type=ThreadPool,worker=jk,name=0.0.0.0%8009" ( the local part needs cleanup ).



What is available is the string version ( threadStatusString ) and 2 String[] with the current status ( parsing, service, ended, etc ) and last request. More info is collected per request, in the RequestProcessors object ( bytes sent, etc ).

Since we may have a lot of threads, there is a single manageable object that controls all threads in a pool ( corresponding to the ThreadPoolMX )
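A small sketch of how a console could pick out these pool objects by name, using only javax.management.ObjectName. The helper class ThreadPoolNames is an assumption, and "DOMAIN" is kept as the same placeholder used in the names above.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical helper for finding ThreadPool mbeans by their names. */
class ThreadPoolNames {
    /** Pattern matching every ThreadPool object in the given domain. */
    static ObjectName poolPattern(String domain) throws MalformedObjectNameException {
        return new ObjectName(domain + ":type=ThreadPool,*");
    }

    /** Pulls the connector kind ( http, jk ) out of a pool name. */
    static String worker(ObjectName name) {
        return name.getKeyProperty("worker");
    }

    public static void main(String[] args) throws MalformedObjectNameException {
        ObjectName http = new ObjectName("DOMAIN:type=ThreadPool,worker=http,name=http%8080");
        ObjectName jk = new ObjectName("DOMAIN:type=ThreadPool,worker=jk,name=0.0.0.0%8009");
        ObjectName pattern = poolPattern("DOMAIN");
        System.out.println(pattern.apply(http) + " " + worker(http));
        System.out.println(pattern.apply(jk) + " " + worker(jk));
    }
}
```

The same pattern can be handed to MBeanServer.queryNames to list all pools of a running server.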



Open issue: should we treat "status" as purely informative and let servlets set it ? Long running servlets could provide some debugging info.
