Weblog for Costin Manolache: 01/01/2003

Wednesday, January 29, 2003

Authorization and security

I was surprised to find that people assume java.security requires the use of a SecurityManager ( == sandbox ).

The authorization security in Java is based on Permissions. You can associate Permissions with any object - even if most of the time they are associated with code bases ( jars ). There is a very simple interface ( a single method: implies ) that verifies whether the collection of Permissions an object has allows a particular operation.

The rules are a bit tricky when computing the effective permission in a call - that's the most complex part, and I suspect this requires a security manager to be enabled. But you don't need this part in order to use the base objects.

For example ( what's being discussed on tomcat-dev ): all security constraints in a webapp, as well as additional rules ( context /foo can access context /bar, etc ) can be expressed as Permissions. A request will just check if a desired operation is allowed against the set of permissions - the stack trace or the association of permissions with codebases is not used. All you need is the collection of permissions of the webapp, and the desired permissions.
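
A minimal sketch of that check, with no SecurityManager installed. The ContextAccessPermission class and the /bar and /admin names are hypothetical, made up for illustration; only Permissions, PermissionCollection and implies() come from java.security.

```java
import java.security.BasicPermission;
import java.security.Permission;
import java.security.PermissionCollection;
import java.security.Permissions;

// Hypothetical permission: "this webapp may access the named context".
final class ContextAccessPermission extends BasicPermission {
    private static final long serialVersionUID = 1L;

    ContextAccessPermission(String targetContext) {
        super(targetContext);
    }
}

final class WebappPermissionCheck {
    // The permissions granted to the (hypothetical) /foo webapp.
    static PermissionCollection permissionsForFoo() {
        Permissions granted = new Permissions();
        granted.add(new ContextAccessPermission("/bar"));
        return granted;
    }

    // No stack inspection and no code bases: just "does the set imply this?"
    static boolean allowed(PermissionCollection granted, Permission wanted) {
        return granted.implies(wanted);
    }

    public static void main(String[] args) {
        PermissionCollection granted = permissionsForFoo();
        System.out.println(allowed(granted, new ContextAccessPermission("/bar")));
        System.out.println(allowed(granted, new ContextAccessPermission("/admin")));
    }
}
```

The whole decision is one implies() call on the webapp's collection, which is the part that works without a sandbox.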

Since this is a very critical component, it makes sense to use a set of simple and well tested interfaces and semantics. There is a problem in mapping what you want into Permissions ( which carry limited information ) - but IMO it is well worth it.

Tuesday, January 28, 2003

Kde3.1 on RedHat 8

I got it to work - and decided it's time to move to Gentoo. Compiling it from sources on my system was almost impossible - too many dependencies. Rawhide was painful and in the end would have replaced essential libraries.

What I did - just used Slackware binary packages. Since they use tar.gz, it was trivial to install - just a few changes in one of their postinst scripts. They use /opt/kde - so they don't interfere with the rest of the system. After RPM hell, that was an amazing experience.

KDE3.1 works great, konqueror is fast - tab browsing works. The only thing it lacks is remembering passwords. What I've "lost" - the startup menu. Kappfinder was able to create one - with _all_ the apps it found ( amazing how many apps were hidden ).

Monday, January 27, 2003

Java security

Interesting article on security, thanks Jon for posting it.

It shows ( again ) that java without sandbox is not safe - private fields, classloaders or any other tricks are just dangerous illusions of security. The security is inside the (sand)box.

Saturday, January 25, 2003

Tahoe

I'll be in Tahoe this weekend. No programming or internet.

Friday, January 24, 2003

JMX-enable a bean in 3 easy steps

Another java trivia:

1. Add jmx.jar and commons-modeler.jar in your CLASSPATH

2. Insert at the top of the file:

import org.apache.commons.modeler.Registry;

3. Add in constructor ( or somewhere ):

Registry reg=Registry.getRegistry();

reg.registerComponent( this, DOMAIN, TYPE, OBJECTNAME );

The type doesn't matter too much unless you have the optional mbeans-descriptors.xml in the same package. Start the JMX console ( that's a bit more complicated - another 3 steps ) and enjoy viewing/setting attributes at runtime.

Why would you want to do that ? For fun and to monitor what's happening inside your app, or control it "live".

A HEAD build of modeler is required ( 1.0 won't work ).

TLD listeners and context initialization

Updated and reformatted:

JSP spec allows TLDs in any .jar file and in WEB-INF. A TLD may contain application listeners. That means tomcat must scan all the .jars, as well as all .TLDs. Even if the app has no JSP or taglib - we still must scan all .jars.

This is a pretty expensive operation ( some small tests show 2/3 of the startup time on some apps with a lot of jars going in .tld processing ). If you have a lot of .jars or tags - it will be visible.

The simple solution that I checked in will just cache the result in a .ser file in work, and avoid re-scanning if no .jar or .tld has changed. It's "ant-like" ( setters and exec ) so it could be used in the deploy tool ( if we had one :-).
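
A rough sketch of that caching idea, not the code that was checked in: the TldScanCache class, its method names and the use of file timestamps as the "has anything changed" test are all assumptions for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class TldScanCache {
    // The cache is usable when it exists and no input is newer than it.
    // (A deleted jar is not detected by this simple check.)
    static boolean isFresh(File cache, List<File> inputs) {
        if (!cache.isFile()) {
            return false;
        }
        long stamp = cache.lastModified();
        for (File input : inputs) {
            if (input.lastModified() > stamp) {
                return false;
            }
        }
        return true;
    }

    // Returns the cached listener class names, or runs the expensive scan
    // and stores its result in the .ser file for the next startup.
    @SuppressWarnings("unchecked")
    static ArrayList<String> listeners(File cache, List<File> inputs,
                                       Supplier<ArrayList<String>> scan) {
        try {
            if (isFresh(cache, inputs)) {
                try (ObjectInputStream in =
                         new ObjectInputStream(new FileInputStream(cache))) {
                    return (ArrayList<String>) in.readObject();
                }
            }
            ArrayList<String> result = scan.get();
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(cache))) {
                out.writeObject(result);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Unreadable cache " + cache, e);
        }
    }
}
```

The scan is passed in as a Supplier so the expensive part only runs when the cache is missing or stale.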

Someone could also add a flag to turn off tld scanning - for example if no JSPs are used ( cocoon, velocity, etc ).

In future this should be extended to include a cache of the extension list ( scanned by the classloader ) and packages ( used to optimize loading ).

Rant:

This is one of the big errors in the JSP spec (IMO) - other java APIs use META-INF/services or the manifest so at least you read a single entry in a JAR. In JSP - you have to scan the entire META-INF dir, and look for all files with .tld extension. It is a nice idea from a developer's perspective to be able to just drop in jars - how you specify the descriptors is the problem.

int[] versus Integer or IntHolder

It's a simple ( and old ) trick. Instead of storing Integers or IntHolder objects in tables or doing weird things to return multiple primitive values - just use an int[1].

I don't know if such trivia is appropriate for a weblog, everyone probably knows it - but I've seen - and even written - code doing increment on Integers in a map ( which involves a lot of garbage and pretty ugly code ).
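
For the record, the trick looks roughly like this ( IntCellDemo and its method names are made up for the example ):

```java
import java.util.HashMap;
import java.util.Map;

final class IntCellDemo {
    // Counting with a mutable int[1] cell: one allocation per key,
    // none per increment.
    static Map<String, int[]> countWords(String[] words) {
        Map<String, int[]> counts = new HashMap<>();
        for (String word : words) {
            int[] cell = counts.get(word);
            if (cell == null) {
                cell = new int[1];
                counts.put(word, cell);
            }
            cell[0]++;
        }
        return counts;
    }

    // Returning two primitive values through caller-supplied cells.
    static void divide(int a, int b, int[] quotient, int[] remainder) {
        quotient[0] = a / b;
        remainder[0] = a % b;
    }
}
```

The map lookup happens once per increment and the value is changed in place, so there is no replace-the-Integer step.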

Thursday, January 23, 2003

Weblog and wiki

I wonder if anyone has combined Weblog/wiki. It would be pretty nice:

* WikiNames for links - you need the name of the weblog and the title of the entry, no long URLs or numbers

* you can easily type lists, bold, etc

* wiki already has a lot of nice features related to changes/updates to entries.

I find it very useful to update previous weblog entries - and pretty hard to navigate through the changes ( if I read someone's log, and he changed a few things - I'll have to search for them ).

Of course, wiki-style names (including category ) for the weblog entries would make the weblog easier to navigate.

That won't solve the biggest problem I have with weblogs - the complete mess of information. I can read a few high-traffic lists very fast - but I can hardly manage to read more than 5-6 weblogs. I'm just lost when I have to switch too many contexts - in a mail reader I just ignore threads I don't care about, and I read related issues one after another ( thread view ). With weblogs - while I find it fascinating and very useful to hear all the personal and general information, it costs me a lot of energy to click all the links and understand what's happening.

Update: Many thanks for the pointers. While navigating the links, I found a nice weblog search engine. That's by far my worst problem with reading weblogs - reading by subject instead of by author.

* Test wiki

** Test1 '''test'''

* Test2

** Test1 ''test''

Load balancing in jk

Mladen Turk has proposed to use OS load in jk2 load balancing. Finally. ( it has been proposed in the past - never implemented )

There are a few details to clarify ( use backchannel or normal ajp or native RPC, etc ), I'll update later.

Wednesday, January 22, 2003

Configurable TagPool in 50

The initial commit is in - seems to work. For configuration - I didn't have too many choices, the JspServlet config was out of question, so this is the first use of context init params.

is using precompiled JSPs - or knows how to set init params with jsp-page

The idea is to plug a different TagPool implementation and customize its parameters. The TagPool has a significant impact on JSP performance - and the user must be able to fine tune it and choose the right balance between memory use and speed.

I did a few small optimizations in the default pool, and checked in the ThreadLocalPool - which has no sync, but keeps one pool per thread. If you have a JSP that has many tags and is very frequently accessed - it's worth trying it.
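
To show what "no sync, one pool per thread" means, here is a generic sketch. It is not the jasper ThreadLocalPool; the PerThreadPool name, the Supplier factory and the per-thread size limit are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

final class PerThreadPool<T> {
    private final Supplier<T> factory;
    private final int maxPerThread;
    // One free list per thread, so get/release never contend on a lock.
    private final ThreadLocal<ArrayDeque<T>> free =
        ThreadLocal.withInitial(ArrayDeque::new);

    PerThreadPool(Supplier<T> factory, int maxPerThread) {
        this.factory = factory;
        this.maxPerThread = maxPerThread;
    }

    // Reuse an instance released earlier by this thread, or create one.
    T get() {
        T pooled = free.get().pollFirst();
        return pooled != null ? pooled : factory.get();
    }

    // Keep the instance for this thread unless its free list is full.
    void release(T item) {
        ArrayDeque<T> mine = free.get();
        if (mine.size() < maxPerThread) {
            mine.addFirst(item);
        }
        // else: drop it and let the GC reclaim it
    }
}
```

The price is memory: every thread that touches the pool keeps its own instances, which is the balance between memory use and speed mentioned above.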

Configuration is done with init params for the context and servlet. Need to document it better - if it proves useful. I would try some JNDI - but it's too complex for my free time.

It's already done - nntp/rss gateway

I was thinking about the effects of blogs on existing mailing lists and news. And how to bridge the two worlds - I'm used to getting information by a specific subject, not by author.

Of course, someone else already did that on sourceforge - http://www.methodize.org/nntprss

Update: it works pretty well - not perfect, but far better than any other aggregator I've tried so far. Using a familiar interface ( the news reader ) is very nice, but the real potential is in bridging the blog and news/mail worlds. More on that later - I feel this is extremely important to explore - the relation between weblog and mail/news and the effect on communities.

Using context params or JNDI for configuration

Jasper is configured using servlet init params, specified in web.xml. You can customize per context - by defining a jsp servlet using a special name.

The problem: that would make your app unportable, it won't work with other jsp engines.

An alternative would be to use context params and init params of the compiled servlet - with some special names that would be used by jasper.

Jasper would read global options from the context init params, then override them with local init params defined in the servlet ( with the jsp-page tag ). Option names can be generic "jsp.keepgenerated" or "jasper.keepgenerated".

As an API, we should _stop_ using the Options interface ( and adding a method for every option we add ) and start using a simple getInitParam( String ). Or even better - use JMX to push config options into the JSP engine ( which can be jasper or something else ).
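
A sketch of the lookup order described above, with plain Maps standing in for the context and servlet init params so that it runs without a servlet container. The JspOptionLookup class and the getBoolean helper are hypothetical; only the getInitParam( String ) idea and the "jsp.keepgenerated" name come from the text.

```java
import java.util.HashMap;
import java.util.Map;

final class JspOptionLookup {
    private final Map<String, String> contextParams; // stand-in for context init params
    private final Map<String, String> servletParams; // stand-in for the servlet's init params

    JspOptionLookup(Map<String, String> contextParams,
                    Map<String, String> servletParams) {
        this.contextParams = contextParams;
        this.servletParams = servletParams;
    }

    // The servlet-level value wins; the context-level value is the fallback.
    String getInitParam(String name) {
        String local = servletParams.get(name);
        return local != null ? local : contextParams.get(name);
    }

    boolean getBoolean(String name, boolean defaultValue) {
        String value = getInitParam(name);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> context = new HashMap<>();
        context.put("jsp.keepgenerated", "false");
        Map<String, String> servlet = new HashMap<>();
        servlet.put("jsp.keepgenerated", "true");
        JspOptionLookup options = new JspOptionLookup(context, servlet);
        System.out.println(options.getBoolean("jsp.keepgenerated", false));
    }
}
```

A new option is then just a new name, with no new method on an interface.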

There are many benefits:

* simpler configuration

* relatively portable - other engines can just ignore jasper settings. Right now if you make per-context settings - the app won't work with other engines

* other engines may support same options

* it works very well for runtime options ( like what kind of thread pool to use )

* it works very well for precompiled JSPs ( which bypass JspServlet )

* general config tools can be used ( anything that supports web.xml config ), you don't need a jasper-specific GUI

* very easy and flexible

Another ( probable ) alternative is to use JNDI env entries. This is more powerful - it allows the most flexibility in configuration, and is also supported in the standard web.xml.

Possible naming conventions:

* java:env/PREFIX/SERVLET_NAME/JSP_OPTION - per page option, affects only a specific page

* java:env/PREFIX/global/JSP_OPTION - per context option, affects all pages ( unless overridden )

* /PREFIX2/jsp/JSP_OPTION - per server ( defined in server.xml ).

Another option - more powerful if you prefer a single config store ( LDAP, JDK1.4 prefs, etc ):

* /PREFIX1/INSTANCEID/VHOST/WEBAPP/PREFIX2/SERVLET_NAME/OPTIONS -> that will configure a specific instance

* global settings by using "Default" for different components.

PREFIX will avoid conflicts with other jndi entries. ( "options" ? )

The same mechanism can be used for the DefaultServlet, and also to set per context attributes that are typically set in server.xml.

Future versions of the servlet spec - or de-facto standards - may define some cross-container options ( jsp_keep_generated seems like a good one ).

The container should have control over which options can be specified by individual apps - probably the policy would do it ( it's already available if JNDI is used ).

Tuesday, January 21, 2003

SingleThreadModel may be useful

While doing the per/thread hacks on the JSP tag pool - and same for collecting statistics on per thread basis, to avoid sync - I just realized - the so bashed STM servlet may be actually one of the best things...

Of course, STM doesn't avoid the complexity of programming in a multithreaded environment - and it is probably even more vulnerable to errors. It is also much more complex to implement and harder to use. If you use STM, you must find ways to aggregate the data from the multiple instances, and it may use more memory.

The big benefit is that with STM you basically get a per/thread worker - and do a lot of stuff with no sync calls. A good implementation would do a sync to get the STM - but that can be improved by using ThreadLocal or the new ThreadWithAttributes.

Not doing sync calls at runtime - or at least during the critical path - will have a visible impact on loaded sites ( if done right ). You have to pay for that in terms of complexity when you want to take the fields from all those servlet instances and use them - that's where you may need sync and a lot of complex code.

Precompile the JSPs

All JSPs should be precompiled. Using normal JSPs in development mode is fine - but I couldn't find any good reason why an application would not compile all JSPs.

Besides speed ( which is important ! ), you would also know about syntax errors or missing dependencies. Most of the "jsp is slow" stuff is based on the first experience with a JSP page - where you have to wait for the HelloWorld.jsp to compile. A lot of "jsp is hard" is based on the same experience - when the compiler is not found.

It would be nice if a future version of the JSP spec required ( or strongly recommended ) precompilation, and included a way to use the jsp-file tag together with servlet-class. I would like to be able to keep the jsp-file declaration next to the precompiled class name ( now only one can be specified - AFAIK ).

Update: I've got the admin/ precompiled - now you get a decent experience when using it. In the process we found a few bugs in jspc - I fixed them, but Hans had a better fix.

As a bonus for using JSPC, the jsps will be visible in the JMX console - you'll know the number of requests, average and max times and all the fun stuff. And you bypass the JspServlet - another ( small according to Remy ) improvement. I made a small attempt to JMX-enable regular servlets - but it is too complicated and it's really not worth it - I'm more convinced than ever that JspServlet is the wrong way to support JSPs.

JspInterceptor has its own problems - but now there is a better/cleaner way to abstract the JSP engine - just use JMX. The JSP engine should be just a JMX mbean, with a simple operation "compile( contextPath, jsp )". This can be crafted on top of the existing JspC - and will ensure consistent behavior. Regular JMX attributes will remove the confusion in jasper configuration, and it may even provide nice statistics ( compile time, number of errors, etc ).

Probably I should split this entry in two ( or 3 ).

Friday, January 17, 2003

Classloader fun with JDK1.4

The issue is that with JDK1.4.1, if endorsed.dirs is set you may run into many kinds of weird classloading problems. Some classes in the CLASSPATH or in a loader will not be found, even if you place them in jre/lib/ext.

The problem goes away as soon as you remove endorsed.dirs. I still need to find out why this is happening - but for now I fixed jasper to work with crimson ( the parser in JDK1.4 ) and I use tomcat without endorsed.

Update: Remy has the same problem, this time JMX can't be loaded. There is only one solution - stop using endorsed.dirs until Sun fixes it.

Update: It seems one Class-Path attribute in a jar included in endorsed was referencing other jars, and removing that fixes some of the problems. Still - endorsed is supposed to override a number of packages, it shouldn't affect the others.

Thursday, January 16, 2003

More JMX in tomcat5

Finally added thread information ( including what each thread is doing ), request info, servlet execution times, etc. The context and servlet now use JSR77 names.

I'm not sure if the naming for the other components is very good - but at least all jk and coyote components are exposed and may be controlled/viewed. I started to switch to NamingListener for callbacks inside jk - it should be easy to do, but I also want to get rid of JkMain and use JMX to put the components together.

Tuesday, January 14, 2003

Extension points in tomcat5

My search for the ideal extension/hook/callback mechanism is over. After many months ( or years - if I count the many variations on Interceptors and 3.2 refactorings ) of search, I give up on the combined Valve/Interceptor and any other variation.

All tomcat developers seem to agree that JMX is the "right" solution to configure and manage the components inside tomcat. I believe JMX notification is the "right" solution for extension points and callbacks.

The benefits seem clear:

* a clean, simple, well documented and unlikely to change model.

* it'll be easy to know what notification a component sends and what notifications a component handles ( using JMX metadata )

* consistent configuration

* one simple and neutral interface - completely independent of tomcat internals

The only problem is that the notification model doesn't support the Valve - but instead of trying to support both Valve and Interceptor, we should just keep Valves around for all modules that need this kind of functionality.
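
A small sketch of a component that exposes an extension point as a JMX notification, using only javax.management from the JDK. The ContextEvents class and the "demo.context.starting" type string are invented for the example and are not tomcat names.

```java
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;

final class ContextEvents extends NotificationBroadcasterSupport {
    // Hypothetical notification type; listeners can filter on it.
    static final String STARTING = "demo.context.starting";

    private long sequence;

    // The "hook": tell whoever is listening that a context is starting.
    void fireStarting(String contextPath) {
        Notification n = new Notification(
            STARTING, this, ++sequence, "Starting " + contextPath);
        n.setUserData(contextPath);
        sendNotification(n);
    }

    public static void main(String[] args) {
        ContextEvents events = new ContextEvents();
        // A module plugs in by registering a listener; it needs no
        // knowledge of the component's internals.
        events.addNotificationListener(
            (notification, handback) -> System.out.println(
                notification.getType() + " " + notification.getUserData()),
            null, null);
        events.fireStarting("/foo");
    }
}
```

A listener only observes the event and cannot wrap the rest of the processing the way a Valve does, which is the limitation noted above.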

Monday, January 13, 2003

UserDatabase needs changes

Every time I look at the JMX console I remind myself that UserDatabase needs to be fixed before Tomcat5 is shipped. It is just very wrong - you can't scale if all users are in memory, and JMX will become useless if you have 1 mbean for each user.

If UserDatabase returns a simple Hashtable with the user info - it would be quite easy to use JSP-EL to display and manipulate the info. The ideal solution is to turn UserDatabase into a JNDI DirContext - but I'm not even 1/2 way with the JNDI proposal.

Ser2xml

A nice thing to have is an ant task to convert from .ser to .xml and back.

* good for debugging

* it could be used for axis - too many times existing classes have problems with the serialization scheme based on getters.

* this would allow same code to work with both RMI/IIOP and soap

* it may solve some of the .ser problems - like versioning and unreadability.

The serial file does not include a lot of metadata - that is part of the class file. Storing this metadata ( only the field descriptors, to allow interpreting the .ser file ) would also be nice.

The speed of reading/writing a .ser file is almost 2x that of SAX - and involves almost no garbage ( compared with hundreds of objects per request ). The size is also much smaller - assuming transient is used correctly.

There are many bad things in RMI - and in java.io serialization, but some things are also very nice.

Starting to "blog"

A few days ago I decided it's time to do what everyone else is doing. It seems most people I know have blogs - and it's getting pretty hard to know what they think without reading the weblogs. Mailing lists are very quiet lately, and the substance seems to move to dozens of weblogs.

I can see the benefits ( at least for the weblog authors ) - you get to organize the information, you don't have to google mail archives to find your own postings. There is a lot of noise in the mailing lists - and even though I've just seen the first inter-blog flame war, it seems less likely to involve more than a few (2?) people.

Ovidiu is the one who convinced me - and provided the space. As a new blogger, I have a lot of open questions about this.

First, how do people put so many links in their blogs ? Is it just cut&paste, or some magic ? What about the "talkback" feature ? I understand how it works, but not how to do it.

Another question is how to get content back into mailing lists ( and if ). A lot of people are using the lists to exchange info - the blog shouldn't take this away.

What is the correct way to update entries ? Just change the date ?

With mail lists, you can filter out or skip things. With blogs I need to find the relevant entries and people - and then read them. RSS seems the answer - I found blagg and I am able to get a feed, but that includes all the postings by a particular person. I need to sort them on categories - or I won't be able to track more than a few people.

Using blogs instead of mailing lists to exchange ideas seems possible - I can see how a proposal could be posted in a blog, feedback and votes accumulated - and development status updated. I made many proposals that got almost no feedback ( and never got implemented ) - and many that I implemented, sometimes in a very different way than I originally expected.

Update: For updating, it seems people just change the date - so it gets up in RSS.

For links - it seems cut&paste is the solution. I'm pretty worried about the relation between weblog and mail lists - and the information chaos, but that will be a separate entry.

Many thanks to all who read and commented on this.

Saturday, January 11, 2003

Ant: delayed task creation

One of the big changes for 1.6 will be the ComponentHelper, and the creation of tasks and types just before their use. In ant1.5 some of the tasks were created before use, but an attempt to create the task was made when the task was read. If a task definition was available, then the task would be created at parse time. If not - it would be created before use.

In addition, all core and optional tasks were instantiated ( Class.forName ) at startup. That resulted in 2-3 seconds wasted in creation of tasks we'll never use. Most of the time was actually spent processing stack traces for the optional tasks with missing dependencies.

There are a few benefits of doing Class.forName() just before executing:

* we only create the tasks that we need. 2-3 seconds out of each build may not look like much.

* with help from the classloader task, we can push jars into the loader ( like junit.jar) and then optional tasks would work even if their deps are not in ant/lib
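
The idea of deferring Class.forName() can be sketched like this. It is not the ant ComponentHelper; LazyTaskRegistry and its methods are hypothetical names used only to illustrate the technique.

```java
import java.util.HashMap;
import java.util.Map;

final class LazyTaskRegistry {
    // Task name -> class name. Nothing is loaded when a task is defined.
    private final Map<String, String> definitions = new HashMap<>();
    private final ClassLoader loader;

    LazyTaskRegistry(ClassLoader loader) {
        this.loader = loader;
    }

    // Cheap: just remember the name, even if the class or its
    // dependencies are not available ( yet ).
    void define(String taskName, String className) {
        definitions.put(taskName, className);
    }

    // The class is resolved only here, right before the task is used.
    Object create(String taskName) {
        String className = definitions.get(taskName);
        if (className == null) {
            throw new IllegalArgumentException("Unknown task: " + taskName);
        }
        try {
            return Class.forName(className, true, loader)
                        .getDeclaredConstructor()
                        .newInstance();
        } catch (ReflectiveOperationException | LinkageError e) {
            throw new IllegalStateException("Cannot create task " + taskName, e);
        }
    }
}
```

A task whose dependencies are missing costs nothing unless a build file actually uses it, and jars pushed into the loader after startup are picked up because resolution happens late.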

The ComponentHelper provides 2 hook mechanisms: you can replace the default helper by setting the ant.componentHelper reference ( preferably before calling ant ), and you can chain a custom helper. This can be done from within a normal ant task.

The hooks allow arbitrary "antlibs" to be hooked into ant.

What's missing:

* pass the namespace to the task, so antlibs can use it

* implement a default antlib using the namespace as a java package.

So far I've seen no major backward compatibility problems - gump seems happy and all projects I use build without problems ( well, many times they don't - but not because of ant ).

Friday, January 10, 2003

import in ant

Now that the basic <import> has been added, there are a few "small" details to settle - before people start using it.

My opinion is that import should work well for existing build files - that would mean basedir must be relative to the build file ( by default ). It is obviously possible to customize the basedir for the imported file.

In any case - if we want different imported files to use different basedirs, we need to keep track of the imported file location in each element. So regardless of the final decision, that's something to work on.

Jose Alberto seems to believe that only files specifically written for import would work - or that the default should support this use case.

In parallel, Alexey Solofnenko posted info about a preprocessor that supports a similar behavior. It's very interesting, but I don't think it'll help us too much - import is going to be a core task.

Properties are as usual a special case. It is possible to rename properties and support a prefix to avoid naming conflicts. This could be done with some special code in PropertyHelper or in the Property task itself - by looking at the current import environment.

The common use case of multiple build files working together doesn't seem to benefit from that - you want properties to be consistent, without big surprises. The immutability of properties is already a surprise for many people.

JMX support in ThreadPool

A few notes on the JMX-enabled ThreadPool effort.

The goal is to see what the pool and each thread is doing - and maybe stop hung threads.

First, the thread pool itself should provide basic info - how many threads are active, how many threads died. Changing parameters ( increasing maxthreads ) at runtime is tricky - we avoid sync for performance, so we'll probably need to create a new array and switch.

The second issue is information about what each thread is doing. Currently I'm using ThreadWithAttributes - so we can just cast and have O(1) access to attributes. I already changed the Http and Jk connectors to provide info about the stage and current request. I implemented the changes - seems ok, but I need to clean up before committing.
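
To illustrate the "just cast and have O(1) access" idea, here is a simplified stand-in. It is not the real ThreadWithAttributes API: the AttributedThread class, the slot numbers and the guard check are assumptions for the example.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

final class AttributedThread extends Thread {
    // Hypothetical slot numbers for the per-thread notes.
    static final int STAGE = 0;
    static final int CURRENT_REQUEST = 1;

    private final Object guard;
    // Lock-free slots, so a monitoring thread sees the latest value.
    private final AtomicReferenceArray<Object> notes =
        new AtomicReferenceArray<>(2);

    AttributedThread(Object guard, Runnable body) {
        super(body);
        this.guard = guard;
    }

    void setNote(Object caller, int slot, Object value) {
        check(caller);
        notes.set(slot, value);
    }

    Object getNote(Object caller, int slot) {
        check(caller);
        return notes.get(slot);
    }

    // Only code holding the guard object may read or write.
    private void check(Object caller) {
        if (caller != guard) {
            throw new SecurityException("wrong guard object");
        }
    }

    // What a connector would do from inside a worker thread:
    // cast the current thread and update its slots, no map lookup.
    static void report(Object guard, String stage, String request) {
        Thread current = Thread.currentThread();
        if (current instanceof AttributedThread) {
            AttributedThread t = (AttributedThread) current;
            t.setNote(guard, STAGE, stage);
            t.setNote(guard, CURRENT_REQUEST, request);
        }
    }
}
```

The guard parameter mirrors the access restriction discussed in the next paragraph: a caller without the guard object cannot read another thread's status.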

One issue is security. Right now a guard is used ( the ThreadPool for now ), and only components that have access to it can read/write. JMX policy protection will be vital - and we need to find ways to protect if the SecurityManager is not enabled. With the current code, if user code gets access to JMX it can see what all other threads are doing - a problem for web hosting environments.

Update:

Done. You can use it with any tomcat version ( if j-t-c is built from HEAD ). The names are "DOMAIN:type=ThreadPool,worker=http,name=http%8080" and "DOMAIN:type=ThreadPool,worker=jk,name=0.0.0.0%8009" ( the local part needs cleanup ).

What is available is the string version ( threadStatusString ) and 2 String[] with the current status ( parsing, service, ended, etc ) and last request. More info is collected per request, in the RequestProcessors object ( bytes sent, etc ).

Since we may have a lot of threads, there is a single manageable object that controls all threads in a pool ( corresponding to the ThreadPoolMX ).

Open issue: should we treat "status" as purely informative and let servlets set it ? Long running servlets could provide some debugging info.
