This substantial paper will be very useful for researchers working in automated information retrieval (IR), but not for a general audience. It describes, in great detail, techniques for both monolingual IR in English and Japanese, and Japanese-English cross-language IR (Japanese queries, English documents).
The paper reports on retrieval experiments in the context of NTCIR-4, a Japanese retrieval testing program run by the National Institute of Informatics (NII), much like the Text Retrieval Conference (TREC) run by the National Institute of Standards and Technology (NIST) in the US. It describes retrieval systems developed in a collaboration between Justsystem Corporation (JSC) and Clairvoyance Corporation (CC).
The system uses natural language processing (NLP) techniques, including noun-phrase detection with language-specific extensions, and rich translation resources. The paper explores issues of noun-phrase weighting, translation weighting, pseudo-relevance feedback, and term-weight merging. The experiments are carefully set up, exploring the interactions of variables through analysis of variance (ANOVA) and reporting statistical significance. A particularly welcome feature is the error analysis, which uses a typology of errors to gain insight into the contribution of various system components to the end result. The results are presented in many detailed tables.
The system, testing procedures, and results are all well explained. There are no earth-shattering results here, but that is true of most papers reporting on IR experiments: too many variables influence retrieval performance, results are often specific to a given context, and grand generalizations are hard to come by. What sets this paper apart is the clear framework used for testing various configurations of system components, and the carefully worked out testing methodology, especially the typology of errors used in the failure analysis.