
Indexing not working due to failure in PDF document

Posted: Sun Feb 25, 2007 10:47 am
by kurtbe
At least I think this is the problem.

Upgrading from Scalix 10.0.5 to Scalix 11.0.1 worked like a charm.
But I think my SIS index is not building up correctly.

I ran omscan -Aa and omtidallu -M without errors. When I start the indexing with sxmkindex, a Java process runs at 100% CPU for some time and then stops. The load is around 1.

I have the following error in my scalix-sis-indexer.log:

Code:

2007-02-25 15:33:30,177 ERROR [TP-Processor12] [InternalIndexerServlet.errorResponse:286] Index failed
com.scalix.index.api.IndexerException: Exception extracting PDF document text
        at com.scalix.index.message.PDFExtractor.extract(PDFExtractor.java:31)
        at com.scalix.index.message.IndexableMimeMessage.addContent(IndexableMimeMessage.java:296)
        at com.scalix.index.message.IndexableMimeMessage.addContent(IndexableMimeMessage.java:282)
        at com.scalix.index.message.IndexableMimeMessage.generateDocument(IndexableMimeMessage.java:64)
        at com.scalix.index.manager.IndexManager.createDocument(IndexManager.java:467)
        at com.scalix.index.manager.IndexManager.indexMessage(IndexManager.java:212)
        at com.scalix.index.web.InternalIndexerServlet.doIndex(InternalIndexerServlet.java:197)
        at com.scalix.index.web.InternalIndexerServlet.doPost(InternalIndexerServlet.java:174)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at com.scalix.index.web.IndexerFilter.doFilter(IndexerFilter.java:39)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
        at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:754)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:684)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:876)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: expected='endobj' firstReadAttempt='endobjobj' secondReadAttempt='1441' org.pdfbox.io.PushBackInputStream@14cf1b9
        at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:479)
        at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:176)
        at com.scalix.index.message.PDFExtractor.parseDocument(PDFExtractor.java:40)
        at com.scalix.index.message.PDFExtractor.extract(PDFExtractor.java:22)
        ... 27 more
2007-02-25 15:33:43,691  INFO [QueueManager] [BatchUpdater.processMods:219] User 0441000034575054-442.48.891.88: added 0 content, added 2605 refs, deleted 0 content, deleted 2605 refs in 13325 ms
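
To see whether the PDF failure is a one-off or a recurring pattern, the ERROR entries in scalix-sis-indexer.log can be grouped by root cause. The following is a minimal sketch in Python, not a Scalix tool. It assumes the log keeps the layout shown above (a timestamp, a level, then an optional stack trace with "Caused by:" lines); the function name summarize_errors is made up for illustration, and the log file location is not given in this thread, so the path in the usage comment is a placeholder.

```python
import re
from collections import Counter

# Start of a log entry, e.g.
# "2007-02-25 15:33:30,177 ERROR [TP-Processor12] [...] Index failed"
ENTRY_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s")
# A chained cause inside a Java stack trace, e.g. "Caused by: java.io.IOException: ..."
CAUSE_RE = re.compile(r"^Caused by: ([\w.$]+)")


def summarize_errors(lines):
    """Count ERROR entries by the last 'Caused by' exception class in each trace."""
    counts = Counter()
    in_error = False
    cause = None
    for line in lines:
        entry = ENTRY_RE.match(line)
        if entry:
            # A new entry begins: close the previous one if it was an ERROR.
            if in_error:
                counts[cause or "(no cause logged)"] += 1
            in_error = entry.group(2) == "ERROR"
            cause = None
        elif in_error:
            found = CAUSE_RE.match(line)
            if found:
                # Later matches overwrite earlier ones, keeping the deepest cause.
                cause = found.group(1)
    if in_error:
        counts[cause or "(no cause logged)"] += 1
    return counts


# Usage (placeholder path, adjust to where the log lives on your server):
# with open("scalix-sis-indexer.log") as log:
#     print(summarize_errors(log).most_common())
```

If every ERROR entry points at the same org.pdfbox cause, a single malformed PDF attachment is a more likely suspect than a general indexer fault, but that is an inference and not something the log states.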


I rebooted the server and checked all services. omstat -a and omstat -s show everything OK, /etc/init.d/scalix-postgres shows up and running, and /etc/init.d/scalix-tomcat shows stopped although ps aux | grep tomcat shows Tomcat running (this bug is known to me). Every time I run mkindex I encounter this error, and it looks like the indexing stops.

After this error message I see some lines like:

Code:

2007-02-25 15:35:08,224  INFO [QueueManager] [BatchUpdater.processMods:219] User 0441000034575054-442.48.891.88: added 0 content, added 1603 refs, deleted 0 content, deleted 1603 refs in 3233 ms


but I can't see any indexing progress like I saw on the other servers I updated to Scalix 11.
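
One way to judge progress from the log alone is to total the counters in the BatchUpdater.processMods lines. This is a rough sketch that assumes those lines always follow the format of the two examples quoted in this post. The function name total_mods is made up for illustration, and what Scalix means by "content" versus "refs" is not explained in this thread, so the script only adds the numbers up and does not interpret them.

```python
import re

# Counters as they appear in the BatchUpdater lines quoted above.
MODS_RE = re.compile(
    r"added (\d+) content, added (\d+) refs, "
    r"deleted (\d+) content, deleted (\d+) refs in (\d+) ms"
)

FIELDS = ("added_content", "added_refs", "deleted_content", "deleted_refs", "ms")


def total_mods(lines):
    """Sum the BatchUpdater counters over all matching log lines."""
    totals = dict.fromkeys(FIELDS, 0)
    totals["batches"] = 0
    for line in lines:
        match = MODS_RE.search(line)
        if not match:
            continue  # skip stack traces and unrelated entries
        for name, value in zip(FIELDS, match.groups()):
            totals[name] += int(value)
        totals["batches"] += 1
    return totals


# Usage (placeholder path, adjust to where the log lives on your server):
# with open("scalix-sis-indexer.log") as log:
#     print(total_mods(log))
```

Comparing the totals before and after a run shows whether any "content" is being added at all. In both lines quoted here that count is 0 while the added and deleted refs are equal.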

Searching in Outlook works, but in SWA I always get 0 results. The Apache error.log contains no messages such as a 404 error on the api file.

Has anybody had the same problem? Does the indexing carry on after this error, or did the indexing service stop?

Posted: Wed Feb 28, 2007 8:49 am
by ScalixSupport
Hi!

Could you run the Scalix installer and re-configure the following components:
Scalix DB
Scalix Tomcat
Scalix Messaging Services
Scalix Search and Index Service
Scalix Mobile Client

Now, check if you are able to search in SWA.

Thanks,
Subir

Posted: Wed Feb 28, 2007 8:51 am
by kurtbe
Hello Subir,

I will do this tonight and reply tomorrow.

Thanks for the advice.

Posted: Wed Feb 28, 2007 4:39 pm
by kurtbe
Hello Subir,

I reconfigured the Scalix components without errors in the log file.

After running mkindex on a user with two messages in his inbox, I went to SWA and tested the search function, with no positive result:

"No results found"

Also, some users complained that the "Flag Message" folder and search are not working properly and do not find all flagged messages.

When searching for flagged messages in Outlook, not all mails show up either. The user is running the new V11 Connector with SmartCache enabled.

I have not yet tested disabling SmartCache or trying a v10 connector on a test PC.

I started mkindex now and will see the results tomorrow...

Posted: Wed Feb 28, 2007 4:52 pm
by kurtbe
Forgot to say:

I added the following IPs to the hosts allowed to connect to the SIS:

internalIP,externalIP,127.0.0.1

and not

internalIP,127.0.0.1,externalIP

What will be the result if I add *?


Also, my co-worker tried to figure out the "Flagged" mailbox using Connector 10.0.5 on another client PC, but also with no luck: only some of the flagged messages show up.

Posted: Wed Feb 28, 2007 5:22 pm
by kanderson
Does this describe the situation?

http://bugzilla.scalix.com/show_bug.cgi?id=14664

There's a workaround at the bottom...

Kev.