Indexing not working due to failure in PDF document
Posted: Sun Feb 25, 2007 10:47 am
At least I think this is the problem.
Upgrading from Scalix 10.0.5 to Scalix 11.0.1 worked like a charm.
But I think my SIS index is not building up correctly.
I ran omscan -Aa and omtidyallu -M without errors. When I start indexing with sxmkindex, a Java process runs at 100% CPU for some time and then stops. The load is around 1.
I have the following error in my scalix-sis-indexer.log:
2007-02-25 15:33:30,177 ERROR [TP-Processor12] [InternalIndexerServlet.errorResponse:286] Index failed
com.scalix.index.api.IndexerException: Exception extracting PDF document text
at com.scalix.index.message.PDFExtractor.extract(PDFExtractor.java:31)
at com.scalix.index.message.IndexableMimeMessage.addContent(IndexableMimeMessage.java:296)
at com.scalix.index.message.IndexableMimeMessage.addContent(IndexableMimeMessage.java:282)
at com.scalix.index.message.IndexableMimeMessage.generateDocument(IndexableMimeMessage.java:64)
at com.scalix.index.manager.IndexManager.createDocument(IndexManager.java:467)
at com.scalix.index.manager.IndexManager.indexMessage(IndexManager.java:212)
at com.scalix.index.web.InternalIndexerServlet.doIndex(InternalIndexerServlet.java:197)
at com.scalix.index.web.InternalIndexerServlet.doPost(InternalIndexerServlet.java:174)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at com.scalix.index.web.IndexerFilter.doFilter(IndexerFilter.java:39)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:754)
at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:684)
at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:876)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: expected='endobj' firstReadAttempt='endobjobj' secondReadAttempt='1441' org.pdfbox.io.PushBackInputStream@14cf1b9
at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:479)
at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:176)
at com.scalix.index.message.PDFExtractor.parseDocument(PDFExtractor.java:40)
at com.scalix.index.message.PDFExtractor.extract(PDFExtractor.java:22)
... 27 more
2007-02-25 15:33:43,691 INFO [QueueManager] [BatchUpdater.processMods:219] User 0441000034575054-442.48.891.88: added 0 content, added 2605 refs, deleted 0 content, deleted 2605 refs in 13325 ms
I rebooted the server and checked all services: omstat -a and omstat -s show everything OK, /etc/init.d/scalix-postgres shows up and running, and /etc/init.d/scalix-tomcat shows stopped although ps aux | grep tomcat shows Tomcat running (a bug I already know about). Every time I run sxmkindex I hit this error, and it looks like the indexing stops.
After this error message I see some lines like this:
2007-02-25 15:35:08,224 INFO [QueueManager] [BatchUpdater.processMods:219] User 0441000034575054-442.48.891.88: added 0 content, added 1603 refs, deleted 0 content, deleted 1603 refs in 3233 ms
but I can't see any indexing progress, which I did see on the other servers I updated to Scalix 11.
Searching in Outlook works, but in SWA I always get 0 results. There is nothing in the Apache error.log, such as a 404 on an API file.
Does anybody have the same problem? Does indexing carry on after this error, or has the indexing service stopped?
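To tell from the log itself whether indexing carried on after the PDF error, a rough sketch like the following might help. It only assumes the two line formats pasted above (the "Index failed" ERROR line and the BatchUpdater.processMods INFO line); the function name summarize and the keys of the returned dictionary are made up for illustration and are not part of Scalix.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the pasted log lines, e.g. "2007-02-25 15:33:30,177".
TIMESTAMP = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}"

# ERROR line written when indexing a message fails.
ERROR_RE = re.compile(TIMESTAMP + r" ERROR .*Index failed")

# INFO line written by the batch updater; captures the content and ref counts.
BATCH_RE = re.compile(
    TIMESTAMP
    + r" INFO .*BatchUpdater\.processMods.*?added (\d+) content, added (\d+) refs"
)


def _parse(ts):
    """Convert the captured timestamp (without milliseconds) to a datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def summarize(lines):
    """Count index failures and batch updates in an iterable of log lines.

    Hypothetical helper: reports whether any batch update was logged
    after the most recent "Index failed" error.
    """
    errors = batches = content_added = refs_added = 0
    last_error = last_batch = None
    for line in lines:
        m = ERROR_RE.match(line)
        if m:
            errors += 1
            last_error = _parse(m.group(1))
            continue
        m = BATCH_RE.match(line)
        if m:
            batches += 1
            content_added += int(m.group(2))
            refs_added += int(m.group(3))
            last_batch = _parse(m.group(1))
    return {
        "errors": errors,
        "batches": batches,
        "content_added": content_added,
        "refs_added": refs_added,
        # True only if a batch update is logged later than the last error.
        "batch_after_last_error": (
            last_batch is not None
            and (last_error is None or last_batch > last_error)
        ),
    }
```

It can be run over the log with something like `with open(path) as f: print(summarize(f))`, where path points to scalix-sis-indexer.log on your system. If batch_after_last_error is true but content_added stays at 0, the indexer is still running but only updating refs, which would match what I am seeing.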