Indexing not working due to failure in PDF document

kurtbe
Posts: 74
Joined: Sun Aug 13, 2006 11:39 am
Location: Germany/Berlin

Postby kurtbe » Sun Feb 25, 2007 10:47 am

At least I think this is the problem.

Upgrading from Scalix 10.0.5 to Scalix 11.0.1 worked like a charm.
But I think my SIS index is not building up correctly.

I ran omscan -Aa and omtidallu -M without errors. When I start the indexing with sxmkindex, a Java process runs at 100% CPU for some time and then stops. The load is around 1.

I have the following error in my scalix-sis-indexer.log:

Code:

2007-02-25 15:33:30,177 ERROR [TP-Processor12] [InternalIndexerServlet.errorResponse:286] Index failed
com.scalix.index.api.IndexerException: Exception extracting PDF document text
        at com.scalix.index.message.PDFExtractor.extract(PDFExtractor.java:31)
        at com.scalix.index.message.IndexableMimeMessage.addContent(IndexableMimeMessage.java:296)
        at com.scalix.index.message.IndexableMimeMessage.addContent(IndexableMimeMessage.java:282)
        at com.scalix.index.message.IndexableMimeMessage.generateDocument(IndexableMimeMessage.java:64)
        at com.scalix.index.manager.IndexManager.createDocument(IndexManager.java:467)
        at com.scalix.index.manager.IndexManager.indexMessage(IndexManager.java:212)
        at com.scalix.index.web.InternalIndexerServlet.doIndex(InternalIndexerServlet.java:197)
        at com.scalix.index.web.InternalIndexerServlet.doPost(InternalIndexerServlet.java:174)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at com.scalix.index.web.IndexerFilter.doFilter(IndexerFilter.java:39)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
        at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:754)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:684)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:876)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: expected='endobj' firstReadAttempt='endobjobj' secondReadAttempt='1441' org.pdfbox.io.PushBackInputStream@14cf1b9
        at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:479)
        at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:176)
        at com.scalix.index.message.PDFExtractor.parseDocument(PDFExtractor.java:40)
        at com.scalix.index.message.PDFExtractor.extract(PDFExtractor.java:22)
        ... 27 more
2007-02-25 15:33:43,691  INFO [QueueManager] [BatchUpdater.processMods:219] User 0441000034575054-442.48.891.88: added 0 content, added 2605 refs, deleted 0 content, deleted 2605 refs in 13325 ms
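For a log with many such entries, the root causes can be tallied with a short script. This is only a sketch: it assumes every entry starts with a timestamp, a level, and two bracketed fields as in the excerpt above, and the function name summarize_errors is made up for illustration.

```python
import re
from collections import Counter

# Header of a log entry as seen in the excerpt above, e.g.
# "2007-02-25 15:33:30,177 ERROR [TP-Processor12] [Class.method:286] Index failed"
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)"
    r"\s+\[[^\]]*\]\s+\[[^\]]*\]\s+(.*)$"
)


def summarize_errors(lines):
    """Count ERROR entries, keyed by the exception class of the last
    "Caused by:" line in their stack trace. Entries without such a
    line are keyed by their log message instead."""
    causes = Counter()
    in_error = False
    key = None
    for line in lines:
        header = ENTRY_RE.match(line)
        if header:
            # A new entry starts: record the previous one if it was an error.
            if in_error:
                causes[key] += 1
            in_error = header.group(2) == "ERROR"
            key = header.group(3) if in_error else None
            continue
        stripped = line.strip()
        if in_error and stripped.startswith("Caused by:"):
            # "Caused by: java.io.IOException: expected=..." -> class name only
            key = stripped[len("Caused by:"):].split(":", 1)[0].strip()
    if in_error:
        causes[key] += 1
    return causes
```

Calling summarize_errors(open(path)) with the path to scalix-sis-indexer.log on your system returns a Counter; for the excerpt above it would hold a single java.io.IOException.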


I rebooted the server and checked all services: omstat -a and omstat -s show everything OK, /etc/init.d/scalix-postgres shows up and running, and /etc/init.d/scalix-tomcat shows stopped although ps aux | grep tomcat shows Tomcat running (this bug is known to me). Every time I run mkindex I encounter this error, and it looks like the indexing stops.

After this error message I see some lines like

Code:

2007-02-25 15:35:08,224  INFO [QueueManager] [BatchUpdater.processMods:219] User 0441000034575054-442.48.891.88: added 0 content, added 1603 refs, deleted 0 content, deleted 1603 refs in 3233 ms


but I can't see any index progress like I saw on other servers I upgraded to Scalix 11.

Searching in Outlook works, but in SWA I always get 0 results. The Apache error.log shows no messages such as a 404 error on the API file.

Has anybody had the same problem? Does indexing continue after this error, or has the indexing service stopped?
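To see whether such lines reflect any net change, the counters can be pulled out with a regular expression. This is a rough sketch that assumes the four numbers mean what their labels suggest; the name net_index_change is hypothetical.

```python
import re

# Field layout taken from the BatchUpdater.processMods line quoted above.
MODS_RE = re.compile(
    r"User (?P<user>\S+): added (?P<added_content>\d+) content, "
    r"added (?P<added_refs>\d+) refs, "
    r"deleted (?P<deleted_content>\d+) content, "
    r"deleted (?P<deleted_refs>\d+) refs in (?P<ms>\d+) ms"
)


def net_index_change(line):
    """Return (net_content, net_refs) for a processMods log line,
    or None if the line does not have that format."""
    m = MODS_RE.search(line)
    if m is None:
        return None
    net_content = int(m["added_content"]) - int(m["deleted_content"])
    net_refs = int(m["added_refs"]) - int(m["deleted_refs"])
    return net_content, net_refs
```

For the line above, with 1603 refs added and 1603 refs deleted and no content touched, this gives a net change of zero for both values.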

ScalixSupport
Scalix
Posts: 5503
Joined: Thu Mar 25, 2004 8:15 pm

Postby ScalixSupport » Wed Feb 28, 2007 8:49 am

Hi!

Could you run the Scalix installer and re-configure the following components:
Scalix DB
Scalix Tomcat
Scalix Messaging Services
Scalix Search and Index Service
Scalix Mobile Client

Now, check if you are able to search in SWA.

Thanks,
Subir

kurtbe
Posts: 74
Joined: Sun Aug 13, 2006 11:39 am
Location: Germany/Berlin

Postby kurtbe » Wed Feb 28, 2007 8:51 am

Hello Subir,

I will do this tonight and reply tomorrow.

Thanks for the advice.

kurtbe
Posts: 74
Joined: Sun Aug 13, 2006 11:39 am
Location: Germany/Berlin

Postby kurtbe » Wed Feb 28, 2007 4:39 pm

Hello Subir,

I reconfigured the Scalix components without errors in the logfile.

After running mkindex on a user with two messages in his inbox, I went to SWA and tested the search function, without success:

"No results found"

Also, some users complained that the "Flag Message" folder and search are not working properly and do not find all flagged messages.

When searching for flagged messages in Outlook, not all mails show up either. The user is running the new V11 connector with SmartCache enabled.

I have not yet tested disabling SmartCache or using a v10 connector on a test PC.

I started mkindex now and will see the results tomorrow...

kurtbe
Posts: 74
Joined: Sun Aug 13, 2006 11:39 am
Location: Germany/Berlin

Postby kurtbe » Wed Feb 28, 2007 4:52 pm

Forgot to say:

I added the following IPs to the hosts allowed to connect to the SIS:

internalIP,externalIP,127.0.0.1

and not

internalIP,127.0.0.1,externalIP

What would be the result if I add *?


Also, my co-worker tried to figure out the "Flagged" mailbox using Connector 10.0.5 on another client PC, but also with no luck: only some of the flagged messages show up.

kanderson

Postby kanderson » Wed Feb 28, 2007 5:22 pm

Does this describe the situation?

http://bugzilla.scalix.com/show_bug.cgi?id=14664

There's a workaround at the bottom...

Kev.

