Indexer doesn't seem to index file content


#1

Dear Support,

We are running RhodeCode Community Edition, version 4.12.4.
It seems that the indexer is not processing file contents: the search page returns no hits when ‘File contents’ is selected.
Searching ‘Commits’ and ‘File names’ works fine.

This is the mapping.ini file:

[__DEFAULT__]

index_types = commits,files

# default patterns for indexing files and content of files.
# Binary files are skipped by default.

# Files to index
index_files = *

# Do not index these file types
skip_files = *.svg, *.log, *.dump, *.txt

# Index both file types and their content
index_files_content = *.c, *.h, *.cpp, *.ini, *.py, *.cs

# Index file names, but not file content
skip_files_content = *.svg

# Force rebuilding an index from scratch. Each repository will be rebuilt
# from scratch with a global flag. Use local flag to rebuild single repos
force = false

# Do not index files larger than 512KB
max_filesize = 512KB

# Limit commit indexing to 500 per batch
commit_parse_limit = 500

# Limit the number of repos per index run (-1 means no limit)
repo_limit = -1

# __INCLUDE__ is more important than __EXCLUDE__.

[__INCLUDE__]
# Include all repos with these names

[__EXCLUDE__]
# Do not include the following repo in index

external/* = 1
binary/* = 1
archived/* = 1
deprecated-dotnet/* = 1
forks/* = 1
linux* = 1
kvgm_externals/* = 1
external* = 1
*kernel* = 1

# Each repo that needs special indexing is a separate section below.
# In each section set the options to override the global configuration
# parameters above.
# If special settings are not configured, the global configuration values
# above are inherited. If no special repositories are
# defined here RhodeCode will use the API to ask for all repositories

; Example configuration for RhodeCode Indexer
; Each repo name that needs indexing is a section, in each section there's
; an option to override the global configuration parameters. If local configuration
; is not given, global configuration values are inherited. If no repositories are defined
; here RhodeCode will use API to ask for all repositories
;[repo_name]
;local_conf1=
;local_conf2=
;
;[repo_name2]
;local_conf1=
;local_conf2=

Edit: a fresh indexer database was created this morning, just to be sure, but file contents still cannot be found.

Thanks and regards,
Tamas


#2

This might be related to the index_files entry above; please try index_files = *, (with a trailing comma).

Also, did you try running without a mapping file? The indexing log should indicate which files were skipped or allowed, so please take a look at it as well for further debugging.


#3

Marcin,

I’m working together with Tamas on this issue.

Adding the “,” to the line did not make a difference. The documentation is not clear about whether the “,” is necessary at the end of a line. Can you explain?

File contents do get indexed when I run the indexer without a mapping file.
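
For reference, here is a minimal Python sketch of how a comma-separated pattern list from an INI file is often parsed. It is an assumption for illustration only, not the indexer's actual parser, and the helper name parse_patterns is hypothetical. Under this kind of parsing a trailing comma would make no difference, so it does not by itself explain the behaviour described here.

```python
import configparser


def parse_patterns(raw):
    """Split a comma-separated option value into clean glob patterns."""
    # Strip whitespace around each entry and drop empty entries,
    # so a trailing comma does not produce an empty pattern.
    return [p.strip() for p in raw.split(",") if p.strip()]


# A cut-down copy of the mapping file from this thread.
SAMPLE = """
[__DEFAULT__]
index_files = *
index_files_content = *.c, *.h, *.cpp, *.ini, *.py, *.cs
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)
defaults = parser["__DEFAULT__"]

print(parse_patterns(defaults["index_files"]))
print(parse_patterns(defaults["index_files_content"]))
# With this parsing, "*," and "*" yield the same pattern list.
print(parse_patterns("*,") == parse_patterns("*"))
```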


#4

I thought maybe the missing comma was a bug in the config.

Could you provide the log output of the indexer? It should state why certain files were skipped for content.


#5

You mean the messages the indexer prints to stdout? I can’t find a log file.


#6

How do you run the indexer?

You can pass --logfile=/path/to/log to the indexer to write the logs to a file.


#7

Example output below. You can see that it skips every file:

2018-08-17 11:39:41,039 - rhodecode_tools - INFO - FETCH     [xxxx] repository configuration
2018-08-17 11:39:41,513 - rhodecode_tools - INFO - FETCH     [xxxx] repository configuration, done [0.474s]
2018-08-17 11:39:41,514 - rhodecode_tools - DEBUG - Initialized rhodecode_tools.lib.fts_index.whoosh_schema index called COMMIT_INDEX at /home/gr_sadmin/.rccontrol/community-1/data/index
2018-08-17 11:39:41,521 - rhodecode_tools - INFO - FETCH     [xxxx] commit info (1000 commits per call. Server reported 51 total commits, parse limit is -1)
2018-08-17 11:39:41,926 - rhodecode_tools - INFO -           Fetched 1000 commits [51 total][fetch:0001]
2018-08-17 11:39:41,926 - rhodecode_tools - INFO - FETCH     [xxxx] commit info, done [0.406s]
2018-08-17 11:39:41,927 - rhodecode_tools - DEBUG - Initialized rhodecode_tools.lib.fts_index.whoosh_schema index called COMMIT_INDEX at /home/gr_sadmin/.rccontrol/community-1/data/index
2018-08-17 11:39:41,928 - rhodecode_tools - DEBUG - BUILD     [xxxx] building commits index; starting at rev: 8da4e2a650da
2018-08-17 11:39:41,929 - rhodecode_tools - DEBUG -     >> <hg_commit at 000000:8da4e2a650da> 1/51
2018-08-17 11:39:41,931 - rhodecode_tools - DEBUG -     >> <hg_commit at 000001:65a2d6fa8168> 2/51
2018-08-17 11:39:41,932 - rhodecode_tools - DEBUG -     >> <hg_commit at 000002:decaba841d4a> 3/51
2018-08-17 11:39:41,932 - rhodecode_tools - DEBUG -     >> <hg_commit at 000003:b4422955c974> 4/51
2018-08-17 11:39:41,933 - rhodecode_tools - DEBUG -     >> <hg_commit at 000004:cd4177b4dd6f> 5/51
2018-08-17 11:39:41,934 - rhodecode_tools - DEBUG -     >> <hg_commit at 000005:a949f5aebef3> 6/51
2018-08-17 11:39:41,935 - rhodecode_tools - DEBUG -     >> <hg_commit at 000006:9cc3ce288c32> 7/51
2018-08-17 11:39:41,936 - rhodecode_tools - DEBUG -     >> <hg_commit at 000007:335cbd593924> 8/51
2018-08-17 11:39:41,937 - rhodecode_tools - DEBUG -     >> <hg_commit at 000008:93c5d2dc3d57> 9/51
2018-08-17 11:39:41,938 - rhodecode_tools - DEBUG -     >> <hg_commit at 000009:ecc57c354640> 10/51
2018-08-17 11:39:41,938 - rhodecode_tools - DEBUG -     >> <hg_commit at 000010:2ef369114daf> 11/51
2018-08-17 11:39:41,939 - rhodecode_tools - DEBUG -     >> <hg_commit at 000011:dbd399224895> 12/51
2018-08-17 11:39:41,940 - rhodecode_tools - DEBUG -     >> <hg_commit at 000012:1a99b4e8836f> 13/51
2018-08-17 11:39:41,941 - rhodecode_tools - DEBUG -     >> <hg_commit at 000013:ab81c8709d21> 14/51
2018-08-17 11:39:41,942 - rhodecode_tools - DEBUG -     >> <hg_commit at 000014:4681537fc9a8> 15/51
2018-08-17 11:39:41,942 - rhodecode_tools - DEBUG -     >> <hg_commit at 000015:91de6c934615> 16/51
2018-08-17 11:39:41,943 - rhodecode_tools - DEBUG -     >> <hg_commit at 000016:430dec96ba61> 17/51
2018-08-17 11:39:41,944 - rhodecode_tools - DEBUG -     >> <hg_commit at 000017:2da06b0d4c37> 18/51
2018-08-17 11:39:41,945 - rhodecode_tools - DEBUG -     >> <hg_commit at 000018:25ab0809e9b3> 19/51
2018-08-17 11:39:41,946 - rhodecode_tools - DEBUG -     >> <hg_commit at 000019:3d485cf46487> 20/51
2018-08-17 11:39:41,947 - rhodecode_tools - DEBUG -     >> <hg_commit at 000020:e07d798a8dae> 21/51
2018-08-17 11:39:41,947 - rhodecode_tools - DEBUG -     >> <hg_commit at 000021:a0fd91d004e7> 22/51
2018-08-17 11:39:41,948 - rhodecode_tools - DEBUG -     >> <hg_commit at 000022:85c6f63e1334> 23/51
2018-08-17 11:39:41,949 - rhodecode_tools - DEBUG -     >> <hg_commit at 000023:300db2a3cc34> 24/51
2018-08-17 11:39:41,950 - rhodecode_tools - DEBUG -     >> <hg_commit at 000024:803907a70882> 25/51
2018-08-17 11:39:41,951 - rhodecode_tools - DEBUG -     >> <hg_commit at 000025:b593dc49a0e3> 26/51
2018-08-17 11:39:41,951 - rhodecode_tools - DEBUG -     >> <hg_commit at 000026:a60e789d7142> 27/51
2018-08-17 11:39:41,952 - rhodecode_tools - DEBUG -     >> <hg_commit at 000027:977d960c01b3> 28/51
2018-08-17 11:39:41,953 - rhodecode_tools - DEBUG -     >> <hg_commit at 000028:8f742083b491> 29/51
2018-08-17 11:39:41,954 - rhodecode_tools - DEBUG -     >> <hg_commit at 000029:053fe27bdf23> 30/51
2018-08-17 11:39:41,954 - rhodecode_tools - DEBUG -     >> <hg_commit at 000030:e565901d1573> 31/51
2018-08-17 11:39:41,955 - rhodecode_tools - DEBUG -     >> <hg_commit at 000031:f9761bf34e57> 32/51
2018-08-17 11:39:41,956 - rhodecode_tools - DEBUG -     >> <hg_commit at 000032:b90e94c3953f> 33/51
2018-08-17 11:39:41,957 - rhodecode_tools - DEBUG -     >> <hg_commit at 000033:98f1fce43d7b> 34/51
2018-08-17 11:39:41,958 - rhodecode_tools - DEBUG -     >> <hg_commit at 000034:21c6fd56c9e4> 35/51
2018-08-17 11:39:41,959 - rhodecode_tools - DEBUG -     >> <hg_commit at 000035:12e5d553cbc6> 36/51
2018-08-17 11:39:41,960 - rhodecode_tools - DEBUG -     >> <hg_commit at 000036:348c60e984c0> 37/51
2018-08-17 11:39:41,962 - rhodecode_tools - DEBUG -     >> <hg_commit at 000037:032db6d2aad1> 38/51
2018-08-17 11:39:41,964 - rhodecode_tools - DEBUG -     >> <hg_commit at 000038:2a5f84280e8b> 39/51
2018-08-17 11:39:41,966 - rhodecode_tools - DEBUG -     >> <hg_commit at 000039:3177dc8d3a4a> 40/51
2018-08-17 11:39:41,967 - rhodecode_tools - DEBUG -     >> <hg_commit at 000040:4441b2a3d5e2> 41/51
2018-08-17 11:39:41,969 - rhodecode_tools - DEBUG -     >> <hg_commit at 000041:f190f2e820ea> 42/51
2018-08-17 11:39:41,970 - rhodecode_tools - DEBUG -     >> <hg_commit at 000042:13cf6eec4bf9> 43/51
2018-08-17 11:39:41,971 - rhodecode_tools - DEBUG -     >> <hg_commit at 000043:8f7e8316c787> 44/51
2018-08-17 11:39:41,973 - rhodecode_tools - DEBUG -     >> <hg_commit at 000044:863e34d659ec> 45/51
2018-08-17 11:39:41,974 - rhodecode_tools - DEBUG -     >> <hg_commit at 000045:10faa8c51d78> 46/51
2018-08-17 11:39:41,975 - rhodecode_tools - DEBUG -     >> <hg_commit at 000046:f3cda15541ee> 47/51
2018-08-17 11:39:41,976 - rhodecode_tools - DEBUG -     >> <hg_commit at 000047:3963310104a8> 48/51
2018-08-17 11:39:41,977 - rhodecode_tools - DEBUG -     >> <hg_commit at 000048:88bbca6ff524> 49/51
2018-08-17 11:39:41,978 - rhodecode_tools - DEBUG -     >> <hg_commit at 000049:1ef958b1cee7> 50/51
2018-08-17 11:39:41,978 - rhodecode_tools - DEBUG -     >> <hg_commit at 000050:431865a524e4> 51/51
2018-08-17 11:39:41,979 - rhodecode_tools - DEBUG - SUMMARY   [xxxx] indexed 51 commits
2018-08-17 11:39:41,979 - rhodecode_tools - DEBUG - >> COMMITTING COMMIT CHANGES <<
2018-08-17 11:39:42,122 - rhodecode_tools - INFO - FETCH     [xxxx] file tree info (@commit_id: `431865a524e4cbb324a742f3662b48a872c6c67f`)
2018-08-17 11:39:43,198 - rhodecode_tools - INFO - FETCH     [xxxx] file tree info, done [1.076s]
2018-08-17 11:39:43,199 - rhodecode_tools - DEBUG - Initialized rhodecode_tools.lib.fts_index.whoosh_schema index called FILE_INDEX at /home/gr_sadmin/.rccontrol/community-1/data/index
2018-08-17 11:39:43,200 - rhodecode_tools - DEBUG - building file tree index for `/repos/xxxx` @commit_id:431865a524e4cbb324a742f3662b48a872c6c67f
2018-08-17 11:39:43,200 - rhodecode_tools - DEBUG - checking repo `xxxx` for file changes 
2018-08-17 11:39:43,204 - rhodecode_tools - DEBUG - files (38) marked for indexing
2018-08-17 11:39:43,204 - rhodecode_tools - DEBUG - add_to_index: set([u'xxxx/debian/control', u'xxxx/INSTALL', u'xxxx/src/implement/ldr_info.c', u'xxxx/xxxx.includes', u'xxxx/debian/isbpro/DEBIAN/control', u'xxxx/.hgsub', u'xxxx/README', u'xxxx/inc/implement/iss_defs.h', u'xxxx/.hgignore', u'xxxx/.project', u'xxxx/src/implement/main.cc', u'xxxx/src/implement/iss_defs.cc', u'xxxx/debian/isbpro/DEBIAN/md5sums', u'xxxx/debian/COPYING', u'xxxx/.cproject', u'xxxx/debian/isbpro.substvars', u'xxxx/autogen.sh', u'xxxx/inc/implement/diag_cfg.h', u'xxxx/inc/implement/sysconf.h', u'xxxx/bin/Makefile.am', u'xxxx/debian/rules', u'xxxx/.hgsubstate', u'xxxx/debian/isbpro/usr/share/doc/isbpro/changelog.gz', u'xxxx/debian/isbpro/usr/bin/isbpro', u'xxxx/src/implement/diag_cfg.cc', u'xxxx/Makefile.am', u'xxxx/debian/changelog', u'xxxx/xxxx.creator', u'xxxx/configure.ac', u'xxxx/inc/implement/ldr_info.h', u'xxxx/COPYING', u'xxxx/xxxx.files', u'xxxx/ChangeLog', u'xxxx/xxxx.config', u'xxxx/compile', u'xxxx/debian/compat', u'xxxx/AUTHORS', u'xxxx/NEWS'])
2018-08-17 11:39:43,204 - rhodecode_tools - DEBUG - files (0) marked for re-indexing
2018-08-17 11:39:43,204 - rhodecode_tools - DEBUG - re_index: set([])
2018-08-17 11:39:43,204 - rhodecode_tools - DEBUG - files (0) marked for deletion
2018-08-17 11:39:43,204 - rhodecode_tools - DEBUG - delete_from_index: set([])
2018-08-17 11:39:43,205 - rhodecode_tools - DEBUG - BUILD     [xxxx] Now starting file indexing 
2018-08-17 11:39:43,205 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/debian/control
2018-08-17 11:39:43,205 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/INSTALL
2018-08-17 11:39:43,206 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/src/implement/ldr_info.c
2018-08-17 11:39:43,207 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/xxxx.includes
2018-08-17 11:39:43,208 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/debian/isbpro/DEBIAN/control
2018-08-17 11:39:43,208 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/.hgsub
2018-08-17 11:39:43,209 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/README
2018-08-17 11:39:43,210 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/debian/COPYING
2018-08-17 11:39:43,210 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/.hgignore
2018-08-17 11:39:43,211 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/.project
2018-08-17 11:39:43,212 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/src/implement/main.cc
2018-08-17 11:39:43,213 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/src/implement/iss_defs.cc
2018-08-17 11:39:43,213 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/debian/isbpro/DEBIAN/md5sums
2018-08-17 11:39:43,214 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/inc/implement/iss_defs.h
2018-08-17 11:39:43,215 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/debian/isbpro.substvars
2018-08-17 11:39:43,215 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/autogen.sh
2018-08-17 11:39:43,216 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/inc/implement/diag_cfg.h
2018-08-17 11:39:43,217 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/inc/implement/sysconf.h
2018-08-17 11:39:43,217 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/xxxx.creator
2018-08-17 11:39:43,218 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/debian/rules
2018-08-17 11:39:43,219 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/.hgsubstate
2018-08-17 11:39:43,219 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/xxxx.config
2018-08-17 11:39:43,220 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/debian/isbpro/usr/bin/isbpro
2018-08-17 11:39:43,221 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/src/implement/diag_cfg.cc
2018-08-17 11:39:43,221 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/Makefile.am
2018-08-17 11:39:43,222 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/debian/changelog
2018-08-17 11:39:43,224 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/bin/Makefile.am
2018-08-17 11:39:43,225 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/configure.ac
2018-08-17 11:39:43,226 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/inc/implement/ldr_info.h
2018-08-17 11:39:43,226 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/COPYING
2018-08-17 11:39:43,227 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/xxxx.files
2018-08-17 11:39:43,228 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/ChangeLog
2018-08-17 11:39:43,228 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/debian/isbpro/usr/share/doc/isbpro/changelog.gz
2018-08-17 11:39:43,229 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/debian/compat
2018-08-17 11:39:43,230 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/compile
2018-08-17 11:39:43,231 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/.cproject
2018-08-17 11:39:43,231 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/AUTHORS
2018-08-17 11:39:43,232 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/NEWS
2018-08-17 11:39:43,233 - rhodecode_tools - DEBUG - SUMMARY   [xxxx] indexed 38 files, 0 files with content, 38 files without content
2018-08-17 11:39:43,233 - rhodecode_tools - DEBUG - >> COMMITTING FILE CHANGES <<
2018-08-17 11:39:43,298 - rhodecode_tools - INFO - 
2018-08-17 11:39:43,298 - rhodecode_tools - INFO - PROCESSED [xxxx] 6/861 repo process limit:-1
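
A log like the one above can be summarised with a short script. The following is a minimal Python sketch that counts skipped files per extension; the regular expression is based only on the SKIP lines shown in this post, and the function name skipped_by_extension is hypothetical.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Matches lines such as:
#   ... - DEBUG -     >> [! SKIP   ] xxxx/src/implement/ldr_info.c
SKIP_RE = re.compile(r">> \[! SKIP\s*\]\s+(?P<path>\S+)")


def skipped_by_extension(lines):
    """Count skipped files per extension ('' for files without one)."""
    counts = Counter()
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            counts[PurePosixPath(match.group("path")).suffix] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "2018-08-17 11:39:43,206 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/src/implement/ldr_info.c",
    "2018-08-17 11:39:43,213 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/inc/implement/iss_defs.h",
    "2018-08-17 11:39:43,209 - rhodecode_tools - DEBUG -     >> [! SKIP   ] xxxx/README",
    "2018-08-17 11:39:43,233 - rhodecode_tools - DEBUG - >> COMMITTING FILE CHANGES <<",
]

for ext, count in sorted(skipped_by_extension(sample).items()):
    print(repr(ext), count)
```

To run it against a real log, pass an open file object instead of the sample list, for example the file written via --logfile.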

#8

Hmm, the SKIP marker is set when a file path should not be indexed. Most of those files match the skip rules, except the *.h files.
What is your repo name instead of xxxx? Does it maybe match any of the exclude patterns?

I’m not sure how to proceed with this, except by commenting out sections of your mapping file to see which one actually produces the SKIP.
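
One quick way to check a repository name against the exclude patterns is a small Python script. This is a sketch that assumes the patterns behave like shell-style globs as implemented by Python's fnmatch module; whether the indexer matches them in exactly this way is not confirmed in this thread. The repository names below are hypothetical.

```python
from fnmatch import fnmatchcase

# Exclude patterns copied from the [__EXCLUDE__] section of the mapping file.
EXCLUDE_PATTERNS = [
    "external/*", "binary/*", "archived/*", "deprecated-dotnet/*",
    "forks/*", "linux*", "kvgm_externals/*", "external*", "*kernel*",
]


def matching_excludes(repo_name, patterns=EXCLUDE_PATTERNS):
    """Return the exclude patterns that match the repository name."""
    return [p for p in patterns if fnmatchcase(repo_name, p)]


# Hypothetical repository names, for illustration only.
for name in ["forks/project", "linux-tools", "team/app"]:
    print(name, "->", matching_excludes(name) or "not excluded")
```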


#9

The problem seems to be the following line:

index_files_content = *.c, *.h, *.cpp, *.ini, *.py, *.cs

This fails:

index_files_content = *.c, *.h,

This works:

index_files_content = *.c,
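
To show which files one would expect to be indexed with content under the full pattern list, here is a minimal Python sketch. It assumes glob-style matching on the file name (via Python's fnmatch), which is an assumption and not necessarily how the indexer evaluates the setting; the function name expect_content_indexed is hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Patterns from the index_files_content line in the mapping file.
CONTENT_PATTERNS = ["*.c", "*.h", "*.cpp", "*.ini", "*.py", "*.cs"]


def expect_content_indexed(path, patterns=CONTENT_PATTERNS):
    """True if the file name matches any content pattern."""
    name = basename(path)
    return any(fnmatchcase(name, p) for p in patterns)


# Paths taken from the log output in post #7.
paths = [
    "xxxx/src/implement/ldr_info.c",
    "xxxx/inc/implement/iss_defs.h",
    "xxxx/src/implement/main.cc",
    "xxxx/README",
]
for path in paths:
    print(path, expect_content_indexed(path))
```

Under this assumption the .c and .h files should have been indexed with content, while the .cc files would need their own *.cc pattern.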

#10

The repo name does not match the exclude patterns. Any suggestions how to fix the index_files_content setting?


#11

We’ll do some checking; maybe it’s a problem in the indexer. We could release a new version to fix it. I’ll update here once we know something.


#12

We can offer a patch for this problem if you wish to receive it ASAP.

Best,


#13

Marcin,

I ran the indexer with:

index_files_content = *

Previously it would fail due to high memory usage but this time it finished without problems. I may have excluded some big repositories since the last time it failed.

The reason I wanted to index file contents selectively was the high memory usage, but now all file contents are indexed, which is even better.

So I guess we are fine now. We will receive the patched indexer with the regular RhodeCode updates.

Thanks for the support!

Paul