Unable to create full text index using Whoosh

Ian · September 5, 2023, 2:18am

Hello

I run the command rhodecode-index --instance-name=community-1 to create the index. The process is killed near the end, so the index is never completely created.

What additional information do I need to provide?

Any ideas what the problem might be?

SYSTEM INFO

CentOS Linux release 8.0.1905 (Core)

RHODECODE CONTROL VERSION: 1.24.4

NAME : community-1
STATUS : RUNNING
LOGS : /root/.rccontrol/community-1/community.log
VERSION : 4.27.1 Community
VCS : vcsserver-1
URL : http://0.0.0.0:10020
CONFIG : /root/.rccontrol/community-1/rhodecode.ini

NAME : vcsserver-1
STATUS : RUNNING
LOGS : /root/.rccontrol/vcsserver-1/vcsserver.log
VERSION : 4.27.1 VCSServer
URL : http://127.0.0.1:10010
CONFIG : /root/.rccontrol/vcsserver-1/vcsserver.ini

rhodecode-support · September 5, 2023, 7:19am

Most likely there is not enough RAM available to build the full-text search index.
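
One way to check whether the Linux out-of-memory killer is what ended the process is to look for its report in the kernel log. The following is a minimal sketch, assuming the kernel log text (for example the output of dmesg) is passed in as a string and that the kernel uses its usual "Killed process <pid> (<name>)" wording. The function name is made up for illustration and is not part of RhodeCode.

```python
import re

# The kernel's OOM report normally contains a fragment such as:
#   "Out of memory: Killed process 4242 (python) total-vm:..."
# Assumption: this wording applies to the kernel in use.
OOM_LINE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_text):
    """Return a list of (pid, process_name) for each OOM kill in the log text."""
    kills = []
    for line in kernel_log_text.splitlines():
        match = OOM_LINE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

If a line naming the indexer's Python process shows up right after a failed run, the kill came from memory exhaustion rather than from rhodecode-index itself.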

Ian · September 6, 2023, 2:21am

Hello RhodeCode Team

My server RAM is 8GB.
How much RAM do you recommend adding?

rhodecode-support · September 6, 2023, 8:58am

It depends on the size of the repositories. Maybe try doubling it for indexing?

Ian · September 7, 2023, 3:27am

Hello RhodeCode Team

I increased the RAM to 16GB, and the process is still killed halfway through execution.

Is it because the amount of data is too large?
Or can the index be created in segments?
For example: index SVN revisions r01~r1000 this time and r1001~r2000 next time.

Ian · September 12, 2023, 4:01am

Hello RhodeCode Team

My search_mapping.ini settings:

commin_process_limit = 200
repo_limit = 1

The command I ran:
rhodecode-index --instance-name=community-1 --mapping=path/search_mapping.ini

But it doesn't work. Is there an error in the command?

rhodecode-support · September 12, 2023, 9:34am

Hi,

It looks like there's a typo in the config. Please make sure it has these keys and this structure:

https://code.rhodecode.com/rhodecode-tools-ce/files/default/rhodecode_tools/commands/configs/mapping.ini?at=default
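
A misspelled key such as commin_process_limit is easy to miss by eye. As a rough aid, here is a minimal Python 3 sketch that flags keys that look like misspellings of known ones. The key list contains only names mentioned in this thread, so the real template may define more, and the [example] section name in the usage note is a placeholder rather than the template's actual section name.

```python
import configparser
import difflib

# Key names taken from this thread only; the linked template is authoritative.
KNOWN_KEYS = [
    "commit_fetch_limit",
    "commit_process_limit",
    "repo_limit",
    "skip_files",
    "skip_files_content",
]


def suspicious_keys(ini_text, known_keys=KNOWN_KEYS):
    """Map each unknown key to the known key it most resembles, if any."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    findings = {}
    for section in parser.sections():
        for key in parser[section]:
            if key in known_keys:
                continue
            # A high cutoff keeps unrelated keys from being reported.
            close = difflib.get_close_matches(key, known_keys, n=1, cutoff=0.8)
            if close:
                findings[key] = close[0]
    return findings
```

Called on a file whose [example] section contains commin_process_limit = 200, it reports that key together with commit_process_limit as the likely intended name. It only compares spellings and does not validate the file's section layout against the template.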

Ian · September 13, 2023, 6:08am

Hello RhodeCode Team

I copied rhodecode_tools/commands/configs/mapping.ini from rhodecode-tools-ce into my search_mapping.ini on Linux.

I set:
commit_fetch_limit = 100
commit_process_limit = 100
repo_limit = 1

But the settings in search_mapping.ini are still not picked up.

The commit info output still shows process_limit = 10000 and repo process limit: -1

Ian · September 13, 2023, 6:15am

Hello RhodeCode Team

Supplement: Attached pictures

Ian · September 13, 2023, 8:16am

Hello RhodeCode Team

Is this related to the Python version?
My Python version is 2.7.15.

Is there a recommended Python version?

rhodecode-support · September 13, 2023, 9:00pm

The Python version is OK. We're not sure what the cause could be, but you can also specify the settings on the command line when you run it. Please see --help for the invocation options.

Ian · September 15, 2023, 8:06am

Hello RhodeCode Team

I used the parameters --commit fetch limit=100 --commit process_limit=100 --repo limit=1, and rhodecode-index picked up the settings successfully, but the scan still does not release memory.

For example, with commit_process_limit=100, after revisions 001~100 have been processed the memory is not released and keeps accumulating.
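
To put numbers on the accumulation, the resident memory of the running indexer can be sampled from /proc on Linux. This is a minimal sketch, assuming the indexer's process ID is known. Both function names are made up for illustration and are not part of RhodeCode.

```python
import re


def rss_kib(status_text):
    """Extract the resident set size in kB from /proc/<pid>/status text.

    Returns None when the VmRSS line is absent (e.g. for kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return rss_kib(handle.read())
```

Calling read_rss_kib with the indexer's PID every few seconds and logging the result would show whether memory drops after each batch of 100 commits or only grows.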

I’m truly sorry, but I need to trouble you once more.

justinmassiot · December 6, 2023, 10:04am

Hello,

Same thing here: I have repos with big files. Even if I skip those with skip_files= and skip_files_content=, rhodecode-index is still killed prematurely.
Does skip_files really prevent RAM usage? Or is there a way to keep the process from being killed when it runs out of RAM (by using swap, for example)?

justinmassiot · December 13, 2023, 1:25pm

I personally gave up on indexing file contents and now index commits only, because of the low amount of RAM on my server. I did this with a command switch instead of the mapping file:

rhodecode-index --index-types=commits

Source: Force WHOOSH to re-index through rcstack - #12 by justinmassiot