Differences between revisions of "Dataverse:Solr"

From BrapciWiki

Revision as of 23:12, 13 August 2021

useradd -m solr
su solr
cd /usr/local/solr
wget https://archive.apache.org/dist/lucene/solr/8.8.1/solr-8.8.1.tgz
tar xvzf solr-8.8.1.tgz
cd solr-8.8.1
cp -r server/solr/configsets/_default server/solr/collection1

You should already have a “dvinstall.zip” file that you downloaded from https://github.com/IQSS/dataverse/releases . Unzip it into /tmp. Then copy the files into place:

cp dvinstall/schema*.xml /usr/local/solr/solr-8.8.1/server/solr/collection1/conf
cp dvinstall/solrconfig.xml /usr/local/solr/solr-8.8.1/server/solr/collection1/conf


<Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="102400" /></Set>

Collections

cd /home/dataverse/
cp dvinstall/schema*.xml /usr/local/solr/solr-8.8.1/server/solr/collection1/conf
cp dvinstall/solrconfig.xml /usr/local/solr/solr-8.8.1/server/solr/collection1/conf

File solr.service

pico /etc/systemd/system/solr.service
[Unit]
Description = Apache Solr
After = syslog.target network.target remote-fs.target nss-lookup.target
[Service]
User = solr
Type = forking
WorkingDirectory = /usr/local/solr/solr-8.8.1
ExecStart = /usr/local/solr/solr-8.8.1/bin/solr start -m 1g -j "jetty.host=127.0.0.1"
ExecStop = /usr/local/solr/solr-8.8.1/bin/solr stop
LimitNOFILE=65000
LimitNPROC=65000
Restart=on-failure
[Install]
WantedBy = multi-user.target

You should not run Solr as root. Create a user named solr and a directory in which to install it.

useradd solr -m
mkdir /usr/local/solr
chown solr:solr /usr/local/solr
su - solr
cd /usr/local/solr
wget https://archive.apache.org/dist/lucene/solr/7.7.2/solr-7.7.2.tgz
tar xvzf solr-7.7.2.tgz
cd solr-7.7.2
cp -r server/solr/configsets/_default server/solr/collection1

Use the "dvinstall.zip" file downloaded in the prerequisites step. Extract it into /tmp if you have not already done so, then copy the files into the following directories.

cp /home/dataverse/dvinstall/schema*.xml /usr/local/solr/solr-7.7.2/server/solr/collection1/conf
cp /home/dataverse/dvinstall/solrconfig.xml /usr/local/solr/solr-7.7.2/server/solr/collection1/conf

Dataverse requires a change to the jetty.xml file that ships with Solr. Edit it and increase requestHeaderSize from 8192 to 102400:

nano /usr/local/solr/solr-7.7.2/server/etc/jetty.xml 

<Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="102400" /></Set>
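The same edit can be scripted instead of done by hand in an editor. The following is a minimal sketch, assuming the stock jetty.xml contains the Property element shown above with a numeric default; set_header_size is a hypothetical helper name, not something provided by Solr or Dataverse.

```shell
# Sketch: raise the default of solr.jetty.request.header.size in a jetty.xml file.
# set_header_size is a hypothetical helper, not part of Solr or Dataverse.
set_header_size() {
    jetty_xml="$1"
    new_size="${2:-102400}"
    # Keep a backup next to the original, then rewrite the file from the backup.
    cp "$jetty_xml" "$jetty_xml.bak" || return 1
    sed 's|\(name="solr\.jetty\.request\.header\.size" default="\)[0-9]*"|\1'"$new_size"'"|' \
        "$jetty_xml.bak" > "$jetty_xml" || return 1
    # Print the resulting line so the change can be checked by eye.
    grep 'solr\.jetty\.request\.header\.size' "$jetty_xml"
}
```

For the installation described here it would be called as set_header_size /usr/local/solr/solr-7.7.2/server/etc/jetty.xml. The untouched original stays in jetty.xml.bak.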

Solr will warn that the maximum number of file descriptors and processes should be increased in a production environment, but it will still run with the defaults. Dataverse already raises these defaults to the recommended levels by adding the line ulimit -n 65000 to the init script, but for greater efficiency, put the following in the file:

nano /etc/security/limits.conf
solr soft nproc 65000
solr hard nproc 65000
solr soft nofile 65000
solr hard nofile 65000
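To confirm that all four entries made it into the file, a small check can help. This is only a sketch: check_limits is a hypothetical helper, and it assumes the usual whitespace-separated limits.conf layout of domain, type, item and value.

```shell
# Sketch: verify that a limits file has soft and hard nproc/nofile entries
# for a given user. check_limits is a hypothetical helper name.
check_limits() {
    limits_file="$1"
    user="${2:-solr}"
    expected="${3:-65000}"
    missing=0
    for kind in soft hard; do
        for item in nproc nofile; do
            # awk exits 0 only if a line matching all four fields was seen.
            if ! awk -v u="$user" -v k="$kind" -v i="$item" -v v="$expected" \
                '$1 == u && $2 == k && $3 == i && $4 == v { found = 1 } END { exit !found }' \
                "$limits_file"; then
                echo "missing: $user $kind $item $expected"
                missing=1
            fi
        done
    done
    return "$missing"
}
```

Calling check_limits /etc/security/limits.conf prints one line per missing entry and returns non-zero if any is absent.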

Create the collection1 collection in Solr

echo "name=collection1" > /usr/local/solr/solr-7.7.2/server/solr/collection1/core.properties
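Before starting Solr it can be worth checking that the core directory holds everything copied and created so far. The sketch below assumes that dvinstall.zip provides a schema.xml among the files matched by schema*.xml; check_core is a hypothetical helper name.

```shell
# Sketch: pre-flight check of a Solr core directory.
# check_core is a hypothetical helper; the file list is an assumption based on
# the copy and echo commands above.
check_core() {
    core_dir="$1"
    status=0
    for f in core.properties conf/solrconfig.xml conf/schema.xml; do
        if [ ! -f "$core_dir/$f" ]; then
            echo "missing: $core_dir/$f"
            status=1
        fi
    done
    return "$status"
}
```

For this installation the call would be check_core /usr/local/solr/solr-7.7.2/server/solr/collection1. No output and a zero exit status mean all three files are present.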

Running Solr as a service

cp /home/dataverse/dataverse-5.3/doc/sphinx-guides/source/_static/installation/files/etc/systemd/solr.service /etc/systemd/system/.
systemctl daemon-reload
systemctl start solr.service
systemctl enable solr.service

These commands start the service and enable it to run at boot.

Indexing

Full Reindex

There are two ways to perform a full reindex of the Dataverse installation search index. Starting with a “clear” ensures a completely clean index but involves downtime. Reindexing in place doesn’t involve downtime but does not ensure a completely clean index.

Clear and Reindex

Index and Database Consistency

Get a list of all database objects that are missing in Solr, and Solr documents that are missing in the database:

curl http://localhost:8080/api/admin/index/status

Remove all Solr documents that are orphaned (i.e., not associated with objects in the database):

curl http://localhost:8080/api/admin/index/clear-orphans

Clearing Data from Solr

Please note that the moment you issue this command, it will appear to end users looking at the root Dataverse installation page that all data is gone! This is because the root Dataverse installation page is powered by the search index.

curl http://localhost:8080/api/admin/index/clear

Start Async Reindex

Please note that this operation may take hours depending on the amount of data in your system. This known issue is being tracked at https://github.com/IQSS/dataverse/issues/50

curl http://localhost:8080/api/admin/index

Reindex in Place

An alternative to completely clearing the search index is to reindex in place.

Clear Index Timestamps

curl -X DELETE http://localhost:8080/api/admin/index/timestamps

Start or Continue Async Reindex

If indexing stops, this command should pick up where it left off based on which index timestamps have been set, which is why we start by clearing these timestamps above. These timestamps are stored in the dvobject database table.

curl http://localhost:8080/api/admin/index/continue
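The two workflows above can be wrapped in small shell functions so the calls run in the right order. This is a sketch rather than an official tool: the function names and the DATAVERSE_URL and DRY_RUN variables are assumptions made for illustration, and only the endpoints come from the commands above.

```shell
# Sketch: wrap the two reindex workflows. Names and variables are illustrative.
DATAVERSE_URL="${DATAVERSE_URL:-http://localhost:8080}"
DRY_RUN="${DRY_RUN:-0}"   # set to 1 to print the commands instead of running them

# Call one admin API endpoint, or just print the curl command in dry-run mode.
api() {
    method="$1"
    path="$2"
    if [ "$DRY_RUN" = "1" ]; then
        echo "curl -X $method $DATAVERSE_URL$path"
    else
        curl -sS -X "$method" "$DATAVERSE_URL$path" || return 1
        echo
    fi
}

# Clear and Reindex: empties the index first, so search looks empty meanwhile.
clear_and_reindex() {
    api GET /api/admin/index/clear && api GET /api/admin/index
}

# Reindex in Place: clear the timestamps, then start or continue indexing.
reindex_in_place() {
    api DELETE /api/admin/index/timestamps && api GET /api/admin/index/continue
}
```

Running DRY_RUN=1 first shows which requests would be sent before anything is cleared. If indexing stops partway, calling api GET /api/admin/index/continue again resumes it without clearing the timestamps a second time.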