README.md: Which elasticsearch/binary client to point to for testing?
In the testing section of README.md, it reads:
4. Set the environment variable DSS_TEST_ES_PATH to the path of the elasticsearch binary
on your machine.
Ideally that binary (or is it an elasticsearch client?) should have been installed in an earlier step. Perhaps the README could point to the official package/install instructions?
For example, there are a few packages to choose from on the latest Ubuntu distribution/AMI:
$ apt-cache search elasticsearch | grep client
golang-gopkg-olivere-elastic.v2-dev - Elasticsearch client for Golang
golang-gopkg-olivere-elastic.v3-dev - Elasticsearch client for Golang
golang-gopkg-olivere-elastic.v5-dev - Elasticsearch client for Golang
libsearch-elasticsearch-perl - Perl client for Elasticsearch
php-horde-elasticsearch - Horde ElasticSearch client
python-elasticsearch - Python client for Elasticsearch
python-elasticsearch-doc - Python client for Elasticsearch (Documentation)
python3-elasticsearch - Python client for Elasticsearch (Python3 version)
ruby-elasticsearch - Ruby client for connecting to an Elasticsearch cluster
ruby-elasticsearch-transport - low-level Ruby client for connecting to Elasticsearch
Fun side notes from Ubuntu-land w.r.t the official elastic.co elasticsearch client:
Following the elastic.co official instructions will result in a cryptic dpkg preinst error:
# apt-get install elasticsearch
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
elasticsearch
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/97.8 MB of archives.
After this operation, 151 MB of additional disk space will be used.
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 56572 files and directories currently installed.)
Preparing to unpack .../elasticsearch_6.4.3_all.deb ...
dpkg: error processing archive /var/cache/apt/archives/elasticsearch_6.4.3_all.deb (--unpack):
new elasticsearch package pre-installation script subprocess returned error exit status 1
Errors were encountered while processing:
/var/cache/apt/archives/elasticsearch_6.4.3_all.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
Examining the .deb package (after installing binutils and running ar x elasticsearch*.deb) reveals that Java is needed by the package, but for some reason the package does not pull it in as a dependency. Installing the default-jre package first should therefore do the trick.
But then, after successfully installing Java and elasticsearch, one would think that the elasticsearch binary lives on the standard /usr/bin path? Nope! dpkg -L elasticsearch reveals that it's hiding here instead:
/usr/share/elasticsearch/bin/elasticsearch
🤦♂️ 🤦♀️
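For anyone setting DSS_TEST_ES_PATH on a Debian/Ubuntu machine, a small sketch like the following could locate the binary. The helper name `find_elasticsearch` and its `candidates` parameter are hypothetical and not part of data-store; the only path taken from this thread is the one reported by dpkg -L above.

```python
import os
import shutil

# Location reported by `dpkg -L elasticsearch` earlier in this thread.
DEB_ES_PATH = "/usr/share/elasticsearch/bin/elasticsearch"


def find_elasticsearch(candidates=(DEB_ES_PATH,)):
    """Return a path to an elasticsearch binary, or None if none is found.

    Hypothetical helper: checks DSS_TEST_ES_PATH first, then PATH, then the
    package locations passed in `candidates`.
    """
    configured = os.environ.get("DSS_TEST_ES_PATH")
    if configured and os.path.isfile(configured):
        return configured
    on_path = shutil.which("elasticsearch")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_elasticsearch() or "elasticsearch binary not found")
```

If this prints a path, exporting it as DSS_TEST_ES_PATH should satisfy step 4 of the README; whether the test suite can then actually run the binary is a separate question.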
Not strictly related to DSS, but it would be good to document this: I am getting all sorts of permission issues:
(hca-venv) ubuntu@ip-172-31-21-87:~/data-store$ tests/fixtures/populate.py --s3-bucket $DSS_S3_BUCKET_TEST_FIXTURES --gs-bucket $DSS_GS_BUCKET_TEST_FIXTURES
Fixtures populated. Run tests to ensure fixture integrity!
(hca-venv) ubuntu@ip-172-31-21-87:~/data-store$ echo $DSS_TEST_ES_PATH
/usr/share/elasticsearch/bin/elasticsearch
(hca-venv) ubuntu@ip-172-31-21-87:~/data-store$ /usr/share/elasticsearch/bin/elasticsearch --version
/usr/share/elasticsearch/bin/elasticsearch-env: line 70: /etc/default/elasticsearch: Permission denied
(hca-venv) ubuntu@ip-172-31-21-87:~/data-store$ sudo /usr/share/elasticsearch/bin/elasticsearch --version
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Version: 6.4.3, Build: default/deb/fe40335/2018-10-30T23:17:19.084789Z, JVM: 11.0.1
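The "Permission denied" above comes from elasticsearch-env trying to read /etc/default/elasticsearch as the unprivileged user. A quick way to confirm that before running the tests might look like the sketch below; `describe_access` is a hypothetical helper written for this comment, not something shipped with data-store or Elasticsearch.

```python
import os

# File that elasticsearch-env failed to read in the transcript above.
ES_DEFAULTS_FILE = "/etc/default/elasticsearch"


def describe_access(path):
    """Return a short description of whether the current user can read `path`."""
    if not os.path.exists(path):
        return "missing"
    if os.access(path, os.R_OK):
        return "readable"
    return "not readable by current user"


if __name__ == "__main__":
    print(f"{ES_DEFAULTS_FILE}: {describe_access(ES_DEFAULTS_FILE)}")
```

This only diagnoses the situation. How to fix it (changing file permissions, running as another user, or using a non-packaged install) is a decision the README would need to make.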
After overcoming those, tests seem to fail because the local elasticsearch instance doesn't seem to have the right permissions?
Why is the previously deployed ElasticSearchDomain on AWS not used for testing, and why is a localhost elasticsearch server needed instead?:
======================================================================
ERROR: setUpClass (__main__.TestCollections)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests/test_collections.py", line 34, in setUpClass
cls.uuid, cls.version = cls._put(cls, cls.contents)
File "tests/test_collections.py", line 282, in _put
res.raise_for_status()
File "/home/ubuntu/hca-venv/lib/python3.7/site-packages/requests/models.py", line 939, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: http://127.0.0.1:43311/v1/collections?uuid=098bb36e-9428-4483-a177-62c06f905735&version=2018-11-09T06%3A17%3A19.064391&replica=aws
----------------------------------------------------------------------
Ran 0 tests in 5.827s
FAILED (errors=1)
make: *** [Makefile:45: tests/test_collections.py] Error 1
Why is the previously deployed ElasticSearchDomain on AWS not used for testing, and why is a localhost elasticsearch server needed instead?:
We do use the ES on AWS for integration testing, but for unit tests we use a local ES server. This prevents the shape of the ES indexes from changing out from under developers when they run tests for new branches. Alternatively, each dev could have their own dss deployment, which is still possible, but testing would then require redeploying and would take longer.
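To illustrate the idea of an isolated local ES server per test run, here is a minimal sketch that only builds the command line for a throwaway instance on a free port. It is not how the data-store test suite actually starts Elasticsearch; `free_port` and `local_es_command` are hypothetical names, and the sketch assumes the `-E setting=value` command-line overrides that Elasticsearch accepts.

```python
import os
import socket
import tempfile


def free_port():
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def local_es_command(es_path, port, data_dir):
    """Build an argv list for a throwaway local Elasticsearch (hypothetical)."""
    return [
        es_path,
        "-E", f"http.port={port}",      # keep clear of any other local instance
        "-E", f"path.data={data_dir}",  # index data lives in a scratch directory
    ]


if __name__ == "__main__":
    es_path = os.environ.get("DSS_TEST_ES_PATH", "elasticsearch")
    with tempfile.TemporaryDirectory() as data_dir:
        # Only print the command; actually launching it is left to the caller.
        print(" ".join(local_es_command(es_path, free_port(), data_dir)))
```

Because each run gets its own port and data directory, index mappings from another branch cannot leak into the run, which is the isolation described above.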
@Bento007 Gotcha, but can't you just define "test" ES indexes (or just another AWS ES domain) and run the tests there instead of an unreliable, tricky to deploy, local ES instance?
@kozbo I wouldn't close this issue since it's not resolved (following the README does not lead to a functioning HCA dss-store system). Also I would reopen https://github.com/HumanCellAtlas/data-store/issues/1356 accordingly, IMHO. Here the "definition of done" should be:
- I follow the README.md to the letter.
- I get a working (test) system.
Related to https://github.com/HumanCellAtlas/data-store/issues/1659
A few thoughts:
- Related ticket on elasticsearch binary/client to use for local testing on Mac: https://github.com/HumanCellAtlas/data-store/issues/2459
- This question seems to be Ubuntu-specific (asking which aptitude package to install; @DailyDreaming might have suggestions here), but I think in general it's a good idea to install elasticsearch manually, from source. Look at .gitlab-ci.yml and allspark.Dockerfile to see how Elasticsearch is downloaded, installed, and set up for tests.
- The data store Readme has some parts that are good and some parts that need more detail (in some cases, a few helpful hints; in other cases, whole steps that are missing and never mentioned). Which parts are which is not always clear until a new person tries to deploy a data store by following the Readme instructions.
@chmreid, on your last bullet point: in my experience deploying data-store and filing bugs for this project, most of the data store steps in the README can reasonably be automated away (via terraform, scripts, or a combination of both).
None of those low-level (and sometimes confusing) details would really be needed if the deployment bugs were ironed out in a focused refactoring/cleanup effort.
This is a clear case where automation would greatly help documentation, by reducing the latter to a bare-bones, effortless 1, 2, 3 ladder of deployment steps.
The plan is to remove elasticsearch