Adventures in Ansible and Vagrant with Trellis

I was getting frequent nginx failures on local dev and having to reload the Vagrant box. Gets old. I had already run vagrant destroy and reimported all the WP data once tonight.

We were using a .dev domain, which I was recently told will soon break in modern browsers, and on the Roots Discourse someone suggested the error I was getting might be tied to that.

I didn’t destroy the machine this time. Instead I’m hoping to restore it manually after changing all instances of example.dev to example.test in the group_vars/development directory.
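Doing that rename by hand across several YAML files is easy to get wrong. Here is a minimal sketch of a bulk replacement, assuming the files live under group_vars/development as described. replace_domain is a hypothetical helper name. It uses perl rather than sed -i because BSD sed on macOS and GNU sed disagree about the argument to -i.

```shell
# Replace every literal occurrence of OLD with NEW in all files under DIR.
# replace_domain is a hypothetical helper, not part of Trellis.
replace_domain() {
  local dir="$1" old="$2" new="$3"
  # -r: recurse, -l: list matching files only, -F: treat OLD as a fixed string
  grep -rlF -- "$old" "$dir" | while IFS= read -r file; do
    # \Q...\E makes perl match OLD literally, so the dot is not a regex wildcard
    OLD="$old" NEW="$new" perl -pi -e 's/\Q$ENV{OLD}\E/$ENV{NEW}/g' "$file"
  done
}

# Example, run from the trellis/ directory:
# replace_domain group_vars/development example.dev example.test
```

After the rename, the box presumably still needs re-provisioning (for example with vagrant provision) so that Trellis regenerates the site’s Nginx configuration for the new domain.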

Running this:

$ systemctl start nginx.service

Yields:

==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to start 'nginx.service'.
Authenticating as: vagrant,,, (vagrant)
Password: 
==== AUTHENTICATION COMPLETE ===
Job for nginx.service failed because the control process exited with error code. See "systemctl status nginx.service" and "journalctl -xe" for details.

And

systemctl status nginx.service

Yields

Mar 05 05:21:15 nydems systemd[1]: Starting A high performance web server and a reverse proxy server...
Mar 05 05:21:15 nydems nginx[2191]: nginx: [emerg] open() "/srv/www/nydems.dev/logs/access.log" failed (2: No such file 
Mar 05 05:21:15 nydems nginx[2191]: nginx: configuration file /etc/nginx/nginx.conf test failed
Mar 05 05:21:15 nydems systemd[1]: nginx.service: Control process exited, code=exited status=1
Mar 05 05:21:15 nydems systemd[1]: Failed to start A high performance web server and a reverse proxy server.
Mar 05 05:21:15 nydems systemd[1]: nginx.service: Unit entered failed state.
Mar 05 05:21:15 nydems systemd[1]: nginx.service: Failed with result 'exit-code'

You can run a test on the config files with this command:

sudo nginx -t -c /etc/nginx/nginx.conf

Which shows the same thing:

nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: [emerg] open() "/srv/www/site.dev/logs/access.log" failed (2: No such file or directory)
nginx: configuration file /etc/nginx/nginx.conf test failed
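The emerg line shows nginx refusing to start because a directory named in an access_log directive doesn’t exist, and here that directory still carries the old .dev name. Below is a rough sketch for listing every such directory at once. missing_log_dirs is a hypothetical helper, and it only understands the simple "directive /absolute/path ..." form of access_log and error_log. It reads the raw config files rather than nginx -T, because nginx -T does not dump anything when the configuration test fails.

```shell
# Print each directory named by an access_log/error_log path that does not exist.
# Reads nginx config text on stdin; handles only "directive /absolute/path ..." lines.
missing_log_dirs() {
  grep -Eo '^[[:space:]]*(access|error)_log[[:space:]]+/[^[:space:];]+' |
    awk '{ print $2 }' |
    while IFS= read -r path; do
      dir=$(dirname "$path")
      [ -d "$dir" ] || echo "$dir"
    done | sort -u
}

# As root (this also greps sites-available, so it may over-report):
# grep -rhE '(access|error)_log' /etc/nginx | missing_log_dirs
```

As a stopgap, any directory it prints could be recreated with mkdir -p. Re-provisioning is probably the cleaner fix, since Trellis created those directories in the first place.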

There’s a known High Sierra issue whose fix is an edit to Trellis/Vagrantfile. I’m only on Sierra, so it might not help.

Provisioning the Staging Server

Provisioning the server has proven to be challenging. The first mistake I made was inadvertently blocking the only port that was accepting SSH connections, in either the Fail2ban or the Ferm settings. Looking back, Ferm is a firewall, whereas Fail2ban scans log files and bans IP addresses whose activity looks suspicious. I did this because I wasn’t sure where the port configuration was being set.
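One way to avoid locking yourself out is to run two checks from a session that is still open: which port sshd is really using, and whether the live firewall rules accept it. Do this before provisioning again. This is a sketch under some assumptions. sshd -T and iptables -S are standard commands, but the two helper names are hypothetical. port_accepted only recognises single-port --dport rules, not the multiport form Ferm may generate.

```shell
# Print the port(s) sshd is configured to listen on, reading `sshd -T` output on stdin
# (sshd -T prints the effective configuration as lowercase "key value" lines).
sshd_ports() {
  awk '$1 == "port" { print $2 }'
}

# Succeed if `iptables -S` output on stdin has an ACCEPT rule for TCP port $1.
# Only single-port "--dport N" rules are recognised, not multiport "--dports a,b".
port_accepted() {
  grep -Eq -- "--dport $1( .*)? -j ACCEPT"
}

# On the server, from a session you can afford to keep open:
# for p in $(sudo sshd -T | sshd_ports); do
#   if sudo iptables -S | port_accepted "$p"; then echo "port $p accepted"
#   else echo "port $p NOT accepted"; fi
# done
```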

If this were on DigitalOcean or similar, I probably would have destroyed the droplet and started over, but Silicon Valley Web Host went in and reopened port 22, and I reverted the port in the provisioning settings back to 22 everywhere.

Then Ansible was failing to install MariaDB. I forget what the error was, but I ended up manually installing mariadb-server and mariadb-client via apt. The installer prompted me for a password, and I entered the DB password from the group_vars directory for staging. I think it’s in the vault.yml file, which is of course encrypted via ansible-vault. The server runs Ubuntu Xenial, which is 16.04.
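The password the MariaDB installer asks for can be read back out of the encrypted file without decrypting it on disk, since ansible-vault view prints the plaintext to stdout. This sketch assumes the Trellis variable is called vault_mysql_root_password and sits at the top level of group_vars/staging/vault.yml. yaml_value is a hypothetical, deliberately naive helper (top-level keys only, no multi-line values or embedded quotes), not a YAML parser.

```shell
# Print the value of a top-level "key: value" line from YAML on stdin,
# stripping one pair of surrounding single or double quotes.
yaml_value() {
  local key="$1"
  sed -n -E "s/^${key}:[[:space:]]*[\"']?([^\"']*)[\"']?[[:space:]]*\$/\1/p"
}

# From the trellis/ directory (prompts for the vault password unless one is configured):
# ansible-vault view group_vars/staging/vault.yml | yaml_value vault_mysql_root_password
```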

Then I was getting the error:

(1054, "Unknown column 'password' in 'field list'")
The full traceback is:
Traceback (most recent call last):
Following a Stack Overflow answer (https://stackoverflow.com/a/31122246/2223106), I logged in to MariaDB manually and set the root password through the authentication_string column instead:

update user set authentication_string=password('1111') where user='root';

Then Ansible couldn’t create the MySQL root accounts for the four hosts: localhost, ::1, 127.0.0.1 and {{ host_name}} (which here was the IP address).
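To see which of those four root accounts already exist, query mysql.user and compare the result against the list. This is a sketch. missing_root_hosts is a hypothetical helper, the host list is copied from the paragraph above, and SERVER_IP in the usage comment is a placeholder for the server’s address.

```shell
# Read existing root hosts on stdin (one per line) and print each expected host
# that is missing. The expected list mirrors the four hosts Ansible tries to create.
missing_root_hosts() {
  local server_host="$1" existing h
  existing=$(cat)
  for h in localhost ::1 127.0.0.1 "$server_host"; do
    grep -qxF -- "$h" <<<"$existing" || echo "$h"
  done
}

# mysql -u root -p -N -e "SELECT Host FROM mysql.user WHERE User='root'" | missing_root_hosts SERVER_IP
```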

So we figured we might need to let sudo users run commands without entering a password. On Ubuntu 16.04 this is done with a file in the /etc/sudoers.d/ directory. The file can be named almost anything, but sudo skips files in that directory whose names contain a . or end in ~, so leave off the extension. Mine was called myusername. What I entered at first was wrong and blocked us from using sudo at all:

ubuntu ALL=(ALL) 
NOPASSWD:ALL

Whoops! Sudo got snagged on that one. The rule is supposed to be on a single line: ubuntu ALL=(ALL) NOPASSWD:ALL. Even the corrected version only gives passwordless sudo to a user named ubuntu, so the first field has to be the account you actually log in as.

There’s a tool for editing sudo configuration that checks your edits for errors before committing them. It’s visudo, not sudoedit (sudoedit just opens the file in your editor without any syntax check):

sudo visudo -f /etc/sudoers.d/mpsadmin
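Two problems can be caught before anything lands in /etc/sudoers.d/: a rule split across lines, and a drop-in name sudo will ignore (one containing a dot or ending in ~). Here is a minimal sketch, to be run as root. The function names are hypothetical. visudo -c -f, which checks a file’s syntax without installing it, is a standard visudo option.

```shell
# Succeed if NAME would actually be read from /etc/sudoers.d/
# (sudo skips names containing a dot or ending in a tilde).
valid_dropin_name() {
  case "$1" in
    '' | *.* | *~) return 1 ;;
    *) return 0 ;;
  esac
}

# Build the one-line passwordless rule for USER.
sudoers_line() {
  printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$1"
}

# Write the rule to a temp file, syntax-check it with visudo, then install it with mode 0440.
install_nopasswd_rule() {
  local user="$1" tmp
  valid_dropin_name "$user" || { echo "bad drop-in name: $user" >&2; return 1; }
  tmp=$(mktemp)
  sudoers_line "$user" > "$tmp"
  if visudo -c -f "$tmp" >/dev/null; then
    install -m 0440 -o root -g root "$tmp" "/etc/sudoers.d/$user"
  else
    echo "syntax check failed; nothing installed" >&2
    rm -f "$tmp"
    return 1
  fi
  rm -f "$tmp"
}

# As root: install_nopasswd_rule mpsadmin
```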

Once the tutor we got from UpWork (only because we’re in a hurry), Andrey and I had access again, “we” poked around in the server for config files that might offer some clues. Andrey first spun up a test server in an Amazon cloud account and ran the provisioning script there so that he could compare the two installations.

He configured MySQL to work without password authentication:

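# /etc/mysql/conf.d/mariadb.cnf
[mysqld]
skip-grant-tables

Just temporary, of course, while provisioning through the complications. Once done, comment the line back out (#skip-grant-tables) and restart MariaDB, because with it enabled anyone who can reach the server gets in without a password.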
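Because the setting has to come back out afterwards, a small helper that flips the line in place can save a trip through the editor. This is a sketch. toggle_skip_grant is hypothetical and only touches a line that is exactly skip-grant-tables, commented or not. The service name in the usage comment matches the systemctl stop mysql used later in this post.

```shell
# Uncomment ("on") or comment out ("off") a bare skip-grant-tables line in a .cnf file.
toggle_skip_grant() {
  local mode="$1" cnf="$2"
  case "$mode" in
    on)  perl -pi -e 's/^#\s*(skip-grant-tables)\s*$/$1\n/' "$cnf" ;;
    off) perl -pi -e 's/^(skip-grant-tables)\s*$/#$1\n/' "$cnf" ;;
    *)   echo "usage: toggle_skip_grant on|off FILE" >&2; return 2 ;;
  esac
}

# As root: toggle_skip_grant off /etc/mysql/conf.d/mariadb.cnf && systemctl restart mysql
```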

Now when Ansible tried the Python2 install task:

usr/bin/python\r\nE: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)\r\nE: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?

So I poked around:

ps aux | grep apt
root     22594  0.0  0.1  52700  3768 ?        S    06:32   0:00 sudo apt-get install -qq -y python-simplejson
root     22595  0.1  3.7 115160 76804 ?        S    06:32   0:37 apt-get install -qq -y python-simplejson
mpsadmin 23677  0.0  0.0  14224  1088 pts/0    S+   14:09   0:00 grep --color=auto apt

And killed the first two processes. Running Ansible again:
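Killing apt-get in the middle of an install is the likely reason dpkg ends up in the interrupted state shown next. If the lock holder is still doing real work, waiting for it is gentler. Here it had been sitting there since 06:32, so it was probably stuck, and killing it may well have been the only option. This is a sketch of the waiting approach. wait_for_lock is a hypothetical helper that relies on fuser (from the psmisc package). It should run as root so it can see root-owned processes.

```shell
# Wait until no process has FILE open, checking every 5 seconds; give up after TIMEOUT seconds.
# fuser exits 0 only when it finds at least one process using the file
# (if fuser is not installed, the loop exits immediately).
wait_for_lock() {
  local file="$1" timeout="${2:-300}" waited=0
  while fuser "$file" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "still locked after ${timeout}s: $file" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}

# As root:
# wait_for_lock /var/lib/dpkg/lock 600 && apt-get install -y python-simplejson
```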

"stdout": "/usr/bin/python\r\nE: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.

So I ran sudo dpkg --configure -a, which put me back into the MariaDB setup prompt. It said that if I left the password field blank, the password would not be changed, so that’s what I did. The password still works as before.

Now it’s back to an error I had seen previously:

(1054, "Unknown column 'password' in 'field list'")

Which I can confirm:

Database changed
MariaDB [mysql]> show tables;
+---------------------------+
| Tables_in_mysql           |
+---------------------------+
| column_stats              |
| columns_priv              |
| db                        |
| engine_cost               |
| event                     |
| func                      |
| general_log               |
| gtid_executed             |
| gtid_slave_pos            |
| help_category             |
| help_keyword              |
| help_relation             |
| help_topic                |
| host                      |
| index_stats               |
| innodb_index_stats        |
| innodb_table_stats        |
| plugin                    |
| proc                      |
| procs_priv                |
| proxies_priv              |
| roles_mapping             |
| server_cost               |
| servers                   |
| slave_master_info         |
| slave_relay_log_info      |
| slave_worker_info         |
| slow_log                  |
| table_stats               |
| tables_priv               |
| time_zone                 |
| time_zone_leap_second     |
| time_zone_name            |
| time_zone_transition      |
| time_zone_transition_type |
| user                      |
+---------------------------+
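SHOW TABLES only lists the tables. What Ansible trips over is the set of columns in mysql.user. MySQL 5.7 dropped the old Password column in favour of authentication_string, while MariaDB keeps Password. The listing above also mixes MariaDB-only tables (column_stats, roles_mapping, gtid_slave_pos) with MySQL 5.7 ones (engine_cost, server_cost, gtid_executed). That hints this data directory was originally created by MySQL 5.7. Here is a sketch for checking the columns directly. password_scheme is a hypothetical helper, and the mysql invocation in the comment assumes root can log in with a password.

```shell
# Given mysql.user column names on stdin (one per line), report which password
# column exists: "password" (MariaDB, older MySQL), "authentication_string" if only
# that one exists (MySQL 5.7.6+), or "unknown".
password_scheme() {
  local cols
  cols=$(tr '[:upper:]' '[:lower:]')
  if grep -qx 'password' <<<"$cols"; then
    echo password
  elif grep -qx 'authentication_string' <<<"$cols"; then
    echo authentication_string
  else
    echo unknown
  fi
}

# On the server (SHOW COLUMNS prints the column name in the first tab-separated field):
# mysql -u root -p -N -e "SHOW COLUMNS FROM mysql.user" | cut -f1 | password_scheme
```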

One suggested solution was to set the password through the authentication_string column instead:

update user set authentication_string=password('1111') where user='root';

However, I don’t think that worked last time.

I ended up hacking into the Ansible module for MySQL itself.

This file was located at

/Library/Python/2.7/site-packages/ansible/modules/database/mysql/mysql_user.py

One way to find it is locate mysql_user.py. I actually forget how I figured out where the ansible package was stored. Wait, simple: ansible --version. And the output:

ansible 2.4.3.0
  config file = None
  configured module search path = [u'/Users/mikekilmer/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /Library/Python/2.7/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 2.7.10 (default, Feb  7 2017, 00:08:15) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)]
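The module path can also be pulled out of that output mechanically. This is a sketch. ansible_module_path is a hypothetical helper, and the relative path modules/database/mysql/mysql_user.py matches the 2.4 layout shown above (module locations move between Ansible releases). Editing the installed copy means an upgrade will overwrite the change. Ansible also loads modules from a library/ directory next to the playbook, so a patched copy could live there instead.

```shell
# Print the full path of a bundled module file, given its path relative to the
# "ansible python module location" reported by `ansible --version` on stdin.
ansible_module_path() {
  local rel="$1" base
  base=$(awk -F' = ' '/ansible python module location/ { print $2 }')
  printf '%s/%s\n' "$base" "$rel"
}

# ansible --version | ansible_module_path modules/database/mysql/mysql_user.py
```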

I opened mysql_user.py in a text editor and added some debugging output, like print("###################### password "), around line 326, where it checks whether the server is an older or newer version of MySQL. The check was reporting that this version, mariadb-client-core-10.2, should have a password column.

I posted an issue on the Ansible GitHub page and decided to remove MySQL entirely:

$ sudo systemctl stop mysql
$ sudo dpkg --configure -a
$ sudo apt purge "mysql*"

Answering YES to remove all data.
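Before letting Ansible reinstall, it’s worth confirming that the purge really cleared things out. Packages left in the rc state (removed, config files remaining) or an old /var/lib/mysql directory could bring the same problem back. This is a sketch. leftover_db_packages is a hypothetical filter over dpkg -l output, and the package-name prefixes it looks for are an assumption.

```shell
# From `dpkg -l` output on stdin, print status and name of MySQL/MariaDB-related packages
# that are installed (ii), half-installed/unpacked/half-configured (iH, iU, iF),
# or removed with config files left behind (rc).
leftover_db_packages() {
  awk '$1 ~ /^(ii|iH|iU|iF|rc)$/ && $2 ~ /^(mysql|mariadb|libmysql|libmariadb)/ { print $1, $2 }'
}

# dpkg -l | leftover_db_packages
# ls -ld /var/lib/mysql   # a leftover data directory would likely be reused by the next install
```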

Now I was able to successfully install and configure it with Ansible.

Another dozen or so tasks completed, up to TASK [nginx : Install Nginx], which returned this error:

'/usr/bin/apt-get -y -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"     install 'nginx'' failed: No apport report written because the error message indicates its a followup error from a previous failure.
E: Sub-process /usr/bin/dpkg returned an error code (1)

I think the answer might be on the server:

$ sudo dpkg --configure -a
dpkg: error processing package nginx-full (--configure):
 subprocess installed post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of nginx:
 nginx depends on nginx-full (<< 1.13.6-0+xenial0.1~) | nginx-light (<< 1.13.6-0+xenial0.1~) | nginx-extras (<< 1.13.6-0+xenial0.1~); however:
  Package nginx-full is not configured yet.
  Package nginx-light is not installed.
  Package nginx-extras is not installed.
 nginx depends on nginx-full (>= 1.13.6-0+xenial0) | nginx-light (>= 1.13.6-0+xenial0) | nginx-extras (>= 1.13.6-0+xenial0); however:
  Package nginx-full is not configured yet.
  Package nginx-light is not installed.
  Package nginx-extras is not installed.

dpkg: error processing package nginx (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 nginx-full
 nginx

Hmmm.

sudo apt-get install -f

Same output, basically. This is also insightful about how apt-get is a wrapper for dpkg.

Looks like the next step is (backing up and) manually editing /var/lib/dpkg/status, removing this entry:

Package: nginx-full
Status: install ok half-configured
Priority: optional
Section: httpd
Installed-Size: 1127
Maintainer: Debian Nginx Maintainers <pkg-nginx-maintainers@lists.alioth.debian.org>
Architecture: amd64
Source: nginx
Version: 1.13.6-0+xenial0
Provides: httpd, httpd-cgi, nginx
Depends: libnginx-mod-http-auth-pam (= 1.13.6-0+xenial0), libnginx-mod-http-dav-ext (= 1.13.6-0+xenial0), libnginx-mod-http-echo (= 1.13.6-0+xenial0), libnginx-mod-http-geoip (= 1.13.6-0+xenial0), libnginx-mod-http-image-filter (= 1.13.6-0+xenial0), libnginx-mod-http-subs-filter (= 1.13.6-0+xenial0), libnginx-mod-http-upstream-fair (= 1.13.6-0+xenial0), libnginx-mod-http-xslt-filter (= 1.13.6-0+xenial0), libnginx-mod-mail (= 1.13.6-0+xenial0), libnginx-mod-stream (= 1.13.6-0+xenial0), nginx-common (= 1.13.6-0+xenial0), libc6 (>= 2.14), libpcre3, libssl1.0.0 (>= 1.0.2~beta3), zlib1g (>= 1:1.1.4)
Suggests: nginx-doc (= 1.13.6-0+xenial0)
Breaks: nginx (<< 1.4.5-1)
Conflicts: nginx-extras, nginx-light
Description: nginx web/proxy server (standard version)
 Nginx ("engine X") is a high-performance web and reverse proxy server
 created by Igor Sysoev. It can be used both as a standalone web server
 and as a proxy to reduce the load on back-end HTTP or mail servers.
 .
 This package provides a version of nginx with the complete set of
 standard modules included (but omitting some of those included in
 nginx-extra).
 .
 STANDARD HTTP MODULES: Core, Access, Auth Basic, Auto Index, Browser, Empty
 GIF, FastCGI, Geo, Limit Connections, Limit Requests, Map, Memcached, Proxy,
 Referer, Rewrite, SCGI, Split Clients, UWSGI.
 .
 OPTIONAL HTTP MODULES: Addition, Auth Request, Charset, WebDAV, GeoIP, Gunzip,
 Gzip, Gzip Precompression, Headers, HTTP/2, Image Filter, Index, Log, Real IP,
 Slice, SSI, SSL, Stream, SSL Preread, Stub Status, Substitution, Thread  Pool,
 Upstream, User ID, XSLT.
 .
 MAIL MODULES: Mail Core, Auth HTTP, Proxy, SSL, IMAP, POP3, SMTP.
 .
 THIRD PARTY MODULES: Auth PAM, DAV Ext, Echo, HTTP Substitutions, Upstream
 Fair Queue.
Homepage: http://nginx.net
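The back-up-then-edit step can be scripted so the copy is never forgotten. This is a rough sketch, to be run as root while dpkg and apt are idle. drop_status_stanza is a hypothetical helper that relies on dpkg status entries being blank-line-separated stanzas, since awk’s paragraph mode reads one stanza per record. Hand-editing this file is a last resort, and dpkg’s own --force-* options (see dpkg(1)) are worth checking first.

```shell
# Remove the stanza for package PKG from a dpkg status file, keeping a timestamped backup.
drop_status_stanza() {
  local pkg="$1" status="${2:-/var/lib/dpkg/status}" backup
  backup="$status.bak.$(date +%Y%m%d%H%M%S)"
  cp -p "$status" "$backup" || return 1
  # RS="" turns on paragraph mode: each blank-line-separated stanza is one record,
  # and newlines also separate fields, so $1/$2 are "Package:" and the package name.
  awk -v pkg="$pkg" 'BEGIN { RS = ""; ORS = "\n\n" }
       !($1 == "Package:" && $2 == pkg)' "$backup" > "$status.new" &&
    mv "$status.new" "$status"
}

# As root: drop_status_stanza nginx-full
```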

Let’s run Ansible again…

This time it failed at Task 4, trying to install Python.

As for Nginx, this might be the solution:

sudo apt-get remove 'nginx*' && sudo apt-get install nginx-full

Ah. Wait…

Mar 13 22:20:15 mps nginx[27407]: nginx: [emerg] listen() to [::]:80, backlog 511 failed (98: Address already in use)
Mar 13 22:20:15 mps nginx[27407]: nginx: [emerg] listen() to 0.0.0.0:80, backlog 511 failed (98: Address already in use)
Mar 13 22:20:15 mps nginx[27407]: nginx: [emerg] listen() to [::]:80, backlog 511 failed (98: Address already in use)
Mar 13 22:20:16 mps nginx[27407]: nginx: [emerg] listen() to 0.0.0.0:80, backlog 511 failed (98: Address already in use)
Mar 13 22:20:16 mps nginx[27407]: nginx: [emerg] listen() to [::]:80, backlog 511 failed (98: Address already in use)

Who?

# netstat -tlnp | fgrep -w :80
tcp6       0      0 :::80                   :::*                    LISTEN      1133/apache2 

Someone’s using that port. Apache2!

~# systemctl status -l apache2.service
● apache2.service - LSB: Apache2 web server
   Loaded: loaded (/etc/init.d/apache2; bad; vendor preset: enabled)
  Drop-In: /lib/systemd/system/apache2.service.d
           └─apache2-systemd.conf
   Active: active (running) since Thu 2018-03-01 16:45:23 UTC; 1 weeks 5 days ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 7
   Memory: 26.7M
      CPU: 1min 18.542s
   CGroup: /system.slice/apache2.service
           ├─ 1133 /usr/sbin/apache2 -k start
           ├─22159 /usr/sbin/apache2 -k start
           ├─22160 /usr/sbin/apache2 -k start
           ├─22161 /usr/sbin/apache2 -k start
           ├─22162 /usr/sbin/apache2 -k start
           ├─22163 /usr/sbin/apache2 -k start
           └─22942 /usr/sbin/apache2 -k start

Mar 11 14:25:01 mps apache2[7947]:  *
Mar 11 14:25:01 mps systemd[1]: Reloaded LSB: Apache2 web server.
Mar 12 14:25:01 mps systemd[1]: Reloading LSB: Apache2 web server.
Mar 12 14:25:01 mps apache2[19931]:  * Reloading Apache httpd web server apache2
Mar 12 14:25:01 mps apache2[19931]:  *
Mar 12 14:25:01 mps systemd[1]: Reloaded LSB: Apache2 web server.
Mar 13 14:25:02 mps systemd[1]: Reloading LSB: Apache2 web server.
Mar 13 14:25:02 mps apache2[22107]:  * Reloading Apache httpd web server apache2
Mar 13 14:25:02 mps apache2[22107]:  *
Mar 13 14:25:02 mps systemd[1]: Reloaded LSB: Apache2 web server.

Let’s stop it and try again.

systemctl stop apache2.service
$  apt install nginx-full
Reading package lists... Done
Building dependency tree       
Reading state information... Done
nginx-full is already the newest version (1.13.6-0+xenial0).
0 upgraded, 0 newly installed, 0 to remove and 21 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] 
Setting up nginx-full (1.13.6-0+xenial0) ...

Success, perhaps. Why was Apache installed? No reference to Apache in the Trellis codebase.
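Stopping the service only lasts until the next reboot if Apache is still enabled at boot. Below is a sketch for spotting whatever holds a port and keeping Apache off for good. port_owner is a hypothetical helper that parses netstat -tlnp output like the line above. Run it as root so the PID/Program column is filled in. systemctl disable --now stops the unit and disables it in one step, and Xenial’s systemd supports that flag.

```shell
# Print the program name(s) listening on TCP port PORT, reading `netstat -tlnp` output on stdin.
port_owner() {
  awk -v port="$1" '$1 ~ /^tcp/ && $6 == "LISTEN" {
      n = split($4, addr, ":")   # last piece of ":::80" or "0.0.0.0:80" is the port
      if (addr[n] == port) {
        split($7, proc, "/")     # "1133/apache2" -> "apache2"
        print proc[2]
      }
    }' | sort -u
}

# As root:
# netstat -tlnp | port_owner 80
# systemctl disable --now apache2   # stop it now and keep it from starting at boot
```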

 systemctl status nginx.service
● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-03-13 22:24:21 UTC; 2min 0s ago
     Docs: man:nginx(8)
  Process: 27527 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
  Process: 27522 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
 Main PID: 27529 (nginx)
    Tasks: 3
   Memory: 2.3M
      CPU: 61ms
   CGroup: /system.slice/nginx.service
           ├─27529 nginx: master process /usr/sbin/nginx -g daemon on; master_process on
           ├─27530 nginx: worker process                           
           └─27531 nginx: worker process                           

Mar 13 22:24:20 mps systemd[1]: Starting A high performance web server and a reverse proxy server...
Mar 13 22:24:21 mps systemd[1]: Started A high performance web server and a reverse proxy server.

Nginx is running. Let’s see if we can get past the TASK 4 python2 install. Yes! The whole provisioning run finally completed!