Monday 28 August 2023

Family Tree investigations #1: Discovery

Situation: family with roots across England, Wales, Scotland, New England.

Known information: family records, printed genealogies, family tree maintained in RootsMagic and generated mostly via Ancestry in 2019-2021.

Objectives: extend currently available information, validate family records, try to identify tools and methods that are of general usefulness, focus on data quality.

Initial question as a ‘pipe-cleaner’: identify the given name of my Tinkham 5-greats grandfather: Amanda Ruth Fisher (self) > (my mother) > Cleaveland > Aldrich > Darling > Mercy Adeline Greene > Sarah Ann Tinkham, b.1799 > ??? Tinkham (m. Sarah Eddy) (all this information from family records).

High level findings: I would say the above lineage was just about adequately supported by online records, though with a heavy reliance on secondary sources.

The main published Eddy genealogy is the key secondary source, though it admits to its own doubts about the relevant Tinkham/Eddy marriage. Adin Ballou’s History of Milford, Massachusetts contains useful and interesting information about Mercy’s sisters Harriet Newell Greene and Abbie Greene Comstock. But neither of these provides a given name for Sarah Ann’s father, nor any evidence (beyond Harriet and Abbie having had a sister now deceased) that Mercy was in fact Sarah Ann's daughter.

The Tinkham Biographical Index (which took me ages to find online) does not cast any light; indeed it potentially confuses the issue in relation to Sarah Ann’s brother Welcome Eddy Tinkham, stating a possible paternity which (when combined with census data) contradicts the Eddy genealogy; this spurious paternity was subsequently picked up and published elsewhere as fact. (The Biographical Index does contain interesting information about Welcome’s eldest son Z.B., who was an early postmaster in California.)

The FamilySearch Family Tree is questionable in many relevant cases: persons/dates, linkages, sourcing.

I am still investigating census records. The 1790-1840 US censuses are indexed by Head of Family only, and no attempt appears to have been made anywhere to digitise the relevant table information (numbers of household members by sex and age) for easy cross-checking purposes. The only printed information that is available at the right level appears to be for the 1790 census.
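
For what it's worth, if those household tables were ever transcribed, the cross-checking could easily be scripted. A minimal sketch: the age buckets below are the real ones used by the 1800 and 1810 censuses for free whites, but the candidate birth years and household counts are purely illustrative.

```python
# Cross-check a candidate family against the age/sex bucket counts reported
# for a household in the 1800/1810 US censuses. Bucket boundaries are the
# real 1800/1810 ones (under 10, 10-15, 16-25, 26-44, 45 and over);
# everything else here is invented illustration, not transcribed data.

BUCKETS_1800 = [(0, 9), (10, 15), (16, 25), (26, 44), (45, 200)]

def bucket_counts(birth_years, census_year, buckets=BUCKETS_1800):
    """Count people per age bucket, given their birth years."""
    counts = [0] * len(buckets)
    for by in birth_years:
        age = census_year - by
        for i, (lo, hi) in enumerate(buckets):
            if lo <= age <= hi:
                counts[i] += 1
                break
    return counts

def consistent(household_counts, birth_years, census_year):
    """True if the candidate family fits within the reported counts."""
    needed = bucket_counts(birth_years, census_year)
    return all(n <= h for n, h in zip(needed, household_counts))

# Lydia (1797), Sarah Ann (1799), Jeremiah (1800) in 1810:
# ages 13, 11, 10 -- all three land in the 10-15 bucket.
print(bucket_counts([1797, 1799, 1800], 1810))  # [0, 3, 0, 0, 0]
```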

The required given name was identified in the FamilySearch Family Tree (and elsewhere on the internet) as Enoch. The ‘source’ for this appears to be 2 entries in the International Genealogical Index ‘contributed’ section, both of which are pretty dodgy in other ways; relevant primary source images may exist (the relevant microfilm appears to have been digitised) but I have not yet been able to view them (working on that). The 2 Enoch Tinkhams of whom there is any online record at all (in both cases well sourced) were born far too late. I have removed the Enoch given name from the relevant FS Family Tree entry.

Sarah Ann Tinkham appears to have no primary sources available online under her maiden name except for one relating to her daughter Harriet’s death. The vital records of Mendon, Mass. do show Sarah Ann's death under, obviously, her married name. (They also show Mercy's marriage, which I suppose provides a bit of a circumstantial link between Sarah Ann and Mercy.)

Making things worse, Sarah Eddy has no primary sources available online, and her brother Thomas Jenckes Eddy has very limited ones, except in both cases for their father Thomas’ will; her older siblings, meanwhile, have plenty of primary sources beyond Thomas’ will.

The fact that many of the key events appear to have taken place in New York State, where record keeping was minimal at the relevant time, is probably not helping at all.

Thomas Jenckes Eddy can be traced via a town history and the 1810 census to Kinderhook, Columbia County, NY, where his father died that year (evidence in Providence Gazette and in probate records). Local militia records (backed up by the same town history) show Thomas Jenckes Eddy as an Ensign, and that he died in 1812. (Also, FamilySearch unhelpfully had him indexed on the 1810 census as Thomas I., despite the handwriting of the rest of the page clearly indicating that Thomas J. would have been more accurate. I have corrected this.)

FamilySearch also has two of the Eddy Genealogy's Thomases conflated into one, despite the Eddy Genealogy and Providence Gazette providing clear evidence to the contrary; I am working on correcting this.

Data Quality: Many of the published secondary sources clearly take enormous care, although primary sources are not usually cited.

FamilySearch is very clear about the level of data quality of the different types of information it makes available.

Many researchers appear to take ‘contributed’ sources as gospel. In many cases information is posted to the web without any sources at all. You can track erroneous facts as they make their way wider and wider.

There appears to be very little curation of collaborative data stores, and the whole focus feels like it’s on quantity rather than quality. I was expecting the FamilySearch Family Tree to work like Wikipedia with strong moderation and source checking, but very clearly not.

Services used: American Ancestors (3 month subscription): useful for access to the Mayflower Descendant periodical, Rhode Island Cemeteries, and some ‘Vital Records’ type secondary sources that don’t appear elsewhere.

Brief flirtations with Ancestry, MyHeritage, and FindMyPast, none of whose UIs gave me anything like the functionality of FamilySearch, and all of which would have cost money to take forward, though FMP has the ability to ‘pay as you go’ for some records. (Disclaimer: I was heavily involved, pre-sales and as solution architect, with FMP when it first launched in 2003 as 1837online.)

FamilySearch: extensive use of Family Tree, Records, Catalog, Books. Some searching of Genealogies (especially ‘contributed’ IGI entries as mentioned above). Considerable trawling of Images.

Other (for secondary sources): various digital libraries, e.g. US Census publications, HathiTrust; Google.

Some fundamental problems with FamilySearch online search: it doesn’t let you query the indexes properly. Some really basic things I couldn’t do: omit rather than select; sort the results in specific ways; use ‘or’ rather than ‘and’; use wildcards beyond a terminal * or ?.
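
One workaround is to export results and filter them locally. A sketch of exactly the kinds of query the UI refuses (wildcards anywhere, ‘or’, omission, custom sorting); the rows and field names are illustrative, not FamilySearch's:

```python
# Client-side filtering of exported search results: regex wildcards,
# 'or' across surnames, omitting rather than selecting, custom sorting.
# Sample rows are invented for illustration.
import re

rows = [
    {"name": "Welcome Eddy Tinkham", "year": 1805},
    {"name": "Sarah Ann Tinkham", "year": 1799},
    {"name": "Enoch Tinkham", "year": 1835},
    {"name": "Thomas Jenckes Eddy", "year": 1790},
]

# An internal wildcard plus an 'or' of two surnames -- neither is possible
# in the search UI itself.
pattern = re.compile(r"T.*nkham|Eddy")
matches = [r for r in rows if pattern.search(r["name"])]

# Omit rather than select, then sort however we like.
no_enoch = [r for r in matches if "Enoch" not in r["name"]]
no_enoch.sort(key=lambda r: r["year"])
print([r["name"] for r in no_enoch])
# ['Thomas Jenckes Eddy', 'Sarah Ann Tinkham', 'Welcome Eddy Tinkham']
```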

Obviously, incorrect or incomplete indexing makes information hard or impossible to retrieve.

FamilySearch allows export of Records search results, but only 100 at a time, and not beyond 5,000. These exports do not show the Collection involved, nor do they show whether or not the Record is aligned with the Family Tree as a source (both key pieces of information that are shown on the Records search UI).
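
Given those limits, any automated export has to page through results in chunks of 100 up to the 5,000 ceiling. The paging arithmetic is trivial but worth getting right:

```python
# Page through Records export results given the observed limits:
# 100 results per export, nothing retrievable beyond 5,000.
PAGE_SIZE = 100
HARD_LIMIT = 5000

def export_pages(total_results):
    """Yield (offset, count) pairs covering what can actually be exported."""
    reachable = min(total_results, HARD_LIMIT)
    for offset in range(0, reachable, PAGE_SIZE):
        yield offset, min(PAGE_SIZE, reachable - offset)

# 12,345 hits: only the first 5,000 are reachable, in 50 pages of 100.
pages = list(export_pages(12_345))
print(len(pages), pages[0], pages[-1])  # 50 (0, 100) (4900, 100)
```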

FamilySearch does provide very useful APIs (some documented more clearly than others), a lot of which are available to the general public (to be covered in post #3: How I use the FamilySearch APIs), though these, like the UI, don’t allow you to retrieve search results (whether Records or Family Tree) beyond 5,000.

The APIs allow identification of the Collection for an exported result, and also automated Records export. (I have found that a little simple screen scraping is required for the latter, but for record counting purposes only.) Matching of Records against Family Tree Person Sources is possible via the presence of the unique Record Ark ID on the Source.
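
The matching itself is simple once you have the person's sources in hand. The JSON shape below (a `sourceDescriptions` list whose entries carry the Ark URI in `about`) is my recollection of the FamilySearch platform API response for a person's sources; treat the field names as assumptions and check the current documentation, and note the Ark values here are placeholders, not real record IDs.

```python
# Match an exported Record against a Family Tree person's sources via the
# record's Ark URI. The payload shape ('sourceDescriptions' / 'about') is
# an assumption from memory of the FamilySearch platform API; the Ark
# values below are placeholder sample data.

sample_sources_payload = {
    "sourceDescriptions": [
        {"about": "https://www.familysearch.org/ark:/61903/1:1:XXXX-111"},
        {"about": "https://www.familysearch.org/ark:/61903/1:1:XXXX-222"},
    ]
}

def source_arks(payload):
    """Extract the set of Ark URIs from a person-sources payload."""
    return {d.get("about", "") for d in payload.get("sourceDescriptions", [])}

def record_is_attached(record_ark, payload):
    """True if the record's Ark appears among the person's sources."""
    return record_ark in source_arks(payload)

print(record_is_attached(
    "https://www.familysearch.org/ark:/61903/1:1:XXXX-111",
    sample_sources_payload))  # True
```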

What I did next: having decided that an Enoch Tinkham with the right dates was unlikely to have existed, the next stage was clearly to see which of the other Tinkhams on record could plausibly have married Sarah Eddy and/or fathered Lydia (1797), Sarah Ann (1799), Jeremiah (1800), and Welcome (1805); I could then potentially use census data to narrow things down further.

So I needed to be able to run some complex queries over the data (to be covered in post #2: Querying FamilySearch data via SQL).
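
As a taste of what that looks like, here is a minimal sketch of the kind of query I had in mind, over an invented table (SQLite purely for illustration; the real data would come from Records exports):

```python
# Load candidate Tinkham records into a local database and ask which men
# could plausibly be the father. Table layout and sample rows are invented
# for illustration only.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tinkham (name TEXT, birth_year INTEGER)")
con.executemany("INSERT INTO tinkham VALUES (?, ?)", [
    ("Enoch Tinkham", 1820),     # far too late, as discussed above
    ("Jeremiah Tinkham", 1770),  # plausible
    ("Welcome Tinkham", 1775),   # plausible
])

# A father of children born 1797-1805 was plausibly born, say, 1755-1780.
rows = con.execute(
    "SELECT name FROM tinkham WHERE birth_year BETWEEN 1755 AND 1780 "
    "ORDER BY birth_year").fetchall()
print([r[0] for r in rows])  # ['Jeremiah Tinkham', 'Welcome Tinkham']
```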

Saturday 26 August 2023

Retirement projects update

General tidying up:

1. Using Google Calendar as my day to day diary (plus Google Contacts as my main address book, and Google mail for more and more purposes).
  • This is finally working adequately.
  • Calendar integration with Outlook was hard.
  • I ended up having to use a free Outlook plugin called CalDav Synchronizer. Outlook's OOTB 'integration' allows you to subscribe to a Google calendar (one way sync from Google to Outlook) but does not support two-way sync (doh).
  • I still haven't got category colours synchronising properly for mail or calendar.
  • Outlook won't support category colours at all for Google mail because it's accessed via IMAP. That means that the UI has no 'categorisation' elements present at all unless you really dig for them.
  • I had to implement another free 3rd party solution (GO Contact Sync Mod) to synchronise contacts; Outlook doesn't support this at all OOTB.
  • Getting distribution lists to synchronise and using them effectively, for which I find you have to use hidden category colours and spurious meeting drafts, is basically a magic trick (I shall write a blog post on this). 
  • Google mail's wacky lack of folders and, in practice, lack of any proper Archive function don't help at all. Why design it differently from every other email system on the planet? It's not as if this provided any fab new functionality (and Google's implementation of mail rules is a joke). Plus, Outlook hides Google mail items with no labels from view, presumably because IMAP can't see them.
  • I have to say that the real problem with becoming more Google-centric is the support situation. When I think how much I have complained in the past about Microsoft Community, I am embarrassed. Google don't even pay lip service; their 'Help Communities' appear to be moribund and uncurated. (Having said which, Google Analytics does have a lively Discord channel with active and effective, if under-resourced, Google input, but I only found out about it by accident on Twitter.)
2. Regaining access to Teams.
  • Done, by paying Microsoft an extra approx. £3 a month, but since I have used it a total of once I am not sure why I bothered.
3. Moving my cloud storage from Dropbox to OneDrive.
  • Big success. Bye bye Dropbox, just too many annoyances.
  • For less than my previous Dropbox subscription I can get Office 365 (including on the desktop), Teams (as above), and 1TB of OneDrive space.
  • OneDrive integrates far better into Windows, e.g. you can set a folder to 'online only' automatically via PowerShell.
  • It has effective support channels that don't assume you know nothing and/or that your question is daft.
  • Plus, search in OneDrive is on another planet compared with Dropbox, which can't even see inside a .msg file, while OneDrive even indexes text in JPEGs.
  • And a shout out to the robocopy Low Disk Space Mode which allowed me to shift nearly 200GB of data from Dropbox to OneDrive fairly painlessly. 
4. Automation of MySQL backups.
  • Done (for my locally hosted databases, anyway) via mysqldump.
5. Tidy up my cPanel hosting environment (deletion of loads of ancient experiments, etc.)
  • Done.
6. Sort out non-MySQL backups for my hosting (I have never yet found a fully usable solution for this)
  • Synced via WinSCP, which is slow but does a far better job than FTP.
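
For what it's worth, the MySQL backup automation in item 4 boils down to one mysqldump invocation per database; a sketch, with illustrative paths and database names:

```python
# Sketch of mysqldump automation: one timestamped dump file per database.
# Paths, user, and database names are illustrative.
import datetime
import subprocess

def build_dump_command(database, out_dir, user="backup"):
    """Build the mysqldump invocation for one database."""
    stamp = datetime.date.today().strftime("%Y%m%d")
    outfile = f"{out_dir}/{database}-{stamp}.sql"
    # --single-transaction gives a consistent snapshot for InnoDB tables
    cmd = ["mysqldump", "--single-transaction", "-u", user,
           f"--result-file={outfile}", database]
    return cmd, outfile

def run_backup(databases, out_dir):
    """Dump each database in turn; raises if mysqldump fails."""
    for db in databases:
        cmd, _ = build_dump_command(db, out_dir)
        subprocess.run(cmd, check=True)

cmd, outfile = build_dump_command("notamos", "/backups")
print(cmd[0], outfile.endswith(".sql"))  # mysqldump True
```

Scheduling it is then just a cron entry (or Task Scheduler job) that calls run_backup.
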

i-Community:

7. Transferring the website across to Meridian to look after.
  • Done.
  • End of an era ... I had been involved in i-Community, and its predecessor the Catalyst/Notability/Logicalis IT Forum, ever since the summer of 2000. Lots of nice messages from members remembering IT Forum and i-Community Rochester trips.
www.notamos.co.uk:

8. Updating Google Analytics usage to use their new offering.
  • Done. I shall be publishing a specific blogpost on this. (To say that Google don't make it easy is an understatement.)

Thursday 20 April 2023

Retirement Projects

I've now retired from active customer work (on 30th March), but still have a fair few techie projects on the go, so I thought I would document them here.

General tidying up:

  • Using Google Calendar as my day to day diary
  • Regaining access to Teams
  • Moving my cloud storage from Dropbox to OneDrive
  • Digitising a lot of family photographs
  • Audio and video
  • Using Visual Studio and Azure DevOps properly
  • Cleaning up various horrible scripting mechanisms to use PowerShell
  • Automation of MySQL backups
  • Tidy up my cPanel hosting environment (deletion of loads of ancient experiments, etc.)
  • Sort out non-MySQL backups for my hosting (I have never yet found a fully usable solution for this)

i-Community:

  • Transferring the website across to Meridian to look after

www.notamos.co.uk:

  • Updating Google Analytics usage to use their new offering

Genealogy:

  • Various proposals concerning research and data quality aids (a lot of thinking and discussion still needed here)

Saturday 26 May 2018

Thoughts on StackOverflow

I've been having an interesting time lately engaging with the StackOverflow community at https://stackoverflow.com/.

I got involved because I had found lots of useful code on there when rewriting notamos.co.uk, and thought I should try to give something back.

It's been a mixed experience, mainly because I expected it to work like Microsoft Community, an environment with which I am very familiar from past Windows Phone forum involvement. I was wrong: StackOverflow is explicitly aimed at building a knowledge base rather than at helping people work through problems. On StackOverflow you are supposed to ask an intelligent question, get an intelligent answer, and move on, with discussion/teaching discouraged (though by no means absent in practice).

The most assumption-challenging question I have encountered is (almost literally) 'I have written a complex Java-based web application, I want to put it live, what server should I put it on?'. But, far worse, so many developers appear never to have been taught exception handling, problem determination, basic relational database design, web application security, or even how to articulate a problem clearly. No wonder (to reiterate a regular rant) there is so much bad production software out there.

Sunday 4 February 2018

Using Blogger as a source of website content

For the last few years I have been webmaster for a local residents' association.

I put together a basic website using what I knew, which was HTML and SHTML, and I created and maintained all the content (about 250 entries altogether) using that well-known development tool Notepad.

Now it is time to hand the task over to others.

I realised a while ago that I had to find a better method of content editing, for speed, accuracy, and consistency, and also so that I could share the load with others with less technical skills.

Most of the entries are very short; a lot of them involve images; and many contain links to uploaded PDFs. Entries usually start off as 'highlights', linked to from the home page and notified to members via email and Twitter. Over time they lose their 'highlight' status; finally they are moved to the archive page.

I had a play with MODX, which we use for the i-Community website, but it's really far too complicated for the purpose (it's unnecessarily complicated for i-Community, really).

I had spent a bit of time in late 2017 learning some PHP (forced into it by the sudden demise of the Twitter feed mechanism on the i-Community home page) and a bit of JavaScript (in order to implement an urgently needed non-Flash MP3 player for www.notamos.co.uk).

It occurred to me that I might solve the content editing problem via a Blogger blog like this one, pulling the content into the website via the RSS feed that is automatically provided for any Blogger blog.

Here's an example blog post (for a 'highlight'):

[screenshot: example blog post]

Here's its rendition on the website:

[screenshot: rendition on the website]

And here's the link on the home page:

[screenshot: link on the home page]
The basics turned out to be pretty easy to do, using an open source tool called Feed2JS. This uses a second open source tool, Magpie, to read the RSS feed; the shipped version of Feed2JS, as the name suggests, then exposes the content via JavaScript, with various formatting options.

I separated the content into the many necessary categories (News 'highlight', older News, News archive, Local Events, etc.) using Blogger Labels (tags, each of which is available as an individual RSS feed), and automated the generation of the Feed2JS scripts (one per content category) via batch files.
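
The generation step is easy to sketch. The Blogger label-feed URL pattern below is real; the Feed2JS script URL and its `src` parameter are placeholders for whatever a given install expects:

```python
# Generate one feed URL (and one Feed2JS embed line) per Blogger Label.
# The blog address and Feed2JS install location are placeholders.
from urllib.parse import quote

BLOG = "https://example.blogspot.com"                # placeholder blog address
FEED2JS = "https://example.org/feed2js/feed2js.php"  # placeholder install

def label_feed_url(label):
    """Each Blogger Label has its own Atom feed at this path."""
    return f"{BLOG}/feeds/posts/default/-/{quote(label)}"

def embed_script(label):
    """One <script> line per content category, as the batch files generated."""
    return (f'<script src="{FEED2JS}?src={quote(label_feed_url(label), safe="")}"'
            ' charset="UTF-8" type="text/javascript"></script>')

for label in ["News highlight", "News archive", "Local Events"]:
    print(embed_script(label))
```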

So far so good.

The three remaining problems were: migrating all the existing content; implementing the 'highlight' links on the home page; and automating the 'last amended' date on each page.

Migration was a long and boring task. Eventually (and absolutely no thanks to Google's Blogger documentation or forum) I found some sensible import xml examples and succeeded in getting the content migrated, including an appropriate publication date for each entry, via a combination of manual editing, much changing of relative to absolute paths, SQL Server csv-xml conversion, Blogger import (not helped by some ridiculous daily import attempt limits), and manual application of Labels.

Here's an example (containing 2 entries) of the XML import.
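
For anyone attempting the same migration, a sketch of generating an import entry programmatically. My recollection is that the import format is an Atom feed whose entries carry a 'kind#post' category plus one category per Label in the blogger.com scheme; verify this against a fresh Blogger export before trusting the details, and note the entry content is invented.

```python
# Build one Blogger import entry as Atom XML. The category schemes are my
# recollection of the Blogger export format -- check against a real export.
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

def make_entry(title, html_body, published, labels):
    """One Atom entry in (what I believe is) Blogger import shape."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}category", {
        "scheme": "http://schemas.google.com/g/2005#kind",
        "term": "http://schemas.google.com/blogger/2008/kind#post",
    })
    for label in labels:
        ET.SubElement(entry, f"{{{ATOM}}}category", {
            "scheme": "http://www.blogger.com/atom/ns#", "term": label})
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}published").text = published
    content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "html"})
    content.text = html_body
    return entry

e = make_entry("Road closure", "<p>Details here.</p>",
               "2017-06-01T09:00:00Z", ["News archive"])
print(ET.tostring(e, encoding="unicode")[:60])
```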

The other two requirements turned out to be easiest to achieve by installing my own instance of Feed2JS. (I did have to put a redirect in place, effectively interpreting the Blogger site as a subfolder of the main website, as otherwise the website's PHP server was not prepared to access the RSS feed, for completely appropriate security reasons.)

I eventually used two much simplified copies of the Feed2JS code, with no JavaScript; en route I took a considerable liking to PHP - a very friendly and easily learned language, it seems to me (I particularly admire the array handling).

The only remaining problem was the need to wait for Magpie's hour-long cache period to elapse before blog changes were reflected on the main website. In practice we just wait, although occasionally I do set the cache period to 1 minute temporarily (which has pretty awful effects on website performance; otherwise performance is usually acceptable, if not as fast as the pre-blog version).

Finally I created a couple of little PHP utilities permitting the upload of PDFs and full size images to the website without FTP. (I subsequently had problems with these being used for malware purposes, which I should have anticipated - I have now moved them into a password-protected directory.)

Anyway it's all had the required effect: editing is now much easier, and my colleagues are happy to share the load.

Saturday 21 October 2017

Hints and Tips update (aka, SQL on #ibmi and #sqlserver: divided by a common language)

Decided to take a nice 'split string' table function I'd borrowed from some kind person on the internet for the Coordinate My Care data warehouse (SQL Server), and use it as a basis for a similar function to split multi-value attributes on IBM i, e.g. list of special authorities on a user profile. Cue a frustrating, if eventually successful, afternoon.

I really don't like #ibmi table functions: for starters, why can't I INSERT direct into the table to be returned, as in #sqlserver ?

However I did learn a few useful things that are worth sharing I think:

#ibmi Hints and Tips #18: create/replace table in QTEMP from SQL:
DECLARE GLOBAL TEMPORARY TABLE [table] ([col] [type], ... ) WITH REPLACE
(You can then use the table absolutely normally as QTEMP.[table].)

#ibmi Hints and Tips #19: FOR var1 AS cur1 CURSOR FOR [SELECT statement] DO
[whatever, using column names direct from result set]; END FOR;
(I don't know whether you can do this in SQL Server, but I am definitely going to investigate.)

#ibmi Hints and Tips #20: table function usage: esp. note the final identifier (x here):
SELECT * FROM TABLE(MyTableFunction(parm1, ...)) x
(Don't like this super complicated syntax, either!)