Template talk:COVID-19 pandemic data/California medical cases by county

Source for population numbers?

Could someone post a source for the population numbers used to produce "Cases / 10,000"? Better still, is there a way to copy those numbers to this page? That would make it much easier to update the "Cases / 10,000" with the daily updates. — Preceding unsigned comment added by 24.6.53.222 (talk) 04:45, 16 April 2020 (UTC)

They're from List of counties in California. I have a spreadsheet where I update them daily, which is a bit more efficient, though I wish it would happen automatically (for example, if Wikimedia had a spreadsheet function for its tables...). Or maybe someone could make a bot to do it? platypeanArchcow (talk) 01:52, 5 May 2020 (UTC)
@PlatypeanArchcow: See #Tabular data for a start. Let's figure out how to streamline and eventually automate the rest of the counties using either Commons tabular data or Wikidata statements. – Minh Nguyễn 💬 20:58, 21 May 2020 (UTC)
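
For anyone recomputing the "Cases / 10,000" column by hand or in a spreadsheet, the arithmetic can be sketched as below. This is only an illustration: the function name `cases_per_10k` and the two example counties are made up for the sketch, and the figures are not taken from the template.

```python
def cases_per_10k(cases: int, population: int) -> float:
    """Return cases per 10,000 residents, rounded to one decimal place."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(cases / population * 10_000, 1)


# Hypothetical figures for illustration only; not taken from the template.
example_counties = {
    "County A": {"cases": 250, "population": 100_000},
    "County B": {"cases": 12, "population": 8_000},
}

rates = {
    name: cases_per_10k(d["cases"], d["population"])
    for name, d in example_counties.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate} cases per 10,000")
```

Keeping the population next to the case count, as suggested above, means the rate can be recomputed whenever the case count changes instead of being edited separately.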

San Diego: Include Federal Quarantine or not?

PlatypeanArchcow - thanks for the data updates. It looks like there's some inconsistency in whether we include the "Federal Quarantine" headcount from SanDiegoCounty.gov in the San Diego total; is there any consensus about whether those should be counted as part of SD's total? — Preceding unsigned comment added by Michaelrhanson (talk • contribs) 15:44, 21 March 2020 (UTC)

Yeah, I decided originally not to include "federal quarantine" but now think it's easier to just put in the topline number. I originally thought "federal quarantine" was people from Diamond Princess etc but it may just include all travel related cases. In the end, a lot of the data isn't entirely comparable since counties are reporting slightly different things. Might as well keep it simple.
By the way, thanks for the updates on your end as well! Make sure to update the topline when you do. platypeanArchcow (talk) 01:08, 23 March 2020 (UTC)

Why not all counties?

I noticed that Tuolumne County is not listed. As of March 27 (perhaps earlier), it has reported 1 non-resident case. Colusa is not listed and has a case too. Kings, also. This template only lists 44 of 58 counties. At this point, wouldn't it make sense to include all counties, so the daily update wouldn't miss counties that report their first case? I could put them in, but my Wikipedia skills are not well honed. Currently missing counties and the best page I found for each one's current status:

  1. Alpine http://alpinecountyca.gov/AlertCenter.aspx?AID=COVID19-INFORMATION-AND-UPDATES-11
  2. Colusa http://www.countyofcolusa.org/99/Public-Health
  3. Del Norte http://www.co.del-norte.ca.us/departments/health-human-services/public-health
  4. Glenn https://www.countyofglenn.net/dept/health-human-services/public-health/covid-19
  5. Kings https://www.countyofkings.com/departments/health-welfare/public-health/coronavirus-disease-2019-covid-19/-fsiteid-1
  6. Lake http://health.co.lake.ca.us/Coronavirus.htm (then click on latest pdf dashboard)
  7. Lassen http://www.lassencounty.org/dept/public-health/public-health
  8. Mariposa http://www.mariposacounty.org/1592/COVID-19-Information
  9. Modoc https://www.modocsheriff.us/modoc-covid-19-incident-updates
  10. Plumas https://www.plumascounty.us/2669/Novel-Coronavirus-2019-COVID-19
  11. Sierra http://sierracounty.ca.gov/582/Coronavirus-COVID-19
  12. Tehama https://www.tehamacohealthservices.net/services/communicable-diseases/ (then click on latest update)
  13. Trinity https://www.trinitycounty.org/COVID-19
  14. Tuolumne https://www.tuolumnecounty.ca.gov/250/Public-Health

MabryTyson (talk) 21:09, 28 March 2020 (UTC)

Thanks for this list!

Remaining as of 4/6:

  1. Lassen https://lassencares.org/ (new site!)
  2. Mariposa http://www.mariposacounty.org/1592/COVID-19-Information
  3. Modoc https://www.modocsheriff.us/modoc-covid-19-incident-updates
  4. Sierra http://sierracounty.ca.gov/582/Coronavirus-COVID-19
  5. Trinity https://www.trinitycounty.org/COVID-19

platypeanArchcow (talk) 02:06, 7 April 2020 (UTC)

Disclosure: paid editing

I was paid by my employer (Google) while doing the edits which added and edited the table of county-level statistics. (Nature of edit: explicitly added the five counties for which there is an official count of zero cases.) Tal Cohen (talk) 12:08, 13 April 2020 (UTC)

Update of totals, cases/10K, and update date

I see many edits to individual case numbers without updates to the total line at the top, the cases/10K value, or the updated date at the bottom, even by major contributors to the page. Any thoughts on what to do about this? —[AlanM1 (talk)]— 08:01, 16 April 2020 (UTC)

Sorry -- new editor here! I'll (try to remember to) update the total line at the top every time.

As for the others: If I update some but not all counties, is the best practice to leave the updated date unchanged, to show updates aren't complete? And about the cases/10K, please see my question at the top of the page. What is the source for population numbers?

24.6.53.222 (talk) 05:44, 17 April 2020 (UTC)

Clarify meaning of "recovered" numbers?

The meaning of "recovered" seems to vary by county. For example, Marin county assumes that all cases more than 2 weeks old are recovered. I'm afraid it might be misleading to show these numbers in the table with no further explanation. But some counties offer no explanation at all. Thoughts?

24.6.53.222 (talk) 05:48, 17 April 2020 (UTC)

Filter by region

I added an optional |region=bayarea parameter that filters the table down to just the counties that comprise the San Francisco Bay Area, for the table at COVID-19 pandemic in the San Francisco Bay Area#Prevalence. Currently it uses TemplateStyles to hide other counties using CSS, as a hacky workaround until we can make this template more structured. (Ideally, I think we should store these numbers in Commons as a data table or in Wikidata as statements, but we'd probably want to address automation at the same time.) – Minh Nguyễn 💬 10:21, 21 May 2020 (UTC)

Tabular data

The figures for Santa Clara County and San Francisco are now drawn automatically from c:Data:COVID-19 Cases in Santa Clara County, California.tab and c:Data:COVID-19 cases in San Francisco.tab, respectively, via {{#invoke:Tabular data|cell}}, while population figures are drawn from the counties' Wikidata items. This hopefully makes two counties easier to keep up-to-date. Please do not update the two data tables by manually copying values from the county dashboards; instead, see the data tables' talk pages for instructions on how to run a script that updates the whole table consistently. This matters for presenting accurate time series charts at COVID-19 pandemic in the San Francisco Bay Area#Statistics. – Minh Nguyễn 💬 20:55, 21 May 2020 (UTC)

Per capita math makes no sense

Shouldn’t the cases per 100K simply be 10x the cases per 10K? A few counties are, but many are wildly off.

Also, what’s the point of having both per 10K & per 100K? One or the other would suffice.

Gecko GMobile (talk) 19:16, 1 June 2020 (UTC)

@Gecko GMobile: Thanks for catching and reporting. You're correct, this is wildly off. It looks like these errors were introduced by @Dan Arthur Gross:. Dan, can you take a thorough look? I suspect you put the columns in the wrong order or something. I reverted all changes back to the last version that did not have this problem: it's better to have slightly outdated data than these errors.
I also agree that having both numbers is confusing, especially because they seem mildly off. Can't we just get rid of the /10k number and replace that with new cases, or something else more useful? effeietsanders 01:38, 3 June 2020 (UTC)
While we're at it, can we replace the January 2018 population estimates with July 2019 population estimates? – Minh Nguyễn 💬 21:07, 3 June 2020 (UTC)
Does anyone know who that IP is? If they could log in, we could actually have a conversation about this :) effeietsanders 22:17, 3 June 2020 (UTC)
Originally, this template had 10k numbers. 100k seems to be the standard for other states, so someone added that column. I presume the 10k was left for compatibility with something? At this point, someone should be bold and remove the 10k column. And I support switching to July 2019 population. EphemeralErrata (talk) 03:17, 6 June 2020 (UTC)

This confused me too. I thought maybe Cases/10k was actually supposed to be Deaths/10k, which would be an interesting number. Jb510 (talk) 23:34, 3 June 2020 (UTC)
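
The relationship raised at the top of this thread (cases per 100K should be 10 times cases per 10K, apart from rounding) lends itself to a quick sanity check before saving an edit. The sketch below is only an illustration: the function name `inconsistent_rows`, the tolerance, and the example rows are hypothetical and not drawn from the template.

```python
def inconsistent_rows(rows, tolerance=1.0):
    """Return the names of rows whose per-100K value is not ~10x the per-10K value.

    Each row is a (name, per_10k, per_100k) tuple. The tolerance absorbs
    rounding, since a per-10K value rounded to one decimal place can shift
    the expected per-100K value by up to 0.5.
    """
    bad = []
    for name, per_10k, per_100k in rows:
        if abs(per_100k - 10 * per_10k) > tolerance:
            bad.append(name)
    return bad


# Hypothetical rows for illustration only.
example_rows = [
    ("County A", 25.0, 250.0),  # consistent
    ("County B", 15.0, 98.3),   # per-100K value does not match
]

flagged = inconsistent_rows(example_rows)
print(flagged)
```

A check like this would not say which of the two columns is wrong, only that a row needs a second look.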

Numbers for Los Angeles county (and probably other counties) are very old

What is going on here? I have been watching this issue for several days, and it has not been resolved. Ever since this edit (https://en.wikipedia.org/w/index.php?title=Template:COVID-19_pandemic_data/California_medical_cases_by_county&oldid=960450560), the numbers have been stale. They currently say 43,052 cases, but if you look at the source, the number shown is 63,844. I'm happy to attempt to fix it but have no experience doing this and don't want to mess things up. --Emmmmar (talk) 17:54, 8 June 2020 (UTC)

You're welcome to visit each county's Covid Dashboard and transcribe the numbers into this template. Change the Cases, Deaths, and if available, the Recovery numbers. As a bonus, update the cases/100k number too. Leave the auto-filled San Francisco Bay counties alone. Most states have a statewide dashboard that makes it easy to update these templates. California does too, except it is a day behind the individual county sites. Thus the necessity to visit each county's dashboard. EphemeralErrata (talk) 09:17, 9 June 2020 (UTC)

Removal from article and upcoming automation

I've temporarily removed this template from COVID-19 pandemic in California out of concern that it doesn't meet standards for inclusion in an article. The table had been updated piecemeal for months, which was problematic enough, but since a week ago, the table has been continually adjusted without any citations. Sources are especially important for this table, because different sources have very different reporting standards. (For example, some sources include San Quentin inmates in Marin County's case total, while others exclude them.) The usual sources for populations (U.S. Census Bureau, California Department of Finance, California State Association of Counties) don't corroborate the populations in this table, either.

On the bright side, I'm almost ready to replace this template with {{COVID-19 pandemic data/California medical cases by county/sandbox}}, which is automatically populated from Wikidata statements. It automatically sorts the rows, sums up the figures for the header row, and cites its sources. The populations currently come from Census Bureau estimates from last year, but if we find a better source, we can easily switch to it. Before we can deploy the new table, we need to automate updating Wikidata. This script gathers the requisite data from COVID Atlas, which automatically scrapes county and state dashboards and other aggregators, and generates QuickStatements commands that I've been running by hand. I'm still working through a few data issues in COVID Atlas, and I need to replace the QuickStatements part of the workflow with a proper bot, probably using pywikibot.

Thanks to Praline97, Qwerty325, Emmmmar, and others for your tireless contributions over the past several months as we coped with a lack of automation. Hopefully we can free up some of your time to work on other articles. :^) – Minh Nguyễn 💬 00:50, 10 August 2020 (UTC)

Thank you, I think, for the upcoming automation. I've been editing 8+ other states, but not my home state due to the mess in this template. Two concerns: It is important to create an accessible daily history of case counts - does the Wikidata approach do that? Dashboards can have errors, omissions, and inconsistencies - does the Atlas crew include a human that checks and corrects the data? EphemeralErrata (talk) 11:52, 12 August 2020 (UTC)
@EphemeralErrata: Wikidata items have revision histories just like this template and the tabular data at Commons that powers {{COVID-19 pandemic data/San Francisco Bay Area medical cases by county}}. It's also possible for an item to have a number-of-cases statement for each day of the outbreak, but the approach I'm pursuing would only keep the latest day. Maintaining scores of statements for past days would quickly become unmanageable, because we'd need to keep all those past days' numbers up-to-date as the counties revise their numbers retroactively. COVID Atlas is a volunteer-driven open source project; no one is formally assigned to keep an eye on the data's validity, but there are ideas for automated tests and other users have been proactive in reporting issues. It would actually be easier for me to write a bot that scrapes the sites directly, but relying on COVID Atlas allows me to share the significant burden of maintaining the scrapers. – Minh Nguyễn 💬 06:48, 13 August 2020 (UTC)
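
The script mentioned in this thread is not reproduced on this page, so the following is only a rough sketch of what generating QuickStatements commands for one county might look like. It assumes the QuickStatements V1 tab-separated syntax and the Wikidata properties P1603 (number of cases), P1120 (number of deaths), and P585 (point in time) as a qualifier; the function name `quickstatements_lines` is hypothetical, and Q4115189 is the Wikidata sandbox item standing in for a real county item. Note that commands like these add statements; replacing or removing an older statement would need extra steps not shown here.

```python
from datetime import date
from typing import List

# Wikidata property IDs assumed for this sketch; verify before use.
P_CASES = "P1603"          # number of cases
P_DEATHS = "P1120"         # number of deaths
P_POINT_IN_TIME = "P585"   # point in time (used as a qualifier)


def quickstatements_lines(item_id: str, cases: int, deaths: int, as_of: date) -> List[str]:
    """Build tab-separated QuickStatements V1 commands for one item."""
    # The "/11" suffix marks day precision for time values.
    when = f"+{as_of.isoformat()}T00:00:00Z/11"
    return [
        "\t".join([item_id, P_CASES, str(cases), P_POINT_IN_TIME, when]),
        "\t".join([item_id, P_DEATHS, str(deaths), P_POINT_IN_TIME, when]),
    ]


# Q4115189 is the Wikidata sandbox item; the counts are made up.
lines = quickstatements_lines("Q4115189", 100, 2, date(2020, 8, 9))
for line in lines:
    print(line)
```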

Deaths per capita would be an interesting number

@Mxn: Where are the data structures and software that result in this table?

Deaths per capita by county would be an interesting number.... 0mtwb9gd5wx (talk) 09:25, 20 June 2021 (UTC)

@0mtwb9gd5wx: Sorry for missing this question. The table on the page is manually maintained. EphemeralErrata has just migrated it to the CDPH dashboard as the data source. (Thanks!) There's also a separate table for the Bay Area counties that's hooked up to a series of JSON tables via Module:Medical cases data; those tables are ultimately based on county dashboards, which differ from the state dashboard, most notably in Marin County. I've been updating the tables by script for the past couple years, but they're more reusable and extensible than the wikitext table in this template, or for that matter most of the county dashboards. Minh Nguyễn 💬 08:17, 21 February 2022 (UTC)

CDPH script

EphemeralErrata: Not sure what you used to pull together Special:Diff/1072345522, but in case it helps, I whipped up a little Bash script that grabs the latest per-county stats from CDPH and formats it as tabular data. I can look into integrating it into Module:Medical cases data, but for now, Module:Tabular data can transclude the whole table or any part you need:

COVID-19 cases in California by county
COVID-19 tests, cases, and deaths in California by county

| Date | County | Population | Tests | Positive tests | Cases | Deaths |
|---|---|---|---|---|---|---|
| 2023-05-16 | Alameda | 1,685,886 | 8,260,571 | 468,979 | 386,285 | 2,156 |
| 2023-05-16 | Alpine | 1,117 | 3,154 | 92 | 139 | 0 |
| 2023-05-16 | Amador | 38,531 | 259,721 | 12,429 | 10,692 | 98 |
| 2023-05-16 | Butte | 217,769 | 556,341 | 47,500 | 42,069 | 499 |
| 2023-05-16 | Calaveras | 44,289 | 123,797 | 10,881 | 9,091 | 143 |
| 2023-05-16 | California | 40,129,160 | 199,993,958 | 13,807,560 | 11,251,450 | 101,724 |
| 2023-05-16 | Colusa | 22,593 | 47,035 | 3,974 | 4,031 | 24 |
| 2023-05-16 | Contra Costa | 1,160,099 | 4,581,999 | 320,836 | 277,365 | 1,594 |
| 2023-05-16 | Del Norte | 27,558 | 227,514 | 9,271 | 7,979 | 61 |
| 2023-05-16 | El Dorado | 193,098 | 496,846 | 38,475 | 35,031 | 247 |
| 2023-05-16 | Fresno | 1,032,227 | 3,308,484 | 358,772 | 297,749 | 3,029 |
| 2023-05-16 | Glenn | 29,348 | 59,354 | 6,287 | 5,847 | 56 |
| 2023-05-16 | Humboldt | 134,098 | 401,392 | 28,459 | 23,572 | 171 |
| 2023-05-16 | Imperial | 191,649 | 726,173 | 94,555 | 71,663 | 983 |
| 2023-05-16 | Inyo | 18,453 | 50,358 | 4,561 | 4,822 | 63 |
| 2023-05-16 | Kern | 927,251 | 3,011,359 | 269,390 | 232,207 | 2,499 |
| 2023-05-16 | Kings | 156,444 | 882,218 | 75,781 | 63,296 | 486 |
| 2023-05-16 | Lake | 64,871 | 192,844 | 16,678 | 13,789 | 164 |
| 2023-05-16 | Lassen | 30,065 | 304,085 | 12,520 | 10,301 | 65 |
| 2023-05-16 | Los Angeles | 10,257,557 | 78,548,812 | 4,440,093 | 3,519,780 | 36,058 |
| 2023-05-16 | Madera | 160,089 | 653,903 | 54,418 | 46,817 | 377 |
| 2023-05-16 | Marin | 260,800 | 1,184,365 | 59,091 | 41,578 | 260 |
| 2023-05-16 | Mariposa | 17,795 | 56,835 | 3,979 | 3,386 | 31 |
| 2023-05-16 | Mendocino | 88,439 | 264,030 | 19,702 | 17,033 | 146 |
| 2023-05-16 | Merced | 287,420 | 854,074 | 91,273 | 78,125 | 903 |
| 2023-05-16 | Modoc | 9,475 | 10,147 | 618 | 985 | 11 |
| 2023-05-16 | Mono | 13,961 | 41,148 | 3,972 | 3,344 | 8 |
| 2023-05-16 | Monterey | 448,732 | 1,463,481 | 114,534 | 98,884 | 812 |
| 2023-05-16 | Napa | 139,652 | 639,037 | 40,698 | 35,014 | 176 |
| 2023-05-16 | Nevada | 98,710 | 290,477 | 22,841 | 20,460 | 139 |
| 2023-05-16 | Orange | 3,228,519 | 10,683,493 | 893,336 | 720,400 | 8,153 |
| 2023-05-16 | Out of state | N/A | 1,860,935 | 97,272 | N/A | N/A |
| 2023-05-16 | Placer | 400,434 | 1,139,488 | 90,253 | 82,329 | 685 |
| 2023-05-16 | Plumas | 18,997 | 54,845 | 3,642 | 3,866 | 15 |
| 2023-05-16 | Riverside | 2,468,145 | 7,844,642 | 921,368 | 740,418 | 6,854 |
| 2023-05-16 | Sacramento | 1,567,975 | 5,559,129 | 431,955 | 376,973 | 3,597 |
| 2023-05-16 | San Benito | 64,022 | 233,184 | 21,605 | 17,851 | 116 |
| 2023-05-16 | San Bernardino | 2,217,398 | 8,049,405 | 865,221 | 715,572 | 8,139 |
| 2023-05-16 | San Diego | 3,370,418 | 13,853,179 | 1,142,994 | 994,709 | 5,866 |
| 2023-05-16 | San Francisco | 892,280 | 5,172,283 | 257,333 | 199,940 | 1,214 |
| 2023-05-16 | San Joaquin | 782,545 | 2,886,011 | 260,513 | 212,392 | 2,453 |
| 2023-05-16 | San Luis Obispo | 278,862 | 1,134,381 | 76,293 | 67,038 | 577 |
| 2023-05-16 | San Mateo | 778,001 | 4,701,630 | 234,623 | 186,219 | 747 |
| 2023-05-16 | Santa Barbara | 456,373 | 1,820,420 | 139,587 | 113,287 | 769 |
| 2023-05-16 | Santa Clara | 1,967,585 | 11,671,481 | 591,517 | 483,409 | 2,800 |
| 2023-05-16 | Santa Cruz | 273,999 | 1,780,372 | 87,879 | 69,447 | 276 |
| 2023-05-16 | Shasta | 177,925 | 572,763 | 36,713 | 36,728 | 612 |
| 2023-05-16 | Sierra | 3,115 | 5,824 | 335 | 313 | 5 |
| 2023-05-16 | Siskiyou | 43,956 | 73,866 | 6,186 | 5,531 | 96 |
| 2023-05-16 | Solano | 444,255 | 1,829,243 | 130,674 | 115,428 | 476 |
| 2023-05-16 | Sonoma | 496,668 | 1,969,913 | 132,939 | 115,225 | 572 |
| 2023-05-16 | Stanislaus | 562,303 | 1,677,996 | 174,404 | 146,628 | 1,842 |
| 2023-05-16 | Sutter | 105,747 | 283,117 | 30,346 | 26,018 | 241 |
| 2023-05-16 | Tehama | 65,885 | 154,344 | 15,343 | 14,579 | 237 |
| 2023-05-16 | Trinity | 13,354 | 17,753 | 1,731 | 1,482 | 24 |
| 2023-05-16 | Tulare | 484,423 | 1,500,912 | 146,589 | 129,591 | 1,597 |
| 2023-05-16 | Tuolumne | 52,351 | 245,801 | 19,688 | 17,036 | 211 |
| 2023-05-16 | Unknown | N/A | 35,195 | 5,454 | 4,262 | 0 |
| 2023-05-16 | Ventura | 852,747 | 3,611,655 | 279,797 | 222,568 | 1,703 |
| 2023-05-16 | Yolo | 223,612 | 1,857,803 | 57,654 | 50,627 | 454 |
| 2023-05-16 | Yuba | 79,290 | 213,341 | 23,657 | 20,250 | 134 |

California Department of Public Health
Data available under Creative Commons Zero.

Minh Nguyễn 💬 11:51, 21 February 2022 (UTC)
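
The Bash script itself is not shown here. As a rough illustration of the "formats it as tabular data" step, the sketch below builds a Commons-style tabular data (.tab) JSON document from already-parsed rows. It is written in Python rather than Bash, does not fetch anything from CDPH, and assumes the usual top-level keys of the Commons format (`license`, `description`, `sources`, `schema`, `data`); the field names and the function name `to_tabular_data` are choices made for this sketch. The two sample rows are copied from the table above, with N/A stored as null.

```python
import json

# (field name, field type, English title) for each column; names are illustrative.
FIELDS = [
    ("date", "string", "Date"),
    ("county", "string", "County"),
    ("population", "number", "Population"),
    ("tests", "number", "Tests"),
    ("positive_tests", "number", "Positive tests"),
    ("cases", "number", "Cases"),
    ("deaths", "number", "Deaths"),
]


def to_tabular_data(rows, description, sources):
    """Wrap parsed rows in a Commons-style tabular data structure."""
    return {
        "license": "CC0-1.0",
        "description": {"en": description},
        "sources": sources,
        "schema": {
            "fields": [
                {"name": name, "type": kind, "title": {"en": title}}
                for name, kind, title in FIELDS
            ]
        },
        "data": [list(row) for row in rows],
    }


# Two rows from the table above; None stands in for "N/A".
rows = [
    ("2023-05-16", "Amador", 38531, 259721, 12429, 10692, 98),
    ("2023-05-16", "Out of state", None, 1860935, 97272, None, None),
]

doc = to_tabular_data(
    rows,
    "COVID-19 tests, cases, and deaths in California by county",
    "California Department of Public Health",
)
print(json.dumps(doc, indent=2))
```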

Partial updates

@EphemeralErrata: Same question as on Commons: going forward, as CDPH winds down its reporting, should we freeze this template as it is today or remove the case count column while updating the death count column each week? Minh Nguyễn 💬 08:34, 20 May 2023 (UTC)

As we transition from active reporting to historical reporting, I feel we should freeze this template as a snapshot in time. For example, if someone is looking for information on the 1918-20 pandemic, they'll expect the counts to cover that interval and not subsequent flare-ups or derivative viruses. In the future, I expect screen-scraping this template's history to be a valuable data source for a student who wishes to study the spread of Covid, but is unable to obtain official government data. Or they might access the data on Commons, though that is not possible for many other states. With Covid, it is looking like the long tail of continued cases could eventually swamp the counts from the initial pandemic. Perhaps it is time for a new article and template, possibly called Covid in California post pandemic? California's continued reporting of deaths is unique, as other states have simply ceased all detailed reporting. Some ceased reporting months ago, or even in 2022. Reporting very tiny changes raises certain privacy concerns: such reporting can leak demographic data and resident status. For examples of challenging reporting, look at my work on the case count templates for Utah, which had dual level reporting, and Oklahoma, which ceased county level death reporting. EphemeralErrata (talk) 15:00, 20 May 2023 (UTC)

@EphemeralErrata: Yes, freezing this template and creating a new one for the long tail would make plenty of sense.

Actually, the CDPH dataset is also a time series (cases/deaths/tests by county by date). We could upload the entire dataset to Commons for future research purposes, in case CDPH later takes down the archive and convenient API endpoint for SQL queries, but it's a massive amount of data for one page. Maybe one per county?

I've also been maintaining per-county time series tables on Commons for Bay Area counties, but those are from the county health departments themselves, some of which have different criteria than the state (particularly Alameda and Marin counties). A handful are continuing to post new case counts, though I don't know for how long. There are huge caveats anyways since systematic testing has ended. Some counties are posting wastewater test time series, which would be a fascinating alternative visualization in articles once {{Graph:Lines}} is operational again.

Minh Nguyễn 💬 21:01, 20 May 2023 (UTC)