All the years combined, part 2

So what do you do after listening to all the Grateful Dead? In my case, I checked all my ratings, sorted out all the ones listed as “above average” (about 700 shows) and listened to them again. I’ve now narrowed my list of top shows to about 50 or so. Here they are!

Don’t forget – this is all very subjective, and I’m sure you won’t be the first to mention that there are no ’77 shows on this list!

You heard: no 1977. Or many other years!

I like 1977 shows, it’s just that this process has shown I like other years even more.

Scoring Methodology

So previously I just used a simple 1 to 5 score. This time I thought I’d get more imaginative.

For each show reviewed, I allocated a score A to E (where A is the best). This was my score for the show “as a whole”. I also added a numeric ranking 1 to 5 (1 being the best) to score the best part of the show – usually the main jam or sometimes just a face-melter of a song. So a show with a score of C1 means that the show was “average” out of all the ones I reviewed, but maybe it had an absolutely killer jam. Maybe that’s what you were looking for anyway.

All of these shows were seen as “above average” in my previous run through, so a C3 is still a very very good show.

Shows, Ranked

Here are my top-ranked shows. The A1 shows are, obviously – for me – the best shows they ever did. I’ve listed the A1, A2, A3, B1 and C1 shows – i.e. those with either an A or a 1 rating.

All the reviews can be seen at https://raw.githubusercontent.com/maximinus/grateful-dead-reviews/master/new_list.txt – although it’s not a perfect list and the dates are all in UK English format.

A1 – A band beyond description

2-13-70 – Whatever music exists at this show is overshadowed by the final set. I consider it the culmination and peak of 60’s jamming. Candidate for best Dark Star ever. Then followed by a magnificent Cryptical > Other One > Cryptical and the Lovelight more than holds its own. Is this also the perfect set-list?
4-8-72 – Crikes, what a set-list! Look at that second set, Star > Sugar Mags > Caution > Sat Night, every minute as tight as you like. Transitions into and out of Sugar Magnolia are magnificent, some of the best ever. I need a lie down after that one.
5-7-72 – Bickershaw Festival, UK. Hot from the off – best 1st set ever? Opens with a decent Truckin’ and contains hot, long versions of Good Lovin’ and Playin’. Setlist is maybe the best of the tour. A great Star is trumped by the following Other One which then collapses into a black hole. Totally killer.
5-24-72 – This Playin’ is really good… but then, we are back in Europe 72. Christ on a bike, this Other One is a sonic freakout on speed. Another 60 minute epic journey into the void. What else is there to say? Lots of good songs here as well. What a show.
10-18-72 – I’ve seen some set-lists, but this one takes the cake. Almost sheer perfection, except maybe 1 or 2 too many rockers near the end of the show. The Dew > Playin reprise actually jams out the Dew, the Playin’ is excellent and followed by 30 mins of Star!
3-23-75 – The GD come out for a 40 minute set with encore. There are almost no vocals, they are already jamming 5 minutes in and absolutely nothing lets up for the whole show. Certainly amongst the most unique shows, and one of the best ever.

A2 – Never had such a good time!

2-11-69 – For 69, this is about as good a set-list as you could wish for. Possibly the best Star from 69? A full 30 minutes, well developed themes and great sound quality. Also ends on a killer “Death Don’t” instead of the monster Lovelight that was the norm.
2-28-69 – Good golly, this show has the greatest set-list ever, even with songs that don’t exist (Drums > Power Jam > Caution)! The Dead often fail to live up to a crazy set-list, this is surely an exception. It’s a steamroller of hot sweaty energy.
4-22-69 – Really up there for 1969. It has a decent set list, a rip-roaring jam in the 30 minute Dark Star, it’s furious when it needs to be and yet calm at times as well. Even the short songs are sung and played well, which can be uncommon for 69. Pity about the cut.
5-2-69 – An absolute explosion of an 11, maybe one of the best ever? Most of the short show is just one explosive song after another. Perhaps opening for Iron Butterfly drove them to new heights of energy. There’s also both a Star and an Other One!
4-17-72 – Very weird, not really spacey, Dark Star that seems almost off-kilter in its exploration – and this is not a bad thing! Also of note here are the Playin and Truckin’. The Caution at the end continues the killer, relaxed vibe of this show.
4-26-72 – The “Jahrhundert Halle” show. Has a cracking Other One on it that is mainly an extended jam, I think less than 10 minutes would be identifiable as the song. On top of that we have a decent Playin’, all the Pig numbers are great and the Truckin’ shines as well.
5-13-72 – The free show on the best tour ever. Excellent Other One, crushing second set overall, fantastic quality soundboard, and a Playin’ that goes places. Similar to other shows on this tour? That’s not a fault; who wouldn’t like more shows like Europe 72?
8-27-72 – This one sat at A1 for a long time until some soul searching. It’s the Kesey’s Farm show, and I find it difficult to be objective with this. The video is the best GD video that exists, and by some way as well. For this reason, I never really listen to the show.
2-28-73 – Face melting second set with a fantastic set-list. What’s not to like here? Truckin’ > Other One > Eyes > Dew, and each one is a great jam, probably the climax coming in that fantastic Eyes. It flows together most beautifully. Minor complaint: no Playin’!
2-24-74 – A fairly typical ’74; very long show with a stupid long jam in the middle – in this case the Star > Dew, well worth the price of entry. Everything is very laid back here, yet there can be significant bite as well. Near perfect, a touch more energy needed.
8-6-74 – This first set is like a whole show by itself, with an 18m Eyes and a 30m Playin’ sandwich; and the 2nd set is also like a whole show, starting with 8 single songs before an hour-long jam. A very unique show, a little all over the place but full of content.
8-13-75 – Is this the best first set post ’74? It’s absolutely killer, sound quality as well. Near as could be perfect, to be honest. Not so much lacking jams, as mini-jams all over the place. Beautiful Blues For Allah for the encore. What a stunning show.
1-22-78 – The whole show is electric in the greatest sense. Even T.Jed and Row Jimmy in the first set are killer versions. Yet energy levels are raised for set 2. Best ever St Stephen? Only a 15 minute NFA and a rocking duo to end the show.
10-9-89 – First set is merely decent, more of an aperitif for the main second set. Unusual start to set 2. Energy level from Dark Star onward is crazy. Fades a little post “Death Don’t Have”. Hard pressed to find a more joyous second set. Best of the 80s.

B1 – If the thunder don’t get ya then the lightning will

A show rated B1 is about equal to an A1 rating in my book.

10-20-68 – This is really hot stuff, culminating in what is surely the best feedback drenched Caution ever. Pigpen plays his socks off, possibly his best ever show on keys? He even has a solo. The 40 minute sequence from Dark Star onward is likely best of ’68.
11-15-71 – Where did the short Star come from? It’s the jam of the year for me, although a bit too short. It also sounds so joyous. And in the first set as well. What’s the second set got? Possibly the best Not Fade Away jam … ever? Short again but absolutely killer.
3-28-72 – This early PITB is really good. And look at the set-list, the last show before Europe 72 is practically a show from the very tour! Other One is also a standout, in fact you could say everything is exactly perfect. Bonus points for the first Donna screech.
5-11-72 – You’d think a show with a 48 min Star would ease you in, but set one starts with a Playin’ and within 5 minutes we are in space. Heck of a journey in the Star: the length is there for a reason. The last 15 minutes of set 2 are not really needed.
8-20-72 – I was enjoying a nice 72 show and then all of a sudden this crazy spaceship from the future crashed down in the middle of an Other One and proceeded to wreak brutal fury for 10 minutes. Luckily our heroes were able to escape with a great closing second set.
8-24-72 – Has some fluff that is not needed after the main jam – but really, how do you follow that Star > Dew? Some songs are a little too new for comfort (Sing Me Back Home, for example), but otherwise this is a great show.
9-24-72 – There are a few nasty cuts here and there (and the AUD patch is pretty rough) but this show manages to get over them. It gets started in Bird Song, raises the game in Playin’ and then goes full meltdown in the Star. Really decent show.
10-28-72 – Fairly rough board with Bill mixed right up. Whatever faults in the SQ that exist are blown out of the water by the stunning Dark Star at this show. And it’s the penultimate song! Absolutely killer, it even seems they ran out of time at the end.
4-2-73 – So this show lacks a massive jam, it just seems to have a few shorter insane ones. The ultimate Here Comes Sunshine is the best, but the Playin’ is also easily up there. And then there’s the fun run of Eyes > China > Sugar Mag. I need a lie down!
7-1-73 – A very good show from summer ’73. Maybe it’s a bit over-relaxed in parts but the Playin’ and the fairly off-the-charts Other One show some serious improvisation skills. That combined with the totally stellar board mix make a really good show. More please!
8-1-73 – Fantastic Dark Star that seems all over far too fast – it’s 25 minutes long! But fear not, the Eyes that follows a little after also tries to dive just as deep, setting up nicely for the Dew closer. This last number is totally killer – what an awesome show.
12-18-73 – Killer second set has all the jams and Dark Star goes to the right places. Really, everything is practically note perfect, the band sound like they are having fun and for a good hour in the second set nothing lets up. The first set is pretty solid as well.
12-19-73 – Classic 73 show. There’s no real “one big jam” but the Truckin’ > Nobody’s Fault > Other One that leads to a feedback drenched meltdown fulfills the same role. Some people say this is the best Here Comes Sunshine (they’re wrong). Also a killer Playin’!
6-8-74 – Merely a very good show until Playing hits and everything kicks off. Unfortunately there seem to be some tuning issues that stop the jam a couple of times and the very end of the show is in audience only. Still, they certainly go for it.
9-10-74 – A little uneven in application, this set-list is particularly strange. First set ends on Stella Blue? Oddities abound. Still, the climax, a slow, majestic tour through Star > Dew is fantastic. GD musically at the top of their game, song selection is an afterthought.
6-28-74 – For a long time, my favorite show. This is a contender for best jam ever. Astounding how the 45+ minute jam in the WRS can all be over so quick when you listen to it! This is one of those shows where I get a different taste every time I listen. Killer jam – rest of show is largely perfunctory.

A3 – If you get confused, listen to the music play

A3 is a step below A1, A2 and B1.

7-31-74 – Very long show where everything is hot. Even Jerry warns people about additional weirdness at the end of set 2. Everything is relaxed and played well, leading to a killer end sequence. Perhaps needed a little more manic energy at times. Great show.
12-31-78 – Over-rated? Hardly. This is the last 3 set show where they could sustain the energy throughout. Nothing is played badly, and some songs are played excellently (Wharf Rat). We even have a fairly decent Dark Star and set 3 jam. All this and a great video!

C1 – The bottle was dusty but the liquor was clean

A lot different from A3, but likely their equal.

3-1-69 – What a set-list! What a show! Am I allowed to say the first set is a bit too crushing, and needs to relax a little? This show is definitely a good representation of the year, it’s just balanced incorrectly. Bonus: the slow songs are really good!
9-16-72 – Great Dark Star. Technically it jams into Brokedown Palace but I swear there is a minor cut in the tape. A good portion of the show is either missing or of bad sound quality unfortunately, but we get the sweet goods. Decent Playin’ and Dew as well.
11-18-72 – Tape has some issues. The first set is missing, there’s not a lot of Weir, some distortion, and almost every song in set 2 is standalone. This is made up for by the totally cosmic Playin’ in the middle of it all, a whirling dervish of energy and maybe the best PITB ever.

Best, other decades

Only one show from the 80’s onward made the list above, so here are the best 80’s and 90’s shows I encountered:

Best 1980s shows

  • A2: 10-09-89
  • B2: 10-26-89
  • B2: 06-21-80
  • B3: 05-12-80
  • B3: 10-09-80
  • B3: 12-13-80
  • B3: 03-09-81
  • C2: 09-12-81
  • B3: 06-10-81
  • B3: 12-26-81
  • B3: 08-07-82
  • B3: 09-15-82
  • B3: 08-15-87
  • B3: 10-02-89

Best 1990s shows

  • B2: 09-16-90
  • B2: 09-20-90
  • C2: 03-29-90
  • C2: 09-11-90
  • C2: 04-01-91
  • B3: 03-24-90
  • B3: 06-23-90
  • B3: 07-21-90
  • B3: 12-30-90
  • B3: 06-17-91
  • B3: 12-31-91

Furthur

Next is to listen to the top 200 or so of my top 700 and get this list more accurate. I think I’d like to do a “best of 77” and “best of 78” list as well.

Hope you enjoyed the read and leave comments below.

For any readers of my Grateful Dead machine-learning exploits, expect a new post in early summer.

What a long, strange trip – listening to every show

7 years. That’s how long it took me to listen to every available Grateful Dead show.

You can view the entire list, with a mini review and my rating out of 5, here: https://raw.githubusercontent.com/maximinus/grateful-dead-reviews/master/dead_reviews.txt

It started in 2013, when I quit my teaching job and started working as a computer programmer from home, giving me the freedom to listen to music all day long. I discovered pretty quickly that this freedom brought a burden: having to choose what to play all the time. Especially since the average album only lasts for about an hour.

Luckily, I had just acquired the complete Europe 72 release, and it occurred to me to voyage through its entirety, in sequential order. What a massive amount of Dead – it was sure to keep my ears occupied for some time! All I had to do was start the next show at the end of the current one. When the supply finally ran dry, it was trivial to continue: I already had a pretty vast collection sitting on the hard drive. I never set out to listen to every show, I was just too lazy to ever stop.

For the most part, I listened in sequential order through the dates, although I cleared different eras out of order. My journey went from 72 through 77, then back to 65 through 70. From there, I skipped to 78 and went all the way to 93. This last year or so was spent clearing up the remains and trying to make sure I hadn’t missed any shows (that Miller guy keeps dumping soundboards online).

I started to give each show a score out of 5 very early on. Maybe from the first show, but definitely from Summer 72 (I had to listen to Europe 72 twice because I lost the first scores, if they ever existed). My grades were simple, on a 1 to 5 scale: 3.5 was an “average” show, going up to 5 for the very best. A score of less than 2.5 was rare, simply because there was no real need to quantitatively compare bad shows in any detail. Some shows are rated lower than 2; this is usually for those unlistenable audience tapes where the music struggles to punch through – 1970 suffers a lot from this.

Average score per year. I rated 3.0 as a “bad” show

Scores are of course subjective and you may disagree with what I wrote for a show. However, I would be very surprised if your favourite show scored less than 3.5. Those shows are bad on many levels. Likewise, I’d be surprised if you hated a 4.5 or upwards. I have met people whose favourite show was a 3.5.

What makes a good show? For me, above and beyond the raw music, it is a certain level of chaos, the unexpected or some deviation from the norm. For musical tastes, if a show had 1 minute of Stockhausen atonal sounds or loud feedback, that was likely half a point added. If it got emotional, that was likely another half point (like that last ever Morning Dew – “I guess it DOES matter”). Most of all, I appreciated the jamming. When the music wove its own unique thread, that is when I got excited. Sometimes a majestic sonic glory of merely a few seconds could elevate a show. St Stephen intro + Mutron + loud volume – it occurs once for about 5 seconds in 78 and every single hair of mine was electrified.

Show score by averaging the 40 closest shows, 1965 to 1995, left to right

But also fun were the non-musical interludes. I think of the 72 show where Wavy Gravy shares his single beer with the entire audience, the dentist’s chair in 1981, the female audience member who explains Shakedown by singing the melody line when it was a new song, the on-stage conversations – “How’s the 8-ball, Jerry?”, the audience comments – “Jerry really looks like Phil Spector” – all of these in their own way paint a little corner of the Dead experience.

For the 1980s and onwards I mainly relied on archive.org, and reading the Deadhead stories posted there was a real delight. As the years roll on they become mere stories of excessive drug abuse (although the police dosing themselves at the penultimate show was pretty funny), but the late 70s and early 80s shows have a wealth of anecdotes, craziness and adventure. So many people with their lives changed by a random encounter with the greatest live band to ever exist. The best story was from a Yugoslavian student. He studied in the US in the early 80s, and only in the last week before flying home did he venture out with his fellow classmates for some fun. They took him to a show, fed him paper and he had an out of body religious experience with the music. Unfortunately, upon flying home to Eastern Europe, he was unable to obtain ANY Grateful Dead until years later when the Iron Curtain fell. I like to think that guy finally drank deeply from the well of tapes – after all, eventually he made it to archive.org and was able to declare his love for the band.

Show ratings, % by year. 1975 has 50% of shows rated as 5.0

For the masochists out there, you’ll be wanting to hear the China Doll from 30th July 1983. Or perhaps the mystery singer who sits in for the last few songs of 8th March 1970. Or you’d just rather Jerry turned his guitar off and let someone else play, in which case Steve Miller and the Dead, post-drums on 31st May 1992 might be your particular poison. To be honest, speculating on the bad Grateful Dead is easier than the good because really, even in the depths of what we might say are the bad years, the Dead were still, on the whole, not that bad. Truly bad Dead is the inane, the boring, and if I’m honest there were only 2 periods that were a struggle: 1970, because it has so many terrible audience recordings, and 1994, because the Dead here really were at their worst.

But I’m not here to look at the worst, I’m here to celebrate the best. Having spent so much time listening, now finally I come to the project that I am really looking forward to: I am going to listen to all the good shows – some 650 of them – that rated 4 or higher. “Above average” in my parlance. Sorted randomly, not by date or time. One day in the future, I will have a top 50 to recommend to you; but those will be my favourite shows, not yours. Neither of us will be right or wrong, because your relationship with the music is just as valid as mine.

The Grateful Dead, then, are a journey, an experience, a tour de force, a philosophy in the writings of Hunter. Hearing all those shows has not made me tire of the GD, it has simply propelled me to listen to the good ones again. Maybe one day I will be bored of this journey; until then, “Let there be songs – to fill the air”.

2020 Vision & Goals

I’ve been a bit inactive of late, a combination of getting a new machine learning computer and work. Still, it’s given me time to have a rethink about my approach to generating new Grateful Dead audio.

I have two new approaches I want to try out this year. You could say one of them has already been done – that is, someone has already generated “fake Grateful Dead” audio, and you’ve listened to it and accepted it as the real thing for years!

Filling In The Gaps

Some older shows recorded on digital (typically early 80’s shows) can suffer from digi-noise, that is, pops and hiccups caused by the tape simply being a little old and losing some of its digital information. For a (really bad) example of this, see https://archive.org/details/gd1981-09-25.sbd.miller.88822.sbeok.flac16 (and check out the weird split Sugar Mag if you can get past the sound issues).

Not all tapes are this bad, and in fact in most cases when this happens someone like Charlie Miller will cover up the noise with some editing. But think about that sentence again: “cover up the noise with some editing”, hey that sounds like putting in fake audio. You have a piece of music with a discontinuity, and you have to fill it with something that sounds nicer and sounds like the GD, right?

Well, almost. Look at a typical sample of digi-noise:

Digi-noise plotted as a waveform

This digi-noise would be pretty loud compared with the rest of the audio (which is why we want to hide it). However, if we zoom in:

Zoomed in digi-noise

We can see that in fact the time for this audio event runs from ~3.0195s to ~3.0226s, i.e. 0.0031s. That’s roughly 3/1000ths of a second, which is probably easy enough to fix in a studio.

But this problem is ideal for generating new GD audio. Up to now, my effort has been to “teach” the computer by feeding it a large amount of GD, and then asking it to make some original audio. The problem with this approach is that the task is very difficult for the computer. If I first asked you to read every single Stephen King novel and then tasked you with writing a new paragraph in the same style, you would find that difficult. If however I asked you to start by filling in a missing word, well that would be a lot easier. Or if that was too much, start with a single letter.

And that, in a nutshell, is the new approach. Instead of asking the computer to generate new audio from scratch, we instead ask it to fill in the missing audio. At first this will be something like 3/1000ths of a second. When that works, I simply ask it to fill in larger and larger gaps.
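To make the idea concrete, here is a minimal sketch of the kind of training setup I have in mind – purely illustrative, assuming Keras and NumPy, with hypothetical file names and sizes throughout. It cuts short windows out of the audio, blanks a tiny gap in the middle of each, and trains a small network to predict the missing samples from the surrounding context.

import numpy as np
from tensorflow import keras

SR = 22050        # mono, 22050Hz, as used elsewhere on this blog
CONTEXT = 4096    # samples of surrounding audio shown to the model
GAP = 64          # roughly 3ms to fill, about the length of a digi-noise blip

def make_training_pairs(audio, n_examples=10000):
    # Cut random windows from a long 1-D audio array (values in 0..1).
    # X is the window with the central GAP samples zeroed out,
    # y is the GAP samples we removed.
    xs, ys = [], []
    half = (CONTEXT - GAP) // 2
    for _ in range(n_examples):
        start = np.random.randint(0, len(audio) - CONTEXT)
        window = audio[start:start + CONTEXT].copy()
        target = window[half:half + GAP].copy()
        window[half:half + GAP] = 0.0          # punch the hole
        xs.append(window)
        ys.append(target)
    return np.array(xs), np.array(ys)

# A deliberately small model: read the context, predict the missing chunk.
model = keras.Sequential([
    keras.layers.Input(shape=(CONTEXT,)),
    keras.layers.Dense(1024, activation="relu"),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(GAP, activation="sigmoid"),   # output samples in 0..1
])
model.compile(optimizer="adam", loss="mse")

# audio = ... load a show here, normalised to 0..1 (hypothetical)
# X, y = make_training_pairs(audio)
# model.fit(X, y, epochs=10, batch_size=64)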

This approach has been tried for images, and the results are pretty good.

GANS network filling in image after training on human faces

As you can see, the computer is able to generate many images to fill in the blank space.

Style Transfer

The second thing I shall try this year is a “style-transfer” with GD audio. These are best explained with images. Example: I have some photos. I also have digital copies of many paintings. I train the computer to recognise the style of the painter and “transfer” my image into the style of the painter.

Basic image on left, styles applied on right

So what styles are there in GD audio? Of all the tapes I have ever listened to, they are almost always one of two styles: audience or soundboard. So I will train the computer to tell the difference between them, and then ask it to output the audience audio into the style of a soundboard. I hasten to add that quite a few people prefer audience tapes (especially with the somewhat dry soundboard tapes of the early 80s), and that the style could easily go the other way.

Time Dependency

This last point is a technical issue, but one which could easily offer the best results.

So, a sound file is – to the computer – a linear series of numbers (each number being the volume at a given point in time).

What we are really asking the machine to do is to continue generating a series of new numbers based on the numbers so far.

But think how you might do this. To accurately guess what comes next, we work on a number of differing timescales. Note in the scale? Chord in the sequence? Verse to be sung? Is it a Bobby number next? All my attempts so far have really concentrated on the “next note”, because music generates a lot of audio, and so we only want to check the local time area – otherwise our computation gets really slow. In effect, to generate the next second of music, my code so far only looks at the previous 2-4 seconds. But to produce longer samples, we will need the computer to understand a lot more about the structure of the song.

I don’t want to get super-technical here, but Google researchers have a partial solution to this, which they used for creating realistic human voices (paper here: https://arxiv.org/abs/1609.03499).
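As a rough, hypothetical sketch of the core trick in that paper (assuming Keras – this is not the paper’s exact architecture, just the idea): each convolution layer doubles its dilation rate, so the amount of past audio the network can “see” grows exponentially with depth while the amount of computation only grows linearly.

from tensorflow import keras

def dilated_stack(n_layers=10, filters=32, kernel_size=2):
    # A minimal WaveNet-style stack: causal convolutions with dilation
    # rates 1, 2, 4, 8, ... The receptive field roughly doubles per layer.
    inputs = keras.layers.Input(shape=(None, 1))   # (time, 1 channel) audio
    x = inputs
    receptive_field = 1
    for i in range(n_layers):
        dilation = 2 ** i
        x = keras.layers.Conv1D(filters, kernel_size, padding="causal",
                                dilation_rate=dilation, activation="relu")(x)
        receptive_field += (kernel_size - 1) * dilation
    outputs = keras.layers.Conv1D(1, 1)(x)         # predict the next sample
    print(f"receptive field: {receptive_field} samples "
          f"(~{receptive_field / 22050:.2f}s at 22050Hz)")
    return keras.Model(inputs, outputs)

model = dilated_stack()   # 10 layers -> 1024 samples, about 0.05s of context
# 16 layers of the same idea cover ~65,000 samples, i.e. roughly 3 seconds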

It essentially means my software will be able to take inferences from a much much larger area of the song. Generating a longer section of audio might not get any quicker but no longer will the computer have the memory of a goldfish. I’m really interested in this approach because it’s been tried and tested. Here, for example, is a section of music generated by a computer that has been trained on piano recitals.

Piano recital sample generated purely from the mind of a computer

My point being: If it can be done with piano recitals, it can be done with the Grateful Dead.

Finally, Results

Yes, you read that headline correctly: I have results! However, those expecting authentic sounding Grateful Dead – in whatever form that may take – will probably be waiting a lot longer. But if we view this whole process as akin to cooking a meal, I have at least sorted out the ingredients and cutlery, even if the food so far is somewhat lacking.

Again, our friend phase

So the basic approach was as outlined in the previous blog post. We build 2 computer programs, one to detect Grateful Dead music and the other to create some music. Then we put these 2 machines in an evolutionary arms race, where they should slowly get better at their jobs, and ultimately the generator should be able to create new Grateful Dead music (or at least, new music that you cannot tell is different).

The approach taken is not to use actual music (because this is difficult), but instead to do some processing on the sound beforehand. We actually split the audio into hundreds of time slices, and then work out all the sine waves of each slice. Since this is the format we input, it is also the format we output.
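A minimal sketch of that pre-processing, assuming librosa and NumPy are available (the file name, slice length and FFT size here are illustrative, not my real settings):

import numpy as np
import librosa

# load a show as mono, 22050Hz (the file name is hypothetical)
audio, sr = librosa.load("gd_show.wav", sr=22050, mono=True)

# short-time Fourier transform: one set of sine waves per time slice
stft = librosa.stft(audio, n_fft=2048, hop_length=512)

magnitude = np.abs(stft)    # how loud each sine wave is in each slice
phase = np.angle(stft)      # where each sine wave starts within its slice

print(magnitude.shape)      # (frequencies, time slices)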

This gives us a problem with the output. Let’s imagine a piece of audio composed of 2 sine waves. It would look like this:

2 sine waves, representing a single sound

You can see that the starting point for both of these sine waves – the left hand side of the graph – is different. The red sine wave starts high up and the blue low down. This information is known as the phase of the signal.

The problem our generator has is that it generates sine waves but no phase information. Our generator then starts all sine waves at point 0 – the black line. But it has to do this for all the times we slice the audio up. The result is a set of broken sine waves, where the wave is “reset” at the start of every time slice:

Audio file with phase=0 at start of every timeslice

As you can see, this is most definitely not going to sound like what we want!

Simple solutions for the win

I really took some time to try and fix this issue. It is a known problem in audio processing and basically there is no real fix for it. There have been some attempts using machine learning, but that would involve another huge amount of work. So instead, I did something very simple and it seemed to work. Quite simply, I just randomised the phase information, instead of setting it to 0 all the time. As can be seen in the diagram above, there is a repetition in the phase information at the start of every time step. This repetition really sticks in the ear. If you randomise the phase information, then you get something more like this:

Phase at every time stamp randomised instead of constant zero

This is not perfect, but now the results sound a lot better, and phase randomisation turned out to be not that hard to implement – a rough sketch of the idea is below.
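This is only a sketch of the randomisation step, assuming librosa and NumPy (my actual code differs in the details):

import numpy as np
import librosa

def magnitudes_to_audio(magnitude, hop_length=512):
    # Rebuild audio from a magnitude spectrogram the generator gave us.
    # We have no phase information, so instead of starting every sine wave
    # at zero we give each one a random starting phase.
    random_phase = np.random.uniform(-np.pi, np.pi, size=magnitude.shape)
    stft = magnitude * np.exp(1j * random_phase)
    return librosa.istft(stft, hop_length=hop_length)

With that out of the way, let’s move on to the results.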

What we don’t expect

So essentially, the computer code is trying to replicate a certain style of music. If it were unable to incrementally get better, we might expect it to just produce random noise. In particular, we might expect white noise.

3 second sample of white noise

Or even pink noise, which is apparently what the sound crew used to test the GD’s audio system before a gig (pink noise has equal energy in every octave, so it sounds more evenly balanced to the ear than white noise):

3 second sample of pink noise

So for any success, we don’t want to sound like these 2. What we do want is to sound like the Grateful Dead. So here is a sample of the Grateful Dead in exactly the same format as my results – a 22050Hz mono audio sample:

Bonus points for guessing year, show or song!

The Results

After many hours of rendering audio, my program produced some samples. To the best of my knowledge, this is the first computationally created Grateful Dead audio to ever be generated (more epochs should be better):

Render after 1000 epochs
Render after 1500 epochs
Render after 2000 epochs

So – is this progress? It is certainly not the Grateful Dead, but on the other hand it is not white or pink noise. It is also – unlike my previous posts – actual audio. I count this as a partial success, but also an indicator the final goal is some distance away.

The Future

The obvious thing to do now is to increase everything – the amount of music I use as data, the length of time I spend processing, and the size of the output data. This will take some time. I have another approach up my sleeve that involves generating songs from a very low resolution and then increasing the resolution, as opposed to starting with a small piece of music and trying to make it longer. But that’s for another post!

The Future Of Grateful Dead Music

Friends and fellow Deadheads, we are about to enter a new era for Grateful Dead music. And by that, I don’t mean for Dead cover bands, spin-offs, or associated acts. I mean actual Grateful Dead music from 1966 through to 1995. This music has properties that make it singularly well suited (no surprise there, Deadheads) to AI and machine learning. In this post I am going to look into my crystal ball and try to predict the future. I also suspect these things will arrive sooner than you think, because a lot of what I suggest is technology that already half-exists.

The Baseline

Since I’m going to be speculative, let’s start by trying to keep it real and grounded. Computers can do many things, but for the purposes of this post I am not going to give them superhuman skills. What I suggest could in theory be done with humans or is already “out there” in modern AI research.

The Close Future: Fake stuff

AI already does things that are impressive. Your mobile phone can likely run an app that will turn a photo of you into someone of the opposite gender. It is already possible to do this with voices, and we are not that far off from an app where you can change your voice to Jerry’s and have it be 100% convincing.

From there, it is not too far to get to the world of deepfakes, where video can be manipulated to make people say anything. Take a look at https://www.theverge.com/tldr/2018/4/17/17247334/ai-fake-news-video-barack-obama-jordan-peele-buzzfeed and watch the Obama video. In 2 – 5 years this technology will be accessible to the public (some would say it already is) – your social media stream will start to have videos of Jerry wishing you “happy new year 2022” or “smoke Dan’s hippy weed for maximum buzz”. It’s unavoidable.

Images on the left are real, on the right fake

Cleaning Up What We Have

Technology already exists to clean up and improve video and images. I have code that can automatically colour in old Grateful Dead video. I have attempted to increase resolution in video, but that is some way off. I have done this with single images. The first time I did this with a picture of Jerry I really did stare at the screen for a while. Jerry in resolution that was simply not available when he was alive.

Original image (left) enhanced with increased resolution using machine learning

Currently this technology does not exist for sound, but it is coming. The benefits of this will be less hiss and cleaner sounding tapes. But what is cleaner sounding? Remember that computers will take “bad sound” and make it “good”, but what they add is not real Grateful Dead music. But soon apps will come along that allow you to mix and fiddle with audio to an unprecedented degree. “Barton Hall in quadraphonic stereo with enhanced bass, vocals at double CD quality remaster” is coming. Is that music better than what we have now? That is a hard call to make.

This technology will likely not change the audio you actually own or download, the filters can be on the player. This means that your friend who really likes Brent will be able to up his volume whenever they play a show, and this will be more than just a graphic equalizer, it will fundamentally alter the sound of the keyboards alone.

Transferring Style

So I said at the start that we will try not to give the computer mystic powers. But you’ll need a (very!) brief primer on what is known as domain transfer to understand the next part.

Domain transfer is where we take one thing that is categorised as being of type X, and we want to make it type Y. A computer that changes photos of you to a different gender is the best example of this. To achieve this magic trick we start by creating a neural network. Now a neural network is something like your brain but running on a computer. There are thousands of connections between neurons, and each of these connections has a “weight”, which is just a number.

We train a neural network by feeding it a lot of data. It will learn to react to differences between data, and these differences will show up as differences in the network weights. If we then take the average of these weights for 2 types of data, we can “move” one set of data towards the other by reversing the neural network, so it thinks “backwards”, and then changing the values of the weights so they are more like the ones for the other set of data.

Some pictorial examples may help: https://towardsdatascience.com/style-transfer-styling-images-with-convolutional-neural-networks-7d215b58f461

Examples of style transfer

So what do we need to do this? Simple, a discriminator that can tell the difference between 2 sets of data. That is all. I said earlier that we would not rely on computers being super-human, so let’s have a think about what differences there could be in the music of the GD. For example:

  • Can you tell the difference between 60’s and 80’s shows?
  • Easy to tell a Bob song from a Jerry one?
  • Can you normally spot an audience from a soundboard?
  • Is it easy to tell the difference from a Brent solo to a Jerry one?

In these situations a style transfer may be possible. It is possible because we have a LOT of audio data to work with. Audience to soundboard is the obvious style transfer. But somebody, somewhere will start to do audio style transfers. You will be hearing a SBD of 8th May 1970. What it might sound like is currently anyone’s guess though.

Creating New Stuff

Have you ever seen a GD covers band? The fact that they exist can tell you that emulating the Dead is possible. Computers will take this to the extreme. I said in the last section that we need a discriminator that can tell the difference between 2 things. Once you have such a thing, you can then invert the neural network to make something that produces things of that type, its tutor being the discriminator. Let me choose a discriminator that can tell the difference between Grateful Dead and other music. Then the output of the inverted network is – after a large degree of training – music that cannot be differentiated from the Grateful Dead. This has already been done with images: https://thispersondoesnotexist.com/.

This person has never existed, and was imagined by a neural network. This will happen with music in the future

Now, you cannot do this with most music because there is not enough digital information. There’s a limited amount of Beatles audio, for example, and the versions of individual songs are nearly all the same. Compare that with the Dead and the vast trove of recorded material. Conceptually, you can already conceive of some other intelligence producing new Grateful Dead music and jams – go check out Dark Star Orchestra. Computers are just going to accelerate that to the point where you will not be able to tell the difference between real GD and fake.

The Good, the Bad and the Ugly

This all comes with a caveat: the first examples you see of these technologies will not be pretty. I already have on my home computer artificial synthesized Grateful Dead music that was never played by the band. If it was good, I would be posting it here already. It will take time, but it will come.

Technology will also give us bad things. Don’t be surprised to see things that really boil your blood – a video of Jerry expressing support for Donald Trump is something that is not far away, and we could swap Trump for Obama, Hitler or even Nickelback.

But ultimately, it should bring good things. There are currently around 2000 shows to listen to. I do not see any strong reason why, in the future ahead of us, a computer cannot produce a few more. And no particular reason not to try.

“Gone are the days we stopped to decide where we should go, we just ride”.

Finally Getting The Data Right

I’ve been a bit quiet in posting recently, not so much for a lack of work, more a lack of progress. But this weekend I did indeed finally manage to get my data sorted.

Those of you following along so far may know that I’ve had difficulty with my data format, that is, the stuff I actually give to the computer to learn from. There are 2 ways of doing things. I can use a Mel spectrogram – which is a mathematical conversion of sound into an image – or I could use a normal uncompressed wav file.

The MEL way seemed good because, on the face of it, I managed to get the whole process working. I was able to train a Grateful Dead discriminator, followed by a producer that seemed to put out pretty good MEL representations. Here was new Grateful Dead music! To remind ourselves, I got this:

Real Grateful Dead audio on the left: machine learnt audio on the right

But… you can’t hear a MEL image, can you? These couldn’t be converted into sound. With wav files, the exact opposite problem: turning them into sound is trivial, but I couldn’t get the machine learning actually learning at all (a common problem with wavs in the ML world). Over the summer I tried various ways to get the wav method to work, but it never did. It became obvious that I had to go the MEL route since it actually worked. This meant turning a MEL image back into audio, and that in turn meant tackling the maths.

Anatomy of a MEL

So how is a MEL made? The first step is the hardest to understand, although it is easy enough to express. We use a Fourier transform (some fancy maths) to take some sound and decompose it into its constituent frequencies. Give it some audio and you end up with a collection of sine waves. We don’t do this for the entire audio – we do one Fourier transform for every 1/5th of a second, as an example, so ending up with a series of time blocks for which we have all the sine waves generated by the band in that time period.

The final stage is to adjust this so that the frequencies our ears hear well are increased in power, and those that the ear finds harder to hear are diminished. We want the machine to “hear” the same way that we do (this is actually the “MEL” part of “MEL spectrogram”).
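For the curious, here is a minimal sketch of those steps using librosa (an assumption on my part – the file name and parameters are illustrative, not my actual settings):

import librosa

# mono audio at 22050Hz, as used elsewhere on this blog (hypothetical file)
audio, sr = librosa.load("slice.wav", sr=22050, mono=True)

# Fourier transforms on short windows, re-weighted onto the mel scale
mel = librosa.feature.melspectrogram(y=audio, sr=sr,
                                     n_fft=2048, hop_length=512, n_mels=128)

# convert power to decibels, which is closer to how we perceive loudness
mel_db = librosa.power_to_db(mel)
print(mel_db.shape)   # (mel frequency bands, time slices)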

The Hard Work

When I first used MELs, my approach was to construct the data in the following way:

  • Cut the wav files up into slices
  • Normalise the audio so it’s all roughly the same volume
  • Turn the short audio files into MELs
  • Turn each MEL into an image file, to give to a neural network

My job was to reverse that process. However, it turned out that I wasn’t really giving my neural networks an image at all; it was just that the software library I use (Keras) has some useful functionality that makes it easy to feed it images. Keras does what we call “data normalisation” – essentially converting each colour value into a number between 0 and 1.

Now, since I was feeding the neural net normalised MEL images, I would be getting back the same thing, which meant I could skip the image-creation part. This was the key insight for me. Now my process could be:

  • Cut the wav files into slices
  • Normalise the audio
  • Turn audio into MEL files
  • Normalise the MEL files

Luckily, it turned out the last step was just a bit of maths fiddling, and once that was done, I was able to move back and forth between audio and MEL files easily. Finally!
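A rough sketch of that “maths fiddling”, assuming librosa and NumPy (my real settings may differ):

import numpy as np
import librosa

def mel_to_unit_range(mel_db):
    # Squash a decibel-scaled mel spectrogram into 0..1 for the network,
    # remembering the original range so we can undo it later.
    lo, hi = mel_db.min(), mel_db.max()
    return (mel_db - lo) / (hi - lo), (lo, hi)

def unit_range_to_audio(normalised, lo, hi, sr=22050):
    # Reverse the process: back to decibels, back to power, back to audio.
    # The inversion has to guess the lost phase, hence the artefacts.
    mel_db = normalised * (hi - lo) + lo
    mel = librosa.db_to_power(mel_db)
    return librosa.feature.inverse.mel_to_audio(mel, sr=sr,
                                                n_fft=2048, hop_length=512)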

The Results Are In…

I took 12 seconds of Grateful Dead, from the Scarlet Begonias on 16th April 1978. It’s been mixed down to mono, 22050Hz (half CD quality) at 16-bit (same as CD). It sounds like this:

Now here’s the same 12 seconds of audio, after being converted to a MEL spectrogram and then back again:

Now that doesn’t sound good, does it – what’s going on? Well, if you remember our Fourier transform – it turns audio into a set of sine waves – the fault lies there. Let’s look at a sine wave. Here the horizontal axis is time:

Now we take the music from a certain point in time (say, the first 1/5th of a second). The sine waves we get from that are accurate but we lose the information about where they start, that is, where each wave should start at the beginning of the time slice. We don’t know where it is on the vertical axis. Since we lose this information, when we reverse the process we have to start every single sine wave we reconstruct from point 0, that is the middle of the vertical axis.

This problem is not unknown in machine learning audio analysis, and the audio is said to be “out of phase” – you can certainly hear it. But I hope you can agree it’s still the good old Grateful Dead.

Moving Forward

Some researchers have seemingly been able to use machine learning to “re-phase” the audio and clear this mess up. So the answer to our machine learning problem is likely more machine learning. However, I’ll look at this another time. For now, it’s back to my original experiment. I need to build a discriminator that works with these slightly new MELs. If that can be done (and I should be able to find that out quite soon), we won’t be far away from fully new, synthesised Grateful Dead audio. It’ll sound terrible, but I will then have something to show for my efforts.

Reducing a Jerry solo to numbers

My inbox is filled with literally no emails asking about how a neural network detects things.

Possibly that’s because no one is interested, or maybe they think it is too complex, or maybe they think I should just talk about something actually interesting to them. But this project will never be complete without the help of others, even if that help is of the form “you’ve mucked that up again Chris”. So here is a simple explanation of how my current network is intended to work. Well – it’s about as simple as I can make it.

After just a few months delving into machine learning, I can tell you the hardest part is DATA. Specifically, getting that data into exactly the right format. And specific you must be, for should any part of it be wrong, the computer will taunt you with many a horrible error message.

So let’s start with the easy end of the spectrum, and that is trivial: sound. Let’s visualise sound as a simple waveform:

Part of a waveform from a version of Tough Mama by the Jerry Garcia Band

Now since we start on a computer, we don’t have sound but we have a WAV file. That’s technology that has been around since 1991. So how do we go from sound to a WAV file?

The answer is to move along the waveform from left to right. At regular points along the waveform, we take a sample at that point. Now we will lose some information at this point, but we sample at a really high rate so that shouldn’t be a major issue unless you are a purist. In the case of CDs and, in particular, my WAV files, we sample 44,100 times a second – the sample rate. Remember that number – it will bite us later.

Sampling the data at regular intervals

Now we have merely reduced the waveform to a fancy bar chart. How to simplify it more? Well the next stage is to realise that the waveform moves around a centre, sometimes being high up and sometimes being low down. Now, a quirk of recording means that generally there is a maximum level above or below zero that a microphone can handle. So knowing there is a maximum, we can simply assign some number which has a range -maximum to +maximum to every point on our bar chart. We could make “maximum” 5, which would mean all of our samples would be between -5 and +5.

Reducing to values from -5 to +5. Notice some accuracy is lost with such a small range

Computers, for reasons we don’t need to go into, choose some strange numbers. In my WAV files, they vary from -32,768 to +32,767. So now we have reduced our music down to just a series of numbers. That is what is stored in the WAV file.

Luckily for us, machine learning requires a series of numbers – but just the right amount, and in the right format.

How much is just enough?

Actually, we are already pretty close with the WAV file. The most major change we need to make is that the machine is rather fussy about its numbers. It requires values between 0 and 1. We have a series of integers from -32,768 to +32,767. It turns out to be pretty easy to convert between the 2 though: we simply take the number from the WAV file, add 32,768 to it and then divide by 65,535. For example, the number -13,270 would actually become:

(-13,270 + 32,768) / 65,535 ≈ 0.29752
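In code, that whole step is a couple of lines – a sketch assuming 16-bit WAV data and the SciPy/NumPy libraries (the file name is hypothetical):

import numpy as np
from scipy.io import wavfile

sample_rate, samples = wavfile.read("some_show.wav")   # 16-bit integers
samples = samples.astype(np.float64)

# shift -32,768..+32,767 up to 0..65,535, then scale down to 0..1
normalised = (samples + 32768.0) / 65535.0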

The final tricky part is in reducing the data. You see, machine learning is a little slow. The more information – that is, the more numbers – we give the computer the slower it is. And machine learning is not just slow, it can be positively glacial at times. So we always want to try and reduce the data.

So what IS a reasonable amount of data? Well, let’s define our units. We’ll say “the total number of numbers we have to give to the machine for each piece of data”. A “piece of data” meaning, in this case, some sound. In my efforts with spectrogram images, I used a 10 second audio sample and the produced image had 320 X 240 x 3 = 230,400 numbers. With our WAV file, 10 seconds of audio works out at

10s * 44,100 sample rate * 2 stereo channels = 882,000

That’s nearly 4 times the data we used with the spectrographs, so what can we do to reduce it? Well, does it really need to be stereo? Probably not – mono would be fine. Finally, does it need to be sampled at 44.1kHz? Likely not – if you sample at half that, the quality is still good enough. With that in mind, look at those numbers again:

10s * 22,050 sample rate * 1 mono channel = 220,500
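As a quick sanity check of that arithmetic (a sketch assuming librosa; the file name is hypothetical):

import librosa

# load 10 seconds as mono at half the CD sample rate
audio, sr = librosa.load("ten_seconds_of_dead.wav",
                         sr=22050, mono=True, duration=10.0)
print(len(audio))   # 220500 numbers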

Now we have 10 seconds of audio reduced to just under a quarter of a million numbers from 0 to 1. In a future post, we’ll look at how machine learning uses those numbers to learn about the Grateful Dead.

The tricky question of audio

When I started writing this blog, content was pretty easy, since I already had a backlog of results complete and ready to write about. Now of course, updates are in real time, as they come. They’re unlikely to sound as confident, as progress is likely going to be slow. So for this post I’m going to look at a problem I have, and try to share some solutions with you. That problem is how to represent sound accurately and easily.

This is really a 2 part problem. Firstly I need to feed the computer pieces of audio. The longer the piece of audio, the more data it consumes and the longer it takes for the machine to study and learn from it. So I have to think very carefully about the data size of this audio data otherwise experiments will be very slow. Secondly, there are 2 main ways to feed the machine the audio – it could be digital wav files or a spectrograph (a graph image) of the sound file. The spectrograph is the current standard for audio recognition with machine learning, but however I give the data is how I get back results. So feeding it a spectrograph image doesn’t give me back audio – it gives me back an image that should be a spectrogram.

For the last few weeks I tried to solve this by turning a spectrogram back into audio. However, my efforts have not been that successful. The first problem was that, even with a perfect setup, turning sound into a graph and back again loses you a lot of sonic information. Imagine creating an MP3 with a really terrible bitrate and then converting it back to a wav. The second problem was that my code didn’t give me a spectrograph – it gave me an image which was meant to fool a spectrograph detector. Subtle, but it meant that the code to turn the graph back to sound failed in a lot of cases because the output was not a pure spectrograph.

With all that in mind, I decided to turn my attention to actually using sound. Whilst that sounds logical, any cursory examination of the machine learning field would show you that raw audio has not been that successful. Well, no problem, “no success” is currently the default with regards to audio results. And since if I use audio, I will get audio results, that is obviously better than what I have now. Let’s do this!

Now, back to the right amount of data. With data we only have one unit – numbers. My machine requires me to pass it a set of numbers, and this set of numbers must be of equal quantity and of the same dimension. The same dimension? This is easier to understand if we think about my spectrographs. They were images, right? And the size of each image was 320 x 240 pixels. But every image had three parts to it – the red, green and blue parts. So the dimensions of my data were 320 x 240 x 3 = 230,400 numbers. This is the amount of data I was comfortable with processing in my last experiments, so let’s try and get a similar number for the raw audio.

Audio is encoded digitally by sampling the music frequently and assigning a single number to each sample.

Audio data (red line) sampled over time – results in blue.

The sample rate for CD audio is 44.1kHz, which also happens to be the highest sample rate of the raw data I have. However, there are 2 channels – left and right. If we take 10 seconds of audio (the same amount we used for the spectrographs), we get 44,100 * 2 * 10 = 882,000 numbers. Over three times larger. Well, let’s make some cuts to this. For a start, we probably don’t need stereo, and we can easily cut the sample rate down to half as well, getting us

22,050 samples/second * 1 mono channel * 10 seconds = 220,500 numbers

We are in the right ballpark. In actual fact, I ended up choosing 8 seconds of audio to be safe, and also to allow me to get more samples out of the data I had. A number, by the way, requires 4 bytes to be stored, so each of these 8 second clips ends up taking 841k.

Now onto the next question. Can the discriminator differentiate between GD and non GD given just 8 seconds of audio? Check back in a week or so to see the result.
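Purely as a sketch of what such a discriminator might look like – a hypothetical architecture, assuming Keras; the network I actually train may end up quite different:

from tensorflow import keras

CLIP_LEN = 8 * 22050   # 8 seconds of mono audio at 22050Hz

# A small 1-D convolutional binary classifier: 1 = Grateful Dead, 0 = not
discriminator = keras.Sequential([
    keras.layers.Input(shape=(CLIP_LEN, 1)),
    keras.layers.Conv1D(16, 64, strides=8, activation="relu"),
    keras.layers.Conv1D(32, 32, strides=8, activation="relu"),
    keras.layers.Conv1D(64, 16, strides=4, activation="relu"),
    keras.layers.GlobalMaxPooling1D(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])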

Reaching 2000 Epochs

In machine learning, an “epoch” is one complete pass in which the machine examines all of your input data and learns from it. More epochs = more learning. But also, unfortunately, more data = slower epochs. Now, at the time of the last post one epoch was some 25,000 Mel images. With this much data, my poor little laptop was struggling to do 50 epochs. And yet clearly, after even 100 epochs (as evidenced in the last post) the base images were not acceptable in any way. Even if they were, the resolution would be too small. So the time came to invest: I went and bought a chunky new desktop, complete with fancy graphics cards (a must in serious machine learning) to give me a speed boost. My goal? 2,000 epochs or bust.

Now that sounds great, but I then discovered that getting the correct graphics drivers set up was like completing the trials of Hercules – and I’m a paid IT professional. There were 3 weekends of arduous trial and error until finally it was all done and set up. But it was worth it, because when I ran my first test, the one with the images of Jerry Garcia, instead of taking 9 hours it took 5 minutes! A staggering 100x faster. Now I can really forge ahead, I thought! So, how does 2,000 epochs look? Like this:

Being Better Just Brings Bigger Problems

It was here that the real problems began. The first thing I noticed was trivial but important: due to the way that my data was structured and loaded, half the machine’s memory was being wasted. This caused major slowdowns, as data had to be read from disc. The other problem was more important though: quite often, my GANS would stop learning after a small number of generations.

It seems that this is because the discriminator was getting too good – it was learning so fast that the generator could not keep up. This process was random as well, so it took a load of runs to get to 2,000 epochs. In a way, this is a good result, because it is a common problem of the technique I’m using; on the other hand, it likely indicates I’m partially on the wrong track. All said and done though, I thought the final result wasn’t bad this early in the experiments.

Beyond the problems of low resolution, discriminators learning too quickly and managing all the data on the local machine, there is a much larger problem: I have no method for turning the spectrograms back into audio. Since that is the ultimate showstopper when the aim is to produce audio, this is the next issue we will solve. Stay tuned for updates!

First Steps

This blog was started a month or two after I started experimentation, so I have some catching up to do. After coming up with my plan, I had a few things to do – like learn how to do machine learning. The first thing I did was to take some existing code and train on some (any) data. Since I was going to be using a technique that worked with images, it made sense to work with some simple images at first. So I downloaded an example DCGANS and, after a few weeks trying to understand what was going on, I managed to train it to pop out pictures of Jerry Garcia:

Output of DCGANS trained on pictures of Jerry Garcia

Now this worked, but I only had a few images of Jerry to work with (well, 32, but that’s not a lot in the world of machine learning). In fact, the machine was not so much learning to draw Jerry as to remember the images shown. But this was enough to show that essentially the technique was working and I had a decent base to start with.
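For anyone curious what “an example DCGANS” boils down to, here is a heavily stripped-down sketch of the two networks involved – hypothetical layer sizes, assuming Keras; the code I actually downloaded differs:

from tensorflow import keras

LATENT = 100   # size of the random noise vector the generator starts from

# Generator: random noise in, 128x128 RGB image out
generator = keras.Sequential([
    keras.layers.Input(shape=(LATENT,)),
    keras.layers.Dense(8 * 8 * 256, activation="relu"),
    keras.layers.Reshape((8, 8, 256)),
    keras.layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu"),
    keras.layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),
    keras.layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
    keras.layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="sigmoid"),
])

# Discriminator: image in, "real or fake" verdict out
discriminator = keras.Sequential([
    keras.layers.Input(shape=(128, 128, 3)),
    keras.layers.Conv2D(32, 4, strides=2, padding="same", activation="relu"),
    keras.layers.Conv2D(64, 4, strides=2, padding="same", activation="relu"),
    keras.layers.Conv2D(128, 4, strides=2, padding="same", activation="relu"),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Training alternates: show the discriminator real and generated images,
# then train the generator to fool the (temporarily frozen) discriminator.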

Learn, and learn again

What did I get from this first foray? Quite a lot, but the main points were:

  • You need to really understand the data you provide
  • The data needs a huge amount of processing
  • There is potentially more code in getting the data into the right format than in the DCGANS itself!
  • The process is SLOW. The image above took 9 hours to compute.

Go Grab Some Data

The next step was data collection. I took 5 shows of GD and roughly the same amount of other audio and split that up into 10 second sound slices. I then turned all that into Mel images. I had a problem here in that the code I had worked with 128×128 images, and it already took forever to train on that, so for the start I just resized all my Mels to 128×128. This would be awful for audio quality – probably even worse than some of those dreadful summer ’70 audience tapes – but you have to start somewhere.

I should note that doing the work in that simple paragraph took about 2-3 weeks, on and off. Life does indeed get in the way. However, at the end of a pretty long learning session, I was able to post this image on to reddit for a few people to look at:

First proper results

So there you go. I think you’re looking at the first computer generated Grateful Dead – although ideally you’d be listening to it. Problems? Well, you’ll see the real image is larger and has a different aspect ratio, and that, beyond some colour matching, the generated image on the right is pretty much nothing like it. Still, it’s a step in the right direction. It just needs a lot more training.