I have re-run the raid tests, re-creating the fileset between each of the random write workloads, and performance now matches the previous newformat results. The bad news is that the huge gain I had attributed to the newformat release does not really exist. All of the previous results (except for the newformat run) were not re-creating the fileset, so the gain in performance was due only to having a fresh set of files, not to any code changes.

So, I have done 2 new sets of runs to look into this further. One is a 3 hour run of single threaded random write to the RAID system, which I have compared to ext3. Performance results are here:

http://btrfs.boxacle.net/repository/raid/longwrite/longwrite/Longrandomwrite.html

and graphing of all the iostat data can be found here:

http://btrfs.boxacle.net/repository/raid/longwrite/summary.html

The iostat graphs for btrfs are interesting for a number of reasons. First, it takes about 3000 seconds (or 50 minutes) for btrfs to reach steady state. Second, if you look at write throughput from the device view vs. the btrfs/application view, we see that an application throughput of 21.5MB/sec requires 63MB/sec of actual disk writes. That is an overhead of 3 to 1, vs. an overhead of ~0 for ext3. Also, looking at the change in iops vs. MB/sec, we see that while btrfs starts out with reasonably sized IOs, it quickly deteriorates to an average IO size of only 13KB. Remember, the starting file set is only 100GB on a 2.1TB filesystem, all data is overwrite, and this is single threaded, so there is no reason this should fragment. It seems like the allocator is having a problem doing sequential allocations.

Another set of runs I did was repetitive 5 minute random write runs. Results are here:

http://btrfs.boxacle.net/repository/raid/repeat/repeat/repeat.html

This shows a dramatic degradation after just a short time, but I believe there is a fair amount of overhead in btrfs after newly mounting the FS (which this test did between each run), so I repeated without unmounting and remounting; those results are here:

http://btrfs.boxacle.net/repository/raid/repeat-nomount/repeat/repeat-nomount.html

These results show a much less dramatic degradation, but btrfs still degrades by over 40% in just 30 minutes of run time. In fact, it was still degrading by 10% every 5 minutes when this test ended.

Steve
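[For reference: the 3-to-1 overhead above is just 63 / 21.5 ≈ 2.9, and the 13KB average IO size falls out of dividing device MB/sec by device IOPS. A minimal sketch of the single-threaded random-overwrite pattern being measured is below; this is a stand-in for the actual ffsb job file, and the path and sizes are placeholders, not the real configuration.]

/* Minimal sketch (not the actual ffsb profile) of the workload
 * above: single-threaded 4KB overwrites at random aligned offsets
 * inside a pre-created file.  Path and sizes are placeholders.
 */
#define _XOPEN_SOURCE 500
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	const size_t bs = 4096;               /* overwrite size */
	const off_t fsize = (off_t)100 << 30; /* 100GB fileset, one file here */
	static char buf[4096];
	long i;
	int fd = open("/mnt/btrfs/testfile", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (i = 0; i < 1000000; i++) {
		/* pick a random block-aligned offset and overwrite in place */
		off_t off = ((off_t)(random() % (fsize / bs))) * bs;
		if (pwrite(fd, buf, bs, off) != (ssize_t)bs) {
			perror("pwrite");
			return 1;
		}
	}
	close(fd);
	return 0;
}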
On Thu, Jul 23, 2009 at 01:35:21PM -0500, Steven Pratt wrote:
> I have re-run the raid tests, re-creating the fileset between each of the random write workloads, and performance now matches the previous newformat results. The bad news is that the huge gain I had attributed to the newformat release does not really exist. All of the previous results (except for the newformat run) were not re-creating the fileset, so the gain in performance was due only to having a fresh set of files, not to any code changes.

Thanks for doing all of these runs. This is still a little different than what I have here: my initial runs are very very fast and after 10 or so level out to a relatively low performance on random writes. With nodatacow, it stays even.

> So, I have done 2 new sets of runs to look into this further. [...] It seems like the allocator is having a problem doing sequential allocations.

There are two things happening. First, the default allocation scheme isn't very well suited to this; mount -o ssd will perform better. But over the long term, random overwrites to the file cause a lot of writes to the extent allocation tree. That's really what -o nodatacow is saving us. There are optimizations we can do, but we're holding off on that in favor of enospc and other pressing things.

But, with all of that said, Josef has some really important allocator improvements. I've put them out along with our pending patches into the experimental branch of the btrfs-unstable tree. Could you please give this branch a try both with and without the ssd mount option?

-chris
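[The variants under discussion here (default, -o ssd, -o nodatacow) are all plain btrfs mount options; a hedged sketch of cycling through them from C via mount(2) follows, with /dev/sdb and /mnt/btrfs as placeholder paths. The thread_pool option mentioned later in the thread is passed the same way, as mount data.]

/* Sketch: remounting btrfs with the options discussed above.
 * Device and mountpoint are placeholders.
 */
#include <stdio.h>
#include <sys/mount.h>

static int remount_with(const char *opts)
{
	umount("/mnt/btrfs");   /* ignore the error on the first pass */
	if (mount("/dev/sdb", "/mnt/btrfs", "btrfs", 0, opts)) {
		perror(opts);
		return -1;
	}
	return 0;
}

int main(void)
{
	remount_with("");           /* default allocator, data COW on */
	remount_with("ssd");        /* ssd allocation scheme          */
	remount_with("nodatacow");  /* skip COW for data extents      */
	return 0;
}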
Chris Mason wrote:
> On Thu, Jul 23, 2009 at 01:35:21PM -0500, Steven Pratt wrote:
>> I have re-run the raid tests, re-creating the fileset between each of the random write workloads [...]
>
> Thanks for doing all of these runs. This is still a little different than what I have here: my initial runs are very very fast and after 10 or so level out to a relatively low performance on random writes. With nodatacow, it stays even.

Right, I do not see this problem with nodatacow.

>> So, I have done 2 new sets of runs to look into this further. [...] It seems like the allocator is having a problem doing sequential allocations.
>
> There are two things happening. First, the default allocation scheme isn't very well suited to this; mount -o ssd will perform better. But over the long term, random overwrites to the file cause a lot of writes to the extent allocation tree. That's really what -o nodatacow is saving us. There are optimizations we can do, but we're holding off on that in favor of enospc and other pressing things.

Well, I have -o ssd data that I can upload, but it was worse than without. I do understand about timing and priorities.

> But, with all of that said, Josef has some really important allocator improvements. I've put them out along with our pending patches into the experimental branch of the btrfs-unstable tree. Could you please give this branch a try both with and without the ssd mount option?

Sure, will try to get to it tomorrow.

Steve
On Thu, Jul 23, 2009 at 05:04:49PM -0500, Steven Pratt wrote:
> Chris Mason wrote:
>> But, with all of that said, Josef has some really important allocator improvements. I've put them out along with our pending patches into the experimental branch of the btrfs-unstable tree. Could you please give this branch a try both with and without the ssd mount option?
>
> Sure, will try to get to it tomorrow.

Sorry, I missed a fix in the experimental branch. I'll push out a rebased version in a few minutes.

-chris
On Fri, Jul 24, 2009 at 09:24:07AM -0400, Chris Mason wrote:
>> Sure, will try to get to it tomorrow.
>
> Sorry, I missed a fix in the experimental branch. I'll push out a rebased version in a few minutes.

Ok, the rebased version is ready to use.

-chris
Chris Mason wrote:
> On Fri, Jul 24, 2009 at 09:24:07AM -0400, Chris Mason wrote:
>> Sorry, I missed a fix in the experimental branch. I'll push out a rebased version in a few minutes.
>
> Ok, the rebased version is ready to use.

Ok, good. Also, it seems I misspoke on the -o ssd results. I looked them over again this morning and they are slightly better than without: an initial score of 46MB/sec vs 44MB/sec without, degrading after 30 minutes to about 25MB/sec vs 20MB/sec without. So after 30 minutes of runtime -o ssd is running about 25% faster, but it still degrades significantly.

Steve
Chris Mason wrote:
> Ok, the rebased version is ready to use.

New results are up for both with and without nodatacow. Not much change.

http://btrfs.boxacle.net/repository/raid/history/History.html

Have another run going with nodatacow and ssd.

Steve
On Tue, Jul 28, 2009 at 03:12:38PM -0500, Steven Pratt wrote:
> New results are up for both with and without nodatacow. Not much change.
>
> http://btrfs.boxacle.net/repository/raid/history/History.html
>
> Have another run going with nodatacow and ssd.

Hi Steve,

I think I'm going to start tuning something other than the random-writes; there is definitely low hanging fruit in the large file creates workload ;) Thanks again for posting all of these.

The history graph has 2.6.31-rc btrfs against 2.6.29-rc ext4. Have you done more recent runs on ext4?

-chris
Chris Mason wrote:
> Hi Steve,
>
> I think I'm going to start tuning something other than the random-writes; there is definitely low hanging fruit in the large file creates workload ;) Thanks again for posting all of these.

Sure, no problem.

> The history graph has 2.6.31-rc btrfs against 2.6.29-rc ext4. Have you done more recent runs on ext4?

Yes, thanks for pointing that out; I had so many issues I forgot to update the graphs for the other file systems. Just pushed new graphs with data for 2.6.30-rc7 for all the other file systems. This was from your "newformat" branch from June 6th.

Steve
On Tue, Jul 28, 2009 at 04:10:41PM -0500, Steven Pratt wrote:
>> The history graph has 2.6.31-rc btrfs against 2.6.29-rc ext4. Have you done more recent runs on ext4?
>
> Yes, thanks for pointing that out [...] Just pushed new graphs with data for 2.6.30-rc7 for all the other file systems. This was from your "newformat" branch from June 6th.

I've been tuning the 128 thread large file streaming writes, and found some easy optimizations. While I'm fixing up these patches, could you please do a streaming O_DIRECT write test run for me? I think buffered writeback in general has some problems right now on high end arrays.

On my box, a 2.6.31-rc5 streaming buffered write with xfs only got to 200MB/s (with the 128 thread ffsb workload). Buffered btrfs goes at 175MB/s.

O_DIRECT btrfs runs at 390MB/s, while XFS varies a bit between 330MB/s and 250MB/s.

I'm using a 1MB write blocksize.

-chris
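[A minimal single-stream version of the O_DIRECT test being requested might look like the C sketch below; the path, file size, and 4KB buffer alignment are assumptions, and the real runs use 128 ffsb threads rather than one.]

/* Sketch of one stream of the O_DIRECT write test: 1MB aligned
 * writes to a new file.  Path, file size, and alignment are
 * assumptions; the actual workload is 128 ffsb threads.
 */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	const size_t bs = 1 << 20;   /* 1MB write blocksize */
	const long nblocks = 1024;   /* 1GB file */
	void *buf;
	long i;
	int fd = open("/mnt/btrfs/stream0",
		      O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* O_DIRECT needs an aligned buffer; 4096 covers typical sectors */
	if (posix_memalign(&buf, 4096, bs)) {
		fprintf(stderr, "posix_memalign failed\n");
		return 1;
	}
	for (i = 0; i < nblocks; i++) {
		if (write(fd, buf, bs) != (ssize_t)bs) {
			perror("write");
			return 1;
		}
	}
	fsync(fd);
	close(fd);
	free(buf);
	return 0;
}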
Hi,

Do you have any benchmarks for common non-raid workloads, like, say, a desktop user? It would be great to compare against ext3, ext4, xfs, etc.

Thanks,

On Thu, Aug 6, 2009 at 2:05 AM, Chris Mason <chris.mason@oracle.com> wrote:
> I've been tuning the 128 thread large file streaming writes, and found some easy optimizations. While I'm fixing up these patches, could you please do a streaming O_DIRECT write test run for me? [...]
>
> I'm using a 1MB write blocksize.
>
> -chris
debian developer wrote:
> Hi,
>
> Do you have any benchmarks for common non-raid workloads, like, say, a desktop user? It would be great to compare against ext3, ext4, xfs, etc.

Yes. I have had a little trouble with that box recently, but there are plenty of results based on the 2.6.29 kernels here:

http://btrfs.boxacle.net/repository/single-disk/History/History.html

If you are not familiar with the runs I have been doing, you can find the details of the benchmarking machine and test procedures here:

http://btrfs.boxacle.net/

Steve
Chris Mason wrote:
> I've been tuning the 128 thread large file streaming writes, and found some easy optimizations. While I'm fixing up these patches, could you please do a streaming O_DIRECT write test run for me? I think buffered writeback in general has some problems right now on high end arrays.
>
> On my box, a 2.6.31-rc5 streaming buffered write with xfs only got to 200MB/s (with the 128 thread ffsb workload). Buffered btrfs goes at 175MB/s.
>
> O_DIRECT btrfs runs at 390MB/s, while XFS varies a bit between 330MB/s and 250MB/s.
>
> I'm using a 1MB write blocksize.

On my todo list, but am swamped this week trying to get ready for vacation. Will try to get to it as soon as I can.

Steve
On Fri, Aug 07, 2009 at 08:56:52AM -0500, Steven Pratt wrote:
> Chris Mason wrote:
>> I've been tuning the 128 thread large file streaming writes, and found some easy optimizations. While I'm fixing up these patches, could you please do a streaming O_DIRECT write test run for me? [...]
>
> On my todo list, but am swamped this week trying to get ready for vacation. Will try to get to it as soon as I can.

Ok, I've pushed out a very raw version of my buffered write fixes to a new branch named performance on btrfs-unstable.

Please try this with the streaming large file create workload. I'm also curious to see if it improves on your box when you mount with

mount -o thread_pool=128

-chris
Chris Mason wrote:
> Ok, I've pushed out a very raw version of my buffered write fixes to a new branch named performance on btrfs-unstable.
>
> Please try this with the streaming large file create workload. I'm also curious to see if it improves on your box when you mount with
>
> mount -o thread_pool=128

Better late than never; finally got this finished up. Mixed bag on this one. BTRFS lags significantly on single threaded; it seems unable to keep IO outstanding to the device: less than 60% busy on the DM device, compared to 97%+ for all the other filesystems. nodatacow helps out, increasing utilization to about 70%, but btrfs still trails by a large margin.

Results are more favorable for the multithreaded tests. nodatacow is actually the top performer here! However, cow still raises its ugly head and causes significant performance degradation (45%) and increased CPU (43%). Also, even without cow, BTRFS is consuming 8-10x more CPU than the other file systems. I don't have oprofile data for these runs, as that was causing some issues with BTRFS. Will retry and see if that problem is fixed.

thread_pool seemed to make no difference at all.

All runs were done against an August 20th pull of the experimental tree. These are 1M odirect file creates, with each file being 1G in size. Results can be found here:

http://btrfs.boxacle.net/repository/raid/large_create_test/write-test/1M_odirect_create.html

Steve
On Mon, Aug 31, 2009 at 12:49:13PM -0500, Steven Pratt wrote:
> Better late than never; finally got this finished up. Mixed bag on this one. BTRFS lags significantly on single threaded; it seems unable to keep IO outstanding to the device: less than 60% busy on the DM device, compared to 97%+ for all the other filesystems. nodatacow helps out, increasing utilization to about 70%, but btrfs still trails by a large margin.

Hi Steve,

Jens Axboe did some profiling on his big test rig and I think we found the biggest CPU problems. The end result is now sitting in the master branch of the btrfs-unstable repo.

On his boxes, btrfs went from around 400MB/s streaming writes to the 1GB/s limit, and we're now tied with XFS while using less CPU time.

Hopefully you will see similar results ;)

-chris
Chris Mason wrote:
> Jens Axboe did some profiling on his big test rig and I think we found the biggest CPU problems. The end result is now sitting in the master branch of the btrfs-unstable repo.
>
> On his boxes, btrfs went from around 400MB/s streaming writes to the 1GB/s limit, and we're now tied with XFS while using less CPU time.
>
> Hopefully you will see similar results ;)

Hmmm, well no, I didn't. Throughputs at 1 and 128 threads are pretty much unchanged, although I do see a good CPU savings in the 128 thread case (with cow). For 16 threads we actually regressed with cow enabled.

Results are here:

http://btrfs.boxacle.net/repository/raid/large_create_test/write-test/1M_odirect_create.html

I'll try to look more into this next week.

Steve
On Fri, Sep 11, 2009 at 04:35:50PM -0500, Steven Pratt wrote:
> Hmmm, well no, I didn't. Throughputs at 1 and 128 threads are pretty much unchanged, although I do see a good CPU savings in the 128 thread case (with cow). For 16 threads we actually regressed with cow enabled.
>
> I'll try to look more into this next week.

Hmmm, Jens was benchmarking buffered writes, but he was also testing on his new per-bdi writeback code. If your next run could be buffered instead of O_DIRECT, I'd be curious to see the results.

Thanks,
Chris
On Mon, Sep 14 2009, Chris Mason wrote:
> Hmmm, Jens was benchmarking buffered writes, but he was also testing on his new per-bdi writeback code. If your next run could be buffered instead of O_DIRECT, I'd be curious to see the results.

I found out today that a larger MAX_WRITEBACK_PAGES is still essential for me. It basically doubles throughput on btrfs. So I think we need to do something about that, sooner rather than later.

--
Jens Axboe
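[For context: MAX_WRITEBACK_PAGES caps how many pages the kernel pushes per writeback chunk before moving on to the next inode. If memory serves, stock kernels of this era defined it as 1024 pages (4MB with 4KB pages), a small batch for a big array; the exact location and value in Jens's per-bdi tree may differ, so treat the sketch below as an assumption about the kind of bump being tested, not his actual patch.]

/* Hedged sketch of the writeback-chunk tuning under discussion.
 * The stock value (assumed) was 1024 pages per chunk:
 */
#define MAX_WRITEBACK_PAGES	1024	/* 4MB chunks with 4KB pages */

/* Experimental: a larger value sends bigger contiguous batches,
 * which keeps a high-end array busy between writeback passes, e.g.:
 *
 * #define MAX_WRITEBACK_PAGES	16384	   64MB chunks
 */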
Chris Mason wrote:
> Hmmm, Jens was benchmarking buffered writes, but he was also testing on his new per-bdi writeback code. If your next run could be buffered instead of O_DIRECT, I'd be curious to see the results.

Buffered does look a lot better. I don't have a btrfs baseline before these latest changes for this exact workload, but these results are not bad at all. With cow, it beats just about everything except XFS, and with nocow it simply screams. CPU consumption looks good as well. I'll probably give the full set of tests a run tonight. Results are here:

http://btrfs.boxacle.net/repository/raid/buffered-creates/buffered-create/buffered-create.html

The only bit of bad news is I did get one error that crashed the system on the single threaded nocow run, so that data point is missing. Output below:

btrfs1 kernel: [251789.525886] ------------[ cut here ]------------
btrfs1 kernel: [251789.526574] invalid opcode: 0000 [#1] SMP
btrfs1 kernel: [251789.526654] last sysfs file: /sys/devices/pci0000:0c/0000:0c:01.0/local_cpus
btrfs1 kernel: [251789.526654] Stack:
btrfs1 kernel: [251789.526654]  ffff88013fc234c0 ffff88013fbcf400 0000000000000000 ffff88013fc01080
btrfs1 kernel: [251789.526654]  ffff880132e11d38 ffffffff802a5392 0000000000000001 ffff88013fbcf400
btrfs1 kernel: [251789.526654] Call Trace:
btrfs1 kernel: [251789.526654]  [<ffffffff802a5392>] cache_flusharray+0x7d/0xae
btrfs1 kernel: [251789.526654]  [<ffffffff802a5629>] kfree+0x192/0x1b1
btrfs1 kernel: [251789.526654]  [<ffffffffa0378c9f>] put_worker+0x14/0x16 [btrfs]
btrfs1 kernel: [251789.526654]  [<ffffffffa0378d55>] btrfs_stop_workers+0xb4/0xc9 [btrfs]
btrfs1 kernel: [251789.526654]  [<ffffffffa0355cbe>] close_ctree+0x210/0x288 [btrfs]
btrfs1 kernel: [251789.526654]  [<ffffffff802bd1a1>] ? invalidate_inodes+0x100/0x112
btrfs1 kernel: [251789.526654]  [<ffffffffa033f4cb>] btrfs_put_super+0x18/0x27 [btrfs]
btrfs1 kernel: [251789.526654]  [<ffffffff802ad12b>] generic_shutdown_super+0x73/0xe2
btrfs1 kernel: [251789.526654]  [<ffffffff802ad1e5>] kill_anon_super+0x11/0x3b
btrfs1 kernel: [251789.526654]  [<ffffffff802ad51d>] deactivate_super+0x62/0x77
btrfs1 kernel: [251789.526654]  [<ffffffff802bf9eb>] mntput_no_expire+0xec/0x12c
btrfs1 kernel: [251789.526654]  [<ffffffff802bff3a>] sys_umount+0x2c5/0x31c
btrfs1 kernel: [251789.526654]  [<ffffffff8020ba2b>] system_call_fastpath+0x16/0x1b
btrfs1 kernel: [251789.526654] Code: 89 f7 e8 48 07 f8 ff 48 c1 e8 0c 48 ba 00 00 00 00 00 e2 ff ff 48 6b c0 38 48 01 d0 66 83 38 00 79 04 48 8b 40 10 80 38 00 78 04 <0f> 0b eb fe 48 8b 58 30 48 63 45 c8 48 89 df 4d 8b a4 c5 60 08

Steve
On Mon, Sep 14, 2009 at 04:41:48PM -0500, Steven Pratt wrote:
> Buffered does look a lot better. I don't have a btrfs baseline before these latest changes for this exact workload, but these results are not bad at all. With cow, it beats just about everything except XFS, and with nocow it simply screams. CPU consumption looks good as well. I'll probably give the full set of tests a run tonight.

Wow, good news at last ;)

For the oops, try the patch below (I need to push it out, but I think it'll help). I'll try to figure out the O_DIRECT problems.

-chris

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 6ea5cd0..ba28742 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -177,7 +177,7 @@ static int try_worker_shutdown(struct btrfs_worker_thread *worker)
 	int freeit = 0;
 
 	spin_lock_irq(&worker->lock);
-	spin_lock_irq(&worker->workers->lock);
+	spin_lock(&worker->workers->lock);
 	if (worker->workers->num_workers > 1 &&
 	    worker->idle &&
 	    !worker->working &&
@@ -188,7 +188,7 @@ static int try_worker_shutdown(struct btrfs_worker_thread *worker)
 		list_del_init(&worker->worker_list);
 		worker->workers->num_workers--;
 	}
-	spin_unlock_irq(&worker->workers->lock);
+	spin_unlock(&worker->workers->lock);
 	spin_unlock_irq(&worker->lock);
 
 	if (freeit)
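[For anyone reading along, the bug this patch addresses is the nested irq-lock pattern: the inner spin_unlock_irq() re-enables interrupts while the outer irq-disabling lock is still held. A stripped-down illustration follows; this is not btrfs code, and the lock names are made up.]

/* Illustration of the pattern the patch above fixes; not btrfs
 * code, lock names are invented for the example.
 */
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(outer);
static DEFINE_SPINLOCK(inner);

static void broken(void)
{
	spin_lock_irq(&outer);
	spin_lock_irq(&inner);    /* irqs are already off here */
	/* ... */
	spin_unlock_irq(&inner);  /* bug: irqs back on while outer held */
	spin_unlock_irq(&outer);
}

static void fixed(void)
{
	spin_lock_irq(&outer);    /* only the outermost lock ...  */
	spin_lock(&inner);        /* ... toggles the irq state    */
	/* ... */
	spin_unlock(&inner);
	spin_unlock_irq(&outer);
}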
On Mon, Sep 14, 2009 at 04:41:48PM -0500, Steven Pratt wrote:
> The only bit of bad news is I did get one error that crashed the system on the single threaded nocow run, so that data point is missing. Output below:

I hope I've got this fixed. If you pull from the master branch of btrfs-unstable there are fixes for async thread races. The single patch I sent before is included, but not enough.

-chris
Chris Mason wrote:
> I hope I've got this fixed. If you pull from the master branch of btrfs-unstable there are fixes for async thread races. The single patch I sent before is included, but not enough.

Glad you said that. Keeps me from sending the email that said the patch didn't help :-)

Steve
Steven Pratt wrote:
> Chris Mason wrote:
>> I hope I've got this fixed. If you pull from the master branch of btrfs-unstable there are fixes for async thread races. The single patch I sent before is included, but not enough.
>
> Glad you said that. Keeps me from sending the email that said the patch didn't help :-)

Well, still getting oopses even with the new code. Lots of:

Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] BUG: soft lockup - CPU#10 stuck for 61s! [btrfs-endio-1:30250]
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] Pid: 30250, comm: btrfs-endio-1 Not tainted 2.6.31-autokern1 #1 IBM x3950-[88726RU]-
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] RIP: 0010:[<ffffffff81153920>]  [<ffffffff81153920>] crc32c+0x20/0x26
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] RSP: 0018:ffff88013a857cc8  EFLAGS: 00000217
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] RAX: 0000000000000040 RBX: ffff88013a857cc8 RCX: ffff88013d8022c0
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] RDX: 0000000000000010 RSI: ffff88001d349ff0 RDI: 0000000041703e71
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] RBP: ffffffff8100c4ee R08: 0000000000000000 R09: 0000000000000000
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] R10: ffff88013a857d30 R11: 0000000000000002 R12: ffff88013a857d10
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] R13: 0000000000000002 R14: ffff88013a857cb0 R15: ffffffff8100c38e
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] FS: 0000000000000000(0000) GS:ffff880028159000(0000) knlGS:0000000000000000
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] CR2: 0000000000000043 CR3: 00000001368f7000 CR4: 00000000000006e0
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] Call Trace:
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754]  [<ffffffff8115397e>] ? chksum_update+0x10/0x18
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754]  [<ffffffff81150084>] ? crypto_shash_update+0x1a/0x1c
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754]  [<ffffffff81175c34>] ? crc32c+0x4c/0x60
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754]  [<ffffffffa0391d0f>] ? get_state_private+0x38/0x6f [btrfs]
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754]  [<ffffffffa0376688>] ? btrfs_csum_data+0xd/0xf [btrfs]
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754]  [<ffffffffa037fefc>] ? btrfs_readpage_end_io_hook+0x158/0x27b [btrfs]
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754]  [<ffffffffa0392a46>] ? end_bio_extent_readpage+0xb8/0x1c0 [btrfs]
Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754]  [<ffffffff810e5733>] ? bio_endio+0x26/0x28
Sep 16 11:07:27 btrfs1 kernel: [ 1862.947656]  [<ffffffffa037666e>] ? end_workqueue_fn+0x111/0x11e [btrfs]
Sep 16 11:07:27 btrfs1 kernel: [ 1862.947823]  [<ffffffffa039a490>] ? worker_loop+0x12a/0x3ea [btrfs]
Sep 16 11:07:27 btrfs1 kernel: [ 1862.947823]  [<ffffffffa039a366>] ? worker_loop+0x0/0x3ea [btrfs]
Sep 16 11:07:27 btrfs1 kernel: [ 1862.948800]  [<ffffffff810544e4>] ? kthread+0x8f/0x97
Sep 16 11:07:27 btrfs1 kernel: [ 1862.948800]  [<ffffffff8100ca1a>] ? child_rip+0xa/0x20
Sep 16 11:07:27 btrfs1 kernel: [ 1862.948800]  [<ffffffff81054455>] ? kthread+0x0/0x97
Sep 16 11:07:27 btrfs1 kernel: [ 1862.948800]  [<ffffffff8100ca10>] ? child_rip+0x0/0x20

Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] Pid: 31421, comm: btrfs-endio-wri Not tainted 2.6.31-autokern1 #1 IBM x3950-[88726RU]-
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] RIP: 0010:[<ffffffffa036afb3>]  [<ffffffffa036afb3>] alloc_reserved_file_extent+0x8d/0x1c3 [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] RSP: 0018:ffff8800aa555af0  EFLAGS: 00010282
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] RAX: 00000000ffffffef RBX: ffff88013b55e000 RCX: 0000000000000002
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88012f20a9a0
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] RBP: ffff8800aa555b60 R08: ffff8800aa555888 R09: ffff8800aa555880
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] R10: ffff880077937400 R11: 00000000fffffffa R12: 000000000000001d
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] R13: ffff880077937400 R14: 0000000000000000 R15: 0000000000000000
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] FS: 0000000000000000(0000) GS:ffff88002804b000(0000) knlGS:0000000000000000
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] CR2: 00000000007c0000 CR3: 000000013e038000 CR4: 00000000000006f0
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] Process btrfs-endio-wri (pid: 31421, threadinfo ffff8800aa554000, task ffff8801395447a0)
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] Stack:
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  ffff880077937400 0000000000000a7c 0000000000000005 0000000000000000
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] <0> ffff880101d0c800 ffff8801140bbd20 000000b2aa555b60 ffffffffa036a190
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] <0> 000000350000091d ffff8801090fdd40 ffff88013a4e9d40 0000000000000001
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621] Call Trace:
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa036a190>] ? update_reserved_extents+0xa7/0xbe [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa036f430>] run_one_delayed_ref+0x382/0x42f [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffff8100c4ee>] ? apic_timer_interrupt+0xe/0x20
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa03700b1>] run_clustered_refs+0x237/0x2b4 [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa03a5665>] ? btrfs_find_ref_cluster+0xdc/0x115 [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa03701da>] btrfs_run_delayed_refs+0xac/0x195 [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa0379a76>] __btrfs_end_transaction+0x59/0xfe [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa0379b36>] btrfs_end_transaction+0xb/0xd [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa037f29b>] btrfs_finish_ordered_io+0x23c/0x265 [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa037f2d9>] btrfs_writepage_end_io_hook+0x15/0x17 [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa0392901>] end_bio_extent_writepage+0xa5/0x132 [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffff810e5733>] bio_endio+0x26/0x28
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa037666e>] end_workqueue_fn+0x111/0x11e [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa039a490>] worker_loop+0x12a/0x3ea [btrfs]
Sep 16 11:54:47 btrfs1 kernel: [ 4703.082621]  [<ffffffffa039a366>] ? worker_loop+0x0/0x3ea [btrfs]
Sep 16 11:54:48 btrfs1 kernel: [ 4703.082621]  [<ffffffff810544e4>] kthread+0x8f/0x97
Sep 16 11:54:48 btrfs1 kernel: [ 4703.082621]  [<ffffffff8100ca1a>] child_rip+0xa/0x20
Sep 16 11:54:48 btrfs1 kernel: [ 4703.082621]  [<ffffffff81054455>] ? kthread+0x0/0x97
Sep 16 11:54:48 btrfs1 kernel: [ 4703.082621]  [<ffffffff8100ca10>] ? child_rip+0x0/0x20
Sep 16 11:54:48 btrfs1 kernel: [ 4703.082621] Code: 08 4c 8d 45 d4 41 8d 44 24 18 48 8b 73 20 48 8b 4d 18 41 b9 01 00 00 00 48 8b 7d b8 4c 89 ea 89 45 d4 e8 93 e3 ff ff 85 c0 74 04 <0f> 0b eb fe 49 63 75 40 4d 8b 65 00 49 83 cf 01 4c 89 e7 48 6b

Happened on 2 machines.

Steve
On Wed, Sep 16, 2009 at 12:57:22PM -0500, Steven Pratt wrote:
> Well, still getting oopses even with the new code. Lots of:
>
> Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] BUG: soft lockup - CPU#10 stuck for 61s! [btrfs-endio-1:30250]
> Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] RIP: 0010:[<ffffffff81153920>]  [<ffffffff81153920>] crc32c+0x20/0x26

If I'm reading this right, you've got a softlockup in crc32c? Something has gone really wrong here. Are you reusing datasets from old runs?

-chris
Chris Mason wrote:
> If I'm reading this right, you've got a softlockup in crc32c? Something has gone really wrong here. Are you reusing datasets from old runs?

No, mkfs before every run.

Steve
Chris Mason wrote:
> On Wed, Sep 16, 2009 at 12:57:22PM -0500, Steven Pratt wrote:
>> Steven Pratt wrote:
>>> Chris Mason wrote:
>>>> On Mon, Sep 14, 2009 at 04:41:48PM -0500, Steven Pratt wrote:
>>>>> Only bit of bad news is I did get one error that crashed the system
>>>>> on single threaded nocow run. So that data point is missing.
>>>>> Output below:
>>>>
>>>> I hope I've got this fixed. If you pull from the master branch of
>>>> btrfs-unstable there are fixes for async thread races. The single
>>>> patch I sent before is included, but not enough.
>>>
>>> Glad you said that. Keeps me from sending the email that said the
>>> patch didn't help :-)
>>>
>>> Steve
>>
>> Well, still getting oopses even with new code.
>>
>> Lots of:
>> Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] BUG: soft lockup - CPU#10 stuck for 61s! [btrfs-endio-1:30250]
>> Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] Pid: 30250, comm: btrfs-endio-1 Not tainted 2.6.31-autokern1 #1 IBM x3950-[88726RU]-
>> Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] RIP: 0010:[<ffffffff81153920>] [<ffffffff81153920>] crc32c+0x20/0x26
>
> If I'm reading this right, you've got a softlockup in crc32c? Something
> has gone really wrong here. Are you reusing datasets from old runs?

From the second machine a single bug:

Sep 16 11:53:42 btrfs2 kernel: [ 3769.298240] ------------[ cut here ]------------
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] kernel BUG at fs/btrfs/extent-tree.c:4097!
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] invalid opcode: 0000 [#1] SMP
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index1/shared_cpu_map
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] CPU 9
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler btrfs zlib_deflate oprofile autofs4 nfs lockd nfs_acl auth_rpcgss sunrpc dm_multipath video output sbs sbshc battery ac parport_pc lp parport sg joydev serio_raw acpi_memhotplug rtc_cmos rtc_core rtc_lib button tg3 libphy i2c_piix4 i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod lpfc scsi_transport_fc aic94xx libsas libata scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] Pid: 2106, comm: btrfs-endio-wri Not tainted 2.6.31-autokern1 #1 IBM x3950-[88726RU]-
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] RIP: 0010:[<ffffffffa0386fb3>] [<ffffffffa0386fb3>] alloc_reserved_file_extent+0x8d/0x1c3 [btrfs]
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] RSP: 0018:ffff88002758faf0 EFLAGS: 00010282
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] RAX: 00000000ffffffef RBX: ffff880136434000 RCX: 0000000000000002
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8800a7040370
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] RBP: ffff88002758fb60 R08: ffff88002758f958 R09: ffff88002758f950
Sep 16 11:53:42 btrfs2 kernel: [ 3769.298550] R10: 0000000000000004 R11: ffff8800a7040370 R12: 000000000000001d
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] R13: ffff8800b79e6910 R14: 0000000000000000 R15: 0000000000000000
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] FS: 0000000000000000(0000) GS:ffff88002813e000(0000) knlGS:0000000000000000
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] CR2: 00007f1f6915a000 CR3: 000000013dd4e000 CR4: 00000000000006e0
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] Process btrfs-endio-wri (pid: 2106, threadinfo ffff88002758e000, task ffff88013b94c100)
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] Stack:
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] ffff8800709fc760 0000000000000856 0000000000000005 0000000000000000
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] <0> ffff8801329d5000 ffff880102242de0 000000b22758fb60 ffffffffa0386190
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] <0> 00000035329d5000 ffff880128291440 ffff880108302340 0000000000000001
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] Call Trace:
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa0386190>] ? update_reserved_extents+0xa7/0xbe [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa038b430>] run_one_delayed_ref+0x382/0x42f [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa038c0b1>] run_clustered_refs+0x237/0x2b4 [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa03c1665>] ? btrfs_find_ref_cluster+0xdc/0x115 [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa038c1da>] btrfs_run_delayed_refs+0xac/0x195 [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa0395a76>] __btrfs_end_transaction+0x59/0xfe [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa0395b36>] btrfs_end_transaction+0xb/0xd [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa039b29b>] btrfs_finish_ordered_io+0x23c/0x265 [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa039b2d9>] btrfs_writepage_end_io_hook+0x15/0x17 [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa03ae901>] end_bio_extent_writepage+0xa5/0x132 [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffff810e5733>] bio_endio+0x26/0x28
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa039266e>] end_workqueue_fn+0x111/0x11e [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa03b6490>] worker_loop+0x12a/0x3ea [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffffa03b6366>] ? worker_loop+0x0/0x3ea [btrfs]
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffff810544e4>] kthread+0x8f/0x97
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffff8100ca1a>] child_rip+0xa/0x20
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffff81054455>] ? kthread+0x0/0x97
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] [<ffffffff8100ca10>] ? child_rip+0x0/0x20
Sep 16 11:53:43 btrfs2 kernel: [ 3769.298550] Code: 08 4c 8d 45 d4 41 8d 44 24 18 48 8b 73 20 48 8b 4d 18 41 b9 01 00 00 00 48 8b 7d b8 4c 89 ea 89 45 d4 e8 93 e3 ff ff 85 c0 74 04 <0f> 0b eb fe 49 63 75 40 4d 8b 65 00 49 83 cf 01 4c 89 e7 48 6b

Steve
On Wed, Sep 16, 2009 at 01:15:12PM -0500, Steven Pratt wrote:
> Chris Mason wrote:
> > On Wed, Sep 16, 2009 at 12:57:22PM -0500, Steven Pratt wrote:
> > > Steven Pratt wrote:
> > > > Chris Mason wrote:
> > > > > On Mon, Sep 14, 2009 at 04:41:48PM -0500, Steven Pratt wrote:
> > > > > > Only bit of bad news is I did get one error that crashed the system
> > > > > > on single threaded nocow run. So that data point is missing.
> > > > > > Output below:
> > > > > I hope I've got this fixed. If you pull from the master branch of
> > > > > btrfs-unstable there are fixes for async thread races. The single
> > > > > patch I sent before is included, but not enough.
> > > > Glad you said that. Keeps me from sending the email that said the
> > > > patch didn't help :-)
> > > >
> > > > Steve
> > > Well, still getting oopses even with new code.
> > >
> > > Lots of:
> > > Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] BUG: soft lockup - CPU#10 stuck for 61s! [btrfs-endio-1:30250]
> > > Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] Pid: 30250, comm: btrfs-endio-1 Not tainted 2.6.31-autokern1 #1 IBM x3950-[88726RU]-
> > > Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] RIP: 0010:[<ffffffff81153920>] [<ffffffff81153920>] crc32c+0x20/0x26
> >
> > If I'm reading this right, you've got a softlockup in crc32c? Something
> > has gone really wrong here. Are you reusing datasets from old runs?
> No, mkfs before every run.

Could you please send me the full softlockup output? It's hard to read when it's all line-wrapped, so the original files would help.

-chris
On Wed, Sep 16, 2009 at 01:16:56PM -0500, Steven Pratt wrote:
> Chris Mason wrote:
> > On Wed, Sep 16, 2009 at 12:57:22PM -0500, Steven Pratt wrote:
> > > Steven Pratt wrote:
> > > > Chris Mason wrote:
> > > > > On Mon, Sep 14, 2009 at 04:41:48PM -0500, Steven Pratt wrote:
> > > > > > Only bit of bad news is I did get one error that crashed the system
> > > > > > on single threaded nocow run. So that data point is missing.
> > > > > > Output below:
> > > > > I hope I've got this fixed. If you pull from the master branch of
> > > > > btrfs-unstable there are fixes for async thread races. The single
> > > > > patch I sent before is included, but not enough.
> > > > Glad you said that. Keeps me from sending the email that said the
> > > > patch didn't help :-)
> > > >
> > > > Steve
> > > Well, still getting oopses even with new code.
> > >
> > > Lots of:
> > > Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] BUG: soft lockup - CPU#10 stuck for 61s! [btrfs-endio-1:30250]
> > > Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] Pid: 30250, comm: btrfs-endio-1 Not tainted 2.6.31-autokern1 #1 IBM x3950-[88726RU]-
> > > Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] RIP: 0010:[<ffffffff81153920>] [<ffffffff81153920>] crc32c+0x20/0x26
> >
> > If I'm reading this right, you've got a softlockup in crc32c? Something
> > has gone really wrong here. Are you reusing datasets from old runs?
> From the second machine a single bug:
> Sep 16 11:53:42 btrfs2 kernel: [ 3769.298240] ------------[ cut here

Ok, which mount options and job file is this from?

-chris
Chris Mason wrote:
> On Wed, Sep 16, 2009 at 01:16:56PM -0500, Steven Pratt wrote:
>> Chris Mason wrote:
>>> On Wed, Sep 16, 2009 at 12:57:22PM -0500, Steven Pratt wrote:
>>>> Steven Pratt wrote:
>>>>> Chris Mason wrote:
>>>>>> On Mon, Sep 14, 2009 at 04:41:48PM -0500, Steven Pratt wrote:
>>>>>>> Only bit of bad news is I did get one error that crashed the system
>>>>>>> on single threaded nocow run. So that data point is missing.
>>>>>>> Output below:
>>>>>> I hope I've got this fixed. If you pull from the master branch of
>>>>>> btrfs-unstable there are fixes for async thread races. The single
>>>>>> patch I sent before is included, but not enough.
>>>>> Glad you said that. Keeps me from sending the email that said the
>>>>> patch didn't help :-)
>>>>>
>>>>> Steve
>>>> Well, still getting oopses even with new code.
>>>>
>>>> Lots of:
>>>> Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] BUG: soft lockup - CPU#10 stuck for 61s! [btrfs-endio-1:30250]
>>>> Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] Pid: 30250, comm: btrfs-endio-1 Not tainted 2.6.31-autokern1 #1 IBM x3950-[88726RU]-
>>>> Sep 16 11:07:27 btrfs1 kernel: [ 1862.942754] RIP: 0010:[<ffffffff81153920>] [<ffffffff81153920>] crc32c+0x20/0x26
>>> If I'm reading this right, you've got a softlockup in crc32c? Something
>>> has gone really wrong here. Are you reusing datasets from old runs?
>> From the second machine a single bug:
>> Sep 16 11:53:42 btrfs2 kernel: [ 3769.298240] ------------[ cut here
>
> Ok, which mount options and job file is this from?

mount -t btrfs /dev/ffsbdev1 /mnt/ffsb1

[20090916-11:47:37.738883526] PROCESSING COMMAND : 'run random_writes__threads_0001 ffsb http://hks.austin.ibm.com/users/corry/btrfs/ffsb/profiles/btrfs2/random_writes.ffsb num_threads=1'

So, this is the single-disk machine, running the single-threaded random write workload. Buffered, not O_DIRECT. I'm packaging up the full messages file; the repeated errors make it big. Will send separately.

Steve
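[Editorial note: for readers without the ffsb profile handy, the workload above boils down to single-threaded, buffered, block-aligned random overwrites of pre-existing files. A rough standalone sketch in C; the file path, file size, block size, and iteration count are made-up placeholders, not values from the actual profile:]

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const off_t file_size = 1024 * 1024 * 1024; /* placeholder: 1GB file */
    const size_t bs = 4096;                     /* placeholder: 4k writes */
    char buf[4096];
    int fd, i;

    memset(buf, 0xab, sizeof(buf));

    /* the file is pre-created; this is pure overwrite, no O_DIRECT */
    fd = open("/mnt/ffsb1/data/testfile", O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (i = 0; i < 100000; i++) {
        /* pick a block-aligned offset inside the existing file */
        off_t off = (off_t)(rand() % (file_size / bs)) * (off_t)bs;
        if (pwrite(fd, buf, bs, off) != (ssize_t)bs) {
            perror("pwrite");
            break;
        }
    }

    close(fd);
    return 0;
}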
Chris Mason wrote:
> On Mon, Sep 14, 2009 at 04:41:48PM -0500, Steven Pratt wrote:
>> Only bit of bad news is I did get one error that crashed the system
>> on single threaded nocow run. So that data point is missing.
>> Output below:
>
> I hope I've got this fixed. If you pull from the master branch of
> btrfs-unstable there are fixes for async thread races. The single
> patch I sent before is included, but not enough.

Chris:

FYI - all five of my test systems have now finished my standard test cycle on the -unstable master branch, and I've not seen a single hang. So, your fix for the async thread shutdown race seems to have fixed my problems, even if Steve's still seeing trouble.

I'll note that the running times for fsstress on some of my systems have become rather longer with btrfs-unstable/master kernels - 3.5 rather than 2.5 hours on multidevice filesystems. Running times on single device filesystems are roughly the same.

I'm going to start another set of tests for thoroughness unless you've got more patches coming.

Thanks,
Eric
Eric Whitney wrote:
> Chris Mason wrote:
>> On Mon, Sep 14, 2009 at 04:41:48PM -0500, Steven Pratt wrote:
>>> Only bit of bad news is I did get one error that crashed the system
>>> on single threaded nocow run. So that data point is missing.
>>> Output below:
>>
>> I hope I've got this fixed. If you pull from the master branch of
>> btrfs-unstable there are fixes for async thread races. The single
>> patch I sent before is included, but not enough.
>
> Chris:
>
> FYI - all five of my test systems have now finished my standard test
> cycle on the -unstable master branch, and I've not seen a single hang.
> So, your fix for the async thread shutdown race seems to have fixed my
> problems, even if Steve's still seeing trouble.
>
> I'll note that the running times for fsstress on some of my systems
> have become rather longer with btrfs-unstable/master kernels - 3.5
> rather than 2.5 hours on multidevice filesystems. Running times on
> single device filesystems are roughly the same.
>
> I'm going to start another set of tests for thoroughness unless you've
> got more patches coming.

I've had some offline discussions with Chris, and it seems the problem is triggered by unmounting and re-mounting the file system between tests (but not running mkfs again). I have also just verified that the problem does not occur if repeated tests are run without the unmount/mount cycle. So in case this is not clear (a scripted sketch of the failing recipe follows this message):

  mkfs
  mount
  create new files
  run test
  umount
  mount
  delete old files
  create new files
  run test
  BUG

but...

  mkfs
  mount
  create new files
  run test
  umount
  mkfs        <------ different
  mount
  delete old files
  create new files
  run test
  ... all is fine

or...

  mkfs
  mount
  create new files
  run test
  # no mounts or mkfs here
  delete old files
  create new files
  run test
  ... all is fine

Steve
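[Editorial note: a minimal shell sketch of the first (failing) recipe above. The device and mount point names are borrowed from Steve's earlier mail, and "run test" stands in for the ffsb random-write job, so treat this as an illustration rather than the exact harness:]

#!/bin/sh
DEV=/dev/ffsbdev1
MNT=/mnt/ffsb1

mkfs.btrfs $DEV
mount -t btrfs $DEV $MNT
# create new files, run test...
umount $MNT

# remount WITHOUT running mkfs again -- this is the case that hits the BUG
mount -t btrfs $DEV $MNT
# delete old files, create new files, run test...  -> kernel BUG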
On Thu, Sep 17, 2009 at 01:39:01PM -0500, Steven Pratt wrote:
> Eric Whitney wrote:
> > Chris Mason wrote:
> > > On Mon, Sep 14, 2009 at 04:41:48PM -0500, Steven Pratt wrote:
> > > > Only bit of bad news is I did get one error that crashed the system
> > > > on single threaded nocow run. So that data point is missing.
> > > > Output below:
> > >
> > > I hope I've got this fixed. If you pull from the master branch of
> > > btrfs-unstable there are fixes for async thread races. The single
> > > patch I sent before is included, but not enough.
> >
> > Chris:
> >
> > FYI - all five of my test systems have now finished my standard
> > test cycle on the -unstable master branch, and I've not seen a
> > single hang. So, your fix for the async thread shutdown race seems
> > to have fixed my problems, even if Steve's still seeing trouble.
> >
> > I'll note that the running times for fsstress on some of my
> > systems have become rather longer with btrfs-unstable/master
> > kernels - 3.5 rather than 2.5 hours on multidevice filesystems.
> > Running times on single device filesystems are roughly the same.
> >
> > I'm going to start another set of tests for thoroughness unless
> > you've got more patches coming.
> I've had some offline discussions with Chris, and it seems the
> problem is triggered by unmounting and re-mounting the file system
> between tests (but not running mkfs again). I have also just
> verified that the problem does not occur if repeated tests are run
> without the unmount/mount cycle. So in case this is not clear:

Ok, I've triggered it here. Next step is trying Yan Zheng's async caching update.

------------[ cut here ]------------
kernel BUG at fs/btrfs/extent-tree.c:4097!
invalid opcode: 0000 [#1] SMP

-chris
[ crashes on runs involving unmounts ] The run is still going here, but it has survived longer than before. I''m trying with Yan Zheng''s patch: From: Yan Zheng <zheng.yan@oracle.com> Date: Fri, 11 Sep 2009 16:11:19 -0400 Subject: [PATCH] Btrfs: improve async block group caching This patch gets rid of two limitations of async block group caching. The old code delays handling pinned extents when block group is in caching. To allocate logged file extents, the old code need wait until block group is fully cached. To get rid of the limitations, This patch introduces a data structure to track the progress of caching. Base on the caching progress, we know which extents should be added to the free space cache when handling the pinned extents. The logged file extents are also handled in a similar way. This patch also changes how pinned extents are tracked. The old code uses one tree to track pinned extents, and copy the pinned extents tree at transaction commit time. This patch makes it use two trees to track pinned extents. One tree for extents that are pinned in the running transaction, one tree for extents that can be unpinned. At transaction commit time, we swap the two trees. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com> --- fs/btrfs/ctree.h | 29 ++- fs/btrfs/disk-io.c | 7 +- fs/btrfs/extent-tree.c | 586 +++++++++++++++++++++++++++++------------------- fs/btrfs/transaction.c | 15 +- fs/btrfs/tree-log.c | 4 +- 5 files changed, 382 insertions(+), 259 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 732d5b8..3b6df71 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -726,6 +726,15 @@ enum btrfs_caching_type { BTRFS_CACHE_FINISHED = 2, }; +struct btrfs_caching_control { + struct list_head list; + struct mutex mutex; + wait_queue_head_t wait; + struct btrfs_block_group_cache *block_group; + u64 progress; + atomic_t count; +}; + struct btrfs_block_group_cache { struct btrfs_key key; struct btrfs_block_group_item item; @@ -742,8 +751,9 @@ struct btrfs_block_group_cache { int dirty; /* cache tracking stuff */ - wait_queue_head_t caching_q; int cached; + struct btrfs_caching_control *caching_ctl; + u64 last_byte_to_unpin; struct btrfs_space_info *space_info; @@ -788,7 +798,8 @@ struct btrfs_fs_info { spinlock_t block_group_cache_lock; struct rb_root block_group_cache_tree; - struct extent_io_tree pinned_extents; + struct extent_io_tree freed_extents[2]; + struct extent_io_tree *pinned_extents; /* logical->physical extent mapping */ struct btrfs_mapping_tree mapping_tree; @@ -825,8 +836,6 @@ struct btrfs_fs_info { struct mutex drop_mutex; struct mutex volume_mutex; struct mutex tree_reloc_mutex; - struct rw_semaphore extent_commit_sem; - /* * this protects the ordered operations list only while we are * processing all of the entries on it. This way we make @@ -835,10 +844,12 @@ struct btrfs_fs_info { * before jumping into the main commit. 
*/ struct mutex ordered_operations_mutex; + struct rw_semaphore extent_commit_sem; struct list_head trans_list; struct list_head hashers; struct list_head dead_roots; + struct list_head caching_block_groups; atomic_t nr_async_submits; atomic_t async_submit_draining; @@ -1920,8 +1931,8 @@ void btrfs_put_block_group(struct btrfs_block_group_cache *cache); int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, struct btrfs_root *root, unsigned long count); int btrfs_lookup_extent(struct btrfs_root *root, u64 start, u64 len); -int btrfs_update_pinned_extents(struct btrfs_root *root, - u64 bytenr, u64 num, int pin); +int btrfs_pin_extent(struct btrfs_root *root, + u64 bytenr, u64 num, int reserved); int btrfs_drop_leaf_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *leaf); int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans, @@ -1971,9 +1982,10 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, u64 root_objectid, u64 owner, u64 offset); int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len); +int btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans, + struct btrfs_root *root); int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans, - struct btrfs_root *root, - struct extent_io_tree *unpin); + struct btrfs_root *root); int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u64 num_bytes, u64 parent, @@ -2006,7 +2018,6 @@ void btrfs_delalloc_reserve_space(struct btrfs_root *root, struct inode *inode, u64 bytes); void btrfs_delalloc_free_space(struct btrfs_root *root, struct inode *inode, u64 bytes); -void btrfs_free_pinned_extents(struct btrfs_fs_info *info); /* ctree.c */ int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key, int level, int *slot); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 253da7e..16dae12 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1563,6 +1563,7 @@ struct btrfs_root *open_ctree(struct super_block *sb, INIT_LIST_HEAD(&fs_info->hashers); INIT_LIST_HEAD(&fs_info->delalloc_inodes); INIT_LIST_HEAD(&fs_info->ordered_operations); + INIT_LIST_HEAD(&fs_info->caching_block_groups); spin_lock_init(&fs_info->delalloc_lock); spin_lock_init(&fs_info->new_trans_lock); spin_lock_init(&fs_info->ref_cache_lock); @@ -1621,8 +1622,11 @@ struct btrfs_root *open_ctree(struct super_block *sb, spin_lock_init(&fs_info->block_group_cache_lock); fs_info->block_group_cache_tree.rb_node = NULL; - extent_io_tree_init(&fs_info->pinned_extents, + extent_io_tree_init(&fs_info->freed_extents[0], fs_info->btree_inode->i_mapping, GFP_NOFS); + extent_io_tree_init(&fs_info->freed_extents[1], + fs_info->btree_inode->i_mapping, GFP_NOFS); + fs_info->pinned_extents = &fs_info->freed_extents[0]; fs_info->do_barriers = 1; BTRFS_I(fs_info->btree_inode)->root = tree_root; @@ -2359,7 +2363,6 @@ int close_ctree(struct btrfs_root *root) free_extent_buffer(root->fs_info->csum_root->commit_root); btrfs_free_block_groups(root->fs_info); - btrfs_free_pinned_extents(root->fs_info); del_fs_roots(fs_info); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index edd86ae..9bcb9c0 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -32,12 +32,12 @@ #include "locking.h" #include "free-space-cache.h" -static int update_reserved_extents(struct btrfs_root *root, - u64 bytenr, u64 num, int reserve); static int update_block_group(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u64 num_bytes, int alloc, int 
mark_free); +static int update_reserved_extents(struct btrfs_block_group_cache *cache, + u64 num_bytes, int reserve); static int __btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u64 num_bytes, u64 parent, @@ -57,10 +57,17 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans, u64 parent, u64 root_objectid, u64 flags, struct btrfs_disk_key *key, int level, struct btrfs_key *ins); - static int do_chunk_alloc(struct btrfs_trans_handle *trans, struct btrfs_root *extent_root, u64 alloc_bytes, u64 flags, int force); +static int pin_down_bytes(struct btrfs_trans_handle *trans, + struct btrfs_root *root, + struct btrfs_path *path, + u64 bytenr, u64 num_bytes, + int is_data, int reserved, + struct extent_buffer **must_clean); +static int find_next_key(struct btrfs_path *path, int level, + struct btrfs_key *key); static noinline int block_group_cache_done(struct btrfs_block_group_cache *cache) @@ -153,34 +160,34 @@ block_group_cache_tree_search(struct btrfs_fs_info *info, u64 bytenr, return ret; } -/* - * We always set EXTENT_LOCKED for the super mirror extents so we don''t - * overwrite them, so those bits need to be unset. Also, if we are unmounting - * with pinned extents still sitting there because we had a block group caching, - * we need to clear those now, since we are done. - */ -void btrfs_free_pinned_extents(struct btrfs_fs_info *info) +static int add_excluded_extent(struct btrfs_root *root, + u64 start, u64 num_bytes) { - u64 start, end, last = 0; - int ret; + u64 end = start + num_bytes - 1; + set_extent_bits(&root->fs_info->freed_extents[0], + start, end, EXTENT_UPTODATE, GFP_NOFS); + set_extent_bits(&root->fs_info->freed_extents[1], + start, end, EXTENT_UPTODATE, GFP_NOFS); + return 0; +} - while (1) { - ret = find_first_extent_bit(&info->pinned_extents, last, - &start, &end, - EXTENT_LOCKED|EXTENT_DIRTY); - if (ret) - break; +static void free_excluded_extents(struct btrfs_root *root, + struct btrfs_block_group_cache *cache) +{ + u64 start, end; - clear_extent_bits(&info->pinned_extents, start, end, - EXTENT_LOCKED|EXTENT_DIRTY, GFP_NOFS); - last = end+1; - } + start = cache->key.objectid; + end = start + cache->key.offset - 1; + + clear_extent_bits(&root->fs_info->freed_extents[0], + start, end, EXTENT_UPTODATE, GFP_NOFS); + clear_extent_bits(&root->fs_info->freed_extents[1], + start, end, EXTENT_UPTODATE, GFP_NOFS); } -static int remove_sb_from_cache(struct btrfs_root *root, - struct btrfs_block_group_cache *cache) +static int exclude_super_stripes(struct btrfs_root *root, + struct btrfs_block_group_cache *cache) { - struct btrfs_fs_info *fs_info = root->fs_info; u64 bytenr; u64 *logical; int stripe_len; @@ -192,17 +199,41 @@ static int remove_sb_from_cache(struct btrfs_root *root, cache->key.objectid, bytenr, 0, &logical, &nr, &stripe_len); BUG_ON(ret); + while (nr--) { - try_lock_extent(&fs_info->pinned_extents, - logical[nr], - logical[nr] + stripe_len - 1, GFP_NOFS); + ret = add_excluded_extent(root, logical[nr], + stripe_len); + BUG_ON(ret); } + kfree(logical); } - return 0; } +static struct btrfs_caching_control * +get_caching_control(struct btrfs_block_group_cache *cache) +{ + struct btrfs_caching_control *ctl; + + spin_lock(&cache->lock); + if (cache->cached != BTRFS_CACHE_STARTED) { + spin_unlock(&cache->lock); + return NULL; + } + + ctl = cache->caching_ctl; + atomic_inc(&ctl->count); + spin_unlock(&cache->lock); + return ctl; +} + +static void put_caching_control(struct btrfs_caching_control *ctl) +{ + if 
(atomic_dec_and_test(&ctl->count)) + kfree(ctl); +} + /* * this is only called by cache_block_group, since we could have freed extents * we need to check the pinned_extents for any extents that can''t be used yet @@ -215,9 +246,9 @@ static u64 add_new_free_space(struct btrfs_block_group_cache *block_group, int ret; while (start < end) { - ret = find_first_extent_bit(&info->pinned_extents, start, + ret = find_first_extent_bit(info->pinned_extents, start, &extent_start, &extent_end, - EXTENT_DIRTY|EXTENT_LOCKED); + EXTENT_DIRTY | EXTENT_UPTODATE); if (ret) break; @@ -249,22 +280,24 @@ static int caching_kthread(void *data) { struct btrfs_block_group_cache *block_group = data; struct btrfs_fs_info *fs_info = block_group->fs_info; - u64 last = 0; + struct btrfs_caching_control *caching_ctl = block_group->caching_ctl; + struct btrfs_root *extent_root = fs_info->extent_root; struct btrfs_path *path; - int ret = 0; - struct btrfs_key key; struct extent_buffer *leaf; - int slot; + struct btrfs_key key; u64 total_found = 0; - - BUG_ON(!fs_info); + u64 last = 0; + u32 nritems; + int ret = 0; path = btrfs_alloc_path(); if (!path) return -ENOMEM; - atomic_inc(&block_group->space_info->caching_threads); + exclude_super_stripes(extent_root, block_group); + last = max_t(u64, block_group->key.objectid, BTRFS_SUPER_INFO_OFFSET); + /* * We don''t want to deadlock with somebody trying to allocate a new * extent for the extent root while also trying to search the extent @@ -277,74 +310,64 @@ static int caching_kthread(void *data) key.objectid = last; key.offset = 0; - btrfs_set_key_type(&key, BTRFS_EXTENT_ITEM_KEY); + key.type = BTRFS_EXTENT_ITEM_KEY; again: + mutex_lock(&caching_ctl->mutex); /* need to make sure the commit_root doesn''t disappear */ down_read(&fs_info->extent_commit_sem); - ret = btrfs_search_slot(NULL, fs_info->extent_root, &key, path, 0, 0); + ret = btrfs_search_slot(NULL, extent_root, &key, path, 0, 0); if (ret < 0) goto err; + leaf = path->nodes[0]; + nritems = btrfs_header_nritems(leaf); + while (1) { smp_mb(); - if (block_group->fs_info->closing > 1) { + if (fs_info->closing > 1) { last = (u64)-1; break; } - leaf = path->nodes[0]; - slot = path->slots[0]; - if (slot >= btrfs_header_nritems(leaf)) { - ret = btrfs_next_leaf(fs_info->extent_root, path); - if (ret < 0) - goto err; - else if (ret) + if (path->slots[0] < nritems) { + btrfs_item_key_to_cpu(leaf, &key, path->slots[0]); + } else { + ret = find_next_key(path, 0, &key); + if (ret) break; - if (need_resched() || - btrfs_transaction_in_commit(fs_info)) { - leaf = path->nodes[0]; - - /* this shouldn''t happen, but if the - * leaf is empty just move on. - */ - if (btrfs_header_nritems(leaf) == 0) - break; - /* - * we need to copy the key out so that - * we are sure the next search advances - * us forward in the btree. 
- */ - btrfs_item_key_to_cpu(leaf, &key, 0); - btrfs_release_path(fs_info->extent_root, path); - up_read(&fs_info->extent_commit_sem); + caching_ctl->progress = last; + btrfs_release_path(extent_root, path); + up_read(&fs_info->extent_commit_sem); + mutex_unlock(&caching_ctl->mutex); + if (btrfs_transaction_in_commit(fs_info)) schedule_timeout(1); - goto again; - } + else + cond_resched(); + goto again; + } + if (key.objectid < block_group->key.objectid) { + path->slots[0]++; continue; } - btrfs_item_key_to_cpu(leaf, &key, slot); - if (key.objectid < block_group->key.objectid) - goto next; if (key.objectid >= block_group->key.objectid + block_group->key.offset) break; - if (btrfs_key_type(&key) == BTRFS_EXTENT_ITEM_KEY) { + if (key.type == BTRFS_EXTENT_ITEM_KEY) { total_found += add_new_free_space(block_group, fs_info, last, key.objectid); last = key.objectid + key.offset; - } - if (total_found > (1024 * 1024 * 2)) { - total_found = 0; - wake_up(&block_group->caching_q); + if (total_found > (1024 * 1024 * 2)) { + total_found = 0; + wake_up(&caching_ctl->wait); + } } -next: path->slots[0]++; } ret = 0; @@ -352,33 +375,65 @@ next: total_found += add_new_free_space(block_group, fs_info, last, block_group->key.objectid + block_group->key.offset); + caching_ctl->progress = (u64)-1; spin_lock(&block_group->lock); + block_group->caching_ctl = NULL; block_group->cached = BTRFS_CACHE_FINISHED; spin_unlock(&block_group->lock); err: btrfs_free_path(path); up_read(&fs_info->extent_commit_sem); - atomic_dec(&block_group->space_info->caching_threads); - wake_up(&block_group->caching_q); + free_excluded_extents(extent_root, block_group); + + mutex_unlock(&caching_ctl->mutex); + wake_up(&caching_ctl->wait); + + put_caching_control(caching_ctl); + atomic_dec(&block_group->space_info->caching_threads); return 0; } static int cache_block_group(struct btrfs_block_group_cache *cache) { + struct btrfs_fs_info *fs_info = cache->fs_info; + struct btrfs_caching_control *caching_ctl; struct task_struct *tsk; int ret = 0; + smp_mb(); + if (cache->cached != BTRFS_CACHE_NO) + return 0; + + caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_KERNEL); + BUG_ON(!caching_ctl); + + INIT_LIST_HEAD(&caching_ctl->list); + mutex_init(&caching_ctl->mutex); + init_waitqueue_head(&caching_ctl->wait); + caching_ctl->block_group = cache; + caching_ctl->progress = cache->key.objectid; + /* one for caching kthread, one for caching block group list */ + atomic_set(&caching_ctl->count, 2); + spin_lock(&cache->lock); if (cache->cached != BTRFS_CACHE_NO) { spin_unlock(&cache->lock); - return ret; + kfree(caching_ctl); + return 0; } + cache->caching_ctl = caching_ctl; cache->cached = BTRFS_CACHE_STARTED; spin_unlock(&cache->lock); + down_write(&fs_info->extent_commit_sem); + list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups); + up_write(&fs_info->extent_commit_sem); + + atomic_inc(&cache->space_info->caching_threads); + tsk = kthread_run(caching_kthread, cache, "btrfs-cache-%llu\n", cache->key.objectid); if (IS_ERR(tsk)) { @@ -1656,7 +1711,6 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans, parent, ref_root, flags, ref->objectid, ref->offset, &ins, node->ref_mod); - update_reserved_extents(root, ins.objectid, ins.offset, 0); } else if (node->action == BTRFS_ADD_DELAYED_REF) { ret = __btrfs_inc_extent_ref(trans, root, node->bytenr, node->num_bytes, parent, @@ -1782,7 +1836,6 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans, extent_op->flags_to_set, &extent_op->key, ref->level, &ins); - 
update_reserved_extents(root, ins.objectid, ins.offset, 0); } else if (node->action == BTRFS_ADD_DELAYED_REF) { ret = __btrfs_inc_extent_ref(trans, root, node->bytenr, node->num_bytes, parent, ref_root, @@ -1817,16 +1870,32 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, BUG_ON(extent_op); head = btrfs_delayed_node_to_head(node); if (insert_reserved) { + int mark_free = 0; + struct extent_buffer *must_clean = NULL; + + ret = pin_down_bytes(trans, root, NULL, + node->bytenr, node->num_bytes, + head->is_data, 1, &must_clean); + if (ret > 0) + mark_free = 1; + + if (must_clean) { + clean_tree_block(NULL, root, must_clean); + btrfs_tree_unlock(must_clean); + free_extent_buffer(must_clean); + } if (head->is_data) { ret = btrfs_del_csums(trans, root, node->bytenr, node->num_bytes); BUG_ON(ret); } - btrfs_update_pinned_extents(root, node->bytenr, - node->num_bytes, 1); - update_reserved_extents(root, node->bytenr, - node->num_bytes, 0); + if (mark_free) { + ret = btrfs_free_reserved_extent(root, + node->bytenr, + node->num_bytes); + BUG_ON(ret); + } } mutex_unlock(&head->mutex); return 0; @@ -3008,10 +3077,12 @@ static int update_block_group(struct btrfs_trans_handle *trans, num_bytes = min(total, cache->key.offset - byte_in_group); if (alloc) { old_val += num_bytes; + btrfs_set_block_group_used(&cache->item, old_val); + cache->reserved -= num_bytes; cache->space_info->bytes_used += num_bytes; + cache->space_info->bytes_reserved -= num_bytes; if (cache->ro) cache->space_info->bytes_readonly -= num_bytes; - btrfs_set_block_group_used(&cache->item, old_val); spin_unlock(&cache->lock); spin_unlock(&cache->space_info->lock); } else { @@ -3056,127 +3127,136 @@ static u64 first_logical_byte(struct btrfs_root *root, u64 search_start) return bytenr; } -int btrfs_update_pinned_extents(struct btrfs_root *root, - u64 bytenr, u64 num, int pin) +/* + * this function must be called within transaction + */ +int btrfs_pin_extent(struct btrfs_root *root, + u64 bytenr, u64 num_bytes, int reserved) { - u64 len; - struct btrfs_block_group_cache *cache; struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_block_group_cache *cache; - if (pin) - set_extent_dirty(&fs_info->pinned_extents, - bytenr, bytenr + num - 1, GFP_NOFS); - - while (num > 0) { - cache = btrfs_lookup_block_group(fs_info, bytenr); - BUG_ON(!cache); - len = min(num, cache->key.offset - - (bytenr - cache->key.objectid)); - if (pin) { - spin_lock(&cache->space_info->lock); - spin_lock(&cache->lock); - cache->pinned += len; - cache->space_info->bytes_pinned += len; - spin_unlock(&cache->lock); - spin_unlock(&cache->space_info->lock); - fs_info->total_pinned += len; - } else { - int unpin = 0; + cache = btrfs_lookup_block_group(fs_info, bytenr); + BUG_ON(!cache); - /* - * in order to not race with the block group caching, we - * only want to unpin the extent if we are cached. If - * we aren''t cached, we want to start async caching this - * block group so we can free the extent the next time - * around. 
- */ - spin_lock(&cache->space_info->lock); - spin_lock(&cache->lock); - unpin = (cache->cached == BTRFS_CACHE_FINISHED); - if (likely(unpin)) { - cache->pinned -= len; - cache->space_info->bytes_pinned -= len; - fs_info->total_pinned -= len; - } - spin_unlock(&cache->lock); - spin_unlock(&cache->space_info->lock); + spin_lock(&cache->space_info->lock); + spin_lock(&cache->lock); + cache->pinned += num_bytes; + cache->space_info->bytes_pinned += num_bytes; + if (reserved) { + cache->reserved -= num_bytes; + cache->space_info->bytes_reserved -= num_bytes; + } + spin_unlock(&cache->lock); + spin_unlock(&cache->space_info->lock); - if (likely(unpin)) - clear_extent_dirty(&fs_info->pinned_extents, - bytenr, bytenr + len -1, - GFP_NOFS); - else - cache_block_group(cache); + btrfs_put_block_group(cache); - if (unpin) - btrfs_add_free_space(cache, bytenr, len); - } - btrfs_put_block_group(cache); - bytenr += len; - num -= len; + set_extent_dirty(fs_info->pinned_extents, + bytenr, bytenr + num_bytes - 1, GFP_NOFS); + return 0; +} + +static int update_reserved_extents(struct btrfs_block_group_cache *cache, + u64 num_bytes, int reserve) +{ + spin_lock(&cache->space_info->lock); + spin_lock(&cache->lock); + if (reserve) { + cache->reserved += num_bytes; + cache->space_info->bytes_reserved += num_bytes; + } else { + cache->reserved -= num_bytes; + cache->space_info->bytes_reserved -= num_bytes; } + spin_unlock(&cache->lock); + spin_unlock(&cache->space_info->lock); return 0; } -static int update_reserved_extents(struct btrfs_root *root, - u64 bytenr, u64 num, int reserve) +int btrfs_prepare_extent_commit(struct btrfs_trans_handle *trans, + struct btrfs_root *root) { - u64 len; - struct btrfs_block_group_cache *cache; struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_caching_control *next; + struct btrfs_caching_control *caching_ctl; + struct btrfs_block_group_cache *cache; - while (num > 0) { - cache = btrfs_lookup_block_group(fs_info, bytenr); - BUG_ON(!cache); - len = min(num, cache->key.offset - - (bytenr - cache->key.objectid)); + down_write(&fs_info->extent_commit_sem); - spin_lock(&cache->space_info->lock); - spin_lock(&cache->lock); - if (reserve) { - cache->reserved += len; - cache->space_info->bytes_reserved += len; + list_for_each_entry_safe(caching_ctl, next, + &fs_info->caching_block_groups, list) { + cache = caching_ctl->block_group; + if (block_group_cache_done(cache)) { + cache->last_byte_to_unpin = (u64)-1; + list_del_init(&caching_ctl->list); + put_caching_control(caching_ctl); } else { - cache->reserved -= len; - cache->space_info->bytes_reserved -= len; + cache->last_byte_to_unpin = caching_ctl->progress; } - spin_unlock(&cache->lock); - spin_unlock(&cache->space_info->lock); - btrfs_put_block_group(cache); - bytenr += len; - num -= len; } + + if (fs_info->pinned_extents == &fs_info->freed_extents[0]) + fs_info->pinned_extents = &fs_info->freed_extents[1]; + else + fs_info->pinned_extents = &fs_info->freed_extents[0]; + + up_write(&fs_info->extent_commit_sem); return 0; } -int btrfs_copy_pinned(struct btrfs_root *root, struct extent_io_tree *copy) +static int unpin_extent_range(struct btrfs_root *root, u64 start, u64 end) { - u64 last = 0; - u64 start; - u64 end; - struct extent_io_tree *pinned_extents = &root->fs_info->pinned_extents; - int ret; + struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_block_group_cache *cache = NULL; + u64 len; - while (1) { - ret = find_first_extent_bit(pinned_extents, last, - &start, &end, EXTENT_DIRTY); - if (ret) - break; + 
while (start <= end) { + if (!cache || + start >= cache->key.objectid + cache->key.offset) { + if (cache) + btrfs_put_block_group(cache); + cache = btrfs_lookup_block_group(fs_info, start); + BUG_ON(!cache); + } + + len = cache->key.objectid + cache->key.offset - start; + len = min(len, end + 1 - start); + + if (start < cache->last_byte_to_unpin) { + len = min(len, cache->last_byte_to_unpin - start); + btrfs_add_free_space(cache, start, len); + } + + spin_lock(&cache->space_info->lock); + spin_lock(&cache->lock); + cache->pinned -= len; + cache->space_info->bytes_pinned -= len; + spin_unlock(&cache->lock); + spin_unlock(&cache->space_info->lock); - set_extent_dirty(copy, start, end, GFP_NOFS); - last = end + 1; + start += len; } + + if (cache) + btrfs_put_block_group(cache); return 0; } int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans, - struct btrfs_root *root, - struct extent_io_tree *unpin) + struct btrfs_root *root) { + struct btrfs_fs_info *fs_info = root->fs_info; + struct extent_io_tree *unpin; u64 start; u64 end; int ret; + if (fs_info->pinned_extents == &fs_info->freed_extents[0]) + unpin = &fs_info->freed_extents[1]; + else + unpin = &fs_info->freed_extents[0]; + while (1) { ret = find_first_extent_bit(unpin, 0, &start, &end, EXTENT_DIRTY); @@ -3185,10 +3265,8 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans, ret = btrfs_discard_extent(root, start, end + 1 - start); - /* unlocks the pinned mutex */ - btrfs_update_pinned_extents(root, start, end + 1 - start, 0); clear_extent_dirty(unpin, start, end, GFP_NOFS); - + unpin_extent_range(root, start, end); cond_resched(); } @@ -3198,7 +3276,8 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans, static int pin_down_bytes(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, - u64 bytenr, u64 num_bytes, int is_data, + u64 bytenr, u64 num_bytes, + int is_data, int reserved, struct extent_buffer **must_clean) { int err = 0; @@ -3230,15 +3309,15 @@ static int pin_down_bytes(struct btrfs_trans_handle *trans, } free_extent_buffer(buf); pinit: - btrfs_set_path_blocking(path); + if (path) + btrfs_set_path_blocking(path); /* unlocks the pinned mutex */ - btrfs_update_pinned_extents(root, bytenr, num_bytes, 1); + btrfs_pin_extent(root, bytenr, num_bytes, reserved); BUG_ON(err < 0); return 0; } - static int __btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 bytenr, u64 num_bytes, u64 parent, @@ -3412,7 +3491,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans, } ret = pin_down_bytes(trans, root, path, bytenr, - num_bytes, is_data, &must_clean); + num_bytes, is_data, 0, &must_clean); if (ret > 0) mark_free = 1; BUG_ON(ret < 0); @@ -3543,8 +3622,7 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, if (root_objectid == BTRFS_TREE_LOG_OBJECTID) { WARN_ON(owner >= BTRFS_FIRST_FREE_OBJECTID); /* unlocks the pinned mutex */ - btrfs_update_pinned_extents(root, bytenr, num_bytes, 1); - update_reserved_extents(root, bytenr, num_bytes, 0); + btrfs_pin_extent(root, bytenr, num_bytes, 1); ret = 0; } else if (owner < BTRFS_FIRST_FREE_OBJECTID) { ret = btrfs_add_delayed_tree_ref(trans, bytenr, num_bytes, @@ -3584,19 +3662,33 @@ static noinline int wait_block_group_cache_progress(struct btrfs_block_group_cache *cache, u64 num_bytes) { + struct btrfs_caching_control *caching_ctl; DEFINE_WAIT(wait); - prepare_to_wait(&cache->caching_q, &wait, TASK_UNINTERRUPTIBLE); - - if (block_group_cache_done(cache)) { - 
finish_wait(&cache->caching_q, &wait); + caching_ctl = get_caching_control(cache); + if (!caching_ctl) return 0; - } - schedule(); - finish_wait(&cache->caching_q, &wait); - wait_event(cache->caching_q, block_group_cache_done(cache) || + wait_event(caching_ctl->wait, block_group_cache_done(cache) || (cache->free_space >= num_bytes)); + + put_caching_control(caching_ctl); + return 0; +} + +static noinline int +wait_block_group_cache_done(struct btrfs_block_group_cache *cache) +{ + struct btrfs_caching_control *caching_ctl; + DEFINE_WAIT(wait); + + caching_ctl = get_caching_control(cache); + if (!caching_ctl) + return 0; + + wait_event(caching_ctl->wait, block_group_cache_done(cache)); + + put_caching_control(caching_ctl); return 0; } @@ -3880,6 +3972,8 @@ checks: search_start - offset); BUG_ON(offset > search_start); + update_reserved_extents(block_group, num_bytes, 1); + /* we are all good, lets return */ break; loop: @@ -3972,12 +4066,12 @@ static void dump_space_info(struct btrfs_space_info *info, u64 bytes) up_read(&info->groups_sem); } -static int __btrfs_reserve_extent(struct btrfs_trans_handle *trans, - struct btrfs_root *root, - u64 num_bytes, u64 min_alloc_size, - u64 empty_size, u64 hint_byte, - u64 search_end, struct btrfs_key *ins, - u64 data) +int btrfs_reserve_extent(struct btrfs_trans_handle *trans, + struct btrfs_root *root, + u64 num_bytes, u64 min_alloc_size, + u64 empty_size, u64 hint_byte, + u64 search_end, struct btrfs_key *ins, + u64 data) { int ret; u64 search_start = 0; @@ -4043,25 +4137,8 @@ int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len) ret = btrfs_discard_extent(root, start, len); btrfs_add_free_space(cache, start, len); + update_reserved_extents(cache, len, 0); btrfs_put_block_group(cache); - update_reserved_extents(root, start, len, 0); - - return ret; -} - -int btrfs_reserve_extent(struct btrfs_trans_handle *trans, - struct btrfs_root *root, - u64 num_bytes, u64 min_alloc_size, - u64 empty_size, u64 hint_byte, - u64 search_end, struct btrfs_key *ins, - u64 data) -{ - int ret; - ret = __btrfs_reserve_extent(trans, root, num_bytes, min_alloc_size, - empty_size, hint_byte, search_end, ins, - data); - if (!ret) - update_reserved_extents(root, ins->objectid, ins->offset, 1); return ret; } @@ -4222,15 +4299,46 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans, { int ret; struct btrfs_block_group_cache *block_group; + struct btrfs_caching_control *caching_ctl; + u64 start = ins->objectid; + u64 num_bytes = ins->offset; block_group = btrfs_lookup_block_group(root->fs_info, ins->objectid); cache_block_group(block_group); - wait_event(block_group->caching_q, - block_group_cache_done(block_group)); + caching_ctl = get_caching_control(block_group); - ret = btrfs_remove_free_space(block_group, ins->objectid, - ins->offset); - BUG_ON(ret); + if (!caching_ctl) { + BUG_ON(!block_group_cache_done(block_group)); + ret = btrfs_remove_free_space(block_group, start, num_bytes); + BUG_ON(ret); + } else { + mutex_lock(&caching_ctl->mutex); + + if (start >= caching_ctl->progress) { + ret = add_excluded_extent(root, start, num_bytes); + BUG_ON(ret); + } else if (start + num_bytes <= caching_ctl->progress) { + ret = btrfs_remove_free_space(block_group, + start, num_bytes); + BUG_ON(ret); + } else { + num_bytes = caching_ctl->progress - start; + ret = btrfs_remove_free_space(block_group, + start, num_bytes); + BUG_ON(ret); + + start = caching_ctl->progress; + num_bytes = ins->objectid + ins->offset - + caching_ctl->progress; + ret = 
add_excluded_extent(root, start, num_bytes); + BUG_ON(ret); + } + + mutex_unlock(&caching_ctl->mutex); + put_caching_control(caching_ctl); + } + + update_reserved_extents(block_group, ins->offset, 1); btrfs_put_block_group(block_group); ret = alloc_reserved_file_extent(trans, root, 0, root_objectid, 0, owner, offset, ins, 1); @@ -4254,9 +4362,9 @@ static int alloc_tree_block(struct btrfs_trans_handle *trans, int ret; u64 flags = 0; - ret = __btrfs_reserve_extent(trans, root, num_bytes, num_bytes, - empty_size, hint_byte, search_end, - ins, 0); + ret = btrfs_reserve_extent(trans, root, num_bytes, num_bytes, + empty_size, hint_byte, search_end, + ins, 0); if (ret) return ret; @@ -4267,7 +4375,6 @@ static int alloc_tree_block(struct btrfs_trans_handle *trans, } else BUG_ON(parent > 0); - update_reserved_extents(root, ins->objectid, ins->offset, 1); if (root_objectid != BTRFS_TREE_LOG_OBJECTID) { struct btrfs_delayed_extent_op *extent_op; extent_op = kmalloc(sizeof(*extent_op), GFP_NOFS); @@ -7164,8 +7271,18 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) { struct btrfs_block_group_cache *block_group; struct btrfs_space_info *space_info; + struct btrfs_caching_control *caching_ctl; struct rb_node *n; + down_write(&info->extent_commit_sem); + while (!list_empty(&info->caching_block_groups)) { + caching_ctl = list_entry(info->caching_block_groups.next, + struct btrfs_caching_control, list); + list_del(&caching_ctl->list); + put_caching_control(caching_ctl); + } + up_write(&info->extent_commit_sem); + spin_lock(&info->block_group_cache_lock); while ((n = rb_last(&info->block_group_cache_tree)) != NULL) { block_group = rb_entry(n, struct btrfs_block_group_cache, @@ -7179,8 +7296,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) up_write(&block_group->space_info->groups_sem); if (block_group->cached == BTRFS_CACHE_STARTED) - wait_event(block_group->caching_q, - block_group_cache_done(block_group)); + wait_block_group_cache_done(block_group); btrfs_remove_free_space_cache(block_group); @@ -7250,7 +7366,6 @@ int btrfs_read_block_groups(struct btrfs_root *root) spin_lock_init(&cache->lock); spin_lock_init(&cache->tree_lock); cache->fs_info = info; - init_waitqueue_head(&cache->caching_q); INIT_LIST_HEAD(&cache->list); INIT_LIST_HEAD(&cache->cluster_list); @@ -7272,8 +7387,6 @@ int btrfs_read_block_groups(struct btrfs_root *root) cache->flags = btrfs_block_group_flags(&cache->item); cache->sectorsize = root->sectorsize; - remove_sb_from_cache(root, cache); - /* * check for two cases, either we are full, and therefore * don''t need to bother with the caching work since we won''t @@ -7282,13 +7395,17 @@ int btrfs_read_block_groups(struct btrfs_root *root) * time, particularly in the full case. 
*/ if (found_key.offset == btrfs_block_group_used(&cache->item)) { + cache->last_byte_to_unpin = (u64)-1; cache->cached = BTRFS_CACHE_FINISHED; } else if (btrfs_block_group_used(&cache->item) == 0) { + exclude_super_stripes(root, cache); + cache->last_byte_to_unpin = (u64)-1; cache->cached = BTRFS_CACHE_FINISHED; add_new_free_space(cache, root->fs_info, found_key.objectid, found_key.objectid + found_key.offset); + free_excluded_extents(root, cache); } ret = update_space_info(info, cache->flags, found_key.offset, @@ -7345,7 +7462,6 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, atomic_set(&cache->count, 1); spin_lock_init(&cache->lock); spin_lock_init(&cache->tree_lock); - init_waitqueue_head(&cache->caching_q); INIT_LIST_HEAD(&cache->list); INIT_LIST_HEAD(&cache->cluster_list); @@ -7354,12 +7470,15 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans, cache->flags = type; btrfs_set_block_group_flags(&cache->item, type); + cache->last_byte_to_unpin = (u64)-1; cache->cached = BTRFS_CACHE_FINISHED; - remove_sb_from_cache(root, cache); + exclude_super_stripes(root, cache); add_new_free_space(cache, root->fs_info, chunk_offset, chunk_offset + size); + free_excluded_extents(root, cache); + ret = update_space_info(root->fs_info, cache->flags, size, bytes_used, &cache->space_info); BUG_ON(ret); @@ -7428,8 +7547,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, up_write(&block_group->space_info->groups_sem); if (block_group->cached == BTRFS_CACHE_STARTED) - wait_event(block_group->caching_q, - block_group_cache_done(block_group)); + wait_block_group_cache_done(block_group); btrfs_remove_free_space_cache(block_group); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index cdbb502..6ed6186 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -874,7 +874,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, unsigned long timeout = 1; struct btrfs_transaction *cur_trans; struct btrfs_transaction *prev_trans = NULL; - struct extent_io_tree *pinned_copy; DEFINE_WAIT(wait); int ret; int should_grow = 0; @@ -915,13 +914,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, return 0; } - pinned_copy = kmalloc(sizeof(*pinned_copy), GFP_NOFS); - if (!pinned_copy) - return -ENOMEM; - - extent_io_tree_init(pinned_copy, - root->fs_info->btree_inode->i_mapping, GFP_NOFS); - trans->transaction->in_commit = 1; trans->transaction->blocked = 1; if (cur_trans->list.prev != &root->fs_info->trans_list) { @@ -1019,6 +1011,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, ret = commit_cowonly_roots(trans, root); BUG_ON(ret); + btrfs_prepare_extent_commit(trans, root); + cur_trans = root->fs_info->running_transaction; spin_lock(&root->fs_info->new_trans_lock); root->fs_info->running_transaction = NULL; @@ -1042,8 +1036,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, memcpy(&root->fs_info->super_for_commit, &root->fs_info->super_copy, sizeof(root->fs_info->super_copy)); - btrfs_copy_pinned(root, pinned_copy); - trans->transaction->blocked = 0; wake_up(&root->fs_info->transaction_wait); @@ -1059,8 +1051,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans, */ mutex_unlock(&root->fs_info->tree_log_mutex); - btrfs_finish_extent_commit(trans, root, pinned_copy); - kfree(pinned_copy); + btrfs_finish_extent_commit(trans, root); /* do the directory inserts of any pending snapshot creations */ finish_pending_snapshots(trans, root->fs_info); diff --git a/fs/btrfs/tree-log.c 
b/fs/btrfs/tree-log.c index 8661a73..f4a7b62 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -263,8 +263,8 @@ static int process_one_buffer(struct btrfs_root *log, struct walk_control *wc, u64 gen) { if (wc->pin) - btrfs_update_pinned_extents(log->fs_info->extent_root, - eb->start, eb->len, 1); + btrfs_pin_extent(log->fs_info->extent_root, + eb->start, eb->len, 0); if (btrfs_buffer_uptodate(eb, gen)) { if (wc->write) -- 1.6.4.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
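[Editorial note: a standalone sketch (not the kernel code) of the core idea in the commit message above: two trees for pinned extents, with a pointer swapped at commit time. All names are simplified stand-ins for the real extent_io_tree machinery:]

#include <stdio.h>

struct extent_tree {
    int entries;                 /* stand-in for a real extent_io_tree */
};

struct fs_info {
    struct extent_tree freed_extents[2];
    struct extent_tree *pinned_extents;  /* tree receiving new pins */
};

/* analogous to btrfs_prepare_extent_commit(): flip the pointer so pins
 * from the next transaction land in the other tree */
static void prepare_extent_commit(struct fs_info *fs)
{
    if (fs->pinned_extents == &fs->freed_extents[0])
        fs->pinned_extents = &fs->freed_extents[1];
    else
        fs->pinned_extents = &fs->freed_extents[0];
}

/* analogous to btrfs_finish_extent_commit(): after the swap, the *other*
 * tree holds the previous transaction's pins, safe to drain */
static struct extent_tree *tree_to_unpin(struct fs_info *fs)
{
    if (fs->pinned_extents == &fs->freed_extents[0])
        return &fs->freed_extents[1];
    return &fs->freed_extents[0];
}

int main(void)
{
    struct fs_info fs = { { { 3 }, { 0 } }, &fs.freed_extents[0] };

    prepare_extent_commit(&fs);  /* new pins now go to tree 1 */
    printf("unpinning %d extents\n", tree_to_unpin(&fs)->entries);
    return 0;
}

[Compared with the copy-at-commit approach the patch removes, the pointer swap avoids duplicating the whole pinned tree at every commit.]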
On Thu, Sep 17, 2009 at 04:17:14PM -0400, Chris Mason wrote:
> [ crashes on runs involving unmounts ]
>
> The run is still going here, but it has survived longer than before.
> I'm trying with Yan Zheng's patch:
>
> From: Yan Zheng <zheng.yan@oracle.com>
> Date: Fri, 11 Sep 2009 16:11:19 -0400
> Subject: [PATCH] Btrfs: improve async block group caching

Quick update, I got through a full run of Steve's test with this applied. I'll start a few more ;)

-chris
Chris Mason wrote:
> On Thu, Sep 17, 2009 at 04:17:14PM -0400, Chris Mason wrote:
>> [ crashes on runs involving unmounts ]
>>
>> The run is still going here, but it has survived longer than before.
>> I'm trying with Yan Zheng's patch:
>>
>> From: Yan Zheng <zheng.yan@oracle.com>
>> Date: Fri, 11 Sep 2009 16:11:19 -0400
>> Subject: [PATCH] Btrfs: improve async block group caching
>
> Quick update, I got through a full run of Steve's test with this
> applied. I'll start a few more ;)

Seems to work for me too! Got through all the random write tests with no problems. Will kick off a full run overnight.

Steve
On Thu, Sep 17, 2009 at 05:04:11PM -0500, Steven Pratt wrote:
> Chris Mason wrote:
> > On Thu, Sep 17, 2009 at 04:17:14PM -0400, Chris Mason wrote:
> > > [ crashes on runs involving unmounts ]
> > >
> > > The run is still going here, but it has survived longer than before.
> > > I'm trying with Yan Zheng's patch:
> > >
> > > From: Yan Zheng <zheng.yan@oracle.com>
> > > Date: Fri, 11 Sep 2009 16:11:19 -0400
> > > Subject: [PATCH] Btrfs: improve async block group caching
> >
> > Quick update, I got through a full run of Steve's test with this
> > applied. I'll start a few more ;)
> Seems to work for me too! Got through all the random write tests
> with no problems. Will kick off full run overnight.

Thanks again.

I've updated the btrfs-unstable tree to include Yan Zheng's fix. I've also included two more buffered writeback speedups, and I expect these to make a difference on the multi-threaded tests.

I took a look at 1MB O_DIRECT writes, and the latencies of sending off checksumming to the checksum threads seem to be the biggest problem. I get full throughput at 8MB O_DIRECT writes, so for now I'm going to leave this one alone.

-chris
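[Editorial note: to make the 1MB-vs-8MB O_DIRECT comparison above concrete, a minimal sketch of issuing one aligned O_DIRECT write. The target path and 4096-byte alignment are assumptions (real code should align to the device's logical sector size); the write size is the knob to vary:]

#define _GNU_SOURCE   /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    size_t len = 8 << 20;   /* try 1 << 20 (1MB) vs 8 << 20 (8MB) */
    void *buf;
    int fd;

    /* O_DIRECT requires an aligned buffer (and aligned length/offset) */
    if (posix_memalign(&buf, 4096, len))
        return 1;
    memset(buf, 0xab, len);

    fd = open("/mnt/ffsb1/odirect-test", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (pwrite(fd, buf, len, 0) != (ssize_t)len) {
        perror("pwrite");
        return 1;
    }

    close(fd);
    free(buf);
    return 0;
}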
Chris Mason wrote:
> On Thu, Sep 17, 2009 at 05:04:11PM -0500, Steven Pratt wrote:
>> Chris Mason wrote:
>>> On Thu, Sep 17, 2009 at 04:17:14PM -0400, Chris Mason wrote:
>>>> [ crashes on runs involving unmounts ]
>>>>
>>>> The run is still going here, but it has survived longer than before.
>>>> I'm trying with Yan Zheng's patch:
>>>>
>>>> From: Yan Zheng <zheng.yan@oracle.com>
>>>> Date: Fri, 11 Sep 2009 16:11:19 -0400
>>>> Subject: [PATCH] Btrfs: improve async block group caching
>>>
>>> Quick update, I got through a full run of Steve's test with this
>>> applied. I'll start a few more ;)
>>
>> Seems to work for me too! Got through all the random write tests
>> with no problems. Will kick off full run overnight.
>
> Thanks again.
>
> I've updated the btrfs-unstable tree to include Yan Zheng's fix. I've
> also included two more buffered writeback speedups, and I expect these
> to make a difference on the multi-threaded tests.
>
> I took a look at 1MB O_DIRECT writes, and the latencies of sending off
> checksumming to the checksum threads seem to be the biggest problem. I
> get full throughput at 8MB O_DIRECT writes, so for now I'm going to leave
> this one alone.

Updated performance results are available. Ran both the 9/16 tree + async patches and the tree from 9/18. Results are a mixed bag, some faster, some slower.

http://btrfs.boxacle.net/repository/raid/history/History.html

Steve